How to use a grammar

This chapter explains the method of parsing with a grammar developed with the MAYZ toolkit.


How to use "UP"

The MAYZ toolkit supports the development of a lexicon and templates, but parsing sentences with a grammar developed with MAYZ requires a parser. The MAYZ package includes "UP", an efficient general-purpose parser for unification-based grammars. By implementing the interfaces required by UP, you can parse sentences with the developed grammar.

To use UP, interfaces for accessing a grammar and probabilistic models must be implemented. The interfaces are defined in "mayz/parser.lil".

The minimum set of UP interfaces required for parsing is as follows. Grammar writers need to implement all of them, except where noted otherwise.

sentence_to_word_lattice(+$Input, -$WordLattice)
$Input: input sentence
$WordLattice: list of extent
Splits an input sentence $Input into words, and returns a word lattice $WordLattice.

lexical_entry(+$Word, -$LexName)
$Word: input word
$LexName: name of a lexical entry
Returns the name of a lexical entry assigned to $Word. A word can have multiple lexical entry names.

lexical_entry_sign(+$LexName, -$Sign)
$LexName: name of a lexical entry
$Sign: sign of a lexical entry
Returns the sign of a lexical entry. Exactly one sign must be assigned to each $LexName.

id_schema_unary(+$SchemaName, +$Dtr, -$Mother, -$DCP)
$SchemaName: schema name
$Dtr: sign of the daughter
$Mother: sign of the mother
$DCP: LiLFeS program executed after schema application
Applies a unary schema. If your grammar does not require unary rules, this need not be implemented.

id_schema_binary(+$SchemaName, +$Left, +$Right, -$Mother, -$DCP)
$SchemaName: schema name
$Left: sign of the left daughter
$Right: sign of the right daughter
$Mother: sign of the mother
$DCP: LiLFeS program executed after schema application
Applies a binary schema.

root_sign($Sign)
$Sign: sign of the root node
Specifies the condition that the sign of a root node must satisfy.

reduce_sign(+$InSign, -$OutSign, -$SignPlus)
$InSign: the sign of the mother of schema application
$OutSign: a reduced sign
$SignPlus: information removed from $OutSign
This predicate is applied to the mother sign after a schema has been applied successfully. In the subsequent parsing process, $OutSign is used instead of $InSign. By removing unnecessary information (e.g. daughter structures) from $InSign, equivalent $OutSigns are factored and treated as a single sign in the subsequent process. $SignPlus can hold the information removed from the sign; it is stored in SIGN_PLUS of 'edge_link'.
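
The factoring effect of 'reduce_sign/3' can be illustrated outside LiLFeS. The following Python sketch is not part of MAYZ or UP. It assumes that signs can be modeled as plain dictionaries, and the keys "cat" and "dtrs" as well as the function names are hypothetical. It only shows the idea: once the daughter structure is moved into a separate value, mother signs that differ only in their daughters collapse into one chart entry.

```python
def reduce_sign(in_sign):
    """Split a sign into a reduced sign and the information removed from it."""
    # Hypothetical model: everything except "dtrs" is kept in the reduced sign.
    out_sign = {key: value for key, value in in_sign.items() if key != "dtrs"}
    sign_plus = {"dtrs": in_sign.get("dtrs")}
    return out_sign, sign_plus


def factor(signs):
    """Group mother signs whose reduced signs are identical."""
    chart = {}
    for sign in signs:
        out_sign, sign_plus = reduce_sign(sign)
        # A sorted tuple of items serves as a hashable identity of the reduced sign.
        key = tuple(sorted(out_sign.items()))
        chart.setdefault(key, []).append(sign_plus)
    return chart


mothers = [
    {"cat": "S", "dtrs": ("NP", "VP")},
    {"cat": "S", "dtrs": ("NP-PP", "VP")},
    {"cat": "NP", "dtrs": ("Det", "N")},
]
chart = factor(mothers)
print(len(chart))  # number of distinct reduced signs
```

In this sketch the two "S" signs share one entry, while the removed daughter information is kept alongside it, which mirrors the role the text gives to $SignPlus.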

"mayz/sample_hpsg.lil" is an example grammar of HPSG and includes a sample implementation of the above interfaces.

Since the above interfaces do not provide access to probabilistic models, the parser cannot perform disambiguation. If you use UP with a grammar that implements only the above interfaces, run UP with the option "-nofom". For example, to use "mayz/sample_hpsg.lil", run the following command.

% up -i -nofom -l mayz/sample_hpsg

When you need disambiguation, the following interfaces must also be implemented. With them, UP computes figures-of-merit (FOM) during parsing, and the best analysis can be obtained using 'best_fom_sign/2' etc. Since FOMs are summed up, use log-probabilities when you apply probabilistic models.
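
The reason for using log-probabilities can be checked with a small calculation. The Python sketch below is an illustration only and does not call UP. The probabilities and the names "analysis_a" and "analysis_b" are made up. It shows that summing log-probabilities is equivalent to multiplying the probabilities themselves, so the analysis with the largest summed FOM is also the most probable one.

```python
import math

# Hypothetical probabilities of the steps that build two competing analyses.
analyses = {
    "analysis_a": [0.5, 0.2],
    "analysis_b": [0.4, 0.3],
}

# FOM of each step is its log-probability; the FOM of an analysis is the sum.
total_fom = {
    name: sum(math.log(p) for p in probs) for name, probs in analyses.items()
}

# Exponentiating the summed FOM recovers the product of the probabilities.
for name, probs in analyses.items():
    assert math.isclose(math.exp(total_fom[name]), math.prod(probs))

best = max(total_fom, key=total_fom.get)
print(best)
```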

fom_root(+$Sign, -$FOM)
$Sign: sign of the root node
$FOM: FOM of the root node
Returns FOM of the root node.

fom_binary(+$RuleName, +$LeftDtr, +$RightDtr, +$MotherSign, +$SignPlus, -$FOM)
$RuleName: schema name
$LeftDtr: sign of the left daughter
$RightDtr: sign of the right daughter
$MotherSign: sign of the mother
$SignPlus: 3rd argument of 'reduce_sign/3'
$FOM: FOM
Returns FOM of binary schema application.

fom_unary(+$RuleName, +$Dtr, +$MotherSign, +$SignPlus, -$FOM)
$RuleName: schema name
$Dtr: sign of the daughter
$MotherSign: sign of the mother
$SignPlus: 3rd argument of 'reduce_sign/3'
$FOM: FOM
Returns FOM of unary schema application.

fom_terminal(+$LexName, +$Sign, +$SignPlus, -$FOM)
$LexName: LEX_NAME (the second argument of 'lexical_entry/2')
$Sign: sign of a lexical entry
$SignPlus: 3rd argument of 'reduce_sign/3'
$FOM: FOM
Returns FOM of a terminal sign.

fom_lexical_entry(+$Word, +$LexName, -$FOM)
$Word: word
$LexName: LEX_NAME (the second argument of 'lexical_entry/2')
$FOM: FOM
Returns FOM of a lexical entry.

When you use UP with a grammar that implements the above interfaces, run UP with the option "-fom" or "-iter". For example, when the grammar file is "mygrammar.lil", execute the following command.

% up -i -iter -l mygrammar

See the manual of UP for other functions of UP.


How to use a lexicon and templates

MAYZ provides functions only for getting a lexicon and templates from a database. Grammar developers are supposed to implement the interfaces of UP. For details, see "How to use UP".

MAYZ provides the following tools for accessing the databases of a lexicon and templates. They are implemented in "mayz/grammar.lil". MAYZ also provides a tool for employing an external tagger.

import_lexicon($LexFile, $TemplateFile)
$LexFile: file name of a lexicon
$TemplateFile: file name of a template database
Imports a lexicon and a template database.

lookup_lexicon(+$Word, -$TempNameList)
$Word: a feature structure representing a "word"
$TempNameList: a list of lex_template
Returns a list of template names assigned to a word by looking up a lexicon.

lookup_template(+$TempName, -$Template)
$TempName: lex_template
$Template: a feature structure
Returns a feature structure of a lexical entry template by looking up a template database.

To use the above tools, you need to implement the following interfaces.

lexicon_lookup_key(+$Word, -$Key)
$Word: a feature structure representing a "word"
$Key: a key for looking up a lexicon
Given a feature structure representing a "word" (an element of the list returned by 'sentence_to_word_lattice/2'), this interface returns a key for looking up a lexicon (corresponding to the third argument of 'inverse_lexical_rule/5' and the fourth argument of 'lexical_rule/5').

unknown_word_lookup_key(+$Word, -$Key)
$Word: a feature structure representing a "word"
$Key: a key for looking up a lexicon
Given a feature structure representing a "word", this interface returns a key for looking up a lexicon for an unknown word.

When making a lexical entry in 'lexical_entry/2' and 'lexical_entry_sign/2', use the tools 'lookup_lexicon/2' and 'lookup_template/2'.
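
The flow from a word to its lexical entries can be sketched as follows. This Python code is only a model of the data flow and is not the MAYZ implementation. The dictionary contents, the word fields "base" and "pos", and the key format are hypothetical. It also assumes that the unknown-word key is tried only when the regular key is not found in the lexicon, which the text above does not state explicitly.

```python
# Hypothetical lexicon: lookup key -> list of template names.
LEXICON = {
    "run/VB": ["intrans_verb", "trans_verb"],
    "UNKNOWN/NN": ["common_noun"],
}

# Hypothetical template database: template name -> feature structure.
TEMPLATES = {
    "intrans_verb": {"HEAD": "verb", "SUBCAT": ["NP"]},
    "trans_verb": {"HEAD": "verb", "SUBCAT": ["NP", "NP"]},
    "common_noun": {"HEAD": "noun", "SUBCAT": []},
}


def lexicon_lookup_key(word):
    """Key for a known word (grammar-specific; made up for this sketch)."""
    return f"{word['base']}/{word['pos']}"


def unknown_word_lookup_key(word):
    """Key for an unknown word; here it depends only on the part of speech."""
    return f"UNKNOWN/{word['pos']}"


def lookup_lexicon(word):
    """Return template names, falling back to the unknown-word key (assumed)."""
    names = LEXICON.get(lexicon_lookup_key(word))
    if names is None:
        names = LEXICON.get(unknown_word_lookup_key(word), [])
    return names


def lookup_template(name):
    """Return the feature structure stored for a template name."""
    return TEMPLATES[name]


def lexical_entries(word):
    """Pair each template name for a word with its template."""
    return [(name, lookup_template(name)) for name in lookup_lexicon(word)]


known = lexical_entries({"base": "run", "pos": "VB"})
unknown = lexical_entries({"base": "blorf", "pos": "NN"})
```

In a real grammar, the first element of each pair would correspond to what 'lexical_entry/2' returns, and the second would be the starting point for the sign built in 'lexical_entry_sign/2'.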

How to use a probabilistic model

Probabilistic models developed using unimaker or forestmaker can be used as a figure-of-merit (FOM) model in UP. MAYZ provides a parser, mayzup, specialized for the probabilistic models developed with MAYZ. This parser provides built-in predicates for computing FOM (log probability) using the interfaces used in the development of probabilistic models, i.e., extract_XXX_event and feature_mask/3.

The following predicates are provided only in mayzup.

init_amis_model(+$ModelName, +$ModelFile)
$ModelName: model name
$ModelFile: name of the parameter file
Initializes a model by reading parameters from $ModelFile, and also incorporates the corresponding feature_masks.

delete_amis_model(+$ModelName)
$ModelName: model name
Deletes a model created by 'init_amis_model/2'.

amis_event_weight(+$ModelName, +$Category, +$Event, -$FOM)
$ModelName: model name
$Category: category name
$Event: event (list of strings)
$FOM: FOM of the event (log probability)
Returns FOM (log probability) of the event represented as a list of strings. 'feature_mask/3' of the category $Category is used.

amis_log_probability(+$ModelName, +$Category, +$EventList, -$FOM)
$ModelName: model name
$Category: category name
$EventList: list of events (list of lists of events)
$FOM: list of FOMs
Computes the weight of each event in $EventList, and computes the probability of each event by normalizing the weights.

FOM of an event can be computed using the above built-in predicates. Computed FOMs are passed to a parser using the interfaces introduced in "How to use UP".
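
The normalization step described for 'amis_log_probability/4' can be sketched numerically. The Python code below does not reproduce the internals of mayzup, which this manual does not describe. It assumes that each event already has a weight in the log domain, and the numbers and the function name are made up. It shows one common way to turn such weights into log-probabilities that sum to one after exponentiation.

```python
import math


def log_probabilities(log_weights):
    """Normalize log-domain weights into log-probabilities."""
    # Subtracting the maximum first keeps exp() from overflowing.
    top = max(log_weights)
    log_z = top + math.log(sum(math.exp(w - top) for w in log_weights))
    return [w - log_z for w in log_weights]


# Hypothetical log-weights of three competing events.
log_weights = [1.2, 0.3, -0.5]
foms = log_probabilities(log_weights)

# The resulting probabilities sum to one, and the ranking is unchanged.
total = sum(math.exp(f) for f in foms)
```

Each value in `foms` is at most zero and could be returned as the FOM of the corresponding event through the interfaces described in "How to use UP".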

The usage of mayzup is almost the same as up. For example, when you use "mygrammar.lil", run the following command.

% mayzup -i -iter -l mygrammar

MIYAO Yusuke (yusuke@is.s.u-tokyo.ac.jp)