LiLFeS modules


In addition to the tools explained above, MAYZ provides LiLFeS modules to support grammar development. They can be used by loading them into lilfes (with the "-l" option) or into a parser.


Marking head, argument, and modifier

"mayz/markhead.lil" is a program for annotating each node in a tree with a head, argument, or modifier mark. Once you implement several rules for marking, it automatically annotates all nodes in a tree with marks.

The interfaces for marking heads are as follows. The first two refer to the MOD feature, while the third refers to the SYM feature to determine heads.

head_tag(+$Tag)
The node will be marked as a head if the MOD feature includes $Tag.
nonhead_tag(+$Tag)
The node will be marked as a non-head (argument or modifier) if the MOD feature includes $Tag. Arguments and modifiers are distinguished by other rules.
head_table(+$Sym, +$Dir, +$SymList)
$Sym: symbol of the parent node
$Dir: direction of searching for a head ("left" or "right")
$SymList: list of symbols that should be marked as a head
When the symbol of the parent node is $Sym, the child nodes are searched in the direction $Dir (left-to-right if "left", right-to-left if "right"), and the node labeled with the first element of $SymList is marked as a head. If no child node is labeled with the first element, the child nodes are searched for the next element, and so on. If an element of $SymList is itself a list, a node labeled with any symbol in that list is marked as a head. If no symbol is found, the leftmost node is marked as a head if $Dir is "left", and the rightmost one if $Dir is "right".

The following predicate marks heads in a parse tree using the above interfaces.

mark_head(+$Tree)
$Tree: parse tree
Annotates a parse tree with head marks using the following algorithm.
  • If one of the daughters is already marked as "head", exit.
  • If one of the daughters has a tag in its MOD feature that is specified by 'head_tag/1', mark that daughter as a head.
  • If a daughter has a tag in its MOD feature that is specified by 'nonhead_tag/1', that daughter is ignored when searching for a head.
  • Determine a head according to 'head_table/3'.
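The head-marking procedure above can be sketched in Python. This is not the LiLFeS implementation in "mayz/markhead.lil": the Node class, the tag sets HEAD_TAGS and NONHEAD_TAGS, and the HEAD_TABLE entries are hypothetical stand-ins for head_tag/1, nonhead_tag/1, and head_table/3, and the default direction for symbols missing from the table is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Node:
    sym: str                                    # SYM feature
    mod: Set[str] = field(default_factory=set)  # tags in the MOD feature
    mark: Optional[str] = None                  # "head", "argument", "modifier", or None
    dtrs: List["Node"] = field(default_factory=list)


# Hypothetical rule sets standing in for head_tag/1, nonhead_tag/1, and head_table/3.
HEAD_TAGS: Set[str] = {"PRD"}
NONHEAD_TAGS: Set[str] = {"SBJ"}
# parent symbol -> (direction, symbol list); a nested list means "any of these symbols"
HEAD_TABLE: Dict[str, Tuple[str, list]] = {
    "S": ("right", ["VP", ["S", "SBAR"]]),
    "VP": ("left", [["VB", "VBD", "VBZ"], "VP"]),
}


def search_head_table(parent: str, dtrs: List[Node]) -> int:
    """Returns the index of the head among dtrs, following the head_table/3 description."""
    # Parents without a table entry default to "left" here (an assumption).
    direction, sym_list = HEAD_TABLE.get(parent, ("left", []))
    order = range(len(dtrs)) if direction == "left" else range(len(dtrs) - 1, -1, -1)
    for entry in sym_list:  # try each element of $SymList in priority order
        wanted = set(entry) if isinstance(entry, list) else {entry}
        for i in order:
            if dtrs[i].sym in wanted:
                return i
    # No symbol found: leftmost node for "left", rightmost node for "right".
    return 0 if direction == "left" else len(dtrs) - 1


def mark_head(tree: Node) -> None:
    """Marks one head daughter in every local tree, top-down."""
    if tree.dtrs and not any(d.mark == "head" for d in tree.dtrs):  # step 1
        chosen = next((d for d in tree.dtrs if d.mod & HEAD_TAGS), None)  # step 2
        if chosen is None:
            # Step 3: daughters with a nonhead tag are ignored (all kept if none remain).
            candidates = [d for d in tree.dtrs if not (d.mod & NONHEAD_TAGS)] or tree.dtrs
            chosen = candidates[search_head_table(tree.sym, candidates)]  # step 4
        chosen.mark = "head"
    for d in tree.dtrs:
        mark_head(d)
```

The nested entry ["S", "SBAR"] corresponds to an element of $SymList that is itself a list: whichever of those symbols is found first in the search direction becomes the head.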

The interfaces for marking modifiers and arguments are as follows. The program assumes that head marks have already been assigned. The first two refer to the MOD feature, while the rest refer to the SYM feature.

argument_tag(+$Tag)
If the MOD feature includes $Tag, the node is marked as an argument.
modifier_tag(+$Tag)
If the MOD feature includes $Tag, the node is marked as a modifier.
head_argument_table(+$HeadSym, +$SymList)
$HeadSym: symbol of the head
$SymList: list of symbols
If the symbol of the head is $HeadSym, a sibling node is marked as an argument if its symbol is included in $SymList.
argument_table(+$Sym, +$SymList)
$Sym: symbol of the mother
$SymList: list of symbols
If the symbol of the mother is $Sym, a sibling node is marked as an argument if its symbol is included in $SymList.
left_argument_table(+$Sym, +$SymList)
$Sym: symbol of the mother
$SymList: list of symbols
If the symbol of the mother is $Sym, a sibling node is marked as an argument if the node is to the left of the head and its symbol is included in $SymList.
right_argument_table(+$Sym, +$SymList)
$Sym: symbol of the mother
$SymList: list of symbols
If the symbol of the mother is $Sym, a sibling node is marked as an argument if the node is to the right of the head and its symbol is included in $SymList.

Using the above interfaces, the following predicate assigns argument or modifier marks to all nodes in a parse tree.

mark_modifier(+$Tree)
$Tree: parse tree
Nodes in $Tree are marked as a modifier or an argument using the following algorithm.
  • If the node has a tag specified by 'argument_tag/1', it is marked as "argument".
  • If the node has a tag specified by 'modifier_tag/1', it is marked as "modifier".
  • Using 'head_argument_table/2', argument marks are assigned.
  • Using 'argument_table/2', argument marks are assigned.
  • Using 'left_argument_table/2', argument marks are assigned.
  • Using 'right_argument_table/2', argument marks are assigned.
  • All the remaining nodes are assigned "modifier".
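For a single local tree whose head is already marked, the order of the rules above can be sketched as follows. The dictionaries are hypothetical stand-ins for argument_tag/1, modifier_tag/1, and the four tables; daughters are plain Python dicts rather than LiLFeS feature structures, and the real module also walks the whole tree rather than one local tree.

```python
from typing import Dict, List, Set

# Hypothetical rule sets standing in for the MAYZ interfaces.
ARGUMENT_TAGS: Set[str] = {"SBJ"}                                   # argument_tag/1
MODIFIER_TAGS: Set[str] = {"ADV", "TMP"}                            # modifier_tag/1
HEAD_ARGUMENT_TABLE: Dict[str, List[str]] = {"IN": ["NP", "S"]}     # head_argument_table/2
ARGUMENT_TABLE: Dict[str, List[str]] = {"VP": ["NP"]}               # argument_table/2
LEFT_ARGUMENT_TABLE: Dict[str, List[str]] = {"S": ["NP"]}           # left_argument_table/2
RIGHT_ARGUMENT_TABLE: Dict[str, List[str]] = {"VP": ["S", "SBAR"]}  # right_argument_table/2


def classify(mother: str, head: str, sym: str, mod: Set[str], left_of_head: bool) -> str:
    """Applies the rules in the order listed for mark_modifier/1."""
    if mod & ARGUMENT_TAGS:
        return "argument"
    if mod & MODIFIER_TAGS:
        return "modifier"
    if sym in HEAD_ARGUMENT_TABLE.get(head, []):
        return "argument"
    if sym in ARGUMENT_TABLE.get(mother, []):
        return "argument"
    if left_of_head and sym in LEFT_ARGUMENT_TABLE.get(mother, []):
        return "argument"
    if not left_of_head and sym in RIGHT_ARGUMENT_TABLE.get(mother, []):
        return "argument"
    return "modifier"  # all remaining nodes


def mark_modifier(mother: str, dtrs: List[dict]) -> None:
    """Marks the non-head daughters of one local tree; dtrs must contain a head."""
    head_index = next(i for i, d in enumerate(dtrs) if d.get("mark") == "head")
    head_sym = dtrs[head_index]["sym"]
    for i, d in enumerate(dtrs):
        if d.get("mark") is None:  # nodes that already carry a mark are left as they are
            d["mark"] = classify(mother, head_sym, d["sym"], d.get("mod", set()), i < head_index)
```

Because already-marked daughters are skipped, a mark assigned beforehand to an exceptional construction survives, which is the behavior the next paragraph relies on.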

The above predicate ignores nodes that have already been assigned a mark. This means that you can assign marks to exceptional constructions before using the above tools. Users can also implement the following interface for marking exceptional trees; it is called when the above predicate tries to assign a mark to each node.

mark_exceptional(+$Tree)
$Tree: parse tree
Implemented by the user to mark $Tree.

Binarizing a tree

"mayz/binarizer.lil" provides a tool to binarize a tree annotated with head, modifier, and argument marks.

tree_binarize(+$Tree, -$BinTree)
$Tree: input tree
$BinTree: binarized tree
$Tree is binarized into $BinTree.

This predicate binarizes a tree around its head: the nodes to the right of the head are attached in the lower part of the binarized tree, and the nodes to the left of the head are attached in the higher part. If you need an exceptional binarization strategy, you can implement the following interface, which is called for each node in a tree.

binarizer_preprocess(+$Tree, -$BinTree)
$Tree: input tree
$BinTree: binarized tree
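A minimal Python sketch of the head-centered binarization described above, assuming each daughter already carries a "head", "argument", or "modifier" mark. Labeling the new intermediate nodes with the parent's symbol and marking them as heads is an assumption of this sketch, not necessarily what tree_binarize/2 produces; the Tree class and to_str helper are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Tree:
    sym: str
    dtrs: List["Tree"] = field(default_factory=list)
    mark: Optional[str] = None  # "head", "argument", or "modifier"


def binarize(tree: Tree) -> Tree:
    """Head-centered binarization: right siblings attach lower, left siblings higher."""
    dtrs = [binarize(d) for d in tree.dtrs]
    if len(dtrs) <= 2:
        return Tree(tree.sym, dtrs, tree.mark)
    h = next(i for i, d in enumerate(dtrs) if d.mark == "head")
    node = dtrs[h]
    for right in dtrs[h + 1:]:  # nearest right sibling first, so it ends up lowest
        node = Tree(tree.sym, [node, right], "head")
    for left in reversed(dtrs[:h]):  # nearest left sibling first, farthest ends up on top
        node = Tree(tree.sym, [left, node], "head")
    node.mark = tree.mark  # the topmost new node takes the place of the original node
    return node


def to_str(t: Tree) -> str:
    """Prints a tree in Penn Treebank-style brackets."""
    if not t.dtrs:
        return t.sym
    return "(" + t.sym + " " + " ".join(to_str(d) for d in t.dtrs) + ")"
```

For example, a VP with daughters ADVP, VBD (head), NP, and PP becomes (VP ADVP (VP (VP VBD NP) PP)): the head combines with its right siblings first, and the left sibling is attached last, at the top.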

Pattern matching of trees

"mayz/treematch.lil" provides predicates for pattern matching of parse trees. It is useful when you use "treetrans" to convert parse trees: you can match and substitute parse trees using pattern rules.

A pattern on a parse tree is represented with the feature structure representation of a parse tree (i.e., the 'tree' type), and you can additionally use the 'tree_any' type, which matches zero or more parse trees. For example, the following pattern

(tree &
 TREE_NODE\SYM\"S" &
 TREE_DTRS\[tree_any,
            (tree & TREE_NODE\SYM\"VP"),
            tree_any])
matches a tree in which the top node is labeled with "S" and has at least one daughter labeled with "VP". It matches even when the tree has one or more daughters to the left and/or right of the "VP" tree. The trees matched by 'tree_any' are stored in the feature ANY_TREES\.
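'tree_any' behaves like a wildcard over a sequence of sibling trees. The following Python sketch, with hypothetical Tree, TreeAny, and TreePat classes, matches a pattern by backtracking over how many siblings each wildcard absorbs. Trying the shortest span first is an assumption; LiLFeS matching is based on unification and may enumerate alternatives differently.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple, Union


@dataclass
class Tree:
    sym: str
    dtrs: List["Tree"] = field(default_factory=list)


@dataclass
class TreeAny:
    """Wildcard for zero or more sibling trees (cf. the ANY_TREES feature)."""
    any_trees: List[Tree] = field(default_factory=list)


@dataclass
class TreePat:
    sym: Optional[str] = None                               # None: any symbol
    dtrs: Optional[List[Union["TreePat", TreeAny]]] = None  # None: daughters unconstrained


Binding = List[Tuple[TreeAny, List[Tree]]]


def match_tree(p: TreePat, t: Tree) -> Optional[Binding]:
    if p.sym is not None and p.sym != t.sym:
        return None
    if p.dtrs is None:
        return []
    return match_seq(p.dtrs, t.dtrs)


def match_seq(pats: List[Union[TreePat, TreeAny]], trees: List[Tree]) -> Optional[Binding]:
    if not pats:
        return [] if not trees else None
    first, rest = pats[0], pats[1:]
    if isinstance(first, TreeAny):
        for k in range(len(trees) + 1):  # try the shortest span first (an assumption)
            tail = match_seq(rest, trees[k:])
            if tail is not None:
                return [(first, trees[:k])] + tail
        return None
    if not trees:
        return None
    head = match_tree(first, trees[0])
    if head is None:
        return None
    tail = match_seq(rest, trees[1:])
    return None if tail is None else head + tail


def tree_match(pattern: TreePat, tree: Tree) -> bool:
    binding = match_tree(pattern, tree)
    if binding is None:
        return False
    for wildcard, trees in binding:  # record captures only after a full match
        wildcard.any_trees = trees
    return True
```

The LiLFeS pattern above corresponds to TreePat("S", [TreeAny(), TreePat("VP"), TreeAny()]) in this sketch.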

The following predicates are provided for the matching and the substitution of parse trees using patterns.

tree_match(+$Pattern, +$Tree)
$Pattern: pattern on a parse tree ('tree' or 'tree_any')
$Tree: input parse tree ('tree')
Succeeds when the pattern matches the parse tree.
> ?- tree_match((tree &
                 TREE_NODE\SYM\"SBAR" &
                 TREE_DTRS\[TREE_NODE\(SYM\"RB" & WORD\SURFACE\"rather"),
                            TREE_NODE\(SYM\"IN" & WORD\SURFACE\"than"),
                            TREE_NODE\(SYM\"NP")]),
                (tree &
                 TREE_DTRS\[tree_any & ANY_TREES\[_|_],
                            tree & TREE_NODE\(SYM\"IN" & WORD\SURFACE\"than"),
                            tree & TREE_NODE\HEAD_MARK\argument])).
yes
tree_substitution(+$OutPattern, -$OutTree)
$OutPattern: pattern on a parse tree ('tree' or 'tree_any')
$OutTree: output parse tree ('tree')
Converts a pattern on a parse tree (including 'tree_any') into an ordinary parse tree (without 'tree_any').
tree_subst(+$InPattern, +$OutPattern, +$InTree, -$OutTree)
$InPattern: pattern on an input parse tree ('tree' or 'tree_any')
$OutPattern: pattern on an output parse tree ('tree' or 'tree_any')
$InTree: input parse tree ('tree')
$OutTree: output parse tree ('tree')
The input pattern is matched against the input parse tree, and if the match succeeds, the output pattern is converted into an output parse tree. That is, it is equivalent to the following operations.
tree_match($InPattern, $InTree),
tree_substitution($OutPattern, $OutTree).
See the manual of "treetrans" for an example.
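Once a match has captured trees, the substitution step can be sketched as splicing each wildcard's captured trees into the output. In LiLFeS the input and output patterns are linked by shared variables; in this hypothetical Python sketch the same link is modeled by reusing one TreeAny object, whose any_trees list is assumed to have been filled by an earlier match.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Tree:
    sym: str
    dtrs: List["Tree"] = field(default_factory=list)


@dataclass
class TreeAny:
    any_trees: List[Tree] = field(default_factory=list)  # trees captured by an earlier match


@dataclass
class OutPat:
    sym: str
    dtrs: List[Union["OutPat", TreeAny]] = field(default_factory=list)


def tree_substitution(pattern: OutPat) -> Tree:
    """Builds an ordinary tree by splicing each TreeAny's captured trees in place."""
    dtrs: List[Tree] = []
    for d in pattern.dtrs:
        if isinstance(d, TreeAny):
            dtrs.extend(d.any_trees)  # zero or more trees, no extra node is created
        else:
            dtrs.append(tree_substitution(d))
    return Tree(pattern.sym, dtrs)
```

A wildcard that captured nothing simply disappears from the output, so the result never contains a 'tree_any' node.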

Looking up a lexicon and templates

"mayz/grammar.lil" provides tools for looking up a lexicon and a template database.

import_lexicon(+$LexiconFile, +$TemplateFile)
$LexiconFile: file name of a lexicon
$TemplateFile: file name of a template database
Imports a lexicon and a template database.
lookup_lexicon(+$Word, -$TempNameList)
$Word: input word
$TempNameList: list of lex_template
Looks up the lexicon and returns a list of template names.
lookup_template(+$TempName, -$Sign)
$TempName: lex_template
$Sign: feature structure
Looks up a template in the template database and returns the feature structure of the template.

To use lookup_lexicon/2, the following interfaces must be implemented to obtain a database key from an input word.

lexicon_lookup_key(+$Word, -$Key)
$Word: input word
$Key: key of a lexicon database
unknown_word_lookup_key(+$Word, -$Key)
$Word: input word
$Key: key of a lexicon database for an unknown word
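A sketch of how the two key interfaces might be combined, assuming that the unknown-word key is used only when the lexicon has no entry for the ordinary key (the text above does not spell out this fallback). The (surface, POS) word representation, the key formats, and the dictionary standing in for the lexicon database are all hypothetical.

```python
from typing import Dict, List, Tuple

Word = Tuple[str, str]  # hypothetical (surface, POS) representation of an input word


def lexicon_lookup_key(word: Word) -> str:
    # Hypothetical key: lowercased surface form plus POS.
    surface, pos = word
    return f"{surface.lower()}/{pos}"


def unknown_word_lookup_key(word: Word) -> str:
    # Hypothetical key for unknown words: only the POS is kept.
    _, pos = word
    return f"<UNK>/{pos}"


def lookup_lexicon(lexicon: Dict[str, List[str]], word: Word) -> List[str]:
    """Returns template names; falls back to the unknown-word key (an assumption)."""
    key = lexicon_lookup_key(word)
    if key in lexicon:
        return lexicon[key]
    return lexicon.get(unknown_word_lookup_key(word), [])
```

Keying unknown words by POS alone is one common design, since it lets every unseen word of a given POS share the templates observed for rare words of that POS.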

Using an external tagger

"mayz/tagger.lil" provides tools for using an external tagger. The following predicates are used for the initialization and termination of an external tagger.

initialize_external_tagger(+$Name, +$Arguments)
$Name: command name of a tagger (string)
$Arguments: command-line arguments of a tagger (list of strings)
Initializes an external tagger.
terminate_external_tagger
Terminates an external tagger.
is_external_tagger_initialized
Succeeds if a tagger is already initialized.

After the initialization, the following predicates are used for turning on/off the tagger.

enable_external_tagger
Turns on the tagger.
disable_external_tagger
Turns off the tagger.
is_external_tagger_enabled
Succeeds if a tagger is turned on.

The following predicate passes an input sentence to the tagger and returns the resulting string.

external_tagger(+$Input, -$Output)
$Input: input string
$Output: output string
When the tagger is turned on, $Input is passed to the tagger, and the output of the tagger is returned as $Output. When the tagger is turned off, $Input is returned unchanged as $Output.
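The interplay of initialization and the on/off switch can be sketched in Python with a subprocess that stays alive between calls. This is an illustration, not the MAYZ implementation: the class and method names are hypothetical, and the sketch assumes the tagger reads one line from standard input and writes exactly one line to standard output, flushing after each.

```python
import subprocess
from typing import List, Optional


class ExternalTagger:
    """Hypothetical wrapper mirroring initialize/terminate/enable/disable/external_tagger."""

    def __init__(self) -> None:
        self.proc: Optional[subprocess.Popen] = None
        self.enabled = False

    def initialize(self, name: str, arguments: List[str]) -> None:
        # Start the tagger once; it stays alive until terminate() is called.
        self.proc = subprocess.Popen([name, *arguments], stdin=subprocess.PIPE,
                                     stdout=subprocess.PIPE, text=True, bufsize=1)

    def is_initialized(self) -> bool:
        return self.proc is not None

    def terminate(self) -> None:
        if self.proc is not None:
            self.proc.stdin.close()  # EOF tells the tagger to finish
            self.proc.wait()
            self.proc.stdout.close()
            self.proc = None
        self.enabled = False

    def enable(self) -> None:
        self.enabled = True

    def disable(self) -> None:
        self.enabled = False

    def is_enabled(self) -> bool:
        return self.enabled

    def tag(self, text: str) -> str:
        # When turned off (or never initialized), the input is returned unchanged.
        if not self.enabled or self.proc is None:
            return text
        self.proc.stdin.write(text + "\n")
        self.proc.stdin.flush()
        return self.proc.stdout.readline().rstrip("\n")
```

Keeping one long-running process avoids reloading the tagger's model for every sentence; the cost is that a tagger which buffers its output would make readline() block.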

Browsing the process of tree transformation and grammar extraction

"mayz/morivtrans.lil" is a module for browsing the process of tree transformation (treetrans) and lexicon extraction (lexextract). Using a web browser that supports XHTML and XSLT (e.g., Firefox) or MoriV, you can browse tree structures and feature structures in the process of grammar development.

This module works as an HTTP server and a CGI. First, load this module together with modules for tree transformation and lexicon extraction.

% lilfes -l tree_transformation_module -l lexicon_extraction_module -l mayz/morivtrans
Next, invoke "cgi" command.
> ?- cgi.
Then an HTTP server starts and waits for a connection. From your browser, access "/cgi-lilfes/moriv?" on port 27109 of the host where lilfes is running:
http://server_host:27109/cgi-lilfes/moriv?

Input a Penn Treebank-style tree to the form, and press the "Input" button. You will see a menu in the lower-left area, and a parse tree in the lower-right area. You can browse trees and feature structures using the lower-left menu.


Browsing the results of parsing

"mayz/morivparser.lil" is a module for browsing the results of parsing with a grammar and a disambiguation model developed with MAYZ. Using a web browser that supports XHTML and XSLT (e.g., Firefox) or MoriV (http://www-tsujii.is.s.u-tokyo.ac.jp/moriv/), you can browse the parse trees and signs of parse results.

To use this module, you need to implement the following interfaces, which provide the symbols used to display a brief parse tree of a parse result. They are defined in "mayz/display.lil".

sign_label(+$Sign, -$Symbol)
$Sign: sign
$Symbol: string
Returns a symbol representing the sign.
lexname_label(+$LexName, -$Symbol)
$LexName: LEX_NAME (the 2nd argument of lexical_entry/3)
$Symbol: string
Returns a symbol representing LEX_NAME.
schema_edge_label_unary(+$SchemaName, -$Label)
$SchemaName: schema name
$Label: edge symbol
Returns a symbol assigned to the edge of a unary schema application.
schema_edge_label_binary(+$SchemaName, -$LeftLabel, -$RightLabel)
$SchemaName: schema name
$LeftLabel: symbol of the left edge
$RightLabel: symbol of the right edge
Returns symbols assigned to the edges of a binary schema application.
schema_label(+$SchemaName, -$Label)
$SchemaName: schema name
$Label: symbol
Returns a symbol representing a schema name.
lex_template_label(+$LexTemplate, -$Label)
$LexTemplate: lex_template
$Label: symbol
Returns a symbol representing a template name.
word_label(+$Word, -$Label)
$Word: word
$Label: symbol
Returns a symbol representing a word.
extent_label(+$Extent, -$Label)
$Extent: extent
$Label: symbol
Returns a symbol representing an extent (an element of the 2nd argument of 'sentence_to_word_lattice/2').

This module works as an HTTP server and a CGI. When you run a parser, load "mayz/morivparser.lil", and execute the "cgi" command. For example, when you use "mayzup",

% mayzup -l grammar_module -l mayz/morivparser -e cgi
Then an HTTP server starts and waits for a connection. From your browser, access "/cgi-lilfes/moriv?" on port 27109 of the host where lilfes is running:
http://server_host:27109/cgi-lilfes/moriv?

Enter a sentence in the form, and press the "Input" button. You will see the brief result of parsing and a menu in the lower-left area. You can browse parse trees and feature structures using the menu.


Browsing a parse chart

"mayz/morivchart.lil" is a module for browsing a parse chart (CKY table). Using a web browser that supports XHTML and XSLT (e.g., Firefox) or MoriV, you can browse the internal parse results generated during parsing.

To use this module, you need to implement the interfaces for getting the symbols of parse trees. The interfaces are defined in "mayz/display.lil". For details, see the section "Browsing the results of parsing".

When you run a parser, load "mayz/morivchart" and execute the "cgi" command to run an HTTP server. Then access the server from your browser. Enter a sentence in the form, and you will see the chart in the lower-left area. By clicking a chart cell, you will see the edges in that cell in the lower-right area.


Browsing lexical entries

"mayz/morivgrammar.lil" is a module for browsing a lexicon using a web browser that supports XHTML and XSLT (e.g., Firefox) or MoriV. You can browse the list of lexical entries assigned to a word and their feature structures.

To use this module, you need to implement the interfaces defined in "mayz/display.lil". For details, see the section "Browsing the results of parsing".

When you run a parser, load "mayz/morivgrammar" and execute the "cgi" command to run an HTTP server. Then access the server from your browser. Enter a word/POS in the form, and you will see a list of lexical entries. Click a link in the list, and you will see the feature structure of that lexical entry in the lower-right frame.


Evaluating coverage

"mayz/coverage.lil" is a module to measure the coverage obtained by a grammar developed with MAYZ. Together with a grammar module, load "mayz/coverage.lil", and execute the following predicate.

eval_coverage(+$Lexbank, +$Lexicon, +$Template, +$OutputFile)
$Lexbank: name of a lexbank used for the evaluation
$Lexicon: file name of a lexicon
$Template: file name of a template database
$OutputFile: name of the file to which the results are written
For the evaluation of coverage, a lexbank of an unseen corpus is used. Before the evaluation, you need to make a lexbank using "treetrans" and "lexextract".

Evaluating parse accuracy

"mayz/evalparse.lil" is a module for evaluating the accuracy of parsing with a grammar and a probabilistic model developed with MAYZ. By implementing an interface to measure the number of correct answers for a sentence, you can measure the accuracy for the whole test corpus.

For the evaluation, the following interface must be implemented.

eval_parse(+$Best, +$Correct, +$TermList, -$NumAnswers, -$NumOutputs, -$NumCorrects, -$NumPartials, -$Errors)
$Best: parse_tree output by a parser
$Correct: correct parse_tree
$TermList: list of terminal nodes of a derivation (corresponding to a lexbank)
$NumAnswers: number of answers
$NumOutputs: number of outputs
$NumCorrects: number of exactly correct outputs
$NumPartials: number of partially correct outputs
$Errors: list of strings (each element is output to the result file)
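How eval_parse_file/2 combines the per-sentence counts is not described here. One common choice, sketched below as an assumption, is to sum the counts over the corpus (micro-averaging) and compute precision as corrects/outputs and recall as corrects/answers; the SentenceResult class and summarize function are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class SentenceResult:
    """Hypothetical container for the outputs of one eval_parse/8 call."""
    num_answers: int
    num_outputs: int
    num_corrects: int
    num_partials: int
    errors: List[str] = field(default_factory=list)


def summarize(results: Iterable[SentenceResult]) -> Dict[str, object]:
    """Sums counts over the corpus and derives precision, recall, and F1."""
    answers = outputs = corrects = partials = 0
    errors: List[str] = []
    for r in results:
        answers += r.num_answers
        outputs += r.num_outputs
        corrects += r.num_corrects
        partials += r.num_partials
        errors.extend(r.errors)
    precision = corrects / outputs if outputs else 0.0
    recall = corrects / answers if answers else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"answers": answers, "outputs": outputs, "corrects": corrects,
            "partials": partials, "precision": precision, "recall": recall,
            "f1": f1, "errors": errors}
```

Summing before dividing weights long sentences more heavily than averaging per-sentence scores would.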

When you run a parser, load "mayz/evalparse.lil", and execute the following predicate. The result of evaluation is output to a file.

eval_parse_file(+$Derivbank, +$OutputFile)
$Derivbank: name of a derivbank
$OutputFile: name of an output file
The accuracy of parsing is measured against $Derivbank, and the result is output to $OutputFile.

Storing parse results in a database

"mayz/parseall.lil" is a LiLFeS module that stores parse results in a LiLFeS database (lildb). Each line of the input text is parsed, and the results are stored in the database, keyed by the line number of the input. If parsing fails, the stored result indicates the reason for the failure with the type parse_error or one of its subtypes.

In this module, the following predicates are available.

parse_all(+$Input, +$Output)
$Input: name of the input file
$Output: name of the database
Parses each line of the input file $Input, and stores the results in the database $Output.
parse_all(+$Output)
$Output: name of the database
Parses each line of the standard input, and stores the results in the database $Output.
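The behavior of parse_all can be pictured with a Python dictionary standing in for the lildb database. The parse callback, the ParseError class (standing in for the parse_error type), and the choice of 1-based line numbers as keys are assumptions of this sketch.

```python
from typing import Callable, Dict, Iterable


class ParseError(Exception):
    """Hypothetical stand-in for the parse_error type (subclasses could mirror its subtypes)."""


def parse_all(lines: Iterable[str], parse: Callable[[str], object]) -> Dict[int, object]:
    """Parses each line and stores the result (or the failure) keyed by line number."""
    db: Dict[int, object] = {}
    for lineno, line in enumerate(lines, start=1):  # 1-based keys are an assumption
        try:
            db[lineno] = parse(line.rstrip("\n"))
        except ParseError as err:
            db[lineno] = err  # the stored value records why parsing failed
    return db
```

To mimic parse_all/1, which reads the standard input, pass sys.stdin as the lines argument. Storing failures under the same key as successes means every input line has exactly one entry, so results can be aligned with the input afterwards.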

MIYAO Yusuke (yusuke@is.s.u-tokyo.ac.jp)