Advanced usage


This section introduces the advanced usage of Enju.


Access to parse results

Enju uses UP, an efficient parser for unification-based grammars that is included in the MAYZ toolkit. UP provides several interfaces for accessing various information about parse results; for example, you can obtain HPSG signs, the time required for parsing, and the number of edges. By writing your own LiLFeS programs, you can produce customized parser output.

In fact, the default outputs of Enju (dependencies and XML) are computed by LiLFeS programs. The source programs are included in the package ("enju/grammar/{outputdep.lil,outputxml.lil}"); see these files for details. The CGI program for browsing parse results in a GUI is also written in LiLFeS (see "enju/grammar/moriv.lil").

For details of UP, see the manual of UP.


Making a grammar from scratch

The source package of Enju includes programs for making a grammar and probabilistic models from the Penn Treebank. By modifying these programs, users can improve or extend the grammar. Rebuilding the grammar and probabilistic models requires considerable computing resources and time (around one day on a 2.2 GHz Xeon with 2 GB of memory).

The grammar-making programs use the MAYZ toolkit; see the manual of the toolkit for details. The README gives a brief overview of the source programs. Amis 3.0 or later must also be installed.

As input resources, you need the ".mrg" files of Penn Treebank II (in which POS tags and tree structures are combined) and the WordNet data files (index.noun, index.verb, noun.exc, verb.exc), which are used for stemming. Put these files in "mayz/enju/Corpus/".
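In WordNet, the exception lists (noun.exc, verb.exc) map irregular inflected forms to their base forms, and the index files (index.noun, index.verb) list valid base forms against which regular suffix stripping can be checked. The Python sketch below shows how these two kinds of files are commonly combined, using the suffix-detachment rules of WordNet's own "morphy" procedure. It is an assumption about typical usage, not Enju's stemmer, and every name in it is hypothetical.

```python
"""Sketch of WordNet-style stemming (the "morphy" approach).

Assumption: illustrates how *.exc and index.* files are commonly
combined; this is NOT the stemmer used inside Enju.
"""
from typing import Dict, Iterable, List, Set

# Suffix-detachment rules from WordNet's morphy, per part of speech.
RULES = {
    "noun": [("s", ""), ("ses", "s"), ("xes", "x"), ("zes", "z"),
             ("ches", "ch"), ("shes", "sh"), ("men", "man"), ("ies", "y")],
    "verb": [("s", ""), ("ies", "y"), ("es", "e"), ("es", ""),
             ("ed", "e"), ("ed", ""), ("ing", "e"), ("ing", "")],
}


def load_index(lines: Iterable[str]) -> Set[str]:
    """Collect lemmas from index.noun / index.verb.

    License header lines start with a space and are skipped; otherwise
    the lemma is the first whitespace-separated field.
    """
    return {line.split()[0] for line in lines
            if line.strip() and not line.startswith(" ")}


def load_exc(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Parse noun.exc / verb.exc: 'inflected base [base ...]' per line."""
    table: Dict[str, List[str]] = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            table[fields[0]] = fields[1:]
    return table


def stem(word: str, pos: str, index: Set[str],
         exc: Dict[str, List[str]]) -> List[str]:
    """Return candidate base forms of `word` for `pos` ("noun" or "verb")."""
    word = word.lower()
    # 1. Irregular forms are answered directly by the exception list.
    if word in exc:
        return exc[word]
    # 2. The word itself may already be a base form.
    candidates = [word] if word in index else []
    # 3. Strip each matching suffix and keep results found in the index.
    for suffix, replacement in RULES[pos]:
        if word.endswith(suffix):
            base = word[: len(word) - len(suffix)] + replacement
            if base in index and base not in candidates:
                candidates.append(base)
    return candidates
```

For example, with "went go" in verb.exc, stem("went", "verb", ...) returns ["go"], while "walked" is reduced to "walk" by the "ed" rule only if "walk" appears in index.verb. Checking every stripped form against the index is what keeps the suffix rules from producing non-words.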

First, convert the Penn Treebank data into the input format of the MAYZ toolkit. The toolkit supports the ".trees" format, in which each line represents exactly one tree. A Perl script for converting ".mrg" files into ".trees" is provided ("tools/mrg2trees.prl"). For example, to make a grammar from Section 02 of the Penn Treebank, run the following command in "mayz/enju/".

perl tools/mrg2trees.prl Corpus/02/*.mrg > Corpus/02.trees
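To make the one-tree-per-line idea concrete, the following Python sketch flattens each multi-line bracketed tree of an ".mrg" file onto a single line. It is a minimal illustration, not the code of tools/mrg2trees.prl; the function names are hypothetical, and the real script may apply further normalization that is not reproduced here. Counting parentheses is safe because the Penn Treebank writes literal brackets in the text as tokens such as -LRB- and -RRB-.

```python
"""Sketch: flatten Penn Treebank .mrg trees to one tree per line.

Assumption: mirrors only the core idea of tools/mrg2trees.prl
(one bracketed tree per output line), not its exact behavior.
"""
import sys
from typing import Iterable, Iterator, List


def flatten_trees(text: str) -> Iterator[str]:
    """Yield each top-level bracketed tree in `text` as a single line."""
    depth = 0
    current: List[str] = []
    for ch in text:
        if ch == "(":
            depth += 1
        if depth > 0:
            current.append(ch)
        # Ignore stray ')' outside any tree instead of going negative.
        if ch == ")" and depth > 0:
            depth -= 1
            if depth == 0:
                # Tree complete: collapse all whitespace runs to one space.
                yield " ".join("".join(current).split())
                current = []
    # An unterminated tree at end of input is silently dropped.


def main(paths: Iterable[str]) -> None:
    """Print every tree from the given .mrg files, one per line."""
    for path in paths:
        with open(path, encoding="utf-8") as f:
            for tree in flatten_trees(f.read()):
                print(tree)


if __name__ == "__main__":
    main(sys.argv[1:])
```

Invoked as `python mrg2trees_sketch.py Corpus/02/*.mrg > Corpus/02.trees` (the script name is hypothetical), it writes output of the same one-tree-per-line shape that the ".trees" format requires.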

By default, the Makefile expects "Corpus/02-21.trees" as the input for grammar construction.
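If each section has been converted separately with the command above (Corpus/02.trees, Corpus/03.trees, and so on), the default input can be assembled by concatenating them, because a ".trees" file holds exactly one tree per line. The Python sketch below assumes that per-section naming scheme; the function name is hypothetical and not part of the Enju or MAYZ distribution.

```python
"""Sketch: assemble Corpus/02-21.trees from per-section .trees files.

Assumption: sections were converted one by one into Corpus/NN.trees;
merging is plain concatenation in section order.
"""
from pathlib import Path


def merge_sections(corpus: Path, first: int = 2, last: int = 21) -> Path:
    """Concatenate corpus/NN.trees for NN in [first, last] into one file."""
    out = corpus / f"{first:02d}-{last:02d}.trees"
    with out.open("w", encoding="utf-8") as dst:
        for sec in range(first, last + 1):
            text = (corpus / f"{sec:02d}.trees").read_text(encoding="utf-8")
            dst.write(text)
            # Guard against a missing final newline so two trees never share a line.
            if text and not text.endswith("\n"):
                dst.write("\n")
    return out
```

Calling merge_sections(Path("Corpus")) from "mayz/enju/" would write "Corpus/02-21.trees"; a missing section raises FileNotFoundError rather than silently producing a smaller training set.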

Next, specify --with-enju-grammar when you run "configure" for the MAYZ toolkit.

./configure --with-enju-grammar

With this option, the Makefile includes a target for making the grammar. Run make in "mayz/enju/" to rebuild the grammar and probabilistic models.


MIYAO Yusuke (yusuke@is.s.u-tokyo.ac.jp)