How to use Enju

Running Enju

Run the command "enju". The parser loads its data files and then waits for your input.

% enju
Enju 2.0      by Yusuke Miyao and Tsujii Lab., Tokyo Univ.
Loading grammar module "enju/grammar"... done.
Loading FOM module "enju/synmodel"... done.
Loading parser module "up/pcky"... done.
Loading application module "enju/outputdep"... done.
Initializing parser...
  Loading stemming database: /usr/local/share/liblilfes/enju/DATA/Enju.dict
  Loading grammar database: /usr/local/share/liblilfes/enju/DATA/Enju.lexicon
/usr/local/share/liblilfes/enju/DATA/Enju.templates
  Initializing external tagger: uptagger
  Loading Unigram FOM model: /usr/local/share/liblilfes/enju/DATA/Enju-lex.output
  Loading Syntax FOM model: /usr/local/share/liblilfes/enju/DATA/Enju-syn.output
done.
Ready

Enter one sentence per line, and the parse result is written to the standard output. The following example is the output for the sentence "Enju is an efficient HPSG parser."

ROOT    ROOT    ROOT    ROOT    -1      ROOT    is      be      VBZ     VB     1
is      be      VBZ     VB      1       ARG1    Enju    enju    NNP     NNP    0
is      be      VBZ     VB      1       ARG2    parser  parser  NN      NN     5
an      an      DT      DT      2       ARG1    parser  parser  NN      NN     5
efficient       efficient       JJ      JJ      3       ARG1    parser  parser NN       NN      5
HPSG    hpsg    NNP     NNP     4       MOD     parser  parser  NN      NN     5


Output format

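The output is a set of dependencies between words. Each line represents one dependency, and an empty line marks the end of the sentence. The columns of a line are separated with tabs. As the example above shows, the first five columns describe the predicate word (word form, base form, POS, POS of the base form, and position), the sixth column is the label of the relation, and the last five columns describe the argument word in the same order.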

The position of a word is an integer starting from zero. In the example, the position of "Enju" is 0, "is" is 1, ..., and "parser" is 5. Words whose POS is "." (e.g. "." and "?") are ignored.

The label of a relation is one of "MOD", "ARG1", ..., "ARG5". "ARG1" marks the subject of a verb, the target of modification by modifiers (such as adjectives and prepositions), etc. "ARG2" marks the object of verbs, prepositions, etc. The other "ARGx" labels mark further objects and complements of verbs, etc. "MOD" marks the modifiee of noun-noun modification and the matrix verb in participle constructions.

The first line represents the root predicate of the sentence. In this line, both the head and the label of the relation are shown as "ROOT". If the argument of a predicate is missing (for example, the logical subject of a passive expression without a "by" phrase), it is shown as "UNKNOWN". If parsing fails, the parser shows "Parsing failure" and the reason.
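The column layout above can be read back into records with a few lines of Python. This is only a sketch: it assumes eleven tab-separated columns per line as in the example, the field names (in particular "base_pos") are labels chosen here rather than names used by Enju, and lines containing "UNKNOWN" arguments or a "Parsing failure" message are not handled.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Word(NamedTuple):
    surface: str    # word form as it appears in the sentence
    base: str       # base form
    pos: str        # part of speech
    base_pos: str   # POS of the base form (assumed meaning of this column)
    position: int   # zero-based word position; -1 on the ROOT side


class Dependency(NamedTuple):
    head: Word
    label: str      # "ROOT", "MOD", "ARG1", ..., "ARG5"
    argument: Word


def parse_line(line: str) -> Dependency:
    """Split one dependency line into head, label and argument."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 11:
        raise ValueError(f"expected 11 tab-separated columns, got {len(cols)}")
    head = Word(cols[0], cols[1], cols[2], cols[3], int(cols[4]))
    argument = Word(cols[6], cols[7], cols[8], cols[9], int(cols[10]))
    return Dependency(head, cols[5], argument)


def parse_sentences(lines: Iterable[str]) -> Iterator[List[Dependency]]:
    """Group dependency lines into sentences; an empty line ends a sentence."""
    sentence: List[Dependency] = []
    for line in lines:
        if line.strip():
            sentence.append(parse_line(line))
        elif sentence:
            yield sentence
            sentence = []
    if sentence:
        yield sentence


# Two lines taken from the example output, followed by the empty line.
sample = [
    "ROOT\tROOT\tROOT\tROOT\t-1\tROOT\tis\tbe\tVBZ\tVB\t1",
    "is\tbe\tVBZ\tVB\t1\tARG1\tEnju\tenju\tNNP\tNNP\t0",
    "",
]
for deps in parse_sentences(sample):
    for dep in deps:
        print(dep.label, dep.head.base, "->", dep.argument.surface)
```

The same functions can be applied to a file object or to the lines of captured parser output.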

Simple format

Enju supports other output formats. When you specify the "-s" option, Enju outputs predicate-argument relations in a simple format. In the simple format, auxiliaries and determiners are not output. Prepositions are output as the label of a relation. For example, "a book on the table" is output as "PP_on book table".

ARG1    be/VB(1)        enju/NNP(0)
ARG2    be/VB(1)        parser/NN(5)
ARG1    efficient/JJ(3) parser/NN(5)
MOD     parser/NN(5)    hpsg/NNP(4)
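A line of the simple format can be taken apart as sketched below. The reading of each word as "base/POS(position)" is inferred from the example above and is an assumption, as are the names "Token" and "Relation"; fields are split on any whitespace so the sketch does not depend on the exact separator.

```python
import re
from typing import NamedTuple

# A word such as "be/VB(1)": base form, POS, and position in parentheses.
TOKEN = re.compile(r"^(?P<base>.+)/(?P<pos>[^/()]+)\((?P<position>-?\d+)\)$")


class Token(NamedTuple):
    base: str
    pos: str
    position: int


class Relation(NamedTuple):
    label: str
    predicate: Token
    argument: Token


def parse_token(text: str) -> Token:
    """Parse one 'base/POS(position)' field."""
    match = TOKEN.match(text)
    if match is None:
        raise ValueError(f"unexpected word format: {text!r}")
    return Token(match.group("base"), match.group("pos"),
                 int(match.group("position")))


def parse_simple_line(line: str) -> Relation:
    """Parse one line: label, predicate word, argument word."""
    fields = line.split()
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}")
    label, predicate, argument = fields
    return Relation(label, parse_token(predicate), parse_token(argument))


relation = parse_simple_line("ARG1\tbe/VB(1)\tenju/NNP(0)")
print(relation.label, relation.predicate.base, relation.argument.base)
```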

XML format

Enju supports output in XML and stand-off XML formats. The parse results are output in the XML format when the "-xml" option is specified, and in the stand-off XML format when the "-so" option is specified. These formats represent not only dependencies between words but also phrase structures.

In the XML format, phrase structure and predicate-argument structure are printed with XML tags and their attributes. The structure of a sentence is printed on a single line. The following example is the output of parsing "Enju is an efficient HPSG parser." (the actual output is in one line).

<phrase cat="s" head="4" id="0"><phrase cat="np" head="5" id="10"><phrase cat="np"
head="5" id="12"><word pos="NNP" base="enju" surf="enju" id="5">Enju</word></phrase>
</phrase><phrase cat="vp" head="4" id="13"> <phrase cat="vp" head="4" id="14"><word
pos="VBZ" base="be" surf="is" id="4" arg1="10" arg2="15">is</word></phrase><phrase
cat="np" head="6" id="15"> <phrase cat="dt" head="7" id="18"><word pos="DT" base="an" surf="an"
id="7" arg1="15">an</word></phrase><phrase cat="np" head="6" id="19"> <phrase
cat="aj" head="8" id="22"><word pos="JJ" base="efficient" surf="efficient" id="8" arg1="15">
efficient</word></phrase><phrase cat="np" head="6" id="23"> <phrase cat="np" head="9"
id="27"><word pos="NNP" base="hpsg" surf="hpsg" id="9" mod="15">HPSG</word></phrase>
<phrase cat="np" head="6" id="28"><word pos="NN" base="parser" surf="parser" id="6">parser
</word></phrase></phrase></phrase></phrase></phrase></phrase>.

Phrase structures are represented with <phrase>. A constituent is bracketed by <phrase>, and the attribute "cat" represents the phrase symbol of the constituent. For example, a noun phrase, "HPSG parser", is represented as "<phrase cat="np">HPSG parser</phrase>". Phrase symbols are listed below.

s   sentence (including interrogatives, etc.)
vp  verb phrase
np  noun phrase
dt  specifier phrase (determiners, quantifiers, etc.)
aj  adjective phrase
av  adverbial phrase
pp  prepositional phrase
pl  participle
pu  punctuation
cm  comma
cj  coordinate conjunction
cp  complementizer phrase
sc  subordinate conjunction

Each word is bracketed by <word>. The attributes "pos" and "base" represent a part-of-speech and a base form.

ID numbers (unique within a sentence) are assigned to every "phrase" and "word", and are given in the attribute "id". Each "phrase" tag also has the attribute "head", which gives the ID of the head daughter of the phrase.

Predicate-argument dependencies of words are represented with the attributes "mod", "arg1", ..., "arg5" in "word". A predicate word has some of these attributes, each of which gives the ID number of an argument phrase. In the above example, the "word" tag for "is" has arg1="10" arg2="15", which are the ID numbers of "Enju" and "an efficient HPSG parser", respectively.
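Resolving these ID references can be sketched with Python's standard XML parser. The sketch assumes the input is a single well-formed "phrase" element; any text that follows the outermost tag (such as the final period in the example) would have to be stripped or wrapped first. SAMPLE is a shortened fragment written for illustration, not actual parser output, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Attribute names that point from a predicate word to an argument phrase.
LABELS = ("mod", "arg1", "arg2", "arg3", "arg4", "arg5")

# Shortened, hand-written fragment in the style of the XML output.
SAMPLE = (
    '<phrase cat="s" head="4" id="0">'
    '<phrase cat="np" head="5" id="10">'
    '<word pos="NNP" base="enju" surf="enju" id="5">Enju</word></phrase> '
    '<phrase cat="vp" head="4" id="13">'
    '<phrase cat="vp" head="4" id="14">'
    '<word pos="VBZ" base="be" surf="is" id="4" arg1="10" arg2="15">is</word>'
    '</phrase> '
    '<phrase cat="np" head="6" id="15">'
    '<word pos="NN" base="parser" surf="parser" id="6">parser</word></phrase>'
    '</phrase></phrase>'
)


def predicate_arguments(xml_text: str) -> List[Tuple[str, str, str]]:
    """Return (predicate base form, label, argument text) triples."""
    root = ET.fromstring(xml_text)
    # Map every ID to the text covered by that phrase or word.
    text_by_id = {
        element.get("id"): " ".join("".join(element.itertext()).split())
        for element in root.iter()
        if element.get("id") is not None
    }
    relations = []
    for word in root.iter("word"):
        for label in LABELS:
            target = word.get(label)
            if target is not None:
                # Fall back to the raw value if the ID is not in the tree.
                relations.append(
                    (word.get("base"), label, text_by_id.get(target, target)))
    return relations


for relation in predicate_arguments(SAMPLE):
    print(relation)
```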

Stand-off format

In the stand-off format, the span of each tag is represented with the position in the original input sentence. Each line represents a tag. An empty line indicates the end of a sentence. The above XML-format output is represented with the following stand-off format.

STDIN   0       4       word pos="NNP" base="enju" surf="enju" id="5"
STDIN   0       4       phrase cat="np" head="5" id="12"
STDIN   0       4       phrase cat="np" head="5" id="10"
STDIN   5       7       word pos="VBZ" base="be" surf="is" id="4" arg1="10" arg2="15"
STDIN   5       7       phrase cat="vp" head="4" id="14"
STDIN   8       10      word pos="DT" base="an" surf="an" id="7" arg1="15"
STDIN   8       10      phrase cat="dt" head="7" id="18"
STDIN   11      20      word pos="JJ" base="efficient" surf="efficient" id="8" arg1="15"
STDIN   11      20      phrase cat="aj" head="8" id="22"
STDIN   21      25      word pos="NNP" base="hpsg" surf="hpsg" id="9" mod="15"
STDIN   21      25      phrase cat="np" head="9" id="27"
STDIN   26      32      word pos="NN" base="parser" surf="parser" id="6"
STDIN   26      32      phrase cat="np" head="6" id="28"
STDIN   21      32      phrase cat="np" head="6" id="23"
STDIN   11      32      phrase cat="np" head="6" id="19"
STDIN   8       32      phrase cat="np" head="6" id="15"
STDIN   5       32      phrase cat="vp" head="4" id="13"
STDIN   0       32      phrase cat="s" head="4" id="0"

Elements of a line are separated with tabs. The first column gives the name of the input file; in this case the standard input is used, so "STDIN" is printed. The second and third columns give the start and end positions, respectively. The last column gives the content of the tag: the label of the tag ("phrase" or "word") comes first, and the rest are its attributes.
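A stand-off line can be parsed and mapped back onto the input sentence as sketched below. In the example above the positions behave like character offsets with an exclusive end position; that reading, and the names "Tag" and "parse_standoff_line", are assumptions made for this sketch.

```python
import re
from typing import Dict, NamedTuple

# Matches attribute pairs such as cat="np".
ATTRIBUTE = re.compile(r'(\w+)="([^"]*)"')


class Tag(NamedTuple):
    source: str                 # input file name, or "STDIN"
    start: int                  # start position of the span
    end: int                    # end position of the span (exclusive)
    label: str                  # "phrase" or "word"
    attributes: Dict[str, str]


def parse_standoff_line(line: str) -> Tag:
    """Split one stand-off line into its four tab-separated elements."""
    source, start, end, content = line.rstrip("\n").split("\t", 3)
    label = content.split(None, 1)[0]
    attributes = dict(ATTRIBUTE.findall(content))
    return Tag(source, int(start), int(end), label, attributes)


sentence = "Enju is an efficient HPSG parser."
tag = parse_standoff_line('STDIN\t8\t32\tphrase cat="np" head="6" id="15"')
# Slice the original sentence to recover the text covered by the tag.
print(tag.label, tag.attributes["cat"], repr(sentence[tag.start:tag.end]))
```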

Others

You can also browse parse results with a GUI. For details, see "Browsing parse results with GUI" in the "LiLFeS modules" section.

By writing your own LiLFeS programs, you can format the output of parsing as you like. The dependency and XML outputs described above are themselves produced by LiLFeS programs (outputdep.lil, outputxml.lil). For details, see "Advanced usage".


Command-line arguments

Enju accepts the following options and command-line arguments.

enju [options] [-a arguments]
Arguments following "-a" are passed to LiLFeS programs as command-line arguments.
Options
-h            Show help message
-D directory  Specify a directory of grammar files
-L directory  Specify a directory of LiLFeS modules (the directory is added to the beginning of "LILFES_PATH".)
-t tagger     Specify a POS tagger
-d            Output in dependency format
-s            Output in simple format
-xml          Output in XML format
-so           Output in stand-off format
-cgi          Start CGI server
-moriv        Start MoriV server
-W number     Limit number of words
-E number     Limit number of edges
-l module     Load LiLFeS program
-e command    Execute LiLFeS command
-i            Go into interactive mode (show lilfes prompt)
-n            Non-interactive mode

For details of the CGI/MoriV server, see "Browsing parse results with GUI" in the "LiLFeS modules" section.

Modules specified with "-l" are loaded into the parser, and commands specified with "-e" are executed as LiLFeS commands. After the commands have been executed, Enju runs the program for the output format selected with "-d", "-xml", etc., as described above. If no output format option is specified, Enju does nothing at this step. Finally, if "-i" is not specified, Enju exits. With the "-i" option, Enju shows a lilfes prompt and waits for the input of LiLFeS programs. "Ctrl-D" ends the interactive mode.
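When Enju is driven from a script, the command line can be assembled from the options listed above. The following sketch only builds the argument list and does not start the parser; the function name, its parameters and the format keywords are hypothetical, and only flags documented in this section are used.

```python
from typing import List, Optional

# Output format keywords (chosen for this sketch) mapped to documented flags.
FORMAT_FLAGS = {
    "dependency": "-d",
    "simple": "-s",
    "xml": "-xml",
    "standoff": "-so",
}


def build_enju_command(output_format: str = "dependency",
                       grammar_dir: Optional[str] = None,
                       tagger: Optional[str] = None,
                       max_words: Optional[int] = None,
                       max_edges: Optional[int] = None) -> List[str]:
    """Return an argument list for the "enju" command."""
    command = ["enju", FORMAT_FLAGS[output_format]]
    if grammar_dir is not None:
        command += ["-D", grammar_dir]       # directory of grammar files
    if tagger is not None:
        command += ["-t", tagger]            # POS tagger
    if max_words is not None:
        command += ["-W", str(max_words)]    # limit number of words
    if max_edges is not None:
        command += ["-E", str(max_edges)]    # limit number of edges
    return command


print(build_enju_command("xml", tagger="uptagger", max_words=100))
```

On a machine where Enju is installed, the resulting list could be passed to subprocess.run together with the sentences as standard input; how the parser behaves at the end of input is not covered here.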


Environment variables

If you have installed grammar data and/or LiLFeS modules in non-default directories, set the following environment variables to tell Enju where they are. Command-line arguments take precedence over environment variables.

Variable     Description
ENJU_DIR     Specify the directory of grammar data files
ENJU_TAGGER  Specify a POS tagger
LILFES_PATH  Specify search paths of LiLFeS modules
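For scripted runs, these variables can be set for a child process only, leaving the calling environment untouched. The sketch below builds such an environment; the function name and the example directory are hypothetical, and the value of LILFES_PATH is passed through as a single string because the separator between multiple search paths is not stated here.

```python
import os
from typing import Dict, Mapping, Optional


def enju_environment(base: Optional[Mapping[str, str]] = None,
                     grammar_dir: Optional[str] = None,
                     tagger: Optional[str] = None,
                     lilfes_path: Optional[str] = None) -> Dict[str, str]:
    """Return a copy of the environment with the Enju variables set."""
    env = dict(os.environ if base is None else base)
    if grammar_dir is not None:
        env["ENJU_DIR"] = grammar_dir        # directory of grammar data files
    if tagger is not None:
        env["ENJU_TAGGER"] = tagger          # POS tagger
    if lilfes_path is not None:
        env["LILFES_PATH"] = lilfes_path     # search paths of LiLFeS modules
    return env


# "/opt/enju/DATA" is a made-up location used only for illustration.
env = enju_environment(base={"PATH": "/bin"}, grammar_dir="/opt/enju/DATA")
print(env)
```

The returned dictionary can be passed as the env argument of subprocess.run when starting the parser.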

MIYAO Yusuke (yusuke@is.s.u-tokyo.ac.jp)