Run the command "enju", and the parser starts reading data files and waits for your input.
    % enju
    Enju 2.0 by Yusuke Miyao and Tsujii Lab., Tokyo Univ.
    Loading grammar module "enju/grammar"... done.
    Loading FOM module "enju/synmodel"... done.
    Loading parser module "up/pcky"... done.
    Loading application module "enju/outputdep"... done.
    Initializing parser...
    Loading stemming database: /usr/local/share/liblilfes/enju/DATA/Enju.dict
    Loading grammar database: /usr/local/share/liblilfes/enju/DATA/Enju.lexicon /usr/local/share/liblilfes/enju/DATA/Enju.templates
    Initializing external tagger: uptagger
    Loading Unigram FOM model: /usr/local/share/liblilfes/enju/DATA/Enju-lex.output
    Loading Syntax FOM model: /usr/local/share/liblilfes/enju/DATA/Enju-syn.output
    done.
    Ready
Enter one sentence per line, and the parse result is written to the standard output. The following example is the output for the sentence "Enju is an efficient HPSG parser."
    ROOT ROOT ROOT ROOT -1 ROOT is be VBZ VB 1
    is be VBZ VB 1 ARG1 Enju enju NNP NNP 0
    is be VBZ VB 1 ARG2 parser parser NN NN 5
    an an DT DT 2 ARG1 parser parser NN NN 5
    efficient efficient JJ JJ 3 ARG1 parser parser NN NN 5
    HPSG hpsg NNP NNP 4 MOD parser parser NN NN 5
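The output is a set of dependencies between words. Each line represents one dependency, and an empty line shows the end of the sentence. Columns of a line are separated with tabs. As the example shows, a line gives five columns for the head word (word form, base form, POS, POS of the base form, and position), then the label of the relation, then the same five columns for the argument word.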
The position of a word is represented with an integer starting from zero. In the example, the position of "Enju" is 0, "is" is 1, ... and "parser" is 5. Words whose POS is "." (e.g. "." and "?") are ignored.
The label of a relation is one of "MOD", "ARG1", ..., "ARG5". "ARG1" is used for the subject of a verb, the target of modification by modifiers (such as adjectives and prepositions), etc. "ARG2" represents the object of verbs, prepositions, etc. The other "ARGx" labels represent further objects and complements of verbs, etc. "MOD" represents the modifiee of noun-noun modification and the matrix verb in participle constructions.
The first line represents the root predicate of the sentence. In this line, both the head and the label of the relation are represented as "ROOT". If the argument of a predicate is missing (for example, the logical subject of a passive expression without a "by" phrase), it is shown as "UNKNOWN". If parsing fails, the parser shows "Parsing failure" and the reason.
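The dependency format is easy to read back into a program. The following Python sketch is illustrative only and not part of Enju. It assumes the 11 tab-separated columns seen in the example above (five for the head word, one for the label, five for the argument word); the names `Token`, `Dependency`, `parse_dependency_line`, and `read_sentences` are hypothetical. Any line with a different number of columns, such as a "Parsing failure" message, raises an error.

```python
from typing import NamedTuple


class Token(NamedTuple):
    surface: str   # word as it appears in the sentence
    base: str      # base form
    pos: str       # part-of-speech of the surface form
    base_pos: str  # part-of-speech of the base form
    position: int  # zero-based word position (-1 for ROOT)


class Dependency(NamedTuple):
    head: Token
    label: str
    argument: Token


def parse_dependency_line(line: str) -> Dependency:
    """Parse one line, assuming the 11-column layout of the example."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 11:
        raise ValueError(f"expected 11 tab-separated columns, got {len(cols)}")
    head = Token(*cols[0:4], int(cols[4]))
    argument = Token(*cols[6:10], int(cols[10]))
    return Dependency(head, cols[5], argument)


def read_sentences(lines):
    """Group dependency lines into sentences; an empty line ends a sentence."""
    sentence = []
    for line in lines:
        if line.strip():
            sentence.append(parse_dependency_line(line))
        elif sentence:
            yield sentence
            sentence = []
    if sentence:
        yield sentence
```

Passing an open file or `sys.stdin` to `read_sentences` yields one list of `Dependency` tuples per sentence.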
Enju supports other output formats. With the "-s" option, Enju outputs predicate-argument relations in a simple format. In this format, auxiliaries and determiners are not output, and prepositions are output as the label of a relation. For example, "a book on the table" is output as "PP_on book table". The example sentence above is output as follows.
    ARG1 be/VB(1) enju/NNP(0)
    ARG2 be/VB(1) parser/NN(5)
    ARG1 efficient/JJ(3) parser/NN(5)
    MOD parser/NN(5) hpsg/NNP(4)
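A line of the simple format can be split with a small regular expression. This Python sketch is an illustration, not part of Enju: it assumes every line has a label followed by two fields of the form `base/POS(position)`, as in the example above, and the function names are hypothetical. Lines in any other shape raise an error.

```python
import re

# One field of the simple format, e.g. "parser/NN(5)".
_TOKEN = re.compile(r"(?P<base>.+)/(?P<pos>[^/()]+)\((?P<position>-?\d+)\)")


def parse_token(field: str):
    """Return (base form, POS, position) for a field like 'be/VB(1)'."""
    match = _TOKEN.fullmatch(field)
    if match is None:
        raise ValueError(f"not a base/POS(position) field: {field!r}")
    return match["base"], match["pos"], int(match["position"])


def parse_simple_line(line: str):
    """Return (label, first token, second token) for one simple-format line."""
    fields = line.split()
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}")
    label, first, second = fields
    return label, parse_token(first), parse_token(second)
```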
Enju also supports output in XML and stand-off XML formats. The parse results are output in the XML format with the "-xml" option, and in the stand-off XML format with the "-so" option. These formats represent not only dependencies between words but also phrase structures.
In the XML format, phrase structure and predicate-argument structure are printed with XML tags and their attributes. The structure of a sentence is shown in a line. The following example is the output of parsing "Enju is an efficient HPSG parser." (the actual output is in one line).
<phrase cat="s" head="4" id="0"><phrase cat="np" head="5" id="10"><phrase cat="np" head="5" id="12"><word pos="NNP" base="enju" surf="enju" id="5">Enju</word></phrase> </phrase><phrase cat="vp" head="4" id="13"> <phrase cat="vp" head="4" id="14"><word pos="VBZ" base="be" surf="is" id="4" arg1="10" arg2="15">is</word></phrase><phrase cat="np" head="6" id="15"> <phrase cat="dt" head="7" id="18"><word pos="DT" base="an" surf="an" id="7" arg1="15">an</word></phrase><phrase cat="np" head="6" id="19"> <phrase cat="aj" head="8" id="22"><word pos="JJ" base="efficient" surf="efficient" id="8" arg1="15"> efficient</word></phrase><phrase cat="np" head="6" id="23"> <phrase cat="np" head="9" id="27"><word pos="NNP" base="hpsg" surf="hpsg" id="9" mod="15">HPSG</word></phrase> <phrase cat="np" head="6" id="28"><word pos="NN" base="parser" surf="parser" id="6">parser </word></phrase></phrase></phrase></phrase></phrase></phrase>.
Phrase structures are represented with <phrase>. A constituent is bracketed by <phrase>, and the attribute "cat" represents the phrase symbol of the constituent. For example, a noun phrase, "HPSG parser", is represented as "<phrase cat="np">HPSG parser</phrase>". Phrase symbols are listed below.
s | sentence (including interrogatives, etc.) |
vp | verb phrase |
np | noun phrase |
dt | specifier phrase (determiners, quantifiers, etc.) |
aj | adjective phrase |
av | adverbial phrase |
pp | prepositional phrase |
pl | participle |
pu | punctuation |
cm | comma |
cj | coordinate conjunction |
cp | complementizer phrase |
sc | subordinate conjunction |
Each word is bracketed by <word>. The attributes "pos" and "base" represent a part-of-speech and a base form.
ID numbers (unique within a sentence) are assigned to every "phrase" and "word", and are given in the attribute "id". Each "phrase" tag also has the attribute "head", which represents the head daughter of the phrase.
Predicate-argument dependencies of words are represented with the attributes "mod", "arg1", ..., "arg5" in "word". A predicate word has some of these attributes, each of which represents the ID number of an argument phrase. In the above example, the "word" tag for "is" has arg1="10" arg2="15", which are the ID numbers of "Enju" and "an efficient HPSG parser", respectively.
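The lookup from an argument attribute to the phrase it names can be sketched with Python's standard `xml.etree.ElementTree` module. This is an illustration under assumptions, not part of Enju: the function names are hypothetical, the example XML is re-indented for readability, and the final period that follows the root element in the printed example is left out because text after the root element is not well-formed XML.

```python
import xml.etree.ElementTree as ET

# The example from the text, re-indented (Enju prints it on one line).
ENJU_XML = """
<phrase cat="s" head="4" id="0">
 <phrase cat="np" head="5" id="10"><phrase cat="np" head="5" id="12">
  <word pos="NNP" base="enju" surf="enju" id="5">Enju</word></phrase></phrase>
 <phrase cat="vp" head="4" id="13">
  <phrase cat="vp" head="4" id="14">
   <word pos="VBZ" base="be" surf="is" id="4" arg1="10" arg2="15">is</word></phrase>
  <phrase cat="np" head="6" id="15">
   <phrase cat="dt" head="7" id="18">
    <word pos="DT" base="an" surf="an" id="7" arg1="15">an</word></phrase>
   <phrase cat="np" head="6" id="19">
    <phrase cat="aj" head="8" id="22">
     <word pos="JJ" base="efficient" surf="efficient" id="8" arg1="15">efficient</word></phrase>
    <phrase cat="np" head="6" id="23">
     <phrase cat="np" head="9" id="27">
      <word pos="NNP" base="hpsg" surf="hpsg" id="9" mod="15">HPSG</word></phrase>
     <phrase cat="np" head="6" id="28">
      <word pos="NN" base="parser" surf="parser" id="6">parser</word></phrase>
    </phrase>
   </phrase>
  </phrase>
 </phrase>
</phrase>
"""

ARG_ATTRS = ("mod", "arg1", "arg2", "arg3", "arg4", "arg5")


def phrase_text(node):
    """Join the words under a phrase (or a single word) with spaces."""
    return " ".join((w.text or "").strip() for w in node.iter("word"))


def relations(xml_text):
    """Return (predicate word, attribute, argument text) triples."""
    root = ET.fromstring(xml_text.strip())
    by_id = {n.get("id"): n for n in root.iter() if n.get("id") is not None}
    result = []
    for word in root.iter("word"):
        for attr in ARG_ATTRS:
            target = word.get(attr)
            if target is None:
                continue
            node = by_id.get(target)
            # Fall back to the raw value if it is not an ID in this sentence.
            text = phrase_text(node) if node is not None else target
            result.append(((word.text or "").strip(), attr, text))
    return result
```

For the example sentence, the first two triples returned are `("is", "arg1", "Enju")` and `("is", "arg2", "an efficient HPSG parser")`.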
In the stand-off format, the span of each tag is represented with the position in the original input sentence. Each line represents a tag. An empty line indicates the end of a sentence. The above XML-format output is represented with the following stand-off format.
    STDIN 0 4 word pos="NNP" base="enju" surf="enju" id="5"
    STDIN 0 4 phrase cat="np" head="5" id="12"
    STDIN 0 4 phrase cat="np" head="5" id="10"
    STDIN 5 7 word pos="VBZ" base="be" surf="is" id="4" arg1="10" arg2="15"
    STDIN 5 7 phrase cat="vp" head="4" id="14"
    STDIN 8 10 word pos="DT" base="an" surf="an" id="7" arg1="15"
    STDIN 8 10 phrase cat="dt" head="7" id="18"
    STDIN 11 20 word pos="JJ" base="efficient" surf="efficient" id="8" arg1="15"
    STDIN 11 20 phrase cat="aj" head="8" id="22"
    STDIN 21 25 word pos="NNP" base="hpsg" surf="hpsg" id="9" mod="15"
    STDIN 21 25 phrase cat="np" head="9" id="27"
    STDIN 26 32 word pos="NN" base="parser" surf="parser" id="6"
    STDIN 26 32 phrase cat="np" head="6" id="28"
    STDIN 21 32 phrase cat="np" head="6" id="23"
    STDIN 11 32 phrase cat="np" head="6" id="19"
    STDIN 8 32 phrase cat="np" head="6" id="15"
    STDIN 5 32 phrase cat="vp" head="4" id="13"
    STDIN 0 32 phrase cat="s" head="4" id="0"
Elements of a line are separated with tabs. The first column represents the name of the input file. In this case the standard input is used, so "STDIN" is printed. The second and third columns represent the start and end positions, respectively. The last column represents the content of a tag: the label of the tag ("phrase" or "word") comes first, and the rest represents the attributes.
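The stand-off lines can be mapped back onto the input text with a few lines of Python. This sketch is illustrative and not part of Enju: the function names are hypothetical, and it assumes that the file name contains no whitespace and that attribute values contain no double quotes. In the example above the offsets are character positions and the end position is exclusive, which matches Python slicing.

```python
import re

_ATTR = re.compile(r'(\w+)="([^"]*)"')

SENTENCE = "Enju is an efficient HPSG parser."


def parse_standoff_line(line: str) -> dict:
    """Split one stand-off line into source, span, tag label, and attributes."""
    source, start, end, content = line.rstrip("\n").split(None, 3)
    parts = content.split(None, 1)
    tag = parts[0]
    attr_text = parts[1] if len(parts) > 1 else ""
    return {
        "source": source,
        "start": int(start),
        "end": int(end),
        "tag": tag,
        "attrs": dict(_ATTR.findall(attr_text)),
    }


def span_text(sentence: str, record: dict) -> str:
    """Return the part of the input sentence that a tag covers."""
    return sentence[record["start"]:record["end"]]
```

For the line with id="23" in the example, `span_text` returns "HPSG parser".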
You can also browse parse results with a GUI. For details, see "Browsing parse results with GUI" in the "LiLFeS modules" section.
By writing your own LiLFeS programs, you can format the output of parsing as you like. The dependency and XML outputs described above are themselves produced by LiLFeS programs (outputdep.lil, outputxml.lil). For details, see "Advanced usage".
Enju accepts the following options and command-line arguments.
enju [options] [-a arguments]
Arguments following "-a" are passed to LiLFeS programs as command-line arguments.
Options
-h | Show help message |
-D directory | Specify a directory of grammar files |
-L directory | Specify a directory of LiLFeS modules (the directory is added to the beginning of "LILFES_PATH".) |
-t tagger | Specify a POS tagger |
-d | Output in dependency format |
-s | Output in simple format |
-xml | Output in XML format |
-so | Output in stand-off format |
-cgi | Start CGI server |
-moriv | Start MoriV server |
-W number | Limit number of words |
-E number | Limit number of edges |
-l module | Load LiLFeS program |
-e command | Execute LiLFeS command |
-i | Go into interactive mode (show lilfes prompt) |
-n | Non-interactive mode |
For details of the CGI/MoriV server, see "Browsing parse results with GUI" in the "LiLFeS modules" section.
LiLFeS modules specified with "-l" are loaded into the parser, and LiLFeS commands specified with "-e" are executed by Enju. After the commands have been executed, Enju runs the output program for the dependency format, XML format, etc. as described above, according to "-d", "-xml", and so on. If no output-format option is specified, Enju does nothing at this step. Next, unless "-i" is specified, Enju exits. With the "-i" option, Enju shows a lilfes prompt and waits for the input of lilfes programs. "Ctrl-D" ends the interactive mode.
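When Enju is driven from a script, the options above can be assembled programmatically. The Python sketch below is an illustration, not an official interface. It uses only flags listed in the options table, and it assumes that an `enju` executable is on the PATH, that it reads one sentence per line from the standard input, and that it exits at end of input. The names `build_command` and `parse_sentences` are hypothetical.

```python
import shutil
import subprocess

# Output-format flags taken from the options table.
FORMAT_FLAGS = {"dependency": "-d", "simple": "-s", "xml": "-xml", "standoff": "-so"}


def build_command(fmt="dependency", grammar_dir=None, max_words=None, max_edges=None):
    """Build an argument list for enju from a few of the documented options."""
    cmd = ["enju", FORMAT_FLAGS[fmt]]
    if grammar_dir is not None:
        cmd += ["-D", grammar_dir]
    if max_words is not None:
        cmd += ["-W", str(max_words)]
    if max_edges is not None:
        cmd += ["-E", str(max_edges)]
    return cmd


def parse_sentences(sentences, **options):
    """Run enju on the given sentences and return its standard output.

    Assumption: enju reads sentences from stdin and exits at end of input.
    """
    if shutil.which("enju") is None:
        raise RuntimeError("enju executable not found on PATH")
    proc = subprocess.run(
        build_command(**options),
        input="\n".join(sentences) + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    return proc.stdout
```

For example, `build_command("xml", max_words=50)` gives the argument list for `enju -xml -W 50`.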
If you have installed grammar data and/or LiLFeS modules in non-default directories, set the following environment variables to tell Enju the installation directories. Command-line options override the environment variables.
Variable | Description |
---|---|
ENJU_DIR | Specify the directory of grammar data files |
ENJU_TAGGER | Specify a POS tagger |
LILFES_PATH | Specify search paths of LiLFeS modules |
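When Enju is started from another program, these variables can be set for the child process only. The Python sketch below is an illustration under assumptions: the function name `enju_environment` and the example directory are hypothetical, and `LILFES_PATH` is passed through as a single string because the separator for multiple search paths is not stated here.

```python
import os


def enju_environment(base_env=None, grammar_dir=None, tagger=None, lilfes_path=None):
    """Return a copy of the environment with the Enju variables set.

    Only the variables that are given are set; the input mapping is not modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    if grammar_dir is not None:
        env["ENJU_DIR"] = grammar_dir
    if tagger is not None:
        env["ENJU_TAGGER"] = tagger
    if lilfes_path is not None:
        env["LILFES_PATH"] = lilfes_path
    return env
```

The result can be passed as the `env` argument of `subprocess.run`, for example `subprocess.run(["enju", "-xml"], env=enju_environment(grammar_dir="/opt/enju/DATA"))`, where the directory is a placeholder.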