Version 2.1
Enju is an implementation of a parsing algorithm for probabilistic unification-based grammars. With a wide-coverage probabilistic HPSG grammar [1,2,3] and an efficient parsing algorithm [4,5], this parser can effectively analyze the syntactic and semantic structure of an English sentence and provide the user with predicate-argument relations among the words. These outputs are especially useful for high-level NLP applications such as information extraction, automatic summarization, and question answering, where the "meaning" of a sentence plays a central role. The main features of the parser are the following:
> su
> rpm -i enju-X.X-PLATFORM.rpm
Expand the archive into the directory where you would like to install Enju ($DIR indicates this directory in what follows).
> cd $DIR
> tar xvzf enju-X.X-PLATFORM.tar.gz
If you are using bash,
> export LILFES_PATH=$DIR/share/liblilfes
> export ENJU_DIR=$DIR/share/liblilfes/enju
If you are using tcsh,
> setenv LILFES_PATH $DIR/share/liblilfes
> setenv ENJU_DIR $DIR/share/liblilfes/enju
Also, make sure that $DIR/bin is included in your command search path (PATH).
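The environment setup above can be sanity-checked with a small script. The following is only a sketch: the function name check_enju_environment is hypothetical (it is not part of Enju), and it assumes nothing beyond the two variables and the bin directory named above.

```python
import os


def check_enju_environment(install_dir, env=None):
    """Return a list of problems found; an empty list means the setup looks consistent.

    Hypothetical helper, not part of Enju. install_dir plays the role of $DIR.
    """
    env = os.environ if env is None else env
    # Expected values follow the export/setenv lines shown above.
    expected = {
        "LILFES_PATH": os.path.join(install_dir, "share", "liblilfes"),
        "ENJU_DIR": os.path.join(install_dir, "share", "liblilfes", "enju"),
    }
    problems = []
    for name, want in expected.items():
        got = env.get(name)
        if got is None:
            problems.append(f"{name} is not set (expected {want})")
        elif os.path.normpath(got) != os.path.normpath(want):
            problems.append(f"{name} is {got}, expected {want}")
    # $DIR/bin must appear among the PATH entries.
    bin_dir = os.path.normpath(os.path.join(install_dir, "bin"))
    entries = [os.path.normpath(p) for p in env.get("PATH", "").split(os.pathsep) if p]
    if bin_dir not in entries:
        problems.append(f"{bin_dir} is not on PATH")
    return problems
```

Calling check_enju_environment with your own $DIR and printing the returned list shows which of the three settings, if any, still needs attention.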
> tar xvzf mayz-x.x.tar.gz
> cd mayz-x.x
> ./configure
> make install-enju
> tar xvzf enju-x.x-data.tar.gz
> mv DATA /usr/local/share/liblilfes/enju/
If you want to parse tokenized texts with part-of-speech tags,
> enju -t cat < TAGGEDTEXT > RESULTS
Enju has a tokenizer and a general-purpose part-of-speech tagger, so if you want to parse raw texts (having one sentence per line),
> enju -t uptagger < RAWTEXT > RESULTS
The default output of the parser is a set of predicate-argument relations. Alternatively, you can get both the phrase structures and predicate-argument relations either in a quasi-XML format or in a standoff format.
> enju -t uptagger -xml < RAWTEXT > RESULTS
> enju -t uptagger -so < RAWTEXT > RESULTS
Enju also has a part-of-speech tagger specifically trained for biomedical texts such as MEDLINE abstracts. If you want to parse such texts,
> enju -t geniatagger < RAWTEXT > RESULTS
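If you prefer to call the parser from a script, the command lines above can be wrapped as follows. This is a minimal sketch, not an official API: build_enju_command and parse_text are hypothetical names, and only the options that appear above (-t, -xml, -so) are used. Combining -xml or -so with a tagger other than uptagger is an assumption; the commands above show them only with uptagger.

```python
import subprocess

# Tagger arguments and output flags taken from the commands shown above.
TAGGERS = {"cat", "uptagger", "geniatagger"}
FORMAT_FLAGS = {"default": None, "xml": "-xml", "standoff": "-so"}


def build_enju_command(tagger="uptagger", output_format="default"):
    """Build the argument list for one of the enju invocations shown above."""
    if tagger not in TAGGERS:
        raise ValueError(f"unknown tagger: {tagger}")
    if output_format not in FORMAT_FLAGS:
        raise ValueError(f"unknown output format: {output_format}")
    command = ["enju", "-t", tagger]
    flag = FORMAT_FLAGS[output_format]
    if flag is not None:
        command.append(flag)
    return command


def parse_text(text, tagger="uptagger", output_format="default"):
    """Send text (one sentence per line) to enju on stdin and return its stdout.

    Requires enju to be installed and reachable through PATH.
    """
    completed = subprocess.run(
        build_enju_command(tagger, output_format),
        input=text,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if enju exits with an error
    )
    return completed.stdout
```

For example, parse_text(raw_text, tagger="geniatagger") corresponds to the last command above, with raw_text holding the contents of RAWTEXT.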
For further details on how to use Enju, see the manuals, which you can find in $DIR/share/liblilfes/enju/manual/ ($DIR is the directory where you have installed Enju).
Unlike conventional parsers using CFGs, the default output of the parser is a set of predicate-argument relations, so the user can easily acquire the semantic relations among the words in an input sentence without the burden of analyzing its deep-syntactic structure.
The following is a set of parsing examples. Each line in the output represents a predicate-argument relation between two words. For instance, the second line in the first example indicates that there is an "ARG1 (logical subject)" relation between the predicate "runs" and the argument "he". Note that the same semantic relations holding among the three words "he", "run", and "company" are obtained from the sentences written in three different syntactic structures.
ROOT | ROOT | ROOT | ROOT | -1 | ROOT | runs | run | VBZ | VB | 1 |
runs | run | VBZ | VB | 1 | ARG1 | He | he | PRP | PRP | 0 |
runs | run | VBZ | VB | 1 | ARG2 | company | company | NN | NN | 3 |
the | the | DT | DT | 2 | MODIFY | company | company | NN | NN | 3 |

ROOT | ROOT | ROOT | ROOT | -1 | ROOT | is | be | VBZ | VB | 2 |
is | be | VBZ | VB | 2 | ARG1 | company | company | NN | NN | 1 |
is | be | VBZ | VB | 2 | ARG2 | run | run | VBN | VB | 3 |
run | run | VBN | VB | 3 | ARG1 | him | him | PRP | PRP | 5 |
run | run | VBN | VB | 3 | ARG2 | company | company | NN | NN | 1 |
The | the | DT | DT | 0 | MODIFY | company | company | NN | NN | 1 |

ROOT | ROOT | ROOT | ROOT | -1 | ROOT | is | be | VBZ | VB | 5 |
is | be | VBZ | VB | 5 | ARG1 | company | company | NN | NN | 1 |
is | be | VBZ | VB | 5 | ARG2 | small | small | JJ | JJ | 6 |
The | the | DT | DT | 0 | MODIFY | company | company | NN | NN | 1 |
runs | run | VBZ | VB | 4 | ARG1 | he | he | PRP | PRP | 3 |
runs | run | VBZ | VB | 4 | ARG2 | company | company | NN | NN | 1 |
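Output lines like those above can be read into simple records for further processing. The sketch below assumes the "|"-separated layout exactly as displayed here (the separator in a real RESULTS file may differ), and the field names (surface, base, pos, base_pos, index) are our reading of the examples, not names defined by Enju.

```python
from typing import NamedTuple


class Word(NamedTuple):
    surface: str   # word as it appears in the sentence, e.g. "runs"
    base: str      # base form, e.g. "run"
    pos: str       # part-of-speech tag, e.g. "VBZ"
    base_pos: str  # part-of-speech tag of the base form, e.g. "VB"
    index: int     # zero-based word position; -1 for ROOT


class Relation(NamedTuple):
    predicate: Word
    label: str     # e.g. "ARG1", "ARG2", "MODIFY", "ROOT"
    argument: Word


def parse_relation_line(line):
    """Parse one predicate-argument line in the layout shown above."""
    fields = [f.strip() for f in line.strip().rstrip("|").split("|")]
    if len(fields) != 11:
        raise ValueError(f"expected 11 fields, got {len(fields)}: {line!r}")
    predicate = Word(*fields[0:4], int(fields[4]))
    argument = Word(*fields[6:10], int(fields[10]))
    return Relation(predicate, fields[5], argument)


sample = """\
ROOT | ROOT | ROOT | ROOT | -1 | ROOT | runs | run | VBZ | VB | 1 |
runs | run | VBZ | VB | 1 | ARG1 | He | he | PRP | PRP | 0 |
runs | run | VBZ | VB | 1 | ARG2 | company | company | NN | NN | 3 |
the | the | DT | DT | 2 | MODIFY | company | company | NN | NN | 3 |
"""

relations = [parse_relation_line(l) for l in sample.splitlines() if l.strip()]
# Reduce each relation to (predicate base form, label, argument base form).
triples = [(r.predicate.base, r.label, r.argument.base) for r in relations]
```

Comparing such triples across the examples is one way to see that the relations between "run" and "company" come out the same regardless of the surface syntax.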