Developed at:
University of Tokyo, Department of Computer Science,
Tsujii laboratory

Version 2.1

Overview

Enju is an implementation of a parsing algorithm for probabilistic unification-based grammars. With a wide-coverage probabilistic HPSG grammar [1,2,3] and an efficient parsing algorithm [4,5], the parser analyzes the syntactic and semantic structure of an English sentence and provides the user with predicate-argument relations among the words. These outputs are especially useful for high-level NLP applications such as information extraction, automatic summarization, and question answering, where the "meaning" of a sentence plays a central role. The main features of the parser are the following:

How to install Enju

Installation with RPM packages

1. Download the latest package for your particular platform from here (enju-X.X-PLATFORM.rpm). Currently, the following platforms are supported:

2. Install the package.

> su
> rpm -i enju-X.X-PLATFORM.rpm

Installation with binary packages

1. Download the latest package for your particular platform from here (enju-X.X-PLATFORM.tar.gz). Currently, the following platforms are supported:

2. Install the package.

Expand the archive into the directory where you would like to install Enju ($DIR indicates this directory in what follows).

> cd $DIR
> tar xvzf enju-X.X-PLATFORM.tar.gz

3. Set environmental variables.

If you are using bash,

> export LILFES_PATH=$DIR/share/liblilfes
> export ENJU_DIR=$DIR/share/liblilfes/enju

If you are using tcsh,

> setenv LILFES_PATH $DIR/share/liblilfes
> setenv ENJU_DIR $DIR/share/liblilfes/enju

Also, make sure that $DIR/bin is included in your command search path (PATH).
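The bash settings above, together with the search-path requirement, can be collected into one snippet for a shell startup file. This is only a sketch: the default value /opt/enju is a hypothetical install location (replace it with your own $DIR), and the PATH check is one common idiom, not something prescribed by Enju.

```shell
# Hypothetical install location; set DIR beforehand or edit this default.
DIR="${DIR:-/opt/enju}"

# Variables named in the installation steps above (bash syntax).
export LILFES_PATH="$DIR/share/liblilfes"
export ENJU_DIR="$DIR/share/liblilfes/enju"

# Prepend $DIR/bin to PATH only if it is not already listed.
case ":$PATH:" in
  *":$DIR/bin:"*) ;;
  *) export PATH="$DIR/bin:$PATH" ;;
esac
```

Sourcing the snippet more than once does not add duplicate PATH entries.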

Installation with the source package

1. You need to install LiLFeS before installing Enju.

2. Download the latest package of Mayz from here.

3. Install Enju.

> tar xvzf mayz-x.x.tar.gz
> cd mayz-x.x
> ./configure
> make install-enju

4. Download the grammar file from here.

5. Install the grammar data.

> tar xvzf enju-x.x-data.tar.gz
> mv DATA /usr/local/share/liblilfes/enju/

How to use Enju

If you want to parse tokenized texts with part-of-speech tags,

> enju -t cat < TAGGEDTEXT > RESULTS

Enju has a tokenizer and a general-purpose part-of-speech tagger, so if you want to parse raw texts (having one sentence per line),

> enju -t uptagger < RAWTEXT > RESULTS

The default output of the parser is a set of predicate-argument relations. Alternatively, you can get both the phrase structures and predicate-argument relations either in a quasi-XML format or in a standoff format.

> enju -t uptagger -xml < RAWTEXT > RESULTS
> enju -t uptagger -so < RAWTEXT > RESULTS

Enju also has a part-of-speech tagger specifically trained for biomedical texts such as MEDLINE abstracts. If you want to parse such texts,

> enju -t geniatagger < RAWTEXT > RESULTS

For further details on how to use Enju, see the manuals, which you can find in $DIR/share/liblilfes/enju/manual/ ($DIR is the directory where you have installed Enju).
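The invocations above differ only in the tagger passed to -t and in the optional output-format flag. As a rough sketch, a small hypothetical helper (enju_args is not part of Enju) can assemble the option list from those two choices. It uses only the options shown above; combining -xml or -so with taggers other than uptagger is an assumption, since only the uptagger combination is shown here.

```shell
# enju_args TAGGER [FORMAT]
# Print the option list for one of the invocations shown above.
# TAGGER: cat | uptagger | geniatagger
# FORMAT: empty (default predicate-argument output) | xml | so
enju_args() {
  tagger="$1"
  format="${2:-}"
  case "$tagger" in
    cat|uptagger|geniatagger) ;;
    *) echo "unknown tagger: $tagger" >&2; return 1 ;;
  esac
  case "$format" in
    "")     echo "-t $tagger" ;;
    xml|so) echo "-t $tagger -$format" ;;
    *)      echo "unknown format: $format" >&2; return 1 ;;
  esac
}

# Example (requires an installed enju; word splitting of the result is intended):
#   enju $(enju_args uptagger xml) < RAWTEXT > RESULTS
```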

Parsing samples

Unlike conventional parsers using CFGs, the default output of the parser is a set of predicate-argument relations, so the user can easily acquire the semantic relations among the words in an input sentence without the burden of analyzing its deep-syntactic structure.

The following is a set of parsing examples. Each line in the output represents a predicate-argument relation between two words. For instance, the second line in the first example indicates that there is an "ARG1 (logical subject)" relation between the predicate "runs" and the argument "he". Note that the same semantic relations holding among the three words "he", "run", and "company" are obtained from the sentences written in three different syntactic structures.

Sentence 1: "He runs the company."

Output

ROOT ROOT ROOT ROOT -1 ROOT runs run VBZ VB 1
runs run VBZ VB 1 ARG1 He he PRP PRP 0
runs run VBZ VB 1 ARG2 company company NN NN 3
the the DT DT 2 MODIFY company company NN NN 3

Sentence 2: "The company is run by him."

Output

ROOT ROOT ROOT ROOT -1 ROOT is be VBZ VB 2
is be VBZ VB 2 ARG1 company company NN NN 1
is be VBZ VB 2 ARG2 run run VBN VB 3
run run VBN VB 3 ARG1 him him PRP PRP 5
run run VBN VB 3 ARG2 company company NN NN 1
The the DT DT 0 MODIFY company company NN NN 1

Sentence 3: "The company that he runs is small."

Output

ROOT ROOT ROOT ROOT -1 ROOT is be VBZ VB 5
is be VBZ VB 5 ARG1 company company NN NN 1
is be VBZ VB 5 ARG2 small small JJ JJ 6
The the DT DT 0 MODIFY company company NN NN 1
runs run VBZ VB 4 ARG1 he he PRP PRP 3
runs run VBZ VB 4 ARG2 company company NN NN 1
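Since each output line pairs one predicate with one argument, the relations can be extracted with a short filter. The sketch below assumes that every line has 11 whitespace-separated columns, read from the Sentence 1 sample as: word, base form, POS, base POS, and position of the predicate, then the relation label, then the same five columns for the argument. This column reading is inferred from the samples, not taken from the Enju manual, and the function name enju_relations is hypothetical.

```shell
# enju_relations: read Enju's default output on stdin and print
# "PREDICATE_BASE RELATION ARGUMENT_BASE" for each relation.
# Assumes 11 whitespace-separated columns per line (inferred from the samples).
# Lines whose predicate is ROOT, and lines with another column count, are skipped.
enju_relations() {
  awk 'NF == 11 && $1 != "ROOT" { print $2, $6, $8 }'
}

# Example (requires an installed enju):
#   enju -t uptagger < RAWTEXT | enju_relations
```

Applied to the output for Sentence 1, this should print "run ARG1 he", "run ARG2 company", and "the MODIFY company", one per line.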

References

[1] MIYAO Yusuke, and TSUJII Jun'ichi. 2005. Probabilistic Disambiguation Models for Wide-Coverage HPSG Parsing. In Proceedings of ACL-2005, pp. 83-90.

[2] MIYAO Yusuke, NINOMIYA Takashi, and TSUJII Jun'ichi. 2004. Corpus-oriented Grammar Development for Acquiring a Head-driven Phrase Structure Grammar from the Penn Treebank. In Proceedings of IJCNLP-04.

[3] MIYAO Yusuke, and TSUJII Jun'ichi. 2003. Probabilistic modeling of argument structures including non-local dependencies. In Proceedings of the Conference on Recent Advances in Natural Language Processing (RANLP) 2003, pp. 285-291.

[4] TSURUOKA Yoshimasa, MIYAO Yusuke, and TSUJII Jun'ichi. 2004. Towards efficient probabilistic HPSG parsing: integrating semantic and syntactic preference to guide the parsing. In Proceedings of IJCNLP-04 Workshop: Beyond shallow analyses - Formalisms and statistical modeling for deep analyses.

[5] NINOMIYA Takashi, TSURUOKA Yoshimasa, MIYAO Yusuke, and TSUJII Jun'ichi. 2005. Efficacy of Beam Thresholding, Unification Filtering and Hybrid Parsing in Probabilistic HPSG Parsing. In Proceedings of the 9th International Workshop on Parsing Technologies (IWPT 2005).


This page is maintained by Yoshimasa Tsuruoka