Enju - A practical HPSG parser

Developed at:
University of Tokyo, Department of Computer Science,
Tsujii laboratory

Version 2.1

Overview

Enju is an implementation of a parsing algorithm for probabilistic unification-based grammars. With a wide-coverage probabilistic HPSG grammar [1,2,3] and an efficient parsing algorithm [4,5], the parser can analyze the syntactic and semantic structure of English sentences and output the predicate-argument relations among their words. These outputs are especially useful for high-level NLP applications such as information extraction, automatic summarization, and question answering, where the "meaning" of a sentence plays a central role. The main features of the parser are the following:

- It includes a wide-coverage HPSG grammar and its probabilistic model.
- It uses an efficient parsing algorithm.
- It allows the user to customize the grammar.
- It accepts raw texts as well as tokenized texts with part-of-speech tags.

How to install Enju

Installation with RPM packages

1. Download the latest package for your platform (enju-X.X-PLATFORM.rpm). The following platforms are currently supported:

- Fedora Core 3
- Fedora Core 4
- Turbolinux 8 for AMD64

2. Install the package.

> su
> rpm -i enju-X.X-PLATFORM.rpm

Installation with binary packages

1. Download the latest package for your platform (enju-X.X-PLATFORM.tar.gz). The following platforms are currently supported:

- Fedora Core 3
- Fedora Core 4
- Turbolinux 8 for AMD64

2. Install the package.

Expand the archive in the directory where you want to install Enju (referred to as $DIR below).

> cd $DIR
> tar xvzf enju-X.X-PLATFORM.tar.gz

3. Set environment variables.

If you are using bash:

> export LILFES_PATH=$DIR/share/liblilfes
> export ENJU_DIR=$DIR/share/liblilfes/enju

If you are using tcsh:

> setenv LILFES_PATH $DIR/share/liblilfes
> setenv ENJU_DIR $DIR/share/liblilfes/enju

Also make sure that $DIR/bin is included in your command search path (the PATH environment variable).
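For bash users, the settings from step 3 and the PATH requirement can be kept together, for example in ~/.bashrc. This is only a sketch: /opt/enju is a placeholder for your own $DIR, and the check simply avoids adding $DIR/bin to PATH twice.

```shell
# Placeholder install directory (hypothetical); use the $DIR you expanded the archive into.
DIR=/opt/enju

# Variables from step 3.
export LILFES_PATH="$DIR/share/liblilfes"
export ENJU_DIR="$DIR/share/liblilfes/enju"

# Prepend $DIR/bin to PATH unless it is already listed.
case ":$PATH:" in
  *":$DIR/bin:"*) ;;
  *) export PATH="$DIR/bin:$PATH" ;;
esac
```

tcsh users can get the same effect with "set path = ($DIR/bin $path)" after the two setenv lines.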

Installation with the source package

1. Install LiLFeS before installing Enju.

2. Download the latest Mayz source package (mayz-x.x.tar.gz).

3. Install Enju.

> tar xvzf mayz-x.x.tar.gz
> cd mayz-x.x
> ./configure
> make install-enju

4. Download the grammar data package (enju-x.x-data.tar.gz).

5. Install the grammar data.

> tar xvzf enju-x.x-data.tar.gz
> mv DATA /usr/local/share/liblilfes/enju/

How to use Enju

To parse tokenized texts with part-of-speech tags:

> enju -t cat < TAGGEDTEXT > RESULTS

Enju also includes a tokenizer and a general-purpose part-of-speech tagger, so it can parse raw texts (one sentence per line) directly:

> enju -t uptagger < RAWTEXT > RESULTS

By default, the parser outputs a set of predicate-argument relations. Alternatively, you can get both the phrase structures and the predicate-argument relations, either in a quasi-XML format (-xml) or in a standoff format (-so):

> enju -t uptagger -xml < RAWTEXT > RESULTS
> enju -t uptagger -so < RAWTEXT > RESULTS
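As a small worked example of the commands above, the following sketch writes a raw-text input with one sentence per line and requests each of the three output formats. The temporary working directory and the output file names are hypothetical, and the parser is only invoked if an enju command is found on the PATH.

```shell
# Hypothetical working directory for this example.
work=$(mktemp -d)

# Raw input: exactly one sentence per line.
printf '%s\n' \
  'He runs the company.' \
  'The company is run by him.' \
  'The company that he runs is small.' > "$work/raw.txt"

# Run the parser only if it is installed and on the PATH.
if command -v enju >/dev/null 2>&1; then
  enju -t uptagger      < "$work/raw.txt" > "$work/out-default.txt"   # predicate-argument relations
  enju -t uptagger -xml < "$work/raw.txt" > "$work/out-xml.txt"       # quasi-XML format
  enju -t uptagger -so  < "$work/raw.txt" > "$work/out-standoff.txt"  # standoff format
else
  echo "enju not found; check PATH (see the installation steps)" >&2
fi
```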

Enju also has a part-of-speech tagger trained specifically on biomedical texts such as MEDLINE abstracts. To parse such texts:

> enju -t geniatagger < RAWTEXT > RESULTS

For further details on how to use Enju, see the manuals, which you can find in $DIR/share/liblilfes/enju/manual/ ($DIR is the directory where you have installed Enju).

Parsing samples

Unlike conventional CFG-based parsers, Enju outputs a set of predicate-argument relations by default, so the user can easily obtain the semantic relations among the words of an input sentence without the burden of analyzing its deep syntactic structure.

The following are parsing examples. Each line of the output represents a predicate-argument relation between two words. For instance, the second line of the first example indicates an "ARG1 (logical subject)" relation between the predicate "runs" and the argument "he". Note that the same semantic relations among the three words "he", "run", and "company" are obtained from three sentences with different syntactic structures.

Sentence 1: "He runs the company."

Output

ROOT	ROOT	ROOT	ROOT	-1	ROOT	runs	run	VBZ	VB	1
runs	run	VBZ	VB	1	ARG1	He	he	PRP	PRP	0
runs	run	VBZ	VB	1	ARG2	company	company	NN	NN	3
the	the	DT	DT	2	MODIFY	company	company	NN	NN	3

Sentence 2: "The company is run by him."

Output

ROOT	ROOT	ROOT	ROOT	-1	ROOT	is	be	VBZ	VB	2
is	be	VBZ	VB	2	ARG1	company	company	NN	NN	1
is	be	VBZ	VB	2	ARG2	run	run	VBN	VB	3
run	run	VBN	VB	3	ARG1	him	him	PRP	PRP	5
run	run	VBN	VB	3	ARG2	company	company	NN	NN	1
The	the	DT	DT	0	MODIFY	company	company	NN	NN	1

Sentence 3: "The company that he runs is small."

Output

ROOT	ROOT	ROOT	ROOT	-1	ROOT	is	be	VBZ	VB	5
is	be	VBZ	VB	5	ARG1	company	company	NN	NN	1
is	be	VBZ	VB	5	ARG2	small	small	JJ	JJ	6
The	the	DT	DT	0	MODIFY	company	company	NN	NN	1
runs	run	VBZ	VB	4	ARG1	he	he	PRP	PRP	3
runs	run	VBZ	VB	4	ARG2	company	company	NN	NN	1
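Judging from the samples above (an interpretation, not a documented specification), each output line has eleven tab-separated columns: the predicate's surface form, base form, POS tag, base POS tag, and 0-based word position; the relation label; and the same five columns for the argument. Under that assumption, a short awk filter can reduce the output to predicate/relation/argument triples over base forms, which makes the samples easy to compare. The function names pas_triples and sample1 are hypothetical.

```shell
# pas_triples: print "predicate<TAB>relation<TAB>argument" using base forms.
# Assumed columns: $2 = predicate base form, $6 = relation label, $8 = argument base form.
# Only ARG* relations are kept; ROOT and MODIFY lines are skipped.
pas_triples() {
  awk -F'\t' '$6 ~ /^ARG/ { print $2 "\t" $6 "\t" $8 }' "$@"
}

# Output for Sentence 1 ("He runs the company."), copied from the sample above.
sample1() {
  printf 'ROOT\tROOT\tROOT\tROOT\t-1\tROOT\truns\trun\tVBZ\tVB\t1\n'
  printf 'runs\trun\tVBZ\tVB\t1\tARG1\tHe\the\tPRP\tPRP\t0\n'
  printf 'runs\trun\tVBZ\tVB\t1\tARG2\tcompany\tcompany\tNN\tNN\t3\n'
  printf 'the\tthe\tDT\tDT\t2\tMODIFY\tcompany\tcompany\tNN\tNN\t3\n'
}

sample1 | pas_triples
```

For Sentence 1 this prints the two triples "run ARG1 he" and "run ARG2 company" (tab-separated). On a real results file, "pas_triples RESULTS | sort -u" would list the distinct triples.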

References

[1] MIYAO Yusuke and TSUJII Jun'ichi. 2005. Probabilistic Disambiguation Models for Wide-Coverage HPSG Parsing. In Proceedings of ACL-2005, pp. 83-90.

[2] MIYAO Yusuke, NINOMIYA Takashi, and TSUJII Jun'ichi. 2004. Corpus-oriented Grammar Development for Acquiring a Head-driven Phrase Structure Grammar from the Penn Treebank. In Proceedings of IJCNLP-04.

[3] MIYAO Yusuke and TSUJII Jun'ichi. 2003. Probabilistic Modeling of Argument Structures Including Non-Local Dependencies. In Proceedings of the Conference on Recent Advances in Natural Language Processing (RANLP 2003), pp. 285-291.

[4] TSURUOKA Yoshimasa, MIYAO Yusuke, and TSUJII Jun'ichi. 2004. Towards Efficient Probabilistic HPSG Parsing: Integrating Semantic and Syntactic Preference to Guide the Parsing. In Proceedings of the IJCNLP-04 Workshop: Beyond Shallow Analyses - Formalisms and Statistical Modeling for Deep Analyses.

[5] NINOMIYA Takashi, TSURUOKA Yoshimasa, MIYAO Yusuke, and TSUJII Jun'ichi. 2005. Efficacy of Beam Thresholding, Unification Filtering and Hybrid Parsing in Probabilistic HPSG Parsing. In Proceedings of the 9th International Workshop on Parsing Technologies (IWPT 2005).

This page is maintained by Yoshimasa Tsuruoka