Enju - A fast, accurate, and deep parser for English

Developed at:
The University of Tokyo, Department of Computer Science,
Tsujii laboratory

Version 2.4.2 has been available since June 16, 2011.

An online demo is available!

Overview
How to download and install Enju
How to use Enju
Demo and web interface
Documentation
Publications

Overview

Enju is a syntactic parser for English. With a wide-coverage probabilistic HPSG grammar [1-7] and an efficient parsing algorithm [8-11], this parser can effectively analyze the syntactic and semantic structures of English sentences and provide the user with phrase structures and predicate-argument structures. These outputs are especially useful for high-level NLP applications, including information extraction, automatic summarization, and question answering, where the "meaning" of a sentence plays a central role.

The main features of the Enju parser are:

Accurate deep analysis: the parser can output both phrase structures and predicate-argument structures. The accuracy of predicate-argument relations is around 90% for newswire articles and biomedical papers.
High speed: parsing takes less than 500 msec. per sentence by default (faster than most Penn Treebank parsers), and less than 50 msec. per sentence with the high-speed setting ("mogura").

Other useful features are:

Output parse results in an XML format: specify the option "-xml". The parser adds XML tags to the original text, which is useful when parse results are merged with other processing results (e.g. named entities). A stand-off format is also available (specify "-so").
Use a parsing model for the biomedical domain: specify the option "-genia".
Use a parsing model for the literature domain: specify the option "-brown".
Use a supertagger: run "mogura -super"
Convert Enju XML output into Penn Treebank-style output [15,16]: run "enju2ptb/convert < ENJU_XML_OUTPUT > PTB_STYLE_OUTPUT"
Let the POS tagger output ambiguous POS tags: specify the option "-A". Parsing accuracy improves, but parsing becomes slower.
Output n-best parse results: specify the option "-N". This is an experimental function, and parsing becomes slower.

For any inquiry, contact us.

How to download and install Enju

The source package and pre-trained models of Enju are available on GitHub.

You can try Enju before downloading it via the online demo.

How to install Enju (on Linux or Mac OS)

See INSTALL in the source package.

How to install Enju (on Windows)

See INSTALL.win in the source package.

How to use Enju

To parse sentences, feed a file containing one sentence per line to the standard input. For example, suppose you have the file "RAWTEXT" that contains:


He runs the company.

The company that he runs is small.

Then run the following command:


> enju < RAWTEXT > RESULTS

Parsing results are written to the file "RESULTS". The "Demo and web interface" section shows some examples of parsing results.

Alternatively, you can use the high-speed parser by running the command "mogura":


> mogura < RAWTEXT > RESULTS

The two commands work in mostly the same way.

To parse text that is already tagged with Penn Treebank-style POS tags, specify the option "-nt":


> enju -nt < TAGGEDTEXT > RESULTS

The default output of the parser is a set of predicate-argument relations. Alternatively, you can get both the phrase structures and predicate-argument relations either in a quasi-XML format or in a stand-off format.


> enju -xml < RAWTEXT > RESULTS

> enju -so < RAWTEXT > RESULTS
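When Enju is called from a script, the invocations above can be assembled programmatically. The following is a minimal Python sketch, not part of the Enju distribution: the function names and the keys "pas", "xml", and "standoff" are hypothetical, and only flags mentioned on this page are used. It assumes that "enju" (or "mogura") is on the PATH and that "mogura" accepts the same output flags, which this page suggests but does not guarantee.

```python
import subprocess

# Flags taken from this page. The dictionary keys are hypothetical labels.
OUTPUT_FLAGS = {"pas": None, "xml": "-xml", "standoff": "-so"}
MODEL_FLAGS = {"default": None, "genia": "-genia", "brown": "-brown"}


def build_command(parser="enju", output="pas", model="default", pretagged=False):
    """Return the argument list for an enju (or mogura) invocation."""
    cmd = [parser]
    for flag in (OUTPUT_FLAGS[output], MODEL_FLAGS[model]):
        if flag:
            cmd.append(flag)
    if pretagged:
        # Input already carries Penn Treebank-style POS tags.
        cmd.append("-nt")
    return cmd


def parse_file(in_path, out_path, **options):
    """Run the parser with in_path on stdin and write stdout to out_path.

    Equivalent to: enju [options] < in_path > out_path
    """
    with open(in_path, "rb") as src, open(out_path, "wb") as dst:
        subprocess.run(build_command(**options), stdin=src, stdout=dst, check=True)
```

For example, parse_file("RAWTEXT", "RESULTS", output="xml") would correspond to the first command above, provided the parser is installed.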

You can also use Enju as a CGI server.


> enju -cgi PORT_NUMBER

You can then send a CGI query to the port PORT_NUMBER and receive parsing results in the XML format.


http://localhost:PORT_NUMBER/cgi-lilfes/enju?sentence=he+runs+the+company
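A client for this query can be sketched in Python with the standard library alone. The path "/cgi-lilfes/enju" and the "sentence" parameter come from the URL above; the function names, the default host, and the timeout are illustrative assumptions, and the response encoding is assumed to be UTF-8.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(port, sentence, host="localhost"):
    """Build the CGI query URL; spaces in the sentence are encoded as '+'."""
    query = urlencode({"sentence": sentence})
    return f"http://{host}:{port}/cgi-lilfes/enju?{query}"


def parse_sentence(port, sentence, timeout=30):
    """Send one sentence to a running 'enju -cgi PORT_NUMBER' server.

    Returns the response body as text (assumed here to be UTF-8 encoded XML).
    """
    with urlopen(build_query_url(port, sentence), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Encoding the query with urlencode rather than by hand keeps sentences that contain characters such as "&" or "?" from breaking the URL.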

For further details on the output formats, see the manuals and the technical report.

Demo and web interface

Unlike conventional CFG-based parsers, Enju outputs a set of predicate-argument relations by default, so the user can easily obtain semantic relations among the words in an input sentence without the burden of analyzing its deep-syntactic structure.

Parsing examples are shown below. Each line in the output represents a predicate-argument relation between two words. For instance, the second line in the first example indicates that there is an "ARG1 (logical subject)" relation between the predicate "run" and the argument "he". Note that the same semantic relations among the three words "he", "run", and "company" are obtained from sentences with different syntactic structures.

Sentence 1: He runs the company.

ROOT	ROOT	ROOT	ROOT	-1	ROOT	ROOT	runs	run	VBZ	VB	1
runs	run	VBZ	VB	1	verb_arg12	ARG1	He	he	PRP	PRP	0
runs	run	VBZ	VB	1	verb_arg12	ARG2	company	company	NN	NN	3
the	the	DT	DT	2	det_arg1	ARG1	company	company	NN	NN	3

Sentence 2: The company that he runs is small.

ROOT	ROOT	ROOT	ROOT	-1	ROOT	ROOT	is	be	VBZ	VB	5
is	be	VBZ	VB	5	verb_arg12	ARG1	company	company	NN	NN	1
is	be	VBZ	VB	5	verb_arg12	ARG2	small	small	JJ	JJ	6
small	small	JJ	JJ	6	adj_arg1	ARG1	company	company	NN	NN	1
The	the	DT	DT	0	det_arg1	ARG1	company	company	NN	NN	1
that	that	IN	IN	2	relative_arg1	ARG1	company	company	NN	NN	1
runs	run	VBZ	VB	4	verb_arg12	ARG1	he	he	PRP	PRP	3
runs	run	VBZ	VB	4	verb_arg12	ARG2	company	company	NN	NN	1
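The comparison between the two examples can be checked mechanically. The Python sketch below reads the tab-separated output and collects the relations of one predicate. The field names are labels inferred from the examples above (word, base form, POS, base POS, and position for the predicate, then the predicate type and argument label, then the same five fields for the argument); they are assumptions, so consult the Enju Output Specifications for the authoritative column definitions.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    # Hypothetical field names, inferred from the sample output.
    pred_word: str
    pred_base: str
    pred_pos: str
    pred_base_pos: str
    pred_index: int
    pred_type: str
    label: str
    arg_word: str
    arg_base: str
    arg_pos: str
    arg_base_pos: str
    arg_index: int


def parse_relations(text):
    """Parse tab-separated lines into Relation tuples, skipping other lines."""
    relations = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) != 12:
            continue  # blank line or unexpected format
        fields[4] = int(fields[4])
        fields[11] = int(fields[11])
        relations.append(Relation(*fields))
    return relations


def base_triples(relations, pred_base):
    """Return (predicate base, label, argument base) triples for one predicate."""
    return {(r.pred_base, r.label, r.arg_base)
            for r in relations if r.pred_base == pred_base}


# The lines for the predicate "run" copied from the two examples above.
SENTENCE_1 = (
    "runs\trun\tVBZ\tVB\t1\tverb_arg12\tARG1\tHe\the\tPRP\tPRP\t0\n"
    "runs\trun\tVBZ\tVB\t1\tverb_arg12\tARG2\tcompany\tcompany\tNN\tNN\t3\n"
)
SENTENCE_2 = (
    "runs\trun\tVBZ\tVB\t4\tverb_arg12\tARG1\the\the\tPRP\tPRP\t3\n"
    "runs\trun\tVBZ\tVB\t4\tverb_arg12\tARG2\tcompany\tcompany\tNN\tNN\t1\n"
)

triples_1 = base_triples(parse_relations(SENTENCE_1), "run")
triples_2 = base_triples(parse_relations(SENTENCE_2), "run")
print(triples_1 == triples_2)  # True
```

Comparing base forms rather than surface words is what makes "He" and "he" match across the two sentences.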

Enju can also output both phrase structures and predicate-argument structures in a quasi-XML format. The following pages show the phrase structure and the predicate-argument structure for the sentence "It's falling like a stone, said Danny Linger, a pit trader who was standing outside the London International Financial Futures Exchange."

Note: Firefox shows a graphical view, while Internet Explorer shows a bare XML document.

An online demo is available to show how Enju works.

A UIMA Web Interface for Enju is also available, so you can embed Enju in UIMA workflows.

Documentation

Enju Manual (in English)
Enju Manual (in Japanese)
Enju Output Specifications (Details of the output formats)
Enju XML Format (Example-based explanation of the Enju XML format for various linguistic constructions)

Publications

[1] Yusuke Miyao and Jun'ichi Tsujii. 2002. Maximum Entropy Estimation for Feature Forests. In Proceedings of HLT 2002.

[2] Yusuke Miyao and Jun'ichi Tsujii. 2003. Probabilistic modeling of argument structures including non-local dependencies. In Proceedings of the Conference on Recent Advances in Natural Language Processing (RANLP) 2003, pp. 285-291.

[3] Yusuke Miyao, Takashi Ninomiya, and Jun'ichi Tsujii. 2004. Corpus-oriented Grammar Development for Acquiring a Head-driven Phrase Structure Grammar from the Penn Treebank. In Proceedings of IJCNLP-04.

[4] Yusuke Miyao and Jun'ichi Tsujii. 2005. Probabilistic Disambiguation Models for Wide-Coverage HPSG Parsing. In Proceedings of ACL-2005, pp. 83-90.

[5] Takashi Ninomiya, Takuya Matsuzaki, Yoshimasa Tsuruoka, Yusuke Miyao and Jun'ichi Tsujii. 2006. Extremely Lexicalized Models for Accurate and Fast HPSG Parsing. In Proceedings of EMNLP 2006.

[6] Takashi Ninomiya, Takuya Matsuzaki, Yusuke Miyao, and Jun'ichi Tsujii. 2007. A log-linear model with an n-gram reference distribution for accurate HPSG parsing. In Proceedings of IWPT 2007.

[7] Yusuke Miyao and Jun'ichi Tsujii. 2008. Feature Forest Models for Probabilistic HPSG Parsing. Computational Linguistics. 34(1). pp. 35-80, MIT Press.

[8] Yoshimasa Tsuruoka, Yusuke Miyao, and Jun'ichi Tsujii. 2003. Towards efficient probabilistic HPSG parsing: integrating semantic and syntactic preference to guide the parsing. In Proceedings of IJCNLP-04 Workshop: Beyond shallow analyses - Formalisms and statistical modeling for deep analyses.

[9] Takashi Ninomiya, Yoshimasa Tsuruoka, Yusuke Miyao, and Jun'ichi Tsujii. 2005. Efficacy of Beam Thresholding, Unification Filtering and Hybrid Parsing in Probabilistic HPSG Parsing. In Proceedings of IWPT 2005.

[10] Takashi Ninomiya, Yoshimasa Tsuruoka, Yusuke Miyao, Kenjiro Taura and Jun'ichi Tsujii. 2006. Fast and Scalable HPSG Parsing. Traitement automatique des langues (TAL). 46(2). Association pour le Traitement Automatique des Langues.

[11] Takuya Matsuzaki, Yusuke Miyao, and Jun'ichi Tsujii. 2007. Efficient HPSG Parsing with Supertagging and CFG-filtering. In Proceedings of IJCAI 2007.

[12] Tadayoshi Hara, Yusuke Miyao, and Jun'ichi Tsujii. 2005. Adapting a probabilistic disambiguation model of an HPSG parser to a new domain. In Proceedings of IJCNLP 2005.

[13] Tadayoshi Hara, Yusuke Miyao, and Jun'ichi Tsujii. 2007. Evaluating Impact of Re-training a Lexical Disambiguation Model on Domain Adaptation of an HPSG Parser. In Proceedings of IWPT 2007.

[14] Kenji Sagae, Yusuke Miyao, and Jun'ichi Tsujii. 2007. HPSG Parsing with Shallow Dependency Constraints. In Proceedings of ACL 2007.

[15] Takuya Matsuzaki and Jun'ichi Tsujii. 2008. Comparative Parser Performance Analysis across Grammar Frameworks through Automatic Tree Conversion using Synchronous Grammars. In Proceedings of COLING 2008.

[16] Yusuke Miyao, Rune Saetre, Kenji Sagae, Takuya Matsuzaki, and Jun'ichi Tsujii. 2008. Task-Oriented Evaluation of Syntactic Parsers and Their Representations. In Proceedings of ACL-08:HLT.
