Enju

Developed at:
The University of Tokyo, Department of Computer Science,
Tsujii laboratory

Version 2.4.2 has been available since June 16, 2011.

An online demo is available!

Overview

Enju is a syntactic parser for English. With a wide-coverage probabilistic HPSG grammar [1-7] and an efficient parsing algorithm [8-11], this parser can effectively analyze the syntactic and semantic structures of English sentences and provide the user with phrase structures and predicate-argument structures. These outputs are especially useful for high-level NLP applications, including information extraction, automatic summarization, and question answering, where the "meaning" of a sentence plays a central role.

The main features of the Enju parser are:

Other useful features are:

For any inquiry, contact us.

How to download & install Enju

Binary packages of Enju 2.4.2 are available at the Tsujii Laboratory software download page. Currently, the following packages are available for download.

You can try Enju before downloading it via the online demo. Please contact us if you need a binary package for another platform, or a source package.

How to install Enju (on Linux or Mac OS)

1. Download the latest package for your particular platform (enju-X.Y-PLATFORM.tar.gz).

2. Untar the archive into a directory where you would like to install Enju.

> tar xvzf enju-X.Y-PLATFORM.tar.gz

Run "enju-X.Y/enju" to invoke Enju.

How to install Enju (on Windows)

1. Download the latest package for Windows (enju-X.Y-win32.zip).

2. Unzip the archive into a directory where you would like to install Enju.

Run "enju-win/enju.bat" to invoke Enju.

How to use Enju

To parse sentences, feed a file containing one sentence per line to the standard input. For example, suppose the file "RAWTEXT" contains:

He runs the company.
The company that he runs is small.

Run the following command.

> enju < RAWTEXT > RESULTS

The parsing results are written to the file "RESULTS". The section "Demo and web interface" shows some examples of parsing results.

Alternatively, you can use a high-speed parser with the command "mogura":

> mogura < RAWTEXT > RESULTS

These commands work in mostly the same way.

To parse text that is already tagged with Penn Treebank-style POS tags, use the -nt option:

> enju -nt < TAGGEDTEXT > RESULTS

The default output of the parser is a set of predicate-argument relations. Alternatively, you can get both the phrase structures and predicate-argument relations either in a quasi-XML format or in a stand-off format.

> enju -xml < RAWTEXT > RESULTS
> enju -so < RAWTEXT > RESULTS
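
For scripting, the command lines above can be wrapped in a small helper. The following Python sketch is only an illustration, not part of the Enju distribution: the names build_command and run_parser are hypothetical, the "enju" and "mogura" executables are assumed to be on the PATH, and only the options shown above (-nt, -xml, -so) are used. Whether these options can be combined with each other, or passed to "mogura", is an assumption that should be checked against the manuals.

```python
import subprocess

# Output-format options shown above; None selects the default
# predicate-argument output.
FORMAT_FLAGS = {None: [], "xml": ["-xml"], "so": ["-so"]}


def build_command(program="enju", tagged=False, fmt=None):
    """Return the argument list for one parser invocation (hypothetical helper).

    program: "enju" or "mogura", assumed to be on the PATH.
    tagged:  True if the input already carries Penn Treebank-style POS tags.
    fmt:     None (default output), "xml" or "so".
    """
    if fmt not in FORMAT_FLAGS:
        raise ValueError("unknown output format: %r" % (fmt,))
    command = [program]
    if tagged:
        command.append("-nt")
    command.extend(FORMAT_FLAGS[fmt])
    return command


def run_parser(input_path, output_path, **options):
    """Rough equivalent of: enju [options] < input_path > output_path"""
    with open(input_path, "rb") as src, open(output_path, "wb") as dst:
        subprocess.run(build_command(**options), stdin=src, stdout=dst, check=True)
```

With this sketch, run_parser("RAWTEXT", "RESULTS", fmt="xml") would correspond to the -xml command line above.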

You can also use Enju as a CGI server.

> enju -cgi PORT_NUMBER

You can access port PORT_NUMBER with a CGI query and receive the parsing results in XML format.

http://localhost:PORT_NUMBER/cgi-lilfes/enju?sentence=he+runs+the+company
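
A client for the CGI server could look like the following Python sketch. It assumes a server started with "enju -cgi PORT_NUMBER" on the local machine and a response that can be decoded as UTF-8 (an assumption, not something stated here); build_query_url and parse_sentence are hypothetical helper names.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(port, sentence, host="localhost"):
    """Build the CGI query URL for one sentence (hypothetical helper)."""
    # urlencode escapes the sentence; spaces become "+", as in the URL above.
    query = urlencode({"sentence": sentence})
    return "http://%s:%d/cgi-lilfes/enju?%s" % (host, port, query)


def parse_sentence(port, sentence):
    """Send one sentence to a running Enju CGI server and return the reply text."""
    with urlopen(build_query_url(port, sentence)) as response:
        # UTF-8 is assumed here; adjust if the server reports another encoding.
        return response.read().decode("utf-8")
```

Building the URL with urlencode rather than by string concatenation keeps punctuation and other special characters in the sentence from breaking the query.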

For further details on the output formats, see the manuals and the technical report.

Demo and web interface

Unlike conventional parsers using CFGs, the default output of the parser is a set of predicate-argument relations, so the user can easily acquire semantic relations among words in an input sentence without the burden of analyzing its deep-syntactic structure.

Parsing examples are shown below. Each line in the output represents a predicate-argument relation between two words. For instance, the second line in the first example indicates that there is an "ARG1 (logical subject)" relation between the predicate "run" and the argument "he". Note that the same semantic relations holding among the three words, "he", "run", and "company", are obtained from sentences written in different syntactic structures.

Sentence 1: He runs the company.

ROOT   ROOT   ROOT   ROOT   -1   ROOT         ROOT   runs      run       VBZ   VB    1
runs   run    VBZ    VB     1    verb_arg12   ARG1   He        he        PRP   PRP   0
runs   run    VBZ    VB     1    verb_arg12   ARG2   company   company   NN    NN    3
the    the    DT     DT     2    det_arg1     ARG1   company   company   NN    NN    3

Sentence 2: The company that he runs is small.

ROOT    ROOT    ROOT   ROOT   -1   ROOT            ROOT   is        be        VBZ   VB    5
is      be      VBZ    VB     5    verb_arg12      ARG1   company   company   NN    NN    1
is      be      VBZ    VB     5    verb_arg12      ARG2   small     small     JJ    JJ    6
small   small   JJ     JJ     6    adj_arg1        ARG1   company   company   NN    NN    1
The     the     DT     DT     0    det_arg1        ARG1   company   company   NN    NN    1
that    that    IN     IN     2    relative_arg1   ARG1   company   company   NN    NN    1
runs    run     VBZ    VB     4    verb_arg12      ARG1   he        he        PRP   PRP   3
runs    run     VBZ    VB     4    verb_arg12      ARG2   company   company   NN    NN    1
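
To use such output in a program, each line can be split into fields. The Python sketch below is a rough illustration that assumes every line of the actual parser output holds twelve tab-separated fields in the order the examples above suggest: word, base form, POS, base-form POS and word position of the predicate, then the predicate type and argument label, then the same five fields for the argument. The field names and the helpers parse_relation and arguments_of are hypothetical; see the manuals for the authoritative description of the format.

```python
from collections import namedtuple

# Field names are illustrative assumptions, not Enju's own terminology.
Relation = namedtuple("Relation", [
    "pred_word", "pred_base", "pred_pos", "pred_base_pos", "pred_index",
    "pred_type", "arg_label",
    "arg_word", "arg_base", "arg_pos", "arg_base_pos", "arg_index",
])


def parse_relation(line):
    """Turn one output line into a Relation (assumes 12 tab-separated fields)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 12:
        raise ValueError("expected 12 fields, got %d" % len(fields))
    fields[4] = int(fields[4])    # position of the predicate word (-1 for ROOT)
    fields[11] = int(fields[11])  # position of the argument word
    return Relation(*fields)


def arguments_of(relations, base_form):
    """Map argument labels to argument base forms for one predicate."""
    return {r.arg_label: r.arg_base
            for r in relations if r.pred_base == base_form}


# The two "run" relations of Sentence 1, written with explicit tabs.
lines = [
    "runs\trun\tVBZ\tVB\t1\tverb_arg12\tARG1\tHe\the\tPRP\tPRP\t0",
    "runs\trun\tVBZ\tVB\t1\tverb_arg12\tARG2\tcompany\tcompany\tNN\tNN\t3",
]
relations = [parse_relation(line) for line in lines]
```

Looking up the arguments of "run" with arguments_of(relations, "run") pairs ARG1 with "he" and ARG2 with "company"; applying the same lookup to the relations of Sentence 2 gives the same pairing, which is the point made above about different syntactic structures.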

Enju can also output both phrase structures and predicate-argument structures in a quasi-XML format. The following pages show the phrase structure and the predicate-argument structure for the sentence "It's falling like a stone, said Danny Linger, a pit trader who was standing outside the London International Financial Futures Exchange."

Note: Firefox shows a graphical view, while Internet Explorer shows a bare XML document.

An online demo is available to show how Enju works.

A UIMA web interface for Enju is also available, so you can embed Enju in UIMA workflows.

Documentation

Publications

[1] Yusuke Miyao and Jun'ichi Tsujii. 2002. Maximum Entropy Estimation for Feature Forests. In Proceedings of HLT 2002.

[2] Yusuke Miyao and Jun'ichi Tsujii. 2003. Probabilistic modeling of argument structures including non-local dependencies. In Proceedings of the Conference on Recent Advances in Natural Language Processing (RANLP) 2003, pp. 285-291.

[3] Yusuke Miyao, Takashi Ninomiya, and Jun'ichi Tsujii. 2004. Corpus-oriented Grammar Development for Acquiring a Head-driven Phrase Structure Grammar from the Penn Treebank. In Proceedings of IJCNLP-04.

[4] Yusuke Miyao and Jun'ichi Tsujii. 2005. Probabilistic Disambiguation Models for Wide-Coverage HPSG Parsing. In Proceedings of ACL-2005, pp. 83-90.

[5] Takashi Ninomiya, Takuya Matsuzaki, Yoshimasa Tsuruoka, Yusuke Miyao and Jun'ichi Tsujii. 2006. Extremely Lexicalized Models for Accurate and Fast HPSG Parsing. In Proceedings of EMNLP 2006.

[6] Takashi Ninomiya, Takuya Matsuzaki, Yusuke Miyao, and Jun'ichi Tsujii. 2007. A log-linear model with an n-gram reference distribution for accurate HPSG parsing. In Proceedings of IWPT 2007.

[7] Yusuke Miyao and Jun'ichi Tsujii. 2008. Feature Forest Models for Probabilistic HPSG Parsing. Computational Linguistics. 34(1). pp. 35-80, MIT Press.

[8] Yoshimasa Tsuruoka, Yusuke Miyao, and Jun'ichi Tsujii. 2003. Towards efficient probabilistic HPSG parsing: integrating semantic and syntactic preference to guide the parsing. In Proceedings of IJCNLP-04 Workshop: Beyond shallow analyses - Formalisms and statistical modeling for deep analyses.

[9] Takashi Ninomiya, Yoshimasa Tsuruoka, Yusuke Miyao, and Jun'ichi Tsujii. 2005. Efficacy of Beam Thresholding, Unification Filtering and Hybrid Parsing in Probabilistic HPSG Parsing. In Proceedings of IWPT 2005.

[10] Takashi Ninomiya, Yoshimasa Tsuruoka, Yusuke Miyao, Kenjiro Taura and Jun'ichi Tsujii. 2006. Fast and Scalable HPSG Parsing. Traitement automatique des langues (TAL). 46(2). Association pour le Traitement Automatique des Langues.

[11] Takuya Matsuzaki, Yusuke Miyao, and Jun'ichi Tsujii. 2007. Efficient HPSG Parsing with Supertagging and CFG-filtering. In Proceedings of IJCAI 2007.

[12] Tadayoshi Hara, Yusuke Miyao, and Jun'ichi Tsujii. 2005. Adapting a probabilistic disambiguation model of an HPSG parser to a new domain. In Proceedings of IJCNLP 2005.

[13] Tadayoshi Hara, Yusuke Miyao, and Jun'ichi Tsujii. 2007. Evaluating Impact of Re-training a Lexical Disambiguation Model on Domain Adaptation of an HPSG Parser. In Proceedings of IWPT 2007.

[14] Kenji Sagae, Yusuke Miyao, and Jun'ichi Tsujii. 2007. HPSG Parsing with Shallow Dependency Constraints. In Proceedings of ACL 2007.

[15] Takuya Matsuzaki and Jun'ichi Tsujii. 2008. Comparative Parser Performance Analysis across Grammar Frameworks through Automatic Tree Conversion using Synchronous Grammars. In Proceedings of COLING 2008.

[16] Yusuke Miyao, Rune Saetre, Kenji Sagae, Takuya Matsuzaki, and Jun'ichi Tsujii. 2008. Task-Oriented Evaluation of Syntactic Parsers and Their Representations. In Proceedings of ACL-08:HLT.