The XHPSG System

Last updated on January 10th, 2001

Note: The XHPSG project is obsoleted by the RenTAL project.
The XHPSG system is a wide-coverage parsing system for English based on the HPSG formalism. Tsujii Lab. is developing the XHPSG system as a general-purpose parsing system that can be used by various NLP applications. The system is still under development, and we have just started applying it to information extraction.

Architecture

The architecture of the XHPSG system is described in the following image.

Architecture of the XHPSG system
As can be seen, the system is composed of a grammar, preprocessors, and parsers. The grammar was converted from the XTAG English grammar by Dr. Yuka Tateisi [1]. The preprocessors are designed for processing various kinds of text, and users can add new preprocessors. The parsers are general HPSG parsers, including the TNT parser, and are shared with SLUNG and the LinGO grammar on LiLFeS. The correspondence between the XHPSG system and the XTAG system is shown below.
Correspondence between XHPSG and XTAG

Grammar

The XHPSG grammar is manually converted from the XTAG English grammar, which is based on the TAG formalism [1]. Each tree in the XTAG grammar is converted into a lexical entry in XHPSG, and ten schemata are defined to emulate the two TAG operations, namely substitution and adjunction.

TAG describes a grammar as a set of elementary trees, which are divided into initial trees and auxiliary trees. In the XTAG grammar, words are categorized into tree families and trees, and a set of trees is assigned to each of them. Trees are then combined by substitution and adjunction to construct a derived tree and a derivation tree. HPSG, on the other hand, describes a grammar as a set of lexical entries and schemata. A derived tree is constructed by recursively applying schemata to lexical and phrasal feature structures.
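The two TAG operations can be illustrated with a small sketch. This is only a toy model in Python, not code from XTAG or XHPSG: the `Node` class, the function names, and the example trees for "John", "sleeps", and "soundly" are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None   # lexical anchor, if this is a word leaf
    subst: bool = False          # leaf that is a substitution site
    foot: bool = False           # foot node of an auxiliary tree


def substitute(tree: Node, initial: Node) -> bool:
    """Plug `initial` into the first substitution site with a matching label."""
    for i, child in enumerate(tree.children):
        if child.subst and child.label == initial.label:
            tree.children[i] = initial
            return True
        if substitute(child, initial):
            return True
    return False


def find_internal(node: Node, label: str) -> Optional[Node]:
    """First internal node (in preorder) carrying the given label."""
    if node.label == label and node.children and not node.subst:
        return node
    for child in node.children:
        found = find_internal(child, label)
        if found is not None:
            return found
    return None


def find_foot(node: Node) -> Optional[Node]:
    if node.foot:
        return node
    for child in node.children:
        found = find_foot(child)
        if found is not None:
            return found
    return None


def adjoin(tree: Node, aux: Node) -> bool:
    """Splice `aux` in at the first internal node labelled like its root."""
    target = find_internal(tree, aux.label)
    foot = find_foot(aux)
    if target is None or foot is None:
        return False
    # The foot node takes over the subtree that used to hang below the target.
    foot.children, foot.foot = target.children, False
    # The target node now dominates the body of the auxiliary tree.
    target.children = aux.children
    return True


def leaves(node: Node) -> List[str]:
    """Words at the frontier of the tree, left to right."""
    if not node.children:
        return [node.word] if node.word else []
    return [w for child in node.children for w in leaves(child)]


# Initial tree anchored by "sleeps": S(NP-subst, VP(V sleeps))
sleeps = Node("S", [Node("NP", subst=True),
                    Node("VP", [Node("V", word="sleeps")])])
# Initial tree anchored by "John": NP(N John)
john = Node("NP", [Node("N", word="John")])
# Auxiliary tree anchored by "soundly": VP(VP-foot, Adv soundly)
soundly = Node("VP", [Node("VP", foot=True), Node("Adv", word="soundly")])

substitute(sleeps, john)
adjoin(sleeps, soundly)
print(leaves(sleeps))
```

Substitution fills a leaf, whereas adjunction inserts material in the middle of a tree; this difference is why the two kinds of trees are translated differently.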

The XHPSG grammar is obtained by converting the initial trees and auxiliary trees assigned to tree families and trees. Initial trees are converted by putting their leaf nodes into the subcategorization frames of HPSG lexical entries. Auxiliary trees are converted into lexical entries that modify the node at which they adjoin. The translation scheme is illustrated in the following figures.

Translation of an initial tree
Translation of an auxiliary tree
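As a rough illustration of the translation idea, the sketch below walks an elementary tree and collects its leaves into a dictionary standing in for a lexical entry. The tuple encoding of trees and the feature names `CAT`, `HEAD`, `SUBCAT`, and `MOD` are assumptions chosen for this example; the actual feature geometry of the XHPSG grammar is not shown on this page and is described in [1].

```python
from typing import Dict, List, Optional, Tuple

# A node is (label, kind, children), where kind is one of
# "internal", "subst" (substitution site), "foot", or "anchor".
Tree = Tuple[str, str, list]


def to_lexical_entry(tree: Tree) -> Dict[str, object]:
    """Collect the leaves of an elementary tree into a toy lexical entry."""
    subcat: List[str] = []
    head: List[Optional[str]] = [None]
    mod: List[Optional[str]] = [None]

    def walk(node: Tree) -> None:
        label, kind, children = node
        if kind == "anchor":
            head[0] = label          # category of the anchoring word
        elif kind == "subst":
            subcat.append(label)     # argument goes into the subcat frame
        elif kind == "foot":
            mod[0] = label           # category this entry modifies
        for child in children:
            walk(child)

    walk(tree)
    return {"CAT": tree[0], "HEAD": head[0], "SUBCAT": subcat, "MOD": mod[0]}


# Initial tree of a transitive verb: S(NP-subst, VP(V-anchor, NP-subst))
transitive = ("S", "internal", [
    ("NP", "subst", []),
    ("VP", "internal", [("V", "anchor", []), ("NP", "subst", [])]),
])

# Auxiliary tree of a VP adverb: VP(VP-foot, Adv-anchor)
adverb = ("VP", "internal", [("VP", "foot", []), ("Adv", "anchor", [])])

print(to_lexical_entry(transitive))
print(to_lexical_entry(adverb))
```

In this toy encoding the transitive verb ends up with two NP arguments in its subcategorization frame, and the adverb ends up as a modifier of VP with an empty frame.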

Preprocessors

This will be updated soon... (by yusuke)

Parsers

Parsers on LiLFeS can be applied to any grammar that (i) is described with lexical entries and grammar rules and (ii) represents both as typed feature structures. The LiLFeS project defines an interface between parsers and grammars, and all parsers and grammars are designed to support it. Consequently, any parser can be combined with any grammar. The XHPSG system also follows this interface, and two parsers can currently be used. One of them is an efficient HPSG parser with CFG filtering, called the TNT parser.
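The benefit of a shared parser/grammar interface can be sketched as follows. This is a Python illustration only: the real interface is defined in LiLFeS over typed feature structures, and the names `Grammar`, `lexical_entries`, `apply_rules`, `cky_parse`, and `ToyGrammar` are hypothetical. Atomic category symbols stand in for feature structures, and the parser is a plain CKY-style chart parser without CFG filtering.

```python
from typing import Dict, Hashable, Iterable, List, Protocol, Set, Tuple


class Grammar(Protocol):
    """What a parser needs from a grammar (hypothetical interface)."""

    def lexical_entries(self, word: str) -> Iterable[Hashable]: ...

    def apply_rules(self, left: Hashable, right: Hashable) -> Iterable[Hashable]: ...


def cky_parse(grammar: Grammar, words: List[str]) -> Set[Hashable]:
    """Naive CKY-style parsing; returns the signs spanning the whole input."""
    n = len(words)
    if n == 0:
        return set()
    chart: Dict[Tuple[int, int], Set[Hashable]] = {}
    for i, word in enumerate(words):
        chart[i, i + 1] = set(grammar.lexical_entries(word))
    for width in range(2, n + 1):
        for start in range(0, n - width + 1):
            end = start + width
            cell: Set[Hashable] = set()
            for mid in range(start + 1, end):
                for left in chart[start, mid]:
                    for right in chart[mid, end]:
                        cell.update(grammar.apply_rules(left, right))
            chart[start, end] = cell
    return chart[0, n]


class ToyGrammar:
    """A tiny grammar that satisfies the interface above."""

    LEXICON = {"John": ["NP"], "Mary": ["NP"], "loves": ["V"]}
    RULES = {("V", "NP"): ["VP"], ("NP", "VP"): ["S"]}

    def lexical_entries(self, word: str) -> Iterable[Hashable]:
        return self.LEXICON.get(word, [])

    def apply_rules(self, left: Hashable, right: Hashable) -> Iterable[Hashable]:
        return self.RULES.get((left, right), [])


print(cky_parse(ToyGrammar(), ["John", "loves", "Mary"]))
```

Because `cky_parse` only calls the two interface methods, another grammar object, or another parser written against the same two methods, could be swapped in without changing the other side.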

Specifications

The following table shows the specifications of the XHPSG system, which give an overview of its current status. Note that some of them are inherited from the XTAG grammar, on which the XHPSG grammar depends.

# of lexicons 317324
# of trees (and tree families) 428
# of schemata 8

Performance

The XHPSG system is implemented in the LiLFeS language, and runs on Pentium, Sun, and Alpha machines. The following results were measured on a Pentium III 500 MHz CPU with 4 gigabytes of memory. The test corpora are ATIS and the Wall Street Journal from the Penn Treebank. The parsers are a naive implementation of a CKY-style parser (Naive) and the TNT parser (TNT).

Corpus | # of sentences | Avg. length (words) | Coverage (%) | Mean # of edges | Parsing time, Naive (msec.) | Parsing time, TNT (msec.)
ATIS | 500 | 7.42 | 61.8 | 372.2 | 14270 | 357
Wall Street Journal | --- | --- | --- | --- | --- | ---
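For a quick reading of the ATIS row, the ratio between the two parsing times can be computed directly. The snippet below uses only the two figures from the table; the variable names are illustrative.

```python
# Parsing times on ATIS, taken from the table above (msec.).
naive_msec = 14270
tnt_msec = 357

# How many times faster the TNT parser is than the naive CKY-style parser.
speedup = naive_msec / tnt_msec
print(f"TNT is about {speedup:.1f} times faster than Naive on ATIS")
```

On these figures, the TNT parser is roughly 40 times faster than the naive parser.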

Screen Shots

There are some screen shots of the XHPSG system in use through the will interface.

Manuals

Unfortunately, only a Japanese manual is available, and it is somewhat out of date :)

References

[1] Yuka Tateisi, Kentaro Torisawa, Yusuke Miyao and Jun-ichi Tsujii. Translating the XTAG English grammar to HPSG. In Proceedings of the TAG+4 Workshop, pp. 172-175. 1998.
Tsujii Laboratory Department of Information Science University of Tokyo
This page has just started and is always under construction. All questions and suggestions are welcome. Please mail to:
MIYAO Yusuke (yusuke@is.s.u-tokyo.ac.jp)