The XHPSG System 
Last updated on January 10th, 2001
Note: The XHPSG project is obsoleted by the RenTAL
project.
The XHPSG system is a wide-coverage parsing system for
English based on the HPSG formalism. Tsujii Lab. is
developing the XHPSG system as a general-purpose parsing system that
can be used by various NLP applications. The system is still
under development, and we have only just begun applying it to
information extraction.
Architecture
The architecture of the XHPSG system is shown in the following
figure.
The system is composed of a grammar,
preprocessors, and parsers. The grammar was converted from the XTAG English grammar by
Dr. Yuka Tateisi [1].
The preprocessors are designed to handle various kinds of text, and users
can add a new preprocessor. The parsers are general-purpose HPSG
parsers running on LiLFeS, including the TNT parser;
the same parsers are also used with SLUNG and the LinGO grammar.
The correspondence between the XHPSG system and the
XTAG system is described below.
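The flow from preprocessors to a parser can be pictured with a minimal sketch. It is written in Python purely for illustration (the real system is implemented in LiLFeS), and every name in it (`run_pipeline`, `strip_markup`, `lowercase`, `dummy_parser`) is hypothetical rather than part of XHPSG.

```python
from typing import Callable, List

# Hypothetical type aliases; nothing here is the actual XHPSG API.
Preprocessor = Callable[[str], str]
Parser = Callable[[List[str]], List[str]]


def strip_markup(text: str) -> str:
    """Toy preprocessor: remove a hypothetical <s>...</s> wrapper."""
    return text.replace("<s>", "").replace("</s>", "").strip()


def lowercase(text: str) -> str:
    """Toy preprocessor: normalize case."""
    return text.lower()


def dummy_parser(tokens: List[str]) -> List[str]:
    """Stand-in for a real parser: simply returns the tokens it received."""
    return list(tokens)


def run_pipeline(text: str, preprocessors: List[Preprocessor], parser: Parser) -> List[str]:
    """Apply each preprocessor in order, split on whitespace, then parse."""
    for pre in preprocessors:
        text = pre(text)
    return parser(text.split())


result = run_pipeline("<s>Show me flights</s>", [strip_markup, lowercase], dummy_parser)
```

Under this assumed design, adding a new preprocessor only means appending another function to the list.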
Grammar
The XHPSG grammar is manually converted from the XTAG English grammar,
which is based on the TAG formalism [1]. A tree in the XTAG grammar
is converted into a lexical entry in XHPSG, and ten schemata are
defined for emulating two operations in TAG, namely substitution and
adjunction.
TAG describes a grammar with a set of elementary trees, which are
categorized into initial trees and auxiliary trees. In the XTAG
grammar, words are assigned to tree families and individual trees, so that
each word is associated with a set of elementary trees. Trees are then combined by
substitution and adjunction to construct a derived tree and its derivation tree.
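The two TAG operations can be sketched as follows. This is a simplified illustration in Python, not XTAG code: the `Node` class and the helper names are assumptions, and feature structures and adjunction constraints are ignored.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    subst: bool = False  # substitution site (a leaf waiting for an initial tree)
    foot: bool = False   # foot node of an auxiliary tree


def find(node: Node, pred: Callable[[Node], bool]) -> Optional[Node]:
    """Return the first node in preorder that satisfies pred, or None."""
    if pred(node):
        return node
    for child in node.children:
        hit = find(child, pred)
        if hit is not None:
            return hit
    return None


def substitute(tree: Node, initial: Node) -> None:
    """Fill the first substitution site whose label matches the root of `initial`."""
    site = find(tree, lambda n: n.subst and n.label == initial.label)
    if site is None:
        raise ValueError("no matching substitution site")
    site.children, site.subst = initial.children, False


def adjoin(tree: Node, auxiliary: Node) -> None:
    """Adjoin `auxiliary` at the first internal node that matches its root label."""
    target = find(tree, lambda n: bool(n.children) and n.label == auxiliary.label)
    foot = find(auxiliary, lambda n: n.foot)
    if target is None or foot is None:
        raise ValueError("no adjunction site or no foot node")
    # The original subtree moves under the foot node; the auxiliary tree takes its place.
    foot.children, foot.foot = target.children, False
    target.children = auxiliary.children


def leaves(node: Node) -> List[str]:
    """Return the leaf labels from left to right."""
    if not node.children:
        return [node.label]
    return [w for c in node.children for w in leaves(c)]


# Initial tree anchored by "likes", with two NP substitution sites.
likes = Node("S", [Node("NP", subst=True),
                   Node("VP", [Node("V", [Node("likes")]), Node("NP", subst=True)])])
# Auxiliary tree for "really": root VP, foot VP.
really = Node("VP", [Node("Adv", [Node("really")]), Node("VP", foot=True)])

substitute(likes, Node("NP", [Node("John")]))
substitute(likes, Node("NP", [Node("Mary")]))
adjoin(likes, really)
sentence = leaves(likes)
```

Here substitution fills the subject and object sites in order, and adjunction splices the "really" tree in at the VP node.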
HPSG, on the other hand, describes a grammar with a set of lexical
entries and schemata. A derived tree is constructed by recursively
applying schemata to lexical and phrasal feature structures.
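Recursive schema application can be illustrated with a toy example. The sketch below is an assumption-laden simplification in Python: signs are reduced to a category and a subcategorization list, and the two functions `head_complement` and `subject_head` are generic textbook-style schemata, not the schemata defined in XHPSG.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Sign:
    phon: Tuple[str, ...]    # words covered by this sign
    cat: str                 # category, e.g. "V" or "NP"
    subcat: Tuple[str, ...]  # categories still required (subject first)


def head_complement(head: Sign, comp: Sign) -> Optional[Sign]:
    """Toy schema: the head combines with a saturated complement on its right."""
    if len(head.subcat) < 2 or comp.subcat or head.subcat[-1] != comp.cat:
        return None
    return Sign(head.phon + comp.phon, head.cat, head.subcat[:-1])


def subject_head(subj: Sign, head: Sign) -> Optional[Sign]:
    """Toy schema: a head that needs only its subject combines with it on the left."""
    if len(head.subcat) != 1 or subj.subcat or head.subcat[0] != subj.cat:
        return None
    return Sign(subj.phon + head.phon, head.cat, ())


john = Sign(("John",), "NP", ())
mary = Sign(("Mary",), "NP", ())
likes = Sign(("likes",), "V", ("NP", "NP"))

vp = head_complement(likes, mary)  # lexical signs combine into a phrasal sign
s = subject_head(john, vp)         # the phrasal sign is the input to the next schema
```

The output of one schema is itself a sign, which is what allows schemata to be applied recursively until the subcategorization list is empty.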
The XHPSG grammar is obtained by converting the initial trees and
auxiliary trees assigned to tree families and trees. An initial tree
is converted by placing its leaf nodes into the
subcategorization frame of an HPSG lexical entry. An auxiliary tree
is converted into a lexical entry that modifies the node it adjoins to. The translation scheme
is illustrated in the following figures.
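The conversion of an initial tree can be sketched roughly as follows. This Python fragment only illustrates the idea of collecting leaf nodes into a subcategorization frame; the `Node` and `LexicalEntry` classes and the `convert_initial_tree` function are hypothetical, and the actual translation described in [1] handles far more detail.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    subst: bool = False   # substitution leaf of the initial tree
    anchor: bool = False  # leaf where the word itself attaches


@dataclass
class LexicalEntry:
    word: str
    head: str          # category of the anchor node
    subcat: List[str]  # categories collected from the substitution leaves


def subst_leaves(node: Node) -> List[str]:
    """Collect the labels of substitution leaves from left to right."""
    if node.subst:
        return [node.label]
    return [lab for c in node.children for lab in subst_leaves(c)]


def find_anchor(node: Node) -> Optional[Node]:
    """Return the anchor node of the tree, if any."""
    if node.anchor:
        return node
    for child in node.children:
        hit = find_anchor(child)
        if hit is not None:
            return hit
    return None


def convert_initial_tree(word: str, tree: Node) -> LexicalEntry:
    """Turn an initial tree into a lexical entry with a subcategorization frame."""
    anchor = find_anchor(tree)
    if anchor is None:
        raise ValueError("initial tree has no anchor")
    return LexicalEntry(word, anchor.label, subst_leaves(tree))


# Transitive-verb tree: S(NP-subst, VP(V-anchor, NP-subst))
transitive = Node("S", [Node("NP", subst=True),
                        Node("VP", [Node("V", anchor=True), Node("NP", subst=True)])])
entry = convert_initial_tree("likes", transitive)
```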
Preprocessors
This will be updated soon... (by yusuke)
Parsers
Parsers on LiLFeS can be applied to any grammar that (i) is
described with lexical entries and grammar rules, and (ii) represents both of
them as typed feature structures. The LiLFeS
project defines an interface between a parser and a grammar, and
all parsers and grammars are designed to support it.
Consequently, any parser can be combined with any grammar.
The XHPSG system also follows this interface, and two parsers can currently be
used with it. One of them is the TNT parser, an efficient HPSG parser
with CFG filtering.
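The separation between parser and grammar can be sketched as below. This is not the LiLFeS interface itself: the method names `lexical_entries` and `apply_rules` are assumptions, plain strings stand in for typed feature structures, and the parser is a simple CKY-style chart parser rather than the TNT parser.

```python
from typing import Dict, List, Protocol, Set, Tuple


class Grammar(Protocol):
    """Hypothetical parser-grammar interface: lexical lookup plus binary rules."""

    def lexical_entries(self, word: str) -> List[str]: ...

    def apply_rules(self, left: str, right: str) -> List[str]: ...


class ToyGrammar:
    """A tiny grammar that satisfies the interface."""

    LEXICON = {"John": ["NP"], "Mary": ["NP"], "likes": ["V"]}
    RULES = {("V", "NP"): ["VP"], ("NP", "VP"): ["S"]}

    def lexical_entries(self, word: str) -> List[str]:
        return self.LEXICON.get(word, [])

    def apply_rules(self, left: str, right: str) -> List[str]:
        return self.RULES.get((left, right), [])


def cky_parse(grammar: Grammar, words: List[str]) -> Dict[Tuple[int, int], Set[str]]:
    """CKY-style parser that touches the grammar only through the interface."""
    n = len(words)
    chart: Dict[Tuple[int, int], Set[str]] = {}
    for i, word in enumerate(words):
        chart[(i, i + 1)] = set(grammar.lexical_entries(word))
    for span in range(2, n + 1):
        for start in range(0, n - span + 1):
            end = start + span
            cell: Set[str] = set()
            for mid in range(start + 1, end):
                for left in chart[(start, mid)]:
                    for right in chart[(mid, end)]:
                        cell.update(grammar.apply_rules(left, right))
            chart[(start, end)] = cell
    return chart


chart = cky_parse(ToyGrammar(), ["John", "likes", "Mary"])
```

Because `cky_parse` never looks inside the grammar object, any other object offering the same two methods could be passed in unchanged, which is the property the paragraph above describes.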
Specifications
The following table shows the specifications of the XHPSG system, which
give an overview of its current status. Note that
some of them are inherited from the XTAG grammar, on which the XHPSG
grammar depends.
| # of lexicons | 317324 |
| # of trees (and tree families) | 428 |
| # of schemata | 8 |
Performance
The XHPSG system is implemented in the LiLFeS
language and runs on Pentium, Sun, and Alpha machines. The
following results were measured on a 500 MHz Pentium III CPU with 4
gigabytes of memory. The test corpora are ATIS and the Wall
Street Journal from the Penn
Treebank. The parsers are a naive implementation of a CKY-style
parser (Naive) and the TNT parser
(TNT).
| Corpus | # of sentences | Avg. length (words) | Coverage (%) | Mean # of edges | Parsing time, Naive (msec.) | Parsing time, TNT (msec.) |
| ATIS | 500 | 7.42 | 61.8 | 372.2 | 14270 | 357 |
| Wall Street Journal | --- | --- | --- | --- | --- | --- |
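The ATIS figures can be turned into a relative speedup and a count of covered sentences with a few lines of arithmetic. The sketch below assumes that both parsing times were measured on the same sentences under the same conditions; the variable names are illustrative.

```python
# Figures taken from the ATIS row of the table.
num_sentences = 500
coverage_percent = 61.8
naive_msec = 14270
tnt_msec = 357

# How many times faster the TNT parser is than the naive CKY-style parser.
speedup = naive_msec / tnt_msec

# Number of sentences that received a parse, rounded to the nearest integer.
covered = round(num_sentences * coverage_percent / 100)

print(f"speedup: {speedup:.1f}x, covered sentences: {covered}")
```

On these numbers the TNT parser is roughly 40 times faster than the naive parser, and about 309 of the 500 sentences are covered.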
Screen Shots
There are some screen shots of using the XHPSG system through the will interface.
Manuals
Unfortunately, only a Japanese manual is available, and it is
somewhat out of date :)
References
[1] Yuka Tateisi, Kentaro Torisawa, Yusuke Miyao and Jun-ichi Tsujii.
Translating the XTAG English grammar to HPSG.
In Proceedings of the TAG+4 Workshop, pp. 172-175. 1998.
Tsujii Laboratory
Department of Information Science
University of Tokyo
This page has only just been started and is still under construction. All
questions and suggestions are welcome. Please send mail to:
MIYAO Yusuke (yusuke@is.s.u-tokyo.ac.jp)