A fast CFG parser with chunk parsing



Developed at:
University of Tokyo, Department of Computer Science,
Tsujii laboratory

Version 1.1

Overview

This CFG parser offers reasonable performance (an f-score of 85%) at high speed (71 sentences/sec). If you need to parse a huge collection of documents, such as a Web corpus, or to build an interactive (real-time) information extraction system, this parser could be useful. For details of the parser, see [1].

If you are looking for a high-precision CFG parser, try the Charniak parser or the Collins parser.

If you are looking for a parser that gives a deeper analysis, try Enju.

How to use the parser

The parser has currently been tested only on Linux with gcc.

1. Download the latest version of the parser

2. Expand the archive

> tar xvzf chunkparser.tar.gz

3. Make

> cd chunkparser/
> make

4. Parse sentences

Prepare a POS-tagged text file containing one sentence per line (use a part-of-speech tagger to produce such a file), then run:

> ./parser < TAGGEDTEXT > PARSEDTEXT

If you use the -s option, the parser performs search, which significantly improves recall (by about 2.0 points in f-score) but makes parsing several times slower.

> ./parser -s < TAGGEDTEXT > PARSEDTEXT

Example

> echo "He/PRP opened/VBD the/DT window/NN ./." | ./parser
(TOP (S (NP (PRP He) ) (VP (VBD opened) (NP (DT the) (NN window) ) ) (. .) ) )
>
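
If you want to drive the parser from a script, the following Python sketch may help. It is not part of the distribution; it only assumes the ./parser executable and the line-based stdin/stdout interface shown above (one POS-tagged sentence per input line, one bracketed tree per output line).

# Minimal wrapper sketch around the parser's command-line interface.
import subprocess

def parse_sentences(tagged_sentences, use_search=False):
    cmd = ["./parser"]
    if use_search:
        cmd.append("-s")  # slower, but higher recall (see above)
    # Feed one POS-tagged sentence per line and read one tree per line.
    text = "\n".join(tagged_sentences) + "\n"
    result = subprocess.run(cmd, input=text, capture_output=True,
                            text=True, check=True)
    return result.stdout.splitlines()

trees = parse_sentences(["He/PRP opened/VBD the/DT window/NN ./."])
print(trees[0])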

References

[1] Yoshimasa Tsuruoka and Jun'ichi Tsujii, Chunk Parsing Revisited. In Proceedings of the 9th International Workshop on Parsing Technologies (IWPT 2005), pp. 133-140.


This page is maintained by Yoshimasa Tsuruoka