Version 1.0
Tagging speed is crucial in large-scale information extraction and real-time NLP applications. This part-of-speech (POS) tagger offers fast tagging (2400 tokens/sec) with state-of-the-art accuracy (97.10% on the WSJ corpus). The tagger uses an extension of Maximum Entropy Markov Models (MEMMs), in which tags are determined in an easiest-first manner. For details of the algorithm and performance, see [1].
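To give a rough feel for what "easiest-first" means, here is a toy Python sketch. It is not the tagger's code: the function names, the `toy_scorer` model, and its probabilities are all made up for illustration, and the real model and features are those described in [1]. The sketch only shows the general idea of repeatedly committing the single most confident tagging decision and letting the remaining positions condition on it.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical local model (an assumption, not the tagger's API): given the
# tokens, a position, and any already-fixed neighbouring tags, return a
# probability for each candidate tag.
Scorer = Callable[[List[str], int, Optional[str], Optional[str]], Dict[str, float]]


def easiest_first_tag(tokens: List[str], scorer: Scorer) -> List[Optional[str]]:
    """Tag tokens by always fixing the most confident open position next."""
    tags: List[Optional[str]] = [None] * len(tokens)
    while any(t is None for t in tags):
        best = None  # (probability, position, tag) of the easiest decision
        for i, current in enumerate(tags):
            if current is not None:
                continue  # already decided
            left = tags[i - 1] if i > 0 else None
            right = tags[i + 1] if i + 1 < len(tokens) else None
            dist = scorer(tokens, i, left, right)
            tag, prob = max(dist.items(), key=lambda kv: kv[1])
            if best is None or prob > best[0]:
                best = (prob, i, tag)
        _, pos, tag = best
        tags[pos] = tag  # commit; later decisions can now see this tag
    return tags


def toy_scorer(tokens, i, left, right):
    """Made-up probabilities, purely for illustration."""
    if tokens[i].lower() == "the":
        return {"DT": 0.99, "NN": 0.01}
    if left == "DT":
        return {"NN": 0.9, "VB": 0.1}  # a determiner to the left favours a noun
    return {"VB": 0.6, "NN": 0.4}      # ambiguous without context


print(easiest_first_tag(["the", "watch"], toy_scorer))
```

In this toy run, "the" is fixed first because it is the most confident decision, and "watch" is then tagged with the determiner already visible on its left.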
Note: This page is no longer maintained. A more accurate and trainable version of the tagger is available.
The tagger has been tested only on Linux with gcc.
> tar xvzf postagger.tar.gz
> cd postagger/
> make
> ./tagger < TEXTFILE > TAGGEDTEXT
> echo "He opened the window." | ./tagger
He/PRP opened/VBD the/DT window/NN ./.
>
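The tagged output can be post-processed with a few lines of script. The following Python sketch assumes the output format is exactly what the example above shows: whitespace-separated items, each of the form token/TAG, with the tag after the last slash. The function name `parse_tagged_line` is introduced here for illustration and is not part of the tagger.

```python
from typing import List, Tuple


def parse_tagged_line(line: str) -> List[Tuple[str, str]]:
    """Split one line of tagger output into (token, tag) pairs.

    Assumes every item looks like token/TAG. Splitting on the LAST slash
    keeps tokens that themselves contain a slash intact.
    """
    pairs = []
    for item in line.split():
        token, tag = item.rsplit("/", 1)
        pairs.append((token, tag))
    return pairs


pairs = parse_tagged_line("He/PRP opened/VBD the/DT window/NN ./.")
print(pairs)
# [('He', 'PRP'), ('opened', 'VBD'), ('the', 'DT'), ('window', 'NN'), ('.', '.')]
```

An item without any slash would raise a ValueError in this sketch; add handling for that case if the input may contain such items.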