Frequently Asked Questions

Which tagger should I use?

A part-of-speech tagger consists of an algorithm (e.g., the Viterbi algorithm) running on top of a model (e.g., a Hidden Markov Model), which is typically built from an annotated corpus (e.g., the Penn Treebank) by a training algorithm. These components therefore determine which kinds of text a tagger suits and how well it performs on them. Currently, TerMine supports GENIA tagger (http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/postagger/) and TreeTagger (http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/). Because it was trained on the GENIA, PennBioIE, and Wall Street Journal corpora, GENIA tagger is suitable for processing biomedical text. TreeTagger is suitable for general English text such as newspaper articles.
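
To illustrate the split between algorithm and model, here is a minimal sketch of Viterbi decoding over a toy Hidden Markov Model. The tag set, the probabilities, and the `viterbi` function are hypothetical: the probabilities are invented for illustration rather than estimated from a corpus, and the code does not reflect the internals of GENIA tagger or TreeTagger.

```python
import math

# Toy HMM (hypothetical values; a real tagger estimates these from a corpus).
TAGS = ["DT", "NN", "VBZ"]
START = {"DT": 0.6, "NN": 0.3, "VBZ": 0.1}
TRANS = {
    "DT": {"DT": 0.05, "NN": 0.9, "VBZ": 0.05},
    "NN": {"DT": 0.1, "NN": 0.3, "VBZ": 0.6},
    "VBZ": {"DT": 0.5, "NN": 0.4, "VBZ": 0.1},
}
EMIT = {
    "DT": {"the": 0.7, "a": 0.3},
    "NN": {"protein": 0.4, "gene": 0.4, "binds": 0.05},
    "VBZ": {"binds": 0.8, "encodes": 0.2},
}
UNSEEN = 1e-6  # floor probability for a word a tag never emitted


def emit(tag, word):
    return EMIT[tag].get(word, UNSEEN)


def viterbi(words):
    """Return the most probable tag sequence for `words` under the toy HMM."""
    # best[i][t] = (log-prob of best path ending in tag t at word i, previous tag)
    best = [{t: (math.log(START[t]) + math.log(emit(t, words[0])), None) for t in TAGS}]
    for word in words[1:]:
        prev = best[-1]
        column = {}
        for t in TAGS:
            # Pick the predecessor tag that gives the highest-scoring path into t.
            p, score = max(
                ((s, prev[s][0] + math.log(TRANS[s][t])) for s in TAGS),
                key=lambda pair: pair[1],
            )
            column[t] = (score + math.log(emit(t, word)), p)
        best.append(column)
    # Backtrack from the best final tag using the stored backpointers.
    path = [max(TAGS, key=lambda t: best[-1][t][0])]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return list(reversed(path))


print(viterbi(["the", "protein", "binds", "the", "gene"]))
# → ['DT', 'NN', 'VBZ', 'DT', 'NN']
```

Swapping in a model trained on a different corpus (different `START`, `TRANS`, `EMIT` tables) changes the output without touching the decoding algorithm, which is why the training corpus matters so much when choosing a tagger.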
