Research and Development Work on UIMA Tools

One of the core challenges facing text mining and natural language processing (NLP) researchers and tool developers is the general lack of interoperability between different tools and resources. At NaCTeM we address this by adapting our tools to run within the UIMA (Unstructured Information Management Architecture) framework, which lets them interact directly with tools provided by other groups around the world.

Our UIMA work is widely recognised: Dr. Sophia Ananiadou, the director of NaCTeM, received IBM UIMA Innovation Awards in two successive years, 2006 and 2007.

We continually incorporate new NLP and text mining tools developed at our centre into our UIMA package, and co-operate with partner institutes to support the development of large-scale text mining systems. A number of these tools are described below, and the list will be updated regularly. For further information on our work in this area or any of the individual tools mentioned you can contact our team at

Main NaCTeM Local UIMA Tools

The following tools form the core of the current NaCTeM UIMA package and interoperate via the NaCTeM UIMA type system. (The UIMA tools are not yet publicly available.)

Name | Developer | Description
Sentence Annotator | BOOTStrep Project, NaCTeM, UoM | A high-performance, heuristic, rule-based tool.
Termine | NaCTeM, UoM | A domain-independent term recognition tool using the C-value metric, which is based on both linguistic and statistical analyses.
Genia Tagger and Chunker | Tsujii Lab, UoT | This tool annotates Tokens with POS information and Chunks with phrase types.
Gene/Protein Name Recognition Tool | NaCTeM, UoM | This tool recognises and annotates gene and protein names in the input text.
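
The C-value metric used by Termine can be illustrated with a short sketch. The code below is not Termine's implementation, which is not described here. It is a simplified illustration of the published C-value formula, under the assumption that candidate terms have already been extracted by a linguistic filter and counted. The function names `contains` and `c_values` and the example frequencies are hypothetical. For a candidate term a with frequency f(a) and length |a| in words, the score is log2|a| · f(a) when a is not nested in any longer candidate, and log2|a| · (f(a) − mean frequency of the longer candidates containing a) otherwise.

```python
import math


def contains(longer, shorter):
    """Return True if `shorter` occurs as a contiguous word sequence in `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def c_values(term_freqs):
    """Compute a simplified C-value for each candidate term.

    term_freqs maps a candidate term (a tuple of words) to its corpus frequency.
    Assumption: candidates were already selected by a linguistic filter,
    which this sketch does not implement.
    """
    scores = {}
    for term, freq in term_freqs.items():
        # Longer candidate terms in which this term appears nested.
        containers = [other for other in term_freqs
                      if len(other) > len(term) and contains(other, term)]
        weight = math.log2(len(term))  # 0 for single-word candidates
        if not containers:
            scores[term] = weight * freq
        else:
            nested = sum(term_freqs[other] for other in containers)
            scores[term] = weight * (freq - nested / len(containers))
    return scores


# Hypothetical frequencies, for illustration only.
freqs = {
    ("soft", "contact", "lens"): 4,
    ("contact", "lens"): 10,
}
scores = c_values(freqs)
for term, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(" ".join(term), round(score, 3))
```

Here "contact lens" is penalised for the occurrences in which it appears only as part of "soft contact lens", so a nested string that rarely occurs on its own receives a low score. Because log2(1) = 0, this formulation is meaningful only for multi-word candidates.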

Co-operation on UIMA Tools Development

We link our UIMA components with partners' tools to support the comparison and development of large-scale systems.


Our components are integrated into U-Compare, an integrated text mining/natural language processing system based on the UIMA framework. U-Compare provides access to a large collection of ready-to-use, interoperable NLP components, currently the world's largest UIMA component repository. It allows users to build complex NLP workflows via a drag-and-drop interface and makes it simple to visualize and compare the outputs of those workflows.

U-Compare is the result of a collaboration between the Tsujii Laboratory at the University of Tokyo, the Center for Computational Pharmacology at the University of Colorado, and the National Centre for Text Mining. For further details, please see the U-Compare website.

Related Publications

  • Kano, Yoshinobu, Ngan Nguyen, Rune Sætre, Keiichiro Fukamachi, Kazuhiro Yoshida, Yusuke Miyao, Yoshimasa Tsuruoka, Sophia Ananiadou and Jun'ichi Tsujii. Sharable type system design for tool inter-operability and combinatorial comparison. In Proceedings of the First International Conference on Global Interoperability for Language Resources (ICGL). Hong Kong, January 2008.
  • Kano, Yoshinobu, Ngan Nguyen, Rune Sætre, Kazuhiro Yoshida, Keiichiro Fukamachi, Yusuke Miyao, Yoshimasa Tsuruoka, Sophia Ananiadou and Jun'ichi Tsujii (2008) Towards Data And Goal Oriented Analysis: Tool Inter-Operability And Combinatorial Comparison. In Proceedings of the 3rd International Joint Conference on Natural Language Processing. Hyderabad, India, January 2008.
  • Kano, Y., Nguyen, N., Sætre, R., Yoshida, K., Miyao, Y., Tsuruoka, Y., Matsubayashi, Y., Ananiadou, S. and Tsujii, J. (2008) Filling the gaps between tools and users: a tool comparator, using protein-protein interaction as an example, in PSB 2008, Hawaii.
  • Hahn, Udo, Ekaterina Buyko, Katrin Tomanek, Scott Piao, John McNaught, Yoshimasa Tsuruoka and Sophia Ananiadou (2007). An Annotation Type System for a Data-Driven NLP Pipeline. Accepted. The Linguistic Annotation Workshop (LAW), ACL, Prague, Czech Republic.
  • Piao, Scott, Sophia Ananiadou and John McNaught (2007). Integrating Annotation Tools into UIMA for Interoperability. In Proceedings of the UK e-Science AHM Conference 2007, Nottingham, UK, pp. 575-582.
  • Piao, Scott, Ekaterina Buyko, Yoshimasa Tsuruoka, Katrin Tomanek, Jin-Dong Kim, John McNaught, Udo Hahn, Jian Su and Sophia Ananiadou (2007). BOOTStrep Annotation Scheme – Encoding Information for Text Mining. Corpus Linguistics Conference, Birmingham.