Research and Development Work on UIMA tools
One of the core challenges facing text mining and natural language processing (NLP) researchers and tool developers is the general lack of interoperability between different tools and resources. At NaCTeM, we have sought to address this by adapting our tools to run within the UIMA (Unstructured Information Management Architecture) framework, which allows them to interact directly with tools provided by other groups around the world.
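A small, self-contained Python sketch can illustrate the idea behind a shared type system. It does not use the UIMA API: UIMA components are typically written in Java against a CAS, and UIMA's built-in `uima.tcas.Annotation` type records begin and end character offsets. The `Document` class, the type names `Sentence` and `Token`, and the toy splitting rules below are all hypothetical. The point is that independently written components agree only on type names and offsets, so a downstream component can consume their output without knowing anything else about them.

```python
import re
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Annotation:
    """A typed span over the shared document text (character offsets)."""
    type_name: str
    begin: int
    end: int

@dataclass
class Document:
    """Hypothetical stand-in for a UIMA CAS: one text plus a shared annotation index."""
    text: str
    annotations: list = field(default_factory=list)

    def add(self, type_name, begin, end):
        self.annotations.append(Annotation(type_name, begin, end))

    def select(self, type_name):
        # All annotations of one type, in document order.
        return sorted((a for a in self.annotations if a.type_name == type_name),
                      key=lambda a: (a.begin, a.end))

def sentence_annotator(doc):
    # Toy rule: a sentence runs up to '.', '!' or '?' followed by whitespace or end of text.
    for m in re.finditer(r"\S.*?(?:[.!?](?=\s|$)|$)", doc.text, re.S):
        doc.add("Sentence", m.start(), m.end())

def token_annotator(doc):
    # Toy rule: a token is a run of word characters or a single punctuation mark.
    for m in re.finditer(r"\w+|[^\w\s]", doc.text):
        doc.add("Token", m.start(), m.end())

def tokens_per_sentence(doc):
    # A downstream component that relies only on the shared type names and offsets.
    tokens = doc.select("Token")
    return [sum(1 for t in tokens if s.begin <= t.begin and t.end <= s.end)
            for s in doc.select("Sentence")]

doc = Document("Genes are expressed. Proteins bind DNA.")
sentence_annotator(doc)
token_annotator(doc)
print(tokens_per_sentence(doc))  # [4, 4]
```

Either annotator could be replaced by a different implementation (for example, a statistical sentence splitter) without changing `tokens_per_sentence`, as long as it writes the same types.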
We continually incorporate new NLP and text mining tools developed at our centre into our UIMA package, and we co-operate with partner institutes to support the development of large-scale text mining systems. A number of these tools are described below; this list will be updated regularly. For further information on our work in this area or on any of the individual tools mentioned, please contact our team at email@example.com.
Main NaCTeM Local UIMA Tools
The following tools form the core of the current NaCTeM UIMA package and interoperate via the NaCTeM UIMA type system. (These UIMA tools are not yet publicly available.)
|Tool||Developer||Description|
|Sentence Annotator||BOOTStrep Project, NaCTeM, UoM||A high-performance, heuristic rule-based tool for detecting sentence boundaries.|
|Termine||NaCTeM, UoM||A domain-independent term recognition tool that uses the C-value metric, which combines linguistic and statistical analysis.|
|Genia Tagger and Chunker||Tsujii Lab, UoT||Annotates tokens with part-of-speech (POS) information and chunks with phrase types.|
|Gene/Protein Name Recognition Tool||NaCTeM, UoM||Recognises and annotates gene and protein names in the input text.|
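The C-value scoring step used by Termine can be sketched in a few lines. Following the published C-value measure (Frantzi, Ananiadou and Mima), a candidate term a with length |a| in words and corpus frequency f(a) scores log2|a| · f(a) when no longer candidate contains it, and log2|a| · (f(a) − (1/|T_a|) · Σ_{b ∈ T_a} f(b)) otherwise, where T_a is the set of longer candidates in which a is nested. This sketch assumes the candidate list and frequencies are already available: Termine's linguistic filtering of candidates is not reproduced, the function names are hypothetical, and the frequencies in the example are made up.

```python
import math

def is_nested(shorter, longer):
    # True if `shorter` occurs as a contiguous word sequence inside a longer candidate.
    n = len(shorter)
    return len(longer) > n and any(longer[i:i + n] == shorter
                                   for i in range(len(longer) - n + 1))

def c_value(freq):
    """freq maps each candidate term (a tuple of words) to its corpus frequency."""
    scores = {}
    for a, f_a in freq.items():
        # T_a: the longer candidates in which `a` appears nested.
        t_a = [b for b in freq if is_nested(a, b)]
        if t_a:
            # Discount the average frequency of the nesting candidates.
            f_a = f_a - sum(freq[b] for b in t_a) / len(t_a)
        scores[a] = math.log2(len(a)) * f_a
    return scores

# Made-up frequencies for three candidates, two of them nested in longer ones.
freq = {
    ("adenoid", "cystic", "basal", "cell", "carcinoma"): 2,
    ("basal", "cell", "carcinoma"): 5,
    ("cell", "carcinoma"): 8,
}
for term, score in sorted(c_value(freq).items(), key=lambda kv: -kv[1]):
    print(" ".join(term), round(score, 3))
```

Here "cell carcinoma" is the most frequent candidate, but because it mostly occurs inside longer candidates its score is discounted (log2(2) · (8 − 3.5) = 4.5). Since log2(1) = 0, single-word candidates always score zero under this formula, which reflects the method's original focus on multi-word terms.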
Co-operation on UIMA Tools Development
We link our UIMA components with partners' tools to support the comparison and development of large-scale systems.
Our components are integrated into U-Compare, an integrated text mining and natural language processing system based on the UIMA framework. U-Compare provides access to a large collection of ready-to-use, interoperable NLP components and is currently the world's largest UIMA component repository. It allows users to build complex NLP workflows through an easy drag-and-drop interface, and makes it simple to visualise and compare the outputs of these workflows.
U-Compare is the result of a collaboration between the Tsujii Laboratory at the University of Tokyo, the Center for Computational Pharmacology at the University of Colorado, and the National Centre for Text Mining. For further details, please see the U-Compare website.
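One simple way to compare the outputs of alternative components, in the spirit of U-Compare's comparison of workflow outputs, is exact-span agreement: treat one component's annotations as the reference and compute precision, recall and F1 for another. This is a hedged sketch, not U-Compare's own implementation; the spans, tool names and helper functions are hypothetical, and other matching criteria, such as partial overlap, are also common.

```python
from itertools import combinations

def agreement(reference, other):
    """Exact-match precision, recall and F1 of `other` measured against `reference`.

    Each argument is a collection of (begin, end, label) spans.
    """
    reference, other = set(reference), set(other)
    matched = len(reference & other)
    precision = matched / len(other) if other else 0.0
    recall = matched / len(reference) if reference else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def compare_all(outputs):
    # Score every pair of components; the first name in each pair is the reference.
    return {(a, b): agreement(outputs[a], outputs[b])
            for a, b in combinations(sorted(outputs), 2)}

# Hypothetical outputs of two gene/protein taggers over the same text.
outputs = {
    "tool_a": [(0, 5, "Gene"), (21, 29, "Protein"), (35, 38, "Gene")],
    "tool_b": [(0, 5, "Gene"), (35, 38, "Protein")],
}
print(compare_all(outputs))
```

Only the span (0, 5, "Gene") matches exactly, so tool_b scores a precision of 0.5 and a recall of 1/3 against tool_a. Scoring every pair, rather than one fixed pair, is what makes the comparison combinatorial: with n components there are n(n−1)/2 pairs.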
Publications
- Kano, Yoshinobu, Ngan Nguyen, Rune Sætre, Keiichiro Fukamachi, Kazuhiro Yoshida, Yusuke Miyao, Yoshimasa Tsuruoka, Sophia Ananiadou and Jun'ichi Tsujii (2008). Sharable type system design for tool inter-operability and combinatorial comparison. In Proceedings of the First International Conference on Global Interoperability for Language Resources (ICGL), Hong Kong, January 2008.
- Kano, Yoshinobu, Ngan Nguyen, Rune Sætre, Kazuhiro Yoshida, Keiichiro Fukamachi, Yusuke Miyao, Yoshimasa Tsuruoka, Sophia Ananiadou and Jun'ichi Tsujii (2008). Towards Data and Goal Oriented Analysis: Tool Inter-Operability and Combinatorial Comparison. In Proceedings of the 3rd International Joint Conference on Natural Language Processing (IJCNLP), Hyderabad, India, January 2008.
- Kano, Y., Nguyen, N., Sætre, R., Yoshida, K., Miyao, Y., Tsuruoka, Y., Matsubayashi, Y., Ananiadou, S. and Tsujii, J. (2008). Filling the gaps between tools and users: a tool comparator, using protein-protein interaction as an example. In Proceedings of the Pacific Symposium on Biocomputing (PSB 2008), Hawaii.
- Hahn, Udo, Ekaterina Buyko, Katrin Tomanek, Scott Piao, John McNaught, Yoshimasa Tsuruoka and Sophia Ananiadou (2007). An Annotation Type System for a Data-Driven NLP Pipeline. In Proceedings of the Linguistic Annotation Workshop (LAW) at ACL, Prague, Czech Republic.
- Piao, Scott, Sophia Ananiadou and John McNaught (2007). Integrating Annotation Tools into UIMA for Interoperability. In Proceedings of the UK e-Science AHM Conference 2007, Nottingham, UK, pp. 575-582.
- Piao, Scott, Ekaterina Buyko, Yoshimasa Tsuruoka, Katrin Tomanek, Jin-Dong Kim, John McNaught, Udo Hahn, Jian Su and Sophia Ananiadou (2007). BOOTStrep Annotation Scheme: Encoding Information for Text Mining. Corpus Linguistics Conference, Birmingham, UK.