Research and Development Work on UIMA tools
One of the core challenges facing text mining and natural language processing (NLP) researchers and tool developers is the general lack of interoperability between different tools and resources. At NaCTeM we have addressed this by adapting our tools to run within the UIMA (Unstructured Information Management Architecture) framework, enabling them to interact directly with tools provided by other groups around the world.
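In UIMA itself, components are analysis engines (typically written in Java) that share a Common Analysis Structure (CAS) whose annotation types are declared in a type system. The Python below does not use the UIMA API. It is a minimal, standard-library-only sketch of why a shared type system enables interoperability: each component writes standoff annotations of agreed types over one shared document, so a downstream component can consume another group's output without format conversion. The classes `Annotation` and `Document`, the type names `"Sentence"` and `"Token"`, and both toy annotators are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Annotation:
    """A standoff annotation typed against a shared (hypothetical) type system."""
    type_name: str                  # e.g. "Sentence" or "Token"
    begin: int                      # start character offset (inclusive)
    end: int                        # end character offset (exclusive)
    features: Dict[str, str] = field(default_factory=dict)


@dataclass
class Document:
    """Shared structure that every component reads from and writes to."""
    text: str
    annotations: List[Annotation] = field(default_factory=list)

    def add(self, ann: Annotation) -> None:
        self.annotations.append(ann)

    def select(self, type_name: str) -> List[Annotation]:
        # Annotations of one type, in document order
        found = [a for a in self.annotations if a.type_name == type_name]
        return sorted(found, key=lambda a: (a.begin, a.end))

    def covered_text(self, ann: Annotation) -> str:
        return self.text[ann.begin:ann.end]


Annotator = Callable[[Document], None]


def sentence_annotator(doc: Document) -> None:
    # Naive rule: a sentence runs from a non-space character up to the next
    # ., ! or ?, including any run of such characters that follows
    for m in re.finditer(r"\S[^.!?]*[.!?]*", doc.text):
        doc.add(Annotation("Sentence", m.start(), m.end()))


def token_annotator(doc: Document) -> None:
    # Consumes "Sentence" annotations produced by a different component
    for sent in doc.select("Sentence"):
        for m in re.finditer(r"\w+|[^\w\s]", doc.covered_text(sent)):
            doc.add(Annotation("Token", sent.begin + m.start(), sent.begin + m.end()))


def run_pipeline(doc: Document, annotators: List[Annotator]) -> Document:
    # Run components in order; each sees everything earlier ones added
    for annotate in annotators:
        annotate(doc)
    return doc


doc = run_pipeline(Document("Genes regulate proteins. They bind DNA."),
                   [sentence_annotator, token_annotator])
print([doc.covered_text(s) for s in doc.select("Sentence")])
```

The token annotator never inspects how sentences were found; it relies only on the agreed `Sentence` type, which is the property that lets tools from different groups be swapped in and out.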
We continually incorporate new NLP and text mining tools developed at our centre into our UIMA package, and co-operate with partner institutes to support the development of large-scale text mining systems. A number of these tools are described below; this list will be updated regularly. For further information on our work in this area, or on any of the individual tools mentioned, you can contact our team at firstname.lastname@example.org.
Main NaCTeM Local UIMA Tools
The following tools form the core of the current NaCTeM UIMA package and interoperate via the NaCTeM UIMA type system. (These UIMA tools are not yet publicly available.)
| Tool | Developer | Description |
|---|---|---|
| Sentence Annotator | BOOTStrep Project, NaCTeM, UoM | A high-performance heuristic rule-based tool that annotates sentences. |
| Termine | NaCTeM, UoM | A domain-independent term recognition tool using the C-value metric, which combines linguistic and statistical analysis. |
| Genia Tagger and Chunker | Tsujii Lab, UoT | Annotates tokens with part-of-speech (POS) information and chunks with phrase types. |
| Gene/Protein Name Recognition Tool | NaCTeM, UoM | Recognises and annotates gene and protein names in the input text. |
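The C-value score mentioned for Termine is commonly defined, for a candidate term a of |a| words with corpus frequency f(a), as log2|a| · f(a) when a is not nested inside any longer candidate, and otherwise as log2|a| · (f(a) − (1/P(T_a)) Σ_{b∈T_a} f(b)), where T_a is the set of longer candidates containing a and P(T_a) is their number. The Python below is a minimal sketch of that scoring step only, not Termine's implementation: it assumes the linguistic filtering stage has already produced multi-word candidate terms with frequencies, and the names `c_value` and `is_nested_in`, the tuple-of-words input format and the sample candidates are hypothetical.

```python
import math
from typing import Dict, Tuple

Term = Tuple[str, ...]  # a candidate term as a sequence of words


def is_nested_in(a: Term, b: Term) -> bool:
    """True if a appears as a contiguous word sequence inside the longer term b."""
    n = len(a)
    return n < len(b) and any(b[i:i + n] == a for i in range(len(b) - n + 1))


def c_value(freqs: Dict[Term, int]) -> Dict[Term, float]:
    """Score each candidate term with the basic C-value formula."""
    scores: Dict[Term, float] = {}
    for a, f_a in freqs.items():
        # T_a: the longer candidate terms that contain a
        containers = [b for b in freqs if is_nested_in(a, b)]
        length_weight = math.log2(len(a))
        if not containers:
            scores[a] = length_weight * f_a
        else:
            # Discount a by the mean frequency of the longer terms containing it
            mean_nested = sum(freqs[b] for b in containers) / len(containers)
            scores[a] = length_weight * (f_a - mean_nested)
    return scores


candidates = {
    ("adenoid", "cystic", "basal", "cell", "carcinoma"): 2,
    ("basal", "cell", "carcinoma"): 5,
    ("cell", "carcinoma"): 8,  # log2(2) * (8 - (5 + 2) / 2) = 4.5
}
for term, score in sorted(c_value(candidates).items(), key=lambda kv: -kv[1]):
    print(" ".join(term), round(score, 2))
```

Because log2(1) is 0, this basic form gives every single-word candidate a score of zero, which is why it is usually applied to multi-word terms. In the example, "cell carcinoma" is the most frequent string but scores lowest, because much of its frequency comes from the longer terms that contain it.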
Co-operation on UIMA Tools Development
We link our UIMA components with partners' tools to support comparison and the development of large-scale systems.
Our components are integrated into U-Compare, a text mining/natural language processing system based on the UIMA framework. U-Compare provides access to a large collection of ready-to-use, interoperable NLP components and is currently the world's largest UIMA component repository. It allows users to build complex NLP workflows through an easy drag-and-drop interface, and makes it simple to visualise and compare the outputs of these workflows.
U-Compare is the result of a collaboration between the Tsujii Laboratory at the University of Tokyo, the Center for Computational Pharmacology at the University of Colorado, and the National Centre for Text Mining. For further details, please see the U-Compare website.
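How U-Compare computes its comparisons is not detailed here. As a hedged illustration of what comparing the outputs of two workflows can involve, the sketch below treats each tool's output as a set of (begin, end, type) spans and measures exact-match precision, recall and F1, taking one tool as the reference. The span format, the function name `compare_spans` and the sample spans are hypothetical.

```python
from typing import Dict, Iterable, Tuple

Span = Tuple[int, int, str]  # (begin offset, end offset, annotation type)


def compare_spans(reference: Iterable[Span], candidate: Iterable[Span]) -> Dict[str, float]:
    """Exact-match agreement of candidate spans against reference spans."""
    ref, cand = set(reference), set(candidate)
    matched = len(ref & cand)  # spans with identical offsets and type
    precision = matched / len(cand) if cand else 0.0
    recall = matched / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"matched": matched, "precision": precision, "recall": recall, "f1": f1}


# Outputs of two hypothetical gene/protein taggers run on the same text
tool_a = [(0, 3, "Protein"), (10, 14, "Protein"), (20, 25, "DNA")]
tool_b = [(0, 3, "Protein"), (10, 14, "Protein"), (30, 36, "Protein")]
print(compare_spans(tool_a, tool_b))
```

Exact matching is the strictest choice; a comparison that also credited overlapping spans would be more lenient towards tools that disagree only on boundaries.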