Research and Development Work on Text Mining Infrastructures

One of the core challenges facing text mining and natural language processing (NLP) researchers and tool developers is the general lack of interoperability between different tools and resources. At NaCTeM we have attempted to solve this by adapting our tools to function within the UIMA framework, and by developing text mining infrastructures that build upon this framework.

We frequently incorporate new NLP and text mining tools developed at NaCTeM into our UIMA inventory, and co-operate with partner institutes to support the development of large-scale text mining systems.

Our work on UIMA has most recently focussed on the development of two infrastructures that allow the construction and evaluation of workflows composed of UIMA components:

  • U-Compare - an integrated text mining/natural language processing system implemented as a Java application. The graphical user interface provides drag-and-drop facilities for rapidly creating workflows. Evaluation facilities are also built in. New UIMA components can easily be imported for use in the system, and complete workflows can be exported for use by other U-Compare users.
  • Argo - a web-based collaborative environment for the development of text-processing workflows. Complex processing workflows can be created, which can include multiple branching and merging points. Features include sharing of workflows and documents between users, and user-interactive components, such as an annotation editor, which pause a workflow and wait for input from the user.

Our UIMA work is widely recognised, as evidenced by the IBM UIMA Innovation Awards received by Prof. Sophia Ananiadou, director of NaCTeM, in 2006, 2007 and 2008.

Recent Publications on UIMA Work