NaCTeM

Seminar — Yoshinobu Kano

Speaker: Yoshinobu Kano, The University of Tokyo
Title: U-Compare: an integrated text mining platform based on UIMA
Date: Friday 17th July 2009 at 11:00
Location: Room MLG.001 (Lecture Theatre) in the MIB Building
Abstract:

U-Compare project URL: http://u-compare.org/
Paper: http://bioinformatics.oxfordjournals.org/cgi/reprint/btp289v1

I will introduce the U-Compare project through demonstrations and example applications. I will also discuss recent work on a project related to the BioNLP shared task and on linking U-Compare to Taverna.

U-Compare is an integrated text mining and natural language processing system based on the UIMA framework. UIMA is an open framework that provides interoperability, and U-Compare adds a higher level of interoperability on top of it. U-Compare provides the world's largest repository of type-system-compatible UIMA components, together with a platform that works with any UIMA component. The platform lets users build complex NLP workflows through a drag-and-drop interface and makes it simple to visualize and compare the outputs of these workflows. Users can create and run workflows, then compare, evaluate, and visualize the results, for most common types of text mining application, without any programming.

U-Compare is a joint project of the Tsujii Laboratory at the University of Tokyo, the Center for Computational Pharmacology at the University of Colorado, and the National Centre for Text Mining at the University of Manchester.

U-Compare platform
The U-Compare platform is implemented in pure Java and can be launched with a single click, without any explicit installation. Users do not need to learn UIMA or do any programming to create a workflow from existing components. One more click runs the workflow, after which statistics and instance-level visualizations are generated automatically.

The U-Compare Parallel Component allows users to create a parallel workflow. U-Compare then generates the possible combinations of the specified components and compares and evaluates their results against each other and against an annotated corpus, if one is available. The comparison and evaluation are performed by a UIMA component, which users can replace with a component of their own.
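
The idea of generating component combinations and scoring each one against an annotated corpus can be illustrated with a minimal Python sketch. This is not U-Compare code and does not use UIMA: the tokenizers, taggers, lexicon, and the `compare_workflows` helper are all hypothetical stand-ins, and F1 is assumed here as the evaluation metric only for illustration.

```python
import re
from itertools import product

# Toy stand-ins for workflow components. All names are hypothetical
# and are NOT part of U-Compare or UIMA.
def whitespace_tokenizer(text):
    return text.split()

def punct_tokenizer(text):
    # Split punctuation off as separate tokens.
    return re.findall(r"\w+|[^\w\s]", text)

def capitalized_tagger(tokens):
    # Tag every token that starts with an upper-case letter.
    return {t for t in tokens if t[:1].isupper()}

def dictionary_tagger(tokens):
    lexicon = {"p53", "BRCA1"}  # illustrative lexicon
    return {t for t in tokens if t in lexicon}

def f1_score(predicted, gold):
    """F1 of a predicted annotation set against a gold annotation set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def compare_workflows(stages, text, gold):
    """Run every combination of one component per stage and score it."""
    results = {}
    for combination in product(*stages):
        data = text
        for component in combination:
            data = component(data)
        name = " -> ".join(c.__name__ for c in combination)
        results[name] = f1_score(data, gold)
    return results

stages = [
    [whitespace_tokenizer, punct_tokenizer],
    [capitalized_tagger, dictionary_tagger],
]
text = "BRCA1 interacts with p53."
gold = {"BRCA1", "p53"}  # annotations from a (toy) annotated corpus
scores = compare_workflows(stages, text, gold)

for name, score in sorted(scores.items()):
    print(f"{name}: F1 = {score:.2f}")
```

Two alternatives at each of two stages give four workflows. Making `f1_score` an argument of `compare_workflows` would mirror the pluggable evaluation component described above.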

U-Compare component repository and type system
The U-Compare repository currently holds more than 50 ready-to-use components, and the number is still growing. The components were originally developed by different groups for a variety of domains (syntactic, semantic, biological, etc.) and include many well-known tools and corpora. They are fully compatible with the U-Compare type system, which covers a broad range of concepts.

U-Compare components can also be used in any UIMA workflow without the U-Compare platform. U-Compare also provides an API for embedding a UIMA workflow in a non-UIMA application via the standard input and output streams. A command-line interface is also provided for running a workflow without the GUI.
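
As a rough illustration of embedding through the standard streams, the sketch below has a host application send text to a child process on standard input and read the result from standard output. The actual U-Compare API and command line are not given here, so the child command is a stand-in (a small Python one-liner that upper-cases its input), and `run_workflow` is a hypothetical helper name.

```python
import subprocess
import sys

# Stand-in for a workflow process. The real U-Compare command is not
# specified in this abstract, so this child simply upper-cases stdin.
STAND_IN_WORKFLOW = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]

def run_workflow(command, document_text):
    """Send text to a child process on stdin and return its stdout."""
    completed = subprocess.run(
        command,
        input=document_text,
        capture_output=True,
        text=True,
        check=True,  # raise if the child exits with a non-zero status
    )
    return completed.stdout

output = run_workflow(STAND_IN_WORKFLOW, "p53 binds DNA")
print(output)
```

With this pattern the host application needs no UIMA dependency: it only has to start a process and exchange text with it.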