Chemistry Using Text Annotations
This project (CheTA) will integrate Cambridge's chemical text mining tool OSCAR with the U-Compare workflow infrastructure developed by NaCTeM and others. This integration will add chemistry to the world's largest public collection of interoperable text mining tools and will be highly valued by influential stakeholders in both the JISC community and the wider chemistry community. Once a baseline study (UCC and RSC) and the integration are complete, the project will use the CheTA tools to index a corpus of documents of different types and provenance. CheTA will develop a rigorous evaluation framework comprising annotation studies for a formal scientific evaluation of the system ('Are we extracting metadata correctly?', RSC/NaCTeM), user requirements studies of the metadata needs of real-world users ('What metadata is useful?', RSC/UCC), and a comparison of the extracted metadata against its usefulness (all project partners). Finally, the economic cost of metadata generation by both human indexers and robots will be quantified.
The application of professionally maintained, automated and sustainable text mining services, enabled by CheTA, to public information sources such as PubMed is expected to lead to significant future enhancements in resource discovery.
As part of CheTA, OSCAR has been refactored into different workflows. A workflow is a sequence of individual components that together perform a given task, in this case named entity recognition of chemical entities.
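To illustrate the idea of a workflow as a chain of components, here is a minimal Python sketch. It is not OSCAR's or U-Compare's API: the `Document` and `Annotation` types, the `tokenizer` and `make_dictionary_tagger` components, the toy lexicon and the `CHEMICAL` label are all hypothetical. The only point carried over from the text is that each component takes the output of the previous one and adds its own annotations.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Tuple


@dataclass
class Annotation:
    start: int   # character offset where the mention begins
    end: int     # character offset just past the mention
    label: str   # entity type (hypothetical label, not OSCAR's tag set)
    text: str


@dataclass
class Document:
    text: str
    tokens: List[Tuple[int, int]] = field(default_factory=list)
    annotations: List[Annotation] = field(default_factory=list)


# A component is any function that takes a document and returns it enriched.
Component = Callable[[Document], Document]


def tokenizer(doc: Document) -> Document:
    """Hypothetical component: split text into word-like tokens (letters, digits, hyphens)."""
    doc.tokens = [(m.start(), m.end()) for m in re.finditer(r"[\w\-]+", doc.text)]
    return doc


def make_dictionary_tagger(lexicon: Iterable[str], label: str = "CHEMICAL") -> Component:
    """Hypothetical component factory: tag tokens found in a (toy) chemical lexicon.

    A real chemical NER component such as OSCAR is far more sophisticated;
    plain dictionary lookup is used here only to keep the sketch small.
    """
    names = {name.lower() for name in lexicon}

    def tagger(doc: Document) -> Document:
        for start, end in doc.tokens:
            token = doc.text[start:end]
            if token.lower() in names:
                doc.annotations.append(Annotation(start, end, label, token))
        return doc

    return tagger


def run_workflow(components: List[Component], doc: Document) -> Document:
    """Apply each component in order, passing the document along the chain."""
    for component in components:
        doc = component(doc)
    return doc


if __name__ == "__main__":
    workflow = [tokenizer, make_dictionary_tagger(["benzene", "ethanol"])]
    result = run_workflow(workflow, Document("Benzene was dissolved in ethanol."))
    for a in result.annotations:
        print(a.start, a.end, a.label, a.text)
```

Because components only share the `Document` structure, one could be swapped for another (for example, a different tokenizer) without touching the rest of the chain. This interchangeability is the property that makes comparing and optimising workflow configurations practical.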
A talk about OSCAR and U-Compare, presented at the OSCAR4 launch event, is available to watch online.
To run the workflows:
- Download the workflows listed below and save them on your machine
- Load the U-Compare interface
- Read about loading and running workflows in U-Compare
- Load the workflows in U-Compare to annotate your files
- Your input files *must* be in plain-text or XML format
Workflows
If you are using Safari, you may need to right-click (Ctrl+click) the workflow links to download them.
Web-based demo
Please note that this service is in beta. For more information, contact kollurub AT cs DOT man DOT ac DOT uk.
Publications
- BalaKrishna Kolluru, Lezan Hawizy, Peter Murray-Rust, Junichi Tsujii, Sophia Ananiadou. Using Workflows to Explore and Optimise Named Entity Recognition for Chemistry. PLoS ONE 6(5): e20181.