Research and Development Work on Text Mining Infrastructures

One of the core challenges facing text mining and natural language processing (NLP) researchers and tool developers is the general lack of interoperability between different tools and resources. At NaCTeM we have addressed this by adapting our tools to function within the UIMA (Unstructured Information Management Architecture) framework, and by developing text mining infrastructures that build upon this framework.

We frequently incorporate new NLP and text mining tools developed at NaCTeM into our UIMA inventory, and co-operate with partner institutes to support the development of large-scale text mining systems.

Our most recent work on UIMA has focussed on the development of two infrastructures for constructing and evaluating workflows composed of UIMA components:

U-Compare - an integrated text mining/natural language processing system implemented as a Java application. The graphical user interface provides drag-and-drop facilities for rapidly creating workflows. Evaluation facilities are also built in. New UIMA components can easily be imported for use in the system, and complete workflows can be exported for use by other U-Compare users.
Argo - a web-based collaborative environment for the development of text-processing workflows. Complex processing workflows can be created, which can include multiple branching and merging points. Features include user-interactive components, such as an annotation editor that pauses a workflow and waits for input from the user, and sharing of workflows and documents between users.
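
As an illustration of what a workflow with branching and merging points looks like, the following minimal Python sketch models one as a directed acyclic graph. It does not use the Argo, U-Compare or UIMA APIs; the component names and the helper functions are hypothetical and chosen only for this example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: each component maps to the components whose output it consumes.
workflow = {
    "reader": [],
    "sentence_splitter": ["reader"],
    "tokeniser": ["sentence_splitter"],
    "gene_tagger": ["tokeniser"],        # first branch
    "disease_tagger": ["tokeniser"],     # second branch
    "annotation_merger": ["gene_tagger", "disease_tagger"],  # branches merge here
}


def execution_order(graph):
    """Return an order in which every component runs after all of its inputs."""
    return list(TopologicalSorter(graph).static_order())


def branching_points(graph):
    """Components whose output feeds more than one downstream component."""
    consumers = {}
    for node, inputs in graph.items():
        for source in inputs:
            consumers.setdefault(source, []).append(node)
    return sorted(n for n, users in consumers.items() if len(users) > 1)


def merging_points(graph):
    """Components that consume the output of more than one upstream component."""
    return sorted(n for n, inputs in graph.items() if len(inputs) > 1)


order = execution_order(workflow)
print("Execution order: ", order)
print("Branching points:", branching_points(workflow))
print("Merging points:  ", merging_points(workflow))
```

In this sketch the tokeniser is the branching point and the annotation merger is the merging point; a user-interactive component such as an annotation editor would simply be another node at which execution waits for input.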

Our UIMA work is widely recognised, as evidenced by the IBM UIMA Innovation Awards presented to Prof. Sophia Ananiadou, director of NaCTeM, in 2006, 2007 and 2008.

Recent Publications on UIMA Work

Batista-Navarro, R., Carter, J. and Ananiadou, S. (2016).
Argo: Enabling the development of bespoke workflows and services for disease annotation. Database: The Journal of Biological Databases and Curation, 2016
Batista-Navarro, R., Hammock, J., Ulate, W. and Ananiadou, S. (2016).
A Text Mining Framework for Accelerating the Semantic Curation of Literature. Proceedings of the 20th International Conference on Theory and Practice of Digital Libraries (TPDL 2016), pp. 459-462, Springer
Batista-Navarro, R., Soto, A., Ulate, W. and Ananiadou, S. (2016).
Text Mining Workflows for Indexing Archives with Automatically Extracted Semantic Metadata. Proceedings of the 20th International Conference on Theory and Practice of Digital Libraries (TPDL 2016), pp. 471–473, Springer
Batista-Navarro, R., Zerva, C. and Ananiadou, S. (2016).
Construction of a biodiversity knowledge repository using a text mining-based framework. Proceedings of the 3rd Annual International Symposium on Information Management and Big Data (SIMBig 2016), pp. 22-25.
Shardlow, M., Przybyła, P., Batista-Navarro, R., Carter, J., McNaught, J. and Ananiadou, S. (2016).
Facilitating and promoting web annotation with Argo. Proceedings of I Annotate 2016
Batista-Navarro, R., Carter, J. and Ananiadou, S. (2015).
Development of bespoke machine learning and biocuration workflows in a BioC-supporting text mining workbench. Proceedings of the Fifth BioCreative Challenge Evaluation Workshop, Seville, Spain, pp. 51-56
Batista-Navarro, R., Carter, J. and Ananiadou, S. (2015).
Semi-automatic curation of chronic obstructive pulmonary disease phenotypes using Argo. Proceedings of the Fifth BioCreative Challenge Evaluation Workshop, Seville, Spain, pp. 403-408
Fu, X., Batista-Navarro, R., Rak, R. and Ananiadou, S. (2015).
Supporting the Annotation of Chronic Obstructive Pulmonary Disease (COPD) Phenotypes with Text Mining Workflows. Journal of Biomedical Semantics, 6:8
Rak, R., Batista-Navarro, R. T. B., Carter, J., Rowley, A. and Ananiadou, S. (2014).
Processing Biological Literature with Customisable Web Services Supporting Interoperable Formats. Database: The Journal of Biological Databases and Curation
Rak, R., Batista-Navarro, R. T. B., Rowley, A., Carter, J. and Ananiadou, S. (2014).
Text Mining-assisted Biocuration Workflows in Argo. Database: The Journal of Biological Databases and Curation
Rak, R., Rowley, A., Carter, J., Batista-Navarro, R. T. B. and Ananiadou, S. (2014).
Interoperability and Customisation of Annotation Schemata in Argo. Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pp. 3837-3842.
Rosner, M., Attard, A., Thompson, P., Gatt, A. and Ananiadou, S. (2014).
Extending a Tool Resource Framework with U-Compare. Human Language Technology Challenges for Computer Science and Linguistics, Lecture Notes in Computer Science, Vol 8387, pages 315-326, Springer
Batista-Navarro, R. T. B., Kontonatsios, G., Mihaila, C., Thompson, P., Rak, R., Nawaz, R., Korkontzelos, I. and Ananiadou, S. (2013).
Facilitating the Analysis of Discourse Phenomena in an Interoperable NLP Platform. Computational Linguistics and Intelligent Text Processing, Lecture Notes in Computer Science, Vol 7816, pages 559-571, Springer.
Kontonatsios, G., Korkontzelos, I., Kolluru, B., Thompson, P. and Ananiadou, S. (2013).
Deploying and Sharing U-Compare Workflows as Web Services. Journal of Biomedical Semantics, 4:7
Kontonatsios, G., Thompson, P., Batista-Navarro, R. T. B., Mihaila, C., Korkontzelos, I. and Ananiadou, S. (2013).
Extending an interoperable platform to facilitate the creation of multilingual and multimodal NLP applications. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations, Association for Computational Linguistics, Sofia, Bulgaria, pp. 43-48
Mihaila, C., Kontonatsios, G., Batista-Navarro, R. T. B., Thompson, P., Korkontzelos, I. and Ananiadou, S. (2013).
Towards a Better Understanding of Discourse: Integrating Multiple Discourse Annotation Perspectives Using UIMA. Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse, Association for Computational Linguistics, pp. 79-88 (LAW Challenge Award)
Rak, R. and Ananiadou, S. (2013).
Making UIMA Truly Interoperable with SPARQL. Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse, pp. 88-97.
Rak, R., Batista-Navarro, R. T. B., Rowley, A., Carter, J. and Ananiadou, S. (2013).
Customisable Curation Workflows in Argo. Proceedings of the Fourth BioCreative Challenge Evaluation Workshop, pp. 270-278.
Rak, R., Rowley, A., Carter, J. and Ananiadou, S. (2013).
Development and Analysis of NLP Pipelines in Argo. Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp. 115-120.
Kontonatsios, G., Korkontzelos, I. & Ananiadou, S. (2012).
Developing Multilingual Text Mining Workflows in UIMA and U-Compare. Proceedings of the 17th International Conference on Applications of Natural Language Processing to Information Systems, pp. 82-93.
Rak, R., Rowley, A. & Ananiadou, S. (2012).
Collaborative Development and Evaluation of Text-processing Workflows in a UIMA-supported Web-based Workbench. Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012), pp. 2971-2976
Rak, R., Rowley, A., Black, W.J. and Ananiadou, S. (2012).
Argo: an integrative, interactive, text mining-based workbench supporting curation. Database: The Journal of Biological Databases and Curation
Ananiadou, S., Thompson, P., Kano, Y., McNaught, J., Attwood, T. K., Day, P. J. R., Keane, J., Jackson, D. & Pettifer, S. (2011)
Towards Interoperability of European Language Resources. Ariadne, 67
Kano, Y., Miwa, M., Cohen, K. B., Hunter, L., Ananiadou, S. & Tsujii, J. (2011)
U-Compare: a modular NLP workflow construction and evaluation system. IBM Journal of Research and Development, 55(3), 11:1 - 11:10
Kolluru, B., Hawizy, L., Murray-Rust, P., Tsujii, J. & Ananiadou, S. (2011)
Using Workflows to Explore and Optimise Named Entity Recognition for Chemistry. PLoS ONE, 6(5), e20181
Kontonatsios, G., Korkontzelos, I., Kolluru, B. & Ananiadou, S. (2011)
Adding Text Mining Workflows as Web Services to the BioCatalogue. Proceedings of the 4th International Workshop on Semantic Web Applications and Tools for the Life Sciences (SWAT4LS).
Thompson, P., Kano, Y., McNaught, J., Pettifer, S., Attwood, T. K., Keane, J. & Ananiadou, S. (2011)
Promoting Interoperability of Resources in META-SHARE. Proceedings of the IJCNLP Workshop on Language Resources, Technology and Services in the Sharing Paradigm (LRTS), pp. 50-58
Kano, Y., Baumgartner Jr., W.A., McCrohon, L., Ananiadou, S., Cohen, K.B., Hunter, L. & Tsujii, J. (2009)
U-Compare: share and compare text mining tools with UIMA. Bioinformatics, 25(15), 1997-1998.