NaCTeM

Europe PMC

Overview

This is a collaboration with the Text-Mining group at the European Bioinformatics Institute (EBI) and MIMAS, forming a work package in the Europe PMC project (formerly UKPMC), hosted and coordinated by the British Library. Europe PMC as a whole forms a European version of the PubMed Central paper repository, in collaboration with the National Institutes of Health (NIH) in the United States. Europe PMC is funded by a consortium of key funding bodies. Our contribution to this major project is the application of text mining solutions to enhance information retrieval and knowledge discovery. As such, this is a large-scale application of technology developed in other NaCTeM projects, in a prominent resource for the biomedicine community.

Challenges

This project scales up existing text mining applications, applying them to the full text of the papers in the Europe PMC collection (currently 2.2 million and growing). It also provides this functionality to a clearly targeted and equally expanding user community. In addition to the increase in the amount of text in the corpus to be annotated, the structure of the documents as research papers must be taken into account. The semantic search capabilities must be made both accessible and intuitive to users, while maintaining both the efficiency and the quality of the results.

Objectives

The specific project objectives are:
  1. Deliver content from annotated documents (e.g., identified concepts, links to databases, relations amongst concepts) to the "related arts" segment in UKPMC.
  2. Customize and implement cutting-edge, high-performance named entity recognisers for selected semantic types, and disambiguation modules for named entity types, prioritised for end-users.
  3. Customize and implement cutting-edge, high-performance linguistic analyzers for extracting a variety of biomedical facts of interest to the users.
  4. Annotate PMC documents with biomedical named entities, concepts and facts (using 1, 2, 3) and provide improved document representations where the contained concepts are linked to relevant biomedical databases to facilitate easy navigation between databases and the literature.
  5. Develop UKPMC search functionality. Use the concept annotations to index documents for information retrieval and provide automatic comparisons of documents to find related information (using 1, 2, 3). Rank facts and documents based on user interest and queries.
  6. Make generated resources publicly available wherever possible.
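
As an illustration of objective 5, the following is a minimal Python sketch of how concept annotations could be used to index documents and to find related ones. It is not the project's implementation: the document IDs, the concept identifiers and the choice of Jaccard overlap as the comparison measure are all assumptions made for the example.

```python
from collections import defaultdict


def build_concept_index(annotations):
    """Map each concept identifier to the set of documents annotated with it."""
    index = defaultdict(set)
    for doc_id, concepts in annotations.items():
        for concept in concepts:
            index[concept].add(doc_id)
    return index


def related_documents(doc_id, annotations, index):
    """Rank other documents by Jaccard overlap of their concept sets."""
    own = set(annotations[doc_id])
    # Only documents sharing at least one concept can be related.
    candidates = set()
    for concept in own:
        candidates |= index[concept]
    candidates.discard(doc_id)
    scored = []
    for other in candidates:
        theirs = set(annotations[other])
        scored.append((len(own & theirs) / len(own | theirs), other))
    # Highest overlap first; ties broken by document ID.
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


# Hypothetical annotations: document ID -> concept identifiers.
annotations = {
    "doc1": {"protein:IL-2", "protein:IL-2R", "disease:leukaemia"},
    "doc2": {"protein:IL-2", "protein:IL-2R"},
    "doc3": {"disease:leukaemia", "chemical:cisplatin"},
}
index = build_concept_index(annotations)
ranking = related_documents("doc1", annotations, index)
for score, other in ranking:
    print(f"{other}: {score:.2f}")
```

The index restricts the comparison to documents that share at least one concept, so the cost of finding related documents depends on concept overlap rather than on the size of the whole collection.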

EvidenceFinder

EvidenceFinder, a search tool based on text mining technology, is now available to test on the Europe PMC Labs website. EvidenceFinder presents the user with a list of questions relating to their query terms. For example, given the search term "IL-2", EvidenceFinder will present questions such as "What inhibits IL-2 receptor?" and "What binds to IL-2 receptor?". These questions locate statements in the text that discuss the search topic in specific ways. This helps users to find information that might otherwise be missed, and to establish quickly which articles do and do not contain the information being sought.
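
The question-based presentation described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not EvidenceFinder's actual method: it assumes that facts have already been extracted as (agent, verb, target, article) tuples, and every name and ID in `FACTS` is a placeholder.

```python
from collections import defaultdict

# Hypothetical extracted facts: (agent, verb, target, article ID).
# All names and IDs are placeholders, not real Europe PMC data.
FACTS = [
    ("compound A", "inhibits", "IL-2 receptor", "article-1"),
    ("antibody B", "binds to", "IL-2 receptor", "article-2"),
    ("antibody C", "binds to", "IL-2 receptor", "article-3"),
    ("protein D", "activates", "NF-kB", "article-4"),
]


def questions_for(query, facts):
    """Group facts whose target mentions the query under a generated question."""
    grouped = defaultdict(list)
    for agent, verb, target, article in facts:
        if query.lower() in target.lower():
            grouped[f"What {verb} {target}?"].append((agent, article))
    return dict(grouped)


questions = questions_for("IL-2", FACTS)
for question, answers in questions.items():
    print(question, answers)
```

Each generated question acts as a key to the statements that answer it, so an article appears under a question only if it contains a matching fact.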

  • New! - EUPMC Evidence Finder for Anatomical entities with meta-knowledge – This tool offers similar functionality to the standard EvidenceFinder tool, but it allows searches for anatomical entities. A unique feature of the tool is the ability to filter facts according to various aspects of their interpretation (or meta-knowledge). Such aspects include negation, certainty level, fact type (e.g., analysis, experimental observation, definite fact) and novelty.
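
Filtering by meta-knowledge can be pictured with a short Python sketch. The `Fact` record, its field names, the certainty values and the example sentences are hypothetical; they only mirror the aspects listed above (negation, certainty level, fact type and novelty) and do not reflect the tool's internal data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """A fact with hypothetical meta-knowledge fields."""
    text: str
    negated: bool
    certainty: str   # assumed values: "high" or "low"
    fact_type: str   # e.g. "analysis", "experimental observation", "definite fact"
    novel: bool


def filter_facts(facts, **criteria):
    """Keep facts whose meta-knowledge matches every given criterion."""
    return [
        fact for fact in facts
        if all(getattr(fact, name) == value for name, value in criteria.items())
    ]


# Placeholder facts, not taken from real articles.
FACTS = [
    Fact("Gene X is expressed in the liver", False, "high", "experimental observation", True),
    Fact("Gene X is not expressed in the kidney", True, "high", "experimental observation", False),
    Fact("Protein Y may accumulate in the retina", False, "low", "analysis", True),
]

selected = filter_facts(FACTS, negated=False, fact_type="experimental observation")
for fact in selected:
    print(fact.text)
```

Criteria are combined with a logical AND, so adding a filter can only narrow the result set.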

Project Team

Principal Investigator: Prof. Sophia Ananiadou
Co-investigators: Mr. John McNaught
Project Team (NaCTeM): Mr. Jacob Carter
Past team members (NaCTeM): Mr. William Black, Dr. Makoto Miwa, Dr. Rafal Rak, Dr. Andrew Rowley

Further Information

Information about the latest Europe PMC developments can be found via the Europe PMC blog and two Europe PMC Twitter accounts.

Publications

The Europe PMC Consortium. (2015). Europe PMC: a full-text literature database for the life sciences and platform for innovation. Nucleic Acids Research, 43(D1), D1042-D1048

Batista-Navarro, R. T. B., Rak, R. and Ananiadou, S. (2015). Optimising chemical named entity recognition with pre-processing analytics, knowledge-rich features and heuristics. Journal of Cheminformatics, 7(Suppl 1), S6

Pyysalo, S. and Ananiadou, S. (2014). Anatomical Entity Mention Recognition at Literature Scale. Bioinformatics, 30(6), 868-875

Rak, R., Batista-Navarro, R. T. B., Carter, J., Rowley, A. and Ananiadou, S. (2014). Processing Biological Literature with Customisable Web Services Supporting Interoperable Formats. Database: The Journal of Biological Databases and Curation

Rak, R., Batista-Navarro, R. T. B., Rowley, A., Carter, J. and Ananiadou, S. (2014). Text Mining-assisted Biocuration Workflows in Argo. Database: The Journal of Biological Databases and Curation

Rak, R., Rowley, A., Carter, J., Batista-Navarro, R. T. B. and Ananiadou, S. (2014). Interoperability and Customisation of Annotation Schemata in Argo. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), Reykjavik, Iceland, pp. 3837-3842, European Language Resources Association (ELRA)

Batista-Navarro, R. T. B., Rak, R. and Ananiadou, S. (2013). Chemistry-specific Features and Heuristics for Developing a CRF-based Chemical Named Entity Recogniser. In Proceedings of the Fourth BioCreative Challenge Evaluation Workshop, Bethesda, Maryland, USA, pp. 55-59

Black, W. J., Rupp, C. J., Nobata, C., McNaught, J., Tsujii, J. and Ananiadou, S. (2010). High-Precision Semantic Search by Generating and Testing Questions. In Proceedings of the UK e-Science All Hands Meeting 2010

McEntyre, J. R., Ananiadou, S., Andrews, S., Black, W. J., Boulderstone, R., Buttery, P., Chaplin, D., Chevuru, S., Cobley, N., Coleman, L., Davey, P., Gupta, B., Haji-Gholam, L., Hawkins, C., Horne, A., Hubbard, S. J., Kim, J.-H., Lewin, I., Lyte, V., MacIntyre, R., Mansoor, S., Mason, L., McNaught, J., Newbold, E., Nobata, C., Ong, E., Pillai, S., Rebholz-Schuhmann, D., Rosie, H., Rowbotham, R., Rupp, C. J., Stoehr, P. and Vaughan, P. (2010). UKPMC: a full text article resource for the life sciences. Nucleic Acids Research, 39(Suppl. 1), D58-D65

Rupp, C. J., Thompson, P., Black, W. J., McNaught, J. and Ananiadou, S. (2010). A Specialised Verb Lexicon as the Basis of Fact Extraction in the Biomedical Domain. In Proceedings of Interdisciplinary Workshop on Verbs: The Identification and Representation of Verb Features (Verb 2010)