UKPMC
Overview
This work is a collaboration with the Text-Mining group at the European Bioinformatics Institute (EBI) and MIMAS, and forms a work package in the UKPMC project, which is hosted and coordinated by the British Library. UKPMC as a whole is a UK-based version of the PubMed Central paper repository, in collaboration with the National Institutes of Health (NIH) in the United States. It is funded by a consortium of key biomedical research funders. Our contribution to this major project is the application of text mining solutions to enhance information retrieval and knowledge discovery. As such, it applies technology developed in other NaCTeM projects on a large scale, in a prominent resource for the biomedical community.
Challenges
This project scales up existing text mining applications by applying them to the full text of the papers in the UKPMC collection (currently 1.3m and growing). It also provides this functionality to a clearly targeted and equally expanding user community. In addition to the increase in the amount of text to be annotated, the structure of the documents as research papers must be taken into account. The semantic search capabilities must be both accessible and intuitive to users, while maintaining the efficiency and quality of the results.
Objectives
The specific project objectives are:
- Deliver content from annotated documents (e.g., identified concepts, links to databases, relations amongst concepts) to the “related arts” segment in UKPMC.
- Customize and implement cutting-edge, high-performance named entity recognisers for selected semantic types, and disambiguation modules for named entity types, prioritised for end-users.
- Customize and implement cutting-edge, high-performance linguistic analyzers for extracting a variety of biomedical facts of interest to the users.
- Annotate PMC documents with biomedical named entities, concepts and facts (using 1,2,3) and provide improved document representations where the contained concepts are linked to relevant biomedical databases to facilitate easy navigation between databases and the literature.
- Develop UKPMC search functionality. Use the concept annotations to index documents for information retrieval and provide automatic comparisons of documents to find related information (using 1,2,3). Rank facts and documents based on user interest and queries.
- Make generated resources publicly available wherever possible.
Project Team
Principal Investigator: Dr Sophia Ananiadou
Co-investigators: Mr John McNaught and Professor Jun'ichi Tsujii
Project Team (NaCTeM): Dr Chikashi Nobata and Dr C.J. Rupp
Featured News
- Call for papers - BioNLP 2010
- Text Mining for Publishers event - 11th May 2010, London
- Launch of new features on UKPMC website
- Species disambiguation of biomedical named entities: release of software, corpus and article
- Call for papers - 2nd Workshop on Building and Evaluating Resources for Biomedical Text Mining
- New Biomedical Event Corpus (GREC) released
- ELRA Distribution Agreement signed for BioLexicon



