DECA
Overview
DECA (Disease Extraction with Concept Association) was a one-year project funded by Pfizer. It concerned the automatic extraction of associations between concepts in the biomedical domain, such as diseases and symptoms, from collections of biomedical texts (e.g., MEDLINE). A considerable part of the research was devoted to lexical disambiguation of biomedical names.
Motivation
Manually searching for pieces of information in the ocean of research papers can be very difficult and time-consuming. The task becomes even more demanding when searching biomedical texts, which tend to involve large numbers of named entities in specialised domains. In addition, biologists are often interested in certain types of associations between entities, such as interactions between proteins and relations between symptoms and diseases. Traditional information retrieval techniques offer little help in speeding up such searches, because they neither recognise and index biomedical named entities (e.g., proteins, genes and diseases) nor capture the links among them.
It is therefore desirable to build a software tool that uses natural language processing and text mining technologies to automatically recognise and disambiguate biomedical named entities and find their associations. A search engine built on such a tool should make searches more efficient and enjoyable.
Challenges
The challenges of this project include, among others:
- Recognising different types of biomedical named entities. Named entity recognition systems can achieve relatively good precision and recall, provided they have sufficient development resources such as human-annotated training data and comprehensive dictionaries. However, such resources are scarce for many types of entities, and recognising those entity types remains challenging.
- Resolving lexical ambiguity in biomedical named entities. For example, the same text string can refer to different types of entities, or to the same type of entity in different species (e.g., human or mouse). Distinguishing these meanings according to the context in which they occur can be very tricky.
- Processing text efficiently. Given the sheer amount of text to be processed, making the software tool efficient is not a trivial task.
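To make the species ambiguity challenge concrete, the following is a minimal sketch of a context-based heuristic: it assigns an entity mention the species whose cue word occurs closest to it in the sentence. This is an illustration only and not the method used in DECA; the cue-word lists, the function name guess_species and the single-token entity matching are all assumptions made for this example.

```python
import re

# Hypothetical cue words for two species; a real system would need a
# much larger lexicon and would not rely on surface proximity alone.
SPECIES_CUES = {
    "human": {"human", "patient", "patients"},
    "mouse": {"mouse", "mice", "murine"},
}


def guess_species(sentence, entity):
    """Return the species whose cue word is nearest to the entity, or None.

    Assumes the entity is a single token and uses its first occurrence.
    """
    tokens = re.findall(r"\w+", sentence.lower())
    entity = entity.lower()
    if entity not in tokens:
        return None
    position = tokens.index(entity)

    best_species, best_distance = None, None
    for index, token in enumerate(tokens):
        for species, cues in SPECIES_CUES.items():
            if token in cues:
                distance = abs(index - position)
                # Keep the cue with the smallest token distance to the entity.
                if best_distance is None or distance < best_distance:
                    best_species, best_distance = species, distance
    return best_species


if __name__ == "__main__":
    print(guess_species("Expression of p53 was reduced in murine fibroblasts.", "p53"))
    print(guess_species("Mutations of p53 are common in patients, unlike in mice.", "p53"))
```

A proximity heuristic like this fails when the nearest cue word is syntactically unrelated to the entity, which is one reason the context has to be analysed more carefully in practice.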
More Details
Please click here for more details and a Web demonstration of DECA.
Project Team
Principal Investigator: Prof. Sophia Ananiadou
Co-investigator: Prof. Jun'ichi Tsujii
Researchers: Dr Xinglong Wang and Dr Chikashi Nobata
Publications
Yutaka Sasaki, Xinglong Wang and Sophia Ananiadou. (In press). Extracting Secondary Bio-Event Arguments with Extraction Constraints. Computational Intelligence.
Xinglong Wang, Jun'ichi Tsujii and Sophia Ananiadou. (2010). Disambiguating the Species of Biomedical Named Entities Using Natural Language Parsers. Bioinformatics, 26(5):661-667; doi: 10.1093/bioinformatics/btq002
Xinglong Wang, Jun'ichi Tsujii and Sophia Ananiadou. (2009). Classifying Relations for Biomedical Named Entity Disambiguation. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, Singapore.