Anatomy Corpora
Anatomical entities are central to much of biomedical discourse and must be considered in any attempt to fully analyse biomedical scientific text. However, while a wealth of tools and resources has been introduced in domain-specific natural language processing for recognising mentions of molecular-level entities (genes, proteins, chemicals) and organism names in text, there has been little study of the recognition of mentions of anatomical entities such as tissues and organs.
To address this issue and to facilitate more detailed and comprehensive analysis of biomedical scientific text, our aim has been to establish a fine-grained, species-independent anatomical entity mention detection task.
We have developed a number of manually annotated corpora to support this aim, as follows:
- Multi-Level Event Extraction (MLEE) corpus - abstracts of publications on angiogenesis, annotated with entity mentions and events across multiple levels of biological organization from the molecular to the organ system level. Over 8,000 entities with fine-grained types and over 6,000 structured events are annotated.
- AnEM corpus - a domain- and species-independent resource, annotated with anatomical entity mentions using a fine-grained classification system. The corpus consists of 500 documents (over 90,000 words) selected randomly from citation abstracts and full-text papers with the aim of making the corpus representative of the entire available biomedical scientific literature. The corpus annotation covers mentions of both healthy and pathological anatomical entities and contains over 3,000 annotated mentions.
- Extended Anatomical Entity Mention (AnatEM) corpus - 1,212 documents (approx. 250,000 words) annotated with over 13,000 mentions of anatomical entities. Each annotation is assigned one of 12 granularity-based types such as Cellular component, Tissue and Organ, defined with reference to the Common Anatomy Reference Ontology. The corpus builds in part on the AnEM and MLEE corpora.