New paper and resources to support anatomical entity recognition at literature scale


We are pleased to announce that a new paper has been published, presenting a number of new tools and resources supporting the recognition of anatomical entities:

  • AnatomyTagger: a new machine learning-based system for anatomical entity mention recognition that has been applied to automatically annotate the entire Open Access scientific domain literature
  • AnatEM: a corpus of 1,200 documents manually annotated for 13,700 anatomical entity mentions
  • Results of tagging all of the 600,000 PMC OA full-text documents, identifying 48M anatomical entity mentions
These and other resources are available at the AnatomyTagger homepage.


Sampo Pyysalo and Sophia Ananiadou (2013). Anatomical Entity Mention Recognition at Literature Scale. Bioinformatics.



Anatomical entities ranging from sub-cellular structures to organ systems are central to biomedical science, and mentions of these entities are essential to understanding the scientific literature. Despite extensive efforts to automatically analyse various aspects of biomedical text, there have been only a few studies focusing on anatomical entities, and no dedicated methods for learning to automatically recognize anatomical entity mentions in free-form text have been introduced.


We present AnatomyTagger, a machine learning-based system for anatomical entity mention recognition. The system incorporates a broad array of approaches proposed to benefit tagging, including the use of UMLS- and OBO-based lexical resources, word representations induced from unlabelled text, statistical truecasing, and non-local features. We train and evaluate the system on a newly introduced corpus that substantially extends on previously available resources, and apply the resulting tagger to automatically annotate the entire Open Access scientific domain literature. The resulting analyses have been applied to extend services provided by the Europe PMC literature database.
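To give a rough idea of how lexical resources can feed a machine learning-based tagger, the following minimal sketch builds per-token feature dictionaries that include a longest-match lexicon lookup. It is an illustration only and not the AnatomyTagger implementation: the toy lexicon, the `B-LEX`/`I-LEX` labels, and the function names `lexicon_tags` and `token_features` are all hypothetical, and a real system would load its terms from resources such as UMLS or OBO ontologies.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy lexicon (assumption for illustration); a real system would
# load terms from UMLS- or OBO-based resources instead.
TOY_LEXICON: Set[Tuple[str, ...]] = {
    ("lung",),
    ("epithelial", "cells"),
    ("blood", "vessel"),
}
MAX_TERM_LEN = max(len(term) for term in TOY_LEXICON)


def lexicon_tags(tokens: List[str]) -> List[str]:
    """Tag tokens covered by the longest lexicon match as B-LEX/I-LEX, else O."""
    lowered = [t.lower() for t in tokens]
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(MAX_TERM_LEN, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in TOY_LEXICON:
                matched = n
                break
        if matched:
            tags[i] = "B-LEX"
            for j in range(i + 1, i + matched):
                tags[j] = "I-LEX"
            i += matched
        else:
            i += 1
    return tags


def token_features(tokens: List[str]) -> List[Dict[str, object]]:
    """Build one feature dictionary per token for a sequence tagger."""
    lex = lexicon_tags(tokens)
    feats: List[Dict[str, object]] = []
    for i, tok in enumerate(tokens):
        feats.append({
            "lower": tok.lower(),
            "is_capitalized": tok[:1].isupper(),
            "suffix3": tok[-3:].lower(),
            "lexicon": lex[i],
            # Simple local context; sentence boundaries use placeholder symbols.
            "prev_lower": tokens[i - 1].lower() if i > 0 else "<s>",
            "next_lower": tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>",
        })
    return feats


if __name__ == "__main__":
    sentence = ["Damage", "to", "lung", "epithelial", "cells", "was", "observed"]
    for token, features in zip(sentence, token_features(sentence)):
        print(token, features["lexicon"])
```

Feature dictionaries of this kind could then be passed to any sequence labelling model. Word representations, truecasing, and non-local features, which the system also uses, are not covered by this sketch.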


All tools and resources introduced in this work are available from
