Annotated Corpora

  • ACE Meta-knowledge – An enrichment of the English ACE 2005 corpus (news domain), in which information about various aspects of event interpretation has been added.
  • Anatomy Corpora – A collection of corpora manually annotated with fine-grained, species-independent anatomical entities, to facilitate the development of text mining systems that can carry out detailed and comprehensive analyses of biomedical scientific text.
  • BioCause – A collection of 19 full-text biomedical articles, in which previously added entity and event annotations have been enriched with causality annotations.
  • GENIA – A collection of 2000 biomedical abstracts, with various levels of syntactic and semantic annotations.
  • GENIA Meta-Knowledge – An enrichment of the GENIA event corpus, with various aspects of information pertaining to the interpretation of events.
  • GREC – A collection of 240 MEDLINE abstracts, annotated with events pertaining to gene regulation.
  • HIMERA – A corpus of published historical medical documents manually annotated with semantic information relevant to the study of medical history and public health.
  • Metabolite and Enzyme Corpus – A corpus of MEDLINE abstracts annotated by experts with metabolite and enzyme names.
  • PHAEDRA – A semantically annotated corpus for pharmacovigilance. The corpus includes five different levels of information, which allow detailed information about drug effects to be encoded.
  • PhenoCHF – A corpus consisting of biomedical articles and clinical records, annotated with phenotypic information related to congestive heart failure (CHF). Various levels of annotation are included, i.e., entity mentions, their normalisation to concept IDs in the UMLS Metathesaurus, and relations involving entity mentions.


  • Time-sensitive medical inventory – A collection of terms relevant to the study of medical history, each linked to other semantically related terms.
  • Biodiversity inventory – A collection of terms relevant to the study of biodiversity, each linked to other semantically related terms. Each term is also linked to its URI, UUID and LSID indexed by Global Names.


A large-scale terminological resource to support text mining in the biomedical domain.

Anatomy Resources

A collection of tools and lexical resources, making use of anatomy domain ontologies available in the OBO Foundry collection of Open Biological and Biomedical Ontologies to facilitate anatomical entity mention detection and classification.