New homepage for the GENIA project and biomedical annotated corpora


We are pleased to announce a new website for the GENIA project.

The GENIA project has been running since 1998, and the new website contains information about the following:

  • The GENIA corpus - the primary resource created by the GENIA project. The corpus is intended to support the development and evaluation of information extraction and text mining systems for the domain of molecular biology. It consists of 1,999 MEDLINE abstracts, which have been annotated with various levels of linguistic and semantic information, i.e. parts-of-speech, syntax, terms, events, relations and coreference. The corpus can be downloaded from the website.
  • Shared tasks - The GENIA project initiated the BioNLP Shared Task series and has organised a number of tasks in three shared task events: the BioNLP/JNLPBA Shared Task 2004 and the BioNLP Shared Tasks of 2009 and 2011.
  • Other GENIA project corpora - A number of additional corpora have been annotated using extensions of the GENIA/BioNLP Shared Task event representation. These consist of event corpora of protein post-translational modifications (PTM), Type IV secretion systems, DNA methylation, mTOR pathways and "Exhaustive PTM".
  • Related efforts - Work connected to the GENIA project, including the meta-knowledge corpus, an extension of the GENIA event corpus that adds annotation about how events are to be interpreted according to their textual context.

Information about tools developed to perform automatic annotation, through training on the GENIA corpus, will be added to the site shortly.

