GENIA Corpus with meta-knowledge annotation

The Meta-knowledge_GENIA_corpus directory contains a version of the entire GENIA event corpus, which has been enriched with meta-knowledge annotation. A more detailed description of this annotation, together with access to the annotation guidelines, is available here.

The use of the corpus is subject to the terms and conditions of the licences provided in the LICENCES directory.

If you use the corpus, please cite the following paper:

Thompson, P., Nawaz, R., McNaught, J. and Ananiadou, S. (2011). Enriching a biomedical event corpus with meta-knowledge annotation. BMC Bioinformatics, 12:393.

The Meta-knowledge_GENIA_corpus directory contains 3 subdirectories:

The XML annotation of the corpus follows the GENIA event annotation format, with additions that allow meta-knowledge to be encoded.

Two levels of annotation of the target text are expressed within each file, i.e.

An example of an annotated sentence within the XML file is shown below:


<sentence id="S9">Nuclear transcription studies in vitro showed that 
	<term id="T28" lex="LTB4" sem="Organic_compound_other">LTB4</term> 
	increased the transcription of the 
	<term id="T29" lex="c-fos_gene" sem="DNA_domain_or_region">c-fos gene</term> 
	7-fold and the 
	<term id="T30" lex="c-jun_gene" sem="DNA_domain_or_region">c-jun gene</term> 
	1.4-fold.
</sentence>
<event KT="Analysis" Manner="High" id="E30">
	<type class="Positive_regulation"/>
	<theme idref="E32"/>
	<cause idref="T28"/>
	<clue><clueExperiment>Nuclear transcription studies in vitro</clueExperiment>
	<clueKT>showed</clueKT> that LTB4 <clueType>increased</clueType> the transcription of the 
	c-fos gene <clueManner>7-fold</clueManner> and the c-jun gene 1.4-fold.</clue>
</event>
<event KT="Analysis" Manner="Low" id="E31">
	<type class="Positive_regulation"/>
	<theme idref="E33"/>
	<cause idref="T28"/>
	<clue><clueExperiment>Nuclear transcription studies in vitro</clueExperiment> 
	<clueKT>showed</clueKT> that LTB4 <clueType>increased</clueType> the transcription of the 
	c-fos gene 7-fold and the c-jun gene <clueManner>1.4-fold</clueManner>.</clue>
</event>
<event KT="Other" id="E32">
	<type class="Transcription"/>
	<theme idref="T29"/>
	<clue>Nuclear transcription studies in vitro showed that LTB4 increased the 	
	<clueType>transcription</clueType> <linkTheme>of</linkTheme> the c-fos gene 7-fold and 
	the c-jun gene 1.4-fold.</clue>
</event>
<event KT="Other" id="E33">
	<type class="Transcription"/>
	<theme idref="T30"/>
	<clue>Nuclear transcription studies in vitro showed that LTB4 increased the 
	<clueType>transcription</clueType> <linkTheme>of</linkTheme> the c-fos gene 7-fold 
	and the c-jun gene 1.4-fold.</clue>
</event>
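
As a minimal sketch of how such a fragment could be read programmatically, the Python snippet below parses an abridged copy of the S9 example with the standard library's xml.etree.ElementTree. The <root> wrapper element and the helper names read_events and describe are assumptions made for illustration only; they are not part of the corpus or of any GENIA tooling, and the layout of the real corpus files may call for a different entry point.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the S9 example (events E31 and E33 omitted), wrapped in a
# hypothetical <root> element so that the fragment parses on its own.
SAMPLE = """<root>
<sentence id="S9">Nuclear transcription studies in vitro showed that
<term id="T28" lex="LTB4" sem="Organic_compound_other">LTB4</term>
increased the transcription of the
<term id="T29" lex="c-fos_gene" sem="DNA_domain_or_region">c-fos gene</term>
7-fold and the
<term id="T30" lex="c-jun_gene" sem="DNA_domain_or_region">c-jun gene</term>
1.4-fold.
</sentence>
<event KT="Analysis" Manner="High" id="E30">
<type class="Positive_regulation"/>
<theme idref="E32"/>
<cause idref="T28"/>
<clue><clueExperiment>Nuclear transcription studies in vitro</clueExperiment>
<clueKT>showed</clueKT> that LTB4 <clueType>increased</clueType> the transcription of the
c-fos gene <clueManner>7-fold</clueManner> and the c-jun gene 1.4-fold.</clue>
</event>
<event KT="Other" id="E32">
<type class="Transcription"/>
<theme idref="T29"/>
<clue>Nuclear transcription studies in vitro showed that LTB4 increased the
<clueType>transcription</clueType> <linkTheme>of</linkTheme> the c-fos gene 7-fold and
the c-jun gene 1.4-fold.</clue>
</event>
</root>"""


def read_events(xml_text):
    """Return (terms, events) dictionaries keyed by the id attribute."""
    root = ET.fromstring(xml_text)
    # Map each term id to its lex attribute, e.g. "T28" -> "LTB4".
    terms = {t.get("id"): t.get("lex") for t in root.iter("term")}
    events = {}
    for ev in root.iter("event"):
        clue = ev.find("clue")
        events[ev.get("id")] = {
            "type": ev.find("type").get("class"),
            # Meta-knowledge attributes; None when the attribute is absent.
            "KT": ev.get("KT"),
            "Manner": ev.get("Manner"),
            "themes": [t.get("idref") for t in ev.findall("theme")],
            "causes": [c.get("idref") for c in ev.findall("cause")],
            # Tagged spans inside <clue>, as (tag, text) pairs in document order.
            "clues": [(c.tag, c.text) for c in clue] if clue is not None else [],
        }
    return terms, events


def describe(ref, terms, events):
    """Resolve an idref to a term's lex value or to a nested event expression."""
    if ref in terms:
        return terms[ref]
    ev = events[ref]
    args = [describe(r, terms, events) for r in ev["themes"]]
    return f'{ev["type"]}({", ".join(args)})'


terms, events = read_events(SAMPLE)
for event_id, ev in events.items():
    print(event_id, describe(event_id, terms, events), ev["KT"], ev["Manner"])
```

The recursion in describe reflects the fact that a theme may point either to a term (an id beginning with "T") or to another event (an id beginning with "E"), as with E30, whose theme is the event E32.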

Below, we provide a brief description of the above XML representation, in terms of the original GENIA annotation and the information added to represent meta-knowledge.

Original GENIA annotation

Each sentence of the abstract is contained within a <sentence> element. Biological concepts are annotated inline, indicated by <term> elements. Each <term> element has the following attributes:

Following the <sentence> element, the events in the sentence are listed, each within an <event> element. Each event has a unique id, starting with an "E". Within the <event> element, there are the following elements:

Meta-knowledge annotation

Meta-knowledge annotation is encoded in two places in the XML: