The extraction of relations stated to hold between biomolecular entities is one of the most frequently addressed information extraction tasks in domain studies. Typical relation extraction targets include protein-protein interactions and gene regulatory relations. In the GENIA corpus, however, such associations, which involve change in the state or properties of biomolecules, are captured in the event annotation.
The GENIA corpus relation annotation aims to complement the event annotation of the corpus by capturing (primarily) static relations: relations such as part-of that hold between entities without (necessarily) involving change.
The most recent version of the GENIA Relation corpus was released as the REL task dataset of the BioNLP Shared Task 2011. This data is available in the standoff format introduced on the BioNLP ST'11 format page.
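As a rough illustration of working with standoff annotation, the sketch below reads entity and relation lines into simple Python records. It assumes a tab-separated layout with text-bound annotations (IDs starting with "T", carrying a type, character offsets and the covered text) and relation annotations (IDs starting with "R", carrying a type and role:ID arguments). The sample lines are made up for the example and are not taken from the corpus, and the names `Entity`, `Relation` and `parse_standoff` are hypothetical. The format page remains the authoritative description.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    id: str
    type: str
    start: int  # character offset where the span begins
    end: int    # character offset where the span ends
    text: str


@dataclass
class Relation:
    id: str
    type: str
    args: dict  # role name -> ID of the entity filling that role


def parse_standoff(lines):
    """Parse text-bound ("T") and relation ("R") lines (assumed layout)."""
    entities, relations = {}, {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        ann_id = fields[0]
        if ann_id.startswith("T"):
            # e.g. "T1<TAB>Protein 0 4<TAB>p105"
            ann_type, start, end = fields[1].split(" ")
            entities[ann_id] = Entity(ann_id, ann_type, int(start), int(end), fields[2])
        elif ann_id.startswith("R"):
            # e.g. "R1<TAB>Protein-Component Arg1:T1 Arg2:T2"
            ann_type, *arg_specs = fields[1].split(" ")
            args = dict(spec.split(":", 1) for spec in arg_specs)
            relations[ann_id] = Relation(ann_id, ann_type, args)
    return entities, relations


# Made-up sample lines for illustration only.
sample = [
    "T1\tProtein 0 4\tp105",
    "T2\tEntity 5 13\tpromoter",
    "R1\tProtein-Component Arg1:T1 Arg2:T2",
]
entities, relations = parse_standoff(sample)
```

Keeping annotations keyed by ID makes it straightforward to resolve the arguments of a relation back to the entities they refer to, for example `entities[relations["R1"].args["Arg1"]]`.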
As part of the GENIA Relation annotation effort, we introduced a relation ontology that aims to provide a detailed and broadly applicable set of relation types, based on accepted domain-standard concepts, for use in corpus annotation and domain information extraction approaches. To ensure that the meaning of the relations is explicit, they are specified in OWL (see download section). For maximal compatibility, we integrate categories and relations from several domain ontologies, including IAO, OBI, GO and the GENIA ontology. The basic relations between individuals are organized as displayed in the figure below, where R stands for reflexivity, S for symmetry, T for transitivity, Anti for anti-symmetry and AS for asymmetry.
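For readers less familiar with these formal properties, the following minimal sketch checks each of them on a relation represented as a finite set of ordered pairs. The toy relation and the function names are assumptions made for illustration. The sketch does not state which properties the ontology assigns to any particular relation; that information is given by the figure and the OWL specification.

```python
def is_reflexive(rel, domain):
    # R: every individual is related to itself.
    return all((x, x) in rel for x in domain)


def is_symmetric(rel):
    # S: whenever x is related to y, y is related to x.
    return all((y, x) in rel for (x, y) in rel)


def is_transitive(rel):
    # T: x related to y and y related to z implies x related to z.
    return all((x, z) in rel for (x, y) in rel for (w, z) in rel if y == w)


def is_antisymmetric(rel):
    # Anti: x related to y and y related to x implies x == y.
    return all(x == y for (x, y) in rel if (y, x) in rel)


def is_asymmetric(rel):
    # AS: x related to y excludes y related to x (so no self-pairs either).
    return all((y, x) not in rel for (x, y) in rel)


# Toy example (hypothetical individuals): a strict part-of-like ordering.
individuals = {"domain", "protein", "complex"}
strict = {("domain", "protein"), ("protein", "complex"), ("domain", "complex")}
# The same relation with every individual also related to itself.
with_self = strict | {(x, x) for x in individuals}
```

On this toy data, `strict` is transitive and asymmetric but not reflexive, whereas `with_self` is reflexive, transitive and anti-symmetric but no longer asymmetric. Adding the self-pairs is the only difference between the two relations.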
The development of the GENIA relation ontology also resulted in two novel ontology design patterns that are particularly suited to text mining applications, where the exact referent of a term cannot always be reliably determined. We refer to the publication "Applying ontology design patterns" for more information on these aspects of the annotation.
The latest revision of the GENIA relation annotation is available as the BioNLP Shared Task 2011 REL task corpus. This data is split into visible training and development sets and a "blind" test set. (For evaluation on the test set, please see the task homepage.)
Tomoko Ohta: GENIA corpus relation annotation coordinator
See also GENIA Project acknowledgments page