GENIA Corpus: Linguistic and Semantic Annotation of Biomedical Literature
Jin-Dong Kim
(GENIA, University of Tokyo)
The GENIA corpus is a collection of abstracts of journal articles on
molecular biology. The corpus has been annotated for a wide spectrum of
the information expressed in the text, from two perspectives. First,
the pieces of biological knowledge conveyed by the text have been
annotated, covering biological entities and events. Second, the
linguistic structures underlying the text have been annotated; this
layer includes the parts of speech of words and the syntactic structure
of sentences. By combining the two perspectives, we expect to identify
the linguistic structures that encode pieces of knowledge. This
presentation introduces the GENIA corpus with a primary focus on its
semantic annotation.
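The idea that aligning the semantic and linguistic layers can expose which linguistic structures encode a piece of knowledge can be sketched as follows. This is a minimal illustration that assumes both layers are stored as standoff character offsets over the same text. The classes `Token` and `Entity`, the helpers `build_tokens` and `pos_pattern`, the toy sentence, its tags and its entity labels are all hypothetical. They do not reproduce GENIA's actual file format or annotations.

```python
from dataclasses import dataclass


@dataclass
class Token:
    word: str
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    pos: str    # part-of-speech tag (Penn Treebank-style tagset, assumed)


@dataclass
class Entity:
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    label: str  # illustrative entity class (hypothetical, not from the corpus)


def build_tokens(tagged):
    """Join (word, tag) pairs with single spaces and record each token's offsets."""
    tokens, offset = [], 0
    for word, pos in tagged:
        tokens.append(Token(word, offset, offset + len(word), pos))
        offset += len(word) + 1  # skip the separating space
    text = " ".join(word for word, _ in tagged)
    return text, tokens


def pos_pattern(entity, tokens):
    """Return the POS tags of the tokens lying entirely inside the entity span."""
    return [t.pos for t in tokens if t.start >= entity.start and t.end <= entity.end]


# Linguistic layer: a hand-tagged toy sentence (tags are illustrative only).
tagged = [
    ("IL-2", "NN"), ("gene", "NN"), ("expression", "NN"), ("requires", "VBZ"),
    ("NF-kappa", "NN"), ("B", "NN"), ("activation", "NN"),
]
text, tokens = build_tokens(tagged)

# Semantic layer: entity spans given as character offsets into the same text.
entities = [Entity(0, 9, "DNA"), Entity(30, 40, "protein")]

for e in entities:
    print(e.label, repr(text[e.start:e.end]), pos_pattern(e, tokens))
```

Both toy entities map to the tag sequence `NN NN`. Collecting such patterns over many annotated spans is one simple way to look for recurring linguistic structures behind a semantic class. Standoff offsets are used here because they let each layer be annotated independently without altering the text.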