The Alchemy of Annotation: When Biologists Disagree
Ewan Klein
(School of Informatics, University of Edinburgh)
Many current approaches to text processing for the biomedical
literature involve some kind of manual annotation of text. This is
necessary for those techniques that involve supervised machine
learning. And even approaches which manage to do without training data
typically require manually annotated 'gold standard' test data.
The quality of the manually annotated data is crucial both to the
quality of the models learned from training data and to the accuracy
with which systems are measured against test data. This quality is
usually assessed in terms of Inter-annotator Agreement (IAA). Within
the research literature on biomedical text mining, IAA for named
entities is frequently reported and is usually fairly good (though
lower than for newswire text). By contrast, IAA for relation extraction
is less frequently reported, and when it is reported, tends to be
considerably lower than for named entities.
This paper will report on the IAA obtained from an exercise in which
four biologists annotated 750 abstracts and 150 full-text papers for
protein-protein interactions. A sample of 5% of these documents was
doubly annotated. In addition, 16 documents were marked up by all four
annotators in an initial training phase. We present an analysis of
where the annotators disagreed, both on entities and on relations
between those entities. Although a variety of factors conspire to
lower agreement for relations, we will show that most of them involve
an interplay between ontological judgements by the biologist,
linguistic characteristics of the text, the clarity of the annotation
guidelines, and ergonomic aspects of the annotation task. Although
annotating relations is intrinsically hard, we believe that an
analysis of this sort can point to ways of improving IAA.
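
The abstract does not say which agreement measure was used. As an illustration only, one common way to report IAA on doubly annotated documents is pairwise F1, treating one annotator's markup as the reference and the other's as the response. The sketch below assumes exact matching, with entities represented as hypothetical (start, end, type) tuples and relations as hypothetical (entity, entity, type) tuples; the function name and the sample labels are not taken from the paper.

```python
def pairwise_f1(reference, response):
    """Exact-match F1 between two annotators' sets of annotations.

    Each annotation is any hashable value. Here we assume (hypothetically)
    (start, end, type) tuples for entities and (entity_a, entity_b, type)
    tuples for relations.
    """
    ref, res = set(reference), set(response)
    if not ref and not res:
        return 1.0  # both annotators marked nothing: treat as full agreement
    matched = len(ref & res)
    precision = matched / len(res) if res else 0.0
    recall = matched / len(ref) if ref else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical entity annotations by two annotators on the same document.
entities_a = {(0, 5, "Protein"), (10, 15, "Protein"), (20, 25, "Protein")}
entities_b = {(0, 5, "Protein"), (10, 15, "Protein"), (30, 35, "Protein")}

# A relation matches only if both of its argument entities match exactly,
# so a single entity disagreement also costs agreement on its relations.
relations_a = {((0, 5, "Protein"), (20, 25, "Protein"), "PPI")}
relations_b = {((0, 5, "Protein"), (30, 35, "Protein"), "PPI")}

entity_iaa = pairwise_f1(entities_a, entities_b)
relation_iaa = pairwise_f1(relations_a, relations_b)
```

Under exact matching of this kind, a disagreement on one entity propagates to every relation that entity takes part in, which is one mechanical reason relation agreement can come out lower than entity agreement.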