Report on
Bio-Entity Recognition Task at BioNLP/NLPBA 2004

This report describes the shared task on biomedical entity recognition held from March to April 2004. The results were presented at the joint workshop of BioNLP/NLPBA 2004.

Task Definition

The task aims to identify and classify technical terms in the domain of molecular biology that correspond to instances of concepts that are of interest to biologists.

Data

The training data used in the task came from the GENIA corpus, version 3.02. This corpus was formed from a controlled search on MEDLINE using the MeSH terms 'human', 'blood cells' and 'transcription factors'. From this search, 2,000 abstracts were selected and hand-annotated according to a small taxonomy of 48 classes based on a chemical classification; 36 terminal classes among them were used to annotate the GENIA corpus. For the shared task, however, we decided to simplify these 36 classes and used only five: protein, DNA, RNA, cell line and cell type. The first three incorporate several subclasses from the original taxonomy, while the last two were retained to make the task realistic for post-processing by a potential template-filling application. The publication years of the training set range over 1990-1999.

For testing purposes we used a newly annotated collection of 404 MEDLINE abstracts from the GENIA project, annotated for the same classes of entities. Most of the test set consists of abstracts retrieved with the same set of MeSH terms, with publication years ranging over 1978-2001. To examine the effect of publication year, the test set was roughly divided into four subsets: a 1978-1989 set (which represents an old period from the viewpoint of models trained on the training set), a 1990-1999 set (the same period as the training set), a 2000-2001 set (a new period compared to the training set) and an S/1998-2001 set (roughly a new period in a super domain). The abstracts of this last subset were retrieved with the MeSH terms 'blood cells' and 'transcription factors' only (without 'human'), hence the super domain; note that the S/1998-2001 set includes the whole 2000-2001 set. The following table shows the sizes of the data sets.

 

                 # of abstracts   # of sentences       # of tokens
Training Set     2,000            20,546 (10.27/abs)   472,006 (236.00/abs) (22.97/sen)
Test Set
  Total          404              4,260 (10.54/abs)    96,780 (239.55/abs) (22.72/sen)
  1978-1989      104              991 ( 9.53/abs)      22,320 (214.62/abs) (22.52/sen)
  1990-1999      106              1,115 (10.52/abs)    25,080 (236.60/abs) (22.49/sen)
  2000-2001      130              1,452 (11.17/abs)    33,380 (256.77/abs) (22.99/sen)
  S/1998-2001    204              2,254 (11.05/abs)    51,628 (253.08/abs) (22.91/sen)

Evaluation

To reduce the annotation task to a simple linear sequential analysis problem, embedded structures were removed, leaving only the outermost structures (i.e. the longest tag sequences). Consequently, a group of coordinated entities involving ellipsis is annotated as one structure, as in the following example:

... in [lymphocytes] and [T- and B- lymphocyte] count in ...

In the example, "T- and B-lymphocyte" is annotated as one structure but involves two entity names, "T-lymphocyte" and "B-lymphocyte", whereas "lymphocytes" is annotated as one structure involving a single entity name.
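The released data encodes such annotations with one tag per token in IOB2 style (a B- prefix opens a chunk, I- continues it, O is outside). A minimal sketch of how the coordinated example above collapses into a single chunk; the token segmentation and the cell_type label here are illustrative assumptions, not the official annotation:

```python
# Illustrative IOB2 encoding of the coordination example above.
# Token splits and the cell_type label are assumptions for illustration.
tokens = ["in", "lymphocytes", "and", "T-", "and", "B-", "lymphocyte", "count"]
tags   = ["O", "B-cell_type", "O", "B-cell_type", "I-cell_type",
          "I-cell_type", "I-cell_type", "O"]

def iob2_chunks(tags):
    """Return (class, start, end) spans from an IOB2 tag sequence; end is exclusive."""
    chunks, start, label = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the last chunk
        if tag.startswith("B-") or (start is not None and not tag.startswith("I-")):
            if start is not None:
                chunks.append((label, start, i))
            start, label = (i, tag[2:]) if tag.startswith("B-") else (None, None)
    return chunks

print(iob2_chunks(tags))
# The coordinated "T- and B- lymphocyte" comes out as one chunk spanning four tokens.
```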

Results are given as F-scores computed with a modified version of the CoNLL evaluation script and are defined as F=(2PR)/(P+R), where P denotes precision and R recall. P is the ratio of the number of correctly found NE chunks to the number of found NE chunks, and R is the ratio of the number of correctly found NE chunks to the number of true NE chunks. The script outputs three sets of F-scores according to exact boundary matching, right boundary matching and left boundary matching. In right boundary matching only the right boundaries of entities are considered, ignoring the left boundaries, and vice versa.
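Given gold and predicted chunk sets, the three scores can be computed as follows. This is a sketch of the evaluation logic described above, not the shared-task script itself; chunks are represented as (class, left, right) triples:

```python
def f_scores(gold, pred, match="exact"):
    """Precision, recall and F over NE chunks given as (class, left, right).

    match: "exact" compares both boundaries; "left" or "right" compares
    only that boundary (plus the class), as in the task's three score sets.
    """
    def key(chunk):
        cls, left, right = chunk
        if match == "left":
            return (cls, left)
        if match == "right":
            return (cls, right)
        return (cls, left, right)

    gold_keys = {key(c) for c in gold}
    correct = sum(1 for c in pred if key(c) in gold_keys)
    p = correct / len(pred) if pred else 0.0   # precision
    r = correct / len(gold) if gold else 0.0   # recall
    f = 2 * p * r / (p + r) if p + r else 0.0  # F = (2PR)/(P+R)
    return p, r, f

gold = [("protein", 0, 2), ("cell_type", 5, 7)]
pred = [("protein", 0, 2), ("cell_type", 5, 8)]
print(f_scores(gold, pred))          # exact boundary match
print(f_scores(gold, pred, "left"))  # left boundaries only: both chunks count
```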

Performances

The following table lists the entity recognition performance of each participating system on each test set, given as recall / precision / F-score.

 

         1978-1989 set       1990-1999 set       2000-2001 set       S/1998-2001 set     Total
[Zho04]  75.3 / 69.5 / 72.3  77.1 / 69.2 / 72.9  75.6 / 71.3 / 73.8  75.8 / 69.5 / 72.5  76.0 / 69.4 / 72.6
[Fin04]  66.9 / 70.4 / 68.6  73.8 / 69.4 / 71.5  72.6 / 69.3 / 70.9  71.8 / 67.5 / 69.6  71.6 / 68.6 / 70.1
[Set04]  63.6 / 71.4 / 67.3  72.2 / 68.7 / 70.4  71.3 / 69.6 / 70.5  71.3 / 68.8 / 70.1  70.3 / 69.3 / 69.8
[Son04]  60.3 / 66.2 / 63.1  71.2 / 65.6 / 68.2  69.5 / 65.8 / 67.6  68.3 / 64.0 / 66.1  67.8 / 64.8 / 66.3
[Zha04]  63.2 / 60.4 / 61.8  72.5 / 62.6 / 67.2  69.1 / 60.2 / 64.7  69.2 / 60.3 / 64.4  69.1 / 61.0 / 64.8
[Rös04]  59.2 / 60.3 / 59.8  70.3 / 61.8 / 65.8  68.4 / 61.5 / 64.8  68.3 / 60.4 / 64.1  67.4 / 61.0 / 64.0
[Par04]  62.8 / 55.9 / 59.2  70.3 / 61.4 / 65.6  65.1 / 60.4 / 62.7  65.9 / 59.7 / 62.7  66.5 / 59.8 / 63.0
[Lee04]  42.5 / 42.0 / 42.2  52.5 / 49.1 / 50.8  53.8 / 50.9 / 52.3  52.3 / 48.1 / 50.1  50.8 / 47.6 / 49.1
BL       47.1 / 33.9 / 39.4  56.8 / 45.5 / 50.5  51.7 / 46.3 / 48.8  52.6 / 46.0 / 49.1  52.6 / 43.6 / 47.7

The baseline model (BL) uses lists of entities of each class collected from the training set and performs a longest-match search for entities through the test set. Ties between classes are broken by the frequency of the entity with each class.
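A minimal sketch of such a baseline, assuming the description above; the function names, data layout and example entities are illustrative, not taken from the shared-task distribution:

```python
from collections import Counter

def build_gazetteer(training_chunks):
    """Map token tuples to their most frequent class in the training data.

    training_chunks: iterable of (class, token_tuple) pairs, one per
    annotated entity occurrence in the training set.
    """
    counts = Counter(training_chunks)                 # (class, tokens) frequencies
    gaz = {}
    for (cls, toks), n in counts.most_common():       # most frequent first
        gaz.setdefault(toks, cls)                     # ties broken by frequency
    return gaz

def longest_match_tag(tokens, gaz):
    """Greedy longest-match lookup; returns (class, start, end) chunks."""
    max_len = max((len(t) for t in gaz), default=0)
    chunks, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            cand = tuple(tokens[i:i + n])
            if cand in gaz:                           # longest entity wins
                chunks.append((gaz[cand], i, i + n))
                i += n
                break
        else:
            i += 1                                    # no entity starts here
    return chunks

# Toy training annotations (illustrative only).
train = [("protein", ("NF-kappa", "B")), ("protein", ("NF-kappa", "B")),
         ("cell_type", ("T", "cells"))]
gaz = build_gazetteer(train)
print(longest_match_tag(["NF-kappa", "B", "activates", "T", "cells"], gaz))
```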

Downloads

Introductory Paper
The paper presented at the JNLPBA workshop, which gives an introduction to the task and an overall analysis of the participating systems.
Training Data
2,000 MEDLINE abstracts with term annotation.
Evaluation Data
404 MEDLINE abstracts, in one version with term annotation and one without. The evaluation tool is also included.
Tagging Results
Tagging results of the participating systems. Note that the original submissions have been cleaned (to remove illegal sequences of tags) and normalized (to include MEDLINE UIDs).
Evaluation Tool
Updated evaluation tool. Use this tool to obtain evaluation equivalent to that of the shared task.

References

[Zho04] GuoDong Zhou and Jian Su, "Exploring Deep Knowledge Resources in Biomedical Name Recognition", in Proceedings of the Joint Workshop on Natural Language Processing in Biomedicine and its Applications (JNLPBA-2004).

[Fin04] Jenny Finkel, Shipra Dingare, Huy Nguyen, Malvina Nissim, Gail Sinclair and Christopher Manning, "Exploiting Context for Biomedical Entity Recognition: From Syntax to the Web", in Proceedings of the Joint Workshop on Natural Language Processing in Biomedicine and its Applications (JNLPBA-2004).

[Set04] Burr Settles, "Biomedical Named Entity Recognition Using Conditional Random Fields and Novel Feature Sets", in Proceedings of the Joint Workshop on Natural Language Processing in Biomedicine and its Applications (JNLPBA-2004).

[Son04] Yu Song, Eunju Kim, Gary Geunbae Lee and Byoung-kee Yi, "POSBIOTM-NER in the shared task of BioNLP/NLPBA 2004", in Proceedings of the Joint Workshop on Natural Language Processing in Biomedicine and its Applications (JNLPBA-2004).

[Zha04] Shaojun Zhao, "Name Entity Recognition in Biomedical Text using a HMM model", in Proceedings of the Joint Workshop on Natural Language Processing in Biomedicine and its Applications (JNLPBA-2004).

[Rös04] Marc Rössler, "Adapting an NER-System for German to the Biomedical Domain", in Proceedings of the Joint Workshop on Natural Language Processing in Biomedicine and its Applications (JNLPBA-2004).

[Par04] Kyung-Mi Park, Seon-Ho Kim, Do-Gil Lee and Hae-Chang Rim. "Boosting Lexical Knowledge for Biomedical Named Entity Recognition", in Proceedings of the Joint Workshop on Natural Language Processing in Biomedicine and its Applications (JNLPBA-2004).

[Lee04] Chih Lee, Wen-Juan Hou and Hsin-Hsi Chen, "Annotating Multiple Types of Biomedical Entities: A Single Word Classification Approach", in Proceedings of the Joint Workshop on Natural Language Processing in Biomedicine and its Applications (JNLPBA-2004).


• last modification made on 14 October 2004 by Jin-Dong Kim.
• workshop homepage :
http://www.genisis.ch/~natlang/JNLPBA04/
• shared task homepage :
http://research.nii.ac.jp/~collier/workshops/JNLPBA04st.htm