Dictionary-based Approaches for Biomedical Term Recognition
Yoshimasa Tsuruoka
Dictionary-based technical term recognition is the first step for
practical information extraction from biomedical documents because,
unlike machine-learning-based approaches, it provides ID information
for the recognized terms. However, dictionary-based approaches have two
serious problems: (1) a large number of false recognitions, mainly
caused by short names, and (2) low recall due to spelling variation. In
this talk, we address the former problem by filtering out false positives
using a machine learning technique. We alleviate the latter problem
by using an approximate string searching method.
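
To make the idea of approximate dictionary matching concrete, here is a minimal sketch in Python. It is not the method presented in the talk, whose details the abstract does not give; it assumes plain Levenshtein edit distance and a brute-force scan, and the dictionary entries, the IDs, and the names `edit_distance` and `approximate_lookup` are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance, computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


# Toy dictionary: the names and IDs are made up for illustration only.
TOY_DICTIONARY = {
    "nf-kappa b": "ID:0001",
    "interleukin-2": "ID:0002",
    "tumor necrosis factor": "ID:0003",
}


def approximate_lookup(mention, dictionary=TOY_DICTIONARY, max_distance=2):
    """Return (distance, name, id) for entries within max_distance edits."""
    query = mention.lower()
    hits = []
    for name, term_id in dictionary.items():
        distance = edit_distance(query, name)
        if distance <= max_distance:
            hits.append((distance, name, term_id))
    return sorted(hits)


if __name__ == "__main__":
    print(approximate_lookup("NF-kappaB"))
    print(approximate_lookup("Interleukin 2"))
```

Because a hit carries the dictionary ID along with the matched name, a spelling variant such as "NF-kappaB" can still be linked to its entry. A fixed edit-distance threshold is a crude choice: it tends to over-match short names, which is exactly the false-positive problem described above.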
This talk also presents an algorithm for generating possible variants of
biomedical terms, which is potentially useful for query and dictionary
expansion.
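
As a rough illustration of variant generation, the following Python sketch enumerates spellings that differ only in how word boundaries are written (space, hyphen, or nothing). This single rule is an assumption chosen for illustration, not the algorithm presented in the talk, and the function name `generate_variants` is hypothetical.

```python
import re
from itertools import product


def generate_variants(term):
    """Enumerate variants that differ only in their word separators."""
    # Split on runs of spaces and hyphens, dropping empty pieces.
    parts = [p for p in re.split(r"[ -]+", term) if p]
    if not parts:
        return set()
    separators = ["", " ", "-"]
    variants = set()
    # Try every combination of separators for the gaps between parts.
    for seps in product(separators, repeat=len(parts) - 1):
        pieces = [parts[0]]
        for sep, part in zip(seps, parts[1:]):
            pieces.append(sep)
            pieces.append(part)
        variants.add("".join(pieces))
    return variants


if __name__ == "__main__":
    for variant in sorted(generate_variants("NF-kappa B")):
        print(variant)
```

The number of variants grows as 3 to the power of the number of gaps, so a practical generator would need to prune or rank candidates before adding them to a dictionary or query.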
Experimental results using the MEDLINE corpus indicate that our methods
significantly improve the precision and recall of dictionary-based
term recognition.