Protein Name Recognition by Approximate String Matching Algorithm
Introduction
Protein-protein interaction information can be extracted from
MEDLINE abstracts. The extracted information may be noisy, but the cost of
obtaining it is considerably lower than that of biochemical experiments.
An added benefit is that the protein IDs listed in the dictionaries
can be obtained as well.
Protein Name Recognition
Approximate String Matching
An excellent overview is provided by Navarro.
Protein names are extracted from existing databases.
PIR-NREF is a non-redundant protein database that includes data from
PIR-PSD, SWISS-PROT, TrEMBL, RefSeq, GenPept, and PDB.
The extracted names are noisy.
Elastic matching is needed.
Not ad hoc.
Edit distance.
DP matching
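As a minimal sketch of edit distance computed by DP matching, the following uses the standard Levenshtein formulation with unit costs for insertion, deletion, and substitution. The function name edit_distance is chosen here for illustration and does not come from the original system.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b, computed by dynamic programming."""
    # prev[j] is the distance between the characters of a processed so far
    # (initially none) and the first j characters of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        # Matching the first i characters of a against an empty prefix of b
        # costs i deletions.
        curr = [i]
        for j, cb in enumerate(b, start=1):
            substitute = prev[j - 1] + (0 if ca == cb else 1)
            delete = prev[j] + 1
            insert = curr[j - 1] + 1
            curr.append(min(substitute, delete, insert))
        prev = curr
    return prev[-1]
```

With unit costs, a dictionary name such as "NF-kappa B" and the text variant "NF kappa B" are at distance 1, so a small distance threshold would accept the variant.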
Elastic Matching Algorithm
Edit distance
Cost Function
1
[space]-[hyphen]
[numeral]-[numeral]
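The character pairs listed above can be plugged into the edit distance as a substitution cost function. The sketch below is only an illustration under stated assumptions: the outline does not give the cost values, so DEFAULT_COST, SPACE_HYPHEN_COST, and NUMERAL_COST are hypothetical placeholders, and treating both special pairs as cheaper than an ordinary substitution is itself an assumption.

```python
# Hypothetical placeholder values; the actual costs are not given in the outline.
DEFAULT_COST = 1.0
SPACE_HYPHEN_COST = 0.1   # assumed cost of swapping a space and a hyphen
NUMERAL_COST = 0.1        # assumed cost of replacing one digit with another


def substitution_cost(x: str, y: str) -> float:
    """Cost of replacing character x with character y."""
    if x == y:
        return 0.0
    if {x, y} == {" ", "-"}:
        return SPACE_HYPHEN_COST
    if x.isdigit() and y.isdigit():
        return NUMERAL_COST
    return DEFAULT_COST


def weighted_distance(name: str, candidate: str,
                      indel_cost: float = DEFAULT_COST) -> float:
    """Edit distance by dynamic programming with the cost function above."""
    # prev[j] is the cost of turning the processed prefix of name
    # into the first j characters of candidate.
    prev = [j * indel_cost for j in range(len(candidate) + 1)]
    for i, cn in enumerate(name, start=1):
        curr = [i * indel_cost]
        for j, cc in enumerate(candidate, start=1):
            curr.append(min(
                prev[j - 1] + substitution_cost(cn, cc),  # substitute or match
                prev[j] + indel_cost,                      # delete from name
                curr[j - 1] + indel_cost,                  # insert into name
            ))
        prev = curr
    return prev[-1]
```

Under these assumed values, "NF-kappa B" and "NF kappa B" differ only by a space-hyphen substitution and so receive the reduced cost rather than the default one. Whether numeral substitutions should be cheap or expensive is a design choice that the parameter tuning would have to settle.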
TODO
Tune the parameters automatically.
speedup