PathText/Refine Project

Many systems have been developed in the past few years to help researchers discover knowledge published as English text, for example in the PubMed database. At the same time, higher-level collective knowledge is often published in a graphical notation that represents all the entities in a pathway and their interactions. We believe that these pathway visualizations could serve as an effective user interface for knowledge discovery if they can be linked to the text in publications. Because the graphical elements of a pathway are very different in nature from their corresponding descriptions in English text, we have developed PathText to serve as a bridge between the two representations.

The PathText Project has the following goals:

to adapt and develop text mining tools to evaluate the basis in the literature for the structure of biochemical and signalling models in systems biology
to integrate text mining techniques with visualisation technologies for better understanding of the evidence for biochemical and signalling pathways
to enrich models encoded in the Systems Biology Markup Language (SBML) with information derived from text mining
to evaluate, in conjunction with domain experts, the effectiveness of pathway models augmented with evidence derived from text mining
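
As a rough illustration of the third goal, the following Python sketch attaches text-mined evidence to species in an SBML model as XHTML notes. It is not PathText's implementation: the toy model, the `attach_evidence` function and the evidence sentence are all hypothetical, and only the Python standard library is used.

```python
import xml.etree.ElementTree as ET

# Namespaces for SBML Level 2 Version 4 and for XHTML (used inside <notes>).
SBML_NS = "http://www.sbml.org/sbml/level2/version4"
XHTML_NS = "http://www.w3.org/1999/xhtml"
ET.register_namespace("", SBML_NS)
ET.register_namespace("xhtml", XHTML_NS)

# Hypothetical toy model; a real pathway model would be far larger.
MODEL = f"""<sbml xmlns="{SBML_NS}" level="2" version="4">
  <model id="toy_pathway">
    <listOfCompartments>
      <compartment id="c"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="s1" name="ProteinA" compartment="c"/>
      <species id="s2" name="ProteinB" compartment="c"/>
    </listOfSpecies>
  </model>
</sbml>"""


def attach_evidence(sbml_text, evidence):
    """Return SBML text in which each species named in `evidence`
    carries its snippets as paragraphs inside a <notes> element."""
    root = ET.fromstring(sbml_text)
    for species in root.iter(f"{{{SBML_NS}}}species"):
        snippets = evidence.get(species.get("name"), [])
        if not snippets:
            continue  # no evidence found for this species
        notes = ET.Element(f"{{{SBML_NS}}}notes")
        body = ET.SubElement(notes, f"{{{XHTML_NS}}}body")
        for text in snippets:
            paragraph = ET.SubElement(body, f"{{{XHTML_NS}}}p")
            paragraph.text = text
        species.insert(0, notes)  # place <notes> as the first child
    return ET.tostring(root, encoding="unicode")


# Hypothetical evidence, standing in for the output of a text mining system.
enriched = attach_evidence(
    MODEL, {"ProteinA": ["Hypothetical sentence mentioning ProteinA."]}
)
print(enriched)
```

In this sketch the evidence is keyed by species name for simplicity; a real system would more likely match on normalized database identifiers, since names are ambiguous.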

The current PathText prototype uses Payao, a Web 2.0 community tagging system for biological networks (see acknowledgements _b,c,d below), as its web-based user interface. Through it, the user can quickly find text snippets and articles related to the different parts of a pathway.

Text mining is carried out by the MEDIE, FACTA and KLEIO systems. These rely on a number of databases and on enabling technologies such as the Systems Biology Markup Language (SBML), the Systems Biology Graphical Notation (SBGN) (see acknowledgement _a below) and CellDesigner, the program used to display the graphical models.

Relevant software

GUI

Text Mining

Relevant web sites

PathText Demonstration Video

PathText is planned for release in the near future, but a public demonstration is not yet online. A video demonstration highlighting the key functionality and benefits of the project is available below.

Project Team

Principal Investigator: Sophia Ananiadou
Co-investigators: Jun'ichi Tsujii, Pedro Mendes
Researchers: Yoshimasa Tsuruoka, S. Amir Iqbal, Brian Kemper
Past researchers: Mashhuda Glencross, Duncan Hull

Funding

The PathText/Refine project is funded by the Biotechnology and Biological Sciences Research Council (BBSRC), grant code BB/E004431/1.

Publications

Sophia Ananiadou, Sampo Pyysalo, Jun'ichi Tsujii and Douglas B. Kell (2010). Event extraction for systems biology by text mining the literature. Trends in Biotechnology, 28(7), 381-390.
Brian Kemper, Takuya Matsuzaki, Yukiko Matsuoka, Yoshimasa Tsuruoka, Sophia Ananiadou, Jun'ichi Tsujii and Hiroaki Kitano (2010). PathText: A Text Mining Integrator for Biological Pathway Visualizations. Bioinformatics, 26(12), i374-i381.
Naoaki Okazaki, Sophia Ananiadou and Jun'ichi Tsujii (2010). Building a High Quality Sense Inventory for Improved Abbreviation Disambiguation. Bioinformatics, 26(9), 1246-1253.
Yoshimasa Tsuruoka, Jun'ichi Tsujii and Sophia Ananiadou (2009). Stochastic Gradient Descent Training for L1-regularized Log-linear Models with Cumulative Penalty. In Proceedings of ACL-IJCNLP, pp. 477-485.
Yoshimasa Tsuruoka, Jun'ichi Tsujii and Sophia Ananiadou (2009). Fast Full Parsing by Linear-Chain Conditional Random Fields. In Proceedings of EACL, pp. 790-798.
Duncan Hull, Steve Pettifer and Douglas B. Kell (2008). Defrosting the digital library: Bibliographic tools for the next generation web. PLoS Computational Biology, 4(10), e1000204. DOI:10.1371/journal.pcbi.1000204, pmid:18974831
Duncan Hull (2008). GO faster ChEBI with Reasonable Biochemistry. In Proceedings of OWL: Experiences and Directions (OWLED 2008), Fifth International Workshop, Karlsruhe, Germany, October 26-27, 2008. Available from Nature Precedings, DOI:10101/npre.2008.2329.1
Yoshimasa Tsuruoka, Jun'ichi Tsujii and Sophia Ananiadou (2008). Accelerating the annotation of sparse named entities by dynamic sentence selection. BMC Bioinformatics, 9(Suppl 11), S8. DOI:10.1186/1471-2105-9-S11-S8, pmid:19025694
Yoshimasa Tsuruoka, Jun'ichi Tsujii and Sophia Ananiadou (2008). FACTA: a text search engine for finding associated biomedical concepts. Bioinformatics, 24(21), 2559-2560. DOI:10.1093/bioinformatics/btn469, pmid:18772154
Sophia Ananiadou, Douglas B. Kell and Jun'ichi Tsujii (2006). Text Mining and its Potential Applications in Systems Biology. Trends in Biotechnology, 24(12), 571-579. DOI:10.1016/j.tibtech.2006.10.002, pmid:17045684

Other references

Yoshimasa Tsuruoka, John McNaught and Sophia Ananiadou (2008). Normalizing biomedical terms by minimizing ambiguity and variability. BMC Bioinformatics, 9(Suppl 3), S2. DOI:10.1186/1471-2105-9-S3-S2, pmid:18426547
Yutaka Sasaki, Yoshimasa Tsuruoka, John McNaught and Sophia Ananiadou (2008). How to make the most of NE dictionaries in statistical NER. BMC Bioinformatics, 9(Suppl 11), S5. DOI:10.1186/1471-2105-9-S11-S5, pmid:19025691
Yoshimasa Tsuruoka, John McNaught, Jun'ichi Tsujii and Sophia Ananiadou (2007). Learning string similarity measures for gene/protein name dictionary look-up using logistic regression. Bioinformatics, 23(20), 2768-2774. DOI:10.1093/bioinformatics/btm393, pmid:17698493
Chikashi Nobata, Philip Cotter, Naoaki Okazaki, Brian Rea, Yutaka Sasaki, Yoshimasa Tsuruoka, Jun'ichi Tsujii and Sophia Ananiadou (2008). Kleio: a knowledge-enriched information retrieval system for biology. In Proceedings of SIGIR, pp. 787-788.
Sophia Ananiadou (2008). Selected bibliography: Text-mining for Biomedicine.

Acknowledgements

Rune Saetre_a, Brian Kemper_a, Yoshimasa Tsuruoka_a, Kanae Oda_a, Naoaki Okazaki_a, Yukiko Matsuoka_b, Norihiro Kikuchi_c, Hiroaki Kitano_b,d, Sophia Ananiadou_e, Jun'ichi Tsujii_a,e

_a Computer Science, University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo 113-0033, Japan
_b The Systems Biology Institute, 6-31-15 Jingumae M31 6A, Shibuya-ku, Tokyo 150-0001, Japan
_c Mitsui Knowledge Industry Co., Ltd., 2-7-14 Higashinakano, Nakano-ku, Tokyo 164-8555, Japan
_d Okinawa Institute of Science and Technology, 7542 Onna, Onna-son, Kunigami, Okinawa 904-0411, Japan
_e National Centre for Text Mining (NaCTeM), 131 Princess Street, Manchester M1 7DN, United Kingdom
