Many systems have been developed in the past few years to help researchers discover knowledge published as English text, for example in the PubMed database. At the same time, higher-level collective knowledge is often published in a graphical notation that represents all the entities in a pathway and their interactions. We believe that these pathway visualizations could serve as an effective user interface for knowledge discovery if they can be linked to the text in publications. Because the graphical elements in a pathway are very different in nature from their corresponding descriptions in English text, we have developed PathText to serve as a bridge between the two.
The PathText Project has the following goals:
- to adapt and develop text mining tools to evaluate the basis in the literature for the structure of biochemical and signalling models in systems biology
- to integrate text mining techniques with visualisation technologies for better understanding of the evidence for biochemical and signalling pathways
- to enrich models encoded in the Systems Biology Markup Language (SBML) with information derived from text mining
- to evaluate in conjunction with domain experts the effectiveness of using models of pathways augmented with evidence derived from text mining
The current PathText prototype uses Payao, a Web 2.0 community tagging system for biological networks (see acknowledgements b, c, d below), as a web-based user interface that lets the user quickly find text snippets and articles related to the different parts of a pathway.
The text mining is carried out by the MEDIE, FACTA and KLEIO systems, which rely on many databases and on enabling technologies such as the Systems Biology Markup Language (SBML), the Systems Biology Graphical Notation (SBGN) (see acknowledgement a below), and the CellDesigner program, which is used to display graphical models.
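To make the link between a pathway model and the literature concrete, here is a minimal sketch of how the entities in an SBML model could be turned into keyword queries for a text search engine. It is not the PathText implementation: the toy model, the function names `species_names` and `reaction_queries`, and the `AND` query syntax are all assumptions made for illustration, and the real query formats of MEDIE, FACTA and KLEIO are not shown here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, minimal SBML Level 2 Version 4 model, used only for illustration.
SBML_EXAMPLE = """<sbml xmlns="http://www.sbml.org/sbml/level2/version4" level="2" version="4">
  <model id="toy_pathway">
    <listOfSpecies>
      <species id="s1" name="EGFR" compartment="c1"/>
      <species id="s2" name="GRB2" compartment="c1"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="r1" name="binding">
        <listOfReactants>
          <speciesReference species="s1"/>
          <speciesReference species="s2"/>
        </listOfReactants>
      </reaction>
    </listOfReactions>
  </model>
</sbml>
"""

# Namespace of the SBML level/version used by the toy model above.
NS = {"sbml": "http://www.sbml.org/sbml/level2/version4"}


def species_names(root):
    """Map each species id to its readable name, falling back to the id."""
    return {
        s.get("id"): s.get("name") or s.get("id")
        for s in root.iterfind(".//sbml:species", NS)
    }


def reaction_queries(sbml_text):
    """Build one keyword query per reaction from the names of its reactants.

    The "A AND B" form is an assumed, generic query syntax.
    """
    root = ET.fromstring(sbml_text)
    names = species_names(root)
    queries = {}
    for reaction in root.iterfind(".//sbml:reaction", NS):
        reactants = [
            names[ref.get("species")]
            for ref in reaction.iterfind(
                "sbml:listOfReactants/sbml:speciesReference", NS
            )
        ]
        queries[reaction.get("id")] = " AND ".join(reactants)
    return queries


if __name__ == "__main__":
    for reaction_id, query in reaction_queries(SBML_EXAMPLE).items():
        print(reaction_id, "->", query)
```

In a sketch like this, each reaction in the diagram gets its own query, so a user interface could show the matching snippets next to the corresponding part of the pathway.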
PathText Demonstration Video
PathText is planned for release in the near future, but a public demonstration is not yet online. The video demonstration below highlights the key functionality and benefits of the project.
Project Team
Principal Investigator: Sophia Ananiadou
Co-investigators: Jun'ichi Tsujii, Pedro Mendes
Researchers: Yoshimasa Tsuruoka, S. Amir Iqbal, Brian Kemper
Past researchers: Mashhuda Glencross, Duncan Hull
The PathText/Refine project is funded by the Biotechnology and Biological Sciences Research Council (BBSRC): grant code BB/E004431/1
- Sophia Ananiadou, Sampo Pyysalo, Jun'ichi Tsujii and Douglas B. Kell (2010). Event extraction for systems biology by text mining the literature. Trends in Biotechnology, 28(7), 381-390.
- Brian Kemper, Takuya Matsuzaki, Yukiko Matsuoka, Yoshimasa Tsuruoka, Sophia Ananiadou, Jun'ichi Tsujii and Hiroaki Kitano. (2010). PathText: A Text Mining Integrator for Biological Pathway Visualizations. Bioinformatics, 26(12), i374-i381.
- Naoaki Okazaki, Sophia Ananiadou, and Jun'ichi Tsujii. (2010). Building a High Quality Sense Inventory for Improved Abbreviation Disambiguation. Bioinformatics, 26(9), 1246-1253.
- Yoshimasa Tsuruoka, Jun'ichi Tsujii, and Sophia Ananiadou. 2009. Stochastic Gradient Descent Training for L1-regularized Log-linear Models with Cumulative Penalty, In Proceedings of ACL-IJCNLP, pp. 477-485.
- Yoshimasa Tsuruoka, Jun'ichi Tsujii, and Sophia Ananiadou. 2009. Fast Full Parsing by Linear-Chain Conditional Random Fields. In Proceedings of EACL, pp. 790-798.
- Duncan Hull, Steve Pettifer and Douglas B. Kell. 2008. Defrosting the digital library: Bibliographic tools for the next generation web. PLoS Computational Biology, 4(10):e1000204+. DOI:10.1371/journal.pcbi.1000204, pmid:18974831
- Duncan Hull. 2008. GO faster ChEBI with Reasonable Biochemistry, Proceedings of OWL: Experiences and Directions (OWLED 2008) Fifth International Workshop, Karlsruhe, Germany, October 26-27, 2008. Available from Nature Precedings DOI:10101/npre.2008.2329.1
- Yoshimasa Tsuruoka, Jun'ichi Tsujii, and Sophia Ananiadou. 2008. Accelerating the annotation of sparse named entities by dynamic sentence selection, BMC Bioinformatics, 9(Suppl 11):S8. DOI:10.1186/1471-2105-9-S11-S8, pmid:19025694
- Yoshimasa Tsuruoka, Jun'ichi Tsujii, and Sophia Ananiadou. 2008. FACTA: a text search engine for finding associated biomedical concepts, Bioinformatics, Vol. 24, No. 21, pp. 2559-2560. pmid:18772154, DOI:10.1093/bioinformatics/btn469.
- Sophia Ananiadou, Douglas B. Kell and Jun'ichi Tsujii. 2006. Text Mining and its Potential Applications in Systems Biology, Trends in Biotechnology, 24(12), 571-579. DOI:10.1016/j.tibtech.2006.10.002, pmid:17045684
- Yoshimasa Tsuruoka, John McNaught, and Sophia Ananiadou. 2008. Normalizing biomedical terms by minimizing ambiguity and variability, BMC Bioinformatics 2008, 9(Suppl 3):S2. pmid:18426547, DOI:10.1186/1471-2105-9-S3-S2
- Yutaka Sasaki, Yoshimasa Tsuruoka, John McNaught, and Sophia Ananiadou. 2008. How to make the most of NE dictionaries in statistical NER, BMC Bioinformatics, 9(Suppl 11):S5. pmid:19025691, DOI:10.1186/1471-2105-9-S11-S5
- Yoshimasa Tsuruoka, John McNaught, Jun'ichi Tsujii and Sophia Ananiadou. 2007. Learning string similarity measures for gene/protein name dictionary look-up using logistic regression, Bioinformatics, 23(20):2768-2774. DOI:10.1093/bioinformatics/btm393, pmid:17698493
- Chikashi Nobata, Philip Cotter, Naoaki Okazaki, Brian Rea, Yutaka Sasaki, Yoshimasa Tsuruoka, Jun-ichi Tsujii, and Sophia Ananiadou (2008) Kleio: a knowledge-enriched information retrieval system for biology, Proceedings of SIGIR, pp. 787-788
- Sophia Ananiadou (2008) Selected bibliography: Text-mining for Biomedicine
Acknowledgements
Rune Saetre (a), Brian Kemper (a), Yoshimasa Tsuruoka (a), Kanae Oda (a), Naoaki Okazaki (a), Yukiko Matsuoka (b), Norihiro Kikuchi (c), Hiroaki Kitano (b, d), Sophia Ananiadou (e), Junichi Tsujii (a, e)
(a) Computer Science, University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo 113-0033 Japan
(b) The Systems Biology Institute, 6-31-15 Jingumae M31 6A, Shibuya-ku, Tokyo 150-0001 Japan
(c) Mitsui Knowledge Industry Co., Ltd., 2-7-14 Higashinakano, Nakano-Ku, Tokyo 164-8555 Japan
(d) Okinawa Institute of Science and Technology, 7542 Onna, Onna-Son, Kunigami, Okinawa 904-0411 Japan
(e) National Centre for Text Mining (NaCTeM), 131 Princess Street, Manchester M1 7DN United Kingdom