PathText/Refine Project

Many systems have been developed in the past few years to help researchers discover knowledge published as English text, for example in the PubMed database. At the same time, higher-level collective knowledge is often published in a graphical notation that represents all the entities in a pathway and their interactions. We believe that these pathway visualizations could serve as an effective user interface for knowledge discovery if they can be linked to the text in publications. Because the graphical elements in a pathway are very different in nature from their corresponding descriptions in English text, we have developed PathText to serve as a bridge between these two representations.

The PathText Project has the following goals:

  • to adapt and develop text mining tools to evaluate the basis in the literature for the structure of biochemical and signalling models in systems biology
  • to integrate text mining techniques with visualisation technologies for better understanding of the evidence for biochemical and signalling pathways
  • to enrich models encoded in the Systems Biology Markup Language (SBML) with information derived from text mining
  • to evaluate in conjunction with domain experts the effectiveness of using models of pathways augmented with evidence derived from text mining
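
As a rough illustration of the third goal, the sketch below attaches text-mined evidence sentences to the species of an SBML model as `<notes>` elements. It is a minimal sketch under stated assumptions, not the PathText implementation: the toy model, the `attach_evidence` function, the shape of the `evidence` mapping and the choice of `<notes>` as the carrier are all hypothetical, and the identifiers and sentences passed in are placeholders rather than real text mining output.

```python
import xml.etree.ElementTree as ET

SBML_NS = "http://www.sbml.org/sbml/level2/version4"
XHTML_NS = "http://www.w3.org/1999/xhtml"

# A tiny hypothetical model used only for illustration.
MODEL = f"""<sbml xmlns="{SBML_NS}" level="2" version="4">
  <model id="toy_pathway">
    <listOfCompartments>
      <compartment id="c"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="s1" name="EGFR" compartment="c"/>
      <species id="s2" name="GRB2" compartment="c"/>
    </listOfSpecies>
  </model>
</sbml>"""


def attach_evidence(sbml_text, evidence):
    """Return SBML text with evidence sentences added as <notes> per species.

    `evidence` is an assumed structure: it maps a species name to a list of
    (source_id, sentence) pairs produced by some text mining step.
    """
    # Keep SBML as the default namespace and give XHTML an explicit prefix.
    ET.register_namespace("", SBML_NS)
    ET.register_namespace("html", XHTML_NS)
    root = ET.fromstring(sbml_text)
    for species in root.iter(f"{{{SBML_NS}}}species"):
        hits = evidence.get(species.get("name"), [])
        if not hits:
            continue  # leave species without evidence untouched
        notes = ET.Element(f"{{{SBML_NS}}}notes")
        body = ET.SubElement(notes, f"{{{XHTML_NS}}}body")
        for source_id, sentence in hits:
            paragraph = ET.SubElement(body, f"{{{XHTML_NS}}}p")
            paragraph.text = f"{source_id}: {sentence}"
        # Insert <notes> as the first child of the species element.
        species.insert(0, notes)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Placeholder evidence; not taken from any real publication.
    demo = {"EGFR": [("SOURCE-PLACEHOLDER", "Placeholder evidence sentence.")]}
    print(attach_evidence(MODEL, demo))
```

Storing the sentences as XHTML inside `<notes>` keeps them human-readable in tools that display SBML notes; machine-readable links would more naturally go in an `<annotation>` element, which this sketch does not attempt.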

The current PathText prototype uses Payao, a Web 2.0 community tagging system for biological networks (see acknowledgements b, c and d below), as a web-based user interface that lets the user quickly find text snippets and articles related to the different parts of the pathway.

The text mining is carried out by the MEDIE, FACTA and KLEIO systems, which rely on many databases and on other enabling technologies such as the Systems Biology Markup Language (SBML), the Systems Biology Graphical Notation (SBGN) (see acknowledgement a below) and the CellDesigner program, which is used to display graphical models.

PathText Demonstration Video

The PathText system is planned for release in the near future, but a public demonstration is not yet online. A video demonstration highlighting the key functionality and benefits of the project is available.

Project Team

Principal Investigator: Sophia Ananiadou
Co-investigators: Jun'ichi Tsujii, Pedro Mendes
Researchers: Yoshimasa Tsuruoka, S. Amir Iqbal, Brian Kemper
Past researchers: Mashhuda Glencross, Duncan Hull


The PathText/Refine project is funded by the Biotechnology and Biological Sciences Research Council (BBSRC): grant code BB/E004431/1


Other references


Rune Saetre (a), Brian Kemper (a), Yoshimasa Tsuruoka (a), Kanae Oda (a), Naoaki Okazaki (a), Yukiko Matsuoka (b), Norihiro Kikuchi (c), Hiroaki Kitano (b, d), Sophia Ananiadou (e), Junichi Tsujii (a, e)

(a) Computer Science, University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo 113-0033, Japan
(b) The Systems Biology Institute, 6-31-15 Jingumae M31 6A, Shibuya-ku, Tokyo 150-0001, Japan
(c) Mitsui Knowledge Industry Co., Ltd., 2-7-14 Higashinakano, Nakano-ku, Tokyo 164-8555, Japan
(d) Okinawa Institute of Science and Technology, 7542 Onna, Onna-son, Kunigami, Okinawa 904-0411, Japan
(e) National Centre for Text Mining (NaCTeM), 131 Princess Street, Manchester M1 7DN, United Kingdom