NaCTeM

PathText Pathway/Text Mining Bridge Project

Many systems have been developed in the past few years to assist researchers in the discovery of knowledge published as English text, for example in the PubMed database. At the same time, higher-level collective knowledge is often published using a graphical notation representing all the entities in a pathway and their interactions. We believe that these pathway visualizations could serve as an effective user interface for knowledge discovery if they can be linked to the text in publications. Since the graphical elements in a pathway are very different in nature from their corresponding descriptions in English text, we have developed PathText to serve as a bridge between these two representations.

PathText is being developed to connect Natural Language Processing (NLP) technology to the graphs and diagrams that are so often used by biologists. The current prototype uses the Payao Web 2.0 community tagging system for biological networks (see acknowledgements b, c, d below) as a web-based user interface that lets the user quickly find text snippets and articles related to the different parts of the pathway.

The text mining is performed by the MEDIE, Facta and KLEIO systems. These systems rely on many databases and other enabling technologies, such as the Systems Biology Markup Language (SBML) (Hucka et al., 2003), the Systems Biology Graphical Notation (SBGN) (Kitano et al., 2007a) and the CellDesigner program (Funahashi et al., 2007), which is used to display graphical models.



PathText Demonstration Video

The PathText project is planned for release in the near future, but a public demonstration is not yet online. A video demonstration is available below that highlights the key functionality and benefits of the project.




Acknowledgements

Rune Saetre (a), Brian Kemper (a), Yoshimasa Tsuruoka (a), Kanae Oda (a), Naoaki Okazaki (a), Yukiko Matsuoka (b), Norihiro Kikuchi (c), Hiroaki Kitano (b, d), Sophia Ananiadou (e), Junichi Tsujii (a, e)

(a) Computer Science, University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo 113-0033 Japan
(b) The Systems Biology Institute, 6-31-15 Jingumae M31 6A, Shibuya-ku, Tokyo 150-0001 Japan
(c) Mitsui Knowledge Industry Co., Ltd., 2-7-14 Higashinakano, Nakano-Ku, Tokyo 164-8555 Japan
(d) Okinawa Institute of Science and Technology, 7542 Onna, Onna-Son, Kunigami, Okinawa 904-0411 Japan
(e) National Centre for Text Mining (NaCTeM), 131 Princess Street, Manchester M1 7DN United Kingdom