Introduction

FACTA+ is an advanced text mining tool that helps you discover associations between biomedical concepts in MEDLINE articles. The whole MEDLINE corpus, containing more than 20 million articles, is indexed with an efficient text search engine, so you can navigate these associations and their textual evidence in a highly interactive manner: the system accepts arbitrary query terms and displays relevant concepts immediately. A broad range of important biomedical concepts is covered by combining a machine learning-based term recognizer with large-scale dictionaries for genes, proteins, diseases, and chemical compounds.

Quick Start Guide

1. Starting a search

Type a query in the text box and press the "Find Relevant Concepts" button. A query can be an arbitrary word (e.g. "p53"), or a concept ID (e.g. "UNIPROT:P04637"). The types of concept IDs that are accepted by the system are given in Appendix A. Queries can include AND or OR as boolean operators. A query that contains several different operators needs round brackets to resolve ambiguities [e.g. "(UNIPROT:P04637 AND (lung OR gastric))"]. The AND operator can be omitted (e.g. "p53 blood" and "p53 AND blood" are equivalent).
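The bracketing rule above can be illustrated with a small helper that assembles query strings. This is only a sketch: the function names all_of and any_of are hypothetical and not part of FACTA+. They just produce text that you could paste into the query box, always adding round brackets so that mixed AND/OR queries stay unambiguous.

```python
def _join(operator: str, parts: tuple) -> str:
    """Join sub-queries with a boolean operator and wrap them in round brackets."""
    if not parts:
        raise ValueError("at least one sub-query is required")
    if len(parts) == 1:
        # A single term needs neither an operator nor brackets.
        return parts[0]
    return "(" + f" {operator} ".join(parts) + ")"


def all_of(*parts: str) -> str:
    """Hypothetical helper: every sub-query must match (AND)."""
    return _join("AND", parts)


def any_of(*parts: str) -> str:
    """Hypothetical helper: at least one sub-query must match (OR)."""
    return _join("OR", parts)


# Rebuild the example query from the text above.
query = all_of("UNIPROT:P04637", any_of("lung", "gastric"))
print(query)  # (UNIPROT:P04637 AND (lung OR gastric))
```

Because the AND operator can be omitted, "p53 blood" is equivalent to the output of all_of("p53", "blood"); the helper simply writes the operator out explicitly.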



2. Browsing relevant concepts

FACTA+ then retrieves the documents that match the query from MEDLINE. The concepts mentioned in the retrieved documents are counted and ranked according to their relevance to the query. The results are presented in a tabular format. You can choose a different ranking scheme by clicking a link just above the table. Currently, FACTA+ supports the following three ranking criteria: frequency, pointwise mutual information, and symmetric conditional probability.
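To make the three ranking criteria more concrete, the sketch below computes them from document counts using their common textbook definitions. It is an illustration under assumptions, not FACTA+'s actual implementation: the function names, the natural logarithm, and the toy counts are all made up for this example.

```python
import math


def pmi(n_xy: int, n_x: int, n_y: int, n_total: int) -> float:
    """Pointwise mutual information: log( p(x, y) / (p(x) * p(y)) )."""
    if n_xy == 0:
        return float("-inf")
    return math.log((n_xy * n_total) / (n_x * n_y))


def scp(n_xy: int, n_x: int, n_y: int) -> float:
    """Symmetric conditional probability: p(x | y) * p(y | x)."""
    return (n_xy * n_xy) / (n_x * n_y)


def rank(counts: dict, n_query: int, n_total: int, scheme: str = "frequency") -> list:
    """Return concept names ordered from most to least relevant.

    counts maps a concept name to (documents shared with the query,
    documents mentioning the concept anywhere in the corpus).
    """
    def score(item):
        _, (n_xy, n_concept) = item
        if scheme == "frequency":
            return n_xy
        if scheme == "pmi":
            return pmi(n_xy, n_query, n_concept, n_total)
        if scheme == "scp":
            return scp(n_xy, n_query, n_concept)
        raise ValueError(f"unknown ranking scheme: {scheme}")

    return [name for name, _ in sorted(counts.items(), key=score, reverse=True)]


# Toy counts (invented): the query matches 100 of 10,000 documents.
toy_counts = {
    "common_concept": (50, 5000),   # frequent everywhere in the corpus
    "specific_concept": (20, 40),   # rare, but mostly seen with the query
}
for scheme in ("frequency", "pmi", "scp"):
    print(scheme, rank(toy_counts, n_query=100, n_total=10000, scheme=scheme))
```

With these toy counts, frequency puts common_concept first, while pointwise mutual information and symmetric conditional probability both favor specific_concept, which is why switching the ranking scheme can change the table noticeably.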


3. Viewing snippets

By clicking a document icon next to a concept name in the table, you can then view snippets that describe the association.



Discovering Indirect Associations

FACTA+ can also help you find concepts that are indirectly associated with the query. The idea is to combine multiple direct associations to retrieve concepts whose associations with the query are not obvious (see the figure below).

All you need to do is: (1) specify the category of target concepts (the ones that you are interested in as the final outcome), (2) specify the category of pivot concepts (the ones that interlink your query and the target concepts), and (3) press the "Find Indirectly Associated Concepts" button.



FACTA+ then produces a ranked list of target concepts together with information about how they are associated with your query. The target concepts are ranked based on their "expected information" (the first column in the table), where the "information" (the second column) represents how surprising the association is in terms of direct co-occurrence statistics. The values shown in the fourth and sixth columns are the strengths of the associations between the target and pivot concepts, and between the pivot concepts and the query, respectively. You can also see textual evidence (snippets) of those associations by clicking them.
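The idea of chaining direct associations through pivot concepts can be sketched as follows. This is a toy illustration only: the concept names and strengths are invented, and the score used for ordering (the sum over pivots of the product of the two strengths) is a placeholder assumption, not the "expected information" measure that FACTA+ computes.

```python
from collections import defaultdict


def indirect_targets(query_to_pivot: dict, pivot_to_target: dict) -> list:
    """Chain query -> pivot -> target associations.

    query_to_pivot maps each pivot to its association strength with the query.
    pivot_to_target maps each pivot to a dict of {target: strength}.
    Returns (target, placeholder_score, evidence) tuples, best first, where
    evidence lists (pivot, target-pivot strength, pivot-query strength).
    """
    scores = defaultdict(float)
    evidence = defaultdict(list)
    for pivot, s_query_pivot in query_to_pivot.items():
        for target, s_pivot_target in pivot_to_target.get(pivot, {}).items():
            # Placeholder scoring (an assumption, not FACTA+'s formula).
            scores[target] += s_pivot_target * s_query_pivot
            evidence[target].append((pivot, s_pivot_target, s_query_pivot))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [(target, scores[target], evidence[target]) for target in ranked]


# Invented toy data: two pivots link the query to two targets.
query_to_pivot = {"pivot_A": 0.8, "pivot_B": 0.5}
pivot_to_target = {
    "pivot_A": {"target_X": 0.5, "target_Y": 0.25},
    "pivot_B": {"target_X": 0.5},
}
for target, score, links in indirect_targets(query_to_pivot, pivot_to_target):
    print(target, round(score, 3), links)
```

Here target_X ranks above target_Y because it is reachable through both pivots. The evidence tuples play the role of the pivot columns in the result table, pairing each pivot with its two association strengths.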



Appendix A: Concept IDs

This table shows the types of concept IDs that can be used in a query.

Concept Type          Format                                   Example of concept ID
Human Gene/Protein    UNIPROT:*                                UNIPROT:P04637
Disease/Symptom       UMLS:*                                   UMLS:C0011849
Drug                  KEGG:*                                   KEGG:D02118
Enzyme                EC:*.*.*.*                               EC:3.4.23.15
Chemical compound     CAS:*-*-*                                CAS:7782-44-7
Biomolecular events   GENIA:* (see below for the full list)    GENIA:Gene_expression

The table below shows the full list of concept IDs for the biomolecular events that are currently recognized by FACTA+. These are the event types defined in the BioNLP'09 shared task.

Concept ID                   Description
GENIA:Gene_expression        Biomolecular events related to protein production and breakdown
GENIA:Transcription          (same as above)
GENIA:Protein_catabolism     (same as above)
GENIA:Localization           A change of the location or presence of a protein
GENIA:Binding                Binding of two or more proteins (including homodimerization); binding of a protein and DNA
GENIA:Phosphorylation        Addition of a phosphate group to a protein or other organic molecule
GENIA:Regulation             Regulatory or causal relations between events and proteins
GENIA:Positive_regulation    (same as above)
GENIA:Negative_regulation    (same as above)
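If you build queries programmatically, it can help to check that a string looks like one of the concept ID formats listed above before submitting it. The sketch below is an assumption-laden illustration: the function name concept_type is hypothetical, and the regular expressions are a loose reading of the wildcard formats in the table (each * is treated as a run of non-space characters), not the rules FACTA+ itself applies.

```python
import re

# Event names taken from the table of biomolecular events above.
GENIA_EVENTS = (
    "Gene_expression", "Transcription", "Protein_catabolism",
    "Localization", "Binding", "Phosphorylation",
    "Regulation", "Positive_regulation", "Negative_regulation",
)

# Loose patterns derived from the "Format" column (an interpretation).
CONCEPT_PATTERNS = [
    ("Human Gene/Protein", re.compile(r"UNIPROT:\S+")),
    ("Disease/Symptom", re.compile(r"UMLS:\S+")),
    ("Drug", re.compile(r"KEGG:\S+")),
    ("Enzyme", re.compile(r"EC:[^.\s]+\.[^.\s]+\.[^.\s]+\.[^.\s]+")),
    ("Chemical compound", re.compile(r"CAS:[^-\s]+-[^-\s]+-[^-\s]+")),
    ("Biomolecular events", re.compile("GENIA:(?:" + "|".join(GENIA_EVENTS) + ")")),
]


def concept_type(term: str):
    """Return the concept type for a concept ID, or None for a plain word."""
    for type_name, pattern in CONCEPT_PATTERNS:
        if pattern.fullmatch(term):
            return type_name
    return None


for example in ("UNIPROT:P04637", "EC:3.4.23.15", "CAS:7782-44-7", "p53"):
    print(example, "->", concept_type(example))
```

A return value of None does not mean the query is invalid; arbitrary words such as "p53" are accepted as queries too. It only means the string does not match any of the concept ID formats in this appendix.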

Publications

  • Yoshimasa Tsuruoka, Jun'ichi Tsujii, and Sophia Ananiadou. 2008. FACTA: a text search engine for finding associated biomedical concepts, Bioinformatics, Vol. 24, No. 21, pp. 2559-2560.
  • Yoshimasa Tsuruoka, Makoto Miwa, Kaisei Hamamoto, Jun'ichi Tsujii, and Sophia Ananiadou. 2011. Discovering and visualizing indirect associations between biomedical concepts, Bioinformatics, Vol. 27 ISMB 2011, pp. i111-i119.