Information Extraction

If you have some software which you would like us to add to this list, or if you would like us to change or remove an entry, please contact us.

ABNER: a downloadable Java API. ABNER is an information extraction tool capable of identifying gene and protein names in plain text and SGML input.
Arrowsmith: a PubMed article search and subsequent term filtering tool.
BADGER: "...analyzes bracketed text and produces case frame instantiations according to application-specific domain guidelines."
BioRAT: "a downloadable Java API based PubMed article search and relationship extraction tool based on the GATE architecture (required)."
BITOLA: a MeSH based biomedical concept relationship discovery tool.
Chilibot: created by the Dept. of Pharmacology at the University of Tennessee Health Science Centre, Chilibot is a web-based PubMed mining tool that identifies and extracts interactions between genes, proteins and other concepts of interest.
Dragon Resources: "a collection of tools for querying drug, gene, disease association, enzyme and pathway specific information extracted from PubMed documents." The collection comprises Dragon Plant Biology Explorer (A. thaliana), Streptomyces Explorer, Metabolome Explorer (A. thaliana), Disease Explorer and three transcription factor (TF) tools: TF Association Miner, TF Interaction Miner and TF Relation Extractor. Please note that authorisation is required to use the TF Relation Extractor.
EDGAR: short for 'Extraction of Drugs, Genes and Relations', it is a natural language processing system that extracts information about drugs and genes relevant to cancer from the biomedical literature.
FACTS: a downloadable Perl-based functional annotation tool that combines sequence similarity with data inferred from text.
G2D: an information resource for disease associated RefSeq genes.
GAPSCORE: a gene and protein name finder available both as a web service and through a web interface.
GeneScene: "an emerging database of navigable gene regulatory pathways, currently network information is limited to the ap1, p53 and 'Yeast' genes."
GENIES: a natural-language processing system for the extraction of molecular pathways from journal articles. Friedman et al., Bioinformatics. 2001;17 Suppl 1:S74-82.
GNF SymAtlas: a searchable database of SymAtlas and GeneAtlas expression data.
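GoMiner: a downloadable Java-based search tool for identifying and clustering genes based on GO annotation hierarchies.
HAPI: an annotation tool that accepts tab-delimited array data and associates it with hierarchical keyword annotations drawn from the biomedical literature.
Harvester: a tool for querying protein information across multiple bioinformatics databases.
hypKNOWsys: hypKNOWsys aims to develop a Java-based workbench for knowledge discovery and knowledge management. So far, hypKNOWsys has released two intermediate tools: DIAsDEM Workbench (text mining for semantic tagging) and WUMprep (Web mining pre-processing).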
iHOP
InterWeaver: InterWeaver combines conventional sequence homology searches with literature mining techniques and is designed for characterising unknown proteins.
iProLINK: "a searchable protein information resource, built on data extracted from PubMed, UniProt, Protein Information Resource (PIR) and GO."
KAT: a tool that expands the annotation information and MeSH description for queried SwissProt identifier(s) or those contained within the text of provided PubMed PMIDs.
KEA: automatically extracts keyphrases from the full text of documents using rudimentary lexical processing and feature calculation. Machine learning is used to build a classifier that determines which candidate phrases should be assigned as keyphrases.
KeX: a rule-based gene and protein name finder that works on plain text or MEDLINE report formats.
KH Coder: for content analysis or text mining of Japanese-language data. KH Coder builds a text corpus and lets you generate concordances, search for words, phrases, sentences and paragraphs, obtain statistical features of the data, and more.
LitLinker: a MEDLINE based concept relationship discovery tool.
MedBlast: a search tool for augmenting the sequence annotation information gathered from BLAST search results.
MedGene Database: an information resource for disease associated genes.
MedLEE: "a clinical data extraction, structuring and encoding for automated processes tool."
MedMiner: a bio-literature text mining tool with application to gene expression profiling.
microGENIE: "a sequence id annotation tool based on combining data from PubMed, UniGene and Swissprot."
Mutis Full Text Search Engine: Mutis is a Delphi port of the Lucene search engine. It provides a flexible, high-performance API for indexing, cataloguing and searching text-based information.
The ONDEX Suite: A framework for text mining, data integration and data analysis.
PASTA Project Website: a browsable database of information on the amino acid residues in the binding sites of some of the proteins recorded in the Protein Data Bank (PDB). The simple PASTAWeb interface provides protein lists for random exploration.
POSBIOTM: a text processing workbench designed by the Intelligent Software Lab specifically for the analysis of biomedical documents. It currently comprises process management, named entity extraction (POSBIOTM/NER), event extraction (POSBIOTM/Event Extraction) and annotation tools that can identify various forms of biological entities in the analysed text and the interactions between them.
Suiseki: a graphically depicted database of extracted information on protein-protein interactions.
Telemakus: a concept relationship discovery tool specifically based on the science of aging.
Text Analysis Markup System: Text Analysis Markup System (TAMS) is both a system of marking documents for qualitative analysis and a series of tools for mining information based on that syntax.
Textpresso: An advanced retrieval and extraction engine for C. elegans resources that has been developed at Wormbase. As of October 2005 the corpus includes 6,500 full text papers and 20,400 abstracts. Entries are updated weekly.
TextToOnto: TextToOnto aims to support developers in the ontology construction process by applying text mining techniques. For this purpose it builds on KAON (http://kaon.semanticweb.org).
txtkit: txtkit is a visual text mining tool for exploring large amounts of multilingual text. It is a multi-user application that focuses mainly on the process of reading and reasoning as a series of decisions and events.
TMG: text mining for German-language documents.
UNLOCK: A suite of web services to help make sense of spatial data. The text/places service extracts place names from text and provides best guesses as to their locations, according to textual context.
webExtractor: webExtractor is a Java application that is used for extracting specific content from web based HTML, XML, CSV, and free form text. The extracted data can be used for data gathering and mining purposes.
XplorMed: a tool for identifying biomedical concepts, and the relationships between them, in MEDLINE records or uploaded articles.
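The KEA entry describes a pipeline of lexical candidate generation, per-phrase feature calculation and a learned selection step. The sketch below illustrates that general idea only; it is not KEA's code. All names (candidate_phrases, keyphrase_features, top_keyphrases, the small STOPWORDS set) are hypothetical, the two features (TF×IDF and relative position of first occurrence) are assumptions chosen for illustration, and the trained classifier is replaced by a plain sort on those features.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (assumption for illustration).
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to",
             "is", "are", "with", "from", "by"}


def candidate_phrases(text, max_len=3):
    """Yield (phrase, word_position) for every 1..max_len word sequence
    that neither starts nor ends with a stopword."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    for i in range(len(words)):
        for n in range(1, max_len + 1):
            gram = words[i:i + n]
            if len(gram) < n:  # ran off the end of the text
                break
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            yield " ".join(gram), i


def keyphrase_features(doc, corpus):
    """Return {phrase: (tf_idf, first_occurrence)} for one document.
    `corpus` is a list of other documents used for document frequencies."""
    total = max(len(re.findall(r"[a-z0-9]+", doc.lower())), 1)
    counts, first = Counter(), {}
    for phrase, pos in candidate_phrases(doc):
        counts[phrase] += 1
        first.setdefault(phrase, pos / total)  # earliest position, normalised
    corpus_sets = [{p for p, _ in candidate_phrases(d)} for d in corpus]
    features = {}
    for phrase, c in counts.items():
        df = sum(phrase in s for s in corpus_sets)
        idf = math.log((len(corpus) + 1) / (df + 1))  # smoothed IDF
        features[phrase] = ((c / total) * idf, first[phrase])
    return features


def top_keyphrases(doc, corpus, k=3):
    """Stand-in for a trained classifier: rank by TF-IDF (descending),
    then by earlier first occurrence, then alphabetically."""
    feats = keyphrase_features(doc, corpus)
    ranked = sorted(feats, key=lambda p: (-feats[p][0], feats[p][1], p))
    return ranked[:k]


if __name__ == "__main__":
    doc = ("gene expression profiling of tumour cells. "
           "gene expression changes in tumour cells.")
    corpus = ["protein folding in yeast", "cell cycle regulation in yeast"]
    print(top_keyphrases(doc, corpus, k=5))
```

A real keyphrase extractor would also normalise word forms (for example by stemming) and would learn how to weight the features from documents with known keyphrases rather than relying on a fixed sort order.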