Discovery support for life scientists


Marc Weeber
(Biosemantics Group, Dept. Medical Informatics,
Erasmus MC - Rotterdam University Medical Center & KnewCo, Inc.)

A scientist's core motivation is to understand why things work the way they do. Why does the apple fall from a tree? Why do people get ill? Why can a minor genetic variation have such enormous consequences? To satisfy this innate curiosity, researchers think of hypothetical explanations and devise experiments to test these hypotheses. In recent years, I have been involved in research on literature-based discovery, which supports the generation of likely and specific hypotheses by combining facts from the literature. In this presentation, I will describe the SemBLAST algorithm, which semantically aligns biological concepts, and I will give an example of how it has been used for drug discovery by retrospectively uncovering new target diseases and side effects of a drug. I will also discuss several uses of this algorithm in a genomics context.

To assist life scientists in their discovery and knowledge management processes, KnewCo, Inc., was founded by researchers from the Biosemantics group in January 2006. KnewCo has started two key initiatives. First, a community of scientists is being built around the semantic enrichment of biomedical concepts, genes, and proteins in a Wikipedia manner. In this way, annotation will be sped up enormously (in 20 years, Swissprot has manually annotated about one fifth of the proteins that are currently in TrEMBL) and will be free of charge. Also, much more knowledge will become publicly available than is currently the case in the biomedical literature. This initiative is supported by the major players in the field (Wiki foundation, open access publishers, and database/ontology providers). Second, based on this public knowledge, KnewCo will employ algorithms such as SemBLAST to implement discovery tools and alerting services that serve the key elements of the scientific process. A user will have a semantic passport that can be used for a variety of discovery and scientific knowledge management purposes. Of course, semantic Medline and other database searches will be supported as well.