Seminar - Jasmin Saric

Speaker:	Jasmin Saric, European Media Lab Research, Heidelberg, Germany
Title:	Extracting Data for Molecular Biology
Date:	12:00, Friday 8th April
Location:	Room F10, MSS Building
Abstract:	Information extraction technology is becoming increasingly popular in the biomedical domain. Its main aim is to extract relations between entities, such as interactions between proteins and genes. We apply this technology to generate data for populating our database for molecular biology. Applying NLP (natural language processing) techniques is particularly difficult in molecular biology because complex terminological variation occurs frequently. Resolving the resulting ambiguities requires taking semantic criteria into consideration, and ontologies can provide these criteria. However, building such domain-specific ontologies is extremely labour-intensive. To overcome this hurdle, we are trying to semi-automatise the process using existing resources (such as the GENIA corpus) and machine learning approaches. In my talk I will give an overview of our group's activities (the SDBV of EML Research in Heidelberg, Germany) in Information Extraction, Ontology Learning, Parsing of Chemical Compound Names, and Manual Extraction of Kinetic Data from biology-related scientific literature.