NaCTeM

Seminar – Dr Ian Harrow

Speaker: Dr Ian Harrow, Senior Principal Scientist, eBiology Group (IPC 432), Pfizer Global Research & Development
Title: Embedding Text Mining Solutions In Pharmaceutical Research
Date: Friday 23rd November 2007 at 13:00 – 14:30
Location: MIB Lecture Theatre
Abstract:
  • While there are numerous text mining solutions, many are designed for use by "experts". Our aim has been to make text-mined information accessible to research scientists in a way that is easy to interpret and to integrate into their everyday work.
  • Our strategy comprised a range of solutions of increasing sophistication. It started with simple text mining through keyword searches and built up to advanced co-occurrence matrices that harness dictionaries of search terms. Finally, more powerful information extraction is achieved through Natural Language Processing (NLP) engines that use linguistic patterns, entity dictionaries and ontologies.
  • Having defined common queries that could be applied to multiple drug discovery projects, the next stage was to deliver a system that continuously re-computes these searches across entire literature databases such as Medline and Embase. Scaling text mining to this volume, with searches running every day within Pfizer’s enterprise environment, was a major challenge. We solved it in a scalable manner using a grid computing approach.
  • In delivering this solution, we have made a major investment in infrastructure to manage and maintain the vast amount of data generated by text mining. These results are also integrated with many different types of structured data held in numerous databases, so that the information can be delivered to our scientists.

Presentation [to follow]