| Abstract: |
- While numerous text mining solutions exist, many are designed for use by "experts". Our aim has been to make text-mined information accessible to research scientists in a way that is easy to interpret and integrate into their everyday work.
- Our strategy consisted of a range of solutions, starting with simple text mining through keyword searches and building to advanced co-occurrence matrices that harness dictionaries of search terms. Finally, more powerful information extraction is achieved through Natural Language Processing engines that use linguistic patterns, entity dictionaries, and ontologies.
- Having defined common queries that could be applied to multiple drug discovery projects, we next delivered a system that continuously re-computes searches across full literature databases such as Medline and Embase. Running this volume of text mining every day within Pfizer’s enterprise environment has been a major challenge, which we addressed in a scalable manner through a grid computing approach.
- In delivering this solution we have made a major investment in infrastructure to manage and maintain the vast amount of data generated by text mining. Furthermore, these results are integrated with many different types of structured data in numerous databases to deliver the information to our scientists.
|
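
The dictionary-driven co-occurrence step mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the system described above: the dictionaries `GENES` and `DISEASES`, the toy sentences in `ABSTRACTS`, and the helper names `find_terms` and `cooccurrence_matrix` are all hypothetical, and document-level co-occurrence is assumed as the counting unit.

```python
import re
from collections import Counter
from itertools import product

# Hypothetical dictionaries mapping a canonical term to its synonyms.
GENES = {
    "PDE5": ["PDE5", "phosphodiesterase 5"],
    "EGFR": ["EGFR", "epidermal growth factor receptor"],
}
DISEASES = {
    "hypertension": ["hypertension", "high blood pressure"],
    "lung cancer": ["lung cancer", "NSCLC"],
}

# Invented toy sentences standing in for literature abstracts.
ABSTRACTS = [
    "PDE5 inhibitors lower high blood pressure in the pulmonary circulation.",
    "EGFR mutations are frequent in NSCLC.",
    "Phosphodiesterase 5 expression was studied in hypertension and lung cancer.",
]


def find_terms(text, dictionary):
    """Return the canonical terms that have at least one synonym in the text."""
    found = set()
    for canonical, synonyms in dictionary.items():
        for synonym in synonyms:
            # Whole-word, case-insensitive match on the synonym.
            pattern = r"\b" + re.escape(synonym) + r"\b"
            if re.search(pattern, text, flags=re.IGNORECASE):
                found.add(canonical)
                break
    return found


def cooccurrence_matrix(documents, rows, cols):
    """Count the documents in which a row term and a column term both occur."""
    counts = Counter()
    for text in documents:
        row_hits = find_terms(text, rows)
        col_hits = find_terms(text, cols)
        for pair in product(row_hits, col_hits):
            counts[pair] += 1
    return counts


matrix = cooccurrence_matrix(ABSTRACTS, GENES, DISEASES)
for (gene, disease), n in sorted(matrix.items()):
    print(f"{gene} x {disease}: {n}")
```

Each cell counts documents rather than individual mentions, so a pair that appears several times in one abstract still contributes a single count. Sentence-level counting or statistical weighting would be natural refinements, but neither is implied by the abstract.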