NaCTeM

What others are saying about us

Text mining has exciting applications for medicine. Conventional sifting of information can take weeks, and exciting new connections could potentially be missed. Medical research is also increasingly interdisciplinary, including biology, chemistry, economics and other sciences. Being able to access information from other fields is a tremendous benefit and can help generate new ideas. Access to NaCTeM will be a real boost for our research teams, and a great incentive for new recruits.
Professor Phil Baker, Director, Biomedical Research Centre (BRC), Manchester
Quoted from joint press release regarding strategic partnership between the BRC and NaCTeM.

Sophia Ananiadou discussed some of NaCTeM's flagship tools like MEDIE, FACTA and KLEIO - it does look like they're starting to take all the pain out of text mining, by doing the difficult bits for us, so we can use the results to do actual mining.
Dr Andrew Clegg, Research Scientist, University College London
Quoted from blog post at biotext.org.uk concerning the Semantic Enrichment of the Scientific Literature 2009 (SESL 2009) workshop.

Over the last couple of years, scientists at Pfizer’s UK research site in Sandwich have been making use of the text mining tools and services developed by NaCTeM. One such tool, which has proven to be valuable, is TerMine, an automatic multi-word term recognition tool that has been used at Pfizer to enrich the labour-intensive process of building dictionaries used for text mining. […]

Pfizer and NaCTeM have also been collaborating on a project called DECA (Disease Extraction with Concept Association) to extract associations between concepts in the biomedical domain, such as diseases and symptoms, from collections of biomedical texts (e.g. Medline). The aim of this project is to combine the strengths of the NaCTeM text mining tools, Kleio and FACTA, to create an efficient search for associations between biomedical concepts. Also, a considerable amount of research is being applied to the challenge of lexical disambiguation of the biomedical terms. Pfizer values highly the world-class quality of the linguistic and semantic extraction skills and methodologies being developed and practised at NaCTeM, which is located in the highly appropriate setting of the Manchester Centre for Integrative Systems Biology.

Ian Harrow, Senior Principal Scientist, Pfizer
Internal communication

NaCTeM has engaged closely with users in systems biology to understand their needs and to provide cutting-edge text mining services. Researchers in systems biology need integrated approaches to generate hypotheses, and the use of text mining technology is a must for facilitating scientific discovery given the amount of textual data generated daily. NaCTeM has tapped into this potential with great success. Among the most impressive outcomes of the work of NaCTeM are the systems MEDIE and FACTA. Such semantically based tools are important for the discovery of new knowledge in biology.
Professor Douglas Kell, Research Chair in Bioanalytical Science, University of Manchester
Internal communication

Sophia Ananiadou from NaCTeM explained the work her group has done using text mining techniques on Medline abstracts. This is the third time I’ve heard her talk about this, and it gets more interesting each time. Her aim is to enrich the literature by automatically creating semantic metadata, and thereby to make “undiscovered science” accessible. The MEDIE system is the most vivid example she showed, allowing you to construct a query in the form “subject – verb – object”. For instance, you can ask “what does p53 activate” by searching for subject=p53, verb=activate.
Frank Norman, Manager, Library & Information Service, National Institute for Medical Research, London
Quoted from "Trading knowledge" blog on nature.com.

NaCTeM offers tools with an eye to interoperability and for which workflow software is important, for example the Unstructured Information Management Architecture, or UIMA, formerly associated with IBM and now an open project that runs in OASIS and Apache, and protocols such as SOAP for XML-based message exchange... Users can mix and match the tools they need.
Vivien Marx
Article in BioInform, vol. 12, no. 11, "SciWit, NaCTeM Tailor Text-Mining Tools For Varying Needs of Biomedical Research"

Anyone with experience of lists of abbreviations and acronyms will have spotted that they’re seldom up to date and often contain abbreviations and acronyms which, from a cursory internet search, seem to exist only in lists, rather than out in the wild. So an abbreviation list that is somehow automatically generated from current material would be extremely welcome.

AcroMine has been around for a few years but you may not have seen it before. The idea is to take all of PubMed and look for word sequences that regularly co-occur with expressions in brackets that match.

But how well does it work across disciplines? My first attempt was a term used in nuclear magnetic resonance spectroscopy: INEPT. AcroMine correctly identifies this as “insensitive nuclei enhanced by polarization transfer”. AcroMine offers 22 hits for “MMR”; the most common is, surprisingly, not the vaccine against measles, mumps, and rubella but “mismatch repair”. AcroMine even correctly offers “Large Hadron Collider” as an expansion for “LHC”.


Editor's Webwatch, European Scientific Editing, February 2009
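
The idea described in the quote above, pairing a bracketed short form with the word sequence that precedes it and counting how often each pairing recurs, can be illustrated with a minimal Python sketch. This is not AcroMine's actual method, which the quote does not detail: the function names (`is_subsequence`, `candidate_long_form`, `count_expansions`), the window size and the matching rule are all assumptions made for illustration, and the sample sentences are invented.

```python
import re
from collections import Counter


def is_subsequence(short, long):
    """True if the characters of `short` occur in `long` in the same order."""
    it = iter(long)
    return all(ch in it for ch in short)


def candidate_long_form(words, acronym):
    """Return the shortest run of trailing words that could expand `acronym`.

    Assumed rule (hypothetical, not AcroMine's): the first word must start
    with the acronym's first letter, and the acronym's letters must appear
    in order somewhere in the run.
    """
    short = acronym.lower()
    max_words = min(len(short) + 5, 2 * len(short))  # assumed window size
    for k in range(1, min(max_words, len(words)) + 1):
        run = [w.lower() for w in words[-k:]]
        if run[0][0] == short[0] and is_subsequence(short, " ".join(run)):
            return " ".join(run)
    return None


def count_expansions(texts, acronym):
    """Count candidate long forms found directly before '(ACRONYM)'."""
    marker = re.compile(r"\(" + re.escape(acronym) + r"\)")
    counts = Counter()
    for text in texts:
        for match in marker.finditer(text):
            # Words in the text that precede the bracketed short form.
            words = re.findall(r"[A-Za-z][A-Za-z-]*", text[:match.start()])
            long_form = candidate_long_form(words, acronym)
            if long_form is not None:
                counts[long_form] += 1
    return counts


# Invented sample sentences, standing in for a real corpus.
SAMPLE_TEXTS = [
    "Defects in mismatch repair (MMR) are common.",
    "Loss of mismatch repair (MMR) was observed.",
    "Beams circulate in the Large Hadron Collider (LHC).",
]

if __name__ == "__main__":
    print(count_expansions(SAMPLE_TEXTS, "MMR").most_common())
    print(count_expansions(SAMPLE_TEXTS, "LHC").most_common())
```

Taking the shortest matching run keeps the sketch simple, but it can truncate a long form whose later words also contain the acronym's letters; a real system would need ranking over many documents to choose among competing expansions.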

At the JISC Collections AGM on 20 November 2008, Sophia Ananiadou, Director of the National Centre for Text Mining (NaCTeM), gave an excellent presentation on what text mining is, why it matters for researchers and how it helps to facilitate new and innovative research.

In the context of information overload and the problem of keeping up with the increasing amount of new literature available, Sophia made the point that much information on the web is unstructured (she estimates about 80%) and/or not searchable (e.g. text in PDF or PowerPoint files, which cannot be found by ordinary search engines). She explained how text mining not only helps with finding relevant information, but can also make intelligent connections to scholarship from other fields and provoke questions that might not otherwise have been asked.

The Centre has tools and services to help institutions and researchers do text mining – the current areas of focus are biology and the social sciences, but the Centre has had a lot of interest from publishers and they hope to expand on the disciplines they cover.

Sarah Gentleman, Research Information Network (RIN)
Quoted from RIN Team blog