NaCTeM

What others are saying about us

"Big Data" is a hot topic in the business world these days. But there's a subset of this broad field that has yet to take a turn in the spotlight. It's called "text mining" and you're probably going to be hearing a lot more about it over the coming months and years ... Academic types are at the forefront of this effort, and at least one country is already trying to help its eggheads with their text mining needs. England's recently established National Centre for Text Mining is the first publicly funded text mining clearinghouse in the world, with the stated aim of furthering academic research.
Gary Belsky, 20th March 2012
Quoted from an article in Time Business entitled Why Text Mining May Be The Next Big Thing.

PubMed is most scientists' first port of call for literature searches, and there are many fine tutorials on this site explaining how to get the most from this tool. However, many people don't know that there are also several promising text-mining tools that offer more sophisticated text-searching functions, such as semantic searches, and are quite accessible to experimental biologists. These tools analyse the free text of an article using publicly available Medline data and extract relationships between the search terms, index these relationships, and present their results almost instantly... Text-mining tools which were developed by the UK's National Centre for Text Mining (NaCTeM) in Manchester ... [MEDIE, Kleio and Facta] have convenient web interfaces, so it's easy to give them a try and see if they are useful to you.

Of course, text mining is not perfect yet - the English language is so rich and varied that an idea can be expressed in a myriad of ways, not all of which are captured by the heuristic rules of the text-mining algorithm. But these tools are so easy and fast to use that they can be added to your literature-searching repertoire today!

Dr Richard Adams, Senior Software Developer, Synthsys, University of Edinburgh, 9th January 2012
Quoted from a BitesizeBio blog entitled How Text-Mining Tools Can Improve Your Literature Searches.

Text mining has exciting applications for medicine. Conventional sifting of information can take weeks, and exciting new connections could potentially be missed. Medical research is also increasingly interdisciplinary, including biology, chemistry, economics and other sciences. Being able to access information from other fields is a tremendous benefit and can help generate new ideas. Access to NaCTeM will be a real boost for our research teams, and a great incentive for new recruits.
Professor Phil Baker, Director, Biomedical Research Centre (BRC), Manchester
Quoted from a joint press release regarding the strategic partnership between the BRC and NaCTeM.

Sophia Ananiadou discussed some of NaCTeM's flagship tools like MEDIE, FACTA and KLEIO - it does look like they're starting to take all the pain out of text mining, by doing the difficult bits for us, so we can use the results to do actual mining.
Dr Andrew Clegg, Research Scientist, University College London
Quoted from a blog post at biotext.org.uk concerning the Semantic Enrichment of the Scientific Literature 2009 (SESL 2009) workshop.

Over the last couple of years, scientists at Pfizer's research site in Sandwich have been making use of the text mining tools and services developed by NaCTeM. One such tool, which has proven to be valuable, is TerMine, an automatic multi-word term recognition tool that has been used at Pfizer to enrich the labour-intensive process of building dictionaries used for text mining.

Pfizer and NaCTeM have also been collaborating on a project called DECA (Disease Extraction with Concept Association) to extract associations between concepts in the biomedical domain, such as diseases and symptoms, from collections of biomedical texts (e.g. Medline). The aim of this project is to combine the strengths of the NaCTeM text mining tools, Kleio and FACTA, to create an efficient search for associations between biomedical concepts. Also, a considerable amount of research is being applied to the challenge of lexical disambiguation of the biomedical terms. Pfizer values highly the world-class quality of the linguistic and semantic extraction skills and methodologies being developed and practised at NaCTeM, which is located in the highly appropriate setting of the Manchester Centre for Integrative Systems Biology.

Ian Harrow, Senior Principal Scientist, Pfizer
Internal communication

NaCTeM has engaged closely with users in systems biology to understand their needs and to provide cutting edge text mining services. Researchers in systems biology need integrated approaches to generate hypotheses and the use of text mining technology is a must for facilitating scientific discovery given the amount of textual data generated daily. NaCTeM has tapped into this potential with great success. Among the most impressive outcomes of the work of NaCTeM are the systems MEDIE and FACTA. Such semantically based tools are important for the discovery of new knowledge in biology.
Professor Douglas Kell, Research Chair in Bioanalytical Science, University of Manchester
Internal communication

Sophia Ananiadou from NaCTeM explained the work her group has done using text mining techniques on Medline abstracts. This is the third time I've heard her talk about this, and it gets more interesting each time. Her aim is to enrich the literature by automatically creating semantic metadata, and thereby to make "undiscovered science" accessible. The MEDIE system is the most vivid example she showed, allowing you to construct a query in the form "subject-verb-object". For instance, you can ask "what does p53 activate" by searching for "subject=p53, verb=activate".
Frank Norman, Manager, Library & Information Service, National Institute for Medical Research, London
Quoted from the "Trading knowledge" blog on nature.com.

NaCTeM offers tools with an eye to interoperability and for which workflow software is important, for example the Unstructured Information Management Architecture, or UIMA, formerly associated with IBM and now an open project that runs in OASIS and Apache, and protocols such as SOAP for XML-based message exchange ... Users can mix and match the tools they need.
Vivien Marx
Article in BioInform, vol. 12, no. 11, "SciWit, NaCTeM Tailor Text-Mining Tools For Varying Needs of Biomedical Research"

Anyone with experience of lists of abbreviations and acronyms will have spotted that they’re seldom up to date and often contain abbreviations and acronyms which, from a cursory internet search, seem to exist only in lists, rather than out in the wild. So an abbreviation list that is somehow automatically generated from current material would be extremely welcome.

AcroMine has been around for a few years but you may not have seen it before. The idea is to take all of PubMed and look for word sequences that regularly co-occur with expressions in brackets that match.

But how well does it work across disciplines? My first attempt was a term used in nuclear magnetic resonance spectroscopy: INEPT. AcroMine correctly identifies this as "insensitive nuclei enhanced by polarization transfer". AcroMine offers 22 hits for "MMR"; the most common is, surprisingly, not the vaccine against measles, mumps, and rubella but "mismatch repair". AcroMine even correctly offers "Large Hadron Collider" as an expansion for "LHC".


Editor's Webwatch, European Scientific Editing, February 2009

At the JISC Collections AGM on 20 November 2008, Sophia Ananiadou, Director of the National Centre for Text Mining (NaCTeM) gave an excellent presentation on what text mining is, why it matters for researchers and how it helps to facilitate new and innovative research.

In the context of information overload and the problem of keeping up with the increasing amount of new literature available, Sophia made the point that much information on the web is unstructured (she estimates about 80%) and/or not searchable (e.g. text in PDF or PowerPoint files, which cannot be found by ordinary search engines). She explained how text mining not only helps with finding relevant information, but can also make intelligent connections to scholarship from other fields and provoke questions that might not otherwise have been asked.

The Centre has tools and services to help institutions and researchers do text mining - the current areas of focus are biology and the social sciences, but the Centre has had a lot of interest from publishers and they hope to expand on the disciplines they cover.

Sarah Gentleman, Research Information Network (RIN)
Quoted from the RIN Team blog