Mining the History of Medicine


This project will demonstrate the potential of text mining tools to help researchers from multiple disciplines discover and extract information automatically from medical historical archives. It brings together the strengths of two teams at the University of Manchester:

  • The National Centre for Text Mining (NaCTeM), with its proven track record of developing effective text mining tools operating in a variety of domains.
  • The Centre for the History of Science, Technology and Medicine (CHSTM), which is one of the largest groups in the history of science, technology and medicine (HSTM) in the UK, specialising in nineteenth- and twentieth-century history.

The development of such tools will open up increasingly efficient and manageable ways to reveal, explore and discuss long-term, large-scale historical transformations related to medicine, health and British society. Researchers who will benefit from the availability of such tools include:

  • humanities researchers working in several areas, including History of Medicine, British History more widely, Victorian Studies, and Literary and Cultural Studies
  • social scientists working in Sociology of Medicine and Historical Demography
  • medical researchers interested in the historical dimensions of their work
  • text mining and natural language processing researchers working in digital humanities, who will take advantage of the interoperable information extraction software and resources developed during the project.

Project aims

The project aims to develop resources and tools to support sophisticated text mining applications, as follows:

  • A unique temporal resource of medical terminology that records variation and semantic shift of medical concepts over the course of the 19th and 20th centuries
  • Customised tools that are able to extract terms, named entities, relations and events from medical historical archives

Tools will be made available as web services and as components of NaCTeM's interoperable text mining environments (U-Compare and Argo). This will ensure that they can be reused and flexibly integrated with other tools, to create various types of applications that are suited to the needs of different researchers.

Maximum visibility of the tools and resources will be guaranteed by making them available via a number of major digital humanities infrastructures (e.g. CLARIN, DARIAH and META-SHARE), each aimed at a different user group.

As a concrete application, the tools and resources have been used to create the History of Medicine (HOM) semantic search system. Through integration of the temporal terminological resource, the system increases possibilities for researchers to broaden and deepen their work to ask 'big' questions that cover long periods, without losing sensitivity to changes in terminology and meaning.

The HOM search system operates over two large-scale medical resources covering a wide time span, i.e., the BMJ and MOH archives.

The HOM system builds upon and extends technology developed by NaCTeM for semantically-oriented search of documents in other domains, such as news (ISHER system) and clinical trials.
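To give a feel for how a temporal terminological resource can keep a search sensitive to changes in terminology, the following is a minimal sketch in Python. It is not the HOM implementation: the `TermVariant` record, the `expand_query` function and the sample entries (including their date ranges) are all hypothetical and chosen only for illustration. The idea shown is simply that a query for a concept is expanded to the word forms assumed to be in use during the period being searched.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TermVariant:
    """One historical word form of a concept, with the years it is assumed to be in use."""
    concept: str
    form: str
    first_year: int
    last_year: int


# Illustrative entries only: the date ranges are placeholders, not project data.
VARIANTS = [
    TermVariant("tuberculosis", "phthisis", 1800, 1920),
    TermVariant("tuberculosis", "consumption", 1800, 1940),
    TermVariant("tuberculosis", "tuberculosis", 1860, 2000),
]


def expand_query(concept, start_year, end_year, variants=VARIANTS):
    """Return the sorted forms of `concept` whose usage period overlaps [start_year, end_year]."""
    forms = {
        v.form
        for v in variants
        if v.concept == concept
        and v.first_year <= end_year   # variant was already in use before the period ends
        and v.last_year >= start_year  # variant was still in use when the period starts
    }
    return sorted(forms)
```

With these placeholder entries, `expand_query("tuberculosis", 1840, 1850)` returns the two older forms, while a search restricted to 1950-1990 returns only the modern term. A real resource would also need to record semantic shift (the same form meaning different things at different times), which this sketch does not attempt.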

Case Studies

The novel terminological resource and customised search system will be used and evaluated in the context of two case studies:

  • Exploring the modern epidemiological transition
  • Creation of a medical surveillance culture

These are two massively important and interrelated changes in British health experience, where many questions remain unanswered. The methods and results of the case studies will serve as concrete examples of how such resources and tools can be used.

Project team

Principal Investigator: Prof. Sophia Ananiadou (NaCTeM)

Co-Investigators: Mr. John McNaught (NaCTeM), Dr. Carsten Timmermann (CHSTM), Prof. Michael Worboys (CHSTM)

Researchers: Mr. Paul Thompson (NaCTeM), Dr. Elizabeth Toon (CHSTM)

Related Publications

Thompson, P., Batista-Navarro, R. T. B., Kontonatsios, G., Carter, J., Toon, E., McNaught, J., Timmermann, C., Worboys, M. and Ananiadou, S. (2016). Text Mining the History of Medicine. PLOS ONE, 11(1): e0144717

Thompson, P., Carter, J., McNaught, J. and Ananiadou, S. (2015). Semantically Enhanced Search System for Historical Medical Archives. In Proceedings of DigitalHeritage 2015

Thompson, P., McNaught, J. and Ananiadou, S. (2015). Customised OCR Correction for Historical Medical Text. In Proceedings of DigitalHeritage 2015

Miwa, M. and Ananiadou, S. (In Press). Adaptable, high recall, event extraction system with minimal configuration. BMC Bioinformatics, 16(Suppl. 10): S7

Alnazzawi, N., Thompson, P., Batista-Navarro, R. and Ananiadou, S. (2015). Using text mining techniques to extract phenotypic information from the PhenoCHF corpus. BMC Medical Informatics and Decision Making, 15(Suppl. 2): S3

Bollegala, D., Kontonatsios, G. and Ananiadou, S. (2015). Cross-lingual Similarity Measure for Detecting Biomedical Term Translations. PLOS ONE

Miwa, M., Thompson, P., Korkontzelos, I. and Ananiadou, S. (2014). Comparable Study of Event Extraction in Newswire and Biomedical Domains. In Proceedings of Coling 2014, pp. 2270-2279.


5th April 2016

Prof. Sophia Ananiadou will give a seminar entitled Text Mining tools and infrastructure for biomedical applications - cancer biology, history of medicine, monitoring biodiversity at the CERTH Conference Centre Vergina, Greece.

6th January 2016

A new article providing an overview of the work carried out on the project, from a text mining (TM) perspective, has been published in PLOS ONE.

Paul Thompson, Riza Theresa Batista-Navarro, Georgios Kontonatsios, Jacob Carter, Elizabeth Toon, John McNaught, Carsten Timmermann, Michael Worboys and Sophia Ananiadou (2016). Text Mining the History of Medicine. PLOS ONE 11(1): e0144717.

27th June 2015

The project has been mentioned in The Lancet, one of the world's oldest and most prestigious medical journals. The mention is in an article entitled Medical periodicals: mining the past, which examines how current technology can help medical historians explore the vast amounts of information locked away in the historical archives of medical journals.

15th June 2015

A workshop entitled Text Mining the History of Medicine Workshop is being hosted by the Wellcome Trust, London. All members of the project team will present the results of the project to a group of leading medical historians, followed by a hands-on session to demonstrate the capabilities of the HOM search system. A feedback and discussion session will focus on how the search system can be used in different scenarios, and how the system can be further developed in the future.

8th June 2015

Prof. Sophia Ananiadou gave an invited talk entitled Text Mining the History of Medicine at Lancaster University, reporting on the results of the project, as part of the seminar series organised by the University Centre for Computer Corpus Research on Language (UCREL). Slides are available here.

30th May 2015

Elizabeth Toon presented "Text-Mining the BMJ: Challenges, Opportunities" at the Working with 19th-Century Medical and Health Periodicals Workshop, held at St Anne’s College, University of Oxford.

19th May 2015

The History of Medicine (HOM) semantic search system, which allows semantic search over the BMJ and MOH archives, is now available here.

2nd May 2015

Elizabeth Toon and Carsten Timmermann presented 'The Mining the History of Medicine Project: Big Data, Big Questions?' as part of a panel on Big Data and the Medical Humanities, at the American Association for the History of Medicine Annual Meeting in New Haven, CT, USA.

24th March 2015

Prof. Sophia Ananiadou gave a seminar about the project at the Institute of Historical Research, University of London, as part of their Digital History series of seminars.

Watch the video of the seminar here.

10th March 2015

Prof. Sophia Ananiadou gave a talk about the project at a BMJ Editors' Retreat Day in London. The audience consisted of the entire group of BMJ editors, who are also actively involved in medical research and who come from all therapeutic areas and from a wide variety of countries.

27th November 2014

Prof. Sophia Ananiadou gave a talk focussing on the Mining the History of Medicine project at an event entitled Text Mining: Tools and Opportunities, held at the British Library, London.

The aim of the event was to foster an understanding of how to bridge the gap between research ideas and data, to enable the study of digital works with novel and useful techniques. The event brought together experts developing tools and services with those who are interested in using these techniques in their research.

25th November 2014

John McNaught gave a talk entitled Are you ready for the golden age of text mining? as part of London Info International, at the Royal Horticultural Halls, London.

As part of his talk, which highlighted the ways in which text mining can be applied to scholarly publications to enrich content, enhance search and enable knowledge discovery, John talked about the benefits of text mining in the Mining the History of Medicine project.

24th August 2014

Prof. Sophia Ananiadou gave an invited talk, entitled Cross-domain, time-sensitive text mining across multiple levels of analysis, at the 2014 COLING Workshop on Synchronic and Diachronic Approaches to Analyzing Technical Language (SADAATL 2014), held in Dublin, Ireland, on August 24th, 2014.

The focus of the workshop was on natural language processing techniques that can be applied to technical documents across multiple domains and genres, and which are sensitive to the linguistic changes that occur when considering documents from different time periods. Prof. Ananiadou's talk covered work carried out at NaCTeM involving the extraction of medical terminology from archives that span long periods of time. This work is feeding into the creation of the temporal terminological resource mentioned above.

Further Information


This project is being funded by the Arts & Humanities Research Council (AHRC). It is one of 21 Digital Transformations in the Arts and Humanities projects being funded by the AHRC, as part of their investment in Big Data.
Grant number: AH/L00982X/1