SLiM: Pilot study of the utility of text mining and machine learning tools to accelerate systematic review and meta-analysis of findings of in vivo research


There is now more research published than ever before. The primary bibliographic database for biomedical research, PubMed, adds around 3,500 new references every day. Our random sample of 2,000 publications in PubMed suggests that in 2013 there were 98,000 publications describing in vivo experiments, of which 21,000 were in pharmacology and 14,500 in neuroscience. No one individual can read, let alone critically appraise or use, even a small fraction of this new information, which is the product of months of investigator effort and substantial investment of research funds. This mismatch between the amount of research produced and the amount that can be effectively used is a major challenge to biomedical research.

The Cochrane Collaboration has been highly successful in synthesising meta-analyses of clinical trial data and providing outcomes in an easily assimilated, widely recognised format that is readily useable for healthcare funding decisions and day-to-day clinical practice. This approach has also influenced major improvements in research quality, especially the design, conduct and reporting of clinical trials. Whilst we wish to replicate the success of Cochrane in the pre-clinical domain, we recognise that the sheer volume and publication rate of pre-clinical data mean that methodological innovations are required that move beyond the largely manual processes that are currently adopted for most clinical systematic reviews. For example, in our recently completed systematic reviews of neuropathic pain, data had to be extracted from 229 clinical trials, whereas for the corresponding ongoing pre-clinical systematic review, 65,156 publications were retrieved by the initial search. Of these, 33,818 had to be screened to identify approximately 6,000 publications that actually contained relevant data.

Furthermore, there are substantial concerns about the risk of bias, either due to sub-optimal experimental design, or because published work is likely to overstate observed effects. Additionally, where sample sizes are low (and sample size calculations are seldom reported), there is also a risk that important biological effects are overlooked because individual studies do not carry sufficient weight.

In brief, therefore, the challenges are as follows:

  1. Information of potential relevance to scientists is produced at such a volume and rate that "reading the literature" is not feasible
  2. The risk of bias in in vivo research is such that detailed critical appraisal is required to allow judgement of whether the conclusions drawn are justified and whether a particular experimental design is appropriate
  3. Publication bias means that scientists relying on selected sources (e.g., particular journals) are likely to be misled
  4. Conventional systematic reviews can be helpful, but are usually between one and two years out of date even on their date of publication. This problem is further intensified by the sheer volume of data implicit in a pre-clinical systematic review

In this project, we propose to exploit recent developments in text mining and machine learning, and to evaluate their potential to assist with the challenges of systematic reviews of in vivo data outlined above.

Text Mining Support at NaCTeM

The main tasks of the project are:

  • identifying and retrieving relevant publications,
  • extracting meta-data from identified publications,
  • extracting outcome data from relevant publications.

The first of the tasks is a well-known problem of automatically isolating documents relevant to the user's requirements from a large collection, in order to minimise the human labour in the process. Such functionality is especially important in systematic reviews, since it is often necessary to screen tens of thousands of documents, of which only a small fraction contain truly relevant information. Several approaches to this problem have already been applied (O'Mara-Eves, 2015). Their common element is their use of training data, i.e., a collection of documents in which each document has been manually classified as being relevant or not, which allows the training of machine learning (ML) models that can predict the relevance of previously unseen documents. However, this procedure can only be applied directly when a sufficiently large manually labelled corpus exists in advance, e.g., when a previously performed review needs to be updated.

Often, however, when a new type of review is to be undertaken, a manually labelled corpus is not available, and it is thus necessary to create one. To optimise the process, active learning is used. Instead of requiring human experts to label every document in a collection, active learning uses an ML algorithm to select only a subset of the documents for the expert to label, according to the "informativeness" of these documents, based on a number of different criteria. This can substantially reduce the manual workload required to produce training data (Miwa, 2014).

Another important decision concerns how documents should best be represented to allow them to be classified accurately. For example, whilst a number of document classification techniques simply treat documents as a 'bag of words', more sophisticated representations of documents have been shown to produce greater accuracy, e.g., LDA topics (Mo, 2015), Paragraph Vectors (Hashimoto, 2016) and Co-Embedding Space for Descriptive Clustering (Sato, 2017).
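The active learning loop and the simple 'bag of words' representation described above can be illustrated with a minimal sketch. It is not the method used in the project or in RobotAnalyst: the classifier (a small Naive Bayes model), the selection criterion (uncertainty sampling, i.e., picking the document whose predicted relevance is closest to 0.5), the toy texts and all function names are assumptions chosen only for illustration.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Represent a document as word counts, ignoring word order."""
    return Counter(text.lower().split())


class NaiveBayes:
    """Tiny multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, docs, labels):
        self.vocab = {w for d in docs for w in d}
        self.counts = {0: Counter(), 1: Counter()}
        self.n_docs = Counter(labels)
        for doc, label in zip(docs, labels):
            self.counts[label].update(doc)
        return self

    def prob_relevant(self, doc):
        total_docs = sum(self.n_docs.values())
        log_scores = {}
        for label in (0, 1):
            total_words = sum(self.counts[label].values())
            score = math.log((self.n_docs[label] + 1) / (total_docs + 2))
            for word, count in doc.items():
                if word in self.vocab:  # words never seen in training are skipped
                    p = (self.counts[label][word] + 1) / (total_words + len(self.vocab))
                    score += count * math.log(p)
            log_scores[label] = score
        top = max(log_scores.values())
        exp = {label: math.exp(s - top) for label, s in log_scores.items()}
        return exp[1] / (exp[0] + exp[1])


def train(docs, labelled):
    idx = sorted(labelled)
    return NaiveBayes().fit([docs[i] for i in idx], [labelled[i] for i in idx])


def active_learning(texts, ask_expert, seed_indices, n_queries):
    """Ask the expert to label only the documents the model is least sure about."""
    docs = [bag_of_words(t) for t in texts]
    labelled = {i: ask_expert(i) for i in seed_indices}
    for _ in range(n_queries):
        unlabelled = [i for i in range(len(docs)) if i not in labelled]
        if not unlabelled:
            break
        model = train(docs, labelled)
        # Uncertainty sampling: predicted relevance closest to 0.5.
        pick = min(unlabelled, key=lambda i: abs(model.prob_relevant(docs[i]) - 0.5))
        labelled[pick] = ask_expert(pick)
    return train(docs, labelled), labelled


# Hypothetical toy corpus; 1 = relevant (in vivo study), 0 = not relevant.
texts = [
    "rat model of neuropathic pain treated with drug",
    "mouse nerve injury pain behaviour experiment",
    "clinical trial of patients with back pain",
    "survey of hospital staff workload",
    "rat nerve ligation pain drug experiment",
    "patients survey of hospital trial outcomes",
]
gold = [1, 1, 0, 0, 1, 0]

# The "expert" is simulated by looking up the gold label.
model, labelled = active_learning(texts, lambda i: gold[i], seed_indices=[0, 3], n_queries=2)
print(len(labelled), "of", len(texts), "documents were labelled by the expert")
```

In this sketch the simulated expert labels four of the six documents; the remaining ones are scored by the trained model. Replacing `bag_of_words` with a richer representation would leave the loop itself unchanged.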

To address this task, NaCTeM will build upon successful previous work, including the RobotAnalyst system, developed as part of the Supporting Evidence-based Public Health Interventions using Text Mining project.


Bahor, Z., Liao, J., Macleod, M. R., Bannach-Brown, A., McCann, S. K., Wever, K. E., Thomas, J., Ottavi, T., Howells, D. W., Rice, A., Ananiadou, S. and Sena, E. (2017). Risk of bias reporting in the recent animal focal cerebral ischaemia literature. In: Clinical Science, 131(20), 2525-2532

Sato, M., Brockmeier, A. J., Kontonatsios, G., Mu, T., Goulermas, J. Y, Tsujii, J. and Ananiadou, S. (2017). Distributed Document and Phrase Co-embeddings for Descriptive Clustering. In Proceedings of EACL, pp. 991-1001.

Haynes, C., Kay, N., Harrison, K., McLeod, C., Shaw, B., Leng, G., Kontonatsios, G. and Ananiadou, S. (2016). Using text mining to facilitate study identification in public health systematic reviews. In: Guidelines International Network (G-I-N) conference

Hashimoto, K., Kontonatsios, G., Miwa, M. and Ananiadou, S. (2016). Topic Detection Using Paragraph Vectors to Support Active Learning in Systematic Reviews. In: Journal of Biomedical Informatics, 62, 59-65

Mo, Y., Kontonatsios, G. and Ananiadou, S. (2015). Supporting Systematic Reviews Using LDA-based Document Representations. Systematic Reviews, 4, 172

O'Mara-Eves, A., Thomas, J., McNaught, J., Miwa, M. and Ananiadou, S. (2015). Using text mining for study identification in systematic reviews: A systematic review of current approaches. Systematic Reviews 4:5 (Highly Accessed)

Miwa, M., Thomas, J., O'Mara-Eves, A. and Ananiadou, S. (2014). Reducing systematic review workload through certainty-based screening. Journal of Biomedical Informatics

Project team

Principal Investigator: Prof. Malcolm Macleod, Centre for Clinical Brain Sciences, University of Edinburgh

Prof. Sophia Ananiadou (NaCTeM)
Prof. James Thomas (UCL Institute of Education, University College London)
Prof. Andrew Rice (Department of Surgery and Cancer, Imperial College London)
Dr. Emily Sena (Centre for Clinical Brain Sciences, University of Edinburgh)

Researchers: Dr. Georgios Kontonatsios, Dr. Piotr Przybyła.


The project runs from April 2016 until March 2018, and is funded by the MRC (Grant No. MR/N015665/1). Please also see the RCUK Project page.