Pacific Life Re

Background

The role of reliable data in medicine cannot be overestimated. This applies not only to information describing general population-level phenomena covered in scientific publications, but also to health service records describing individuals. Although text mining methods have been widely applied to the former category, the latter has attracted much less attention. One of the main reasons is that these data were previously stored in a format that made them less accessible for digital processing, i.e., as paper documents, which were frequently handwritten. However, increasing adoption of digital solutions by both health service institutions and individual medical practitioners has started to change the picture. This new situation poses both new challenges and new opportunities for text mining methods, since there is potentially valuable knowledge contained in individual medical records. In this project, we aim to analyse medical reports using text mining techniques, with the specific goal of quantifying the risk associated with the evidence described.

Problem

This project is being undertaken by NaCTeM in cooperation with a commercial partner, Pacific Life Re. The main task is to analyse an individual's medical report and determine the level of risk associated with the conditions described. The main challenges include the following:

Medical reports are highly structured documents, containing many elements of different types, such as simple information (e.g., date of birth, gender, height and weight), enumerations (e.g., prescribed drugs), textual descriptions (e.g., outcomes of hospital visits) and references to external documents (e.g., test results).
Risk can be associated with entities of different types: diseases, symptoms, drugs, test results or habits.
The influence of a given risk factor always depends on its context in the document, e.g., temporal context (since medical reports frequently cover many years of treatment) or accompanying gradable adjectives (e.g., severe).
The final risk is usually not a simple sum of the influences of individual factors, as some of them may strongly interact with each other, and thus have a significant impact on overall risk, e.g., family history and negative results of related tests.
External knowledge is necessary to interpret the document, as the importance of certain types of evidence (e.g., the fact that the individual has previously suffered from a particular disease) is considered to be implicitly understood by the reader, and hence is not explicitly written in the report.
The quality of language is frequently poor: reports may contain many (potentially non-standard) abbreviations and acronyms, incomplete sentences and correspondence with patients, which can pose significant challenges for text mining methods.

The goal of our work is not only to automatically compute a global risk per document, but also to allow humans to understand how the risk value was calculated, by automatically highlighting the parts of the text that contribute significantly to the result. This would allow them to make use of the text-mined information, even if the system was unable to detect all of the risks mentioned.
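The idea of combining a per-document risk score with highlighted evidence can be illustrated with a minimal sketch. It is not the project's actual model: the factor weights, the context multipliers, and the names `Mention`, `find_mention` and `score_report` are all hypothetical, chosen only to show how context (severity, history, negation) could scale each factor's contribution and how spans above a threshold could be marked for a human reader.

```python
from dataclasses import dataclass

# Hypothetical base weights per risk factor (illustrative values only).
BASE_WEIGHTS = {"diabetes": 2.0, "smoking": 1.5, "hypertension": 1.0}

# Hypothetical multipliers applied when a mention occurs in a given context.
CONTEXT_MULTIPLIERS = {"severe": 1.5, "historical": 0.5, "negated": 0.0}


@dataclass
class Mention:
    factor: str            # risk factor name
    start: int             # character offset where the mention begins
    end: int               # character offset where the mention ends
    contexts: tuple = ()   # context labels attached to this mention


def find_mention(text, factor, contexts=()):
    """Locate the first occurrence of `factor` in `text` and wrap it as a Mention."""
    start = text.index(factor)
    return Mention(factor, start, start + len(factor), tuple(contexts))


def contribution(mention):
    """Base weight of the factor, scaled by each of its context multipliers."""
    weight = BASE_WEIGHTS.get(mention.factor, 0.0)
    for label in mention.contexts:
        weight *= CONTEXT_MULTIPLIERS.get(label, 1.0)
    return weight


def score_report(mentions, threshold=1.0):
    """Return the summed risk and the spans whose contribution reaches the threshold."""
    scored = [(m, contribution(m)) for m in mentions]
    total = sum(value for _, value in scored)
    spans = [(m.start, m.end) for m, value in scored if abs(value) >= threshold]
    return total, spans


report = "Severe diabetes. Stopped smoking in 2001. No hypertension."
mentions = [
    find_mention(report, "diabetes", ["severe"]),
    find_mention(report, "smoking", ["historical"]),
    find_mention(report, "hypertension", ["negated"]),
]
total, spans = score_report(mentions)
highlighted = [report[start:end] for start, end in spans]
print(total)        # 3.75
print(highlighted)  # ['diabetes']
```

This sketch treats contributions as purely additive, so it does not capture the interactions between factors noted in the challenges above; it also assumes the mentions and their context labels have already been extracted from the text.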

Related work

The project will build on NaCTeM's experience in the following relevant areas:

topic analysis,
coembeddings,
term extraction (TerMine),
biomedical named entity recognition and normalisation,
event attribute recognition (negation, confidence, etc.),
active learning,
document classification.

The project will also benefit from the expertise of experienced medical risk assessment specialists at Pacific Life Re.

News

9th October 2018

The work being carried out in this project has been mentioned in an article in Cover magazine, a leading industry publication for life protection and health insurance.

Publications

Przybyła, P., Brockmeier, A. J. and Ananiadou, S. (2018). Quantifying Risk Factors in Medical Reports with a Context-Aware Linear Model. Journal of the American Medical Informatics Association, 26(6):537-546.

Project team

Principal Investigator: Prof. Sophia Ananiadou
Researchers: Dr. Nhung Nguyen, Mr. Paul Thompson
