Entity Mention Annotation in CHF
Annotation scheme
Our annotation scheme for entity mentions aims to identify words and phrases that describe several types of concepts that are highly relevant to phenotype phenomena. Table 1 defines the types of concepts whose mentions are annotated in the PhenoCHF corpus.
Concept Type | Description | Examples of mentions |
---|---|---|
Cause | Any medical problem that contributes to the occurrence of CHF | chronic renal insufficiency, hypertension |
Risk Factor | A condition that increases the chance of a patient having the CHF disease | obesity, type 2 diabetes, high cholesterol |
Sign & Symptom | Any observable manifestation of a disease which is experienced by a patient and reported to the physician | productive cough, nausea, vomiting |
Non-traditional risk factor | Conditions associated with abnormalities in kidney functions that put the patient at higher risk of developing signs & symptoms and causes of CHF | iron deficiency, anemia |
Organ | Any body part | lungs, abdomen |
Chief Complaint | Mentions of CHF | CHF, congestive heart failure |
Entity mention statistics
All mentions of the concept types shown in Table 1 were annotated in each document of PhenoCHF. The total number of annotated mentions of each type in each part of the corpus (i.e., narrative EHR reports and literature articles) is shown in Table 2.
Concept Type | No. of annotated mentions in narrative EHR reports | No. of annotated mentions in literature articles |
---|---|---|
Cause | 1320 | 1107 |
Risk Factor | 1335 | 408 |
Sign & Symptom | 2449 | 304 |
Non-traditional risk factor | 308 | 329 |
Organ | 432 | - |
Distribution of entity mentions in PhenoCHF
Figure 1 provides an overview of the distribution of the mentions of phenotype-related concepts in PhenoCHF. In discharge summaries, there is a large emphasis on describing the signs and symptoms of the disease, but these play a much less significant role in scientific articles, where the dominant topics are non-traditional risk factors and the etiology of CHF.
Figure 1. Distribution of entity mention annotations in PhenoCHF
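The relative frequencies of each concept type can be recomputed from the counts in Table 2. The following is a minimal sketch rather than the method used to produce Figure 1: the `counts` dictionary and the `shares` helper are illustrative names, and the "-" entry for Organ in literature articles is assumed to mean that no mentions were annotated, so it is treated as 0.

```python
# Counts copied from Table 2: (narrative EHR reports, literature articles).
counts = {
    "Cause": (1320, 1107),
    "Risk Factor": (1335, 408),
    "Sign & Symptom": (2449, 304),
    "Non-traditional risk factor": (308, 329),
    "Organ": (432, 0),  # assumption: "-" in Table 2 is treated as 0
}


def shares(column: int) -> dict:
    """Return each concept type's share of all mentions in one corpus part."""
    total = sum(row[column] for row in counts.values())
    return {concept: row[column] / total for concept, row in counts.items()}


ehr_shares = shares(0)  # narrative EHR reports
lit_shares = shares(1)  # literature articles

for concept in counts:
    print(f"{concept:<30}{ehr_shares[concept]:>8.1%}{lit_shares[concept]:>8.1%}")
```

Normalising within each corpus part makes the two columns comparable even though the parts contain different total numbers of mentions.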
Agreement
The entity annotations were undertaken by two medical doctors. The quality and consistency of the annotations were verified through the calculation of inter-annotator agreement (IAA). We calculated IAA in terms of F-score, and found that high levels of agreement were achieved. We calculated agreement for both exact span matches, where the start and end of the annotated text spans chosen by both annotators must match exactly, and relaxed span matches, where it is sufficient for the annotated text spans to include some common parts. The IAA statistics are shown in Table 3.
Agreement Type | Narrative EHRs | Literature articles |
---|---|---|
Exact Match | 0.82 | 0.69 |
Relaxed Match | 0.92 | 0.77 |
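The difference between the two matching criteria can be illustrated with a short sketch. This is not the script used to produce Table 3. It assumes that each annotation is a hypothetical `(start, end, concept_type)` tuple with an exclusive end offset, that a match also requires the same concept type, and that one annotator is treated as the reference when computing precision and recall. All function names and the sample annotations are invented for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical representation: (start offset, end offset (exclusive), concept type).
Span = Tuple[int, int, str]


def exact_match(a: Span, b: Span) -> bool:
    """Start, end and concept type must all be identical."""
    return a == b


def relaxed_match(a: Span, b: Span) -> bool:
    """Same concept type and at least one character in common."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def f_score(ann_a: List[Span], ann_b: List[Span],
            match: Callable[[Span, Span], bool]) -> float:
    """F-score of annotator B measured against annotator A as the reference."""
    if not ann_a or not ann_b:
        return 0.0
    # Precision: fraction of B's spans that match some span of A.
    precision = sum(any(match(a, b) for a in ann_a) for b in ann_b) / len(ann_b)
    # Recall: fraction of A's spans that match some span of B.
    recall = sum(any(match(a, b) for b in ann_b) for a in ann_a) / len(ann_a)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Invented annotations of one document by two annotators.
annotator_a = [(0, 10, "Cause"), (15, 25, "Sign & Symptom"), (30, 40, "Risk Factor")]
annotator_b = [(0, 10, "Cause"), (17, 25, "Sign & Symptom"), (50, 60, "Organ")]

exact_f = f_score(annotator_a, annotator_b, exact_match)
relaxed_f = f_score(annotator_a, annotator_b, relaxed_match)
print(f"exact: {exact_f:.2f}, relaxed: {relaxed_f:.2f}")
```

In this invented example the second pair of spans overlaps but has different start offsets, so it counts only under relaxed matching. This is why relaxed agreement is never lower than exact agreement.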