PhenoCHF Download
The annotations may be downloaded for research purposes (please observe the terms of the licence below).
NOTES:
- Information about annotations is provided in separate files from the text that has been annotated. The format of these annotation files is described in detail on the annotation format page.
- The associated text files for each of the two document types in the corpus are obtained in different ways, as detailed below.
- Full text literature articles - These are open access papers, and the plain text files that were used as the basis for the annotation are included in the corpus download. The basename of each file name is the PMID of the associated article.
- Narrative EHR reports - These form part of the dataset of de-identified clinical records released as part of the i2b2 2008 Obesity Challenge (NLP Dataset #2). The dataset must be obtained individually from Partners Healthcare by signing a Data Use Agreement.
- IMPORTANT NOTE: The i2b2 2008 Obesity Challenge Dataset is obtained as a single XML file, containing all clinical records. Within the XML file, each document is contained within a <doc> element, and the <doc> element has an id attribute, which assigns a unique id to each clinical record. Within each <doc> element, there is a <text> element, which contains the text of the clinical record.
- Annotation files are provided separately for each clinical record, in the format described on the annotation format page. The basename of each annotation file corresponds to the id of the clinical record, as specified in the id attribute of the corresponding <doc> element in the original dataset file.
- The annotation files assume that the text for each clinical record corresponds to the text that occurs between the <text> and </text> tags for the record in the original dataset file.
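The mapping from the single XML file to per-record text can be sketched as follows. This is a minimal illustration, not part of the corpus distribution: the function names, the `.txt` extension and the sample XML are assumptions made for the example, and only the <doc> element, its id attribute and the <text> element come from the description above. The text is deliberately left unstripped, on the assumption that the annotations refer to positions in the exact text between the tags; note also that an XML parser decodes character entities, which may matter if the annotation files refer to the raw file content.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def extract_records(xml_source):
    """Map each clinical record id to the text inside its <text> element."""
    root = ET.fromstring(xml_source)
    records = {}
    # iter() finds <doc> elements at any depth, so the exact nesting
    # of the dataset file does not need to be known in advance.
    for doc in root.iter("doc"):
        doc_id = doc.get("id")
        text_elem = doc.find("text")
        if doc_id is None or text_elem is None:
            continue
        # Keep the text exactly as it appears between <text> and </text>;
        # no whitespace is stripped.
        records[doc_id] = text_elem.text or ""
    return records


def write_records(records, out_dir):
    """Write one text file per record, named after the record id
    (the .txt extension is an assumption for this sketch)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for doc_id, text in records.items():
        (out / f"{doc_id}.txt").write_text(text, encoding="utf-8")


# Invented sample that mimics the described structure (not real data).
sample = """<root><docs>
<doc id="1"><text>Patient has CHF.</text></doc>
<doc id="2"><text>
No edema.
</text></doc>
</docs></root>"""

records = extract_records(sample)
```

With the real dataset, the file contents would be read from disk and passed to `extract_records`, and each resulting text could then be paired with the annotation file whose basename matches the record id.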
PhenoCHF corpus licence
1. Copyright of Literature Articles
The full text literature articles in the PhenoCHF corpus are drawn from the PMC Open Access Subset. These articles are protected by copyright, but are made available under a Creative Commons or similar licence that generally allows more liberal redistribution and reuse than a traditional copyrighted work. Please refer to the licence of each article for its specific terms.
2. Copyright of PhenoCHF annotations

The entity mention, relation and normalisation annotations in the PhenoCHF corpus were created at the National Centre for Text Mining (NaCTeM), School of Computer Science, University of Manchester, UK. They are licensed under a Creative Commons Attribution 4.0 International License. Please attribute NaCTeM when using the corpus and cite one or more of the following papers, depending on which annotations are used:
Entity Annotations
Alnazzawi, N., Thompson, P., Batista-Navarro, R. and Ananiadou, S. (2015). Using text mining techniques to extract phenotypic information from the PhenoCHF corpus. BMC Medical Informatics and Decision Making, 15(Suppl. 2): S3
Normalisation Annotations
Alnazzawi, N., Thompson, P. and Ananiadou, S. (2016). Mapping Phenotypic Information in Heterogeneous Textual Sources to a Domain-Specific Terminological Resource. PLOS ONE, 11(9): e0162287
Relation Annotations
Alnazzawi, N., Thompson, P. and Ananiadou, S. (2014). Building a semantically annotated corpus for congestive heart and renal failure from clinical records and the literature. In Proceedings of the 5th International Workshop on Health Text Mining and Information Analysis (Louhi), pp. 69-74.