BioCause annotation
View the corpus online with the brat rapid annotation tool.
The BioCause_corpus directory contains a version of the entire ID corpus, which has been enriched with causality annotation. A more detailed description of this annotation, together with access to the annotation guidelines, is available here.
The BioCause_corpus directory contains files of two types.
- .txt - Contains the plain text of each document that was annotated.
- .ann - Contains the annotations for the corresponding .txt file in stand-off format.
The .ann files contain named entity, event and causality annotations formatted according to the BioNLP 2011 ST style. In the case of terms, the ID occurs first and is delimited from the rest of the line with a TAB character. The primary annotation is given as a SPACE-separated triple (type, start-offset, end-offset). The start-offset is the index of the first character of the annotated span in the text (".txt" file), i.e. the number of characters in the document preceding it. The end-offset is the index of the first character after the annotated span. Thus, the character in the end-offset position is not included in the annotated span. For reference, the text spanned by the annotation is included, separated by a TAB character.
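As an illustration of the term format described above, here is a minimal Python sketch that splits a term line into its parts and checks the offsets against the document text. The names `TextBound`, `parse_text_bound` and `span_matches` are hypothetical and not part of the corpus distribution. The sample document and annotation line are invented for the example, and the sketch assumes each term covers a single contiguous span.

```python
from typing import NamedTuple


class TextBound(NamedTuple):
    """One term line from an .ann file (hypothetical helper type)."""
    ann_id: str
    ann_type: str
    start: int  # index of the first character of the span
    end: int    # index of the first character after the span
    text: str


def parse_text_bound(line: str) -> TextBound:
    # Fields are TAB-separated: ID, "type start end", spanned text.
    ann_id, triple, text = line.rstrip("\n").split("\t", 2)
    # The primary annotation is a SPACE-separated triple.
    ann_type, start, end = triple.split(" ")
    return TextBound(ann_id, ann_type, int(start), int(end), text)


def span_matches(document: str, ann: TextBound) -> bool:
    # The end-offset is exclusive, which matches Python slicing.
    return document[ann.start:ann.end] == ann.text


# Invented sample data, not taken from the corpus.
document = "Mlc is a regulator. Therefore it matters."
ann = parse_text_bound("T1\tTrigger 20 29\tTherefore")
print(ann.ann_type, ann.start, ann.end, span_matches(document, ann))
```

Because the end-offset is exclusive, the span length is simply `end - start`, and the slice `document[start:end]` should reproduce the reference text stored on the line.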
In the case of events, the event ID occurs first, separated by a TAB character. The event trigger is specified as TYPE:ID and identifies the event type and its trigger through the ID. By convention, the event type is specified both in the trigger annotation and the event annotation. The event trigger is separated from the event arguments by SPACE. The event arguments are a SPACE-separated set of ROLE:ID pairs, where ROLE is one of the event- and task-specific argument roles (e.g., Evidence, Effect, Cause, Theme, Site) and the ID identifies the entity or event filling that role. Note that several events can share the same trigger and that, while the event trigger should be specified first, the event arguments can appear in any order.
An example of an annotated causal relation within the .ann file is shown below:
T139 Argument 3854 3963 Mlc is a global regulator of carbohydrate metabolism and controls several genes involved in sugar utilization
T140 Trigger 3973 3982 Therefore
T141 Argument 4008 4052 Mlc also affects the virulence of Salmonella
E48 Trigger:T140 Evidence:T139 Effect:T141
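The event line in the example above can be read with a short Python sketch like the following. The names `Event` and `parse_event` are hypothetical. The sketch assumes the TAB after the event ID is present, as described in the format, and relies on the trigger being the first TYPE:ID pair.

```python
from typing import List, NamedTuple, Tuple


class Event(NamedTuple):
    """One event line from an .ann file (hypothetical helper type)."""
    event_id: str
    event_type: str
    trigger_id: str
    arguments: List[Tuple[str, str]]  # (ROLE, ID) pairs in file order


def parse_event(line: str) -> Event:
    # The event ID is separated from the rest of the line by a TAB.
    event_id, body = line.rstrip("\n").split("\t", 1)
    # The trigger comes first; the remaining SPACE-separated items are arguments.
    trigger, *raw_args = body.split(" ")
    event_type, trigger_id = trigger.split(":", 1)
    arguments = []
    for item in raw_args:
        role, ref = item.split(":", 1)
        arguments.append((role, ref))
    return Event(event_id, event_type, trigger_id, arguments)


event = parse_event("E48\tTrigger:T140 Evidence:T139 Effect:T141")
print(event.event_id, event.event_type, event.trigger_id)
for role, ref in event.arguments:
    print(role, ref)
```

Since arguments may appear in any order, code that consumes the result should look roles up by name rather than by position.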