BioCause annotation
View the corpus online with the brat rapid annotation tool.
The BioCause_corpus directory contains a version of the entire ID corpus, which has been enriched with causality annotation. A more detailed description of this annotation, together with access to the annotation guidelines, is available here.
The BioCause_corpus directory contains files of two types.
- .txt - Contains the plain text of a document used for the annotation.
- .ann - Contains the annotations of that text in stand-off format.
The .ann files contain named entity, event and causality annotations formatted according to the BioNLP 2011 ST style. In the case of terms, the ID occurs first and is delimited from the rest of the line with a TAB character. The primary annotation is given as a SPACE-separated triple (type, start-offset, end-offset). The start-offset is the index of the first character of the annotated span in the text (".txt" file), i.e. the number of characters in the document preceding it. The end-offset is the index of the first character after the annotated span. Thus, the character in the end-offset position is not included in the annotated span. For reference, the text spanned by the annotation is included, separated by a TAB character.
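The term format described above can be sketched in Python as follows. This is a minimal illustration, not part of the corpus distribution: the names `Term`, `parse_term_line` and `span_matches` are hypothetical, the toy document is invented for the example, and each term is assumed to have a single contiguous span.

```python
from typing import NamedTuple


class Term(NamedTuple):
    id: str
    type: str
    start: int  # index of the first character of the span
    end: int    # index of the first character after the span
    text: str   # reference text copied into the .ann file


def parse_term_line(line: str) -> Term:
    """Parse one term line: ID <TAB> type start end <TAB> text."""
    term_id, annotation, text = line.rstrip("\n").split("\t", 2)
    type_, start, end = annotation.split(" ")
    return Term(term_id, type_, int(start), int(end), text)


def span_matches(term: Term, document: str) -> bool:
    """Compare the reference text with the half-open slice [start, end)."""
    return document[term.start:term.end] == term.text


# Hypothetical toy document; real offsets refer to the matching .txt file.
document = "Therefore Mlc also affects the virulence of Salmonella"
term = parse_term_line("T1\tTrigger 0 9\tTherefore")
```

Because the end-offset is exclusive, it maps directly onto Python slice notation, and `term.end - term.start` equals the length of the annotated span.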
In the case of events, the event ID occurs first, separated by a TAB character. The event trigger is specified as TYPE:ID and identifies the event type and its trigger through the ID. By convention, the event type is specified both in the trigger annotation and the event annotation. The event trigger is separated from the event arguments by SPACE. The event arguments are a SPACE-separated set of ROLE:ID pairs, where ROLE is one of the event- and task-specific argument roles (e.g., Effect, Cause, Theme, Site) and the ID identifies the entity or event filling that role. Note that several events can share the same trigger and that while the event trigger should be specified first, the event arguments can appear in any order.
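A minimal Python sketch of reading an event line in the layout described above is given here. The names `Event` and `parse_event_line` are hypothetical and chosen only for illustration. The arguments are kept as a list of (ROLE, ID) pairs in file order, which is a design assumption rather than something the format prescribes.

```python
from typing import List, NamedTuple, Tuple


class Event(NamedTuple):
    id: str
    type: str                    # the TYPE part of TYPE:ID
    trigger: str                 # the ID part of TYPE:ID
    args: List[Tuple[str, str]]  # (ROLE, ID) pairs in file order


def parse_event_line(line: str) -> Event:
    """Parse one event line: ID <TAB> TYPE:ID ROLE:ID ROLE:ID ..."""
    event_id, body = line.rstrip("\n").split("\t", 1)
    # The trigger comes first; the remaining fields are the arguments.
    trigger_field, *arg_fields = body.split(" ")
    event_type, trigger_id = trigger_field.split(":", 1)
    args = [tuple(field.split(":", 1)) for field in arg_fields if field]
    return Event(event_id, event_type, trigger_id, args)


event = parse_event_line("E48\tTrigger:T140 Evidence:T139 Effect:T141")
```

Since the arguments can appear in any order, code that consumes `event.args` should look roles up by name rather than by position.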
An example of an annotated causal relation within the .ann file is shown below:
T139	Argument 3854 3963	Mlc is a global regulator of carbohydrate metabolism and controls several genes involved in sugar utilization
T140	Trigger 3973 3982	Therefore
T141	Argument 4008 4052	Mlc also affects the virulence of Salmonella
E48	Trigger:T140 Evidence:T139 Effect:T141