NaCTeM

Paladin

Introduction

Paladin (Proactive learning annotator for document instances) is an open-source, web-based annotation tool for creating high-quality multi-label document-level datasets. By integrating active learning and proactive learning into the annotation task, Paladin makes annotation less time-consuming and reduces the human effort it requires. Although Paladin is designed for multi-label settings, the system is flexible and can be adapted to single-label tasks.

Context

Obtaining labelled data is difficult, time-consuming, and labour-intensive. Many libraries and systems focus on active learning, but little attention has been paid to the interaction between the annotators and the active learning algorithm. Paladin provides a web application for document classification annotation that supports both active and proactive learning.

Features

  • Active/proactive learning integration: by offering active and proactive learning, Paladin makes annotation easier and faster and reduces the human effort required.
  • An easy-to-use interface for annotators: Paladin adapts the interface of doccano, which keeps annotation intuitive.
  • Suitable for multi-label document annotation tasks: Paladin is best used for multi-label document annotation, although it can also be used for single-label classification problems.
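
To illustrate how proactive learning differs from plain active learning: proactive learning in general treats annotators as fallible and unequal in cost, so the system also decides who should label a document. The sketch below is a minimal illustration under stated assumptions, not Paladin's implementation; the function name, the reliability and cost estimates, and the reliability-per-cost rule are all hypothetical.

```python
from typing import Dict


def pick_annotator(reliability: Dict[str, float], cost: Dict[str, float]) -> str:
    """Return the annotator with the highest estimated reliability per unit cost.

    Both dictionaries are hypothetical estimates keyed by annotator name;
    how such estimates are obtained is outside the scope of this sketch.
    """
    return max(reliability, key=lambda name: reliability[name] / cost[name])


# Assumed estimates for one document: probability of a correct label
# and relative cost of asking each annotator.
reliability = {"expert": 0.95, "novice": 0.70}
cost = {"expert": 3.0, "novice": 1.0}
chosen = pick_annotator(reliability, cost)
```

With these assumed numbers the cheaper annotator is selected, because the expert's higher reliability does not offset a threefold cost. A real system would estimate these values per annotator and possibly per label.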

Framework

  • The manager creates a project, uploads the training, test, and unlabelled data, defines the label set, and chooses the active/proactive learning strategy.
  • The annotator selects one or more labels from the displayed label set for each document.
  • Paladin triggers the training process with newly annotated data and updates the documents for the next annotation batch.
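
The last step, choosing documents for the next annotation batch, can be sketched with uncertainty sampling, a common active learning strategy. This is a minimal sketch under stated assumptions: the source does not say which strategies Paladin offers, and the function names, document ids, and probabilities below are hypothetical.

```python
from typing import Dict, List


def uncertainty(label_probs: List[float]) -> float:
    """Mean closeness of each label probability to 0.5 (1.0 is most uncertain)."""
    return sum(1.0 - 2.0 * abs(p - 0.5) for p in label_probs) / len(label_probs)


def select_batch(predictions: Dict[str, List[float]], batch_size: int) -> List[str]:
    """Return the ids of the batch_size most uncertain unlabelled documents."""
    ranked = sorted(
        predictions,
        key=lambda doc_id: uncertainty(predictions[doc_id]),
        reverse=True,
    )
    return ranked[:batch_size]


# Assumed per-label probabilities from the current model for three
# unlabelled documents in a three-label task.
predictions = {
    "doc-1": [0.95, 0.02, 0.90],
    "doc-2": [0.55, 0.48, 0.60],
    "doc-3": [0.80, 0.30, 0.10],
}
next_batch = select_batch(predictions, batch_size=2)
```

Here `doc-2` ranks first because all of its label probabilities sit near 0.5, so a human label is expected to be most informative. After the annotators label the batch, the model would be retrained and the remaining documents rescored.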

Video

https://www.youtube.com/watch?v=PdOSHE7ubJM

Availability

The source code is publicly available at https://github.com/bluenqm/Paladin.

References

Nghiem, M.-Q., Baylis, P. and Ananiadou, S. (2021). Paladin: an annotation tool based on active and proactive learning. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2021): System Demonstrations, pp. 238-243.