Web Demonstration

Automatic recognition of multi-word terms and acronyms

After using this service, tell us what you think via our feedback form.

Usage

  1. Select the data entry method (text, file upload, or URL).
  2. Choose a POS tagger (TreeTagger for general text, GENIA for biomedical text).
  3. Click Analyze.

C‑value termhood scoring is frequency-based, so it works best on longer texts in which candidate terms recur. For a quick demo, try the sample buttons.

Limitations

  • Documents larger than 2 MB are rejected to protect the server; use the TerMine Processing Service for larger jobs.
  • Text must be ASCII-encoded.
  • Layout of original HTML/PDF may not be reproduced.
  • Some HTML/PDF may not be extractable.
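
The first two limitations can be checked locally before submitting a document. The following is a minimal sketch, not part of the service: the function name check_input is hypothetical, and it assumes "2 MB" means 2 × 1024 × 1024 bytes, which this page does not specify.

```python
# Hypothetical pre-submission check; not an official TerMine client.
# Assumption: the 2 MB limit is 2 * 1024 * 1024 bytes.
MAX_BYTES = 2 * 1024 * 1024


def check_input(data: bytes) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if len(data) > MAX_BYTES:
        problems.append(f"document is {len(data)} bytes; limit is {MAX_BYTES}")
    try:
        data.decode("ascii")
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first byte that is not valid ASCII.
        problems.append(f"non-ASCII byte at offset {exc.start}")
    return problems


if __name__ == "__main__":
    print(check_input(b"plain ASCII text"))
    print(check_input("caf\u00e9".encode("utf-8")))
```

A document that fails the ASCII check can often be transliterated or re-saved as ASCII before submission; how best to do that depends on the source text.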

Background

TerMine combines C‑value multi-word term extraction with AcroMine acronym recognition. Candidate terms are identified by linguistic analysis (POS tagging followed by extraction of adjective/noun sequences) and then ranked by statistical analysis (termhood scoring). The pipeline is designed for scalable, high‑throughput processing.
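
To make the two stages concrete, here is a simplified sketch of C-value scoring in the spirit of Frantzi et al. (2000). It is an illustration, not TerMine's implementation. It assumes the input is already POS-tagged with Penn Treebank-style tags (JJ for adjectives, NN* for nouns), uses a bare (Adj|Noun)+ Noun filter, and omits stop lists, frequency thresholds, and the NC-value context step. The function names and the max_len parameter are hypothetical.

```python
import math
from collections import Counter


def candidate_terms(tagged, max_len=4):
    """Yield each (Adj|Noun)+ Noun n-gram of 2..max_len tokens as a tuple."""
    n = len(tagged)
    for i in range(n):
        for j in range(i + 2, min(i + max_len, n) + 1):
            span = tagged[i:j]
            # Assumption: Penn Treebank-style tags (JJ*, NN*).
            all_adj_noun = all(tag.startswith(("JJ", "NN")) for _, tag in span)
            if all_adj_noun and span[-1][1].startswith("NN"):
                yield tuple(word.lower() for word, _ in span)


def contains(longer, shorter):
    """True if `shorter` occurs contiguously inside the strictly longer tuple."""
    k = len(shorter)
    return len(longer) > k and any(
        longer[i:i + k] == shorter for i in range(len(longer) - k + 1)
    )


def c_value(sentences, max_len=4):
    """Map each candidate term to its C-value score."""
    freq = Counter(t for s in sentences for t in candidate_terms(s, max_len))
    scores = {}
    for a, f_a in freq.items():
        nesting = [b for b in freq if contains(b, a)]
        if nesting:
            # Discount occurrences explained by longer candidate terms.
            f_a = f_a - sum(freq[b] for b in nesting) / len(nesting)
        scores[a] = math.log2(len(a)) * f_a
    return scores


if __name__ == "__main__":
    sentences = [
        [("basal", "JJ"), ("cell", "NN"), ("carcinoma", "NN")],
        [("cell", "NN"), ("carcinoma", "NN"), ("is", "VBZ"), ("common", "JJ")],
    ]
    for term, score in sorted(c_value(sentences).items(), key=lambda kv: -kv[1]):
        print(" ".join(term), round(score, 3))
```

In this toy input, "cell carcinoma" occurs twice but one occurrence is nested inside "basal cell carcinoma", so its frequency is discounted. With so little text the scores carry little meaning, which is why longer inputs give more useful rankings.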

References

  • Frantzi, K., Ananiadou, S. & Mima, H. (2000) Automatic recognition of multi‑word terms. International Journal on Digital Libraries 3(2), 117–132.
  • Okazaki, N. & Ananiadou, S. (2006) Building an abbreviation dictionary using a term recognition approach. Bioinformatics.
  • GENIA Tagger — POS tagging, shallow parsing, and NER for biomedical text.
  • TreeTagger — language‑independent POS tagger.