TerMine — Web Demonstration

Automatic recognition of multi-word terms and acronyms

Input: plain text (ASCII characters only), a local file (*.txt in ASCII encoding, or *.pdf; 2 MB maximum), or a URL (HTML or PDF content; 2 MB maximum).

Options: choice of POS tagger, and whether to preserve line breaks in the source document.

Usage

Select the data entry method (text, file upload, or URL).
Choose a POS tagger (TreeTagger for general text, GENIA Tagger for biomedical text).
Click Analyze.

C-value term extraction needs a reasonable amount of text to produce meaningful termhood scores, because the scores are computed from candidate frequencies; the sample inputs give a quick demonstration.
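For reference, Frantzi et al. (2000) define the C-value of a candidate string $a$ as shown below, where $|a|$ is its length in words, $f(a)$ its frequency in the input, $T_a$ the set of longer extracted candidates that contain $a$, and $P(T_a)$ the number of those candidates. Every term in the score is a corpus frequency, so a short input gives few occurrences to count, which is why longer texts tend to give more useful rankings.

```latex
\mathrm{C\text{-}value}(a) =
\begin{cases}
  \log_2 |a| \cdot f(a), & \text{if } a \text{ is not nested},\\[4pt]
  \log_2 |a| \left( f(a) - \dfrac{1}{P(T_a)} \displaystyle\sum_{b \in T_a} f(b) \right), & \text{otherwise}.
\end{cases}
```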

Limitations

Documents larger than 2 MB are rejected to protect the server; use the TerMine Processing Service for larger jobs.
Text must be ASCII-encoded.
The layout of the original HTML or PDF document may not be reproduced.
Text cannot be extracted from some HTML or PDF documents.

Background

TerMine integrates C-value multi-word term extraction and AcroMine acronym recognition. Terms are recognized by combining linguistic analysis (POS tagging, then extraction of adjective/noun sequences as term candidates) with statistical analysis (termhood scoring of those candidates), an approach designed for scalable, high-throughput processing.
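The two-stage approach described above can be sketched in Python. This is a minimal illustration under stated assumptions, not TerMine's implementation: the input is assumed to be already POS-tagged with Penn Treebank-style tags (standing in for TreeTagger or GENIA Tagger output), the linguistic filter is the simple pattern (Adj|Noun)+ Noun, and the helper names `candidate_terms` and `c_value` are hypothetical. Scoring follows the C-value formula of Frantzi et al. (2000).

```python
import math
from collections import Counter

# Assumption: Penn Treebank-style tags; adjust these sets for the tagger in use.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
ADJ_TAGS = {"JJ", "JJR", "JJS"}
ALLOWED_TAGS = NOUN_TAGS | ADJ_TAGS


def candidate_terms(tagged_sentences, max_len=5):
    """Count multi-word candidates matching the filter (Adj|Noun)+ Noun.

    tagged_sentences: list of sentences, each a list of (word, tag) pairs.
    Every matching window is counted, so a nested candidate such as
    "contact lens" inside "soft contact lens" gets its own frequency.
    """
    counts = Counter()
    for sentence in tagged_sentences:
        for i in range(len(sentence)):
            for j in range(i + 2, min(len(sentence), i + max_len) + 1):
                window = sentence[i:j]
                tags = [tag for _, tag in window]
                if not all(tag in ALLOWED_TAGS for tag in tags):
                    break  # every longer window starting at i contains this tag too
                if tags[-1] in NOUN_TAGS:
                    counts[tuple(word.lower() for word, _ in window)] += 1
    return counts


def _is_nested(shorter, longer):
    """True if `shorter` occurs as a contiguous word sequence inside `longer`."""
    k = len(shorter)
    return any(longer[i:i + k] == shorter for i in range(len(longer) - k + 1))


def c_value(counts):
    """Score each candidate a with C-value (Frantzi et al., 2000)."""
    scores = {}
    for a, f_a in counts.items():
        # T_a: the longer candidates that contain a
        containers = [b for b in counts if len(b) > len(a) and _is_nested(a, b)]
        if containers:
            # Subtract the mean frequency of the containing candidates
            mean_nested = sum(counts[b] for b in containers) / len(containers)
            scores[a] = math.log2(len(a)) * (f_a - mean_nested)
        else:
            scores[a] = math.log2(len(a)) * f_a
    return scores


if __name__ == "__main__":
    corpus = [
        [("Soft", "JJ"), ("contact", "NN"), ("lens", "NN")],
        [("The", "DT"), ("contact", "NN"), ("lens", "NN"), ("is", "VBZ"), ("clean", "JJ")],
        [("contact", "NN"), ("lens", "NN")],
    ]
    scores = c_value(candidate_terms(corpus))
    for term, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(" ".join(term), round(score, 3))
```

On this toy corpus, "contact lens" scores 2.0 because it also occurs on its own, while "soft contact" scores 0.0 because its only occurrence is inside "soft contact lens". The nested-candidate check is quadratic in the number of candidates, which is acceptable for a sketch but not for large corpora.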

References

Frantzi, K., Ananiadou, S. & Mima, H. (2000) Automatic recognition of multi-word terms. International Journal on Digital Libraries 3(2), 117–132.
Okazaki, N. & Ananiadou, S. (2006) Building an abbreviation dictionary using a term recognition approach. Bioinformatics.
GENIA Tagger: POS tagging, shallow parsing, and named entity recognition for biomedical text.
TreeTagger: a language-independent POS tagger.