NaCTeM

Batch submission to TerMine (request access)

Terms of Use

By using the TerMine service, you agree to the general Terms and Conditions of Use for the NaCTeM Website, in addition to the following Terms of Use for TerMine:
  • Please let us know by email that you are using TerMine.
  • Please cite the following when publishing work that uses TerMine:
    Frantzi, K., Ananiadou, S. and Mima, H. (2000) Automatic recognition of multi-word terms. International Journal of Digital Libraries 3(2), pp.117-132.
  • Please credit and link to the NaCTeM website (http://nactem.ac.uk) in any electronic services based on the TerMine service or resulting data.
  • Please contact us in advance if you plan to use the service for bulk processing. TerMine is a freely available service from the academic domain, so we need to limit server load and give preference to individual users. Excessive server load may result in IP addresses or institutions being blocked from using the TerMine service. A daily limit is enforced on how many times unregistered users may use this service.

When you submit a batch request, your job will enter a queue. When your job is complete, you will receive an email containing the URL where you can view the results.

Please note: if you want to analyze a PDF document, you must specify a URL. PDF uploading is not currently supported.

The submission form asks for the following:
  • a URL, or the name of a file to upload
  • the type of file: Text, HTML or PDF
  • the parser to use
  • an email address for notification

About the C-value and TerMine ...

Technical terms are important for knowledge mining, especially in the biomedical area, where vast numbers of documents are available. A domain-independent method for term recognition is therefore very useful for recognizing terms in documents automatically.

C-value is a domain-independent method for automatic term recognition (ATR) of candidate multiword terms. It combines linguistic and statistical analyses, with the emphasis placed on the statistical part. The linguistic analysis enumerates all candidate terms in a given text by applying part-of-speech tagging, extracting word sequences made up of adjectives and nouns, and filtering them with a stop-list. The statistical analysis assigns a termhood score to each candidate term using the following four characteristics:

  • the occurrence frequency of the candidate term
  • the frequency of the candidate term as part of other longer candidate terms
  • the number of these longer candidate terms
  • the length of the candidate term
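
To make the four characteristics concrete, here is a minimal Python sketch of how they might be combined into a score. It follows our reading of the C-value formula in the cited paper (Frantzi et al., 2000): the logarithm of the term length multiplies the term's frequency, reduced by the average frequency of the longer candidate terms that contain it. The function names `contains` and `c_value`, the input format and the example frequencies are illustrative assumptions. This is not TerMine's optimized implementation, and it leaves out refinements such as frequency thresholds.

```python
import math


def contains(longer, shorter):
    """Return True if the word tuple `shorter` occurs contiguously inside
    the strictly longer word tuple `longer`."""
    n, m = len(longer), len(shorter)
    return m < n and any(longer[i:i + m] == shorter for i in range(n - m + 1))


def c_value(freq):
    """Score candidate terms (hypothetical helper, simplified C-value).

    `freq` maps each candidate term, given as a tuple of words, to its
    total occurrence frequency in the corpus.
    """
    scores = {}
    for term, f in freq.items():
        # Frequencies of the longer candidate terms that contain this one.
        longer = [f_b for other, f_b in freq.items() if contains(other, term)]
        if longer:
            # Nested term: subtract the average frequency of the longer terms.
            adjusted = f - sum(longer) / len(longer)
        else:
            # Not nested: use the plain occurrence frequency.
            adjusted = f
        # log2 of the length in words; single-word candidates score 0,
        # since the method targets multiword terms.
        scores[term] = math.log2(len(term)) * adjusted
    return scores


# Made-up frequencies, for illustration only.
example = {
    ("cell", "carcinoma"): 8,
    ("basal", "cell", "carcinoma"): 5,
    ("adenoid", "cystic", "basal", "cell", "carcinoma"): 2,
}

for term, score in sorted(c_value(example).items(), key=lambda kv: -kv[1]):
    print(" ".join(term), round(score, 3))
```

In this made-up example, "cell carcinoma" occurs 8 times but is nested in two longer candidates with frequencies 5 and 2, so its adjusted frequency is 8 - 3.5 = 4.5, and its score is log2(2) × 4.5 = 4.5.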

We have been developing a system for terminological management called TerMine, which employs the C-value method to extract terms. The implementation is optimized for scalability and processing speed: given a set of 1.3 million MEDLINE abstracts (2 GB of text), the standalone version of TerMine extracts 9.8 million term candidates and their termhood scores in about ten minutes.