NaCTeM

Termine Web Service (Request Access)

Termine Web Service provides a SOAP interface through which your client programs can use the candidate multiword term extraction component.

To use this service, you must add a software key to your SOAP client.

WSDL

Client sample

Overview of the API

Termine Web Service provides a SOAP interface that exports a function, analyze. It takes a source text as a string and returns the term-extraction result as a string. The source string can be either natural English text, e.g.,

Technical terms are important for knowledge mining, especially in the bio-medical area where vast amount of documents are available.
or text annotated with part-of-speech tags,

Technical	Technical	JJ
terms	term	NNS
are	be	VBP
important	important	JJ
for	for	IN
knowledge	knowledge	NN
mining	mining	NN
,	,	,
especially	especially	RB
in	in	IN
the	the	DT
bio-medical	bio-medical	JJ
area	area	NN
where	where	WRB
vast	vast	JJ
amount	amount	NN
of	of	IN
documents	document	NNS
are	be	VBP
available	available	JJ
.	.	.
EOS

Natural English text should be supplied as multiple lines, each containing exactly one sentence. The result of term extraction may be inaccurate if a sentence spans several lines.
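As a rough illustration of preparing such input, the sketch below reflows hard-wrapped text so that each sentence sits on its own line. The function name one_sentence_per_line is hypothetical, and the splitting rule (sentence-final punctuation followed by a space and an uppercase letter) is a naive assumption that will mis-split abbreviations such as "Dr. Smith"; a proper sentence splitter would be preferable for real documents.

```python
import re


def one_sentence_per_line(text: str) -> str:
    """Naively reflow text so that each line holds one sentence."""
    # Collapse all whitespace, including hard line wraps, into single spaces.
    flat = " ".join(text.split())
    # Split at a space that follows '.', '!' or '?' and precedes an
    # uppercase letter. This is a heuristic, not a real sentence splitter.
    sentences = re.split(r"(?<=[.!?]) (?=[A-Z])", flat)
    return "\n".join(s for s in sentences if s)
```

The returned string could then be passed as src with input_format="plain.genia".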

A text with part-of-speech tags consists of multiple lines, each of which contains a word, its base form, and its part-of-speech tag, separated by TAB characters. A line containing "EOS" marks the end of a sentence. The part-of-speech tags should be compatible with the Penn Treebank tag set.
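The tagged format described above can be produced from already-tagged tokens with a few lines of code. The following is a minimal sketch; the function name to_tagged_text and the Token alias are hypothetical, and ending every line (including the last "EOS") with a newline is an assumption, since the page does not say whether a trailing newline is required.

```python
from typing import Iterable, List, Tuple

# (word, base form, part-of-speech tag); a hypothetical alias for illustration.
Token = Tuple[str, str, str]


def to_tagged_text(sentences: Iterable[Iterable[Token]]) -> str:
    """Serialize tagged sentences as TAB-separated lines with EOS markers."""
    lines: List[str] = []
    for sentence in sentences:
        for word, base, pos in sentence:
            # One token per line: word, base form, tag, separated by TABs.
            lines.append("\t".join((word, base, pos)))
        # A line with "EOS" closes each sentence.
        lines.append("EOS")
    # Assumption: every line is newline-terminated.
    return "".join(line + "\n" for line in lines)
```

For example, to_tagged_text([[("terms", "term", "NNS"), ("are", "be", "VBP")]]) yields two token lines followed by an "EOS" line.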

The function analyze takes string arguments as follows.

  • src (required): a source text with or without part-of-speech tags
  • key (required): the key for access to this service. This is provided to you when you apply for access.
  • input_format (optional; default="plain.genia"):
    • "plain.genia": The source text contains natural English sentences (the server will process the text with the GENIA tagger)
    • "post.genia": The source text contains sentences with part-of-speech tags annotated by the GENIA tagger
  • output_format (optional; default="plain"):
    • "plain": The result will be returned in plain text
    • "xml": The result will be returned in XML
  • stoplist (optional; default=""): A list of stop words separated by whitespace characters. If this argument is not specified, the service does not apply a stoplist.
  • filter (optional; default="{JJ}*{NN}+"): A part-of-speech pattern used to extract terms
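To make the argument list concrete, the sketch below assembles the arguments of analyze into a dictionary, filling in the documented defaults and rejecting format values that are not listed above. The helper build_analyze_args and its Python parameter pos_filter (renamed to avoid shadowing the built-in filter) are hypothetical; only the dictionary keys and the default values come from this page. It makes no network call.

```python
from typing import Dict, Iterable, Union

INPUT_FORMATS = ("plain.genia", "post.genia")
OUTPUT_FORMATS = ("plain", "xml")


def build_analyze_args(
    src: str,
    key: str,
    input_format: str = "plain.genia",
    output_format: str = "plain",
    stoplist: Union[str, Iterable[str]] = "",
    pos_filter: str = "{JJ}*{NN}+",
) -> Dict[str, str]:
    """Collect the string arguments for analyze, applying documented defaults."""
    if not src:
        raise ValueError("src is required")
    if not key:
        raise ValueError("key is required")
    if input_format not in INPUT_FORMATS:
        raise ValueError(f"unknown input_format: {input_format!r}")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unknown output_format: {output_format!r}")
    # The stoplist is sent as one whitespace-separated string.
    if not isinstance(stoplist, str):
        stoplist = " ".join(stoplist)
    return {
        "src": src,
        "key": key,
        "input_format": input_format,
        "output_format": output_format,
        "stoplist": stoplist,
        "filter": pos_filter,
    }
```

With a SOAP library such as zeep, a dictionary like this could be passed as keyword arguments, for example client.service.analyze(**args), where client is built from the service's WSDL. Whether the WSDL accepts the arguments by these names in that way is an assumption to check against the WSDL itself.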

The function returns the analysis result in the result variable of the SOAP response. The analysis result is a list of candidate multiword terms and their C-Value scores. This is an example of the result for the sample sentence.

1.000000 technical term
1.000000 knowledge mining
1.000000 bio-medical area
1.000000 vast amount
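A client will usually want the plain-text result as structured data. The sketch below parses it into (score, term) pairs, assuming each non-empty line has the layout shown in the example: a numeric score, whitespace, then the term. The function name parse_plain_result is hypothetical, and the layout of real responses (for instance the exact separator) should be checked against actual output.

```python
from typing import List, Tuple


def parse_plain_result(result: str) -> List[Tuple[float, str]]:
    """Parse 'score term' lines into (score, term) tuples."""
    terms: List[Tuple[float, str]] = []
    for line in result.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split only at the first run of whitespace: terms contain spaces.
        score, term = line.split(None, 1)
        terms.append((float(score), term))
    return terms
```

Applied to the example above, the first parsed entry would be (1.0, "technical term").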