lexrefine: Tool for refining a lexicon


This tool removes infrequent words and lexical entry templates by frequency thresholding, and expands lexical entry templates by applying lexical rules.

lexrefine [options] rule_module orig_lexicon orig_template new_lexicon new_template
rule_module: LiLFeS program in which the lexical rules are implemented
orig_lexicon: input lexicon
orig_template: input template database
new_lexicon: refined lexicon (output)
new_template: refined template database (output)
Options
-wf threshold: word frequency threshold (default: 1)
-tf threshold: lexical entry template frequency threshold (default: 0)
-uwf threshold: frequency threshold below which a word is regarded as an unknown word (default: 1)
-utf threshold: frequency threshold for lexical entry templates to be adopted for unknown words (default: 0)
-v: print debug messages
-vv: print more debug messages
-vvv: print even more debug messages
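As a reading aid, the command line above can be mirrored with Python's argparse. This is a hypothetical sketch, not the tool's actual implementation: it assumes the thresholds are integer counts and treats -v, -vv and -vvv as independent flags. The file names in the example call are placeholders.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Mirror the documented lexrefine command line (illustrative sketch only)."""
    p = argparse.ArgumentParser(prog="lexrefine")
    # Thresholds are modelled as integer occurrence counts (an assumption).
    p.add_argument("-wf", type=int, default=1, help="word frequency threshold")
    p.add_argument("-tf", type=int, default=0, help="template frequency threshold")
    p.add_argument("-uwf", type=int, default=1, help="unknown-word frequency threshold")
    p.add_argument("-utf", type=int, default=0, help="template threshold for unknown words")
    # Each verbosity level is a separate flag, as in the option list.
    p.add_argument("-v", action="store_true", help="print debug messages")
    p.add_argument("-vv", action="store_true", help="print more debug messages")
    p.add_argument("-vvv", action="store_true", help="print even more debug messages")
    # The five positional arguments, in the documented order.
    for name in ("rule_module", "orig_lexicon", "orig_template", "new_lexicon", "new_template"):
        p.add_argument(name)
    return p


args = build_parser().parse_args(
    ["-wf", "2", "rules.lil", "orig.lex", "orig.tmpl", "new.lex", "new.tmpl"])
print(args.wf, args.tf)  # 2 0
```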

"lexrefine" refines a lexicon and a template database with the following operations.

First, lexical entry templates whose occurrence count is less than the threshold specified by the "-tf" option are removed.
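The first step amounts to a simple filter. The Python sketch below assumes, purely for illustration, that the template database is a dict mapping a template name to its occurrence count; the real tool works on LiLFeS feature structures, and the template names are hypothetical.

```python
from typing import Dict


def threshold_templates(template_freq: Dict[str, int], tf: int = 0) -> Dict[str, int]:
    """Drop templates whose occurrence count is less than tf (the -tf option).

    Hypothetical model: template name -> occurrence count.
    """
    return {name: freq for name, freq in template_freq.items() if freq >= tf}


templates = {"noun_sg": 120, "verb_trans": 45, "verb_rare": 1}
print(threshold_templates(templates, tf=2))  # {'noun_sg': 120, 'verb_trans': 45}
```

With the default tf of 0, every template survives, which matches the documented default.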

Next, lexical rules are applied to the remaining templates to make lexical entry templates for inflected words. Lexical rules are written by implementing the following interface, defined in "mayz/lexrefine.lil".

expand_lexical_template(+$InTemplateName, +$InTemplate, +$Freq, -$LexRules, -$NewTemplate)
$InTemplateName: name of the input template
$InTemplate: input lexical entry template
$Freq: occurrence count of the template
$LexRules: history of applied lexical rules
$NewTemplate: derived lexical entry template
Apply lexical rules to the lexical entry template of a lexeme to make a new lexical entry template.
The new template is named by the pair of the lexeme's template name and the history of applied lexical rules, and is stored in the template database. The frequency of the new template is taken to be that of the original template (lexeme).
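The contract of expand_lexical_template/5 can be illustrated by analogy in Python. This is not LiLFeS code: templates are plain dicts, each lexical rule is a hypothetical function that returns a new template or None when it does not apply, and only one rule is applied at a time, so every history has length one. As described above, each derived template is keyed by (template name, rule history) and inherits the original frequency.

```python
from typing import Callable, Dict, Optional, Tuple

Template = Dict[str, str]                    # stand-in for a feature structure
LexRule = Callable[[Template], Optional[Template]]
DerivedName = Tuple[str, Tuple[str, ...]]    # (template name, rule history)


def passive_rule(t: Template) -> Optional[Template]:
    """Hypothetical passive rule: applies only to base-form verbs."""
    if t.get("pos") != "VB":
        return None
    return {**t, "pos": "VBN", "voice": "passive"}


def expand_templates(
    templates: Dict[str, Tuple[Template, int]],
    rules: Dict[str, LexRule],
) -> Dict[DerivedName, Tuple[Template, int]]:
    derived: Dict[DerivedName, Tuple[Template, int]] = {}
    for name, (tmpl, freq) in templates.items():
        for rule_name, rule in rules.items():
            new_tmpl = rule(tmpl)
            if new_tmpl is not None:
                # Name = (original name, history); frequency copied from the lexeme.
                derived[(name, (rule_name,))] = (new_tmpl, freq)
    return derived


db = {"verb_trans": ({"pos": "VB"}, 45), "noun_sg": ({"pos": "NN"}, 120)}
print(expand_templates(db, {"passive": passive_rule}))
# {('verb_trans', ('passive',)): ({'pos': 'VBN', 'voice': 'passive'}, 45)}
```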

To use derived lexical entry templates, implement the following interface.

expand_lexicon(+$InKey, +$TemplateName, -$NewKey)
$InKey: key of an input word
$TemplateName: name of a template (type 'lex_template')
$NewKey: new key
From a lexicon key $InKey, make a new key $NewKey to which the derived template should be assigned. For example, when a template for the passive is derived from the template for a base verb, the new key "loved/VBN" is made from "love/VB".
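For the "love/VB" to "loved/VBN" example, an expand_lexicon-style key mapping might look like the Python sketch below. It assumes keys have the form word/POS and uses deliberately naive regular-verb morphology; the helper past_participle, the template name "verb_trans", and the rule name "passive" are all hypothetical.

```python
from typing import Optional, Tuple


def past_participle(verb: str) -> str:
    """Naive regular inflection; irregular verbs (e.g. "take") are not handled."""
    return verb + "d" if verb.endswith("e") else verb + "ed"


def expand_lexicon_key(in_key: str,
                       template_name: Tuple[str, Tuple[str, ...]]) -> Optional[str]:
    """Map a lexicon key to the key a derived template should be assigned to.

    Returns None when no derived key applies.
    """
    word, pos = in_key.rsplit("/", 1)
    _lexeme, history = template_name
    if pos == "VB" and history and history[-1] == "passive":
        return past_participle(word) + "/VBN"
    return None


print(expand_lexicon_key("love/VB", ("verb_trans", ("passive",))))  # loved/VBN
```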

Next, each lexicon entry is removed if the occurrence count of its word (more precisely, of the key given by the third argument of "reduce_lexical_template/5") is less than the threshold specified by the "-wf" option. The remaining entries are kept, but the templates deleted in the first step are automatically removed from them.
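The word-thresholding step can be sketched in Python as follows, under the hypothetical assumption that a lexicon entry is a pair (word occurrence count, list of template names). The description above does not say whether an entry left with no templates is dropped, so this sketch simply keeps it.

```python
from typing import Dict, List, Set, Tuple

Entry = Tuple[int, List[str]]  # (word occurrence count, template names)


def refine_lexicon(lexicon: Dict[str, Entry], kept_templates: Set[str],
                   wf: int = 1) -> Dict[str, Entry]:
    refined: Dict[str, Entry] = {}
    for key, (count, names) in lexicon.items():
        if count < wf:
            continue  # infrequent word (-wf): remove the whole entry
        # Templates deleted in the first step disappear from surviving entries.
        refined[key] = (count, [n for n in names if n in kept_templates])
    return refined


lex = {"love/VB": (30, ["verb_trans", "verb_rare"]), "zebu/NN": (1, ["noun_sg"])}
print(refine_lexicon(lex, {"verb_trans", "noun_sg"}, wf=2))
# {'love/VB': (30, ['verb_trans'])}
```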

In addition, a word is regarded as an "unknown word" if its occurrence count is less than the value of the "-uwf" option; the templates assigned to such a word are added to the template list for unknown words. The key of an unknown word is given by "unknown_word_key/2", defined in "mayz/lexrefine.lil".
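Collecting templates for unknown words can be sketched the same way. The entry layout is the same hypothetical (count, template names) pair, the key function passed in plays the role of unknown_word_key/2, and the constant key "<unk>" used in the example is an assumption.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set, Tuple

Entry = Tuple[int, List[str]]  # (word occurrence count, template names)


def collect_unknown(
    lexicon: Dict[str, Entry],
    uwf: int,
    unknown_word_key: Callable[[str], str],
) -> Dict[str, Set[str]]:
    unknown: Dict[str, Set[str]] = defaultdict(set)
    for key, (count, names) in lexicon.items():
        if count < uwf:
            # Rare word (-uwf): its templates join the unknown-word entry for its key.
            unknown[unknown_word_key(key)].update(names)
    return dict(unknown)


lex = {"zebu/NN": (1, ["noun_sg"]),
       "okapi/NN": (1, ["noun_sg", "noun_mod"]),
       "dog/NN": (50, ["noun_sg"])}
print(collect_unknown(lex, uwf=2, unknown_word_key=lambda key: "<unk>"))
```

Because the result holds sets, the printed order of template names may vary between runs.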

unknown_word_key(+$InKey, -$OutKey)
$InKey: key of an input word
$OutKey: key of an unknown word
Make a key for an unknown word.
For example, if this predicate returns the POS of $InKey, the parser will look up unknown word entries by POS. The threshold on the occurrence count of templates to be assigned to unknown word entries can additionally be specified with the "-utf" option.
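A POS-based unknown_word_key, together with the "-utf" filter, might look like the Python sketch below. It assumes word/POS keys and assumes that, as with "-tf", templates whose count is less than the threshold are excluded; both function names are hypothetical.

```python
from typing import Dict, List


def pos_unknown_word_key(in_key: str) -> str:
    """Keep only the POS of a word/POS key, e.g. "zebu/NN" -> "NN"."""
    return in_key.rsplit("/", 1)[1]


def templates_for_unknown(template_freq: Dict[str, int], utf: int = 0) -> List[str]:
    """Template names whose occurrence count is at least utf (the -utf option)."""
    return sorted(name for name, freq in template_freq.items() if freq >= utf)


print(pos_unknown_word_key("zebu/NN"))                             # NN
print(templates_for_unknown({"noun_sg": 120, "noun_rare": 1}, 2))  # ['noun_sg']
```

Keying unknown words by POS alone means that every rare noun, for instance, shares a single unknown-word entry.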
MAYZ Toolkit Manual MAYZ Home Page Tsujii Laboratory
MIYAO Yusuke (yusuke@is.s.u-tokyo.ac.jp)