unimaker: Tool for making a unigram probability model

This tool makes an event file for a unigram probability model.

unimaker model_name lilfes_module lexicon template lexbank event_file

  model_name     name of a probabilistic model (this will be specified in parsing)
  lilfes_module  lilfes program in which predicates for event extraction are implemented
  lexicon        lexicon obtained by "lexextract" (lildb format)
  template       template database obtained by "lexextract" (lildb format)
  lexbank        lexbank obtained by "lexextract" (lildb format)
  event_file     output file in an "unfiltered event" style (text format or compressed (gz or bz) format)

Options

  -ff            output events in feature forest format
  -n threshold   limit of the number of events to be output
  -v             print debug messages
  -vv            print many debug messages

The name of a probabilistic model is assigned to each event file. If you specify different names, you can use multiple event files in a parser.

Given a grammar (i.e., lexicon and template database) and a lexbank as input, this tool supports the development of a maximum entropy model of the output probability of lexical entries (i.e., unigram probability). This tool makes an unfiltered event file, which is required for training the model.

An unfiltered event is represented as a string of several fields separated by "//".

in//IN//vp[PPnp]//uni

The last field (uni) is the category of the event. The category is used when filters are applied to events in the following step: the same filters are applied to all events that have the same category name. The number of fields must therefore be the same for all events that share a category name. That is, if you want to incorporate events that have different numbers of fields, use different category names.
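
The field-count constraint can be checked mechanically. The following Python sketch is only an illustration, not part of the toolkit: the helper names `split_event` and `inconsistent_categories` are hypothetical, and every sample event other than the one shown above is made up for the example. It splits events on "//" and reports the categories whose events disagree on the number of fields.

```python
from collections import defaultdict

SEPARATOR = "//"  # field separator of unfiltered events


def split_event(event):
    """Split an unfiltered event into (fields, category).

    The category is taken to be the last "//"-separated field.
    """
    parts = event.split(SEPARATOR)
    return parts[:-1], parts[-1]


def inconsistent_categories(events):
    """Map each offending category to the set of field counts seen for it.

    Counts include the category field itself. Categories whose events
    all have the same number of fields are left out of the result.
    """
    counts = defaultdict(set)
    for event in events:
        parts = event.split(SEPARATOR)
        counts[parts[-1]].add(len(parts))
    return {cat: sizes for cat, sizes in counts.items() if len(sizes) > 1}


# Hypothetical sample data; only the first event comes from this manual.
events = [
    "in//IN//vp[PPnp]//uni",
    "dog//NN//np//uni",
    "dog//NN//uni",  # one field short, so "uni" becomes inconsistent
]

if __name__ == "__main__":
    print(inconsistent_categories(events))
```

In this sketch, the third event would have to be given a different category name (or a fourth field) before the same filters could be applied to all three events.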

Each unfiltered event is the string representation of a target event. It is made by concatenating the elements of the list obtained as the fourth argument of "extract_lexical_event/4", which is defined in "amismodel.lil".

extract_lexical_event(+$ModelName, -$Category, +$LexEntry, -$Event)

  $ModelName  name of a probabilistic model
  $Category   name of a category
  $LexEntry   lexical entry
  $Event      a list of strings that represents an event

Extract an event of a lexical entry.

The name of the probabilistic model ($ModelName) must be the one given as the first command-line argument of "unimaker".
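
The way an event list turns into an unfiltered event string can be pictured in Python as follows. This is a sketch under one assumption: that the category name is appended as the last field, as the example event in this manual suggests. `make_unfiltered_event` is a hypothetical helper and not a function of the toolkit.

```python
def make_unfiltered_event(event_fields, category, separator="//"):
    """Join the strings of an event list and its category into one string.

    Assumption: the category is written as the last field. A field that
    contains the separator would make the result ambiguous, so such
    input is rejected.
    """
    fields = list(event_fields) + [category]
    for field in fields:
        if separator in field:
            raise ValueError("field contains the separator: %r" % field)
    return separator.join(fields)


if __name__ == "__main__":
    # Values taken from the example event in this manual.
    print(make_unfiltered_event(["in", "IN", "vp[PPnp]"], "uni"))
```

Here the list `["in", "IN", "vp[PPnp]"]` plays the role of `$Event` and `"uni"` the role of `$Category`.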

If you want to specify the value of a feature function (integer or float), the following interface may be used.

extract_lexical_event_feature_value(+$ModelName, -$Category, +$LexEntry, -$Event, -$Val)

  $ModelName  name of a probabilistic model
  $Category   name of a category
  $LexEntry   lexical entry
  $Event      a list of strings that represents an event
  $Val        the value of a feature function

Extract an event of a lexical entry and the corresponding feature value.

MIYAO Yusuke (yusuke@is.s.u-tokyo.ac.jp)