unimaker: Tool for making a unigram probability model

This tool makes an event file for a unigram probability model.

unimaker model_name lilfes_module lexicon template lexbank event_file
model_name	name of a probabilistic model (this will be specified in parsing)
lilfes_module	lilfes program in which predicates for event extraction are implemented
lexicon	lexicon obtained by "lexextract" (lildb format)
template	template database obtained by "lexextract" (lildb format)
lexbank	lexbank obtained by "lexextract" (lildb format)
event_file	output file in an "unfiltered event" style (text format or compressed (gz or bz) format)
Options
-ff	output events in feature forest format
-n threshold	maximum number of events to output
-v	print debug messages
-vv	print many debug messages

The name of a probabilistic model is assigned to each event file. If you specify different names, you can use multiple event files in a parser.

Given a grammar (i.e., lexicon and template database) and a lexbank as input, this tool supports the development of a maximum entropy model of the output probability of lexical entries (i.e., unigram probability). This tool makes an unfiltered event file, which is required for training the model.

An unfiltered event is represented as a string of fields separated by "//", for example:

in//IN//vp[PPnp]//uni

The last field (uni) is the category of the event. The category is used when a filter is applied to the event in the following step: the same filters are applied to all events that have the same category name. The number of fields must therefore be the same for all events that share a category name. If you want to incorporate events that have different numbers of fields, use different category names.
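
The field-count rule above can be checked mechanically before filtering. The following is a minimal sketch, not part of the toolkit: it assumes the event file has been read as plain text, one event per line, and that no field itself contains "//". The function names and the second sample event ("on//IN//uni") are hypothetical and used only for illustration.

```python
from collections import defaultdict

FIELD_SEPARATOR = "//"


def split_event(line):
    """Split one unfiltered event into (fields, category).

    Assumption: the category is the last "//"-separated field.
    """
    parts = line.strip().split(FIELD_SEPARATOR)
    if len(parts) < 2:
        raise ValueError(f"expected at least one field and a category: {line!r}")
    return parts[:-1], parts[-1]


def inconsistent_categories(lines):
    """Return {category: sorted field counts} for categories whose
    events do not all have the same number of fields."""
    counts = defaultdict(set)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        fields, category = split_event(line)
        counts[category].add(len(fields))
    return {cat: sorted(ns) for cat, ns in counts.items() if len(ns) > 1}


# The first event is the example from this manual; the second is made up
# to show a violation (same category, different number of fields).
events = ["in//IN//vp[PPnp]//uni", "on//IN//uni"]
print(split_event(events[0]))
print(inconsistent_categories(events))
```

An empty result from `inconsistent_categories` means every category has a consistent number of fields; any category it reports should be split into separate category names.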

Each unfiltered event is a string representation of a target event. It is made by concatenating the elements of the list obtained as the fourth argument of "extract_lexical_event/4", which is defined in "amismodel.lil".
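
The concatenation step can be pictured with a short sketch. This is an illustration only, not the tool's implementation: it assumes the list elements are joined with "//" and that the category name is appended as the final field, which is consistent with the sample event in this manual but is not stated explicitly. The function name `join_event` is hypothetical.

```python
def join_event(event_fields, category):
    """Build an unfiltered event string from a list of strings.

    Assumptions (not confirmed by the manual): fields are joined
    with "//" and the category is appended as the last field.
    """
    parts = list(event_fields) + [category]
    for part in parts:
        if "//" in part:
            # A field containing the separator would make the event ambiguous.
            raise ValueError(f"field contains the separator '//': {part!r}")
    return "//".join(parts)


print(join_event(["in", "IN", "vp[PPnp]"], "uni"))
```

Rejecting fields that contain "//" is a design choice of this sketch; it keeps the number of fields in the output equal to the length of the list plus one.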

`extract_lexical_event(+$ModelName, -$Category, +$LexEntry, -$Event)`
$ModelName	name of a probabilistic model
$Category	name of a category
$LexEntry	lexical entry
$Event	a list of strings that represents an event
Extract an event of a lexical entry.

The model name matched by $ModelName must be the one given as the first command-line argument of "unimaker".

To specify the value of a feature function (an integer or a float), use the following interface.

`extract_lexical_event_feature_value(+$ModelName, -$Category, +$LexEntry, -$Event, -$Val)`
$ModelName	name of a probabilistic model
$Category	name of a category
$LexEntry	lexical entry
$Event	a list of strings that represents an event
$Val	the value of a feature function
Extract an event of a lexical entry and the corresponding feature value.


MIYAO Yusuke (yusuke@is.s.u-tokyo.ac.jp)