treetrans: Tool for tree transformation


This is a tool for the conversion of parse trees using pattern rules.

treetrans [options] rule_module input_file output_database

  rule_module       lilfes program in which pattern rules are implemented
  input_file        Input treebank (text format)
  output_database   Output treebank (lildb format)

Options

  -v    print debug messages
  -vv   print many debug messages

This tool reads parse trees from a text file, applies tree conversion rules to each input tree, and writes the results to a lildb-style database.

How to input parse trees

"treetrans" calls 'input_parse_tree/2' to read parse trees from a text file. 'input_parse_tree/2' is declared in "treetrans.lil" as an interface of "treetrans", but its body is not implemented; the grammar developer must supply it. Each line of the input file is passed as the first argument of 'input_parse_tree/2', and the corresponding parse tree must be returned in the second argument. Parse trees must be represented with the types defined in "treetypes.lil".

input_parse_tree(+$String, -$Tree)
  +$String   A line in an input file
  -$Tree     A parse tree
  Reads a parse tree from a line in the input file.

If the parse trees are written in Penn Treebank-style format, you can simply use 'input_ptb_parse_tree/2', which is defined in "treetrans.lil". To use it, you need to implement the following interfaces declared in "treeio.lil".

ptb_empty_category(-$Category)
  -$Category   The value of "SYM" to be regarded as an empty category
  Specify a preterminal symbol that should be regarded as an empty category. "SYM" is a feature defined in "treetypes.lil".

ptb_preprocess_word(+$Input, -$Output)
  +$Input    input word
  -$Output   preprocessed input word
  Apply preprocessing to an input word. For example, you can replace special characters or convert letters to lowercase.

ptb_preprocess_pos(+$Input, -$Output)
  +$Input    input POS
  -$Output   preprocessed POS
  Apply preprocessing to an input POS.

ptb_delete_pos(-$POS)
  -$POS   POS
  Specify a POS that should be ignored. Subtrees that contain only ignored POSs are ignored as well. $POS is matched against the output of 'ptb_preprocess_pos/2'.

After implementing these interfaces, call 'input_ptb_parse_tree/2' from 'input_parse_tree/2'. For example:

ptb_empty_category("-NONE-").
ptb_preprocess_word($In, $Out) :- to_lower($In, $Out).
ptb_preprocess_pos($POS, $POS).
ptb_delete_pos(".").
ptb_delete_pos("""").
input_parse_tree($String, $Tree) :-
    input_ptb_parse_tree($String, $Tree).

If the input file is written in another format, implement 'input_parse_tree/2' yourself.
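To make the effect of the four PTB hooks more concrete, here is a minimal sketch in Python (not lilfes) of a reader for one bracketed line. It is not the code behind 'input_ptb_parse_tree/2'. The function names, the tuple representation of trees, and the choice to merely flag empty categories are all assumptions made for illustration.

```python
# Illustrative Python sketch; NOT the lilfes implementation of
# input_ptb_parse_tree/2. All names and data shapes here are hypothetical.

EMPTY_CATEGORY = "-NONE-"   # stands in for ptb_empty_category/1
DELETED_POS = {"."}         # stands in for ptb_delete_pos/1


def preprocess_word(word):
    """Stands in for ptb_preprocess_word/2: lowercase the word."""
    return word.lower()


def preprocess_pos(pos):
    """Stands in for ptb_preprocess_pos/2: identity mapping."""
    return pos


def tokenize(line):
    """Split a bracketed line into '(' , ')' and symbol tokens."""
    return line.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens, i=0):
    """Parse the constituent starting at tokens[i]; return (tree, next_index).

    A preterminal becomes (pos, word, is_empty); a nonterminal becomes
    (label, children). A pruned constituent is returned as None.
    """
    if tokens[i] != "(":
        raise ValueError("expected '(' at token %d" % i)
    label = tokens[i + 1]
    i += 2
    if tokens[i] != "(":
        # Preterminal of the form (POS word).
        word = tokens[i]
        if tokens[i + 1] != ")":
            raise ValueError("expected ')' at token %d" % (i + 1))
        pos = preprocess_pos(label)
        if pos in DELETED_POS:          # checked after POS preprocessing
            return None, i + 2
        return (pos, preprocess_word(word), pos == EMPTY_CATEGORY), i + 2
    children = []
    while tokens[i] == "(":
        child, i = parse(tokens, i)
        if child is not None:
            children.append(child)
    if tokens[i] != ")":
        raise ValueError("expected ')' at token %d" % i)
    if not children:                    # only ignored POSs below this node
        return None, i + 1
    return (label, children), i + 1


def read_ptb_line(line):
    """Read one tree from one line, mirroring the role of input_parse_tree/2."""
    tree, _ = parse(tokenize(line))
    return tree


print(read_ptb_line("(S (NP (DT The) (NN dog)) (VP (VBZ barks)) (. .))"))
```

The deleted-POS check runs on the preprocessed POS, as the description of 'ptb_delete_pos/1' requires, and a constituent whose children are all deleted is dropped as well.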

How to write tree conversion rules

Parse trees are converted in the following steps.

First, the following interfaces may be used for preprocessing an input tree before applying conversion rules.

delete_tree(+$Tree)
  +$Tree   tree: parse tree
  Remove a subtree that is unifiable with $Tree.

nonterminal_mapping(+$InSym, -$OutSym)
  +$InSym    nonterminal symbol of an input tree
  -$OutSym   nonterminal symbol of an output tree
  Convert nonterminal symbol $InSym into $OutSym.

preterminal_mapping(+$InSurface, +$InSym, -$OutSurface, -$OutSym)
  +$InSurface    input word (surface form)
  +$InSym        input nonterminal symbol
  -$OutSurface   output word (surface form)
  -$OutSym       output nonterminal symbol
  Convert a word, $InSurface/$InSym, into $OutSurface/$OutSym.

preterminal_projection(+$InSym, -$NewSym)
  +$InSym    preterminal symbol
  -$NewSym   nonterminal symbol
  Insert a nonterminal symbol as the mother of preterminal $InSym.

Pattern rules are implemented as lilfes programs using the interfaces defined in "treetrans.lil". Parse trees are represented as feature structures of the types defined in "treetypes.lil". For example, the following pattern rule converts a tree like "(... than/IN XXX)" into "(... (PP than/IN XXX:argument))".

tree_transform_class("than", "topdown", "weak").

tree_subst_pattern("than",
                   TREE_NODE\$Node & TREE_DTRS\$Dtrs,
                   TREE_NODE\$Node & TREE_DTRS\$NewDtrs) :-
    $Dtrs = [$Left & tree_any & ANY_TREES\[_|_],
             $Than & tree & TREE_NODE\(SYM\"IN" & WORD\SURFACE\"than"),
             $Right & tree & TREE_NODE\HEAD_MARK\argument],
    $NewDtrs = [$Left,
                TREE_NODE\(SYM\"PP" & MOD\[] & ID\[] & HEAD_MARK\modifier) &
                TREE_DTRS\[$Than, $Right]].

First, write "tree_transform_class/3" to specify the name of a conversion rule, the order of rule application, and the behavior when rule application fails.

tree_transform_class(+$Name, +$Direction, +$Strict)
  +$Name        The name of the conversion rule
  +$Direction   The order of applying the rule
    • "topdown": From the root to the leaves
    • "bottomup": From the leaves to the root
    • "rootonly": Only to the root of a tree
  +$Strict      The behavior when rule application fails
    • "strict": Fail the conversion of the whole tree
    • "weak": Ignore the failure of this rule
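
The three directions and the two failure modes can be sketched in Python (not lilfes) as follows. This models the behavior described above and is not the treetrans implementation. The tree representation `(label, children)`, the convention that a rule returns `None` on failure, and the assumption that a failure at any visited node counts as a rule failure are all hypothetical choices made for illustration.

```python
# Illustrative Python sketch of $Direction and $Strict; hypothetical names.

class ConversionFailed(Exception):
    """Raised internally when a "strict" rule fails at some node."""


def traverse(node, visit, direction):
    """Apply visit() to nodes of a (label, children) tree in the given order."""
    if direction == "rootonly":
        return visit(node)
    if direction == "topdown":
        # Rewrite this node first, then descend into the (new) daughters.
        label, children = visit(node)
        return (label, [traverse(c, visit, direction) for c in children])
    if direction == "bottomup":
        # Rewrite the daughters first, then this node.
        label, children = node
        rebuilt = (label, [traverse(c, visit, direction) for c in children])
        return visit(rebuilt)
    raise ValueError("unknown direction: %r" % direction)


def apply_rule(tree, rule, direction, strictness):
    """Return the converted tree, or None if a "strict" rule failed."""
    def visit(node):
        result = rule(node)
        if result is None:                 # the rule failed at this node
            if strictness == "strict":
                raise ConversionFailed(node[0])
            return node                    # "weak": keep the node as it is
        return result

    try:
        return traverse(tree, visit, direction)
    except ConversionFailed:
        return None


def rename_vp(node):
    """Toy rule: succeeds only on VP nodes, renaming them to VX."""
    label, children = node
    return ("VX", children) if label == "VP" else None


TREE = ("S", [("NP", []), ("VP", [("V", [])])])
print(apply_rule(TREE, rename_vp, "topdown", "weak"))
print(apply_rule(TREE, rename_vp, "topdown", "strict"))
```

With "weak" the toy rule rewrites the VP node and leaves the other nodes untouched. With "strict" the first node where the rule fails makes the conversion of the whole tree fail.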

Next, write conversion rules with the following interfaces. In all of them, the first argument is the name of a rule that has been declared with "tree_transform_class/3". The treetrans tool traverses the nodes of each parse tree and applies the conversion rules in the order in which their "tree_transform_class/3" declarations appear in the program file.

tree_ignore(+$Name, ?$Tree)
  +$Name   rule name
  ?$Tree   tree: parse tree
  Remove a subtree that is unifiable with $Tree.

tree_transform_rule(+$Name, +$InTree, -$OutTree)
  +$Name      rule name
  +$InTree    tree: input parse tree
  -$OutTree   tree: output parse tree
  Convert $InTree into $OutTree.

tree_subst_pattern(+$Name, +$InPattern, +$OutPattern)
  +$Name         rule name
  +$InPattern    tree: pattern of an input tree
  +$OutPattern   tree: pattern of an output tree
  Convert a parse tree that matches $InPattern (using "tree_match/2") into $OutPattern.

tree_unify(+$Name, ?$Tree)
  +$Name   rule name
  ?$Tree   tree: parse tree
  Unify $Tree with the target tree.

tree_match_pattern(+$Name, +$Pattern)
  +$Name      rule name
  +$Pattern   tree: pattern on a parse tree
  Unify $Pattern with the target tree using "tree_match/2".

Conversion rules are applied in the order in which they are declared with tree_transform_class/3. Within one conversion rule, the interfaces are tried in the order tree_ignore/2, tree_transform_rule/3, tree_subst_pattern/3, tree_unify/2, tree_match_pattern/2. Once a conversion by one interface succeeds, the remaining interfaces for the same rule are not tried.
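
The "first success wins" order within a single rule can be sketched in Python (not lilfes) as below. This only models the ordering stated above. Representing a rule as a dictionary of handlers, and a failed interface as a handler that returns `None`, are assumptions made for illustration.

```python
# Illustrative Python sketch of the per-rule interface order; hypothetical names.

INTERFACE_ORDER = [
    "tree_ignore",
    "tree_transform_rule",
    "tree_subst_pattern",
    "tree_unify",
    "tree_match_pattern",
]


def apply_interfaces(rule, node):
    """Try the interfaces of one rule in the fixed order.

    rule maps an interface name to a callable that returns a result on
    success and None on failure. Returns (interface_name, result) for the
    first interface that succeeds, or (None, None) if all of them fail.
    """
    for name in INTERFACE_ORDER:
        handler = rule.get(name)
        if handler is None:
            continue                      # this rule does not define it
        result = handler(node)
        if result is not None:
            return name, result           # later interfaces are not tried
    return None, None


# A toy rule that defines two interfaces, both of which would succeed.
toy_rule = {
    "tree_unify": lambda node: ("unified", node),
    "tree_transform_rule": lambda node: ("transformed", node),
}
print(apply_interfaces(toy_rule, "NP"))
```

Although both handlers of the toy rule would succeed, only the one registered under tree_transform_rule runs, because it comes earlier in the fixed order.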

In conversion rules, you can use several helper predicates, such as "tree_binarize/2" (implemented in "binarizer.lil") to binarize a tree, and "mark_head/1" and "mark_modifier/1" (defined in "markhead.lil") to annotate head/modifier/argument marks.


MIYAO Yusuke (yusuke@is.s.u-tokyo.ac.jp)