treetrans: Tool for tree transformation


This is a tool for the conversion of Penn Treebank-style trees using pattern rules.

treetrans [options] rule_module input_file output_database
  rule_module      lilfes program in which the pattern rules are implemented
  input_file       Input treebank (Penn Treebank style)
  output_database  Output treebank (lildb format)
Options
  -v   print debug messages
  -vv  print many debug messages

This tool reads Penn Treebank-style trees from a text file, applies the tree conversion rules to each input tree, and writes the results into a lildb-format database. Pattern rules are implemented as lilfes programs using the interfaces defined in "treetrans.lil", and parse trees are represented as feature structures whose types are defined in "treetypes.lil". For example, the following pattern rule converts a tree like "(... than/IN XXX)" into "(... (PP than/IN XXX:argument))", where the new PP node is marked as a modifier.

tree_transform_class("than", "topdown", "weak").

tree_subst_pattern("than",
                   TREE_NODE\$Node & TREE_DTRS\$Dtrs,
                   TREE_NODE\$Node & TREE_DTRS\$NewDtrs) :-
    $Dtrs = [$Left & tree_any & ANY_TREES\[_|_],
             $Than & tree & TREE_NODE\(SYM\"IN" & WORD\SURFACE\"than"),
             $Right & tree & TREE_NODE\HEAD_MARK\argument],
    $NewDtrs = [$Left,
                TREE_NODE\(SYM\"PP" & MOD\[] & ID\[] & HEAD_MARK\modifier) &
                TREE_DTRS\[$Than, $Right]].
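To make the effect of this rule concrete, here is a minimal Python model of it. The "Tree" class is a hypothetical, heavily simplified stand-in for the feature structures of "treetypes.lil" (only a symbol, a surface word, a head mark, and daughters; MOD and ID are omitted), and "parse_ptb" and "subst_than" are illustrative names, not part of treetrans. Mirroring unification, an unset head mark on the right daughter is accepted and becomes "argument".

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Tree:
    """Hypothetical, simplified stand-in for a treetypes.lil tree node."""
    sym: str                          # SYM, e.g. "IN", "PP"
    word: Optional[str] = None        # WORD\SURFACE, set only on preterminals
    head_mark: Optional[str] = None   # HEAD_MARK; None means "unspecified"
    dtrs: List["Tree"] = field(default_factory=list)


def parse_ptb(text: str) -> Tree:
    """Read one labeled bracketing such as "(NP (DT the) (NN cat))"."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read() -> Tree:
        nonlocal pos
        pos += 1                      # skip "("
        sym = tokens[pos]
        pos += 1
        if tokens[pos] != "(":        # preterminal: (TAG word)
            word = tokens[pos]
            pos += 2                  # skip the word and ")"
            return Tree(sym, word=word)
        dtrs = []
        while tokens[pos] == "(":
            dtrs.append(read())
        pos += 1                      # skip ")"
        return Tree(sym, dtrs=dtrs)

    return read()


def subst_than(node: Tree) -> Optional[Tree]:
    """Model of the "than" rule: [Left.., than/IN, Right] becomes
    [Left.., (PP than/IN Right:argument):modifier]; None if no match."""
    d = node.dtrs
    if len(d) < 3:                    # Left (ANY_TREES\[_|_]) needs >= 1 tree
        return None
    than, right = d[-2], d[-1]
    if than.sym != "IN" or than.word != "than":
        return None
    if right.head_mark not in (None, "argument"):  # must unify with argument
        return None
    right.head_mark = "argument"      # unification binds an unset mark
    pp = Tree("PP", head_mark="modifier", dtrs=[than, right])
    return Tree(node.sym, node.word, node.head_mark, d[:-2] + [pp])


tree = parse_ptb("(ADJP (JJR bigger) (IN than) (NP (PRP me)))")
out = subst_than(tree)                # out.dtrs: [JJR bigger, PP(than, me)]
```

A tree whose daughters do not end in than/IN followed by one more tree (or that has nothing to the left of "than") makes "subst_than" return None, just as the lilfes pattern would fail to unify.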

How to write tree conversion rules

First, declare each conversion rule with "tree_transform_class/3", which specifies the name of the rule, the order in which the rule is applied to the nodes of a tree, and what happens when application of the rule fails.
tree_transform_class(+$Name, +$Direction, +$Strict)
  +$Name       The name of the conversion rule
  +$Direction  The order in which the rule is applied
    • "topdown": from the root to the leaves
    • "bottomup": from the leaves to the root
    • "rootonly": only to the root of a tree
  +$Strict     The behavior when application of the rule fails
    • "strict": the conversion of the whole tree fails
    • "weak": the failure of this rule is ignored
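The three values of $Direction correspond to familiar traversal orders. The Python sketch below assumes a depth-first walk with sisters taken left to right, so "topdown" is a pre-order walk (mother before daughters), "bottomup" is a post-order walk (daughters before mother), and "rootonly" touches only the root. The "Tree" class and "visit_order" are hypothetical names; the sketch only lists the order in which nodes would be visited and does not model $Strict.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Tree:
    sym: str
    dtrs: List["Tree"] = field(default_factory=list)


def visit_order(tree: Tree, direction: str) -> List[str]:
    """Symbols of the nodes in the order a rule would be tried on them.
    Assumes a depth-first walk with sisters taken left to right."""
    if direction == "rootonly":
        return [tree.sym]
    if direction not in ("topdown", "bottomup"):
        raise ValueError(f"unknown direction: {direction}")
    order: List[str] = []

    def walk(node: Tree) -> None:
        if direction == "topdown":    # mother before daughters (pre-order)
            order.append(node.sym)
        for d in node.dtrs:
            walk(d)
        if direction == "bottomup":   # daughters before mother (post-order)
            order.append(node.sym)

    walk(tree)
    return order


t = Tree("S", [Tree("NP", [Tree("DT"), Tree("NN")]), Tree("VP", [Tree("VBZ")])])
print(visit_order(t, "topdown"))   # ['S', 'NP', 'DT', 'NN', 'VP', 'VBZ']
print(visit_order(t, "bottomup"))  # ['DT', 'NN', 'NP', 'VBZ', 'VP', 'S']
print(visit_order(t, "rootonly"))  # ['S']
```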

Next, write the conversion rules themselves using the following interfaces. In every interface, the first argument is the name of a rule declared with "tree_transform_class/3". The treetrans tool traverses the nodes of each parse tree and applies the conversion rules in the order in which their "tree_transform_class/3" declarations appear in the program file.

tree_ignore(+$Name, ?$Tree)
  +$Name  rule name
  ?$Tree  tree: parse tree
  Remove a subtree that is unifiable with $Tree.
tree_transform_rule(+$Name, +$InTree, -$OutTree)
  +$Name     rule name
  +$InTree   tree: input parse tree
  -$OutTree  tree: output parse tree
  Convert $InTree into $OutTree.
tree_subst_pattern(+$Name, +$InPattern, +$OutPattern)
  +$Name        rule name
  +$InPattern   tree: pattern of an input tree
  +$OutPattern  tree: pattern of an output tree
  Convert a parse tree that matches $InPattern (using "tree_match/2") into $OutPattern.
tree_unify(+$Name, ?$Tree)
  +$Name  rule name
  ?$Tree  tree: parse tree
  Unify $Tree with the target tree.
tree_match_pattern(+$Name, +$Pattern)
  +$Name     rule name
  +$Pattern  tree: pattern on a parse tree
  Unify $Pattern with the target tree using "tree_match/2".

Conversion rules written with these interfaces are applied in the following order: tree_ignore/2, tree_transform_rule/3, tree_subst_pattern/3, tree_unify/2, and then tree_match_pattern/2.
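One way to read the two ordering statements together: rules are taken in the order of their tree_transform_class/3 declarations (each rule needs its own traversal, since each has its own $Direction), and for each rule the interfaces it defines are tried in the fixed order just listed. The Python sketch below computes that schedule under this assumed nesting; "INTERFACE_ORDER", "schedule", and the rule name "strip_empty" are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Order of the interfaces as listed in the manual.
INTERFACE_ORDER = ["tree_ignore/2", "tree_transform_rule/3",
                   "tree_subst_pattern/3", "tree_unify/2",
                   "tree_match_pattern/2"]


def schedule(declared: List[str],
             defined: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """declared: rule names in the order of their tree_transform_class/3 facts.
    defined: rule name -> interfaces the rule module defines for it.
    Returns (rule, interface) pairs in application order, assuming rules
    are the outer loop and interfaces the inner loop."""
    return [(name, iface)
            for name in declared
            for iface in INTERFACE_ORDER
            if iface in defined.get(name, set())]


# Hypothetical rule module: "strip_empty" is declared before "than".
plan = schedule(["strip_empty", "than"],
                {"than": {"tree_subst_pattern/3"},
                 "strip_empty": {"tree_unify/2", "tree_ignore/2"}})
print(plan)
# [('strip_empty', 'tree_ignore/2'), ('strip_empty', 'tree_unify/2'),
#  ('than', 'tree_subst_pattern/3')]
```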

Additionally, the following interfaces can be used to preprocess an input tree before the conversion rules are applied.

delete_tree(+$Tree)
  +$Tree  tree: parse tree
  Remove a subtree that is unifiable with $Tree.
nonterminal_mapping(+$InSym, -$OutSym)
  +$InSym   nonterminal symbol of an input tree
  -$OutSym  nonterminal symbol of an output tree
  Convert nonterminal symbol $InSym into $OutSym.
preterminal_mapping(+$InSurface, +$InSym, -$OutSurface, -$OutSym)
  +$InSurface   input word (surface form)
  +$InSym       input preterminal symbol
  -$OutSurface  output word (surface form)
  -$OutSym      output preterminal symbol
  Convert a word, $InSurface/$InSym, into $OutSurface/$OutSym.
preterminal_projection(+$InSym, -$NewSym)
  +$InSym   preterminal symbol
  -$NewSym  nonterminal symbol
  Insert a node labeled with nonterminal symbol $NewSym as the mother of each preterminal labeled $InSym.
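The Python sketch below models how these four hooks could combine in a single preprocessing pass. It is a model, not treetrans code: the hooks are passed in as plain Python callables and dictionaries, the helper names ("Tree", "leaf", "node", "preprocess") are hypothetical, and two details the manual does not specify are assumptions flagged in the comments (deletion is checked before any mapping, and the projection is looked up with the already mapped preterminal symbol). The example deletes -NONE- empty elements, strips a function tag, lowercases words, and projects RB to ADVP.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Tree:
    sym: str
    word: Optional[str] = None        # set only on preterminals
    dtrs: List["Tree"] = field(default_factory=list)


def leaf(sym: str, word: str) -> Tree:
    return Tree(sym, word=word)


def node(sym: str, *dtrs: Tree) -> Tree:
    return Tree(sym, dtrs=list(dtrs))


def preprocess(t: Tree,
               delete_tree: Callable[[Tree], bool],
               nonterminal_mapping: Dict[str, str],
               preterminal_mapping: Callable[[str, str], Tuple[str, str]],
               preterminal_projection: Dict[str, str]) -> Optional[Tree]:
    """Return the preprocessed copy of t, or None if t is deleted."""
    if delete_tree(t):                # assumption: deletion is checked first
        return None
    if t.word is not None:            # preterminal: word/TAG
        surface, sym = preterminal_mapping(t.word, t.sym)
        new_leaf = leaf(sym, surface)
        mother = preterminal_projection.get(sym)  # assumption: mapped tag
        return node(mother, new_leaf) if mother else new_leaf
    kept = []
    for d in t.dtrs:
        p = preprocess(d, delete_tree, nonterminal_mapping,
                       preterminal_mapping, preterminal_projection)
        if p is not None:
            kept.append(p)
    return Tree(nonterminal_mapping.get(t.sym, t.sym), dtrs=kept)


tree = node("S",
            node("NP-SBJ", leaf("PRP", "He")),
            node("VP", leaf("VBD", "said"),
                 node("SBAR", leaf("-NONE-", "0"),
                      node("S", node("NP-SBJ", leaf("PRP", "it")),
                           node("VP", leaf("VBD", "ran"), leaf("RB", "fast"))))))
out = preprocess(tree,
                 delete_tree=lambda t: t.sym == "-NONE-",
                 nonterminal_mapping={"NP-SBJ": "NP"},
                 preterminal_mapping=lambda w, s: (w.lower(), s),
                 preterminal_projection={"RB": "ADVP"})
```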

See the developers' manual for "treetrans.lil" for details. Conversion rules can also use several utility predicates, such as "tree_binarize/2" (implemented in "binarizer.lil"), which binarizes a tree, and "mark_head/1" and "mark_modifier/1" (defined in "markhead.lil"), which annotate head/modifier/argument marks.
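Binarization rewrites every node with more than two daughters into a chain of binary nodes. The Python sketch below uses plain right-branching binarization in which the intermediate nodes reuse the parent's symbol; this is one common scheme chosen for illustration, and "binarizer.lil" may well use a different one (for example, one guided by head marks). "Tree", "binarize", and "to_str" are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Tree:
    sym: str
    word: Optional[str] = None        # set only on preterminals
    dtrs: List["Tree"] = field(default_factory=list)


def binarize(t: Tree) -> Tree:
    """Right-branching binarization: X -> A B C becomes X -> A X(B C)."""
    if t.word is not None:            # preterminals are left as they are
        return t
    dtrs = [binarize(d) for d in t.dtrs]
    while len(dtrs) > 2:
        # fold the two rightmost daughters under a new intermediate node
        dtrs = dtrs[:-2] + [Tree(t.sym, dtrs=dtrs[-2:])]
    return Tree(t.sym, dtrs=dtrs)


def to_str(t: Tree) -> str:
    """Print a tree back in Penn Treebank bracket notation."""
    if t.word is not None:
        return f"({t.sym} {t.word})"
    return "(" + t.sym + " " + " ".join(to_str(d) for d in t.dtrs) + ")"


t = Tree("NP", dtrs=[Tree("DT", "the"), Tree("JJ", "big"),
                     Tree("JJ", "red"), Tree("NN", "ball")])
print(to_str(binarize(t)))
# (NP (DT the) (NP (JJ big) (NP (JJ red) (NN ball))))
```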


MAYZ Toolkit Manual MAYZ Home Page Tsujii Laboratory
MIYAO Yusuke (yusuke@is.s.u-tokyo.ac.jp)