Tutorial 2: Use tree actions to improve the AST

In the previous tutorial, we wrote a grammar for simple mathematic expressions, compiled it with the himecc tool to generate the corresponding parser and used the result to parse an expression. The produced Abstract Syntax Tree (AST) corresponded exactly to the grammar, but had some shortcomings. It contained numerous nodes for the variables that may hinder the further analysis of the AST. Let’s take a slightly more complex example. At this point, parsing the "(2+3)*4" expression would produce the AST on the left below:


It contains all the nodes that correspond to the variables in the grammar, e.g. exp_op0, etc. However, it is hard to see the structure of the mathematical expression in this AST. What we really want to have is the AST on the right. It only contains the intended semantic of the mathematical expression. See that the brackets are not present and still the structure of the AST retains the original semantic.

In this tutorial, we will extend our previous specification of mathematic expression to include tree actions, i.e. actions that will be applied to the produced AST in order to produce a more meaningful one and prune the unnecessary details.

The Drop action

The first and simplest tree action is the Drop action. It specifies that the node marked with this action, and all its children should be completely removed from the AST. For example, let’s say we do not want to see the tokens corresponding to the brackets in the final AST. What we have to do is to mark the corresponding two nodes with Drop actions. To do so, we append a special operator in the grammar rules. The exp_atom rules then become as follow:

exp_atom-> NUMBER
    | '('! exp ')'! ;

The terminals for the brackets are followed by the ! character, which means that the AST node corresponding to them will be applied the Drop action.

The Promote action

The Promote action is used to mark a node in the AST that should replace its parent. The node itself is removed from its collection of children and takes the place of its parent. If the node to be promoted has children, then they are put in the original place of the marked node. The Promote action is specified in the grammar rules using the ^ symbol. It is place behind the element that will correspond to the AST node that should be promoted. For example, we further amend the exp_atom rule as follow:

exp_atom-> NUMBER^
    | '('! exp^ ')'! ;

This rule has two alternatives, separated with the | character. Now, we applied the Promote action using the ^ character. In the first alternative we specify that the node corresponding to the NUMBER token will replace its parent, i.e. the node corresponding to the exp_atom variable. Furthermore, in the second alternative, we specify the not only should the brackets be discarded, but that the node corresponding to the exp element should replace the parent. We modify the rest of the rules in the same fashion as follow:

exp_op0	-> exp_atom^
    |  exp_op0 '*'^ exp_atom
    |  exp_op0 '/'^ exp_atom ;
exp_op1	-> exp_op0^
    |  exp_op1 '+'^ exp_op0
    |  exp_op1 '-'^ exp_op0 ;
exp	-> exp_op1^ ;

In this example, all the nodes for the mathematical operators have to be promoted to replace their respective parents. Furthermore, for the alternatives where the expression at one lever defers to the rule as the level below, we also specify that the node from the lower level should replace the node at the upper level. It is important to note that the Promote actions will chain so that a single node can be promoted multiple times. For example, the NUMBER node produced will be promoted in the exp_atom -> NUMBER^; rule. Then as it replaces its parent, it can also be promoted by the exp_op0 -> exp_atom^; rule and so forth.


After generating the parser for the grammar amended with the tree actions, we are able to obtain the AST on the right in the example given above.

Last edited Jun 13, 2013 at 6:34 AM by lwouters, version 12


No comments yet.