Derivation and Parse Tree in Compiler Design

Derivation and Parse Tree

In this article, we will learn Derivation and Parse Tree.

Derivations

The parse tree can be constructed by taking a derivational view in which production is treated as rewriting rules. In each rewriting step, a non-terminal is replaced by the body of its production. At each step of the derivation, we have to make two decisions. First, we have to choose which non-terminal should be replaced, and the second is which production rule should choose to replace that non-terminal.

Each non-terminal can be replaced by more than one derivations in the same production rule, but the order of replacement will be different.

Left-most Derivation

If the input is scanned and replaced with the production rule from left to right, it is known as left-most derivation. In other words, we can say that we read the input string from left to right.

Example

Consider the following grammar-

S ? aB / bA

S ? aS / bAA / a

B ? bS / aBB / b

Let us consider a string w = aaabbabbba

Now, derive the string w using left-most derivation as follows

S? aB

? aaBB                    (Using B? aBB)

? aaaBBB                (Using B? aBB)

? aaabBB                 (Using B? b)

? aaabbB                  (Using B? b)

? aaabbaBB              (Using B? aBB)

? aaabbabB               (Using B? b)

? aaabbabbS               (Using B? bS)

? aaabbabbbA        (Using S ? bA)

? aaabbabbba         (Using A ? a)

Right-most Derivation

In the right-most derivation, the right-most non-terminal is always selected. If the input is scanned and replaced with the production rule from right to left, it is known as right-most derivation.

Example

Consider the following example:

S ? aB / bA

S ? aS / bAA / a

B ? bS / aBB / b

Let us consider a string w = aaabbabbba

Now, derive the string using right-most derivation as follows:

S   ? aB

?  aaBB                    (Using B ? aBB)

? aaBaBB                 (Using B ? aBB)

? aaBaBbS               (Using B ? bS)

? aaBaBbbA             (Using S ? bA)

? aaBaBbba              (Using A ? a)

? aaBabbba              (Using B ? b)

? aaaBBabbba          (Using B ? aBB)

? aaaBbabbba          (Using B ? b)

? aaabbabbba           (Using B ? b)

Parse Tree

The hierarchical structure of a symbol (terminal or non-terminal) is known as a parse tree. The symbol represents the derivation of grammar to get the input string. It depicts how productions are put to replace non-terminal. Start symbol of the production rule will be the root of a parse tree. The role of the parse tree is to see how the string is formed using the start symbol. An interior node of the parse tree is non-terminal, and all the leaf node of the parse tree is terminal. An in-order traversal of the parse tree will give an original input string.

Example 1

Production rules

S = S + S | S * S

S = a|b|c

Step 1:

Step 2:

Step 3:

Step 4:

Step 5:

Example 2

Consider the following production rule

S -> sAB

A -> a

B -> b

If the input string is “sab”, then the parse tree is:

Another production rule

S -> AB

A -> c/aA

B -> d/bB

If the input string for the following production is "acbd", then the parse tree will be:

Ambiguity

For a given input string, if there exists more than one parse tree, then grammar A is said to be ambiguous. In another way, we can say that ambiguous grammar is one that produces more than one left-most derivation or right-most derivation for a given input string. Inherent language is generated by this kind of grammar.

For the design of the compiler, the grammar should be unambiguous. A grammar with ambiguity is not good for compiler design. There are no methods that can automatically detect ambiguity. To remove the ambiguity, we have to rewrite the full grammar that doesn’t contain ambiguity. It can also be removed by setting and following the associativity and precedence limitations.

Example

S = aSb | SS

S = C

For the string aabb, the above production rule will produce two parse trees: