Parsing in Compiler Design

Parsing is the term used to describe the process of converting data between different formats. The parser can carry out this operation. The parser is a part of the translator that aids in organising the linear text structure in accordance with the grammar. A grammar is a collection of established rules.

You can also say that the parser is the stage of the compiler that accepts a token string as input and transforms it into the matching Intermediate Representation (IR) using the available grammar. Syntax Analyzer is another name for the parser.

Types of Parsing in compiler Design

Parsing is broadly classified into two types:

1. Top-Down Parser

2. Bottom-Up Parser

Top-Down Parser

The top-down parser expands the non-terminals and starts from the start symbol and finishes on the terminals. This parser build a parse for the provided input text using grammatical productions. The leftmost derivation is used in this parsing technique.

Top-Down Parser is further classified into two types:

  • Recursive descent parser
  • Non-recursive descent parser

Recursive Descent Parser

It is a type of top-down parser which begins with the non-terminal and constructs the parse tree from top to down.

A Recursive Descent Parser that doesn't require Backtracking. That’s why it is known as a Predictive Parser. A correctly written grammar remove the left recursion and left factoring from it which can be read by a recursive descent parser.

Example:

Before removing left recursion   After removing left recursion
E –> E + T | T
T –> T * F | F
F –> ( E ) | id  
E –> T E’
E’ –> + T E’ | e
T –> F T’
T’ –> * F T’ | e
F –> ( E ) | id  

Note:

Here e is Epsilon

Example:

Grammar: E --> i E'
E' --> + i E' | e

Non-recursive descent parser

Predictive parsing is a type of recursive-descent parsing that doesn't involve any backtracking. Other names for it include LL(1) parser, predictive parser, parser without backtracking, and dynamic parser.

Instead of backtracking, it creates the parse tree using a parsing table.

It can anticipate which production will be used to replace an input string.

Predictive parser employs a look-ahead pointer, which links to the next input symbols, to complete its responsibilities. Predictive parser placed various restrictions on grammar and only accepts a LL(k) grammar which make a backtracking free parser.

It parses input and creates a parse tree using a stack and a parsing table. Input and stack both have an end symbol ($) that indicates the input has been used up and the stack is empty.

Bottom-up Parser

Bottom-up Parser is a parser that uses grammatical productions to construct the parse tree for the specified input string while compressing the non-terminals. It concludes that it begins with non-terminals and ends with the start symbol.

It employs the reverse of the rightmost derivation.

Shift-Reduce Parsing

For bottom-up parsing, shift-reduce parsing employs two distinct phases. Shift-step and reduce-step are the names of these steps.

Shift Step:

The next input symbol, known as the shifted symbol, is added by the input pointer during the shift step. The shifted symbol is placed on the stack. The parse tree treats the shifted symbol as a single node.

Reduce Step:

Reduce-step occurs when the parser detects a full grammar rule (RHS) and changes it to (LHS). When a handle is present at the top of the stack, then this parsing happens. A POP function on the stack is used to decrease, which pops off the handle and substitutes an LHS non-terminal symbol in its place.

It is further classified into two:

  • LR parser
  • Operator precedence parser

LR parser

A highly common bottom-up parser for context-free grammar used by computer programming language compilers and other related tools is called the LR parser.

It generates the rightmost derivation by reading their input from left to right.

It seeks to decrease the top-level grammar production by building up from the leaves, which is why it is known as a bottom-up parser.

There are three types of LR Parsers which are as follows:

  • Simple LR Parser (SLR) -This type of LR parser is simple to use and it cannot build a table for all kinds of grammar.
  • Canonical LR Parser (CLR) - It is the most effective and applies to a wide range of grammars.
  • Look Ahead LR Parser (LALR) -  It is intermediate in power between SLR and CLR.

Operator precedence parser

Operator Precedence Parsing is also a type of Bottom-Up Parsing that can be used to the class of Grammars known as Operator Grammar.

Basically, it produces the parse tree from the grammar and string that are provided. This parser requires only two consecutive non-terminals with the absence of epsilon on the right side of any production rule

Operator grammars can use the operator precedence parsing strategies.

Operator grammar can be exist if there isn't a production rule on the right side. A grammar is said to be operator grammar if the following two properties are satisfied:

1. Right side has no ε(Epsilon).

2. No two non-terminals appear sequentially, that is, there should be a terminal between two non-terminal.