Syntax Analysis Compiler Design

Syntax Analysis

This article will describe the parsing method used in the compiler. The grammatical rule of programming language can be constructed with the help of context-free grammars or BNF (Backus–Naur form) notations. The grammar offers some great benefits for language designers and compiler designers.

  • Grammar provides an exact, not much hard, linguistic description of a programming language.
  • With the help of classes of grammars, a well-organized parser can be constructed that determines the syntactic structure of the source program. A parser-construction can detect ambiguities and spots the trouble that has been missed at the beginning phase of a language.
  • The grammar is helpful for converting a source code into the correct target code and for detecting errors.
  • A grammar provides a language developed by adding new features to perform new operations. The new constructs can easily be implemented by obeying the grammatical rule of the language.

Introduction

In this section, we examine the position of the parser in a typical compiler. And then, we will look at the typical grammars for arithmetic expressions. 

The Role of the Parser

In the figure given below, the parser collects a collection of token forms from the lexical analyzer. It verifies that the grammar of the source code generates the group of tokens. We believe that the parser should report any syntax error as soon as possible and remove that error to continue advancing the rest of the program. The parser's main role is to constructs the parse tree and forwards it to the compiler for further operation. Due to checking and translation action interspersed with parsing, the parse tree should not be constructed explicitly. So the parser and the front end could be implemented by a single model.

There are two kinds of parsers for grammars: top-down and bottom-up.

The commonly used parsing method is either top-down or bottom-up. As the name suggests, the top-down method built a parse tree from the top (root) to the bottom (leaves), while the bottom-up starts from leaves and works till the top (root). In both cases, an input to the parser is scanned from left to right and one input at a time.

Context-Free Grammar

Grammar was brought to systematically narrate the syntax of programming languages like expressions and statements. A lexical analyzer can check tokens with the help of regular and patterns rule. But due to some limitation of the regular expression, the lexical analyzer cannot detect the syntax of a particular sentence. Therefore the context-free grammar is used by this phase, which is accepted by pushdown automata.

The production is:

                    stmt            if (expr) stmt else stmt                               (1.1)

CFG is a subset of regular grammar, as shown in the figure:

After seeing the figure above, we can say that every Regular Grammar is also context-free grammar. There is some problem that is far away from the regular grammar that's why CFG helps to describe the syntax of programming language.

Context-free grammar consist of four components:

  • The terminals are a basic symbol in which string is formed. Terminal symbol (?) is a group of tokens. In (1.1), If, else and the symbol "(" and ")" are terminal.
  • Non-terminals are the linguistic variable that denotes sets of strings. It is represented by (V). In (1.1), stmt and expr are non-terminals. The non-terminals help to define the language generated by the grammar. 
  • In grammar, non-terminals are the start symbol (S), which is the starting point of production. 
  • The productions portray the pattern in which the terminals and non-terminals merge to form a string. Each production consist of:

 a) The production's left side is known as non-terminals. The left side of production is also known as the head of the productions.

 b) An arrow.

 c) The production’s right side should have zero or more terminals and non – terminals. The right side of production is also known as the body of the production.