Symbol Table Compiler Design

Symbol Table

It is an important data structure used by the compiler. It stores information about various entities such as object, class, variable names, functions name, interfaces, procedure, literals, string etc. 

The purpose of the symbol table is given below: 

  • It stores the name of different entities in the structured form in one place. 
  • It checks the variable has been initialized or not.
  • It is used to determine scope resolution.
  • They are used by the compiler to achieve compile-time efficiency.

Syntax of symbol table:

<symbol name, type attribute>

Example

Suppose a variable store the information about the following variable declaration as follows:

static int marks;

Then, it will store information in this way:

<marks, int, static>

A various phase of the compiler uses the symbol table:

  1. Lexical Analysis– Maintain new entries in the table. 
  • Syntax Analysis– Add the information about the scope, dimension, attribute type, use, etc., in the table.
  • Semantic Analysis– It uses all information present in the table to check semantics by verifying that expression and assignment are semantically correct or not and then update it.
  • Intermediate Code Generation– To check how much and what type of run-time is allocated to help in adding temporary variable information.
  • Code optimization– It uses information for machine-dependent optimization.
  • Code Generation– Generated code by using identifiers available in the symbol table.

Implementation

The implementation of the symbol table can be done in any of the following ways:

  • Linear (sorted or unsorted ) List
  • Hash table
  • Binary search tree (BST)

1. List

  • It uses an array to store the name and associated information.
  • A pointer “Present" is marked at the end of all stored records, and new names are added in the same order as they arrived.
  • To find any name, we will search from the beginning of the list till the present pointer. If the match wasn't found in the table, it gave an error message as the use of an undeclared name.
  • If we insert any new name, we must ensure that the name should not already present otherwise, occur an error.
  • Insertion operation is fast, whereas lookup is slow for large tables.
  • It takes less amount of space.

2. Linked List

  • A link field is inserted into every record.
  • A pointer “XYZ" is marked to designate the first name of the symbol table.
  • Insertion operation is fast, whereas lookup is slow for large tables.

3. Hash Table

  • There have to be two tables in the hashing technique– a hash table and a symbol table. 
  • A table with an array of index range 0 to table size-1 is known as a hash table.
  • A hash function can search any name.
  • Insertion operation and lookup operation can be made very fast.

4. Binary Search Tree

  • A binary search tree is also a method to implement a symbol table.
  • Names are created as child nodes of root and obey the binary search tree's property.
  • Insertion operation and lookup operation are O(log2 n) on average.

Symbol Table Operations:

The following are the operations of the symbol table:

Insert()

  • The compiler's analysis phases often use this operation, where tokens are identified and kept in the table.
  • This is used to insert information about the unique name occurring in the source code.
  • It takes the symbol and attributes as an argument and stores the information in a symbol table.

Example:

int b;

should be proceeded by the compiler as:

insert(b, int);

loockup()

This operation is used to discover a name in the symbol table to detect:

  • If any symbol in the table exists.
  • A symbol is declared before being used or not.
  • Name is used in the scope or not.
  • Symbol’s initialization.
  • Check whether the name is declared more than once.

The format of the lookup() function is:

lookup(symbol)

The format may be distinct for different programming languages.