Code Generation

Code Generation

The last phase of the compiler is code generation. It is the compiler's back-end that makes multiple passes over the IR before generating the target program. The code generator's main task is instruction selection, registers allocation and assignment, and instruction ordering.

Code Generator Design Issue

A good code generator should produce the correct code, which is the benchmark for any code generator. There are many special cases available that the code generator can face. Due to the special case, correctness has special significance. A code generator should be designed with taking correctness in mind. The main goal of designing a code generator is its easy implementation, testing, and maintenance.

1) Input to the Code Generator

The input for the code generator is the intermediate representation (IR) of the source code generated by the compiler's front-end and symbol table's information. This input is used to know the run-time addresses of the data objects denoted by the names in the IR.

The intermediate representation has several choices such as three address codes like quadruple, triple, and indirect triples, virtual representations such as bytecodes and stack-machine code, linear representation like postfix notation and graphical representation like syntax tree and DAG representations.

We have assumed that the front-end of the compiler will be scanned, parsed, and translate the source code program into a low-level IR. So the target machine can easily manipulate the values of the names visible in the IR.

We will also assume that the intermediate code as input requires for code generation is error-free.

2) Target Program

The target program is known as the output of the code generator. The output can be any one of the following:

  • Absolute machine language: This kind of output can be placed at any particular location in memory and executed immediately.
  • Relocatable machine language: Thistype of output is also known as an object module. It allows the subprogram to be compiled separately.
  • Assembly language: Thistype of output makes the activity of code generation somewhat easier.

3) Instruction Selection

The target machine will execute the IR program mapped into a code sequence by a code generator.

The following factors determine the difficulty of performing this mapping:

  • The level of the intermediate representations
  • The nature of the instruction-set architecture
  • The quality of the generated code
  • When the IR level is high, the code generator can translate each IR statement into a sequence of machine instructions with the help of a template code.
  • When the IR level is low, the code generator can use this information to generate more efficient code sequences.
  • The difficulty of instruction selection is determined by the nature of the instruction set of the target machine. The instruction set should be uniform and complete.
  • If the target machine does not uniformly support each data type, then each exception to the general rule requires special handling.
  • If we consider the efficiency of the target program, the instruction speeds and machine idioms are other important factors.
  • The speed and size are criteria to detect the quality of the generated code.

Example:

a = b + c

d = a + e

The set of three-address statements will be translated into:

LD R0, b                                // R0 = b

ADD, R0, R0, c                     // R0 = R0 + c

ST, a, R0                               // a = R0

LD R0, a                               // R0 = a

ADD, R0, R0, e                   // R0 = R0 + e

ST d, R0                              // d = R0

Register Allocation

A key problem in code generation is to decide what values should be assigned to which registers. Those instructions that involve register operands are invariably shorter and faster than those instructions which involve memory operand.

The use of register can raise some sub-problem mentioned below:

  1. Register allocation: During register allocation, we will select the set of variables that reside in registers.
  2. Register assignment: During the register assignment, we will pick the specific register that holds a variable.

Certain machines require register-pairs (even-odd pair of register) for some operands and results. 

Example

Consider the following form of multiplication instruction:

M x, y 

Where x is the multiplicand odd register in the even/odd register pair and y is the multiplier. The odd/ even register occupies entire products.

Evaluation Order

The efficiency of the target code can be affected by the order in which computations are performed. Some other computation orders require fewer registers to hold intermediate results than others.