Data Flow Analysis

All the optimization techniques we have learned earlier depend on data flow analysis (DFA). DFA is a body of techniques that derive information about how data flows along the execution paths of a control-flow graph.

Example:

For global common sub-expression elimination, we need to determine whether two expressions compute the same value along every possible execution path of the program.

The Data-Flow Analysis Schema

In data flow analysis, a data flow value is associated with every program point. It represents an abstraction of the set of all possible program states that can be observed at that point.

The set of all possible data flow values is the domain of the analysis.

IN[a]: The data flow value before the statement a.

OUT[a]: The data flow value after the statement a.

The data flow problem is to find a solution to a set of constraints on the IN[a]'s and OUT[a]'s, for all statements a.

There are two sets of constraints: those based on the semantics of the statements (transfer functions) and those based on the flow of control (control-flow constraints).

## Transfer Function

The data flow values before and after a statement are constrained by the semantics of the statement.

For example, suppose the analysis tracks the constant values of variables. If variable x has the value y before the statement p = x executes, then after the statement executes, both x and p have the value y.

This relationship between the data flow values before and after a statement is known as a transfer function.
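The p = x example can be written out as a small function. This is only a sketch under assumptions: the data flow value is modeled as a dictionary from variable names to known constant values, the helper name `transfer_copy` is hypothetical, and the concrete value 7 stands in for y.

```python
def transfer_copy(dst, src, before):
    """Transfer function for the copy statement `dst = src`.

    `before` maps variable names to known constant values; a variable
    that is absent from the mapping has no known constant value.
    """
    after = dict(before)          # every other variable is unchanged
    if src in before:
        after[dst] = before[src]  # dst now holds the same value as src
    else:
        after.pop(dst, None)      # src is unknown, so dst becomes unknown
    return after


state_before = {"x": 7}                       # x holds 7 before p = x
state_after = transfer_copy("p", "x", state_before)
print(state_after)  # {'x': 7, 'p': 7}
```

The input dictionary is copied rather than modified, so the values before and after the statement stay available side by side.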

Transfer functions come in two forms:

• Forward propagation along the execution path
• Backward propagation up the execution path

### Forward Propagation:

The following points should be considered:

• In forward propagation, the transfer function for any statement s will be represented by Fs.
• This transfer function takes the data flow values before the statement and produces the output or new data flow value after the statement.
• So the new data flow values after the statement will be OUT[s] = Fs (IN[s]).
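As an illustration of OUT[s] = Fs(IN[s]), the sketch below assumes the widely used gen/kill form of a transfer function, Fs(x) = gen ∪ (x − kill), which the text above does not itself specify. The helper `make_forward_transfer` and the definition labels d1, d2, d3 are hypothetical.

```python
def make_forward_transfer(gen, kill):
    """Build F_s for one statement, assuming F_s(x) = gen | (x - kill)."""
    gen, kill = frozenset(gen), frozenset(kill)

    def f_s(in_value):
        # Facts in `kill` are invalidated by the statement,
        # facts in `gen` are created by it.
        return gen | (frozenset(in_value) - kill)

    return f_s


# Hypothetical statement s, labeled d3, that assigns to x.
# d1 is an earlier definition of x; d2 is a definition of y.
f_s = make_forward_transfer(gen={"d3"}, kill={"d1"})
in_s = {"d1", "d2"}
out_s = f_s(in_s)        # OUT[s] = F_s(IN[s])
print(sorted(out_s))     # ['d2', 'd3']
```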

### Backward Propagation:

The following points should be considered:

• Backward propagation is the converse of forward propagation.
• This transfer function converts a data flow value after the statement to a new data flow value before the statement.
• So the new data flow values will be IN[s] = Fs (OUT[s]).
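A backward transfer function can be sketched in the same way. The example assumes a liveness-style analysis in which the data flow value is the set of variables whose values may still be needed, and Fs(x) = use ∪ (x − def). The helper `make_backward_transfer` and the statement a = b + c are illustrative choices, not taken from the text.

```python
def make_backward_transfer(use, define):
    """Build F_s for one statement, assuming F_s(x) = use | (x - define)."""
    use, define = frozenset(use), frozenset(define)

    def f_s(out_value):
        # Variables written by the statement are no longer needed above it;
        # variables read by the statement are needed above it.
        return use | (frozenset(out_value) - define)

    return f_s


# Hypothetical statement s:  a = b + c
f_s = make_backward_transfer(use={"b", "c"}, define={"a"})
out_s = {"a", "d"}       # needed after the statement
in_s = f_s(out_s)        # IN[s] = F_s(OUT[s])
print(sorted(in_s))      # ['b', 'c', 'd']
```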

### Control-Flow Constraints

The second set of constraints is derived from the flow of control. If block B consists of statements S1, S2, ..., Sn, in that order, then the data flow value out of Si is the same as the data flow value into Si+1. That is:

IN[Si+1] = OUT[Si], for all i = 1, 2, ..., n – 1.
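The constraint inside a block amounts to threading one value through the statements in order. The following sketch assumes a forward analysis; `apply_block` and the three lambda transfer functions are hypothetical stand-ins for the statements S1, S2, S3.

```python
def apply_block(statements, in_block):
    """Thread a data flow value through the statements of one basic block.

    Returns two lists, ins and outs, where ins[i] and outs[i] belong to
    the i-th statement (0-based here, 1-based in the text).
    """
    ins, outs = [], []
    value = in_block
    for transfer in statements:
        ins.append(value)
        value = transfer(value)   # OUT[Si] = F_Si(IN[Si])
        outs.append(value)        # reused as IN[Si+1] on the next iteration
    return ins, outs


# Hypothetical block: S1 creates d1, S2 kills d1 and creates d2, S3 creates d3.
statements = [
    lambda x: x | {"d1"},
    lambda x: (x - {"d1"}) | {"d2"},
    lambda x: x | {"d3"},
]
ins, outs = apply_block(statements, frozenset())
print(sorted(outs[-1]))  # ['d2', 'd3']
```

The last entry of `outs` is the value leaving the block, which is what the block as a whole passes on to its successors.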

### Reaching Definition

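Reaching definitions is one of the most common and useful data flow schemas. A definition D reaches a point P if there is a path from the point immediately following D to P such that D is not killed along that path. A definition of a variable is killed if the path contains any other definition of the same variable.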
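Reaching definitions can be computed with a simple round-robin iteration. The sketch below assumes block-level gen and kill sets, the equations IN[B] = union of OUT[P] over predecessors P and OUT[B] = gen[B] ∪ (IN[B] − kill[B]), and a made-up three-block flow graph with a loop between B2 and B3. None of these names come from the text.

```python
def reaching_definitions(blocks, preds, gen, kill):
    """Iterate the reaching-definitions equations to a fixed point."""
    in_sets = {b: set() for b in blocks}
    out_sets = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            # A definition reaches the start of b if it leaves any predecessor.
            in_sets[b] = set().union(*(out_sets[p] for p in preds[b]))
            new_out = gen[b] | (in_sets[b] - kill[b])
            if new_out != out_sets[b]:
                out_sets[b] = new_out
                changed = True
    return in_sets, out_sets


# Hypothetical flow graph: B1 -> B2 -> B3 -> B2
#   B1: d1: x = 1; d2: y = 2     B2: d3: x = x + y     B3: d4: y = 0
blocks = ["B1", "B2", "B3"]
preds = {"B1": [], "B2": ["B1", "B3"], "B3": ["B2"]}
gen = {"B1": {"d1", "d2"}, "B2": {"d3"}, "B3": {"d4"}}
kill = {"B1": {"d3", "d4"}, "B2": {"d1"}, "B3": {"d2"}}

rd_in, rd_out = reaching_definitions(blocks, preds, gen, kill)
print(sorted(rd_in["B2"]))   # ['d1', 'd2', 'd3', 'd4']
print(sorted(rd_out["B2"]))  # ['d2', 'd3', 'd4']
```

Definition d1 reaches the start of B2 but not its end, because d3 redefines x and therefore kills d1.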

### Live-Variable Analysis

In live-variable analysis, we want to know, for a variable x and a point p, whether the value of x at p could be used along some path in the flow graph starting at p. If so, x is live at p; otherwise, x is dead at p.

An important use of live-variable information is register allocation for basic blocks.
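Liveness is a backward problem, so information flows from successors to predecessors. The sketch below assumes per-block use and def sets (use holds variables read before any write in the block), the equations OUT[B] = union of IN[S] over successors S and IN[B] = use[B] ∪ (OUT[B] − def[B]), and a hypothetical flow graph in which B2 loops back to itself.

```python
def live_variables(blocks, succs, use, define):
    """Iterate the live-variable equations to a fixed point."""
    in_sets = {b: set() for b in blocks}
    out_sets = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in reversed(blocks):   # visiting exits first converges faster
            # A variable is live on exit if it is live on entry to a successor.
            out_sets[b] = set().union(*(in_sets[s] for s in succs[b]))
            new_in = use[b] | (out_sets[b] - define[b])
            if new_in != in_sets[b]:
                in_sets[b] = new_in
                changed = True
    return in_sets, out_sets


# Hypothetical flow graph: B1 -> B2, B2 -> B2, B2 -> B3
#   B1: a = 1; b = 2     B2: c = a + b; a = c     B3: return c
blocks = ["B1", "B2", "B3"]
succs = {"B1": ["B2"], "B2": ["B2", "B3"], "B3": []}
use = {"B1": set(), "B2": {"a", "b"}, "B3": {"c"}}
define = {"B1": {"a", "b"}, "B2": {"a", "c"}, "B3": set()}

lv_in, lv_out = live_variables(blocks, succs, use, define)
print(sorted(lv_in["B2"]))   # ['a', 'b']
print(sorted(lv_out["B2"]))  # ['a', 'b', 'c']
```

At the end of B3 no variable is live, so any register holding a, b, or c could be reused from that point on.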

### Available Expressions

An expression a + b is said to be available at a point p if every path from the entry node to p evaluates a + b, and after the last such evaluation before reaching p there are no subsequent assignments to a or b.

We say that a block kills the expression a + b if it assigns (or may assign) a or b and does not subsequently recompute a + b.

The main use of available-expression information is to detect global common sub-expressions.
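Because an expression must be available along every path, the values arriving from predecessors are combined with intersection rather than union. The sketch below assumes per-block e_gen and e_kill sets, an entry block with nothing available before it, every other block having at least one predecessor, and a hypothetical diamond-shaped flow graph.

```python
def available_expressions(blocks, entry, preds, e_gen, e_kill, universe):
    """Iterate the available-expressions equations to a fixed point."""
    in_sets = {b: set() for b in blocks}
    # Start optimistically: assume everything is available, then shrink.
    out_sets = {b: set(universe) for b in blocks}
    out_sets[entry] = set(e_gen[entry])   # nothing is available at entry
    changed = True
    while changed:
        changed = False
        for b in blocks:
            if b == entry:
                continue
            # Available on entry only if available out of every predecessor.
            in_sets[b] = set.intersection(*(out_sets[p] for p in preds[b]))
            new_out = e_gen[b] | (in_sets[b] - e_kill[b])
            if new_out != out_sets[b]:
                out_sets[b] = new_out
                changed = True
    return in_sets, out_sets


# Hypothetical diamond: B1 -> B2 -> B4 and B1 -> B3 -> B4
#   B1: t1 = a + b; t2 = c * d    B2: a = 0    B3: (no assignments)
#   B4: t3 = a + b; t4 = c * d
universe = {"a+b", "c*d"}
blocks = ["B1", "B2", "B3", "B4"]
preds = {"B1": [], "B2": ["B1"], "B3": ["B1"], "B4": ["B2", "B3"]}
e_gen = {"B1": {"a+b", "c*d"}, "B2": set(), "B3": set(), "B4": {"a+b", "c*d"}}
e_kill = {"B1": set(), "B2": {"a+b"}, "B3": set(), "B4": set()}

ae_in, ae_out = available_expressions(blocks, "B1", preds, e_gen, e_kill, universe)
print(sorted(ae_in["B4"]))  # ['c*d']
```

In this example c * d is available on entry to B4, so its recomputation there is a global common sub-expression; a + b is not, because the path through B2 assigns a.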