LEX

Lex is a tool (computer program) that generates a lexical analyzer. The original Lex was written by Mike Lesk and Eric Schmidt at Bell Labs in the mid-1970s; the widely used free implementation, Flex, was written by Vern Paxson in C around 1987. Lex works together with the YACC parser generator. It allows us to specify a lexical analyzer by writing regular expressions that describe the patterns of tokens. The Lex language is the input to the Lex tool, and the tool itself is called the Lex compiler. The Lex compiler converts the input patterns into a transition diagram and produces code in a file called lex.yy.c.

Installing Lex on Ubuntu

The following commands are used to install Lex (provided by the Flex package) on Ubuntu:

sudo apt-get update

sudo apt-get install flex   
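Once the installation finishes, running flex with its version flag confirms that the tool is available:

flex --version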

Use of Lex

The figure below shows how Lex works. The input file lex.l is written in the Lex language and describes the lexical analyzer to be generated. The Lex compiler processes lex.l and transforms it into a C program named lex.yy.c. The C compiler then compiles this file into a program a.out. The C compiler's output works as a lexical analyzer that takes a stream of input characters and produces a stream of tokens.
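As a quick sketch of this workflow on the command line (assuming the specification file is named lex.l and that no yywrap function is supplied, so the Flex support library is linked with -lfl), the steps typically look like this:

flex lex.l
cc lex.yy.c -lfl
./a.out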

The lexical analyzer in the compiled file a.out is used as a subroutine of the parser. It is a C function that returns an integer value, which is a code for one of the possible token names. The global variable yylval holds the attribute value, such as a pointer into the symbol table, or nothing at all. It is shared between the parser and the lexical analyzer, making it simple to return both the token name and the attribute value.
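For illustration, here is a minimal sketch of one translation rule (assuming a token code NUMBER defined elsewhere, for example in a YACC-generated header, and stdlib.h included in the declarations section for atoi) that both sets yylval and returns the token name:

[0-9]+    { yylval = atoi(yytext); return NUMBER; }

Here yytext points to the matched lexeme, the numeric attribute is passed through yylval, and the return value tells the parser which token was found.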

Structure of a Lex program

Every Lex program is divided into three sections by %% delimiters. The general layout of a Lex program is given below:

                          {Declarations}

                           %%

                          {Translation rules}

                           %%

                           {Auxiliary functions}

The first section contains declarations of variables, regular definitions, and manifest constants. C code placed in the declarations section is enclosed between the "%{" and "%}" brackets.

The syntax of a translation rule is:

 pattern { Action }

Every pattern is a regular expression, and it may use the regular definitions declared in the declarations section. Each action is a fragment of C code, although variants of Lex exist whose actions are written in other languages. The rules section sits between the two %% delimiters.

The last section contains additional C statements and functions. These functions can be compiled separately and loaded with the lexical analyzer.
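As a minimal sketch that puts all three sections together (the file name count.l and the counter variables are illustrative only), a small word-and-line counter could be written as follows and built with the commands shown earlier:

%{
/* Declarations: C code copied verbatim, plus counters shared by the rules */
#include <stdio.h>
int words = 0, lines = 0;
%}

%%
[a-zA-Z]+   { words++;  /* a maximal run of letters counts as one word */ }
\n          { lines++;  /* count line breaks */ }
.           { /* ignore any other character */ }
%%

/* Auxiliary functions: a driver and the end-of-input hook */
int main(void)
{
    yylex();
    printf("words = %d, lines = %d\n", words, lines);
    return 0;
}

int yywrap(void) { return 1; }   /* 1 means there is no further input */

Because yywrap is defined here, the program can be built without the -lfl library: run flex count.l and then cc lex.yy.c -o count.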

The lexical analyzer created by Lex works with the parser in the following way. When the parser calls the lexical analyzer, it begins reading the remaining input, one character at a time, until it finds the longest prefix of the input that matches one of the patterns Pi. It then executes the associated action Ai. Typically, Ai returns control to the parser; if it does not (for example, when Pi describes whitespace or comments), the lexical analyzer proceeds to find further lexemes until one of the corresponding actions causes a return to the parser. The lexical analyzer returns the token name to the parser and uses the shared integer variable yylval to pass along any other information about the lexeme when needed.
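To make the hand-off concrete, here is a sketch of the parser side (a plain driver loop standing in for a real YACC-generated parser; the function name parse and the comments are illustrative only):

/* Normally generated by YACC; shown here only to illustrate the calls */
extern int yylex(void);   /* the lexical analyzer produced by Lex in lex.yy.c */
int yylval;               /* shared attribute value, ordinarily defined by the parser */

void parse(void)
{
    int token;
    while ((token = yylex()) != 0) {   /* yylex() returns 0 at end of input */
        /* the returned token name drives the parser's decisions;
           any extra information about the lexeme, such as a number
           or a symbol-table pointer, is read from yylval */
    }
}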