FLEX (Fast Lexical Analyzer Generator)
FLEX stands for Fast Lexical Analyzer Generator. Around 1987, Vern Paxson wrote Flex in C, with a great deal of input and inspiration from Van Jacobson; Jacobson's approach is partially reflected in the fast table representation, and Kevin Gong also contributed to the implementation. Scanners generated by Flex use a deterministic finite automaton (DFA) to perform character parsing and tokenizing. A DFA is an abstract machine that recognizes regular languages; in other words, it helps transform a stream of characters into a stream of tokens. The lexical analyzer divides the input into tokens and discards extra blank lines and code comments. DFAs form a restricted subset of Turing machines: they are analogous to read-only Turing machines whose head moves only to the right.
Regular expressions form the foundation of the pattern syntax. Programs that carry out lexical analysis during compilation are known as lexical analyzers, or lexers; a lexer contains a tokenizer or scanner. In compiler design, the lexical analyzer's job is to read the character stream of the source code, identify valid tokens, and pass them to the syntax analyzer when requested. While generating tokens, the lexical analyzer skips over whitespace and comments, and it associates any errors it finds with the source file and line number.
How does FLEX work?
FLEX works in three steps, carried out as follows:
Step 1: An input file named lex.l, written in the lex language, describes the lexical analyzer to be generated. The lex compiler translates lex.l into a C program, which is stored in a file named lex.yy.c.
Step 2: The C compiler compiles lex.yy.c into an executable file called a.out.
Step 3: The executable a.out reads a stream of input characters and produces a stream of tokens.
Note: Here lex.l, lex.yy.c, and a.out are the conventional file names used with FLEX.
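The three steps above can be walked through with a minimal sketch of a lex specification. The file name count.l and the chosen patterns are illustrative, not required by FLEX:

```lex
%{
/* copied verbatim into lex.yy.c */
#include <stdio.h>
int words = 0, lines = 0;
%}

%%
[a-zA-Z]+   { words++; }
\n          { lines++; }
.           { /* ignore other characters */ }
%%

int main(void)
{
    yylex();                       /* scan stdin until EOF */
    printf("words=%d lines=%d\n", words, lines);
    return 0;
}
```

Assuming flex and a C compiler are installed, step 1 is `flex count.l` (producing lex.yy.c), step 2 is `cc lex.yy.c -lfl` (producing a.out), and step 3 is `./a.out < input.txt`, which prints the word and line counts. The `-lfl` library supplies a default `yywrap()`.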
Program structure of FLEX
Definition section
The definition section contains variable declarations, regular definitions, and manifest constants. Text placed between the definition section's %{ and %} brackets is copied verbatim into the file lex.yy.c.
Rules section
The rules section contains a list of rules of the form: pattern action. The pattern must be unindented, and the action must begin on the same line. The rules section is surrounded by "%%" delimiters.
User code section
This section contains additional C statements and functions. These functions may also be compiled separately and linked with the lexical analyzer.
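A skeleton tying the three sections together might look like the following sketch; the helper name report and the patterns are illustrative:

```lex
%{
/* Definition section: text between %{ and %} is copied
   verbatim into lex.yy.c. */
#include <stdio.h>
void report(const char *kind);   /* may be compiled separately and linked in */
%}

DIGIT   [0-9]

%%
{DIGIT}+    { report("number"); }
[ \t\n]+    { /* skip whitespace */ }
.           { report("other"); }
%%

/* User code section: copied to the end of lex.yy.c. */
void report(const char *kind)
{
    printf("%s: %s\n", kind, yytext);   /* yytext holds the matched lexeme */
}

int main(void)
{
    yylex();
    return 0;
}
```

Here `report` is defined in the user code section, but it could equally live in a separate .c file compiled on its own and linked with lex.yy.c, which is what is meant above by assembling the procedures individually.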
Advantages of FLEX
- Makes it easier to find a token in the symbol table.
- Removes comments and blank lines from the source program.
- Connects error messages to their locations in the source program.
- Expands any macros found in the source program for you.
- Reads the input characters from the source program.
- Programs such as compilers use the lexical-analysis approach: they take the parsed data from a programmer's code and produce compiled binary executable code.
- With the aid of a separate lexical analyzer, you can construct a specialized and potentially more efficient processor for the task.
Disadvantages of FLEX
- It takes a long time to read the source code and divide it into tokens.
- Compared to PEG or EBNF rules, some regular expressions can be harder to understand.
- The lexer and its token descriptions require additional testing and improvement.
- The generation of tokens and lexer tables adds runtime overhead.
Conclusion
Lexical analysis is the initial step in the compiler design process. Lexemes are groups of characters in a source program that match the pattern of a token. The lexical analyzer scans the program's entire source code and helps identify tokens for the symbol table. A lexical error occurs when a character string cannot be converted into a valid token; a useful error-recovery technique is to remove one character from the remaining input. The lexical analyzer scans the input program while the parser performs syntax analysis. By removing unwanted tokens such as whitespace and comments, the lexical analyzer makes syntactic analysis easier.