Automata Regular Expressions

Regular Expressions

Regular expressions are also referred as rational expressions, which are used to describe the algebraic description of regular languages. It is generally a sequence of characters that is used to find a string in language. Regular expressions play an important role in computer science applications, which involves:

  • It is used by many text editors and utilities to search block of text for certain pattern.
  • It is also used to design the compliers for programming languages.

Like other expressions, such as arithmetic expressions, logical expressions etc., regular expression also have permitted symbols and operators. Only defined symbols can be used to build a regular expression. For example:

 (a U b) a*

The regular expression has two parts:

First is (a U b) and second is a*

The symbols a and b are shorthand for sets {a} and {b}. In this (a U b) means ({a} U {b}). In this, {a, b} and {a}* means that the language consisting of all strings having any number of a’s (e.g. ?, a, aa….}.

Both expressions and parts are combined to form the value of the entire expression.

Formal Definitions Used in Regular Expression

Let R be a regular expression over alphabet ?, then:

  • ?, is a regular expression denoting by set {?}.
  • ? is a regular expression used for representing an empty set ?.
  • In regular expression, the union of any two regular expressions R1 and R2 is written as R1+ R2, which is also a regular expression denoting the set {a U b}.
  • In regular expression, the concatenation of any two regular expressions, say R1 and R2 is written as R1R2, which is also a regular expression denoting the set {ab}.
  • The closure of any regular expressions R written as R* is also a regular expression denoting the set {R*}.

Note: Do not confuse with regular expression ? and ?. The expression ? denotes the language containing only a single string, or we can say that empty string. On the other hand ? represents the language that does not contain any strings.

Operators used in Regular Expression

We have following permitted operators for building regular expressions:

  • Union
  • Concatenation (or dot)
  • Closure (or star)

Union:

The Union of two languages P and Q is denoted by P U Q, which is the set of strings that are in either P or Q or in both.

Example:

If P = {1, 2} and Q = {a, b}

P U Q = {1, 2, a, b}

Concatenation:

The Concatenation of two languages P and Q is denoted by P.Q, which is the set of strings that can be formed by taking any string in P concatenating it with any string in Q.

Example:

If P = {a, b} and Q = {?, d}

P.Q = {a, b, ad, bd}

Closure:

The closure of a language P is denoted by P*, which defines the set of those strings that can be formed by taking any number of strings from P with repetitions and concatenations of strings. 

Example:

If P = {1, 2}*

P= {?, 1, 2, 11, 22, ………..}

Precedence of Regular Expression Operators

If we are using parenthesis with regular expressions, then the evaluation of regular expression is performed on the basis of operator’s precedence. For Regular expression, the order of precedence is given below:

  • The star operator (*) has highest precedence.
  • Concatenation operator (.) has next precedence.
  • Union operator (+) has lowest precedence.

Examples of Regular Expression

Example 1:

Consider the alphabet ? = {a, b}, and describe the regular expression set for all strings having a single b.

Solution:

The regular expression will be:

R = a*ba*

Example 2:

Consider the alphabet ? = {a, b}, and describe the regular expression set for all strings having bbbb as substring.

Solution:

The regular expression will be:

R = (a+b)* bbbb (a+b)* 

Example 3:

Consider the alphabet ? = {a, b}, and describe the regular expression set for all strings ends with ab.

Solution:

The regular expression will be:

R = (a+b)*ab

Example 4:

Consider the alphabet ? = {a, b, c}, and describe the regular expression set for all strings, which contain no more than three b’s.

 Solution:

The regular expression will be:

R = (? +b) (? +b) (? +b)

This regular expression describes the string containing zero, one, two or three b’s and nothing else.