Python Regex


A regular expression is a sequence of characters that defines a search pattern, and it is mainly used to find and replace text in a string or file. Regular expressions help in finding, replacing, and manipulating textual data. Python provides the “re” module, which supports the use of regex in Python programs.

Python Regex functions

The main functions of the re module are the following:

Sr. Functions Description
1. search It returns a match object for the first match found anywhere in the string; otherwise, it returns None.
2. match It matches the regex pattern at the beginning of the string, with optional flags. It returns a match object if a match is found; otherwise, it returns None.
3. split It returns a list in which the string has been split at each match.
4. sub It returns a string in which the matches of a pattern are replaced with the given text.
5. findall It returns a list of all matches found in the given string.
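
The difference between match and search in the table above is easy to miss, so here is a minimal sketch of it. The sample string and variable names are chosen only for illustration.

```python
import re

text = "The rain in Spain"  # illustrative sample string

# match() only succeeds if the pattern matches at the start of the string.
at_start = re.match(r"The", text)
not_at_start = re.match(r"rain", text)

# search() scans the whole string and returns the first match it finds.
anywhere = re.search(r"rain", text)

print(at_start is not None)   # True: a match object was returned
print(not_at_start is None)   # True: "rain" is not at the start
print(anywhere.group())       # rain
```

Both functions return a match object or None, so the result can be tested directly in an if statement.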

Regular Expression Formatting

A combination of metacharacters, special sequences, and sets is used to build a regular expression.

Metacharacters

The following metacharacters are available in Python's regular expression syntax:

Sr. Meta characters Description
1. [] It denotes a set of characters.
2. . It matches any single character except a newline.
3. ^ It matches the pattern at the beginning of the string.
4. $ It matches the pattern at the end of the string.
5. * It denotes zero or more occurrences of the preceding pattern.
6. + It denotes one or more occurrences of the preceding pattern.
7. {} It denotes a specified number of occurrences of the preceding pattern, such as {2} or {2,4}.
8. | It denotes that either this or that pattern is present.
9. () It groups part of the pattern and captures the matched text.
10. \ It signals a special sequence, or escapes a metacharacter so that it is matched literally.
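
The metacharacters above can be tried out with a short sketch like the following. The sample strings are made up for illustration; only functions from the standard re module are used.

```python
import re

text = "hello world, hello regex"  # illustrative sample string

starts_with_hello = re.search(r"^hello", text)   # ^ anchors at the start
ends_with_regex = re.search(r"regex$", text)     # $ anchors at the end
dot_matches = re.findall(r"he..o", text)         # . matches any one character
one_or_more_l = re.findall(r"l+", text)          # + means one or more
exactly_two_l = re.findall(r"l{2}", text)        # {2} means exactly two
zero_or_more_b = re.findall(r"ab*", "a ab abb")  # * means zero or more
either_word = re.findall(r"world|regex", text)   # | means either alternative
grouped = re.search(r"(hello) (world)", text)    # () captures groups
literal_dot = re.findall(r"\.", "3.14")          # \ escapes the dot

print(dot_matches)       # ['hello', 'hello']
print(one_or_more_l)     # ['ll', 'l', 'll']
print(zero_or_more_b)    # ['a', 'ab', 'abb']
print(grouped.groups())  # ('hello', 'world')
```

The patterns are written as raw strings (r"...") so that Python itself does not interpret the backslashes.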

Sets

A set is a group of characters inside a pair of square brackets [] with a special meaning:

Sr. Sets Description
1. [a-n] It returns a match for any lowercase character between a and n.
2. [arn] It returns a match where one of the specified characters (a, r, or n) occurs.
3. [^arn] It returns a match for any character except a, r, and n.
4. [0-9] It returns a match for any digit between 0 and 9.
5. [0123] It returns a match where any of the specified digits (0, 1, 2, or 3) occurs.
6. [0-5][0-9] It returns a match for any two-digit number from 00 to 59.
7. [a-zA-Z] It returns a match for any letter from a to z, in upper case or lower case.
8. [+] Inside a set, +, *, ., |, (), $, and {} have no special meaning, so [+] returns a match for any + character in the string.
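
As a rough illustration of the sets above, the sketch below runs a few of them through findall. The sample strings are invented for this example.

```python
import re

text = "Room 42 opens at 09:30+"  # illustrative sample string

lower_a_to_n = re.findall(r"[a-n]", text)        # lowercase letters a..n
digits = re.findall(r"[0-9]", text)              # any single digit
two_digit = re.findall(r"[0-5][0-9]", text)      # 00 to 59
letters = re.findall(r"[a-zA-Z]+", text)         # runs of letters, either case
plus_signs = re.findall(r"[+]", text)            # + is literal inside a set
not_arn = re.findall(r"[^arn]", "banner")        # anything except a, r, n

print(lower_a_to_n)  # ['m', 'e', 'n', 'a']
print(two_digit)     # ['42', '09', '30']
print(not_arn)       # ['b', 'e']
```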

Special Sequences

A special sequence is a \ followed by one of the characters listed below, and it has a special meaning.

The backslash symbol \ is also used to escape characters, including all metacharacters, so that they are matched literally.

Sr. Characters Description
1. \A It returns a match if the specified characters are at the beginning of the string.
2. \b It returns a match if the specified characters are at the beginning or at the end of a word.
3. \B It returns a match where the specified characters are present, but NOT at the beginning (or at the end) of a word.
4. \d It returns a match where the string contains a digit (0-9).
5. \D It returns a match where the string contains a character that is not a digit.
6. \s It returns a match where the string contains a whitespace character.
7. \S It returns a match where the string contains a character that is not whitespace.
8. \w It returns a match where the string contains a word character (letters from a to z and A to Z, digits from 0 to 9, and the underscore _ character).
9. \W It returns a match where the string contains a character that is not a word character.
10. \Z It returns a match if the specified characters are at the end of the string.
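
The special sequences above can be sketched as follows. The sample string is an assumption made for this example, and raw strings are used so that sequences such as \b reach the re module unchanged.

```python
import re

text = "Order 66 shipped"  # illustrative sample string

digits = re.findall(r"\d", text)           # each digit
non_digit_runs = re.findall(r"\D+", text)  # runs of non-digits
spaces = re.findall(r"\s", text)           # each whitespace character
non_space_runs = re.findall(r"\S+", text)  # runs of non-whitespace
words = re.findall(r"\w+", text)           # runs of word characters
non_word = re.findall(r"\W", text)         # each non-word character
at_start = re.search(r"\AOrder", text)     # only at the start of the string
at_end = re.search(r"shipped\Z", text)     # only at the end of the string
word_start = re.findall(r"\bsh", text)     # "sh" at a word boundary
inside_word = re.findall(r"\Bhip", text)   # "hip" NOT at a word boundary
not_boundary = re.findall(r"\bhip", text)  # no word starts with "hip"

print(digits)          # ['6', '6']
print(non_space_runs)  # ['Order', '66', 'shipped']
print(inside_word)     # ['hip']
print(not_boundary)    # []
```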

Regex Function

As discussed above, to work with regular expressions in Python, we first need to import the re module and then apply the required functions. These functions are the following:

  • The findall() function

This function returns a list of all matches of a pattern within the string. It returns an empty list if no match is found within the string. For example:
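
A minimal sketch of findall(), using a sample sentence chosen only for illustration:

```python
import re

text = "The rain in Spain stays mainly in the plain"  # illustrative sample

# Every non-overlapping occurrence of "ai" is collected into a list.
all_ai = re.findall(r"ai", text)

# A pattern that never occurs gives an empty list, not None.
no_match = re.findall(r"xyz", text)

print(all_ai)    # ['ai', 'ai', 'ai', 'ai']
print(no_match)  # []
```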

Example

Output:

  • The search() function

This function searches the string for a match and returns a match object. If more than one match is found, only the first occurrence is returned.
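
As a small sketch of this behaviour, the code below searches an illustrative sample string for the first whitespace character and for the first "ai":

```python
import re

text = "The rain in Spain"  # illustrative sample string

# There are three whitespace characters, but only the first is reported.
first_space = re.search(r"\s", text)

# "ai" occurs in both "rain" and "Spain"; search() stops at "rain".
first_ai = re.search(r"ai", text)

print(first_space.start())  # 3
print(first_ai.start())     # 5
```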

Example

Output:

Example

Output:

Match Object

The match object contains the search information and the results. If no match is found, None is returned instead of a match object.

The Match object methods 

The following methods and attributes are used with the match object.

  1. span() – Returns a tuple containing the start and end positions of the match.
  2. string – An attribute (not a method) that holds the string passed into the function.
  3. group() – Returns the part of the string where the match was found.
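
The match object members listed above can be sketched as follows. The sample string and pattern (a word starting with an upper case S) are assumptions made for this example.

```python
import re

text = "The rain in Spain"  # illustrative sample string

# Look for a word that starts with an upper case "S".
found = re.search(r"\bS\w+", text)

print(found.span())    # (12, 17)
print(found.string)    # The rain in Spain
print(found.group())   # Spain

# When nothing matches, search() returns None, so check before using it.
missing = re.search(r"Portugal", text)
if missing is None:
    print("no match")
```

Calling a method such as group() on None raises an AttributeError, which is why the None check comes first.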

Example:

Output:

  • The split() function

The split() function returns a list of strings by splitting the given string at each match.
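
A minimal sketch of split(), assuming a sample string split at each whitespace character. The maxsplit argument is not covered in the text above; it is shown here only as an optional extra of the same function.

```python
import re

text = "The rain in Spain"  # illustrative sample string

# Split at every whitespace character.
parts = re.split(r"\s", text)

# Optionally limit the number of splits with the maxsplit keyword.
first_split_only = re.split(r"\s", text, maxsplit=1)

print(parts)             # ['The', 'rain', 'in', 'Spain']
print(first_split_only)  # ['The', 'rain in Spain']
```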

Output:

  • The sub() function

The sub() function substitutes the matches: it replaces each match with the text of our choice. For example:
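
A minimal sketch of sub(), assuming a sample string in which every whitespace character is replaced by the text "1":

```python
import re

text = "The rain in Spain"  # illustrative sample string

# Replace each whitespace character with "1".
replaced = re.sub(r"\s", "1", text)

print(replaced)  # The1rain1in1Spain
```

The original string is left unchanged; sub() returns a new string.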

Output:

In the above code, all whitespace characters are replaced by 1.

We can control the number of replacements with the count parameter.
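
A minimal sketch of the count parameter, assuming the same kind of sample string and a limit of two replacements:

```python
import re

text = "The rain in Spain"  # illustrative sample string

# Only the first two whitespace characters are replaced.
replaced_twice = re.sub(r"\s", "1", text, count=2)

print(replaced_twice)  # The1rain1in Spain
```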

Output:

In the above code, we can see that only the first two whitespace characters are replaced by 1.
