
Python RegEx

A regular expression (RegEx) is a sequence of characters that defines a search pattern. In Python, regular expressions are used to find, match, and replace text inside strings.

For example:

^p....n$

The above expression is a RegEx that matches any string that starts with p, ends with n, and has exactly four characters in between.

Expression    String    Matched?
^p....n$      python    Matched
^p....n$      pigeon    Matched
^p....n$      pepper    No Match
^p....n$      peer      No Match

In the above example, the words "python" and "pigeon" start with p, end with n, and have exactly four characters in between, so they match. "pepper" does not end with n and "peer" is too short, so they do not match.

import re

pattern = '^p....n$'
test_string = 'python'
result = re.match(pattern, test_string)

if result:
    print("Search successful.")
else:
    print("Search unsuccessful.")

Output:

Search successful.

In the above code, we checked a test string against a regular expression pattern using the match() function of the re module. re.match() returns a match object if the pattern matches at the beginning of the string, and None otherwise.

Regular Expressions

Let's see how regular expressions are written and which characters are used to write them.

In the previous example, '^p....n$', the characters ^, . and $ are metacharacters.

Metacharacters

Metacharacters are characters that the regular expression engine interprets in a special way instead of matching them literally. The metacharacters are:

[] . ^ $ * + ? {} () \ |
  • Square Bracket []

[aip] is a regular expression. Square brackets specify a set of characters to match. The expression matches if the test string contains any one of the characters inside the brackets.

Expression    String    Matched?
[aip]         and       Matched
[aip]         april     Matched
[aip]         iris      Matched
[aip]         pupil     Matched
[aip]         mate      Matched (contains a)

As we can see, the expression matches as soon as any one of the characters in the brackets appears anywhere in the test string.

A range of characters can be defined using a dash (-):

[a-d] is equal to [abcd]

[1-5] is equal to [12345]

[1-39] is equal to [1239], not the numbers 1 through 39

By placing ^ at the start of [], we can match the complement of the set:

[^abc] means any character except a, b and c.

[^0-9] means any non-digit character.
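The character-class rules above can be tried out with re.findall(). The following is a minimal sketch; the sample text "april 2024" and the variable names are assumptions chosen only for illustration.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "april 2024"

# [aip] matches each single a, i or p.
set_hits = re.findall(r"[aip]", text)

# [a-d] is a range: any one of a, b, c or d.
range_hits = re.findall(r"[a-d]", text)

# [^0-9] is a negated set: every character that is not a digit.
non_digits = re.findall(r"[^0-9]", text)

print(set_hits)
print(range_hits)
print(non_digits)
```

Note that the negated set also picks up the space, because a space is simply "not a digit".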

  • Period (.)

A period (.) in a regular expression matches any single character except the newline character (\n). The table below uses the expression ... (three periods), which matches any three characters in a row.

Expression    Test String    Matched?
...           abc            1 Match
...           ancd           1 Match (the first 3 characters)
...           abcdef         2 Matches (6 characters)
...           ab             No Match
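The counts in the table above can be reproduced with a short sketch. It assumes the expression is three periods (`...`), which is what the listed results imply; the variable names are illustrative.

```python
import re

# Assumed pattern: three periods, i.e. any three characters in a row.
pattern = r"..."

# Test strings taken from the table above.
test_strings = ["abc", "ancd", "abcdef", "ab"]

# findall() returns the non-overlapping matches, scanning left to right.
matches = {s: re.findall(pattern, s) for s in test_strings}

for s, found in matches.items():
    print(s, found)
```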

  • Caret (^)

The caret (^) symbol in a regular expression checks whether the test string starts with the given characters.

Expression    Test String    Matched?
^p            pathway        1 Match
^p            python         1 Match
^p            apron          No Match
^pa           pathway        1 Match
^pa           python         No Match
^pa           apron          No Match

  • Dollar ($)

Just as the caret (^) anchors a pattern to the start of the string, the dollar ($) anchors it to the end. The dollar is written after the characters it applies to, for example n$.

Expression    Test String    Matched?
n$            pathway        No Match
n$            python         1 Match
n$            apron          1 Match
on$           pathway        No Match
on$           python         1 Match
on$           apron          1 Match
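The two anchors can be checked side by side. This is a small sketch using the test words from the tables above; it assumes the dollar is written at the end of the pattern (`on$`), and the list names are illustrative.

```python
import re

words = ["pathway", "python", "apron"]

# ^p matches only when the string starts with p.
starts_with_p = [w for w in words if re.search(r"^p", w)]

# on$ matches only when the string ends with "on".
ends_with_on = [w for w in words if re.search(r"on$", w)]

print(starts_with_p)
print(ends_with_on)
```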

  • Star (*)

A star (*) to the right of a character matches zero or more occurrences of that character in the test string.

Expression    Test String    Matched?
pa*n          pn             1 Match
pa*n          pan            1 Match
pa*n          paan           1 Match
pa*n          paaaaaaan      1 Match
pa*n          pain           No Match (as a is not followed by n)
pa*n          par            No Match (as a is not followed by n)
pa*n          paamn          No Match (as a is not followed by n)

  • Plus (+)

A plus (+) to the right of a character matches one or more occurrences of that character in the test string.

Expression    Test String    Matched?
pa+n          pn             No Match (there is no a)
pa+n          pan            1 Match
pa+n          paan           1 Match
pa+n          paaaaaaan      1 Match
pa+n          pain           No Match (as a is not followed by n)
pa+n          pa             No Match (as a is not followed by n)
pa+n          paamn          No Match (as a is not followed by n)

  • Question mark (?)

A question mark (?) to the right of a character matches zero or one occurrence of that character in the test string.

Expression    Test String    Matched?
pa?n          pn             1 Match
pa?n          pan            1 Match
pa?n          paan           No Match (more than one a)
pa?n          paaaaaaan      No Match (more than one a)
pa?n          pain           No Match (as a is not followed by n)
pa?n          pa             No Match (as a is not followed by n)
pa?n          paamn          No Match (as a is not followed by n)
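The difference between *, + and ? is easiest to see by running all three against the same strings. The sketch below reuses a few test strings from the tables above; the helper function matching() is a hypothetical name introduced only for this illustration.

```python
import re

samples = ["pn", "pan", "paan", "pain"]


def matching(pattern):
    """Return the sample strings in which the pattern is found."""
    return [s for s in samples if re.search(pattern, s)]


star = matching(r"pa*n")      # zero or more a
plus = matching(r"pa+n")      # one or more a
optional = matching(r"pa?n")  # zero or one a

print(star)
print(plus)
print(optional)
```

"pain" never matches, because in all three patterns the run of a characters must be followed directly by n.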

  • Braces ({})

Braces ({}) immediately to the right of a character specify how many times that character may repeat. The expression a{2,3} means a minimum of 2 and a maximum of 3 occurrences of a in a row.

Expression    Test String    Matched?
a{2,3}        ab             No Match
a{2,3}        aab            1 Match (aa)
a{2,3}        aab aab        2 Matches (aa and aa)
a{2,3}        aaab           1 Match (aaa)
a{2,3}        aab ab         1 Match (aa in aab)
a{2,3}        abc paar       1 Match (aa in paar)
a{2,3}        paaaar         1 Match (the first aaa; the single a left over is too short)

Let's try one more example: [0-9]{1,4} matches a run of at least 1 and at most 4 digits. There must be no space inside the braces; Python treats {1, 4} as literal text rather than as a repetition count.

Expression    Test String    Matched?
[0-9]{1,4}    Ab12           1 Match (12)
[0-9]{1,4}    Ab1234512      2 Matches (1234 and 512)
[0-9]{1,4}    ab             No Match
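The repetition counts can be verified with re.findall(). This is a minimal sketch; the combined test string and the variable names are assumptions made for illustration.

```python
import re

# a{2,3}: between two and three consecutive a characters (greedy).
runs_of_a = re.findall(r"a{2,3}", "aab aaab paaaar")

# [0-9]{1,4}: between one and four consecutive digits.
digit_runs = re.findall(r"[0-9]{1,4}", "Ab1234512")

# With a space inside the braces the text is no longer a quantifier,
# so the pattern looks for a digit followed by the literal text "{1, 4}".
spaced = re.findall(r"[0-9]{1, 4}", "Ab1234512")

print(runs_of_a)
print(digit_runs)
print(spaced)
```

In "paaaar" the engine takes three a characters first, and the one that remains is too short to form a second match.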

  • Alternation (|)

The alternation (|) symbol works like "or". The expression a|c|d matches any one of a, c or d.

Expression    Test String    Matched?
a|c|d         pn             No Match
a|c|d         pan            1 Match
a|c|d         pbn            No Match
a|c|d         cn             1 Match
a|c|d         pnkj           No Match
a|c|d         cpajd          3 Matches (c, a and d)

  • Group – ()

Parentheses () are used to group sub-patterns. For example, (a|b|c)yz matches any one of a, b or c followed by yz.

Expression    Test String    Matched?
(a|b|c)yz     ayz            1 Match
(a|b|c)yz     yzb            No Match (b is not followed by yz)
(a|b|c)yz     xyz            No Match

In the above example, () has grouped alternative options of a, b and c with fixed characters yz. If any test string contains a, b or c immediately followed by yz, it will show a match.
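Alternation and grouping can be tried together. The following sketch uses the test strings from the tables above; the variable names are illustrative.

```python
import re

# a|c|d: each a, c or d in the string is a separate match.
alternatives = re.findall(r"a|c|d", "cpajd")

# (a|b|c)yz: one of a, b or c, immediately followed by yz.
grouped = re.search(r"(a|b|c)yz", "ayz")
wrong_order = re.search(r"(a|b|c)yz", "yzb")
no_option = re.search(r"(a|b|c)yz", "xyz")

print(alternatives)
print(grouped.group(), grouped.group(1))
print(wrong_order, no_option)
```

group(1) returns only the part captured by the parentheses, here the letter that came before yz.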

  • Backslash (\)

The backslash (\) removes the special meaning of the metacharacter that comes immediately after it. For example, $ in a regular expression normally marks the end of the string. In the expression \$n, however, the backslash is placed to the left of $, so $ is no longer treated as a metacharacter and the expression matches a literal $ followed by n.

Python string literals also use the backslash as an escape character. If we write \\n in a normal string, the first backslash removes the special meaning of the second one, so the statement prints \n and not a new line. To keep Python from interpreting the backslashes in a pattern, regular expressions are usually written as raw strings, such as r'\$n'.

Let's see an example.

print(r"\$n")
print("\\n")
print('those are dog\'s biscuit')

Output:

\$n
\n
those are dog's biscuit
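To see the effect of the backslash inside a pattern rather than in a printed string, here is a minimal sketch. The sample text is a hypothetical string made up for this illustration, and re.escape() is shown as one way to let the re module add the backslashes for you.

```python
import re

# Hypothetical sample text containing a literal dollar sign.
text = "price: $n"

# \$ matches a literal dollar sign, so \$n finds "$n".
literal = re.search(r"\$n", text)

# Without the backslash, $ means "end of string", and nothing can
# follow the end of the string, so this pattern finds nothing.
anchor = re.search(r"$n", text)

# re.escape() adds the backslashes needed to match text literally.
escaped = re.escape("$n")

print(literal.group(), literal.start())
print(anchor)
print(escaped)
```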

Special Sequences

Special sequences are shorthand for commonly used patterns. Each one is a backslash followed by a letter, which saves the coder a lot of typing. A few of these special sequences are described below.

\A – It matches if the given characters are found at the start of the string.

Expression    Test String                         Matched?
\Athe         the moon in the night sky           Match
\Athe         If the moon in the night sky        No Match
\Athe         the bright star in the night sky    Match

\b – It matches if the given characters are at the start or the end of a word.

Expression    Test String                       Matched?
\bant         These antiques are amazing        Match
\bant         antenna is not signaling right    Match
\bant         there were a lot of red ants      Match
\bant         Its adamant to say                No Match

Expression    Test String                       Matched?
ant\b         These antiques are amazing        No Match
ant\b         antenna is not signaling right    No Match
ant\b         there were a lot of red ants      No Match
ant\b         Its adamant to say                Match

\B – It is the opposite of \b. It matches if the given characters are not at the start or the end of a word.

Expression    Test String                       Matched?
\Bant         These antiques are amazing        No Match
\Bant         antenna is not signaling right    No Match
\Bant         there was a lot of comment        No Match (the string does not contain ant)
\Bant         Its adamant to say                Match

Expression    Test String                       Matched?
ant\B         These antiques are amazing        Match
ant\B         antenna is not signaling right    Match
ant\B         there were a lot of comment       No Match
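The position-based sequences \A, \b and \B can be checked with re.search(), which returns a match object or None. This is a small sketch using test strings from the tables above; the variable names are illustrative.

```python
import re

sentence = "Its adamant to say"

# \A: the pattern must be at the very start of the string.
a_start = re.search(r"\Athe", "the moon in the night sky")
a_later = re.search(r"\Athe", "If the moon in the night sky")

# \b: a word boundary. \bant needs "ant" at the start of a word,
# ant\b needs "ant" at the end of a word.
b_word_start = re.search(r"\bant", "there were a lot of red ants")
b_inside = re.search(r"\bant", sentence)
b_word_end = re.search(r"ant\b", sentence)

# \B: the opposite, a position that is not a word boundary.
nb_inside = re.search(r"\Bant", sentence)
nb_followed = re.search(r"ant\B", "These antiques are amazing")

for result in (a_start, a_later, b_word_start, b_inside,
               b_word_end, nb_inside, nb_followed):
    print(bool(result))
```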

\d – It matches any decimal digit. For ASCII text it is equivalent to [0-9].

Expression    Test String                       Matched?
\d            123and4                           Match (at 1, 2, 3 and 4)
\d            If the moon in the night sky      No Match
\d            bright star in the night sky01    Match (at 0 and 1)

\D – It matches any character that is not a decimal digit. It is equivalent to [^0-9].

Expression    Test String                       Matched?
\D            123and4                           Match (at a, n and d)
\D            If the moon in the night sky      Match
\D            bright star in the night sky01    Match

\s – It matches any whitespace character. For ASCII text it is equivalent to [ \t\n\r\f\v].

Expression    Test String                       Matched?
\s            123and4                           No Match
\s            If the moon in the night sky      Match
\s            bright star in the night sky01    Match

\S – It matches any character that is not whitespace. It is equivalent to [^ \t\n\r\f\v]. Only a string that is empty or made up entirely of whitespace gives no match.

Expression    Test String                       Matched?
\S            123and4                           Match
\S            If the moon in the night sky      Match (every letter is a non-whitespace character)
\S            bright star in the night sky01    Match
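The digit and whitespace sequences can be compared on the same inputs. The sketch below reuses test strings from the tables above; the variable names are assumptions made for illustration.

```python
import re

mixed = "123and4"
sentence = "bright star in the night sky01"

digits = re.findall(r"\d", mixed)          # every digit
non_digits = re.findall(r"\D", mixed)      # everything that is not a digit
spaces_in_mixed = re.findall(r"\s", mixed)
space_count = len(re.findall(r"\s", sentence))
words = re.findall(r"\S+", sentence)       # runs of non-whitespace

print(digits)
print(non_digits)
print(spaces_in_mixed)
print(space_count)
print(words)
```

Combining \S with + is a common way to pull the space-separated pieces out of a string.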

\w – It matches any alphanumeric character (letters and digits) and the underscore _. For ASCII text it is equivalent to [a-zA-Z0-9_].

Expression    Test String                      Matched?
\w            123_and4                         Match
\w            If the moon in the night sky_    Match
\w            ***********                      No Match

 

\W – It matches any character that is not alphanumeric and not an underscore. For ASCII text it is equivalent to [^a-zA-Z0-9_]. Since the underscore _ counts as a word character, \W does not match it.

Expression    Test String                      Matched?
\W            123_and4                         No Match
\W            If the moon in the night sky_    Match (at the spaces)
\W            ***********                      Match
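A short sketch makes the treatment of the underscore and of spaces concrete. The third test string is a hypothetical one made up for this illustration; the others come from the tables above.

```python
import re

# \w+ treats digits, letters and the underscore as one word.
word_runs = re.findall(r"\w+", "123_and4")

# The same string contains no non-word characters at all.
non_word = re.findall(r"\W", "123_and4")

# Spaces and asterisks are non-word characters; the underscore is not.
mixed = re.findall(r"\W", "sky_ **")

# A string of asterisks contains no word characters.
stars = re.search(r"\w", "***********")

print(word_runs)
print(non_word)
print(mixed)
print(stars)
```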

 

\Z – It matches if the given characters are found at the end of the string.

Expression    Test String                     Matched?
sky\Z         the moon in the night           No Match
sky\Z         If the moon in the night sky    Match
sky\Z         bright star in the night        No Match
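The following sketch checks \Z and also shows one way it differs from $. The strings with "sky" in the middle and with a trailing newline are assumptions added for illustration.

```python
import re

at_end = re.search(r"sky\Z", "If the moon in the night sky")
in_middle = re.search(r"sky\Z", "the sky at night")

# $ also matches just before a newline at the end of the string;
# \Z matches only at the very end.
dollar_newline = re.search(r"sky$", "night sky\n")
z_newline = re.search(r"sky\Z", "night sky\n")

print(bool(at_end), bool(in_middle))
print(bool(dollar_newline), bool(z_newline))
```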

Python RegEx Module – re

The re module is Python's built-in module for working with regular expressions. It contains functions and constants for searching, splitting and replacing text with patterns. Let's look at a few of the most commonly used functions.

  • findall()

The re.findall() method is one of the most commonly used methods of this module. It returns a list of all the matches found in the given string, or an empty list if no match is found. An example is shown below.

Code:

# program to find all numbers in a string
import re

string = 'java 2 has 3 many45 features'
pattern = r'\d+'

result = re.findall(pattern, string)
print(result)

Output:

['2', '3', '45']
  • split()

The re.split() method splits the string at every place where the pattern matches and returns the pieces as a list. If the pattern does not match anywhere, it returns a list containing only the original string.

Code:

# program to split a string at every number
import re

string = 'java 2 has 3 many45 features'
pattern = r'\d+'

result = re.split(pattern, string)
print(result)

Output:

['java ', ' has ', ' many', ' features']

Note: We can pass the maxsplit argument to re.split(). It limits the number of splits that occur. The default value is 0, which means the string is split at every match.

Code:

# program to split a string at the first number only
import re

string = 'java 2 has 3 many45 features'
pattern = r'\d+'

# maxsplit=1 splits the string only once
result = re.split(pattern, string, maxsplit=1)
print(result)

Output:

['java ', ' has 3 many45 features']
  • sub()

The re.sub() method looks for matches of the pattern and replaces each one with the given replacement value. It returns the new string, or the original string unchanged if no match is found.

Code:

# program to remove all the whitespace from the text
import re

string = 'java 2 has 3 many45 features'

# matches one or more whitespace characters
pattern = r'\s+'

# empty string
replace = ''

new_string = re.sub(pattern, replace, string)
print(new_string)

Output:

java2has3many45features

We can also pass a parameter called count. It limits the number of matches to be replaced. The default value of count is 0, which means every match is replaced.

Code:

# program to remove only the first run of whitespace from the text
import re

string = 'java 2 has 3 many45 features'

# matches one or more whitespace characters
pattern = r'\s+'

# empty string
replace = ''

# count=1 replaces only the first match
new_string = re.sub(pattern, replace, string, count=1)
print(new_string)

Output:

java2 has 3 many45 features
  • subn()

The re.subn() method works like re.sub(): it looks for matches of the pattern and replaces each one with the given replacement value. The difference is that it returns a tuple of two items, the new string and the number of replacements made. If no match is found, the tuple contains the original string and 0.

Code:

# program to remove all the whitespace from the text
import re

string = 'java 2 has 3 many 45 features'

# matches one or more whitespace characters
pattern = r'\s+'

# empty string
replace = ''

new_string = re.subn(pattern, replace, string)
print(new_string)

Output:

('java2has3many45features', 6)
  • search()

The re.search() method scans the string for the first location where the pattern matches. It returns a match object if a match is found and None otherwise.

Code:

import re

string = 'java 2 has 3 many 45 features'

# check if java is at the beginning of the string
pattern = r'\Ajava'

match = re.search(pattern, string)

if match:
    print("match successful")
else:
    print("match not successful")

Output:

match successful
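Since this article uses both re.match() and re.search(), the following minimal sketch contrasts the two on the same string. The variable names are illustrative.

```python
import re

text = "java 2 has 3 many 45 features"

# re.match() only tries the pattern at the very start of the string.
at_start = re.match(r"\d+", text)

# re.search() scans forward and returns the first match anywhere.
anywhere = re.search(r"\d+", text)

print(at_start)
print(anywhere.group(), anywhere.start())
```

Both functions return a match object or None, which is why the result can be used directly in an if statement.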

Match Object in re module

The match object in the re module has several methods and attributes. A few commonly used methods of match objects are:

match.group()

The group() method returns the matched portion of the text string.

Code:

import re

string = 'java 256 88 has 3 many 45 features'

# three digits followed by a space and two digits
pattern = r'(\d{3}) (\d{2})'

match = re.search(pattern, string)

if match:
    print(match.group())
else:
    print("match not found")

Output:

256 88

match.start(), match.end() and match.span()

match.start() returns the index at which the match starts. Similarly, match.end() returns the index just past the last matched character.

On the other hand, match.span() returns a tuple of the start and end indices.

>>> match.start()

5

>>> match.end()

11

>>> match.span()

(5, 11)
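Because the pattern in the example above contains two pairs of parentheses, the match object also gives access to each group separately. This is a minimal sketch using the same string and pattern; the variable name m is illustrative.

```python
import re

string = "java 256 88 has 3 many 45 features"
m = re.search(r"(\d{3}) (\d{2})", string)

print(m.group())    # the whole match
print(m.group(1))   # the first parenthesised group
print(m.group(2))   # the second parenthesised group
print(m.groups())   # all groups as a tuple
print(m.span())     # start and end index of the whole match
print(m.span(2))    # start and end index of the second group
```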

There are several other methods and functions in the re module. Please refer to the official Python documentation to learn more about it.


