Python Glob
The term "glob" refers to methods for matching files against patterns that follow UNIX shell expansion rules.
Python's glob module provides such methods for finding files on a system. The shared pattern can be anything from a file-name prefix to a file extension, or any other similarity between two or more files.
Python has a large number of built-in modules for performing various tasks. A common task is locating all the files on our system whose names follow a similar pattern.
We can complete this task with several different Python modules, but some are more effective than others. This article shows how to use one of the most useful of them, the glob module, to match files against a particular pattern inside a program. We will go into detail about how to use the module within a program, its main characteristics, and its typical uses.
Sample Code:
import glob
# glob.glob() returns a list of the matching path names
files = glob.glob("./glob/home/sample_files/*")
print(files)
Output:

Python Glob Module:
We can use the Python glob module to find all path names that match a specific pattern. The pattern is specified according to the rules used by the Unix shell.
The results are returned in arbitrary order. The pattern must follow specific rules, because the glob module works by going through the list of files at a particular location on our local disk and keeping only the entries whose names match the pattern.
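Because the order of the results is arbitrary, a program that needs a stable order has to sort the list itself. The following is a minimal sketch of that idea; the helper name list_sorted and the file names are made up for illustration, and the demo runs in a temporary directory so that it does not depend on any real files.

```python
import glob
import os
import tempfile


def list_sorted(directory, pattern="*"):
    """Return the paths in `directory` that match `pattern`, sorted by name."""
    # glob.glob() makes no promise about ordering, so sort explicitly.
    return sorted(glob.glob(os.path.join(directory, pattern)))


# Demo in a throwaway directory (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.txt", "a.txt", "c.log"):
        open(os.path.join(tmp, name), "w").close()
    sorted_names = [os.path.basename(p) for p in list_sorted(tmp, "*.txt")]
    print(sorted_names)
```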
Functions for Pattern Matching:
In Python, there are a number of functions that can be used to list the files that match a specific pattern passed to the function. With the aid of these functions, we can obtain the list of files in the specified folder that match the provided pattern, in arbitrary order.
In this section, we'll go over the following functions:
- fnmatch.fnmatch()
- os.scandir()
- os.path.expandvars()
- os.path.expanduser()
The first two functions on the list above, fnmatch.fnmatch() and os.scandir(), are what the glob module uses to carry out the pattern-matching task; it does not invoke a subshell. Together they produce a list of all matching filenames, in arbitrary order. The glob module treats files whose names start with a dot (.) as special cases: they are matched only by patterns that also start with a dot. The fnmatch.fnmatch() function makes no such distinction.
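The difference in how dot files are handled can be seen in a small experiment. This is a sketch that assumes two made-up file names in a temporary directory; it compares glob.glob() with fnmatch.filter() applied to the same directory listing.

```python
import fnmatch
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    for name in (".hidden.txt", "visible.txt"):
        open(os.path.join(tmp, name), "w").close()

    # glob: a leading "*" does not match a name that starts with a dot.
    via_glob = sorted(
        os.path.basename(p) for p in glob.glob(os.path.join(tmp, "*.txt"))
    )

    # fnmatch: the leading dot is treated like any other character.
    via_fnmatch = sorted(fnmatch.filter(os.listdir(tmp), "*.txt"))

    # An explicit leading dot in the pattern lets glob match hidden files.
    dotted = [os.path.basename(p) for p in glob.glob(os.path.join(tmp, ".*.txt"))]

print(via_glob)
print(via_fnmatch)
print(dotted)
```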
The final two functions in the list, os.path.expandvars() and os.path.expanduser(), handle shell variable expansion and tilde expansion, which the glob module does not perform itself when matching filename patterns.
The fnmatch module comes first. We use the fnmatch module to match Unix shell-style wildcards. fnmatch.fnmatch() compares a single file name to a pattern and returns True if they match and False otherwise. Whether the comparison is case-sensitive depends on the operating system.
The following special characters are used in shell-style wildcards:
- '*' matches everything
- '?' matches any single character
- '[seq]' matches any character in seq
- '[!seq]' matches any character not in seq
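The wildcard rules above can be checked directly against a few strings. The sketch below uses fnmatch.fnmatchcase(), which applies the same wildcards but always compares case-sensitively, so the results do not depend on the platform. The file names and patterns are made up for illustration.

```python
import fnmatch

# (file name, pattern) pairs chosen to exercise each wildcard.
cases = [
    ("data.csv", "*.csv"),            # "*" matches everything before ".csv"
    ("a1.txt", "a?.txt"),             # "?" matches exactly one character
    ("a12.txt", "a?.txt"),            # two characters, so "?" is not enough
    ("file_b.py", "file_[abc].py"),   # "b" is in the set
    ("file_d.py", "file_[abc].py"),   # "d" is not in the set
    ("file_d.py", "file_[!abc].py"),  # "d" is outside the excluded set
]

results = [fnmatch.fnmatchcase(name, pattern) for name, pattern in cases]

for (name, pattern), matched in zip(cases, results):
    print(f"{name!r} vs {pattern!r}: {matched}")
```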
The fnmatch.fnmatch(filename, pattern) function returns a boolean value indicating whether the supplied filename string matches the pattern string. Both parameters are first case-normalized with os.path.normcase(), so the comparison ignores case on Windows and respects it on POSIX systems. Example: a script that looks for files ending in ".py".
Let's examine its usage.
Sample Code:
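import fnmatch
import os

# Check every entry in the current directory against the "*.py" pattern
for entry in os.scandir("."):
    # entry is an os.DirEntry object, so match on its name attribute
    if fnmatch.fnmatch(entry.name, "*.py"):
        print('Filename : ', entry.name)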
Output:

The output lists all of the files in the current working directory that have the .py extension.
Let's now examine the os module's scandir() function.
Using Python's os.scandir() function, we can get an iterator of os.DirEntry objects that correspond to the entries in the directory indicated by the specified path.
The entries are yielded in arbitrary order, and the special entries "." and ".." are not included. Let's examine its usage.
Code:
import os

# Directory to be scanned
path = '/home/glob'
obj = os.scandir(path)
print("Files and Directories in '%s':" % path)
for entry in obj:
    if entry.is_dir() or entry.is_file():
        print(entry.name)
# Release the resources held by the iterator
obj.close()
Output:

We first scan the directory with the os.scandir() method, which gives us an iterator of os.DirEntry objects, one for each entry in the given path. We then list each file and directory. To determine whether an entry is a file or a directory, use the entry.is_file() and entry.is_dir() methods. Finally, we close the iterator and release the resources it has acquired by using the scandir.close() method. The scandir.close() method is called automatically when the iterator is exhausted, when it is garbage collected, or when an error occurs while iterating. Garbage collection is the technique Python uses to free memory: objects that are no longer used (built-in types or class instances) are deleted automatically, and their memory is reclaimed.
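Instead of calling close() by hand, the iterator returned by os.scandir() can also be used as a context manager, which closes it when the with block ends, even if an error occurs. The following sketch assumes a helper named list_entries and a temporary directory with made-up contents.

```python
import os
import tempfile


def list_entries(path):
    """Return sorted (name, kind) pairs for the entries in `path`."""
    found = []
    # Leaving the with block closes the iterator and frees its resources.
    with os.scandir(path) as entries:
        for entry in entries:
            kind = "dir" if entry.is_dir() else "file"
            found.append((entry.name, kind))
    return sorted(found)


with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "sub"))
    open(os.path.join(tmp, "notes.txt"), "w").close()
    listing = list_entries(tmp)
    print(listing)
```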
- os.path.expandvars()
- os.path.expanduser()
For tasks involving filename pattern matching, the two functions os.path.expanduser() and os.path.expandvars() can be used to expand tilde prefixes and shell variables. The GNU Bash reference describes tilde expansion as follows. If a word begins with an unquoted tilde character ("~"), all characters up until the first unquoted slash (or all characters if there is no unquoted slash) are considered a tilde-prefix. If none of the characters in the tilde-prefix are quoted, the characters following the tilde are treated as a possible login name. Let's examine the os module's path.expandvars() function.
Python expands the environment variables in the specified path using the os.path.expandvars() method. It replaces substrings of the form $name or ${name} in the given path with the value of the environment variable name.
Sample Code:
import os.path
# Path 1
path1 = r"%HOMEPATH%\Directory\file.txt"
# Path 2
path2 = r"C:\Users\$USERNAME\Directory\file.txt"
# Path 3
path3 = r"${TEMP}\file.txt"
exp_var1 = os.path.expandvars(path1)
exp_var2 = os.path.expandvars(path2)
exp_var3 = os.path.expandvars(path3)
print(exp_var1)
print(exp_var2)
print(exp_var3)
Output:

Explanation:
This Python program demonstrates the os.path.expandvars() method. Importing the os.path module comes first. On Windows, %name% expansions are supported in addition to $name and ${name}.
The environment variables in the indicated paths are then expanded to their corresponding values, and the expanded paths are printed.
In the preceding example, the os.path.expandvars() method replaced the environment variables "USERNAME", "HOMEPATH", and "TEMP" with their corresponding values. Let's now examine the os module's path.expanduser() function.
To expand an initial ~ or ~user component in the given path to that user's home directory, use the Python function os.path.expanduser().
Sample Code:
import os.path

# Path with an initial "~" component
path = "~/file.txt"
full_path = os.path.expanduser(path)
print(full_path)

# Change the HOME environment variable and expand the same path again
os.environ["HOME"] = "/home/GeeksForGeeks"
full_path = os.path.expanduser(path)
print(full_path)

# Path with an initial "~user" component
path = "~Somebody/file.txt"
full_path = os.path.expanduser(path)
print(full_path)
Output:

Explanation:
This Python program demonstrates the os.path.expanduser() method. The first step is to import the os.path module. The initial ~ component of the given path is then expanded with os.path.expanduser(), and the result is printed. Next, the value of the HOME environment variable is changed and the same path is expanded again, which on Unix yields a path under the new HOME value. Finally, a path that begins with a ~user component is expanded. In that case, the user's home directory is looked up directly in the password directory, and the resulting path is printed. If the user is unknown, the path is returned unchanged.
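Since the glob module does not expand "~" or environment variables on its own, the two functions are typically applied to a pattern before it is handed to glob.glob(). The sketch below shows one way to combine them. The helper name expand_and_glob and the environment variable GLOB_DEMO_DIR are assumptions made for this example only.

```python
import glob
import os
import tempfile


def expand_and_glob(pattern):
    """Expand environment variables and "~" in `pattern`, then glob it."""
    expanded = os.path.expanduser(os.path.expandvars(pattern))
    return sorted(glob.glob(expanded))


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical variable that points at the demo directory.
    os.environ["GLOB_DEMO_DIR"] = tmp
    for name in ("one.txt", "two.txt"):
        open(os.path.join(tmp, name), "w").close()

    # Without expansion, "$GLOB_DEMO_DIR" is treated as a literal folder name.
    unexpanded = glob.glob("$GLOB_DEMO_DIR/*.txt")

    # With expansion, the pattern points at the real directory.
    matches = [
        os.path.basename(p) for p in expand_and_glob("$GLOB_DEMO_DIR/*.txt")
    ]
    del os.environ["GLOB_DEMO_DIR"]

print(unexpanded)
print(matches)
```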
Pattern Rules:
Let's be clear that we cannot use just any pattern for the filename pattern-matching task. When specifying the pattern for the filename pattern-matching functions in the glob module, we must adhere to a specific set of guidelines.
The set of guidelines for the pattern that we specify inside the pattern-matching functions of the glob module is as follows:
- When pattern matching, we must adhere to the standard UNIX path expansion rules.
- We cannot define any ambiguous path inside the pattern; the path we define inside the pattern must be either absolute or relative.
- The special characters recognized inside the pattern are the wildcards "*" and "?" and character ranges expressed with []. To match one of these characters literally, wrap it in brackets; for example, "[?]" matches the character "?".
- A wildcard applies only within a single filename segment of the pattern; it does not match across the path separator, or '/', of the files.
For tasks involving filename pattern matching, these are some general guidelines for the patterns that we define inside the glob module functions. We must abide by these guidelines in order to complete the task successfully.
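The guidelines above can be illustrated with a small directory tree. This is a sketch under stated assumptions: the helper relative_matches and all file names are invented for the example, and results are converted to "/"-separated relative paths so that they look the same on every platform.

```python
import glob
import os
import tempfile


def relative_matches(root, pattern):
    """Return matches of `pattern` under `root` as sorted relative paths."""
    paths = glob.glob(os.path.join(root, pattern))
    return sorted(os.path.relpath(p, root).replace(os.sep, "/") for p in paths)


with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "sub"))
    for name in ("a1.txt", "a2.txt", "ab.txt", os.path.join("sub", "a3.txt")):
        open(os.path.join(tmp, name), "w").close()

    # "*" stays inside one path segment, so sub/a3.txt is not matched.
    star = relative_matches(tmp, "*.txt")
    # "?" matches exactly one character.
    question = relative_matches(tmp, "a?.txt")
    # A character range restricts that one character to digits.
    digits = relative_matches(tmp, "a[0-9].txt")
    # Each path segment needs its own wildcard.
    nested = relative_matches(tmp, "*/*.txt")

print(star, question, digits, nested)
```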
The Uses of the Glob Module:
We have already talked about how useful pattern matching is for finding related files on our disk. In this section, we will talk about the uses of the glob module and how it helps us.
The Python glob module has the following applications:
- Sometimes we want to find files with a specific prefix, a common string in the middle of their names, or the same extension. To complete this task by hand, we would need to write code that scans the entire directory and filters the result. The glob module does this for us, so the task becomes quick and easy and we save time.
- In addition, the glob module is very helpful when one of our programs needs a list of all the files in a specific file system whose names match a specific pattern. This task is simple to complete with the glob module, and it can be done without running a shell command in a separate subshell.
Therefore, by examining the glob module's applications, we can determine how crucial this module is to us and how we can use it to simplify the code and save time.
Significant functions of the glob Module:
We will now discuss additional glob module functions and how Python programs can use them. We'll also learn how these features help us in the pattern-matching task. Let's look at the list of glob module functions that are available; with their help, we can easily finish the task of filename pattern matching:
- iglob()
SYNTAX:
glob.iglob(pathname, *, recursive=False)
The iglob() function of the glob module returns an iterator that yields the same values as glob() without storing all of the filenames at the same time. Because the result behaves like a Python generator, we can loop over it to list the files in a specified directory one at a time, which is helpful when the number of matches is large.
Let's examine the parts of the iglob() signature.
pathname: (required) A string containing the path specification, that is, the location and pattern for which we need a list of matching files. It can be absolute or relative; a relative pattern is resolved against the current working directory.
'*': This is not an argument that we pass. In the signature, the bare "*" marks every parameter after it as keyword-only, so recursive must be given by name. Inside pathname, by contrast, "*" is the wildcard that matches any run of characters, as in "*.txt" for a file extension.
recursive: (optional) Accepts a boolean value (True or False); the default is False. When it is True, the pattern "**" matches any files and zero or more directories and subdirectories.
Sample Code:
import glob

print("\nUsing glob.iglob()")
# "**" together with recursive=True also descends into subdirectories
for filename in glob.iglob('/home/Somebody/Desktop/**/*.txt',
                           recursive=True):
    print(filename)
Output:

We used the glob.iglob() function directly from the glob module to get the matching paths recursively from a directory and its subdirectories.
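One practical use of the iterator is to stop early once enough matches have been seen. The sketch below assumes a helper named first_matches and five made-up log files; it combines glob.iglob() with itertools.islice() so that no more than the requested number of paths is pulled from the iterator.

```python
import glob
import itertools
import os
import tempfile


def first_matches(pattern, limit):
    """Return at most `limit` paths that match `pattern`."""
    # iglob() yields paths one at a time; islice() stops after `limit` items.
    return list(itertools.islice(glob.iglob(pattern), limit))


with tempfile.TemporaryDirectory() as tmp:
    for i in range(5):
        open(os.path.join(tmp, f"log{i}.txt"), "w").close()
    some = first_matches(os.path.join(tmp, "*.txt"), 2)
    print(len(some))
```

Which two files are returned is not specified, because the order of the matches is arbitrary.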
- glob()
SYNTAX:
glob.glob(pathname, *, recursive=False)
We can get a list of the path names that match a specific pattern using the glob() method; we pass that pattern to the function as pathname. pathname must be a string containing a path specification, and the returned list may be empty if nothing matches. The glob() function returns the same values that iglob() yields; the difference is that iglob() does not store all of the filenames at once.
Sample Code:
import glob

print("Using glob.glob()")
# "**" together with recursive=True also descends into subdirectories
files = glob.glob('/home/Somebody/Desktop/**/*.txt',
                  recursive=True)
for file in files:
    print(file)
Output:

We recursively extracted the paths from files, subdirectories, and directories using the glob() function from the glob module.
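The effect of the recursive flag is easiest to see by running the same "**" pattern with and without it. In this sketch the directory layout and the helper relative_matches are made up for illustration; with recursive=False, "**" behaves like a plain "*" and matches exactly one directory level.

```python
import glob
import os
import tempfile


def relative_matches(root, pattern, recursive):
    """Return matches under `root` as sorted, '/'-separated relative paths."""
    paths = glob.glob(os.path.join(root, pattern), recursive=recursive)
    return sorted(os.path.relpath(p, root).replace(os.sep, "/") for p in paths)


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub", "deep"))
    for parts in (("top.txt",), ("sub", "inner.txt"), ("sub", "deep", "leaf.txt")):
        open(os.path.join(tmp, *parts), "w").close()

    # "**" acts like "*" here: exactly one intermediate directory.
    flat = relative_matches(tmp, "**/*.txt", recursive=False)
    # "**" matches zero or more directories here.
    deep = relative_matches(tmp, "**/*.txt", recursive=True)

print(flat)
print(deep)
```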
- escape()
SYNTAX:
glob.escape(pathname)
escape() escapes all the special characters in the string we pass to it, which are "?", "*", and "[". The function comes in handy when we want to match an arbitrary literal string that may contain those characters, because the escaped text is matched literally in the file names rather than being interpreted as a pattern.
Let's examine the escape() function's usage.
Sample Code:
import glob

print("All PNG files")
print(glob.glob("*.png"))
print("PNG files with special characters in their name [_ $ #]")
char_seq = "_$#"
for char in char_seq:
    # escape() guarantees that the character is matched literally
    esc_set = "*" + glob.escape(char) + "*" + ".png"
    for file in glob.glob(esc_set):
        print(file)
Output:

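Here, we searched for PNG images whose names include the characters #, _, and $ using the glob.escape() function. None of these is a glob metacharacter, so escape() returns them unchanged; the call makes a difference only for "*", "?", and "[".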
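A case where escape() changes the outcome is a file name that contains square brackets. The sketch below uses two made-up file names in a temporary directory: without escaping, "[1]" is read as a character set, and only the escaped pattern matches the name literally.

```python
import glob
import os
import tempfile

literal_name = "data[1].csv"  # hypothetical file name containing brackets

with tempfile.TemporaryDirectory() as tmp:
    for name in (literal_name, "data1.csv"):
        open(os.path.join(tmp, name), "w").close()

    # Unescaped: "[1]" is a character set, so this matches "data1.csv".
    unescaped = [
        os.path.basename(p) for p in glob.glob(os.path.join(tmp, literal_name))
    ]

    # Escaped: "[" is wrapped in brackets and therefore matched literally.
    escaped_pattern = glob.escape(literal_name)
    escaped = [
        os.path.basename(p) for p in glob.glob(os.path.join(tmp, escaped_pattern))
    ]

print(escaped_pattern)
print(unescaped)
print(escaped)
```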
All Functions in the glob Module:
All of the Python glob module's functions have been covered. Here is a quick recap of what it does and what those functions were.
glob() | iglob() | escape() |
Returns a list of the path names that match the pattern specified in the function argument. | Returns an iterator that we can loop over to get the names of the matching files one at a time. | Escapes special characters; especially useful when dealing with filenames that contain "*", "?", or "[". |
glob() vs scandir():
Both the scandir() and glob() methods can be used to search a directory for files. Internally, glob() relies on scandir() to read the directory and keeps the entries that match the predefined pattern.
However, scandir() returns an iterator that yields the entries lazily, whereas glob() builds and returns a complete list, which uses more memory for large directories.
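To make the comparison concrete, the following sketch filters the entries produced by os.scandir() with fnmatch.fnmatch(), yielding one matching name at a time instead of building a list. The generator name iter_matching and the file names are assumptions for this example. Unlike glob(), this version does not skip names that start with a dot.

```python
import fnmatch
import os
import tempfile


def iter_matching(directory, pattern):
    """Lazily yield the names of regular files in `directory` matching `pattern`."""
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file() and fnmatch.fnmatch(entry.name, pattern):
                yield entry.name


with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.json", "b.json", "c.txt"):
        open(os.path.join(tmp, name), "w").close()
    # sorted() consumes the generator; a plain for loop would stay lazy.
    json_files = sorted(iter_matching(tmp, "*.json"))
    print(json_files)
```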
Conclusion:
- The built-in Python module glob is mostly used for handling files. When a programmer needs to work with numerous files with the same or different extensions, like json, txt, and csv, they typically use glob.
- The fnmatch(), scandir(), expandvars(), and expanduser() functions support matching patterns in file names.
- The pattern-matching guidelines used when handling files follow the Unix shell expansion rules.
- Finding a file with a specific prefix or a list of all the files in a particular file system whose names match a certain pattern are two common uses of Python's glob module.
- glob(), iglob(), and escape() are all important glob module functions.
- The glob.glob() function finds files by matching patterns with shell-style wildcards, not regular expressions. It returns an ordinary list, so the output can be sorted and items can be removed from it.
- The main distinction between the scandir() and glob() methods is that the former produces an iterator object as the output, while the latter produces a list.