Glob — Working with Files in Python (64/100 Days of Python)
glob is a module in Python's standard library that finds files and directories whose paths match a specified pattern. It lets you use wildcard characters to match files with similar names or extensions. In this tutorial, we will explore how to use the glob module.
Basic Usage of the glob module in Python
glob can be used to search the file system for specific files. As a basic example, suppose the my_folder directory contains several files:
my_folder/
    file1.txt
    file2.txt
    file3.jpg
    file4.py
To get a list of every entry in this directory (names that start with a dot are skipped), you can use the following code:
import glob
files = glob.glob('my_folder/*')
print(files)
This will output a list like the following (glob does not guarantee the order of the results):
['my_folder/file1.txt', 'my_folder/file2.txt', 'my_folder/file3.jpg', 'my_folder/file4.py']
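Because the * pattern also matches subdirectories and returns results in no particular order, it can help to sort the matches and filter out anything that is not a regular file. The following is a minimal sketch of that idea. The helper names list_entries and list_files_only, the a_subfolder directory, and the .hidden file are hypothetical additions for illustration, and the folder is recreated in a temporary directory so the example runs anywhere.

```python
import glob
import os
import tempfile


def list_entries(folder):
    """Return the sorted paths of every non-hidden entry directly inside folder."""
    # glob returns matches in arbitrary order, so sort them for stable output.
    return sorted(glob.glob(os.path.join(folder, '*')))


def list_files_only(folder):
    """Like list_entries, but keep regular files and drop directories."""
    return [p for p in list_entries(folder) if os.path.isfile(p)]


# Recreate the example folder in a throwaway location.
with tempfile.TemporaryDirectory() as tmp:
    folder = os.path.join(tmp, 'my_folder')
    os.makedirs(os.path.join(folder, 'a_subfolder'))  # hypothetical extra directory
    for name in ['file1.txt', 'file2.txt', 'file3.jpg', 'file4.py', '.hidden']:
        open(os.path.join(folder, name), 'w').close()

    entries = [os.path.basename(p) for p in list_entries(folder)]
    files = [os.path.basename(p) for p in list_files_only(folder)]

print(entries)  # ['a_subfolder', 'file1.txt', 'file2.txt', 'file3.jpg', 'file4.py']
print(files)    # ['file1.txt', 'file2.txt', 'file3.jpg', 'file4.py']
```

Note that .hidden appears in neither list, since * does not match names that begin with a dot.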
Wildcard Characters
Glob supports the use of wildcard characters to match files with similar names or extensions. Here are the most commonly used wildcard characters:
- * : Matches any string of characters, including an empty string.
- ? : Matches any single character.
- [ ] : Matches any single character listed inside the brackets.
- [! ] : Matches any single character not listed inside the brackets.
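To try the four wildcards without touching the file system, one option is the standard library's fnmatch module, which implements the same shell-style matching on plain strings. This is only a sketch: the names list and the matching helper are made up for illustration, and glob additionally treats path separators and leading dots specially, which fnmatch does not.

```python
import fnmatch

# Hypothetical file names to test the patterns against.
names = ['file1.txt', 'file2.txt', 'file3.jpg', 'file4.py', 'notes.txt']


def matching(pattern):
    """Return the names that match pattern, using case-sensitive matching."""
    return [n for n in names if fnmatch.fnmatchcase(n, pattern)]


star = matching('*.txt')               # * matches any run of characters
question = matching('file?.txt')       # ? matches exactly one character
in_brackets = matching('file[12].*')   # [12] matches '1' or '2'
not_in_brackets = matching('file[!12].*')  # [!12] matches anything except '1' or '2'

print(star)             # ['file1.txt', 'file2.txt', 'notes.txt']
print(question)         # ['file1.txt', 'file2.txt']
print(in_brackets)      # ['file1.txt', 'file2.txt']
print(not_in_brackets)  # ['file3.jpg', 'file4.py']
```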
For example, let’s say you want to get a list of all files in the my_folder directory that have a .txt extension. You can use the * wildcard character to match any string of characters before the .txt extension:
import glob
files = glob.glob('my_folder/*.txt')
print(files)
This prints a list containing the two .txt files:
['my_folder/file1.txt', 'my_folder/file2.txt']
You can also use the ? wildcard character to match a single character. For example, to get a list of all files in the my_folder directory whose name is exactly 5 characters followed by a .txt extension, you can use the following code:
import glob
files = glob.glob('my_folder/?????.txt')
print(files)
This prints every file in my_folder whose name has exactly 5 characters before the .txt extension:
['my_folder/file1.txt', 'my_folder/file2.txt']
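The bracket wildcards work with glob in the same way as * and ?. The sketch below recreates the example my_folder in a temporary directory and tries a few bracket patterns against it. The names helper and the specific patterns are illustrative choices, not part of the original example.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    folder = os.path.join(tmp, 'my_folder')
    os.makedirs(folder)
    for name in ['file1.txt', 'file2.txt', 'file3.jpg', 'file4.py']:
        open(os.path.join(folder, name), 'w').close()

    def names(pattern):
        """Return the sorted base names in folder that match pattern."""
        hits = glob.glob(os.path.join(folder, pattern))
        return sorted(os.path.basename(p) for p in hits)

    one_or_two = names('file[12].*')       # digit is 1 or 2
    not_one_or_two = names('file[!12].*')  # digit is anything but 1 or 2
    jpg_or_py = names('*.[jp]*')           # extension starts with j or p

print(one_or_two)      # ['file1.txt', 'file2.txt']
print(not_one_or_two)  # ['file3.jpg', 'file4.py']
print(jpg_or_py)       # ['file3.jpg', 'file4.py']
```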
Find Files Recursively with glob
By default, glob() only searches the directories named in the pattern, and * does not match across directory separators. However, you can use the ** wildcard to perform a recursive search that includes all subdirectories. Let’s say you have the following directory structure:
my_folder/
    file1.txt
    sub_folder1/
        file2.txt
        sub_sub_folder/
            file3.txt
    sub_folder2/
        file4.txt
To get a list of all the .txt files in the my_folder directory and its subdirectories, you can use the following code:
import glob
files = glob.glob('my_folder/**/*.txt', recursive=True)
print(files)
Notice the recursive=True flag. Without it, ** behaves like a plain * and matches only a single directory level. With it, ** matches zero or more directories, so the code above will output every file that has a .txt extension (the order may vary):
['my_folder/file1.txt', 'my_folder/sub_folder1/file2.txt', 'my_folder/sub_folder1/sub_sub_folder/file3.txt', 'my_folder/sub_folder2/file4.txt']
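To see what the recursive flag changes, the sketch below rebuilds the directory tree above in a temporary directory and runs the same pattern with and without recursive=True. It uses glob.iglob, which yields matches lazily instead of building the whole list first. The rel_matches helper is a hypothetical name, and results are sorted and converted to relative paths with forward slashes so the output is stable across platforms.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    root = os.path.join(tmp, 'my_folder')
    layout = [
        'file1.txt',
        'sub_folder1/file2.txt',
        'sub_folder1/sub_sub_folder/file3.txt',
        'sub_folder2/file4.txt',
    ]
    for rel in layout:
        path = os.path.join(root, *rel.split('/'))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, 'w').close()

    def rel_matches(pattern, recursive):
        """Return sorted matches for pattern, as paths relative to root."""
        found = glob.iglob(os.path.join(root, pattern), recursive=recursive)
        return sorted(os.path.relpath(p, root).replace(os.sep, '/') for p in found)

    recursive_hits = rel_matches('**/*.txt', recursive=True)
    flat_hits = rel_matches('**/*.txt', recursive=False)  # ** acts like *
    top_level_hits = rel_matches('*.txt', recursive=False)

print(recursive_hits)
# ['file1.txt', 'sub_folder1/file2.txt',
#  'sub_folder1/sub_sub_folder/file3.txt', 'sub_folder2/file4.txt']
print(flat_hits)       # ['sub_folder1/file2.txt', 'sub_folder2/file4.txt']
print(top_level_hits)  # ['file1.txt']
```

On very large directory trees a recursive ** search can take a long time, so it is worth keeping the starting directory as narrow as possible.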
What’s next?
- Hands-on Practice: Free Python Course
- Full series: 100 Days of Python
- Previous topic: 10 Most Useful Itertools Methods
- Next topic: Pathlib