Introduction
The glob module in Python is a powerful tool for file pattern matching. It allows you to search for files and directories whose names match a specified pattern, using wildcards similar to those used in Unix shell commands. This module is especially useful for tasks involving file manipulation, such as batch processing files, organizing files, and automating tasks that involve file I/O.
In this article, we’ll explore the glob module in detail, covering its functions, usage, and practical examples to demonstrate its capabilities.
Importing the glob Module
To use the glob module, you first need to import it into your Python script:
import glob
Basic Usage
The primary function in the glob module is glob(). This function returns a list of file paths that match a specified pattern.
Pattern Matching
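The glob module uses Unix-style pathname expansion, meaning it supports wildcards like *, ?, and []:
- *: Matches zero or more characters.
- ?: Matches exactly one character.
- []: Matches any one of the characters enclosed within the brackets.
By default, these wildcards do not match a leading dot, so hidden files such as .gitignore are skipped unless the pattern itself starts with a dot.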
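The three wildcards can be compared side by side with a small sketch. The file names and the match_names helper are made up for illustration, and the files are created in a temporary directory so the snippet can run anywhere:

```python
import glob
import os
import tempfile

def match_names(directory, pattern):
    """Return the sorted base names in `directory` that match `pattern`."""
    paths = glob.glob(os.path.join(directory, pattern))
    return sorted(os.path.basename(p) for p in paths)

# Build a throwaway directory with a few hypothetical files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["file1.txt", "file2.txt", "fileA.txt", "file10.txt", "script.py"]:
        open(os.path.join(tmp, name), "w").close()

    star = match_names(tmp, "*.txt")            # any run of characters
    question = match_names(tmp, "file?.txt")    # exactly one character
    digits = match_names(tmp, "file[0-9].txt")  # one character from the set 0-9
    print(star)
    print(question)
    print(digits)
```

Note that file10.txt matches *.txt but not file?.txt, because ? stands for exactly one character.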
Example
Let’s start with a simple example. Assume we have a directory with the following files:
example/
├── file1.txt
├── file2.txt
├── fileA.txt
└── script.py
To list all .txt files in the example directory, we can use the glob function with the *.txt pattern:
import glob
# List all .txt files in the example directory
txt_files = glob.glob('example/*.txt')
print(txt_files)
This would output something like the following (glob does not guarantee the order of the results):
['example/file1.txt', 'example/file2.txt', 'example/fileA.txt']
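Because glob returns matches in whatever order the operating system lists them, scripts that depend on a stable order usually sort the result. A minimal sketch, assuming alphabetical order is what you want; list_sorted is a hypothetical helper name, and the example directory is recreated in a temporary location:

```python
import glob
import os
import tempfile

def list_sorted(pattern):
    """Return matches for `pattern` in alphabetical order."""
    return sorted(glob.glob(pattern))

# Demonstration on a throwaway copy of the example directory.
with tempfile.TemporaryDirectory() as tmp:
    example = os.path.join(tmp, "example")
    os.makedirs(example)
    for name in ["file2.txt", "fileA.txt", "file1.txt", "script.py"]:
        open(os.path.join(example, name), "w").close()

    matches = list_sorted(os.path.join(example, "*.txt"))
    names = [os.path.basename(p) for p in matches]
    print(names)
```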
Recursive Search
The glob module also supports recursive searching using the ** pattern. To enable recursive searching, pass recursive=True.
Example
Suppose the example directory has the following structure:
example/
├── subdir1/
│ ├── file3.txt
│ └── file4.py
├── subdir2/
│ └── file5.txt
└── file1.txt
To list all .txt files in the example directory and its subdirectories, you can use:
import glob
# List all .txt files recursively in the example directory
txt_files = glob.glob('example/**/*.txt', recursive=True)
print(txt_files)
This would output something like the following (the order may vary):
['example/file1.txt', 'example/subdir1/file3.txt', 'example/subdir2/file5.txt']
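It is easy to forget recursive=True. Without it, ** behaves like a single * and matches exactly one directory level. The sketch below rebuilds the tree above in a temporary directory and compares the two calls; the relative helper exists only to make the output readable:

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Recreate the example tree.
    layout = ["file1.txt", "subdir1/file3.txt", "subdir1/file4.py", "subdir2/file5.txt"]
    for rel in layout:
        path = os.path.join(tmp, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()

    pattern = os.path.join(tmp, "**", "*.txt")

    def relative(paths):
        """Return sorted paths relative to tmp, using forward slashes."""
        return sorted(os.path.relpath(p, tmp).replace(os.sep, "/") for p in paths)

    deep = relative(glob.glob(pattern, recursive=True))
    shallow = relative(glob.glob(pattern))  # recursive defaults to False
    print(deep)
    print(shallow)
```

The non-recursive call misses file1.txt, because ** is then required to match exactly one directory name.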
Functions in the glob Module
The glob module provides two main functions: glob() and iglob().
glob()
This function returns a list of paths matching a pathname pattern.
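Syntax:
glob.glob(pathname, *, recursive=False)
- pathname: The pattern to match. It can be absolute or relative.
- recursive: If True, the pattern '**' will match any files and zero or more directories, subdirectories, and symbolic links to directories. This argument is keyword-only.
Newer Python versions accept further keyword-only arguments (root_dir and dir_fd since 3.10, include_hidden since 3.11).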
Example:
import glob
# List all Python files in the current directory
python_files = glob.glob('*.py')
print(python_files)
iglob()
This function returns an iterator which yields the same values as glob() without storing them all simultaneously.
Syntax:
glob.iglob(pathname, *, recursive=False)
- pathname: The pattern to match.
- recursive: If True, the pattern '**' will match any files and zero or more directories, subdirectories, and symbolic links to directories.
Example:
import glob
# Iterate over all Python files in the current directory
for python_file in glob.iglob('*.py'):
    print(python_file)
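One case where an iterator is handy is when only the first few matches are needed. The following sketch combines iglob() with itertools.islice; first_matches and the generated file names are illustrative assumptions, not part of the glob module:

```python
import glob
import itertools
import os
import tempfile

def first_matches(pattern, limit):
    """Return at most `limit` matches, without collecting every match first."""
    return list(itertools.islice(glob.iglob(pattern), limit))

with tempfile.TemporaryDirectory() as tmp:
    # Create 100 hypothetical Python files.
    for i in range(100):
        open(os.path.join(tmp, f"module{i}.py"), "w").close()

    sample = first_matches(os.path.join(tmp, "*.py"), 3)
    print(len(sample))
```

Which three files come back depends on the directory listing order, so the sketch only checks how many were returned.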
Practical Examples
Example 1: Moving Files Based on Extension
Suppose you want to organize your files by moving all .txt files to a text_files directory.
import glob
import shutil
import os
# Create the target directory if it doesn't exist
os.makedirs('text_files', exist_ok=True)
# Move all .txt files to the text_files directory
for txt_file in glob.glob('*.txt'):
    shutil.move(txt_file, 'text_files')
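When the target directory already contains a file with the same name, shutil.move raises an error. A more defensive variant might skip such files instead. This is only a sketch: move_matching is a hypothetical helper, and skipping is one of several reasonable collision policies:

```python
import glob
import os
import shutil
import tempfile

def move_matching(pattern, target_dir):
    """Move files matching `pattern` into `target_dir`, skipping name clashes."""
    os.makedirs(target_dir, exist_ok=True)
    moved = []
    for path in sorted(glob.glob(pattern)):
        destination = os.path.join(target_dir, os.path.basename(path))
        if not os.path.isfile(path) or os.path.exists(destination):
            continue  # skip directories and files that would collide
        shutil.move(path, destination)
        moved.append(destination)
    return moved

# Try it out in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["a.txt", "b.txt", "notes.md"]:
        open(os.path.join(tmp, name), "w").close()

    target = os.path.join(tmp, "text_files")
    moved = move_matching(os.path.join(tmp, "*.txt"), target)
    moved_names = [os.path.basename(p) for p in moved]
    remaining = sorted(os.listdir(tmp))
    print(moved_names, remaining)
```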
Example 2: Reading All CSV Files in a Directory
If you have multiple CSV files and you want to read them into pandas DataFrames, you can use the glob module to get the list of files.
import glob
import pandas as pd
# List all CSV files in the current directory
csv_files = glob.glob('*.csv')
# Read each CSV file into a DataFrame
dataframes = [pd.read_csv(csv_file) for csv_file in csv_files]
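The same glob-then-read pattern works without pandas. As a rough sketch using only the standard library, the hypothetical read_csv_files helper below keys each table by file name so you can tell which rows came from which file; the sample files and columns are invented for the demonstration:

```python
import csv
import glob
import os
import tempfile

def read_csv_files(pattern):
    """Map each matching file name to its rows (a list of dicts)."""
    tables = {}
    for path in sorted(glob.glob(pattern)):
        with open(path, newline="") as handle:
            tables[os.path.basename(path)] = list(csv.DictReader(handle))
    return tables

with tempfile.TemporaryDirectory() as tmp:
    # Write two small sample files.
    with open(os.path.join(tmp, "jan.csv"), "w", newline="") as f:
        f.write("item,qty\napple,3\n")
    with open(os.path.join(tmp, "feb.csv"), "w", newline="") as f:
        f.write("item,qty\npear,5\n")

    tables = read_csv_files(os.path.join(tmp, "*.csv"))
    print(sorted(tables))
    print(tables["jan.csv"])
```

csv.DictReader returns every value as a string, so numeric columns still need converting.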
Example 3: Finding Files with Specific Names
You might need to find files with specific names or patterns. For example, finding all files that start with ‘data’ and end with ‘.log’.
import glob
# List all files starting with 'data' and ending with '.log'
log_files = glob.glob('data*.log')
print(log_files)
Tips and Best Practices
- Use Raw Strings for Windows Paths: On Windows, use raw strings (r'path\to\file') to avoid issues with backslashes.
- Combine with Other Modules: The glob module can be combined with other modules like os, shutil, and pandas for more advanced file operations.
- Performance Considerations: Use iglob() to reduce memory use when dealing with a large number of files, as it yields results one by one rather than storing them all in a list.
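The raw-string tip is easiest to see by comparing the two spellings directly. The paths and directory names below are hypothetical; the sketch also shows os.path.join as one way to build a pattern without typing separators at all:

```python
import glob
import os

raw = r'C:\temp\new'     # backslashes are kept literally
plain = 'C:\temp\new'    # \t becomes a tab and \n becomes a newline
print(len(raw), len(plain))
print('\t' in plain, '\n' in plain)

# os.path.join inserts the platform's separator for you.
pattern = os.path.join('reports', '2024', '*.csv')
print(pattern)

# The pattern can be passed straight to glob; the result is an empty
# list if the hypothetical reports/2024 directory does not exist.
matches = glob.glob(pattern)
print(matches)
```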
Conclusion
The glob module in Python is an essential tool for file pattern matching and manipulation. It provides a simple and flexible way to search for files and directories based on patterns, making it invaluable for tasks involving file I/O. Whether you’re organizing files, processing batches of files, or automating repetitive tasks, the glob module is a reliable and efficient solution.