
The glob Module in Python

Introduction

The glob module in Python finds files and directories whose names match a specified pattern, using wildcards similar to those used in Unix shell commands. It is especially useful for tasks involving file manipulation, such as batch processing files, organizing files, and automating tasks that involve file I/O.

In this article, we’ll explore the glob module in detail, covering its functions, usage, and practical examples to demonstrate its capabilities.

Importing the glob Module

To use the glob module, you first need to import it into your Python script:

import glob

Basic Usage

The primary function in the glob module is glob(). This function returns a list of file paths that match a specified pattern.

Pattern Matching

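The glob module uses Unix-style pathname expansion, meaning it supports wildcards like *, ?, and []:

  • *: Matches zero or more characters within a single path component; it does not match the path separator.
  • ?: Matches exactly one character.
  • []: Matches any one of the characters enclosed within the brackets. Ranges such as [0-9] are allowed, and [!...] matches any one character that is not listed.

By default, file names that begin with a dot (.) are matched only by patterns that also begin with a dot.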

Example

Let’s start with a simple example. Assume we have a directory with the following files:

example/
├── file1.txt
├── file2.txt
├── fileA.txt
└── script.py

To list all .txt files in the example directory, we can use the glob function with the *.txt pattern:

import glob

# List all .txt files in the example directory
txt_files = glob.glob('example/*.txt')
print(txt_files)

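This would output something like the following. The order of the results is not guaranteed, so wrap the call in sorted() if order matters: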

['example/file1.txt', 'example/file2.txt', 'example/fileA.txt']
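
The other wildcards can be tried against the same directory. The following is a minimal sketch, assuming the four files listed above are recreated in a temporary directory; the match() helper and the variable names are made up for illustration, and the results are sorted so the output is predictable:

```python
import glob
import os
import tempfile

# Recreate the example directory from above in a temporary location.
with tempfile.TemporaryDirectory() as tmp:
    example = os.path.join(tmp, "example")
    os.makedirs(example)
    for name in ("file1.txt", "file2.txt", "fileA.txt", "script.py"):
        open(os.path.join(example, name), "w").close()

    def match(pattern):
        """Return the sorted base names of files in example/ that match pattern."""
        paths = glob.glob(os.path.join(example, pattern))
        return sorted(os.path.basename(p) for p in paths)

    single_char = match("file?.txt")      # ? matches exactly one character
    digits_only = match("file[0-9].txt")  # [0-9] matches exactly one digit
    non_digits = match("file[!0-9].txt")  # [!0-9] matches one non-digit character

print(single_char)  # ['file1.txt', 'file2.txt', 'fileA.txt']
print(digits_only)  # ['file1.txt', 'file2.txt']
print(non_digits)   # ['fileA.txt']
```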

Recursive Search

The glob module also supports recursive searching using the ** pattern. To enable it, pass recursive=True; without that argument, ** behaves like a single *.

Example

Suppose the example directory has the following structure:

example/
├── subdir1/
│   ├── file3.txt
│   └── file4.py
├── subdir2/
│   └── file5.txt
└── file1.txt

To list all .txt files in the example directory and its subdirectories, you can use:

import glob

# List all .txt files recursively in the example directory
txt_files = glob.glob('example/**/*.txt', recursive=True)
print(txt_files)

This would output the following, in no guaranteed order:

['example/file1.txt', 'example/subdir1/file3.txt', 'example/subdir2/file5.txt']
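
To see what the recursive flag changes, the same pattern can be run with and without it. This is a small sketch, assuming the tree above is rebuilt in a temporary directory; the names() helper is hypothetical and only normalizes the paths for printing:

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    root = os.path.join(tmp, "example")

    # Rebuild the directory tree shown above.
    for rel in ("file1.txt", "subdir1/file3.txt", "subdir1/file4.py", "subdir2/file5.txt"):
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()

    pattern = os.path.join(root, "**", "*.txt")

    def names(paths):
        """Return sorted paths relative to example/, with '/' as the separator."""
        return sorted(os.path.relpath(p, root).replace(os.sep, "/") for p in paths)

    with_recursive = names(glob.glob(pattern, recursive=True))
    without_recursive = names(glob.glob(pattern))  # here ** acts like a single *

print(with_recursive)     # ['file1.txt', 'subdir1/file3.txt', 'subdir2/file5.txt']
print(without_recursive)  # ['subdir1/file3.txt', 'subdir2/file5.txt']
```

Without recursive=True the pattern requires exactly one directory level between example/ and the file, which is why file1.txt drops out of the second result.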

Functions in the glob Module

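The glob module provides two main functions: glob() and iglob(). A third helper, escape(), escapes the special characters in a literal path so that they are matched as-is.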

glob()

This function returns a list of paths matching a pathname pattern.

Syntax:

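glob.glob(pathname, *, recursive=False)
  • pathname: The pattern to match, given as an absolute or relative path that may contain wildcards.
  • recursive: Keyword-only. If True, the pattern '**' will match any files and zero or more directories, subdirectories, and symbolic links to directories.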

Example:

import glob

# List all Python files in the current directory
python_files = glob.glob('*.py')
print(python_files)

iglob()

This function returns an iterator which yields the same values as glob() without storing them all simultaneously.

Syntax:

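glob.iglob(pathname, *, recursive=False)
  • pathname: The pattern to match, given as an absolute or relative path that may contain wildcards.
  • recursive: Keyword-only. If True, the pattern '**' will match any files and zero or more directories, subdirectories, and symbolic links to directories.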

Example:

import glob

# Iterate over all Python files in the current directory
for python_file in glob.iglob('*.py'):
    print(python_file)
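
Because iglob() returns an iterator, you can stop early without collecting every match. The sketch below assumes a temporary directory holding a few made-up module files; it uses itertools.islice to take only the first two matches and next() with a default to handle the case where nothing matches:

```python
import glob
import itertools
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create a handful of hypothetical Python files to search through.
    for i in range(5):
        open(os.path.join(tmp, f"module{i}.py"), "w").close()

    pattern = os.path.join(tmp, "*.py")

    # Take at most two matches; the iterator is not advanced any further.
    first_two = list(itertools.islice(glob.iglob(pattern), 2))

    # next() with a default returns None instead of raising when nothing matches.
    first_text_file = next(glob.iglob(os.path.join(tmp, "*.txt")), None)

print(len(first_two))   # 2
print(first_text_file)  # None
```

Which two files come back is not specified, since the results arrive in arbitrary order.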

Practical Examples

Example 1: Moving Files Based on Extension

Suppose you want to organize your files by moving all .txt files to a text_files directory.

import glob
import shutil
import os

# Create the target directory if it doesn't exist
os.makedirs('text_files', exist_ok=True)

# Move all .txt files to the text_files directory
for txt_file in glob.glob('*.txt'):
    shutil.move(txt_file, 'text_files')
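
One thing to watch for: when the target directory already contains a file with the same name, shutil.move() raises an error. A possible way to guard against that is sketched below. The move_matching() helper is hypothetical, not part of glob or shutil, and skipping collisions is just one policy; renaming or overwriting would be equally valid choices:

```python
import glob
import os
import shutil
import tempfile


def move_matching(pattern, target_dir):
    """Move files matching pattern into target_dir, skipping name collisions.

    Hypothetical helper for illustration. Returns (moved, skipped) lists
    of source paths.
    """
    os.makedirs(target_dir, exist_ok=True)
    moved, skipped = [], []
    for path in sorted(glob.glob(pattern)):
        destination = os.path.join(target_dir, os.path.basename(path))
        if not os.path.isfile(path) or os.path.exists(destination):
            skipped.append(path)  # not a regular file, or the name is already taken
            continue
        shutil.move(path, destination)
        moved.append(path)
    return moved, skipped


with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.txt", "b.txt"):
        open(os.path.join(tmp, name), "w").close()
    target = os.path.join(tmp, "text_files")
    os.makedirs(target)
    open(os.path.join(target, "b.txt"), "w").close()  # pre-existing file with the same name

    moved, skipped = move_matching(os.path.join(tmp, "*.txt"), target)
    moved_names = [os.path.basename(p) for p in moved]
    skipped_names = [os.path.basename(p) for p in skipped]

print(moved_names)    # ['a.txt']
print(skipped_names)  # ['b.txt']
```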

Example 2: Reading All CSV Files in a Directory

If you have multiple CSV files and you want to read them into pandas DataFrames, you can use the glob module to get the list of files.

import glob
import pandas as pd

# List all CSV files in the current directory
csv_files = glob.glob('*.csv')

# Read each CSV file into a DataFrame
dataframes = [pd.read_csv(csv_file) for csv_file in csv_files]

Example 3: Finding Files with Specific Names

You might need to find files with specific names or patterns. For example, finding all files that start with ‘data’ and end with ‘.log’.

import glob

# List all files starting with 'data' and ending with '.log'
log_files = glob.glob('data*.log')
print(log_files)

Tips and Best Practices

  1. Use Raw Strings for Windows Paths: On Windows, use raw strings (r'path\to\file') to avoid issues with backslashes.
  2. Combine with Other Modules: The glob module can be combined with other modules like os, shutil, and pandas for more advanced file operations.
  3. Memory Considerations: Use iglob() when a pattern may match a very large number of files. It yields results one at a time rather than building the whole list in memory, which lowers memory use rather than speeding up the search itself.
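
As a complement to the first tip, the pattern can be assembled with os.path.join so that no separators are typed by hand. This is only a sketch; the reports directory and summary.txt file are made-up names created in a temporary location:

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    reports_dir = os.path.join(tmp, "reports")  # hypothetical directory name
    os.makedirs(reports_dir)
    open(os.path.join(reports_dir, "summary.txt"), "w").close()

    # os.path.join inserts the platform's separator, so no backslashes are typed by hand.
    pattern = os.path.join(reports_dir, "*.txt")
    matches = glob.glob(pattern)
    match_names = [os.path.basename(p) for p in matches]

print(match_names)  # ['summary.txt']
```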

The glob module offers a simple, flexible way to search for files and directories by pattern. Whether you’re organizing files, processing batches of files, or automating repetitive tasks that involve file I/O, glob() and iglob() cover most pattern-matching needs with very little code.
