
The collections module in Python

The collections module is part of Python's standard library. It provides specialized container datatypes that serve as alternatives to the general-purpose built-in containers dict, list, set, and tuple, adding functionality that makes certain tasks easier. This guide covers the primary classes and functions in the module, with practical examples of each.

Overview of the collections Module

The collections module includes the following key classes and functions:

  • namedtuple(): Factory function for creating tuple subclasses with named fields.
  • deque: List-like container with fast appends and pops on either end.
  • ChainMap: Dictionary-like class for creating a single view of multiple mappings.
  • Counter: Dictionary subclass for counting hashable objects.
  • OrderedDict: Dictionary subclass that remembers the order entries were added.
  • defaultdict: Dictionary subclass that calls a factory function to supply missing values.
  • UserDict, UserList, UserString: Wrapper classes that make it easier to create custom dictionary, list, and string subclasses.

Importing the Module

Before using the collections module, you need to import it:

import collections

Using namedtuple()

The namedtuple() function returns a new tuple subclass with named fields. It can be used to create simple classes that are immutable and lightweight.

Example

import collections

# Create a Point namedtuple
Point = collections.namedtuple('Point', ['x', 'y'])

# Instantiate a Point object
p = Point(10, 20)
print(p)  # Output: Point(x=10, y=20)

# Accessing fields
print(p.x)  # Output: 10
print(p.y)  # Output: 20
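
The immutability mentioned above can be sketched as follows. This is a minimal illustration that reuses the Point type from the example; the variable names (reassigned, moved, as_dict) are chosen here for demonstration only.

```python
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])
p = Point(10, 20)

# Fields cannot be reassigned: namedtuple instances are immutable
try:
    p.x = 99
    reassigned = True
except AttributeError:
    reassigned = False

# _replace() returns a new instance with the selected fields changed
moved = p._replace(x=15)

# _asdict() converts the instance to a mapping of field names to values
as_dict = p._asdict()

# Instances still unpack like plain tuples
x, y = p

print(reassigned)  # Output: False
print(moved)       # Output: Point(x=15, y=20)
print(x + y)       # Output: 30
```

Because _replace() builds a new object, the original p is left unchanged.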

Using deque

The deque (double-ended queue) is a list-like container with fast appends and pops from both ends. It is useful for implementing queues and stacks.

Example

import collections

# Create a deque
d = collections.deque([1, 2, 3])

# Append to the right
d.append(4)
print(d)  # Output: deque([1, 2, 3, 4])

# Append to the left
d.appendleft(0)
print(d)  # Output: deque([0, 1, 2, 3, 4])

# Pop from the right
d.pop()
print(d)  # Output: deque([0, 1, 2, 3])

# Pop from the left
d.popleft()
print(d)  # Output: deque([1, 2, 3])
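
Two other deque features are often handy: a bounded length and rotation. The sketch below assumes a small "most recent items" buffer as the use case; the names recent and ring are illustrative.

```python
from collections import deque

# With maxlen set, appending to a full deque discards items from the other end
recent = deque(maxlen=3)
for n in range(1, 6):
    recent.append(n)
print(recent)  # Output: deque([3, 4, 5], maxlen=3)

# rotate(n) shifts items n steps to the right; a negative n shifts left
ring = deque([1, 2, 3, 4, 5])
ring.rotate(2)
print(ring)  # Output: deque([4, 5, 1, 2, 3])
ring.rotate(-2)
print(ring)  # Output: deque([1, 2, 3, 4, 5])
```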

Using ChainMap

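The ChainMap class groups multiple dictionaries into a single view. Lookups search the underlying mappings in order and return the first match, while writes, updates, and deletions affect only the first mapping. This can be useful for managing contexts or scopes.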

Example

import collections

# Create two dictionaries
dict1 = {'a': 1, 'b': 2}
dict2 = {'b': 3, 'c': 4}

# Create a ChainMap
chain = collections.ChainMap(dict1, dict2)
print(chain)  # Output: ChainMap({'a': 1, 'b': 2}, {'b': 3, 'c': 4})

# Accessing values
print(chain['a'])  # Output: 1
print(chain['b'])  # Output: 2 (from the first dictionary)

# Modifying values
chain['b'] = 5
print(dict1)  # Output: {'a': 1, 'b': 5}
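
To illustrate the "scopes" use case, here is a minimal sketch that layers settings with new_child(). The dictionaries and their keys (defaults, overrides, color, size) are hypothetical and serve only as an example.

```python
from collections import ChainMap

defaults = {'color': 'blue', 'size': 'medium'}
overrides = {'color': 'red'}

# Earlier mappings take priority during lookup
settings = ChainMap(overrides, defaults)
print(settings['color'])  # Output: red
print(settings['size'])   # Output: medium

# new_child() puts a fresh mapping in front, like entering a nested scope
local = settings.new_child({'size': 'large'})
print(local['size'])      # Output: large
print(local['color'])     # Output: red

# The parent chain is untouched by the child scope
print(settings['size'])   # Output: medium
```

The parents attribute of a child returns a ChainMap without the first mapping, which is one way to step back out of a scope.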

Using Counter

The Counter class is a dictionary subclass that counts hashable objects. It is useful for tallying occurrences of items.

Example

import collections

# Create a Counter
c = collections.Counter(['apple', 'banana', 'apple', 'orange', 'banana', 'banana'])
print(c)  # Output: Counter({'banana': 3, 'apple': 2, 'orange': 1})

# Accessing counts
print(c['banana'])  # Output: 3
print(c['apple'])   # Output: 2

# Updating counts
c.update(['apple', 'apple', 'banana'])
print(c)  # Output: Counter({'apple': 4, 'banana': 4, 'orange': 1})

# Getting the most common elements (ties keep first-seen order)
print(c.most_common(2))  # Output: [('apple', 4), ('banana', 4)]
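
Counter objects also support arithmetic, which the example above does not show. The following sketch assumes a hypothetical inventory scenario; the names stock and sold are illustrative.

```python
from collections import Counter

stock = Counter(apple=5, banana=2)
sold = Counter(apple=3, banana=2, orange=1)

# Subtraction keeps only positive counts, so zero and negative results are dropped
remaining = stock - sold
print(remaining)  # Output: Counter({'apple': 2})

# Addition sums the counts for each key
combined = stock + sold
print(combined)   # Output: Counter({'apple': 8, 'banana': 4, 'orange': 1})

# Missing keys read as 0 instead of raising KeyError, and are not added
print(stock['kiwi'])    # Output: 0
print('kiwi' in stock)  # Output: False
```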

Using OrderedDict

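The OrderedDict class is a dictionary subclass that remembers the order entries were added. Since Python 3.7 the built-in dict also preserves insertion order, so OrderedDict is mainly useful for its extra features: reordering with move_to_end(), removing from either end with popitem(last=...), and order-sensitive equality comparisons.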

Example

import collections

# Create an OrderedDict
od = collections.OrderedDict()
od['a'] = 1
od['b'] = 2
od['c'] = 3

print(od)  # Output: OrderedDict([('a', 1), ('b', 2), ('c', 3)])
# On Python 3.12+ the repr is OrderedDict({'a': 1, 'b': 2, 'c': 3})

# Accessing values
print(od['b'])  # Output: 2

# Adding a new entry
od['d'] = 4
print(od)  # Output: OrderedDict([('a', 1), ('b', 2), ('c', 3), ('d', 4)])
# On Python 3.12+ the repr is OrderedDict({'a': 1, 'b': 2, 'c': 3, 'd': 4})
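
The operations that set OrderedDict apart from a plain dict can be sketched as follows. The variable names are illustrative, and list() is used for printing so the output does not depend on the Python version's repr.

```python
from collections import OrderedDict

od = OrderedDict([('a', 1), ('b', 2), ('c', 3)])

# move_to_end() moves a key to the right end by default
od.move_to_end('a')
print(list(od))  # Output: ['b', 'c', 'a']

# With last=False the key moves to the left end instead
od.move_to_end('c', last=False)
print(list(od))  # Output: ['c', 'b', 'a']

# popitem(last=False) removes and returns the oldest (leftmost) entry
first = od.popitem(last=False)
print(first)     # Output: ('c', 3)

# Equality between two OrderedDicts is order-sensitive; plain dicts ignore order
same_ordered = OrderedDict(a=1, b=2) == OrderedDict(b=2, a=1)
same_plain = dict(a=1, b=2) == dict(b=2, a=1)
print(same_ordered, same_plain)  # Output: False True
```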

Using defaultdict

The defaultdict class is a dictionary subclass that calls a factory function to supply missing values. When a missing key is accessed with square brackets, the factory is called with no arguments and its result is stored under that key and returned, so no KeyError is raised.

Example

import collections

# Create a defaultdict with a default factory function
dd = collections.defaultdict(int)
dd['a'] += 1
dd['b'] += 2

print(dd)  # Output: defaultdict(<class 'int'>, {'a': 1, 'b': 2})

# Create a defaultdict with a list as the default factory
dd_list = collections.defaultdict(list)
dd_list['a'].append(1)
dd_list['b'].append(2)

print(dd_list)  # Output: defaultdict(<class 'list'>, {'a': [1], 'b': [2]})
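
A common use of defaultdict(list) is grouping items by a key. The sketch below groups a hypothetical word list by first letter and also shows that only square-bracket access creates missing entries.

```python
from collections import defaultdict

words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']

# Group words by their first letter without checking whether the key exists
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)

print(dict(by_letter))
# Output: {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}

# Membership tests and .get() do not trigger the factory
print('z' in by_letter)    # Output: False
print(by_letter.get('z'))  # Output: None

# Square-bracket access does, and it inserts the key as a side effect
print(by_letter['z'])      # Output: []
print('z' in by_letter)    # Output: True
```

If inserting keys on read is not wanted, .get() or an explicit membership test avoids it.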

Using UserDict, UserList, UserString

These classes act as wrappers around dictionary, list, and string objects, making it easier to create custom container types by subclassing them. The wrapped object is stored in the instance's data attribute.

Example

import collections

# Custom dictionary subclass
class MyDict(collections.UserDict):
    def __setitem__(self, key, value):
        print(f'Setting {key} to {value}')
        super().__setitem__(key, value)

md = MyDict()
md['a'] = 1  # Output: Setting a to 1
print(md)  # Output: {'a': 1}
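
One practical reason to subclass UserDict rather than dict is that the constructor and update() go through an overridden __setitem__, whereas in CPython a direct dict subclass bypasses it. The following sketch compares the two; both class names are hypothetical.

```python
from collections import UserDict

class UpperKeyDict(UserDict):
    """Stores every key in upper case (illustrative class)."""
    def __setitem__(self, key, value):
        super().__setitem__(key.upper(), value)

class UpperKeyPlain(dict):
    """Same override, but on the built-in dict (illustrative class)."""
    def __setitem__(self, key, value):
        super().__setitem__(key.upper(), value)

a = UpperKeyDict({'x': 1})
a.update(y=2)
a['z'] = 3

b = UpperKeyPlain({'x': 1})
b.update(y=2)
b['z'] = 3

# UserDict routes the constructor and update() through __setitem__
print(a)  # Output: {'X': 1, 'Y': 2, 'Z': 3}

# The dict subclass only applies the override to direct item assignment
print(b)  # Output: {'x': 1, 'y': 2, 'Z': 3}
```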

Practical Examples

Example 1: Counting Words in a Text

Using Counter to count the occurrences of each word in a text.

import collections

text = "the quick brown fox jumps over the lazy dog the quick brown fox"
words = text.split()
word_count = collections.Counter(words)

print(word_count)
# Output: Counter({'the': 3, 'quick': 2, 'brown': 2, 'fox': 2, 'jumps': 1, 'over': 1, 'lazy': 1, 'dog': 1})
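
Real text usually contains capital letters and punctuation, which text.split() alone does not handle. The variant below is a minimal sketch that normalizes the input first; the sample sentence and the regular expression are assumptions made for illustration, not a complete tokenizer.

```python
import re
from collections import Counter

text = "The quick brown fox. The lazy dog! the end"

# Lowercase the text, then extract runs of letters and apostrophes as words
words = re.findall(r"[a-z']+", text.lower())
word_count = Counter(words)

print(word_count['the'])          # Output: 3
print(word_count.most_common(1))  # Output: [('the', 3)]
```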

Example 2: Maintaining an Access Order

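Using OrderedDict to keep track of the order in which keys are accessed, implemented here as a small least-recently-used (LRU) cache: each access moves the key to the end, and when the capacity is exceeded the entry at the front (the least recently used) is evicted.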

import collections

class LRUCache:
    def __init__(self, capacity: int):
        self.cache = collections.OrderedDict()
        self.capacity = capacity

    def get(self, key: int) -> int:
        if key not in self.cache:
            return -1
        self.cache.move_to_end(key)
        return self.cache[key]

    def put(self, key: int, value: int) -> None:
        if key in self.cache:
            self.cache.move_to_end(key)
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)

# Example usage
lru_cache = LRUCache(2)
lru_cache.put(1, 1)
lru_cache.put(2, 2)
print(lru_cache.get(1))  # Output: 1
lru_cache.put(3, 3)
print(lru_cache.get(2))  # Output: -1 (removed due to capacity)

The collections module in Python provides a variety of specialized container datatypes that offer more functionality and ease of use compared to general-purpose built-in containers. By leveraging these classes and functions, you can write more efficient, readable, and maintainable code for a wide range of applications. Whether you need a dictionary that maintains insertion order, a counter for tallying occurrences, or a custom container type, the collections module has you covered.
