Topics: collections.defaultdict
Updated 2020-10-06

The collections module

The collections module in Python contains several useful classes. One of them is especially helpful for our Magical Universe: collections.defaultdict.

When defining our CastleKilmereMember class we specified self.traits to be an empty dictionary. New positive and negative traits can be added to a person using the add_trait() function. We can check whether a person possesses a certain trait using the exhibits_trait() function. The relevant parts of the class look as follows:

class CastleKilmereMember:
    """ Creates a member of the Castle Kilmere School of Magic """
    def __init__(self, name: str, birthyear: int, sex: str):
        self.name = name
        self.birthyear = birthyear
        self.sex = sex
        self._traits = {}

    def add_trait(self, trait, value=True):
        self._traits[trait] = value

    def exhibits_trait(self, trait: str) -> bool:
        try:
            value = self._traits[trait]
            return value
        except KeyError as e:
            print(f"{self._name} does not have a character trait with the name {e}")
            return False

As visible in the definition of exhibits_trait() we have to catch and handle errors caused by querying the _traits dictionary with a non-existent key. Wouldn’t it be much nicer if we could just return False in case a Castle Kilmere member does not possess a certain character trait? We already discussed that this can be achieved using the dict.get() function (see post on Duck Typing for more details). Another, even more powerful solution is to use the defaultdict class from the collections module.

collections.defaultdict

collections.defaultdict is a subclass of the general dictionary type. What makes defaultdict perfect for our problem is that it allows to specify a callable whose return value will be used whenever a requested key cannot found. The basic usage of collections.defaultdict is as follows:

from collections import defaultdict

dict_ = defaultdict(default_factory)

if default_factory is not specified, i.e. if we just use dict_ = defaultdict() the dictionary will raise a KeyError whenever a requested key cannot be found (just like a normal dictionary). So we want to specify a default value.

Although we want to use False as our default value, we can’t use dict_ = defaultdict(False). The reason for this is simple: defaultdict requires a callable (e.g. a function) as an argument that provides the default value when invoked without arguments. False is not a callable but a boolean. So we have to define a function that returns False when called without arguments:

def return_false() -> bool:
    return False

dict_ = defaultdict(return_false)

Alternatively, we could also specify a lambda function:

dict_ = defaultdict(lambda: False)

The fact that defaultdict requires a callable makes it very powerful. We could create any kind of function and use the functions return value as a default. This has several important use cases. One common problem that can be solved with a defaultdict is grouping items in a collection.

Grouping items in a collection

Lets say we have a list of some of the pets Castle Kilmeres pupils are allowed to bring to school. We want to group the pets by type, that is, having all the owls together, all the cats and so on.

pets = [('Cotton', 'owl'), ('Ramses', 'cat'),
        ('Twiggles', 'owl'), ('Oscar', 'cat'),
        ('Louie', 'cat'), ('Bob', 'ferret'),
        ('Winston', 'owl'), ('Harry', 'owl')]

This can be achieved by providing the callable list as an argument to the defaultdict. We create a defaultdict using list as a default factory. To group the pets, we loop through our list of pets. Our first item in the list is Cotton (Luke’s owl). When we try to find the key owl in the dictionary it won’t be found. Since we used list as a default factory, a new empty list will be created and inserted into the dictionary for the key owl. We then append the name (which is Cotton) to this list. The next time we look for the key owl it will already be contained in the dictionary and the name can be appended to the existing list.

from collections import defaultdict

pets = [('Cotton', 'owl'), ('Ramses', 'cat'),
        ('Twiggles', 'owl'), ('Oscar', 'cat'),
        ('Louie', 'cat'), ('Bob', 'ferret'),
        ('Winston', 'owl'), ('Harry', 'owl')]

types_of_pets = defaultdict(list)
for name, type_ in pets:
    types_of_pets[type_].append(name)

Take a guess how the output looks like when looping through the resulting dictionary:

for key, value in types_of_pets.items():
    print(f"{key}: {value}")

The output is hopefully what you expected. We get a list of all owls, a list of all cats, and so on:

for key, value in types_of_pets.items():
    print(f"{key}: {value}")

>>> owl: ['Cotton', 'Twiggles', 'Winston', 'Harry']
>>> cat: ['Ramses', 'Oscar', 'Louie']
>>> ferret: ['Bob']

There are a lot more uses cases for the defaultdict class. For example, we could use a defaultdict to count the number of pets of each type. Use this as an exercise: which default_factory could we use to count the number of pets of each type?