Python Collections Python

Oct 8th, 2021 - written by Kimserey with .

Python comes with general purpose datatypes like dict, list or set. Those types are commonly used everywhere with the best tradeoff in term of performance and application scope. But there are times where we might see ourselves repeating a specific implementation, for example counting values while storing them in a dict is one of those cases. To cater for those repeated scenarios, the collections module gives us access to alternative types that can be used for specialized purposes. In today’s post we will look at some of those types with examples.

ChainMap

ChainMap can be used to combine mappings together. Compared to .update, it doesn’t mutate any existing mapping but instead creates a ChainMap which acts as a single mapping to access all underlying mappings.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
In [16]: from collections import ChainMap

In [17]: a
Out[17]: {'a': 1, 'b': 2, 'c': 3}

In [18]: b = {"z": 10  }

In [20]: list(ChainMap(a, b).items())
Out[20]: [('z', 10), ('a', 1), ('b', 2), ('c', 3)]

In [23]: a
Out[23]: {'a': 1, 'b': 2, 'c': 3}

In [24]: b
Out[24]: {'z': 10}

In [25]: ChainMap(a, b)["z"]
Out[25]: 10

ChainMap also supports update and delete operations but it will only be executing the operations against the first mapping, therefore if we try to delete a key that is not on the first mapping, it will throw exception.

In order to delete from the subsequent mappings, we can use the .maps attribute and find the key within it.

1
2
3
4
5
6
In [27]: chain = ChainMap(a, b)

In [28]: chain["test"] = "hello"

In [29]: a
Out[29]: {'a': 1, 'b': 2, 'c': 3, 'test': 'hello'}

Here we add test: hello and we see that it was added to a which is our first mapping.

And if we try to delete z from chain which was originally on the second mapping:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
In [31]: del chain["z"]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/usr/lib/python3.8/collections/__init__.py in __delitem__(self, key)
    950         try:
--> 951             del self.maps[0][key]
    952         except KeyError:

KeyError: 'z'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-31-d735ebc693af> in <module>
----> 1 del chain["z"]

/usr/lib/python3.8/collections/__init__.py in __delitem__(self, key)
    951             del self.maps[0][key]
    952         except KeyError:
--> 953             raise KeyError('Key not found in the first mapping: {!r}'.format(key))
    954 
    955     def popitem(self):

KeyError: "Key not found in the first mapping: 'z'"

We get an explicit error telling us the key was not found on the first mapping. So we can delete values from the first mapping:

1
2
3
4
In [32]: del chain["test"]

In [33]: chain.items()
Out[33]: ItemsView(ChainMap({'a': 1, 'b': 2, 'c': 3}, {'z': 10}))

But if we want to support deleting from subsequent mapping we have to use .maps:

1
2
In [34]: chain.maps
Out[34]: [{'a': 1, 'b': 2, 'c': 3}, {'z': 10}]

With .maps we can iterate over the maps and delete the keys when found on the correct mapping.

Counter

Counter is used to provide counter functionalities which builds counter in a mapping.

For example if we need to get the count of values in an array:

1
2
In [54]: Counter(["a", "a", "c"]).items()
Out[54]: dict_items([('a', 2), ('c', 1)])

Counter acts as a mapping of tuple (value, count). It can be used instead of dict to gather count of items:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
In [57]: c = Counter()

In [58]: c["test"]
Out[58]: 0

In [59]: c["hello"]
Out[59]: 0

In [60]: c["hello"] += 1

In [61]: c["hello"] += 1

In [62]: c["hello"]
Out[62]: 2

When initialised, any key access will return 0 instead of throwing key not found. This makes it easy to increment the count without needing to be concerned about exceptions.

Few extra functions are available on Counter, for example .elements() can be used to can an iterable of the values repeated as many times as their counts:

1
2
In [63]: list(c.elements())
Out[63]: ['hello', 'hello']

And another nice function is most_common which will return the k most common items:

1
2
3
4
5
6
7
In [64]: c["world"] += 1

In [65]: list(c.items())
Out[65]: [('hello', 2), ('world', 1)]

In [66]: list(c.most_common(1))
Out[66]: [('hello', 2)]

Behind most_common, Counter uses heapq.nlargest so it follows the same complexity.

defaultdict

Just like with Counter, there are times where we want a default value. For Counter it was 0, but in other cases we want empty string or empty list so that we can directly act on the value whether it is inside the dictionary or not.

For that we can use defaultdict:

1
2
3
4
5
6
7
8
9
In [81]: d = defaultdict(list)

In [82]: d["test"]
Out[82]: []

In [83]: d["hello"].append("world")

In [84]: d
Out[84]: defaultdict(list, {'test': [], 'hello': ['world']})

deque

Lastly Python lists can be used to mimic stacks and queues.

For example for stacks we can .append then .pop which would be last in first out (LIFO).

1
2
3
4
5
6
In [92]: a = [1, 2]

In [93]: a.append(3)

In [94]: a.pop()
Out[94]: 3

Appending and popping last are both O(1) operations for lists so using lists as stack is a fine but when it comes to using lists as a queue, it would be first in first out (FIFO).

1
2
3
4
5
6
In [107]: a = [1, 2]

In [108]: a.append(3)

In [109]: a.pop(0)
Out[109]: 1

The problem here is that popping at the beginning or in an intermediate place in the list is O(n). Similarly using .insert and inserting at index 0 would be O(n).

To cater for that, collections provides deque pronounced “deck” standing for “double-ended queue” (and not a short form of “dequeue”).

deque provides the same functionality as list except it has extra functions that have constant time complexity. The main difference being the provided method to popleft and appendleft which are now constant time complexity:

1
2
3
4
5
6
7
In [112]: a = deque([1, 2])

In [113]: a
Out[113]: deque([1, 2])

In [114]: a.popleft()
Out[114]: 1

A good comparison of the complexity can be found on the Python wiki. And that concludes today’s post!

Conclusion

Today we looked at the extra collections provided by the collection module. We started by looking at ChainMap, we then moved on into looking at Counter, then moved on to defaultdict and lastly completed the post by looking at deque. I hope you liked this post and I’ll see you on the next one!

External Sources

Designed, built and maintained by Kimserey Lam.