Oct 8th, 2021 - written by Kimserey with .
Python comes with general purpose datatypes like dict
, list
or set
. Those types are commonly used everywhere with the best tradeoff in term of performance and application scope. But there are times where we might see ourselves repeating a specific implementation, for example counting values while storing them in a dict
is one of those cases. To cater for those repeated scenarios, the collections
module gives us access to alternative types that can be used for specialized purposes. In today’s post we will look at some of those types with examples.
ChainMap
ChainMap
can be used to combine mappings together. Compared to .update
, it doesn’t mutate any existing mapping but instead creates a ChainMap
which acts as a single mapping to access all underlying mappings.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
In [16]: from collections import ChainMap
In [17]: a
Out[17]: {'a': 1, 'b': 2, 'c': 3}
In [18]: b = {"z": 10 }
In [20]: list(ChainMap(a, b).items())
Out[20]: [('z', 10), ('a', 1), ('b', 2), ('c', 3)]
In [23]: a
Out[23]: {'a': 1, 'b': 2, 'c': 3}
In [24]: b
Out[24]: {'z': 10}
In [25]: ChainMap(a, b)["z"]
Out[25]: 10
ChainMap
also supports update and delete operations but it will only be executing the operations against the first mapping, therefore if we try to delete a key that is not on the first mapping, it will throw exception.
In order to delete from the subsequent mappings, we can use the .maps
attribute and find the key within it.
1
2
3
4
5
6
In [27]: chain = ChainMap(a, b)
In [28]: chain["test"] = "hello"
In [29]: a
Out[29]: {'a': 1, 'b': 2, 'c': 3, 'test': 'hello'}
Here we add test: hello
and we see that it was added to a
which is our first mapping.
And if we try to delete z
from chain
which was originally on the second mapping:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
In [31]: del chain["z"]
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/usr/lib/python3.8/collections/__init__.py in __delitem__(self, key)
950 try:
--> 951 del self.maps[0][key]
952 except KeyError:
KeyError: 'z'
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-31-d735ebc693af> in <module>
----> 1 del chain["z"]
/usr/lib/python3.8/collections/__init__.py in __delitem__(self, key)
951 del self.maps[0][key]
952 except KeyError:
--> 953 raise KeyError('Key not found in the first mapping: {!r}'.format(key))
954
955 def popitem(self):
KeyError: "Key not found in the first mapping: 'z'"
We get an explicit error telling us the key was not found on the first mapping. So we can delete values from the first mapping:
1
2
3
4
In [32]: del chain["test"]
In [33]: chain.items()
Out[33]: ItemsView(ChainMap({'a': 1, 'b': 2, 'c': 3}, {'z': 10}))
But if we want to support deleting from subsequent mapping we have to use .maps
:
1
2
In [34]: chain.maps
Out[34]: [{'a': 1, 'b': 2, 'c': 3}, {'z': 10}]
With .maps
we can iterate over the maps and delete the keys when found on the correct mapping.
Counter
Counter
is used to provide counter functionalities which builds counter in a mapping.
For example if we need to get the count of values in an array:
1
2
In [54]: Counter(["a", "a", "c"]).items()
Out[54]: dict_items([('a', 2), ('c', 1)])
Counter
acts as a mapping of tuple (value, count)
. It can be used instead of dict
to gather count of items:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
In [57]: c = Counter()
In [58]: c["test"]
Out[58]: 0
In [59]: c["hello"]
Out[59]: 0
In [60]: c["hello"] += 1
In [61]: c["hello"] += 1
In [62]: c["hello"]
Out[62]: 2
When initialised, any key access will return 0
instead of throwing key not found. This makes it easy to increment the count without needing to be concerned about exceptions.
Few extra functions are available on Counter
, for example .elements()
can be used to can an iterable of the values repeated as many times as their counts:
1
2
In [63]: list(c.elements())
Out[63]: ['hello', 'hello']
And another nice function is most_common
which will return the k
most common items:
1
2
3
4
5
6
7
In [64]: c["world"] += 1
In [65]: list(c.items())
Out[65]: [('hello', 2), ('world', 1)]
In [66]: list(c.most_common(1))
Out[66]: [('hello', 2)]
Behind most_common
, Counter
uses heapq.nlargest
so it follows the same complexity.
defaultdict
Just like with Counter
, there are times where we want a default value. For Counter
it was 0
, but in other cases we want empty string or empty list so that we can directly act on the value whether it is inside the dictionary or not.
For that we can use defaultdict
:
1
2
3
4
5
6
7
8
9
In [81]: d = defaultdict(list)
In [82]: d["test"]
Out[82]: []
In [83]: d["hello"].append("world")
In [84]: d
Out[84]: defaultdict(list, {'test': [], 'hello': ['world']})
deque
Lastly Python lists can be used to mimic stacks and queues.
For example for stacks we can .append
then .pop
which would be last in first out (LIFO).
1
2
3
4
5
6
In [92]: a = [1, 2]
In [93]: a.append(3)
In [94]: a.pop()
Out[94]: 3
Appending and popping last are both O(1) operations for lists so using lists as stack is a fine but when it comes to using lists as a queue, it would be first in first out (FIFO).
1
2
3
4
5
6
In [107]: a = [1, 2]
In [108]: a.append(3)
In [109]: a.pop(0)
Out[109]: 1
The problem here is that popping at the beginning or in an intermediate place in the list is O(n). Similarly using .insert
and inserting at index 0
would be O(n).
To cater for that, collections
provides deque
pronounced “deck” standing for “double-ended queue” (and not a short form of “dequeue”).
deque
provides the same functionality as list except it has extra functions that have constant time complexity. The main difference being the provided method to popleft
and appendleft
which are now constant time complexity:
1
2
3
4
5
6
7
In [112]: a = deque([1, 2])
In [113]: a
Out[113]: deque([1, 2])
In [114]: a.popleft()
Out[114]: 1
A good comparison of the complexity can be found on the Python wiki. And that concludes today’s post!
Today we looked at the extra collections provided by the collection
module. We started by looking at ChainMap
, we then moved on into looking at Counter
, then moved on to defaultdict
and lastly completed the post by looking at deque
. I hope you liked this post and I’ll see you on the next one!