Python 3.9 was released on Oct. 5, 2020, and introduces some neat features and optimizations, including PEP 584, Union Operators in the built-in class dict: the so-called Dictionary Merge and Update Operators. In this blog post we will go over the new operators to see whether there are any advantages or disadvantages to using them over the earlier ways of merging and updating dictionaries.
Different Ways to Merge Dictionaries
1. dict.update()
d1={'a':1,'b':2}
d2={'c':3,'b':9999}
d1.update(d2)
print (d1)
#Output:{'a': 1, 'b': 9999, 'c': 3}
d1.update(d2): Update the dictionary d1 with the key/value pairs from d2, overwriting existing keys. Return None. - python docs
The problem with the update() method, however, is that it modifies one of the dictionaries in place. If we want to create a third dictionary without modifying either of the originals, we cannot use update() directly; we have to make a copy of one of the existing dictionaries first:
d1={'a':1,'b':2}
d2={'c':3,'b':9999}
from copy import copy
d3=copy(d1)
d3.update(d2)
print (d3)
#Output:{'a': 1, 'b': 9999, 'c': 3}
Also, you can only use this method to merge two dictionaries at a time. If you wish to merge three dictionaries, you first need to merge the first two, and then merge the third one with the modified dictionary.
d1={'a':1,'b':2}
d2={'c':3,'b':9999}
d3={'e':4,'f':[1, 3]}
d1.update(d2)
d1.update(d3)
print (d1)
#Output:{'a': 1, 'b': 9999, 'c': 3, 'e': 4, 'f': [1, 3]}
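If you often merge more than two dictionaries this way, the copy-then-update pattern can be wrapped in a small helper. Below is a minimal sketch; merge_all is a hypothetical name, not a standard-library function. It builds a new dict, so none of the inputs is modified, and later dictionaries win on duplicate keys, just like repeated update() calls. The merge is shallow: the list under 'f' is shared with d3, not copied.

```python
def merge_all(*dicts):
    """Hypothetical helper: merge any number of dicts into a NEW dict.

    Later dicts win on duplicate keys, mirroring repeated update() calls.
    The inputs are left untouched (values are not deep-copied).
    """
    merged = {}
    for d in dicts:
        merged.update(d)  # same "last seen wins" rule as dict.update()
    return merged


d1 = {'a': 1, 'b': 2}
d2 = {'c': 3, 'b': 9999}
d3 = {'e': 4, 'f': [1, 3]}

print(merge_all(d1, d2, d3))
# {'a': 1, 'b': 9999, 'c': 3, 'e': 4, 'f': [1, 3]}
print(d1)
# {'a': 1, 'b': 2}
```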
2. Dictionary unpacking
d1={'a':1,'b':2}
d2={'c':3,'b':9999}
d3={**d1,**d2}
print (d3)
#Output:{'a': 1, 'b': 9999, 'c': 3}
A double asterisk ** denotes dictionary unpacking.
It expands the contents of dictionaries d1 and d2 as a collection of key-value pairs and builds the new dictionary d3 from them; for keys present in both, the value from d2 wins.
However, {**d1, **d2} ignores the types of the mappings and always returns a dict. Trying to restore the original type with type(d1)({**d1, **d2}) fails for dict subclasses such as defaultdict that have an incompatible __init__ method:
from collections import defaultdict
d1 = defaultdict(None, {0: 'a'})
d2 = defaultdict(None, {1: 'b'})
print({**d1, **d2})
#Output: {0: 'a', 1: 'b'}  (a plain dict, not a defaultdict)
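To see the failure described above, the sketch below tries to rebuild a defaultdict from the unpacked result. defaultdict's constructor takes the default factory as its first positional argument, so handing it the merged dict raises TypeError; passing the factory explicitly is one workaround. The names merged, rejected, and restored are just for illustration.

```python
from collections import defaultdict

d1 = defaultdict(None, {0: 'a'})
d2 = defaultdict(None, {1: 'b'})

merged = {**d1, **d2}
print(type(merged))
# <class 'dict'>

try:
    # defaultdict's first positional argument is the default factory,
    # so passing the merged dict there raises TypeError.
    type(d1)({**d1, **d2})
    rejected = False
except TypeError as exc:
    rejected = True
    print('TypeError:', exc)

# Workaround: pass the factory explicitly and the merged items second.
restored = defaultdict(d1.default_factory, {**d1, **d2})
print(type(restored).__name__)
# defaultdict
```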
Merging dictionaries with {**d1, **d2} also feels unnatural and hardly obvious.
As Guido van Rossum put it (quoted in PEP 584): “I’m sorry for PEP 448, but even if you know about **d in simpler contexts, if you were to ask a typical Python user how to combine two dicts into a new one, I doubt many people would think of {**d1, **d2}. I know I myself had forgotten about it when this thread started!”
3. collections.ChainMap
d1={'a':1,'b':2}
d2={'c':3,'b':9999}
from collections import ChainMap
d3=ChainMap(d1,d2)
print (d3)
#Output:ChainMap({'a': 1, 'b': 2}, {'c': 3, 'b': 9999})
print (dict(d3))
#Output:{'c': 3, 'b': 2, 'a': 1}
collections.ChainMap(*maps): A ChainMap groups multiple dicts or other mappings together to create a single, updateable view. - python docs
The return type of collections.ChainMap(*maps) is collections.ChainMap, which we can convert to a dict using the dict() constructor. ChainMap is unfortunately poorly known and doesn’t qualify as “obvious”. It also resolves duplicate keys in the opposite order to that expected (“first seen wins” instead of “last seen wins”).
Like dictionary unpacking, {**d1, **d2}, ChainMap ignores the types of the input mappings: the result is always a ChainMap, and converting it back with dict() gives a plain dict. For the same reason as with unpacking, type(d1)(ChainMap(d2, d1)) fails for some subclasses of dict.
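A quick sketch of the two points above, assuming Python 3.9: a lookup returns the value from the first mapping that contains the key, passing the maps in reverse order gives "last seen wins", and rebuilding a defaultdict from a ChainMap raises TypeError. The dd1 and dd2 dictionaries are illustrative.

```python
from collections import ChainMap, defaultdict

d1 = {'a': 1, 'b': 2}
d2 = {'c': 3, 'b': 9999}

# ChainMap searches its maps from left to right, so the FIRST mapping wins.
print(ChainMap(d1, d2)['b'])
# 2

# Passing the maps in reverse order gives the "last seen wins" result.
print(dict(ChainMap(d2, d1)))
# {'a': 1, 'b': 9999, 'c': 3}

# Rebuilding a defaultdict from a ChainMap fails for the same reason as
# with unpacking: the first positional argument must be the factory.
dd1 = defaultdict(list, {'a': [1]})
dd2 = defaultdict(list, {'b': [2]})
try:
    type(dd1)(ChainMap(dd2, dd1))
    rejected = False
except TypeError as exc:
    rejected = True
    print('TypeError:', exc)
```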
ChainMap is probably even less straightforward than the previous two methods, and, unfortunately, updating the ChainMap object modifies the underlying dictionaries, because writes go to the first mapping:
d1 = {'a': 1, 'b': 2}
d2 = {'b': 3}
from collections import ChainMap
d3 = ChainMap(d1, d2)
print(d3)
#Output: ChainMap({'a': 1, 'b': 2}, {'b': 3})
d3['b'] = 4
print(d3)
#Output: ChainMap({'a': 1, 'b': 4}, {'b': 3})
print(d1)
#Output: {'a': 1, 'b': 4}
print(d2)
#Output: {'b': 3}
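If you want ChainMap's layered lookup without touching the original dictionaries, one standard-library option is ChainMap.new_child(), which puts a fresh empty dict in front of the existing maps so that writes land there. A minimal sketch:

```python
from collections import ChainMap

d1 = {'a': 1, 'b': 2}
d2 = {'b': 3}

# new_child() returns a new ChainMap with an empty dict in front of the
# existing maps; writes and deletions only touch that first dict.
d3 = ChainMap(d1, d2).new_child()
d3['b'] = 4

print(d3['b'])
# 4
print(d1)
# {'a': 1, 'b': 2}
print(d3.maps[0])
# {'b': 4}
```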
4. dict(d1, **d2)
d1={'a':1,'b':2}
d2={'c':3,'b':9999}
d3=dict(d1,**d2)
print (d3)
#Output:{'a': 1, 'b': 9999, 'c': 3}
d3 will contain the key-value pairs from d1 and d2; for keys common to both, d3 takes the value from d2. However, this only works when all the keys of d2 are strings, because they are passed as keyword arguments:
d1={'a':1,'b':2}
d2={'a':99,1:3}
d3=dict(d1,**d2)
print (d3)
#Output:TypeError: keywords must be strings
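More precisely, the restriction applies only to the dictionary unpacked with **, since its keys become keyword argument names; keys in the first, positional argument can be of any type. A short sketch illustrating this (d4 and d5 are just illustrative names), together with {**d4, **d5}, which accepts non-string keys:

```python
# Keys in the positional argument may be of any type...
d1 = {1: 'x', 'a': 1}
# ...but every key of the dict unpacked with ** becomes a keyword argument.
d2 = {'a': 99, 'b': 2}
d3 = dict(d1, **d2)
print(d3)
# {1: 'x', 'a': 99, 'b': 2}

# Dictionary unpacking has no such restriction.
d4 = {'a': 1, 'b': 2}
d5 = {'a': 99, 1: 3}
print({**d4, **d5})
# {'a': 99, 'b': 2, 1: 3}
```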
New Ways Introduced in Python 3.9
Two union operators, merge | and update |=, have been introduced for dict.
The Dictionary Merge Operator
If you want to create a new dict based on two dictionaries you already have, you can do the following:
d1={'a':1,'b':2}
d2={'c':3,'b':9999}
d3=d1|d2
print (d3)
#Output:{'a': 1, 'b': 9999, 'c': 3}
“Dict union will return a new dict consisting of the left operand merged with the right operand, each of which must be a dict (or an instance of a dict subclass). If a key appears in both operands, the last-seen value (i.e. that from the right-hand operand) wins.” - PEP 584
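Because | returns a new dict, it also chains naturally, which avoids the two-at-a-time limitation of update(). A minimal sketch, assuming Python 3.9+, with the same three dictionaries as the update() example:

```python
d1 = {'a': 1, 'b': 2}
d2 = {'c': 3, 'b': 9999}
d3 = {'e': 4, 'f': [1, 3]}

# Evaluated left to right as (d1 | d2) | d3; the right-most value wins.
merged = d1 | d2 | d3
print(merged)
# {'a': 1, 'b': 9999, 'c': 3, 'e': 4, 'f': [1, 3]}

# The operands themselves are not modified.
print(d1)
# {'a': 1, 'b': 2}
```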
To demonstrate the usefulness of the merge operator, |, let's take a look at the following example using defaultdict:
from collections import defaultdict
user_not_found_message = 'Could not find any user matching the specified user id.'
ceo = defaultdict(
    lambda: user_not_found_message,
    {'id': 1, 'name': 'Jose', 'title': 'Instructor'}
)
author = defaultdict(
    lambda: user_not_found_message,
    {'id': 2, 'name': 'Vlad', 'title': 'Teaching Assistant'}
)
Merging the two dictionaries with the double asterisk, **, works, but unpacking is not aware of the class of its operands, so we end up with a plain dict instead:
print({**author, **ceo})
# {'id': 1, 'name': 'Jose', 'title': 'Instructor'}
The power of the merge operator, |, is that it can be class aware: defaultdict implements | to return its own type, so a defaultdict will be returned:
print(author | ceo)
# defaultdict(<function <lambda> at 0x000002212125DE50>, {'id': 1, 'name': 'Jose', 'title': 'Instructor'})
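A short sketch, assuming Python 3.9+, that checks what "class aware" means here. In CPython the merged defaultdict takes its default factory from the left operand, so missing keys still return the custom message. A dict subclass that does not override | (PlainSubclass below is a hypothetical example) gets a plain dict back, because dict's own | always builds a plain dict.

```python
from collections import defaultdict

user_not_found_message = 'Could not find any user matching the specified user id.'

ceo = defaultdict(lambda: user_not_found_message,
                  {'id': 1, 'name': 'Jose', 'title': 'Instructor'})
author = defaultdict(lambda: user_not_found_message,
                     {'id': 2, 'name': 'Vlad', 'title': 'Teaching Assistant'})

merged = author | ceo
print(type(merged).__name__)
# defaultdict

# In CPython the result reuses the LEFT operand's default factory,
# so a missing key still produces the custom message.
print(merged.default_factory is author.default_factory)
# True
print(merged['email'])
# Could not find any user matching the specified user id.


class PlainSubclass(dict):
    """Hypothetical dict subclass that does NOT override __or__."""


plain = PlainSubclass(a=1) | PlainSubclass(b=2)
print(type(plain).__name__)
# dict
```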
The Dictionary Update Operator
d1 |= d2 will modify d1 in place. Unlike |, it also accepts anything implementing the Mapping protocol (more specifically, anything with the keys and __getitem__ methods) or an iterable of key-value pairs as its right operand. Compared to dict.update, we can achieve the same functionality with a cleaner syntax:
d1={'a':1,'b':2}
d2={'c':3,'b':9999}
d1|=d2
print (d1)
#Output:{'a': 1, 'b': 9999, 'c': 3}
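The difference in accepted operands can be checked directly. A small sketch, assuming Python 3.9+: |= accepts an iterable of key-value pairs, just like update(), whereas | raises TypeError unless both operands are dicts.

```python
d1 = {'a': 1, 'b': 2}

# |= accepts any mapping or an iterable of key-value pairs, like update().
d1 |= [('c', 3), ('b', 9999)]
print(d1)
# {'a': 1, 'b': 9999, 'c': 3}

# | is stricter: both operands must be dicts.
try:
    d1 | [('d', 4)]
    rejected = False
except TypeError:
    rejected = True
print(rejected)
# True
```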
There is only a slight difference in execution time between the three merging approaches:
import timeit
print(timeit.timeit("d1={'a':1,'b':2}; d2={'c':3,'b':9999}; res = dict(d1); res.update(d2)", number=10000))
#Output: 0.0034674000926315784
print(timeit.timeit("d1={'a':1,'b':2}; d2={'c':3,'b':9999}; res = {**d1,**d2}", number=10000))
#Output: 0.0028335999231785536
print(timeit.timeit("d1={'a':1,'b':2}; d2={'c':3,'b':9999}; res = d1 | d2", number=10000))
#Output: 0.0027796999784186482
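The timings above also include building d1 and d2 on every iteration, and a single run is noisy. A sketch of a fairer comparison, assuming Python 3.9+ (for |), moves the dictionary construction into timeit's setup argument and keeps the best of several runs; the labels are our own, and results will differ from the numbers above and between machines.

```python
import timeit

# Build d1 and d2 once in `setup`, so only the merge itself is timed.
setup = "d1 = {'a': 1, 'b': 2}; d2 = {'c': 3, 'b': 9999}"

candidates = {
    'copy + update': "res = dict(d1); res.update(d2)",
    'unpacking': "res = {**d1, **d2}",
    'merge operator': "res = d1 | d2",
}

timings = {}
for label, stmt in candidates.items():
    # Run each measurement several times and keep the fastest run;
    # the timeit docs point to the minimum as the most useful figure.
    runs = timeit.repeat(stmt, setup=setup, number=10_000, repeat=5)
    timings[label] = min(runs)
    print(f'{label:<15} {timings[label]:.6f} s')
```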
Summary
The new operators are not here to replace the existing ways of merging and updating, but rather to complement them. Some of the major takeaways are:
- the merge operator, |, creates a new object, offers cleaner syntax, and is class aware for types such as defaultdict that implement it.
- the update operator, |=, operates in place, catches common errors before they happen, and doesn't create a new object.
- both operators require Python 3.9 or later.