The Invent with Python Blog

Fri 01 February 2019

Hashable Objects Must Be Immutable

Posted by Al Sweigart in python   

A recent post to Reddit sparked some comments, so I wanted to clarify: In Python, hashable objects must be immutable and mutable objects cannot be hashable. (With one exception.)

Before the torches and pitchforks are gathered, let me explain some background.

If you want to make your classes hashable, you must follow two rules outlined in the Python Glossary for the entry for "hashable":

An object is hashable if [1] it has a hash value which never changes during its lifetime (it needs a __hash__() method), and can be compared to other objects (it needs an __eq__() method). [2] Hashable objects which compare equal must have the same hash value.

In Python, integers, floats, and bools are all immutable. And because 1 == 1.0 == True, then hash(1) == hash(1.0) == hash(True).

Let's create an immutable Point class, which has read-only x and y attributes, and it reuses the hashes for tuples:

>>> class ImmutablePoint:
...   def __init__(self, x, y):
...     self._x = x
...     self._y = y
...   def __eq__(self, other):
...     if not isinstance(other, ImmutablePoint):
...       return False
...     return self._x == other._x and self._y == other._y
...   def __hash__(self):
...     return hash((self._x, self._y))
...   @property
...   def x(self):
...     return self._x
...   @property
...   def y(self):
...     return self._y

We can create ImmutablePoint objects, and they are immutable: we can't change their x or y attributes:

>>> ip = ImmutablePoint(10, 20)

>>> ip.x, ip.y
(10, 20)

>>> ip.x = 'changed' # x and y are read-only
Traceback (most recent call last):
  File "stdin", line 1, in module
AttributeError: can't set attribute

>>> ip2 = ImmutablePoint(10, 20)

>>> ip == ip2
True

Since they are hashable, we can also use these objects as keys in a dictionary:

>>> d = {ip: 'hello'}

>>> d[ip]
'hello'

>>> d[ip2]
'hello'

Now, since hashes are based on an object's value and an object's hash never changes (according to the rule from the Python Glossary), this necessarily means that only immutable objects can be hashable. Notice that you can't use a mutable list or dictionary as a key in a dictionary, nor do they return anything from hash():

>>> {[1, 2, 3]: 'hello'}
Traceback (most recent call last):
  File "stdin", line 1, in module
TypeError: unhashable type: 'list'

>>> hash([1, 2, 3])
Traceback (most recent call last):
  File "stdin", line 1, in module
TypeError: unhashable type: 'list'

>>> hash({1:2, 3:4})
Traceback (most recent call last):
  File "stdin", line 1, in module
TypeError: unhashable type: 'dict'

But why is this so? What's so bad about making a mutable object hashable? We can just add a __hash__() method to our mutable class, right?

The reason has to do with how hashes are used in dictionaries. This will require some CS background on how dictionaries/hashmaps work (but you can watch Brandon Rhode's PyCon 2010 talk, The Mighty Dictionary). But the short answer is that if the object's value changes, then the hash must also change since hashes are based on values. However, if the object's value changes after it's used as a key in a dictionary, the hash will no longer refer to the correct bucket in the dictionary for that key. Let's use an example.

Here's a subclass of Python's built-in list data type (which is mutable) that we've added a __hash__() function to:

>>> import collections

>>> class HashableList(collections.UserList):
...     def __hash__(self):
...         return hash(tuple(self))

We can create a hashable list object and put it in a dictionary:

>>> h = HashableList([1, 2, 3])

>>> d = {h: 'hello'}

>>> d
{[1, 2, 3]: 'hello'}

It seems to work. We even make another hashable list with the same value and it seems to work too:

>>> d[h]
'hello'

>>> h2 = HashableList([1, 2, 3])

>>> d[h2]
'hello'

Now we change h. As expected, it creates a KeyError because [1, 2, 100] isn't a key in the dictionary, only [1, 2, 3] is (this turns out to be wrong, but ignore that for now):

>>> h[2] = 100

>>> d[h]
Traceback (most recent call last):
  File "stdin", line 1, in module
KeyError: [1, 2, 100]

But here's the thing. h2 no longer works as a key, even though it is [1, 2, 3]:

>>> d[h2]
Traceback (most recent call last):
  File "stdin", line 1, in module
KeyError: [1, 2, 3]

Why is this? Because the key in d isn't a copy of the [1, 2, 3] object, it's a copy of the reference. When we changed h, we also changed the dictionary key:

>>> d
{[1, 2, 100]: 'hello'}

So this means the key is now [1, 2, 100], but it's in the bucket/slot for [1, 2, 3]. But h2's [1, 2, 3] won't work because the key's value is now [1, 2, 100] and Python just assumes it happens to be a hash collision.

To make things even worse, try setting h[2] to 99. By sheer coincidence, [1, 2, 99] hashes to the same bucket/slot as [1, 2, 3]. And the values are the same (because the key has also changed to [1, 2, 99]) so it works perfectly fine even though it should cause a KeyError:

>>> h[2] = 99 # You might need to find another integer to reproduce this, depending on your Python version.

>>> d[h]
'hello'

So this is why it's a Bad Thing to have mutable items be hashable: changing them also changes the key in the dictionary. Sometimes this works even though it should cause a KeyError (when there's a hash collision by coincidence) and other times it'll cause a KeyError when it should work (because the dictionary key has changed when the mutable object has changed).

BUT WAIT, WHAT WAS THE ONE EXCEPTION? Well, a mutable object can be hashable and used as a dictionary key as long as it's hash and value are the same as its identity (or some other unique, unchanging integer). That is, it must have the following __eq__() and __hash__() implementations:

>>> class SomeClass:
...   def __eq__(self, other):
...     return other is self
...   def __hash__(self):
...     return id(self)

This also happens to be the default implementations for Python classes, so we can shorten our class to just the following:

>>> class SomeClass:
...   pass

In this case, we uphold the two rules for hashability. But we don't really have a useful class, because every object of type SomeClass is only equal to itself, and will always be False when compared to any other object, no matter what its value is.

>>> sc = SomeClass()

>>> {sc: 'hello'}
{<__main__.SomeClass object at 0x000001C6CEA57748>: 'hello'}

>>> sc2 = SomeClass()

>>> sc == sc2
False

So either you can follow Python's two hashability rules for your class, or you can create mutable, hashable objects that don't actually work in dictionaries. The only exception when you can have a mutable, hashable class is when the hash is based on the identity and not the value, which severely restricts its usefulness as a dictionary key. Which means that, effectively, only immutable objects can be hashable and mutable objects can't be hashable.

Fin.