The Invent with Python Blog

Fri 02 February 2018

The Python Data Model, Explained

Posted by Al Sweigart in python   

The Python Data Model is a document in the official Python documentation that describes the Python language's concept of data, as opposed to how other languages treat data. It's full of abstract ideas and generally impenetrable to new software developers. (Or even experienced developers; I learned plenty of new things from it despite writing Python code for years.)

This blog post aims to bring the complex ideas in the Python Data Model down to a more understandable level. You may find it helpful to watch Ned Batchelder's Facts and Myths about Python Names and Values talk from PyCon 2015 first, though it's not strictly required. This blog post might have gaps in it; I'm writing this mostly as my own notes to understand the content in this documentation.

This blog post follows the same headings as the Python Data Model (which is section 3 of the official Python language reference), so you can open https://docs.python.org/3/reference/datamodel.html and read the two roughly side by side. Think of it as "the Python data model documentation, but with code examples".

If you'd like another readable resource about the Python Data Model, check out Chapter 1 of Fluent Python.

3.1. Objects, values and types

Everything in Python is an object. There's no difference between "primitive types" (int, float, bool) and "object types" (strings, date objects, complex data structures) in Python like there is in Java. Even functions and code are stored as objects.
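Here's a small sketch of what "everything is an object" means in practice. The function name greet and the variable names are made up for illustration; the point is only that numbers, functions, a function's compiled code, and even types themselves all pass the same isinstance() check:

```python
def greet():
    return 'hello'

# Integers, floats, bools, strings, functions, code, and types are all objects.
things = [42, 3.14, True, 'hello', greet, greet.__code__, int]
all_are_objects = all(isinstance(thing, object) for thing in things)
print(all_are_objects)  # True

# Even a "primitive" integer has methods, just like any other object.
print((42).bit_length())  # 6

# Because a function is an object, another variable can refer to it.
eggs = greet
print(eggs())  # hello
```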

All Python objects have an identity, type, and value.

The identity is an integer that never changes during the lifetime of the object and is unique among all objects that exist at the same time. You can find it with the id() function:

>>> spam = 42
>>> id(spam)
140711779350640
>>> cheese = 'hello'
>>> id(cheese)
1725534351520

In CPython (the Python interpreter you download from https://python.org, as opposed to something like PyPy or Jython) the identity is the memory address of the object.
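You rarely need to compare id() values by hand, because the is operator compares identities for you. A short sketch (the variable names are just for illustration) showing that two objects can have equal values and still be different objects:

```python
spam = [2, 4, 6, 8]
cheese = [2, 4, 6, 8]  # An equal value, but a separate list object.
bacon = spam           # A second name for the same object spam refers to.

print(spam == cheese)  # True: the values are equal.
print(spam is cheese)  # False: they are two different objects.
print(spam is bacon)   # True: both names refer to one object.

# "x is y" gives the same answer as comparing the two identities.
print(id(spam) == id(bacon))  # True
```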

An object's type (such as integer, float, string, etc.) is also unchanging. You can see an object's type by passing it to type():

>>> spam = 42
>>> type(spam)
<class 'int'>
>>> cheese = 'hello'
>>> type(cheese)
<class 'str'>
>>> type(type(cheese)) # Even the object returned from type() has a type: it's the type type.
<class 'type'>

The value is more obvious to Python programmers: 42 is an integer value, 'hello' is a string value, and so on. The value of an object may be changeable (that is, mutable) or it may be unchangeable (that is, immutable).
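As a rough illustration of the difference, here's a sketch that changes a list in place and then tries the same thing on a string. The names string_was_changed and jello are mine, chosen only for this example:

```python
spam = [2, 4, 6, 8]   # Lists are mutable.
spam[0] = 'changed'   # This works: the list object itself is modified.
print(spam)           # ['changed', 4, 6, 8]

cheese = 'hello'      # Strings are immutable.
try:
    cheese[0] = 'J'   # Item assignment on a string raises TypeError.
except TypeError:
    string_was_changed = False
else:
    string_was_changed = True
print(string_was_changed)  # False
print(cheese)              # hello

# "Changing" a string really means building a new string object.
jello = 'J' + cheese[1:]
print(jello)  # Jello
```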

It's better to think of Python variables as labels attached to objects, rather than boxes that objects exist in. You can have multiple variables attached to the same object, as illustrated by the well-known Python gotcha where two variables refer to the same list:

>>> spam = [2, 4, 6, 8]
>>> cheese = spam # Copies the reference to the list object, not the list object itself.
>>> cheese
[2, 4, 6, 8]
>>> spam[0] = 'changed'
>>> spam
['changed', 4, 6, 8]
>>> cheese # Both variables have changed because they refer to the same list object.
['changed', 4, 6, 8]
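If you actually want an independent list, you have to ask for a copy explicitly. A minimal sketch using the list's copy() method (the variable name bacon is just for illustration):

```python
spam = [2, 4, 6, 8]
cheese = spam        # Another label on the same list object.
bacon = spam.copy()  # A new list object with the same items (a shallow copy).

spam[0] = 'changed'
print(cheese)  # ['changed', 4, 6, 8]
print(bacon)   # [2, 4, 6, 8]
print(bacon is spam)  # False
```

Note that copy() is shallow: the new list is a separate object, but its items are the same objects the original list refers to. For nested lists, copy.deepcopy() from the standard library's copy module copies the inner objects as well.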

This is important because, while Python variables are thought of as dynamic and changeable, often you are just changing which object a variable refers to, while the underlying object is immutable. For example, Python integers are immutable. This code doesn't change the integer object; it just points the variable to a new object:

>>> spam = 42
>>> id(spam) # The integer object with value 42 has the id 140711779350640.
140711779350640
>>> spam = 43 # This isn't "changing" the integer object; it's pointing spam to a different object.
>>> id(spam) # This different object has a different identity than the 42 object.
140711779350672

Integers, floats, strings, and tuples are all immutable types. Lists, dictionaries, and sets are mutable. Note the difference between mutating a list's value with the append() method and changing which list a variable refers to:

>>> spam = [2, 4, 6, 8]
>>> id(spam) # Note the id of the list object that spam refers to
1725535036232
>>> spam.append(10)
>>> spam
[2, 4, 6, 8, 10]
>>> id(spam) # The id is the same as before.
1725535036232
>>> spam = [2, 4, 6, 8, 10, 12] # This is creating a new list object and referring spam to it
>>> id(spam) # You can tell because the id is different; this is a different list object now.
1725534094088
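The same distinction shows up with the += operator, which looks identical for lists and tuples but behaves differently. This is only a sketch (the names original, list_kept_identity, and tuple_kept_identity are made up for the example): a list is extended in place, while a tuple is immutable, so a new tuple is built and the variable is pointed at it.

```python
spam = [2, 4, 6, 8]
list_id_before = id(spam)
spam += [10]  # List: the existing object is extended in place.
list_kept_identity = id(spam) == list_id_before
print(list_kept_identity)  # True

eggs = (2, 4, 6, 8)
original = eggs  # Keep a second label on the original tuple.
eggs += (10,)    # Tuple: a new tuple is created and eggs now refers to it.
tuple_kept_identity = eggs is original
print(tuple_kept_identity)  # False
print(original)  # (2, 4, 6, 8)
print(eggs)      # (2, 4, 6, 8, 10)
```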

Objects are never explicitly destroyed by Python code (unlike in, say, the C programming language, where you can call free() to deallocate the memory for an object). Instead, Python objects are destroyed automatically by the garbage collector once an object no longer has any variables (or other objects) referring to it. This can happen right away or long after the last reference goes away, so don't write code that depends on the immediate destruction of objects when they become unreachable.
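One way to watch this happen is with a weak reference, which lets you check whether an object still exists without keeping it alive. The sketch below uses a throwaway class named Spam that I made up for the example, and it calls gc.collect() so that it doesn't depend on when a particular interpreter decides to clean up:

```python
import gc
import weakref

class Spam:
    """A hypothetical class, used only to have an object we can weakly reference."""

spam = Spam()
watcher = weakref.ref(spam)  # A weak reference doesn't keep the object alive.
alive_before = watcher() is not None

del spam      # Removes the variable; it doesn't directly destroy the object.
gc.collect()  # Ask for a collection so this sketch doesn't rely on timing.
alive_after = watcher() is not None

print(alive_before)  # True
print(alive_after)   # False
```

In CPython the object is usually reclaimed as soon as its last reference disappears, because CPython uses reference counting, but that is an implementation detail. For things like open files, use a with statement or call close() rather than counting on the object being destroyed at a particular moment.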

Objects that contain references to other objects are called containers. Lists, tuples, dictionaries, and sets are examples of container types.
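Because a container holds references rather than the objects themselves, an immutable container can still appear to change when a mutable object inside it changes. A small sketch (the variable name inner is just for illustration):

```python
inner = [2, 4]
spam = (inner, 'hello')  # An immutable tuple holding a reference to a mutable list.

inner.append(6)  # Mutate the list through its other label.
print(spam)              # ([2, 4, 6], 'hello')
print(spam[0] is inner)  # True

# The tuple itself is unchanged: it still refers to the very same two objects.
```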

3.2. The standard type hierarchy

The Python language