How to Use Python 3.11's New TOML Parser, tomllib
Wed 23 February 2022 Al Sweigart
TOML Parser To Be Included In Python 3.11
The Python Steering Council has accepted PEP 680, which adds a TOML parser to the Python standard library. This means that in Python 3.11, you'll be able to import tomllib and have a module that can parse TOML files.
But what are TOML files, and why is a TOML parser being added to Python? This blog post explains what TOML files look like, how they're used, and the Python code that lets you parse them.
TOML stands for Tom's Obvious, Minimal Language. It is a way to structure data in a text file, much like what JSON, XML, and YAML files do. The TOML standard is set up so that it's easy for humans to write TOML files in a text editor for use as configuration files. The syntax is very similar to .ini files on Windows. Here's an example TOML file:
# My Dungeons & Dragons character.
name = "Mordnilap"
class = "Magic user"
maxHP = 14
currentHP = 2
inventory = ["apple", "enchanted staff", "healing potion"]
cursed = false

[spells]
# The integers represent how many times a day they can be cast.
fireball = 3
featherFall = 2
TOML files are designed to be parsed easily and unambiguously into a dictionary data structure. With the TOML parser that will be in Python 3.11, you'll be able to parse this text and turn it into the following Python dictionary:
{'class': 'Magic user',
'currentHP': 2,
'cursed': False,
'inventory': ['apple', 'enchanted staff', 'healing potion'],
'maxHP': 14,
'name': 'Mordnilap',
'spells': {'featherFall': 2, 'fireball': 3}}
Python already has standard library modules for parsing XML and JSON text (the xml and json modules, respectively). There are third-party modules for parsing TOML files, but Python 3.11 will include one in the standard library so that all Python programs can access it without the additional step of installing a third-party TOML parsing module.
As you can see in the example TOML file above, TOML is a great format for structuring data for configuration files. It doesn't have the same column-and-row format that an Excel spreadsheet or CSV file would use, though.
In Python 3.11, the TOML module will be named tomllib, since a third-party module named toml already exists. According to PEP 680, the TOML module in 3.11 will be based on the existing tomli third-party module.
Writing Python Code with tomllib
You don't have to wait for Python 3.11 to be released to parse TOML today. The tomllib module is based on the third-party tomli package, which you can install with the pip tool by running pip3 install --user tomli in a Terminal window on macOS and Linux, or by running pip install --user tomli in a Command Prompt window on Windows. (Don't run this from the Python interactive shell; that doesn't work.) On older Python versions, write import tomli as tomllib and the rest of the code in this post works unchanged.
Much like how the json module has functions named load() and loads() (which is pronounced "load s" or "load string", not "loads"), the tomli third-party library (and the future tomllib module in Python 3.11) has load() and loads() functions.
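One common way to write code that runs both before and after Python 3.11 is a guarded import. The following is a minimal sketch that assumes the third-party tomli package (the library PEP 680 names as the basis for tomllib) has been installed on older interpreters; the sample TOML string is made up for illustration.

```python
try:
    import tomllib  # Standard library, Python 3.11 and later
except ModuleNotFoundError:
    # Assumption: tomli was installed with pip on an older Python version.
    import tomli as tomllib

# loads() parses a TOML-formatted string into a dictionary.
settings = tomllib.loads('name = "Mordnilap"\nmaxHP = 14\n')
print(settings)
```

Because both modules expose load() and loads(), the rest of the program doesn't need to know which one was imported.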
For example, say we had the Dungeons & Dragons text above in a TOML file named dnd_character.toml. We could read it into our Python program with the following code:
import tomllib  # In the standard library as of Python 3.11
import pprint

# TOML files are always UTF-8 encoded, so say so explicitly.
with open('dnd_character.toml', encoding='utf-8') as fileObj:
    content = fileObj.read()
dnd_char = tomllib.loads(content)
pprint.pprint(dnd_char)
The output of this program from the pprint "pretty print" module is as follows:
{'class': 'Magic user',
'currentHP': 2,
'cursed': False,
'inventory': ['apple', 'enchanted staff', 'healing potion'],
'maxHP': 14,
'name': 'Mordnilap',
'spells': {'featherFall': 2, 'fireball': 3}}
The tomllib.loads() function takes a string, which here we pulled out of a text file named dnd_character.toml. This string could have come from anywhere (say, downloading it off the web with the Requests module). But since reading it from a .toml file is so common, the tomllib.load() function can accept the file object directly:
import tomllib  # In the standard library as of Python 3.11
import pprint

with open('dnd_character.toml', 'rb') as fileObj:
    dnd_char = tomllib.load(fileObj)
pprint.pprint(dnd_char)
Note that the file object you pass to tomllib.load() has to be opened in "read binary" mode, and not just the default "read text" mode, so you need to pass 'rb' as the second argument to the open() function.
Unlike the json module, which has a dumps() function, tomllib has no dumps() function that can create text for TOML files. This is because JSON is meant to be both read and written by programs, while TOML is meant for configuration files. Those files are often read and written by humans, but only read by software. There is a third-party package called tomli-w that has a tomli_w.dumps() function for writing TOML-structured text. The dumps() function won't be a part of the tomllib module in the Python 3.11 standard library, though.
Writing TOML Files in a Text Editor
We've covered how your Python programs can read in TOML files. But when you write TOML by hand, you'll need to know the syntax of the TOML format. Fortunately, Tom's Obvious, Minimal Language has an obvious, minimal format that's easy to remember. (For full details, you can read the documentation on the TOML website.)
TOML files can have comments, which are ignored by parsers. They look like Python comments: they begin with a # and extend to the end of the line. Blank lines are also allowed and ignored in TOML files.
The basic key-value pairs are written like Python assignment statements, with the key and value separated by an equal sign:
name = "Mordnilap"
class = "Magic user"
maxHP = 14
currentHP = 2
The tomllib.loads() function takes the above TOML and returns the following Python dictionary:
{'name': 'Mordnilap', 'class': 'Magic user', 'maxHP': 14, 'currentHP': 2}
TOML has a few standard data types common to most programming languages, such as strings enclosed in single or double quotes (like 'Mordnilap') and integers (like 14).
TOML also has floating-point numbers with decimal points (like 3.1415 or -42.0) and lowercase Boolean values true and false. Note that true and false are written differently in TOML than Python's True and False. TOML isn't a Python-specific format, but a general format that can be parsed by many programming languages. In this case, the lowercase true and false are similar to how JavaScript and JSON write Boolean values.
Dates and times are also first-class values in TOML; they have a specific format (detailed in RFC 3339) and you don't need to write them as strings enclosed in quotes. There are a few different ways of writing them. For dates and times that include a time zone, you can use the Z suffix to mean UTC. Or you can specify an offset from UTC:
odt1 = 1979-05-27T07:32:00Z       # UTC time zone
odt2 = 1979-05-27T00:32:00-07:00  # UTC minus 7 hours
The tomllib.loads() function takes the above TOML and returns the following Python dictionary:
{'odt1': datetime.datetime(1979, 5, 27, 7, 32, tzinfo=datetime.timezone.utc),
'odt2': datetime.datetime(1979, 5, 27, 0, 32, tzinfo=datetime.timezone(datetime.timedelta(days=-1, seconds=61200)))}
You can also have arrays in TOML, such as our example's inventory = ["apple", "enchanted staff", "healing potion"] producing the Python dictionary {"inventory": ["apple", "enchanted staff", "healing potion"]}. Arrays can also include other arrays.
Instead of dictionaries, TOML has tables, which can be formed under square-bracketed table headers:
[Mordnilap]
class = "Magic user"
maxHP = 14
currentHP = 2

[Bilbo]
class = "Fighter"
maxHP = 16
currentHP = 16
The tomllib.loads() function takes the above TOML and returns the following Python dictionary:
{'Bilbo': {'class': 'Fighter', 'currentHP': 16, 'maxHP': 16},
'Mordnilap': {'class': 'Magic user', 'currentHP': 2, 'maxHP': 14}}
You can also use dotted notation to create these dictionaries. The following TOML produces an identical dictionary to the previous example:
Mordnilap.class = "Magic user"
Mordnilap.maxHP = 14
Mordnilap.currentHP = 2
Bilbo.class = "Fighter"
Bilbo.maxHP = 16
Bilbo.currentHP = 16
But you can also use more compact inline tables in TOML:
Mordnilap = { "class" = "Magic user", "maxHP" = 14, "currentHP" = 2 }
Bilbo = { "class" = "Fighter", "maxHP" = 16, "currentHP" = 16 }
There are several other aspects of the TOML syntax, which you can read about in the TOML documentation, but I hope this blog post has given you a good introduction to this helpful tool that will be part of the Python standard library as the tomllib module as of version 3.11.