This blog post examines different ways that Python lists and dictionaries can be used to represent a 2D data structure. I also write some test programs to measure the performance of each data structure.
A two-dimensional or 2D grid is used in a variety of applications. Think of chess boards, top-down video games, spreadsheets, Conway's Game of Life simulation are all examples of data that is stored in a two-dimensional grid. We can use a Cartesian coordinate system to create unique "addresses" for each item in the grid. The x coordinate is the horizontal address and the y coordinate is the vertical address.
The Cartesian coordinate system in programming is different from the one you may have learned about in math class. The origin (that is, the (0, 0) coordinate) is in the top-left corner of the screen, and while the x coordinates increase going to the right as in mathematics, the y coordinates increase going down rather than increase going up. In a spreadsheet program like Excel, the x coordinates may be represented by letters instead of numbers, but we'll use numbers for both the x and y coordinates. When listed together, the x coordinate comes first. In the coordinates (2, -5), 2 is the x coordinate and -5 is the y coordinate.
As an aside, here's a list of Python projects that utilize a 2D data structure that come from my free book, The Big Book of Small Python Projects:
- Project #13 - Conway's Game of Life
- Project #23 - Etching Drawer
- Project #37 - Hungry Robots
- Project #39 - Langton's Ants
- Project #44 - Maze Runner 2D
- Project #73 - Sudoku Puzzle
- Project #79 - 2048
The 2D Data Structures
By "2D data structure" I mean a data structure that contains other values the way that lists and dictionaries contain other values. While the data in lists can be accessed by an integer index and the data in dictionaries can be accessed by a key value, the data in our 2D data structures will be accessed by two integers: the x and y coordinates.
I'll be comparing three different data structures in this blog post:
- A "1D list", where the data is stored in a Python list. The data at the coordinates (x, y) are stored at the index
grid[y * WIDTH + x].
- A "2D list", where the data is stored in a Python list of lists. The data at the coordinates (x, y) are stored at
- A dictionary, where the data is stored in... a Python dictioanry. The data at the coordiantes (x, y) are stored at
grid[(x, y)]. (We can just specify
grid[x, y]and Python will automaticall interpret this as a tuple containing
There are a few advantages and disadvantages that I can see off the top of my head:
- The 1D list and 2d list must have a fixed width and height. The dictionary can store data at any arbitrary coordinates.
- The 1D list and 2d list use the same full amount of memory no matter how empty or full they are. The dictionary only uses up as much memory as it contains data.
- The 2D lists can be tricky to work with, especially mixing the x and y coordinates with each other.
- This is conjecture, but I think that as the dictionary becomes full, it uses up more memory than the 1D or 2D lists.
- This is conjectue, but I think the dictionary might be slower than the lists at accessing and storing data.
Without going into the specifics of Big O algorithm analysis (which you can learn about in Chapter 13 of my free book, Beyond the Basic Stuff with Python), accessing and storing data is a constant time operation for lists, lists of lists, and dictionaries. This means that it generally doesn't take longer to access or store data in lists or dictionaries as they fill up with data.
However, I'm more interested in the specific performance metrics of these as well as the memory usage. I'm going to write tests to measure these for these three different approaches to storing data in a grid. You can download and run these tests yourself on your computer. I'm running them with Python 3.10.0 on my T480s Thinkpad laptop running Windows 10.
I use Python's
timeit module to measure the performance of the test code. You can also learn about this module in Beyond the Basic Stuff with Python.
Here's the gridtest.py program I wrote to measure the runtime speed and memory usage of these three 2D grid data structures. I put the output I got next to its respective
The 2D list approach was the fastest and the dictionary approach was the slowest and used 10x as much memory as the 1D and 2D lists. But the dictionary approach gives you the flexibility of unbounded grids while the 1D and 2D lists have fixed width and height. The 1D list's requirement to calculate the index actually made it slower than the dictionary.
So if you need to have a 2D grid data structure, use the list-of-lists approach, unless you need an unbounded grid. In that case, the dictionary approach is significantly slower but offers this flexibility. Or, if performance isn't important, the dictionary approach has the easiest implementation.
In my personal view, ease of implementation and debuggability are the most important factors and my use cases don't tend to be at large enough scales where the performance differences are significant. I'd go with the dictionary approach.