18CSV, JSON, AND XML FILES

CSV, JSON, and XML are all data serialization formats used to store data as plaintext files. Whether you’re dealing with spreadsheets exported as CSV files or web APIs returning JSON data, being able to read, write, and edit serialization formats with Python code will help you automate data-related tasks with ease.

A simple drawing of a light bulb. LEARNING OBJECTIVES

  • Understand the popular plaintext data serialization formats CSV, JSON, and XML.
  • Know the difference between the spreadsheet-like CSV format and the data structure–like JSON and XML formats.
  • Know the benefits of using the Python packages for the CSV, JSON, and XML formats over writing and reading these plaintext files directly.
  • Be able to use the advanced features of the csv module, such as DictReader and DictWriter objects.
  • Understand the role of JSON and XML data in accessing online service APIs.

A grey circle with a white question mark at the center Practice Questions

These questions test your understanding of Python’s csv, json, and xml modules for working with text-based file formats.

The CSV Format

Each line in a CSV file (which uses the .csv file extension) represents a row in a spreadsheet, and commas separate the cells in the row. Many apps and programming languages support these files, making them a straightforward way to represent spreadsheet data.

  1. 1. Of the CSV, JSON, and XML formats, which most closely resembles a spreadsheet?

  2. 2. What does CSV stand for?

  3. 3. Are CSV files plaintext files? Can you view them in a text editor such as Notepad or TextEdit?

  4. 4. What data types do CSV files support?

  5. 5. CSV files don’t have as many features as Excel spreadsheet files. What is their main advantage?

  6. 6. True or false: The data in a CSV file cannot include commas.

  7. 7. Passing the filename of a CSV file to csv.reader() and csv.writer() doesn’t seem to work. Why?

  8. 8. The csv.reader() function returns a reader object. How can you get a list value containing all the rows of data from this object?

  9. 9. The csv.reader() function returns a reader object. How can you loop over each row of data from this object?

  10. 10. Assume the reader object’s data has been converted into a list value stored in a variable named example_data. Which row and column does example_data[6][1] access?

  11. 11. What data type can you pass to the writerow() method of writer objects?

  12. 12. A reader object represents each row as a list of strings. How does a DictReader object represent each row?

  13. 13. When you read a CSV file with a DictReader object, where does the object get its keys?

  14. 14. What data type can you pass to the writerow() method of DictWriter objects?

  15. 15. What is wrong with the following code?

    file_obj = open('example.csv')
    writer = csv.writer(file_obj)
  16. 16. What does TSV stand for?

  17. 17. How can you make the csv module read a TSV file?

Versatile Plaintext Formats

While CSV files are useful for storing rows of data that have the exact same columns, the JSON and XML formats can store a variety of data structures. These formats aren’t specific to Python; many programming languages have functions for reading and writing this kind of data. Both JSON and XML organize data using the equivalent of nested Python dictionaries and lists.

  1. 18. What is one term for dictionary-like data structures used in other programming languages?

  2. 19. What is a term for list-like data structures used in other programming languages?

  3. 20. Plaintext formats like JSON and XML aren’t disk-space efficient. What advantage do they have?

  4. 21. What does JSON stand for?

  5. 22. What does XML stand for?

  6. 23. Which format resembles Python syntax more closely, JSON or XML?

  7. 24. Of ['cat', 'dog',] and ["cat", "dog"], which is Python syntax and which is JSON syntax?

  8. 25. Of [True, False] and [true, false], which is Python syntax and which is JSON syntax?

  9. 26. Which format do APIs usually use to deliver their responses, JSON or XML?

  10. 27. In the JSON module, what does the s in loads() and dumps() stand for?

  11. 28. What code takes the Python dictionary {'temperature': 72} and returns the Python string '{"temperature": 72}'?

  12. 29. What other markup language does XML syntax resemble?

  13. 30. What is the closing XML tag for <spam>?

  14. 31. Why is <person><name>Alice Doe</person></name> invalid XML syntax?

  15. 32. Write the following Python data as JSON: {"address": {"street": "100 Larkin St.", "city": "San Francisco", "zip": "94102"}}.

  16. 33. Write the following Python data as XML: {"address": {"street": "100 Larkin St.", "city": "San Francisco", "zip": "94102"}}.

  17. 34. What are the names of the XML attributes in the following XML: <address street="100 Larkin St." city="San Francisco" zip="94102" />?

  18. 35. What is the name of the first element in an XML document that contains all other elements?

  19. 36. What does DOM stand for?

  20. 37. What does SAX stand for?

  21. 38. What is the name of the approach to reading XML documents that reads XML elements one at a time, instead of loading the entire XML document into memory?

  22. 39. What is the benefit of the DOM approach of reading XML documents?

  23. 40. After running import xml.etree.ElementTree as ET, what function can you call to obtain an Element object of the root element in a string of XML?

  24. 41. What is wrong with this code: ET.tostring('<person>Albert</person>', encoding='utf-8')?

  25. 42. The tostring() function returns a bytes object instead of a string. If the person variable contains an Element object, what code returns the XML string of this element?

  26. 43. Assume root stores an Element object. What does the following code do?

    for elem in root.iter():
        print(elem.tag, elem.text)
  27. 44. What does the following code do, assuming root stores an Element object?

    for elem in root.iter('number'):
        print(elem.tag, elem.text)
  28. 45. Assume you have an Element object for the XML <person></person>. What is the value of its text attribute, given the element has no text?

A simple drawing of a sharpened pencil. Practice Projects

Practice working with CSV, JSON, and XML files as you complete the following projects.

Fizz Buzz (CSV)

The Fizz Buzz program is a classic practice problem. The goal is to generate a series of numbers starting at 1 and going up. If the number is divisible by 3, the program prints Fizz. If the number is divisible by 5, the program prints Buzz. If the number is divisible by both 3 and 5, the program prints Fizz Buzz. Otherwise, the program prints the number. For example, the beginning of the series would look like this:

1, 2, Fizz, 4, Buzz, Fizz, 7, 8, Fizz, Buzz, 11. Fizz, 13, 14, Fizz Buzz...

Write a program that plays Fizz Buzz for the numbers 1 to 10,000 and outputs the results to a file named fizzBuzz.csv. There should be 10 entries in each row, and 1,000 rows total, so that the file contents look like this:

1,2,Fizz,4,Buzz,Fizz,7,8,Fizz,Buzz
11,Fizz,13,14,Fizz Buzz,16,17,Fizz,19,Buzz
Fizz,22,23,Fizz,Buzz,26,Fizz,28,29,Fizz Buzz
--snip--

Save this program in a file named fizzBuzzCSV.py.

Guess the Number Statistics (CSV)

Let’s modify the Guess the Number game from Chapter 3 of Automate the Boring Stuff with Python so that it collects statistics about the player’s performance in a CSV file. The original program challenges the player to guess a random number between 1 and 20. The player has only six guesses to get it right, but the program tells them if their guess is too high or too low.

Our modified program records the guesses the player made, whether they won or lost the game, and how many guesses it took. Other programs can read the data in this CSV file to perform further analysis. Copy the original game source code from the downloadable materials at https://nostarch.com/automate-boring-stuff-python-3rd-edition (or re-create the game yourself) and modify it with the following features:

  • For each game played, the program should add one row to a guessStats.csv file.
  • If this file doesn’t exist, the program should create a blank file named guessStats.csv and write ['Secret Number', 'Won', 'Attempts', 'Guess 1', 'Guess 2', 'Guess 3', 'Guess 4', 'Guess 5', 'Guess 6'] in the header row.
  • In the 'Secret Number' column of each subsequent row, the program should store the randomly selected integer that the game produced.
  • The value in the 'Won' column should be True if the player guessed the secret number and won the game; otherwise, it should be False.
  • The 'Attempts' column should record how many guesses the player made. For example, if the player won on their first guess, the number in this column should be 1. If the player lost the game and didn’t guess the number within the allowed six guesses, the number in this column should be 6.
  • The 'Guess 1', 'Guess 2', and remaining columns should contain the number that the player guessed.

Once you’ve finished, play a few games to generate data for the program. The guessStats.csv file should look something like this:

Secret Number,Won,Attempts,Guess 1,Guess 2,Guess 3,Guess 4,Guess 5,Guess 6
15,True,2,10,15
16,True,4,10,15,18,16
8,False,6,1,2,3,4,5,20

Now you can use this program to conduct important number-guessing research and answer profound questions like “How often are six guesses enough to win?” and “They didn’t start with a guess of 10? Really? Really?”

Note that if you want your CSV writer object to append values to the guessStats.csv file instead of overwriting it, you should open the file in append mode using the 'a' argument, as in csv_file = csv.writer(open('guessStats.csv', 'a', newline='')).

Save this program in a file named guessStatsCSV.py.

Guess the Number Statistics (JSON)

This project is similar to the previous “Guess the Number Statistics (CSV)” project, except it should save the statistics to a guessStats.json file. The JSON data should be formatted as a list of dictionaries, each representing a single game. Each dictionary should have the keys 'Secret Number', 'Won', and 'Guesses' as follows:

  • The 'Secret Number' key’s value is the randomly selected number for the player to guess, stored as an integer.
  • The 'Won' key’s value is a Boolean indicating whether the player won or lost.
  • The 'Guesses' value is a list of the player’s guesses, in order. The length of this list can be anywhere from one to six integers.

Once you’ve finished the program, play a few games to generate data. The guessStats.json file should look something like this:

[{"Secret Number": 18, "Guesses": [10, 15, 18], "Won": true}, {"Secret
Number": 14, "Guesses": [10, 14], "Won": true}, {"Secret Number": 14,
"Guesses": [2, 4, 5, 7, 8, 9], "Won": false}]

Note that your program must read in the existing JSON data in order to append the new game’s statistics to it. Then, it should write the entire statistics log back to the guessStats.json file.

Save this program in a file named guessStatsJSON.py.

Guess the Number Statistics (XML)

This project is similar to the previous two projects, except it should save the statistics to a guessStats.xml file. Format the XML data as follows:

  • Create a root <stats> element containing a series of <game> elements, each representing a single game.
  • Give each <game> element the attributes 'secret_number' and 'won'.
  • The 'secret_number' attribute’s value should be the randomly selected number for the player to guess, represented as an integer.
  • The 'won' attribute’s value should be either 'True' or 'False', depending on whether the player won or lost.
  • Each <game> element should have a series of <guess> elements representing a player’s guess. There can be anywhere from one to six <guess> elements in each <game> element.

Once you’ve finished the program, play a few games to generate data. The guessStats.xml file should look something like this:

<stats><game secret_number="4" won="True"><guess>10</guess><guess>5</guess>
<guess>6</guess><guess>9</guess><guess>3</guess><guess>4</guess></game>
<game secret_number="12" won="False"><guess>2</guess><guess>4</guess><guess>5
</guess><guess>6</guess><guess>8</guess><guess>10</guess></game></stats>

Save this program in a file named guessStatsXML.py.