CSV, JSON, and XML are all data serialization formats used to store data as plaintext files. Whether you’re dealing with spreadsheets exported as CSV files or web APIs returning JSON data, being able to read, write, and edit serialization formats with Python code will help you automate data-related tasks with ease.
Practice QuestionsThese questions test your understanding of Python’s csv, json, and xml modules for working with text-based file formats.
Each line in a CSV file (which uses the .csv file extension) represents a row in a spreadsheet, and commas separate the cells in the row. Many apps and programming languages support these files, making them a straightforward way to represent spreadsheet data.
1. Of the CSV, JSON, and XML formats, which most closely resembles a spreadsheet?
2. What does CSV stand for?
3. Are CSV files plaintext files? Can you view them in a text editor such as Notepad or TextEdit?
4. What data types do CSV files support?
5. CSV files don’t have as many features as Excel spreadsheet files. What is their main advantage?
6. True or false: The data in a CSV file cannot include commas.
7. Passing the filename of a CSV file to csv.reader() and csv.writer() doesn’t seem to work. Why?
8. The csv.reader() function returns a reader object. How can you get a list value containing all the rows of data from this object?
9. The csv.reader() function returns a reader object. How can you loop over each row of data from this object?
10. Assume the reader object’s data has been converted into a list value stored in a variable named example_data. Which row and column does example_data[6][1] access?
11. What data type can you pass to the writerow() method of writer objects?
12. A reader object represents each row as a list of strings. How does a DictReader object represent each row?
13. When you read a CSV file with a DictReader object, where does the object get its keys?
14. What data type can you pass to the writerow() method of DictWriter objects?
15. What is wrong with the following code?
file_obj = open('example.csv')
writer = csv.writer(file_obj)16. What does TSV stand for?
17. How can you make the csv module read a TSV file?
While CSV files are useful for storing rows of data that have the exact same columns, the JSON and XML formats can store a variety of data structures. These formats aren’t specific to Python; many programming languages have functions for reading and writing this kind of data. Both JSON and XML organize data using the equivalent of nested Python dictionaries and lists.
18. What is one term for dictionary-like data structures used in other programming languages?
19. What is a term for list-like data structures used in other programming languages?
20. Plaintext formats like JSON and XML aren’t disk-space efficient. What advantage do they have?
21. What does JSON stand for?
22. What does XML stand for?
23. Which format resembles Python syntax more closely, JSON or XML?
24. Of ['cat', 'dog',] and ["cat", "dog"], which is Python syntax and which is JSON syntax?
25. Of [True, False] and [true, false], which is Python syntax and which is JSON syntax?
26. Which format do APIs usually use to deliver their responses, JSON or XML?
27. In the JSON module, what does the s in loads() and dumps() stand for?
28. What code takes the Python dictionary {'temperature': 72} and returns the Python string '{"temperature": 72}'?
29. What other markup language does XML syntax resemble?
30. What is the closing XML tag for <spam>?
31. Why is <person><name>Alice Doe</person></name> invalid XML syntax?
32. Write the following Python data as JSON: {"address": {"street": "100 Larkin St.", "city": "San Francisco", "zip": "94102"}}.
33. Write the following Python data as XML: {"address": {"street": "100 Larkin St.", "city": "San Francisco", "zip": "94102"}}.
34. What are the names of the XML attributes in the following XML: <address street="100 Larkin St." city="San Francisco" zip="94102" />?
35. What is the name of the first element in an XML document that contains all other elements?
36. What does DOM stand for?
37. What does SAX stand for?
38. What is the name of the approach to reading XML documents that reads XML elements one at a time, instead of loading the entire XML document into memory?
39. What is the benefit of the DOM approach of reading XML documents?
40. After running import xml.etree.ElementTree as ET, what function can you call to obtain an Element object of the root element in a string of XML?
41. What is wrong with this code: ET.tostring('<person>Albert</person>', encoding='utf-8')?
42. The tostring() function returns a bytes object instead of a string. If the person variable contains an Element object, what code returns the XML string of this element?
43. Assume root stores an Element object. What does the following code do?
for elem in root.iter(): print(elem.tag, elem.text)
44. What does the following code do, assuming root stores an Element object?
for elem in root.iter('number'):
print(elem.tag, elem.text)45. Assume you have an Element object for the XML <person></person>. What is the value of its text attribute, given the element has no text?
Practice ProjectsPractice working with CSV, JSON, and XML files as you complete the following projects.
The Fizz Buzz program is a classic practice problem. The goal is to generate a series of numbers starting at 1 and going up. If the number is divisible by 3, the program prints Fizz. If the number is divisible by 5, the program prints Buzz. If the number is divisible by both 3 and 5, the program prints Fizz Buzz. Otherwise, the program prints the number. For example, the beginning of the series would look like this:
1, 2, Fizz, 4, Buzz, Fizz, 7, 8, Fizz, Buzz, 11. Fizz, 13, 14, Fizz Buzz...
Write a program that plays Fizz Buzz for the numbers 1 to 10,000 and outputs the results to a file named fizzBuzz.csv. There should be 10 entries in each row, and 1,000 rows total, so that the file contents look like this:
1,2,Fizz,4,Buzz,Fizz,7,8,Fizz,Buzz
11,Fizz,13,14,Fizz Buzz,16,17,Fizz,19,Buzz
Fizz,22,23,Fizz,Buzz,26,Fizz,28,29,Fizz Buzz
--snip--
Save this program in a file named fizzBuzzCSV.py.
Let’s modify the Guess the Number game from Chapter 3 of Automate the Boring Stuff with Python so that it collects statistics about the player’s performance in a CSV file. The original program challenges the player to guess a random number between 1 and 20. The player has only six guesses to get it right, but the program tells them if their guess is too high or too low.
Our modified program records the guesses the player made, whether they won or lost the game, and how many guesses it took. Other programs can read the data in this CSV file to perform further analysis. Copy the original game source code from the downloadable materials at https://nostarch.com/automate-boring-stuff-python-3rd-edition (or re-create the game yourself) and modify it with the following features:
Once you’ve finished, play a few games to generate data for the program. The guessStats.csv file should look something like this:
Secret Number,Won,Attempts,Guess 1,Guess 2,Guess 3,Guess 4,Guess 5,Guess 6 15,True,2,10,15 16,True,4,10,15,18,16 8,False,6,1,2,3,4,5,20
Now you can use this program to conduct important number-guessing research and answer profound questions like “How often are six guesses enough to win?” and “They didn’t start with a guess of 10? Really? Really?”
Note that if you want your CSV writer object to append values to the guessStats.csv file instead of overwriting it, you should open the file in append mode using the 'a' argument, as in csv_file = csv.writer(open('guessStats.csv', 'a', newline='')).
Save this program in a file named guessStatsCSV.py.
This project is similar to the previous “Guess the Number Statistics (CSV)” project, except it should save the statistics to a guessStats.json file. The JSON data should be formatted as a list of dictionaries, each representing a single game. Each dictionary should have the keys 'Secret Number', 'Won', and 'Guesses' as follows:
Once you’ve finished the program, play a few games to generate data. The guessStats.json file should look something like this:
[{"Secret Number": 18, "Guesses": [10, 15, 18], "Won": true}, {"Secret
Number": 14, "Guesses": [10, 14], "Won": true}, {"Secret Number": 14,
"Guesses": [2, 4, 5, 7, 8, 9], "Won": false}]
Note that your program must read in the existing JSON data in order to append the new game’s statistics to it. Then, it should write the entire statistics log back to the guessStats.json file.
Save this program in a file named guessStatsJSON.py.
This project is similar to the previous two projects, except it should save the statistics to a guessStats.xml file. Format the XML data as follows:
Once you’ve finished the program, play a few games to generate data. The guessStats.xml file should look something like this:
<stats><game secret_number="4" won="True"><guess>10</guess><guess>5</guess> <guess>6</guess><guess>9</guess><guess>3</guess><guess>4</guess></game> <game secret_number="12" won="False"><guess>2</guess><guess>4</guess><guess>5 </guess><guess>6</guess><guess>8</guess><guess>10</guess></game></stats>
Save this program in a file named guessStatsXML.py.