Most computer users have interacted with structured data using spreadsheets. Even if you’re an experienced Microsoft Excel user, however, you’ll never be as fast or as accurate as a computer program. Unless your idea of a good time is sifting through large spreadsheets (and no judgment if it is), you should master Python’s OpenPyXL package so that you can automate your boring Excel tasks.
Practice QuestionsThese questions test your understanding of reading and writing data in Excel files using the OpenPyXL package.
An Excel spreadsheet document is called a workbook, and a single workbook exists in a file with the .xlsx extension. Each workbook can contain multiple sheets (also called worksheets). Each sheet has columns (addressed using letters starting at A) and rows (addressed using numbers starting at 1). A box at a particular column and row is called a cell. Each cell can contain a number or text value. The grid of cells and their data make up a sheet. OpenPyXL allows your Python programs to access this data by cell address, just as you can access the values in a dictionary by their key.
Answer the following questions about reading the data in Excel spreadsheets. Where relevant, imagine that a variable named sheet stores a Worksheet object.
1. Which object represents a .xlsx file: a Workbook or a Worksheet?
2. What is the active worksheet?
3. Do worksheets have titles? Do workbooks?
4. For a Workbook object stored in a variable named wb, what code evaluates to a list of strings of all Worksheet titles?
5. If you have a Cell object in a variable named c, what is the data type of c.column?
6. Does the code sheet.cell(row=1, column=2) refer to the cell B1 or the cell C2?
7. What Worksheet object attributes store the highest row and column in the worksheet?
8. Use the get_column_letter() function to calculate the column letters for column 900.
9. Use the column_index_from_string() function to calculate the column number for column ZZ.
10. How many cells does sheet['A1': 'C3'] evaluate to?
11. What code evaluates to a list of all the Cell objects in column C?
OpenPyXL also provides ways of writing data, meaning your programs can create and edit spreadsheet files. Anytime you modify the Workbook object or its sheets and cells, you must call the save() workbook method to save the spreadsheet file. Answer the following questions about creating and editing data in Excel spreadsheets. Where relevant, imagine that a variable named sheet stores a Worksheet object. To store a Workbook object, use a variable named wb.
12. What function creates a new Workbook object?
13. What Workbook method creates a new, empty Worksheet object?
14. How do you delete Worksheet objects?
15. What does the code sheet.title = 'New Title' do?
16. If you’ve opened a file named example.xlsx with OpenPyXL and made changes to it, how can you keep the original spreadsheet without the changes?
17. What does this code do: wb.create_sheet(index=len(wb.sheetnames))?
18. How do you set the value of cell A3 in a Worksheet object to the text “Hello”?
19. Does sheet.cell(row=1, column=1).value = 42 change the value in cell A1 or cell B2?
Styling certain cells, rows, or columns can help you emphasize important areas in your spreadsheet. To customize font styles in cells, import the Font() function from the openpyxl.styles module. You can assign Font objects to a Cell object’s font attribute to apply the font style. Answer the following questions about creating custom font styles with OpenPyXL. Imagine that a variable named sheet stores a Worksheet object.
20. How do you import the Font() function in OpenPyXL?
21. What are four keyword arguments you can use when creating Font objects?
22. What code would create a Font object that represents italicized font with a size of 24 points?
23. What code would create a Font object that represents a bold Times New Roman font?
24. How do you set the font for cell B3 to a Font object stored in a variable named font?
25. If you set a cell’s font style to a 200-point Font object, will the height of the cell increase to accommodate this large font?
26. After importing the Font() function in the interactive shell, run help(Font). In the documentation that appears, notice that there are more keyword arguments than mentioned in Chapter 14 of Automate the Boring Stuff with Python. One of them is strike. Create a Font object with strike=True and apply it to a cell in a Worksheet object, then open this spreadsheet in Excel or another spreadsheet app. Describe what strike=True did to the cell’s font style.
Excel formulas, which begin with an equal sign (=), can configure cells to contain values calculated from other cells. OpenPyXL can place formulas into cells, but it doesn’t have the ability to calculate Excel formulas or populate cells with the results. You’ll need to open the spreadsheet in Excel to have it run the formulas and save their results.
27. What character do cells with Excel formulas begin with?
28. Name one Excel formula.
29. Are Excel formulas the same as Python functions?
30. Can OpenPyXL carry out the calculations of Excel formulas?
31. How can you have OpenPyXL return the resulting calculation of an Excel formula in a cell, instead of the text of the formula itself?
32. How can you have OpenPyXL return the text of the formula itself in a cell, instead of the resulting calculation?
In Excel, adjusting the sizes of rows and columns is as easy as clicking and dragging the edges of a row or column header. But if you need to set the size of a row or column based on its cells’ contents, or if you want to set sizes in a large number of spreadsheet files, it’s much quicker to write a Python program to do it. You can also hide rows and columns from view, or freeze them in place so that they’re always visible on the screen and appear on every page when you print the spreadsheet, which is handy for headers.
Answer the following questions about changing the sizes of cells, merging cells, and freezing cells with OpenPyXL. Where relevant, imagine that a variable named sheet stores a Worksheet object.
33. What code would set the height of row 3 to 100?
34. What code would set cell D2 to a square shape with sides of length 200?
35. What’s wrong with the following code: sheet.row_dimensions[1].width = 70?
36. What code would merge the cells A10 and A22?
37. What code would unmerge the cells merged in the previous question?
38. Which rows are frozen if sheet.freeze_panes = 'A2'?
39. Which rows are frozen if sheet.freeze_panes = 'A1'?
40. How do you unfreeze all rows and columns in sheet?
OpenPyXL lets you create multiple kinds of charts using the data in a sheet’s cells. Your Python code can automatically generate and insert these graphs into your Excel spreadsheets. Answer the following questions about creating different kinds of charts with OpenPyXL.
Practice ProjectsApply your new knowledge of Excel spreadsheets with these practice projects.
While you can always press ctrl-F in Excel to find specific text in a spreadsheet, you can’t search an entire folder of .xlsx spreadsheet files at once. Write a function named find_in_excel(search_text) that searches every worksheet in every .xlsx file in the current working directory (but not its subfolders) for the given search_text string.
The search should be a case-insensitive, partial text match. For example, find_in_excel('name') would match with Excel cells that contain filename or Name. For simplicity, the function should check only the active worksheet in the Excel file and treat all cell values as strings. Remember to pass in the keyword argument data_only=True to load_workbook() so that you can search calculated formula results for search_text as well.
The function should return a dictionary whose keys are filename strings and whose values are a list of strings of cell addresses containing search_text. For example, calling find_in_excel('name') could return {'example.xlsx': ['A2'], 'spam.xlsx': ['D1', 'D2']} to indicate that name was found in cell A2 in example.xlsx and in cells D1 and D2 of spam.xlsx.
You’ll want to loop over the files returned by os.listdir() to find all the spreadsheets in the current working directory, skipping the files that don’t end with .xlsx.
Save this program in a file named findAllExcel.py.
Imagine that your manager wants you to free up disk space on your hard drive. They’d like a report of all the files in your home folder, along with their sizes, in a nicely formatted Excel spreadsheet. Oh, and if you could email it to them by the end of the day, that would be great. (They will then forget to read it.)
Start this project by writing a function named get_home_folder_size() that returns a list of tuples. Each tuple should have two items: a filename string and an integer of the file’s size in bytes. Using the os.listdir() function and the stat() method of Path objects discussed in Chapter 10 of Automate the Boring Stuff with Python, your program can examine every file in the home folder.
Note that you should enclose the call to stat() in try and except blocks so that you can skip the file if file permissions or some other issue causes an error. You can write the function code yourself or use the following template:
import openpyxl, os from pathlib import Path def get_home_folder_size(): filenames_and_sizes = [] # Loop over everything in the home folder: for filename in os.listdir(Path.home()): absolute_file_path = Path.home() / filename # Skip folders/directories: if absolute_file_path.is_dir(): continue # Get file size: try: file_size = absolute_file_path.stat().st_size except Exception: # Skip files with permissions errors: continue # Record filename and size: filenames_and_sizes.append((filename, file_size)) return filenames_and_sizes # Uncomment to print the hundred largest filenames and sizes: #print(get_home_folder_size()) # TODO: Write code that puts the filenames and sizes into a spreadsheet:
If you use the provided code, the returned list of tuples could look like this:
[('.bash_history', 3557), ('.python_history', 2601), ('calc.exe',
27648), ('deleteme.mp3', 7200), ('donut.py', 2519), ... ('ttt.py',
2607)]
Next, use this data to create an Excel spreadsheet. Write a function named make_excel_report(filenames_and_sizes) that calls the get_home_folder_size() function and places the results in an Excel file named homeFilesReport.xlsx. In this Excel spreadsheet, column A should list the filenames and column B should list the file sizes.
Save this program in a file named homeFilesReportExcel.py.