Files are at the core of how computers store data. You can copy, rename, move, compress, and delete files yourself with the mouse and keyboard. But if you want to work with thousands or millions of files, you’ll save time by writing a program to do so. By learning to work with Python’s file-related features, you can automate complicated file management and minimize the potential for human error.
Practice QuestionsThese questions test your understanding of Python’s shutil, os, and zipfile modules, as well as Path objects.
The shutil module has functions that let you copy, move, rename, and delete files in your Python programs. Answer the following questions about these functions.
1. What does shutil stand for?
2. What character does Windows use to separate folders in a filepath?
3. What character does macOS and Linux use to separate folders in a filepath?
4. Which of these are actual functions: shutil.copy(), shutil.copyfile(), shutil.copytree(), or shutil.filecopy()?
5. Can the shutil.move() function move files, folders, or both?
6. What module is the makedirs() function in?
7. Is there a difference between os.makedirs('eggs') and os.makedirs (Path('eggs'))?
8. The makedirs() function normally raises an exception if the directory it tries to make already exists. What keyword argument can suppress this exception?
9. Before running code that deletes files, why should you first do a dry run?
10. What functions in the os module delete files?
11. What function in the shutil module deletes an entire folder and its contents?
12. Do the deletion functions in the os and shutil modules delete files and folders permanently, or do they move them to the recycle bin?
You can use the os.walk() function to run some code on every file in a folder and all of its subfolders. This is called walking a directory tree. Answer the following questions focusing on the os.walk() function and walking a directory tree.
13. What are the three things that the os.walk() function returns for each iteration of a for loop?
14. What argument do you pass to os.walk() to have it start from the current working directory?
15. Does the following code delete every file in the eggs folder and its subfolders?
import os
from pathlib import Path
for folder_name, subfolders, filenames in os.walk('eggs'):
for filename in filenames:
os.unlink(Path(folder_name) / filename)16. Using the os.walk() function, write code for a program that prints every subfolder in an eggs folder, including the name of the folder it resides in.
Compressing a file reduces its size, which is useful when transferring it over the internet, and since a ZIP file can contain multiple files and subfolders, it’s a handy way to package several files into one. Your Python programs can create or extract from ZIP files using functions in the zipfile module.
17. What does a .zip file contain?
18. Which of the following imports Python’s zip module: import zipfile or import ZipFile?
19. Which of the following opens a file named example.zip: zipfile.ZipFile('example.zip') or ZipFile.zipfile('example.zip')?
20. What happens if you don’t pass the compress_type=zipfile.ZIP_DEFLATED keyword argument to the write() method?
21. As you increase the compression level from 0 to 9, how is the performance of the ZipFile object affected?
22. What method gives you a list of the content in a ZIP file?
23. Can ZIP files contain folders as well as files?
24. The getinfo() method returns an object with attributes file_size and compress_size. What do these attributes represent?
25. What ZipFile method extracts the entire contents of a ZIP file to the current working directory?
26. What ZipFile method extracts a single file from a ZIP file?
27. Say you have a file named contents.txt. Write the code to put it into a ZIP file named contents.zip at the maximum compression level.
Practice ProjectsAutomate mundane tasks with these short projects.
Let’s say your coworker has a massive project consisting of hundreds of .txt files in different folders on their computer, but they want to make sure the filenames are all unique so that they don’t accidentally copy over any of them. Create a function named find_dup_filenames(folder) to find files with the same filename in different subfolders of folder. This function should return a dictionary whose keys are filenames and whose values are lists of absolute paths to the files. For example, let’s say the following files exist on your computer:
C:\Users\Al\spam.txt C:\Users\Al\eggs.txt C:\Users\Al\subfolder1\spam.txt
Calling find_dup_filenames(r'C:\Users\Al') would return this dictionary:
{'spam.txt': ['C:\\Users\\Al\\spam.txt', 'C:\\Users\\Al\\subfolder1\\spam.txt']}
Your program should call this function, then go through the returned dictionary to print duplicate filenames. Print the filename first, then each of the absolute filepaths with four spaces of indentation, so that the output looks like this:
spam.txt C:\Users\Al\spam.txt C:\Users\Al\subfolder1\spam.txt
You don’t need to test this program on your computer’s root folder; running it on every file on your computer would take too long. Instead, select your home folder or another folder containing only a handful of files and subfolders. Once you know your program works correctly, run it on your root folder to find all duplicate filenames on your computer.
Save this program in a file named dupFilename.py.
Let’s say your manager has very odd ideas about how the folders on your computer should be organized. They want you to have 26 folders named A through Z; in each of these folders, there should be another 26 folders for each letter of the alphabet. For example, the A folder would have 26 folders named AA through AZ, the B folder would have 26 folders named BA through BZ, and so on. This means you need to create 702 folders on your computer. This task is boring (and not useful), but your manager wants it done.
Create a function named make_alpha_folders(root_folder) that creates these 702 subfolders in the root_folder folder. For example, calling make_alpha_folders(r'C:\Users\Al\Desktop') would create these folders:
C:\Users\Al\Desktop\A C:\Users\Al\Desktop\A\AA C:\Users\Al\Desktop\A\AB C:\Users\Al\Desktop\A\AC --snip-- C:\Users\Al\Desktop\B C:\Users\Al\Desktop\B\BA C:\Users\Al\Desktop\B\BB --snip-- C:\Users\Al\Desktop\Z\ZA --snip-- C:\Users\Al\Desktop\Z\ZZ
Save this program in a file named alphaFolders.py.
Let’s say your coworker has thousands of ZIP files, but they only want to extract files in a particularly named folder inside the ZIP file. Write a function named extract_in_folder(zip_filename, folder) that takes the string of a ZIP filename and a string of a folder name. The function should extract only the files in that folder of the ZIP file to the current working directory. For example, if eggs.zip contains the files data1.txt, spam/data2.txt, spam/data3.txt, and bacon/data4.txt and you call extract_in_folder('eggs.zip', 'spam'), the function should extract data2.txt and data3.txt only.
Save this program in a file named extractZipFolder.py.