In this chapter, you’ll write your own recursive program to search for files according to custom needs. Your computer already has some file-searching commands and apps, but often they’re limited to retrieving files based on a partial filename. What if you need to make esoteric, highly specific searches? For example, what if you need to find all files that have an even number of bytes, or files with names that contain every vowel?
You likely will never need to do these searches specifically, but you’ll probably have odd search criteria someday. You’ll be out of luck if you can’t code this search yourself.
As you’ve learned, recursion is especially suited to problems that have a tree-like structure. The filesystem on your computer is like a tree, as you saw back in Figure 2-6. Each folder branches into subfolders, which in turn can branch into other subfolders. We’ll write a recursive function to navigate this tree.
Let’s begin by taking a look at the complete source code for the recursive file-search program. The rest of this chapter explains each section of code individually. Copy the source code for the file-search program to a file named fileFinder.py:
import os
def hasEvenByteSize(fullFilePath):
"""Returns True if fullFilePath has an even size in bytes,
otherwise returns False."""
fileSize = os.path.getsize(fullFilePath)
return fileSize % 2 == 0
def hasEveryVowel(fullFilePath):
"""Returns True if the fullFilePath has a, e, i, o, and u,
otherwise returns False."""
name = os.path.basename(fullFilePath).lower()
return ('a' in name) and ('e' in name) and ('i' in name) and ('o' in name) and ('u' in name)
def walk(folder, matchFunc):
"""Calls the match function with every file in the folder and its
subfolders. Returns a list of files that the match function
returned True for."""
matchedFiles = [] # This list holds all the matches.
folder = os.path.abspath(folder) # Use the folder's absolute path.
# Loop over every file and subfolder in the folder:
for name in os.listdir(folder):
filepath = os.path.join(folder, name)
if os.path.isfile(filepath):
# Call the match function for each file:
if matchFunc(filepath):
matchedFiles.append(filepath)
elif os.path.isdir(filepath):
# Recursively call walk for each subfolder, extending
# the matchedFiles with their matches:
matchedFiles.extend(walk(filepath, matchFunc))
return matchedFiles
print('All files with even byte sizes:')
print(walk('.', hasEvenByteSize))
print('All files with every vowel in their name:')
print(walk('.', hasEveryVowel))
The file-search program’s main function is walk()
, which “walks” across the entire span of files in a base folder and its subfolders. It calls one of two other functions that implement the custom search criteria it’s looking for. In the context of this program, we’ll call these match functions. A match function call returns True
if the file matches the search criteria; otherwise, it returns False
.
The job of the walk()
function is to call the match function once for each file in the folders it walks across. Let’s take a look at the code in more detail.
In Python, you can pass functions themselves as arguments to a function call. In the following example, a callTwice()
function calls its function argument twice, whether it’s sayHello()
or sayGoodbye()
:
Python
>>> def callTwice(func):
... func()
... func()
...
>>> def sayHello():
... print('Hello!')
...
>>> def sayGoodbye():
... print('Goodbye!')
...
>>> callTwice(sayHello)
Hello!
Hello!
>>> callTwice(sayGoodbye)
Goodbye!
Goodbye!
The callTwice()
function calls whichever function was passed to it as the func
parameter. Notice that we leave out the parentheses from the function argument, writing callTwice(sayHello)
instead of callTwice(sayHello())
. This is because we are passing the sayHello()
function itself, and not calling sayHello()
and passing its return value.
The walk()
function accepts a match function argument for its search criteria. This lets us customize the behavior of the file search without modifying the code of the walk()
function itself. We’ll take a look at walk()
later. First, let’s look at the two sample match functions in the program.
The first matching function finds files with an even byte size:
Python
import os
def hasEvenByteSize(fullFilePath):
"""Returns True if fullFilePath has an even size in bytes,
otherwise returns False."""
fileSize = os.path.getsize(fullFilePath)
return fileSize % 2 == 0
We import the os
module, which is used throughout the program to get information about the files on your computer through functions such as getsize()
, basename()
, and others. Then we create a match function named hasEvenByteSize()
. All match functions take a single string argument named fullFilePath
, and return either True
or False
to signify a match or miss.
The os.path.getsize()
function determines the size of the file in fullFilePath
in bytes. Then we use the %
modulus operator to determine whether this number is even. If it’s even, the return
statement returns True
; if it’s odd, it returns False
. For example, let’s consider the size of the Notepad application that comes with the Windows operating system (on macOS or Linux, try running this function on the /bin/ls program):
Python
>>> import os
>>> os.path.getsize('C:/Windows/system32/notepad.exe')
211968
>>> 211968 % 2 == 0
True
The hasEvenByteSize()
match function can use any Python function to find more information about the fullFilePath
file. This gives you the powerful capability to write code for any search criteria you want. As walk()
calls the match function for each file in the folder and subfolders it walks across, the match function returns True
or False
for each one. This tells walk()
whether the file is a match.
Let’s take a look at the next match function:
def hasEveryVowel(fullFilePath):
"""Returns True if the fullFilePath has a, e, i, o, and u,
otherwise returns False."""
name = os.path.basename(fullFilePath).lower()
return ('a' in name) and ('e' in name) and ('i' in name) and ('o' in name) and ('u' in name)
We call os.path.basename()
to remove the folder names from the filepath. Python does case-sensitive string comparisons, which ensures that hasEveryVowel()
doesn’t miss any vowels in the filename because they are uppercase. For example, calling os.path.basename('C:/Windows/system32/notepad.exe')
returns the string notepad.exe
. This string’s lower()
method call returns a lowercase form of the string so that we have to check for only lowercase vowels in it. “Useful Python Standard Library Functions for Working with Files” later in this chapter explores some more functions for finding out information about files.
We use a return
statement with a lengthy expression that evaluates to True
if name
contains a
, e
, i
, o
, and u
, indicating the file matches the search criteria. Otherwise, the return
statement returns False
.
While the match functions check whether a file matches the search criteria, the walk()
function finds all the files to check. The recursive walk()
function is passed the name of a base folder to search along with a match function to call for each file in the folder.
The walk()
function also recursively calls itself for each subfolder in the base folder it’s searching. These subfolders become the base folder in the recursive call. Let’s ask the three questions about this recursive function:
Figure 10-1 shows an example filesystem along with the recursive calls to walk()
, which it makes with a base folder of C:\
.
Let’s take a look at the walk()
function’s code:
def walk(folder, matchFunc):
"""Calls the match function with every file in the folder and its
subfolders. Returns a list of files that the match function
returned True for."""
matchedFiles = [] # This list holds all the matches.
folder = os.path.abspath(folder) # Use the folder's absolute path.
The walk()
function has two parameters: folder
is a string of the base folder to search (we can pass '.'
to refer to the current folder the Python program is run from), and matchFunc
is a Python function that is passed a filename and returns True
if the function says it is a search match. Otherwise, the function returns False
.
The next part of the function examines the contents of folder
:
Python
# Loop over every file and subfolder in the folder:
for name in os.listdir(folder):
filepath = os.path.join(folder, name)
if os.path.isfile(filepath):
The for
loop calls os.listdir()
to return a list of the contents of the folder
folder. This list includes all files and subfolders. For each file, we create the full, absolute path by joining the folder with the name of the file or folder. If the name refers to a file, the os.path.isfile()
function call returns True
, and we’ll check to see if the file is a search match:
Python
# Call the match function for each file:
if matchFunc(filepath):
matchedFiles.append(filepath)
We call the match function, passing it the full absolute filepath of the for
loop’s current file. Note that matchFunc
is the name of one of walk()
’s parameters. If hasEvenByteSize()
, hasEveryVowel()
, or another function is passed as the argument for the matchFunc
parameter, then that is the function walk()
calls. If filepath
contains a file that is a match according to the matching algorithm, it’s added to the matches
list:
Python
elif os.path.isdir(filepath):
# Recursively call walk for each subfolder, extending
# the matchedFiles with their matches:
matchedFiles.extend(walk(filepath, matchFunc))
Otherwise, if the for
loop’s file is a subfolder, the os.path.isdir()
function call returns True
. We then pass the subfolder to a recursive function call. The recursive call returns a list of all matching files in the subfolder (and its subfolders), which are then added to the matches
list:
return matchedFiles
After the for
loop finishes, the matches
list contains all the matching files in this folder (and in all its subfolders). This list becomes the return value for the walk()
function.
Now that we’ve implemented the walk()
function and some match functions, we can run our custom file search. We pass the '.'
string, a special directory name meaning the current directory, for the first argument to walk()
so that it uses the folder the program was run from as the base folder to search:
Python
print('All files with even byte sizes:')
print(walk('.', hasEvenByteSize))
print('All files with every vowel in their name:')
print(walk('.', hasEveryVowel))
The output of this program depends on what files are on your computer, but this demonstrates how you can write code for any search criteria you have. For example, the output could look like the following:
Python
All files with even byte sizes:
['C:\\Path\\accesschk.exe', 'C:\\Path\\accesschk64.exe',
'C:\\Path\\AccessEnum.exe', 'C:\\Path\\ADExplorer.exe',
'C:\\Path\\Bginfo.exe', 'C:\\Path\\Bginfo64.exe',
'C:\\Path\\diskext.exe', 'C:\\Path\\diskext64.exe',
'C:\\Path\\Diskmon.exe', 'C:\\Path\\DiskView.exe',
'C:\\Path\\hex2dec64.exe', 'C:\\Path\\jpegtran.exe',
'C:\\Path\\Tcpview.exe', 'C:\\Path\\Testlimit.exe',
'C:\\Path\\wget.exe', 'C:\\Path\\whois.exe']
All files with every vowel in their name:
['C:\\Path\\recursionbook.bat']
Let’s take a look at some functions that could help you as you write your own match functions. The standard library of modules that comes with Python features several useful functions for getting information about files. Many of these are in the os
and shutil
modules, so your program must run import os
or import shutil
before it can call these functions.
The full filepath passed to the match functions can be broken into the base name and directory name with the os.path.basename()
and os.path.dirname()
functions. You can also call os.path.split()
to obtain these names as a tuple. Enter the following into Python’s interactive shell. On macOS or Linux, try using /bin/ls
as the filename:
Python
>>> import os
>>> filename = 'C:/Windows/system32/notepad.exe'
>>> os.path.basename(filename)
'notepad.exe'
>>> os.path.dirname(filename)
'C:/Windows/system32'
>>> os.path.split(filename)
('C:/Windows/system32', 'notepad.exe')
>>> folder, file = os.path.split(filename)
>>> folder
'C:/Windows/system32'
>>> file
'notepad.exe'
You can use any of Python’s string methods on these string values to help evaluate the file against your search criteria, such as lower()
in the hasEveryVowel()
match function.
Files have timestamps indicating when they were created, last modified, and last accessed. Python’s os.path.getctime()
, os.path.getmtime()
, and os.path.getatime()
, respectively, return these timestamps as floating-point values indicating the number of seconds since the Unix epoch, midnight on January 1, 1970, in the Coordinated Universal Time (UTC) time zone. Enter the following into the interactive shell:
Python
> import os
> filename = 'C:/Windows/system32/notepad.exe'
> os.path.getctime(filename)
1625705942.1165037
> os.path.getmtime(filename)
1625705942.1205275
> os.path.getatime(filename)
1631217101.8869188
These float values are easy for programs to use since they’re just single numbers, but you’ll need functions from Python’s time
module to make them simpler for humans to read. The time.localtime()
function converts a Unix epoch timestamp into a struct_time
object in the computer’s time zone. A struct_time
object has several attributes whose names begin with tm_
for obtaining date and time information. Enter the following into the interactive shell:
Python
>>> import os
>>> filename = 'C:/Windows/system32/notepad.exe'
>>> ctimestamp = os.path.getctime(filename)
>>> import time
>>> time.localtime(ctimestamp)
time.struct_time(tm_year=2021, tm_mon=7, tm_mday=7, tm_hour=19,
tm_min=59, tm_sec=2, tm_wday=2, tm_yday=188, tm_isdst=1)
>>> st = time.localtime(ctimestamp)
>>> st.tm_year
2021
>>> st.tm_mon
7
>>> st.tm_mday
7
>>> st.tm_wday
2
>>> st.tm_hour
19
>>> st.tm_min
59
>>> st.tm_sec
2
Note that the tm_mday
attribute is the day of the month, ranging from 1
to 31
. The tm_wday
attribute is the day of the week, starting at 0
for Monday, 1
for Tuesday, and so on, up to 6
for Sunday.
If you need a brief, human-readable string of the time_struct
object, pass it to the time.asctime()
function:
Python
>>> import os
>>> filename = 'C:/Windows/system32/notepad.exe'
>>> ctimestamp = os.path.getctime(filename)
>>> import time
>>> st = time.localtime(ctimestamp)
>>> time.asctime(st)
'Wed Jul 7 19:59:02 2021'
While the time.localtime()
function returns a struct_time
object in the local time zone, the time.gmtime()
function returns a struct_time
object in the UTC or Greenwich Mean time zone. Enter the following into the interactive shell:
Python
>>> import os
>>> filename = 'C:/Windows/system32/notepad.exe'
>>> ctimestamp = os.path.getctime(filename)
>>> import time
>>> ctimestamp = os.path.getctime(filename)
>>> time.localtime(ctimestamp)
time.struct_time(tm_year=2021, tm_mon=7, tm_mday=7, tm_hour=19,
tm_min=59, tm_sec=2, tm_wday=2, tm_yday=188, tm_isdst=1)
>>> time.gmtime(ctimestamp)
time.struct_time(tm_year=2021, tm_mon=7, tm_mday=8, tm_hour=0,
tm_min=59, tm_sec=2, tm_wday=3, tm_yday=189, tm_isdst=0)
The interaction between these os.path
functions (which return Unix epoch timestamps) and time
functions (which return struct_time
objects) can be confusing. Figure 10-2 shows the chain of code starting from the filename string and ending with obtaining the individual parts of the timestamp.
Finally, the time.time()
function returns the number of seconds since the Unix epoch to the current time.
After the walk()
function returns a list of files matching your search criteria, you may want to rename, delete, or perform another operation on them. The shutil
and os
modules in the Python standard library have functions to do this. Further, the send2trash
third-party module can also send files to your operating system’s Recycle Bin, rather than permanently deleting them.
To move a file, call the shutil.move()
function with two arguments. The first argument is the file to move, and the second is the folder to move it to. For example, you could call the following:
Python
>>> import shutil
>>> shutil.move('spam.txt', 'someFolder')
'someFolder\\spam.txt'
The shutil.move()
function returns the string of the new filepath of the file. You can also specify a filename to move and rename the file at the same time:
Python
>>> import shutil
>>> shutil.move('spam.txt', 'someFolder\\newName.txt')
'someFolder\\newName.txt'
If the second argument lacks a folder, you can just specify a new name for the file to rename it in its current folder:
Python
>>> import shutil
>>> shutil.move('spam.txt', 'newName.txt')
'newName.txt'
Note that the shutil.move()
function both moves and renames files, similar to the way the Unix and macOS mv
command both moves and renames files. There is no separate shutil.rename()
function.
To copy a file, call the shutil.copy()
function with two arguments. The first argument is the filename of the file to copy, and the second argument is the new name of the copy. For example, you could call the following:
Python
>>> import shutil
>>> shutil.copy('spam.txt', 'spam-copy.txt')
'spam-copy.txt'
The shutil.copy()
function returns the name of the copy. To delete a file, call the os.unlink()
function and pass it the name of the file to delete:
Python
>>> import os
>>> os.unlink('spam.txt')
>>>
The name unlink is used instead of delete because of the technical detail that it removes the filename linked to the file. But since most files have only one linked filename, this unlinking also deletes the file. It’s all right if you don’t understand these filesystem concepts; just know that os.unlink()
deletes a file.
Calling os.unlink()
permanently deletes the file, which can be dangerous if a bug in your program causes the function to delete the wrong file. Instead, you can use the send2trash
module’s send2trash()
function to put the file in your operating system’s Recycle Bin. To install this module, run python -m pip install --user send2trash
from the command prompt on Windows or run run python3 -m pip install
from the terminal on macOS or Linux. Once the module is installed, you’ll be able to import it with import send2trash
.
Enter the following into the interactive shell:
Python
>>> open('deleteme.txt', 'w').close() # Create a blank file.
>>> import send2trash
>>> send2trash.send2trash('deleteme.txt')
This example creates a blank file named deleteme.txt. After calling send2trash.send2trash()
(the module and function share the same name), this file is removed to the Recycle Bin.
This chapter’s file-search project uses recursion to “walk” across the contents of a folder and all its subfolders. The file-finder program’s walk()
function navigates these folders recursively, applying custom search criteria to every file in every subfolder. The search criteria are implemented as match functions, which are passed to the walk()
function. This allows us to change the search criteria by writing new functions instead of modifying the code in walk()
.
Our project had two match functions, for finding files with an even byte file size or containing every vowel in its name, but you can write your own functions to pass to walk()
. This is the power behind programming; you can create features for your own needs that are not available in commercial apps.
The documentation for Python’s built-in os.walk()
function (similar to the walk()
function in the file-finder project) is at https://docs.python.org/3/library/os.html#os.walk. You can also learn more about your computer’s filesystem and Python’s file functions in Chapter 9 of my book Automate the Boring Stuff with Python, 2nd edition (No Starch Press, 2019) at https://automatetheboringstuff.com/2e/chapter9.
The datetime
module in the Python standard library also has more ways to interact with timestamp data. You can learn more about it in Chapter 17 of Automate the Boring Stuff with Python, 2nd edition at https://automatetheboringstuff.com/2e/chapter17.