“Why do security police grab people and torture them? To get their information. And hard disks put up no resistance to torture. You need to give the hard disk a way to resist. That’s cryptography.”
—Patrick Ball, Human Rights Data Analysis Group
In previous chapters, our programs have only worked on small messages that we type directly into the source code as string values. The cipher program we’ll make in this chapter will allow you to encrypt and decrypt entire files, which can be millions of characters in size.
The transposition file cipher program encrypts and decrypts plain (unformatted) text files. These are the kind of files that only have text data and usually have the .txt file extension. You can write your own text files with programs such as Notepad on Windows, TextEdit on macOS, and gedit on Linux. (Word processing programs can produce plain text files as well, but keep in mind that they won’t save any font, size, color, or other formatting.) You can even use IDLE’s file editor by saving the files with a .txt extension instead of the usual .py extension.
For some samples, you can download text files from https://www.nostarch.com/crackingcodes/. These sample text files are of books that are now in the public domain and legal to download and use. For example, Mary Shelley’s classic novel Frankenstein has more than 78,000 words in its text file! To type this book into an encryption program would take a lot of time, but by using the downloaded file, the program can do the encryption in a couple of seconds.
As with the transposition cipher–testing program, the transposition file cipher program imports the transpositionEncrypt.py and transpositionDecrypt.py files so it can call the encryptMessage() and decryptMessage() functions. As a result, you don’t have to retype the code for these functions in the new program.
Open a new file editor window by selecting File▸New File. Enter the following code into the file editor and save it as transpositionFileCipher.py. Then download frankenstein.txt from https://www.nostarch.com/crackingcodes/ and place this file in the same folder as the transpositionFileCipher.py file. Press F5 to run the program.
transpositionFileCipher.py
 1. # Transposition Cipher Encrypt/Decrypt File
 2. # https://www.nostarch.com/crackingcodes/ (BSD Licensed)
 3.
 4. import time, os, sys, transpositionEncrypt, transpositionDecrypt
 5.
 6. def main():
 7.     inputFilename = 'frankenstein.txt'
 8.     # BE CAREFUL! If a file with the outputFilename name already exists,
 9.     # this program will overwrite that file:
10.     outputFilename = 'frankenstein.encrypted.txt'
11.     myKey = 10
12.     myMode = 'encrypt' # Set to 'encrypt' or 'decrypt'.
13.
14.     # If the input file does not exist, the program terminates early:
15.     if not os.path.exists(inputFilename):
16.         print('The file %s does not exist. Quitting...' % (inputFilename))
17.         sys.exit()
18.
19.     # If the output file already exists, give the user a chance to quit:
20.     if os.path.exists(outputFilename):
21.         print('This will overwrite the file %s. (C)ontinue or (Q)uit?' %
                (outputFilename))
22.         response = input('> ')
23.         if not response.lower().startswith('c'):
24.             sys.exit()
25.
26.     # Read in the message from the input file:
27.     fileObj = open(inputFilename)
28.     content = fileObj.read()
29.     fileObj.close()
30.
31.     print('%sing...' % (myMode.title()))
32.
33.     # Measure how long the encryption/decryption takes:
34.     startTime = time.time()
35.     if myMode == 'encrypt':
36.         translated = transpositionEncrypt.encryptMessage(myKey, content)
37.     elif myMode == 'decrypt':
38.         translated = transpositionDecrypt.decryptMessage(myKey, content)
39.     totalTime = round(time.time() - startTime, 2)
40.     print('%sion time: %s seconds' % (myMode.title(), totalTime))
41.
42.     # Write out the translated message to the output file:
43.     outputFileObj = open(outputFilename, 'w')
44.     outputFileObj.write(translated)
45.     outputFileObj.close()
46.
47.     print('Done %sing %s (%s characters).' % (myMode, inputFilename,
            len(content)))
48.     print('%sed file is %s.' % (myMode.title(), outputFilename))
49.
50.
51. # If transpositionFileCipher.py is run (instead of imported as a module),
52. # call the main() function:
53. if __name__ == '__main__':
54.     main()
When you run the transpositionFileCipher.py program, it should produce this output:
Encrypting...
Encryption time: 1.21 seconds
Done encrypting frankenstein.txt (441034 characters).
Encrypted file is frankenstein.encrypted.txt.
A new frankenstein.encrypted.txt file is created in the same folder as transpositionFileCipher.py. When you open this file with IDLE’s file editor, you’ll see the encrypted contents of frankenstein.txt. It should look something like this:
PtFiyedleo a arnvmt eneeGLchongnes Mmuyedlsu0#uiSHTGA r sy,n t ys
s nuaoGeL
sc7s,
--snip--
Once you have an encrypted text, you can send it to someone else to decrypt it. The recipient will also need to have the transposition file cipher program.
To decrypt the text, make the following changes to the source code (in bold) and run the transposition file cipher program again:
 7.     inputFilename = 'frankenstein.encrypted.txt'
 8.     # BE CAREFUL! If a file with the outputFilename name already exists,
 9.     # this program will overwrite that file:
10.     outputFilename = 'frankenstein.decrypted.txt'
11.     myKey = 10
12.     myMode = 'decrypt' # Set to 'encrypt' or 'decrypt'.
This time when you run the program, a new file named frankenstein.decrypted.txt that is identical to the original frankenstein.txt file will appear in the folder.
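One way to confirm that the round trip worked is to read both files and compare the resulting strings. The following is a minimal sketch of that idea. The filesMatch() helper and the stand-in filenames are assumptions made for illustration and are not part of transpositionFileCipher.py.

```python
import os
import tempfile

def filesMatch(filenameA, filenameB):
    # Hypothetical helper: returns True if both text files hold the same text.
    fileA = open(filenameA)
    contentA = fileA.read()
    fileA.close()
    fileB = open(filenameB)
    contentB = fileB.read()
    fileB.close()
    return contentA == contentB

# Stand-in files in a temporary folder so the sketch runs on its own:
folder = tempfile.mkdtemp()
originalName = os.path.join(folder, 'original.txt')
copyName = os.path.join(folder, 'copy.txt')
for name in (originalName, copyName):
    fileObj = open(name, 'w')
    fileObj.write('It was on a dreary night of November.')
    fileObj.close()

print(filesMatch(originalName, copyName))
```

With the real files in the current folder, the call would be filesMatch('frankenstein.txt', 'frankenstein.decrypted.txt').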
Before we dive into the code for transpositionFileCipher.py, let’s examine how Python works with files. The three steps to reading the contents of a file are opening the file, reading the file content into a variable, and closing the file. Similarly, to write new content in a file, you must open (or create) the file, write the new content, and close the file.
Python can open a file to read from or write to using the open() function. The open() function’s first parameter is the name of the file to open. If the file is in the same folder as the Python program, you can just use the file’s name, such as 'thetimemachine.txt'. The command to open thetimemachine.txt if it existed in the same folder as your Python program would look like this:
fileObj = open('thetimemachine.txt')
A file object is stored in the fileObj variable, which will be used to read from or write to the file.
You can also specify the absolute path of the file, which includes the folders and parent folders that the file is in. For example, 'C:\\Users\\Al\\frankenstein.txt' (on Windows) and '/Users/Al/frankenstein.txt' (on macOS and Linux) are absolute paths. Remember that on Windows the backslash (\) must be escaped by typing another backslash before it.
For example, if you wanted to open the frankenstein.txt file, you would pass the path of the file as a string for the open() function’s first parameter (and format the absolute path according to your operating system):
fileObj = open('C:\\Users\\Al\\frankenstein.txt')
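As an alternative to doubling every backslash, Python also accepts raw strings, which are written with an r before the opening quote and treat backslashes literally. This small sketch only compares the two spellings of the same example path and does not open any file.

```python
# Two ways to write the same Windows path as a Python string.
escapedPath = 'C:\\Users\\Al\\frankenstein.txt'  # each backslash escaped
rawPath = r'C:\Users\Al\frankenstein.txt'        # raw string, no escaping needed

print(escapedPath == rawPath)
print(rawPath)
```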
The file object has several methods for writing to, reading from, and closing the file.
For the encryption program, after reading in the text file’s content, you’ll need to write the encrypted (or decrypted) content to a new file, which you’ll do by using the write() method.
To use write() on a file object, you need to open the file object in write mode, which you do by passing open() the string 'w' as a second argument. (This second argument is an optional parameter because the open() function can still be used without passing two arguments.) For example, enter the following line of code into the interactive shell:
>>> fileObj = open('spam.txt', 'w')
This line creates a file named spam.txt in write mode so you can edit it. If a file of the same name exists where the open() function creates the new file, the old file is overwritten, so be careful when using open() in write mode.
With spam.txt now open in write mode, you can write to the file by calling the write() method on it. The write() method takes one argument: a string of text to write to the file. Enter the following into the interactive shell to write 'Hello, world!' to spam.txt:
>>> fileObj.write('Hello, world!')
13
Passing the string 'Hello, world!' to the write() method writes that string to the spam.txt file and then prints 13, the number of characters in the string written to the file.
When you’re finished working with a file, you need to tell Python you’re done with the file by calling the close() method on the file object:
>>> fileObj.close()
There is also an append mode, which is like write mode except append mode doesn’t overwrite the file. Instead, strings are written to the end of the content already in the file. Although we won’t use it in this program, you can open a file in append mode by passing the string 'a' as the second argument to open().
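The difference between write mode and append mode can be seen in a short sketch. It assumes a throwaway spam.txt created in a temporary folder, so nothing in your own folders is touched.

```python
import os
import tempfile

# A throwaway file in a temporary folder (an assumption for this sketch):
filename = os.path.join(tempfile.mkdtemp(), 'spam.txt')

# Write mode creates the file (or replaces its content):
fileObj = open(filename, 'w')
fileObj.write('Hello, world!')
fileObj.close()

# Append mode keeps the existing content and adds to the end:
fileObj = open(filename, 'a')
fileObj.write(' Goodbye!')
fileObj.close()

fileObj = open(filename)
content = fileObj.read()
fileObj.close()
print(content)
```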
If you get an io.UnsupportedOperation: not writable error message when you try calling write() on a file object, you probably didn’t open the file in write mode. When you don’t include the open() function’s optional parameter, it automatically opens the file object in read mode ('r') instead, which lets you read from the file but not write to it.
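To see this error without crashing a program, you can catch it. The sketch below assumes a throwaway spam.txt in a temporary folder, opens it in the default read mode, and then tries to write to it.

```python
import io
import os
import tempfile

# Create a small file to work with (temporary folder is an assumption):
filename = os.path.join(tempfile.mkdtemp(), 'spam.txt')
fileObj = open(filename, 'w')
fileObj.write('Hello, world!')
fileObj.close()

# No second argument, so the file is opened in read mode:
fileObj = open(filename)
try:
    fileObj.write('More text')
    errorMessage = None
except io.UnsupportedOperation as error:
    # Writing to a file opened for reading is not allowed.
    errorMessage = str(error)
fileObj.close()
print(errorMessage)
```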
The read() method returns a string containing all the text in the file. To try it out, we’ll read the spam.txt file we created earlier with the write() method. Run the following code from the interactive shell:
>>> fileObj = open('spam.txt', 'r')
>>> content = fileObj.read()
>>> print(content)
Hello, world!
>>> fileObj.close()
The file is opened, and the file object that is created is stored in the fileObj variable. Once you have the file object, you can read the file using the read() method and store it in the content variable, which you then print. When you’re done with the file object, you need to close it with close().
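Python also offers the with statement, which closes the file automatically when the indented block ends, even if an error occurs inside it. The program in this chapter calls close() explicitly, so treat this as an optional alternative. The sketch assumes a throwaway spam.txt in a temporary folder.

```python
import os
import tempfile

# A throwaway file in a temporary folder (an assumption for this sketch):
filename = os.path.join(tempfile.mkdtemp(), 'spam.txt')

# The file is closed automatically at the end of each with block:
with open(filename, 'w') as fileObj:
    fileObj.write('Hello, world!')

with open(filename) as fileObj:
    content = fileObj.read()

print(content)
print(fileObj.closed)  # the closed attribute shows whether the file is closed
```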
If you get the error message FileNotFoundError: [Errno 2] No such file or directory, make sure the file actually is where you think it is and double-check that you typed the filename and folder name correctly. (Directory is another word for folder.)
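A program can also handle a missing file itself instead of stopping with a traceback. In Python 3, opening a nonexistent file for reading raises FileNotFoundError, a subclass of OSError (IOError is an alias for OSError). The filename in this sketch is made up so that the file is certain not to exist.

```python
import os
import tempfile

# A path inside a fresh, empty temporary folder, so the file cannot exist:
missingFilename = os.path.join(tempfile.mkdtemp(), 'no_such_file.txt')

try:
    fileObj = open(missingFilename)
    content = fileObj.read()
    fileObj.close()
except FileNotFoundError:
    # open() failed, so there is no file object to close.
    content = None
    print('The file %s does not exist.' % (missingFilename))
```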
We’ll use open(), read(), write(), and close() on the files that we open to encrypt or decrypt in transpositionFileCipher.py.
The first part of the transpositionFileCipher.py program should look familiar. Line 4 is an import statement for the programs transpositionEncrypt.py and transpositionDecrypt.py as well as Python’s time, os, and sys modules. Then we start main() by setting up some variables to use in the program.
 1. # Transposition Cipher Encrypt/Decrypt File
 2. # https://www.nostarch.com/crackingcodes/ (BSD Licensed)
 3.
 4. import time, os, sys, transpositionEncrypt, transpositionDecrypt
 5.
 6. def main():
 7.     inputFilename = 'frankenstein.txt'
 8.     # BE CAREFUL! If a file with the outputFilename name already exists,
 9.     # this program will overwrite that file:
10.     outputFilename = 'frankenstein.encrypted.txt'
11.     myKey = 10
12.     myMode = 'encrypt' # Set to 'encrypt' or 'decrypt'.
The inputFilename variable holds a string of the file to read, and the encrypted (or decrypted) text is written to the file named in outputFilename. The transposition cipher uses an integer for a key, which is stored in myKey. The program expects myMode to store 'encrypt' or 'decrypt' to tell it to encrypt or decrypt the inputFilename file. But before we can read from the inputFilename file, we need to check that it exists using os.path.exists().
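The program doesn’t check myMode itself. If it holds anything other than 'encrypt' or 'decrypt' (for example, 'Encrypt' with a capital E), neither branch on lines 35 to 38 runs, translated is never assigned, and the program stops with an error when line 44 tries to write it. A minimal sketch of an early check follows. The checkMode() helper is hypothetical and not part of the chapter’s program.

```python
def checkMode(mode):
    # Hypothetical helper: rejects anything except the two expected strings.
    if mode not in ('encrypt', 'decrypt'):
        raise ValueError("myMode must be 'encrypt' or 'decrypt', not %r" % (mode,))
    return mode

print(checkMode('encrypt'))

try:
    checkMode('Encrypt')  # wrong capitalization
except ValueError as error:
    print(error)
```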
Reading a file doesn’t change it, but you need to be careful when writing to files. Calling the open() function in write mode on a filename that already exists overwrites the original content. Using the os.path.exists() function, your programs can check whether or not that file already exists.
The os.path.exists() function takes a single string argument for a filename or a path to a file and returns True if the file already exists and False if it doesn’t. The exists() function is in the path module, which is itself inside the os module, so when we import the os module, the path module is imported, too.
Enter the following into the interactive shell:
>>> import os
➊ >>> os.path.exists('spam.txt')
False
>>> os.path.exists('C:\\Windows\\System32\\calc.exe') # Windows
True
>>> os.path.exists('/usr/local/bin/idle3') # macOS
False
>>> os.path.exists('/usr/bin/idle3') # Linux
False
In this example, the os.path.exists() function confirms that the calc.exe file exists in Windows. Of course, you’ll only get these results if you’re running Python on Windows. Remember to escape the backslash in a Windows file path by typing another backslash before it. If you’re using macOS, only the macOS example will return True, and only the last example will return True for Linux. If the full file path isn’t given ➊, Python will check the current working directory. For IDLE’s interactive shell, this is the folder that Python is installed in.
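If you’re unsure which folder a relative filename such as 'spam.txt' refers to, the os module can tell you. This sketch uses os.getcwd() to show the current working directory and os.path.abspath() to show the full path Python would use. The output depends on where you run it.

```python
import os

# The folder that relative filenames are resolved against:
cwd = os.getcwd()
print(cwd)

# The absolute path that the relative name 'spam.txt' refers to:
fullPath = os.path.abspath('spam.txt')
print(fullPath)
```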
We use the os.path.exists() function to check that the filename in inputFilename exists. Otherwise, we have no file to encrypt or decrypt. We do this in lines 14 to 17:
14.     # If the input file does not exist, the program terminates early:
15.     if not os.path.exists(inputFilename):
16.         print('The file %s does not exist. Quitting...' % (inputFilename))
17.         sys.exit()
If the file doesn’t exist, we display a message to the user and then quit the program.
Next, the program checks whether a file with the same name as outputFilename exists, and if so, it asks the user to type C if they want to continue running the program or Q to quit the program. Because a user might type various responses, such as 'c', 'C', or even the word 'Continue', we want to make sure the program will accept all of these versions. To do this, we’ll use more string methods.
The upper() and lower() string methods will return the string they are called on in all uppercase or all lowercase letters, respectively. Enter the following into the interactive shell to see how the methods work on the same string:
>>> 'Hello'.upper()
'HELLO'
>>> 'Hello'.lower()
'hello'
Just as the lower() and upper() string methods return a string in lowercase or uppercase, the title() string method returns a string in title case. Title case is where the first character of every word is uppercase and the rest of the characters are lowercase. Enter the following into the interactive shell:
>>> 'hello'.title()
'Hello'
>>> 'HELLO'.title()
'Hello'
>>> 'extra! extra! man bites shark!'.title()
'Extra! Extra! Man Bites Shark!'
We’ll use title() a little later in the program to format messages we output for the user.
The startswith() method returns True if its string argument is found at the beginning of the string. Enter the following into the interactive shell:
>>> 'hello'.startswith('h')
True
>>> 'hello'.startswith('H')
False
>>> spam = 'Albert'
➊ >>> spam.startswith('Al')
True
The startswith() method is case sensitive and can also be used on strings with multiple characters ➊.
The endswith() string method is used to check whether a string value ends with another specified string value. Enter the following into the interactive shell:
>>> 'Hello world!'.endswith('world!')
True
➋ >>> 'Hello world!'.endswith('world')
False
The string values must match perfectly. Notice that the lack of the exclamation mark in 'world' ➋ causes endswith() to return False.
As noted, we want the program to accept any response that starts with a C regardless of capitalization. This means that we want the file to be overwritten whether the user types c, continue, C, or another string that begins with C. We’ll use the string methods lower() and startswith() to make the program more flexible when taking user input:
19.     # If the output file already exists, give the user a chance to quit:
20.     if os.path.exists(outputFilename):
21.         print('This will overwrite the file %s. (C)ontinue or (Q)uit?' %
                (outputFilename))
22.         response = input('> ')
23.         if not response.lower().startswith('c'):
24.             sys.exit()
On line 23, we take the first letter of the string and check whether it is a C using the startswith() method. The startswith() method that we use is case sensitive and checks for a lowercase 'c', so we use the lower() method to modify the response string’s capitalization to always be lowercase. If the user didn’t enter a response starting with a C, then startswith() returns False, which makes the if statement evaluate to True (because of the not in the if statement), and sys.exit() is called to end the program. Technically, the user doesn’t have to enter Q to quit; any string that doesn’t begin with C causes the sys.exit() function to be called to quit the program.
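The test on line 23 can be tried on its own with several sample responses. The wantsToContinue() function below is a hypothetical wrapper around the same expression, written only for this demonstration.

```python
def wantsToContinue(response):
    # Same expression as line 23, without the not.
    return response.lower().startswith('c')

for response in ['c', 'C', 'Continue', 'q', 'Quit', '', ' c']:
    print(repr(response), wantsToContinue(response))
```

Note that a response with a leading space, such as ' c', counts as a request to quit. Calling strip() on the response before lower() would be one way to tolerate that, if you wanted to.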
On line 27, we start using the file object methods discussed at the beginning of this chapter.
26.     # Read in the message from the input file:
27.     fileObj = open(inputFilename)
28.     content = fileObj.read()
29.     fileObj.close()
30.
31.     print('%sing...' % (myMode.title()))
Lines 27 to 29 open the file stored in inputFilename, read its contents into the content variable, and then close the file. After reading in the file, line 31 outputs a message for the user telling them that the encryption or decryption has begun. Because myMode should either contain the string 'encrypt' or 'decrypt', calling the title() string method capitalizes the first letter of the string in myMode and splices the string into the '%sing' string, so it displays either 'Encrypting...' or 'Decrypting...'.
Encrypting or decrypting an entire file can take much longer than a short string. A user might want to know how long the process takes for a file. We can measure the length of the encryption or decryption process by using the time module.
The time.time() function returns the current time as a float value of the number of seconds since January 1, 1970. This moment is called the Unix Epoch. Enter the following into the interactive shell to see how this function works:
>>> import time
>>> time.time()
1540944000.7197928
>>> time.time()
1540944003.4817972
Because time.time() returns a float value, it can be precise to a millisecond (that is, 1/1000 of a second). Of course, the numbers that time.time() displays depend on the moment in time that you call this function and may be difficult to interpret. It might not be clear that 1540944000.7197928 is Tuesday, October 30, 2018, at approximately 5 pm in US Pacific time. However, the time.time() function is useful for comparing the number of seconds between calls to time.time(). We can use this function to determine how long a program has been running.
For example, if you subtract the floating-point values returned when I called time.time() previously in the interactive shell, you would get the amount of time in between those calls while I was typing:
>>> 1540944003.4817972 - 1540944000.7197928
2.7620043754577637
If you need to write code that handles dates and times, see https://www.nostarch.com/crackingcodes/ for information on the datetime module.
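As a brief taste of the datetime module, the sketch below converts the first timestamp from the interactive shell example into a calendar date. It asks for the result in UTC so that the output doesn’t depend on your computer’s time zone setting.

```python
import datetime

timestamp = 1540944000.7197928

# Convert seconds since the Unix Epoch into a date and time in UTC:
moment = datetime.datetime.fromtimestamp(timestamp, tz=datetime.timezone.utc)

print(moment.year, moment.month, moment.day)
print(moment.hour, moment.minute, moment.second)
```

The result is just after midnight on October 31, 2018, in UTC. US Pacific Daylight Time is seven hours behind UTC, which is why the same moment falls at about 5 pm on October 30 there.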
On line 34, time.time() returns the current time to store in a variable named startTime. Lines 35 to 38 call encryptMessage() or decryptMessage(), depending on whether 'encrypt' or 'decrypt' is stored in the myMode variable.
33.     # Measure how long the encryption/decryption takes:
34.     startTime = time.time()
35.     if myMode == 'encrypt':
36.         translated = transpositionEncrypt.encryptMessage(myKey, content)
37.     elif myMode == 'decrypt':
38.         translated = transpositionDecrypt.decryptMessage(myKey, content)
39.     totalTime = round(time.time() - startTime, 2)
40.     print('%sion time: %s seconds' % (myMode.title(), totalTime))
Line 39 calls time.time() again after the program decrypts or encrypts and subtracts startTime from the current time. The result is the number of seconds between the two calls to time.time(). The time.time() - startTime expression evaluates to a value that is passed to the round() function, which rounds it to two decimal places, because we don’t need millisecond precision for the program. This value is stored in totalTime. Line 40 uses string splicing to print the program mode and displays to the user the amount of time it took for the program to encrypt or decrypt.
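The same start-and-subtract pattern works with any piece of code. The sketch below wraps it in a hypothetical timeCall() helper and times Python’s built-in sorted() function as a stand-in for encryptMessage(). It uses time.perf_counter(), a standard library function intended for measuring short durations, although time.time() as used in the program would also work here.

```python
import time

def timeCall(function, *args):
    # Hypothetical helper: returns the function's result and the elapsed seconds.
    startTime = time.perf_counter()
    result = function(*args)
    totalTime = round(time.perf_counter() - startTime, 2)
    return result, totalTime

# Time a stand-in task: sorting 100,000 numbers that start in reverse order.
result, totalTime = timeCall(sorted, list(range(100000, 0, -1)))
print('Sorting time: %s seconds' % (totalTime))
```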
The encrypted (or decrypted) file contents are now stored in the translated variable. But this string is forgotten when the program terminates, so we want to store the string in a file to have even after the program has finished running. The code on lines 43 to 45 does this by opening a new file (and passing 'w' to the open() function) and then calling the write() file object method:
42.     # Write out the translated message to the output file:
43.     outputFileObj = open(outputFilename, 'w')
44.     outputFileObj.write(translated)
45.     outputFileObj.close()
Then, lines 47 and 48 print more messages to the user indicating that the process is done and the name of the written file:
47.     print('Done %sing %s (%s characters).' % (myMode, inputFilename,
            len(content)))
48.     print('%sed file is %s.' % (myMode.title(), outputFilename))
Line 48 is the last line of the main() function.
Lines 53 and 54 (which are executed after the def statement on line 6 is executed) call the main() function if this program is being run instead of being imported:
51. # If transpositionFileCipher.py is run (instead of imported as a module),
52. # call the main() function:
53. if __name__ == '__main__':
54.     main()
This is explained in detail in “The __name__ Variable” on page 95.
Congratulations! There wasn’t much to the transpositionFileCipher.py program aside from the open() function and the read(), write(), and close() file object methods, which let us encrypt large text files on a hard drive. You learned how to use the os.path.exists() function to check whether a file already exists. As you’ve seen, we can extend our programs’ capabilities by importing their functions for use in new programs. This greatly increases our ability to use computers to encrypt information.
You also learned some useful string methods to make a program more flexible when accepting user input and how to use the time module to measure how fast your program runs.
Unlike the Caesar cipher program, the transposition file cipher has too many possible keys to attack by simply using brute force. But if we can write a program that recognizes English (as opposed to strings of gibberish), the computer could examine the output of thousands of decryption attempts and determine which key can successfully decrypt a message to English. You’ll learn how to do this in Chapter 11.