Note: The second edition of this book is available under the title Cracking Codes with Python
“Why do security police grab people and torture them? To get their information. If you build an information management system that concentrates information from dozens of people, you’ve made that dozens of times more attractive. You’ve focused the repressive regime’s attention on the hard disk. And hard disks put up no resistance to torture. You need to give the hard disk a way to resist. That’s cryptography.”
Up until now our programs have only worked on small messages that we type directly into the source code as string values. The cipher program in this chapter will use the transposition cipher to encrypt and decrypt entire files, which can be millions of characters in size.
This program will encrypt and decrypt plain text files. These are the kind of files that only have text data and usually have the .txt file extension. Files from word processing programs that let you change the font, color, or size of the text do not produce plain text files. You can write your own text files using Notepad (on Windows), TextMate or TextEdit (on OS X), or gedit (on Linux) or a similar plain text editor program. You can even use IDLE’s own file editor and save the files with a .txt extension instead of the usual .py extension.
For some samples, you can download the following text files from this book’s website:
These are text files of some books (that are now in the public domain, so it is perfectly legal to download them). For example, download Mary Shelley’s classic novel “Frankenstein” from http://invpy.com/frankenstein.txt. Double-click the file to open it in a text editor program. There are over 78,000 words in this text file! It would take some time to type this into our encryption program. But if it is in a file, the program can read the file and do the encryption in a couple of seconds.
If you get an error that looks like “UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 148: character maps to <undefined>” then you are running the cipher program on a non-plain text file, also called a “binary file”.
To find other public domain texts to download, go to the Project Gutenberg website at http://www.gutenberg.org/.
Like our transposition cipher testing program, the transposition cipher file program will import our transpositionEncrypt.py and transpositionDecrypt.py files so we can use the encryptMessage() and decryptMessage() functions in them. This way we don’t have to re-type the code for these functions in our new program.
Open a new file editor window by clicking on File ► New Window. Type in the following code into the file editor, and then save it as transpositionFileCipher.py. Press F5 to run the program. Note that first you will need to download frankenstein.txt and place this file in the same directory as the transpositionFileCipher.py file. You can download this file from http://invpy.com/frankenstein.txt.
In the directory that frankenstein.txt and transpositionFileCipher.py files are in, there will be a new file named frankenstein.encrypted.txt that contains the content of frankenstein.txt in encrypted form. If you double-click the file to open it, it should look something like this:
To decrypt, make the following changes to the source code (written in bold) and run the transposition cipher program again:
This time when you run the program a new file will appear in the folder named frankenstein.decrypted.txt that is identical to the original frankenstein.txt file.
When you run the above program, it produces this output:
A new frankenstein.encrypted.txt file will have been created in the same directory as transpositionFileCipher.py. If you open this file with IDLE’s file editor, you will see the encrypted contents of frankenstein.txt. You can now email this encrypted file to someone for them to decrypt.
Up until now, any input we wanted to give our programs had to be typed in by the user. Python programs can also open and read files directly off of the hard drive. There are three steps to reading the contents of a file: opening the file, reading its contents into a variable, and then closing the file.
The open() function’s first parameter is a string for the name of the file to open. If the file is in the same directory as the Python program then you can just type in the name, such as 'thetimemachine.txt'. You can always specify the absolute path of the file, which includes the directory that it is in. For example, 'c:\\Python32\\frankenstein.txt' (on Windows) and '/usr/foobar/frankenstein.txt' (on OS X and Linux) are absolute filenames. (Remember that the \ backslash must be escaped with another backslash before it.)
The open() function returns a value of the “file object” data type. This value has several methods for reading from, writing to, and closing the file.
The read() method will return a string containing all the text in the file. For example, say the file spam.txt contained the text “Hello world!”. (You can create this file yourself using IDLE’s file editor. Just save the file with a .txt extension.) Run the following from the interactive shell (this code assumes you are running Windows and the spam.txt file is in the c:\ directory):
If your text file has multiple lines, the string returned by read() will have \n newline characters in it at the end of each line. When you try to print a string with newline characters, the string will print across several lines:
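As a minimal sketch of this behavior, the example below first creates a small three-line file and then reads it back with read(). The temporary directory and the file’s contents are assumptions made only so the example can run on any computer; they are not part of the cipher program.

```python
import os
import tempfile

# Create a small multi-line text file so the example is self-contained.
# (The temporary directory is only for illustration.)
tempDir = tempfile.mkdtemp()
filename = os.path.join(tempDir, 'spam.txt')
fo = open(filename, 'w')
fo.write('Hello world!\nHow are you?\nGoodbye!\n')
fo.close()

# Open the file (read mode is the default) and read all of its text.
fo = open(filename)
content = fo.read()
fo.close()

print(repr(content))  # shows the \n newline characters inside the string
print(content)        # the same string printed across several lines
```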
If you get an error message that says “IOError: [Errno 2] No such file or directory” (Python 3.3 and later report this as a FileNotFoundError) then double-check that you typed the filename (and if it is an absolute path, the directory name) correctly. Also make sure that the file actually is where you think it is.
After you have read the file’s contents into a variable, you can tell Python that you are done with the file by calling the close() method on the file object.
Python will automatically close any open files when the program terminates. But when you want to re-read the contents of a file, you must close the file object and then call the open() function on the file again.
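To see why the file has to be re-opened, here is a small sketch. It assumes a throwaway spam.txt file created in a temporary directory (both are made up for this example). Once read() has returned the whole file, a second read() call on the same file object has nothing left to return.

```python
import os
import tempfile

# Set up a throwaway file for the demonstration.
tempDir = tempfile.mkdtemp()
filename = os.path.join(tempDir, 'spam.txt')
fo = open(filename, 'w')
fo.write('Hello world!')
fo.close()

fo = open(filename)
firstRead = fo.read()   # the whole file
secondRead = fo.read()  # nothing is left to read, so this is an empty string
fo.close()

# Closing and re-opening starts again from the beginning of the file.
fo = open(filename)
thirdRead = fo.read()
fo.close()

print(repr(firstRead))
print(repr(secondRead))
print(repr(thirdRead))
```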
Here’s the code in our transposition cipher program that reads the file whose filename is stored in the inputFilename variable:
We read the original file and now will write the encrypted (or decrypted) form to a different file. The file object returned by open() has a write() method, although you can only use this method if you open the file in “write” mode instead of “read” mode. You do this by passing the string value 'w' as the second parameter. For example:
Along with “read” and “write”, there is also an “append” mode. “Append” mode is like “write” mode, except any strings written to the file will be appended to the end of any content that is already in the file. “Append” mode will not overwrite the file if it already exists. To open a file in append mode, pass the string 'a' as the second argument to open().
(Just in case you were curious, you could pass the string 'r' to open() to open the file in read mode. But since passing no second argument at all also opens the file in read mode, there’s no reason to pass 'r'.)
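The difference between the modes can be sketched as follows. The filename and the strings written are made up for this example, and the file lives in a temporary directory so that nothing important can be overwritten.

```python
import os
import tempfile

tempDir = tempfile.mkdtemp()
filename = os.path.join(tempDir, 'spam.txt')

# Write mode creates the file (or replaces it if it already exists).
fo = open(filename, 'w')
fo.write('Hello world!\n')
fo.close()

# Opening in write mode again throws away the old contents.
fo = open(filename, 'w')
fo.write('Overwritten.\n')
fo.close()

# Append mode keeps the existing contents and adds to the end.
fo = open(filename, 'a')
fo.write('Appended.\n')
fo.close()

# Passing 'r' explicitly is the same as passing no second argument.
fo = open(filename, 'r')
content = fo.read()
fo.close()

print(content)
```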
You can write text to a file by calling the file object’s write() method. The file object must have been opened in write mode; otherwise, you will get an “io.UnsupportedOperation: not writable” error message. (And if you try to call read() on a file object that was opened in write mode, you will get an “io.UnsupportedOperation: not readable” error message.)
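If you would like to see both error messages without crashing the program, one way (a sketch, not something the cipher program itself does) is to catch the exception with a try and except statement. The spam.txt file here is a throwaway file in a temporary directory.

```python
import io
import os
import tempfile

tempDir = tempfile.mkdtemp()
filename = os.path.join(tempDir, 'spam.txt')
fo = open(filename, 'w')
fo.write('Hello world!')
fo.close()

writeError = None
readError = None

# Calling write() on a file object opened in read mode fails.
fo = open(filename)
try:
    fo.write('spam')
except io.UnsupportedOperation as err:
    writeError = str(err)
fo.close()

# Calling read() on a file object opened in write mode fails too.
fo = open(filename, 'w')
try:
    fo.read()
except io.UnsupportedOperation as err:
    readError = str(err)
fo.close()

print(writeError)
print(readError)
```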
The write() method takes one argument: a string of text that is to be written to the file. Lines 43 to 45 open a file in write mode, write to the file, and then close the file object.
Now that we have the basics of reading and writing files, let’s look at the source code to the transposition file cipher program.
The first part of the program should look familiar. Line 4 is an import statement for our transpositionEncrypt.py and transpositionDecrypt.py programs. It also imports Python’s time, os, and sys modules.
The main() function will be called after the def statements have been executed to define all the functions in the program. The inputFilename variable holds a string of the file to read, and the encrypted (or decrypted) text is written to the file with the name in outputFilename.
The transposition cipher uses an integer for a key, stored in myKey. If 'encrypt' is stored in myMode, the program will encrypt the contents of the inputFilename file. If 'decrypt' is stored in myMode, the contents of inputFilename will be decrypted.
Reading files is always harmless, but we need to be careful when writing files. If we call the open() function in write mode with a filename that already exists, that file will first be deleted to make way for the new file. This means we could accidentally erase an important file if we pass the important file’s name to the open() function. Using the os.path.exists() function, we can check if a file with a certain filename already exists.
The os.path.exists() function has a single string parameter for the filename, and returns True if this file already exists and False if it doesn’t. The os.path.exists() function exists inside the path module, which itself exists inside the os module. But if we import the os module, the path module will be imported too.
Try typing the following into the interactive shell:
(Of course, you will only get the above results if you are running Python on Windows. The calc.exe file does not exist on OS X or Linux.)
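For a version that behaves the same on Windows, OS X, and Linux, here is a small sketch that checks for a file before and after creating it. The spam.txt name and the temporary directory are assumptions for this example only.

```python
import os
import tempfile

tempDir = tempfile.mkdtemp()
filename = os.path.join(tempDir, 'spam.txt')

# The file has not been created yet.
existsBefore = os.path.exists(filename)

# Opening a file in write mode creates it.
fo = open(filename, 'w')
fo.close()

existsAfter = os.path.exists(filename)

print(existsBefore)
print(existsAfter)
print(os.path.exists(tempDir))  # os.path.exists() works for folders too
```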
We use the os.path.exists() function to check that the filename in inputFilename actually exists. Otherwise, we have no file to encrypt or decrypt. In that case, we display a message to the user and then quit the program.
If the file the program will write to already exists, the user is asked to type in “C” if they want to continue running the program or “Q” to quit the program.
The string in the response variable will have lower() called on it, and the returned string from lower() will have the string method startswith() called on it. The startswith() method will return True if its string argument can be found at the beginning of the string. Try typing the following into the interactive shell:
On line 23, if the user did not type in 'c', 'continue', 'C', or another string that begins with C, then sys.exit() will be called to end the program. Technically, the user doesn’t have to enter “Q” to quit; any string that does not begin with “C” will cause the sys.exit() function to be called to quit the program.
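The check described above can be sketched as a small function. The name wantsToContinue() is hypothetical and does not appear in the cipher program, which performs the same test directly on the response variable.

```python
def wantsToContinue(response):
    # Hypothetical helper: lowercase the response first so that 'C' and
    # 'c' are treated the same, then check how the string begins.
    return response.lower().startswith('c')

print('hello'.startswith('h'))     # True
print('hello'.startswith('H'))     # False, because the case is different
print(wantsToContinue('C'))        # True
print(wantsToContinue('Continue')) # True
print(wantsToContinue('Q'))        # False
print(wantsToContinue(''))         # False, an empty response also quits
```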
There is also an endswith() string method that can be used to check if a string value ends with another certain string value. Try typing the following into the interactive shell:
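A few calls along these lines might look like the sketch below. The strings are made up for illustration; checking a filename’s extension is just one possible use.

```python
greeting = 'Hello world!'
endsWithWorld = greeting.endswith('world!')
endsWithHello = greeting.endswith('Hello')

# One possible use: checking a filename's extension.
filename = 'frankenstein.txt'
isTextFile = filename.endswith('.txt')
isPythonFile = filename.endswith('.py')

print(endsWithWorld)  # True
print(endsWithHello)  # False
print(isTextFile)     # True
print(isPythonFile)   # False
```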
Just like the lower() and upper() string methods will return a string in lowercase or uppercase, the title() string method returns a string in “title case”. Title case is where every word is uppercase for the first character and lowercase for the rest of the characters. Try typing the following into the interactive shell:
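For instance, a short sketch with made-up strings:

```python
# title() capitalizes the first letter of every word and lowercases the rest.
simple = 'hello'.title()
mixed = 'hello world! HOW ARE YOU?'.title()
mode = 'encrypt'.title()

print(simple)  # Hello
print(mixed)   # Hello World! How Are You?
print(mode)    # Encrypt
```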
Lines 27 to 29 open the file with the name stored in inputFilename and read in its contents into the content variable. On line 31, we display a message telling the user that the encryption or decryption has begun. Since myMode should either contain the string 'encrypt' or 'decrypt', calling the title() string method will either display 'Encrypting...' or 'Decrypting...'.
All computers have a clock that keeps track of the current date and time. Your Python programs can access this clock by calling the time.time() function. (This is a function named time() that is in a module named time.)
The time.time() function will return a float value of the number of seconds since January 1st, 1970. This moment is called the Unix Epoch. Try typing the following into the interactive shell:
The float value shows that the time.time() function can be precise down to a millisecond (that is, 1/1,000 of a second). Of course, the numbers that time.time() displays for you will depend on the moment in time that you call this function. It might not be clear that 1349411356.892 is Thursday, October 4th, 2012 around 9:30 pm. However, the time.time() function is useful for comparing the number of seconds between calls to time.time(). We can use this function to determine how long our program has been running.
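A quick sketch of calling time.time() is shown below. The numbers you see will be different every time you run it. The time.ctime() function used on the last line is not needed by the cipher program; it is included here only to show the same moment in a human-readable form.

```python
import time

now = time.time()

print(now)              # seconds since the Unix Epoch, as a float
print(time.ctime(now))  # the same moment as a readable date and time
```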
We want to measure how long the encryption or decryption process takes for the contents of the file. Lines 35 to 38 call the encryptMessage() or decryptMessage() function (depending on whether 'encrypt' or 'decrypt' is stored in the myMode variable). Before this code, however, we will call time.time() and store the current time in a variable named startTime.
On line 39 after the encryption or decryption function calls have returned, we will call time.time() again and subtract startTime from it. This will give us the number of seconds between the two calls to time.time().
For example, if you subtract the floating point values returned when I called time.time() before in the interactive shell, you would get the amount of time in between those calls while I was typing:
(The difference Python calculated between the two floating point values is not precise due to rounding errors, which cause very slight inaccuracies when doing math with floats. For our programs, it will not matter. But you can read more about rounding errors at http://invpy.com/rounding.)
The time.time() - startTime expression evaluates to a value that is passed to the round() function, which rounds it to two decimal places. This value is stored in totalTime. On line 40, the amount of time is displayed to the user by calling print().
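The timing technique can be sketched on its own like this. The sum() call is only a stand-in for the encryption or decryption work, and the exact number of seconds printed will depend on your computer.

```python
import time

startTime = time.time()

# Stand-in for the real work (the cipher program would call
# encryptMessage() or decryptMessage() here instead).
total = sum(range(1000000))

# Passing 2 as the second argument rounds to two decimal places.
totalTime = round(time.time() - startTime, 2)

print('Finished in %s seconds' % (totalTime))
print(round(3.14159, 2))  # 3.14
```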
The encrypted (or decrypted) file contents are now stored in the translated variable. But this string will be forgotten when the program terminates, so we want to write the string out to a file to store it on the hard drive. The code on lines 43 to 45 does this by opening a new file (passing 'w' to open() to open the file in write mode) and then calling the write() file object method.
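Putting the read, translate, and write steps together might look like the sketch below. This is not the listing from this chapter: the encryptMessage() function here is a short stand-in written for this example (the real program imports it from transpositionEncrypt.py), and the filenames, message, and temporary directory are assumptions so that the example can run by itself.

```python
import os
import tempfile

def encryptMessage(key, message):
    # Stand-in for transpositionEncrypt.encryptMessage(). Each column of
    # the transposition grid is every key-th character of the message.
    return ''.join(message[col::key] for col in range(key))

tempDir = tempfile.mkdtemp()
inputFilename = os.path.join(tempDir, 'message.txt')
outputFilename = os.path.join(tempDir, 'message.encrypted.txt')
myKey = 8

# Create a small input file so there is something to encrypt.
fileObj = open(inputFilename, 'w')
fileObj.write('Common sense is not so common.')
fileObj.close()

# Step 1: read the input file into a string.
fileObj = open(inputFilename)
content = fileObj.read()
fileObj.close()

# Step 2: translate the string.
translated = encryptMessage(myKey, content)

# Step 3: write the translated string to the output file.
outputFileObj = open(outputFilename, 'w')
outputFileObj.write(translated)
outputFileObj.close()

# Read the output file back to check what was stored on the hard drive.
fileObj = open(outputFilename)
written = fileObj.read()
fileObj.close()
print(written)
```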
Afterwards, we print some more messages to the user telling them that the process is done and what the name of the written file is. Line 48 is the last line of the main() function.
Lines 53 and 54 (which get executed after the def statement on line 6 is executed) will call the main() function if this program is being run instead of being imported. (This is explained in Chapter 8’s “The Special __name__ Variable” section.)
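The general pattern looks like the sketch below. The body of main() is a placeholder made up for this example; only the if statement at the bottom is the point.

```python
def main():
    # Placeholder body. The real program's main() does the file
    # reading, encrypting or decrypting, and file writing.
    print('main() is running.')
    return 'done'

# __name__ is set to '__main__' only when this file is run directly.
# If another program imports this file, main() is not called.
if __name__ == '__main__':
    main()
```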
Practice exercises can be found at http://invpy.com/hackingpractice11A.
Congratulations! There wasn’t much to this new program aside from the open() function and the write(), read(), and close() methods, but this lets us encrypt text files on our hard drive that are megabytes or gigabytes in size. It doesn’t take much new code because all of the implementation for the cipher has already been written. We can extend our programs (such as adding file reading and writing capabilities) by importing their functions for use in new programs. This greatly increases our ability to use computers to encrypt information.
There are too many possible keys to simply brute-force and examine the output of a message encrypted with the transposition cipher. But if we can write a program that recognizes English (as opposed to strings of gibberish), we can have the computer examine the output of thousands of decryption attempts and determine which key can successfully decrypt a message to English.