Note: The second edition of this book is available under the title Cracking Codes with Python.
We can hack the Caesar cipher by using a cryptanalytic technique called “brute force.” Because our code-breaking program is so effective against the Caesar cipher, you shouldn’t use the Caesar cipher to encrypt your secret information.
Ideally, the ciphertext would never fall into anyone’s hands. But Kerckhoffs’s Principle (named after the 19th-century cryptographer Auguste Kerckhoffs) says that a cipher should still be secure even if everyone else knows how the cipher works and has the ciphertext (that is, everything except the key). The 20th-century mathematician Claude Shannon restated this as Shannon’s Maxim: “The enemy knows the system.”
Figure 7-1. Auguste Kerckhoffs (January 19, 1835 - August 9, 1903): “A cryptosystem should be secure even if everything about the system, except the key, is public knowledge.”
Figure 7-2. Claude Shannon (April 30, 1916 - February 24, 2001): “The enemy knows the system.”
Nothing stops a cryptanalyst from guessing one key, decrypting the ciphertext with that key, looking at the output, and, if it was not the correct key, moving on to the next key. The technique of trying every possible decryption key is called a brute-force attack. It isn’t a very sophisticated hack, but through sheer effort (which the computer will do for us) the Caesar cipher can be broken.
Open a new file editor window by clicking on File ► New Window. Type the Caesar cipher hacker program into the file editor, and then save it as caesarHacker.py. Press F5 to run the program. Note that you will first need to download the pyperclip.py module and place it in the same directory as caesarHacker.py. You can download this file from http://invpy.com/pyperclip.py.
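A sketch of a brute-force Caesar cipher hacker that follows the steps this chapter describes is shown below. It is laid out so that lines 8, 12, 17 to 31, and 34 hold the statements the walkthrough discusses, but its comments (and possibly other details) may differ from the book’s own caesarHacker.py. It uses only built-in Python, so this sketch does not import pyperclip.

```python
# Caesar Cipher Hacker (a sketch modeled on the steps in this chapter)
# Brute-force attack: decrypt the message with every possible key.

message = 'GUVF VF ZL FRPERG ZRFFNTR.'
LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

# loop through every possible key
for key in range(len(LETTERS)):

    # Reset translated so that text decrypted with the previous key
    # is not carried over into this key's attempt.
    translated = ''

    # The rest of the loop body mirrors the Caesar cipher's decryption code.

    # decrypt each symbol in the message
    for symbol in message:
        if symbol in LETTERS:
            num = LETTERS.find(symbol)  # get the number of the symbol
            num = num - key  # subtracting the key decrypts

            # handle the wrap-around if num is less than 0
            if num < 0:
                num = num + len(LETTERS)

            # add the number's symbol to the end of translated
            translated = translated + LETTERS[num]

        else:
            # not in LETTERS: add the symbol without decrypting it
            translated = translated + symbol

    # display the key being tested, along with its decryption
    print('Key #%s: %s' % (key, translated))
```

Running it prints one line per key, 26 in all; the line for key 13 reads Key #13: THIS IS MY SECRET MESSAGE.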
You will see that much of this code is the same as the code in the original Caesar cipher program. This is because the Caesar cipher hacker program performs the same steps to decrypt the message, only it repeats them once for each possible key.
When you run the Caesar cipher hacker program, it tries to break the ciphertext “GUVF VF ZL FRPERG ZRFFNTR.” by printing its decryption under each of the 26 possible keys. The decrypted output for key 13 is plain English, so the original encryption key must have been 13.
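One way to double-check that 13 was the key is to re-encrypt the English candidate and confirm that it reproduces the ciphertext. The sketch below does this with a hypothetical caesar_encrypt() helper; unlike the chapter’s code, it handles wrap-around with Python’s % operator instead of an if statement.

```python
LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

def caesar_encrypt(text, key):
    """Shift each letter in LETTERS forward by key; leave other symbols alone."""
    result = ''
    for symbol in text:
        if symbol in LETTERS:
            # % wraps positions past Z back around to the start of LETTERS
            num = (LETTERS.find(symbol) + key) % len(LETTERS)
            result = result + LETTERS[num]
        else:
            result = result + symbol
    return result

ciphertext = 'GUVF VF ZL FRPERG ZRFFNTR.'
candidate = 'THIS IS MY SECRET MESSAGE.'  # the English output seen for key 13

# Re-encrypting the candidate with key 13 should reproduce the ciphertext exactly
confirmed = caesar_encrypt(candidate, 13) == ciphertext
print(confirmed)  # True
```

Because 13 + 13 = 26, shifting by 13 twice returns every letter to where it started, so with this 26-letter alphabet key 13 encrypts and decrypts identically (the scheme usually called ROT13).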
The hacker program creates a message variable that stores the ciphertext string the program tries to decrypt. The LETTERS constant variable contains every character that can be encrypted with the cipher. The value of LETTERS must be exactly the same as the LETTERS value used in the Caesar cipher program that encrypted the ciphertext we are trying to hack; otherwise, the hacker program won’t work.
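To see why the two LETTERS values must match, the sketch below assumes a hypothetical sender whose alphabet also includes the space character (27 symbols). The shift() helper is illustrative, not from the book, and uses % for wrap-around. Because the hacker’s 26-letter alphabet treats the encrypted spaces as ordinary letters, none of its 26 candidates is the original plaintext.

```python
def shift(text, key, letters):
    """Caesar-shift every symbol found in letters by key (a negative key decrypts)."""
    result = ''
    for symbol in text:
        if symbol in letters:
            num = (letters.find(symbol) + key) % len(letters)
            result = result + letters[num]
        else:
            result = result + symbol  # symbols outside letters pass through
    return result

# Hypothetical sender alphabet: uppercase letters plus a space
SENDER_LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ '
# Alphabet the hacker assumes: uppercase letters only
HACKER_LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

ciphertext = shift('HELLO WORLD', 3, SENDER_LETTERS)
print(ciphertext)  # KHOORCZRUOG

# Brute force with the wrong alphabet: no key recovers the plaintext
candidates = [shift(ciphertext, -key, HACKER_LETTERS)
              for key in range(len(HACKER_LETTERS))]
print('HELLO WORLD' in candidates)  # False
print(candidates[3])  # HELLOZWORLD

# Brute force with the matching alphabet finds it
fixed = [shift(ciphertext, -key, SENDER_LETTERS)
         for key in range(len(SENDER_LETTERS))]
print('HELLO WORLD' in fixed)  # True
```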
Line 8 is a for loop that does not iterate over a string value, but instead iterates over the return value from a call to a function named range(). In its simplest form, the range() function takes one integer argument and returns a value of the range data type. These range values can be used in for loops to loop a specific number of times.
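As a quick illustration (written as a short script rather than an interactive-shell session), this loop runs its body once for each value range(4) produces, so it prints Hello four times:

```python
# range(4) makes the loop body run four times; i takes each value in turn
count = 0
for i in range(4):
    print('Hello')
    count = count + 1

print(count)  # 4
```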
More specifically, the range value returned from the range() function call will set the for loop’s variable to the integers 0 up to, but not including, the argument passed to range().
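The following sketch records each value the loop variable takes, which makes the “up to, but not including” rule visible:

```python
# Collect the values range(6) hands to the loop variable
seen = []
for i in range(6):
    print(i)
    seen.append(i)

print(seen)  # [0, 1, 2, 3, 4, 5]; 6 itself is not included
```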
Line 8 is a for loop that sets the key variable to the values 0 up to (but not including) 26. Instead of hard-coding the value 26 directly into our program, we use the return value from len(LETTERS) so that if we modify LETTERS the program will still work. See the “Encrypt Non-Letter Characters” section in the previous chapter to read why.
So the first time the program execution goes through this loop, key will be set to 0 and the ciphertext in message will be decrypted with key 0. (The code inside the for loop does the decrypting.) On the next iteration of line 8’s for loop, key will be set to 1 for the decryption.
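The key values this loop produces can be checked directly. The sketch below also assumes a hypothetical EXTENDED_LETTERS alphabet to show that range(len(LETTERS)) adapts when the alphabet changes:

```python
LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

keys = list(range(len(LETTERS)))
print(keys[0], keys[-1], len(keys))  # 0 25 26

# Hypothetical change: if LETTERS also held lowercase letters, the same
# range(len(...)) expression would cover the larger key space automatically.
EXTENDED_LETTERS = LETTERS + 'abcdefghijklmnopqrstuvwxyz'
extended_keys = list(range(len(EXTENDED_LETTERS)))
print(extended_keys[-1], len(extended_keys))  # 51 52
```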
You can also pass two integer arguments to the range() function instead of just one. The first argument is where the range should start and the second argument is where the range should stop (up to but not including the second argument). The two arguments are separated by a comma, as in range(2, 6).
Whether you pass one argument or two, the range() call evaluates to a value of the range data type (a “range object”), not to a list.
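A short sketch of the two-argument form, plus a look at the range object the call returns:

```python
# Two arguments: start at 2, stop before 6
values = []
for i in range(2, 6):
    print(i)
    values.append(i)

print(values)  # [2, 3, 4, 5]

# The call itself produces a range object, not a list
r = range(2, 6)
print(type(r))  # <class 'range'>
print(r)        # range(2, 6)
print(list(r))  # [2, 3, 4, 5]
```

Wrapping a range in list() is only needed when you want to see or store all of its values at once; a for loop can iterate over the range object directly.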
On line 12, translated is set to the blank string. The decryption code on the next few lines adds the decrypted text to the end of the string in translated. It is important that we reset translated to the blank string at the beginning of each iteration of this for loop; otherwise, the text decrypted with the current key will be appended to the text left in translated from the previous iteration.
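The sketch below contrasts resetting translated inside the loop with resetting it only once, using a tiny two-letter message and the first three keys. decrypt_symbol() is a hypothetical helper that assumes its argument is an uppercase letter.

```python
LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
message = 'BC'

def decrypt_symbol(symbol, key):
    """Decrypt one uppercase letter with the given key, wrapping past A to Z."""
    num = LETTERS.find(symbol) - key
    if num < 0:
        num = num + len(LETTERS)
    return LETTERS[num]

# Correct: translated is reset at the top of every iteration
good = []
for key in range(3):
    translated = ''
    for symbol in message:
        translated = translated + decrypt_symbol(symbol, key)
    good.append(translated)
print(good)  # ['BC', 'AB', 'ZA']

# Buggy: translated is reset only once, before the loop
bad = []
translated = ''
for key in range(3):
    for symbol in message:
        translated = translated + decrypt_symbol(symbol, key)
    bad.append(translated)
print(bad)  # ['BC', 'BCAB', 'BCABZA']
```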
Lines 17 to 31 are almost exactly the same as the code in the Caesar cipher program from the previous chapter. This code is slightly simpler, because it only has to decrypt rather than either encrypt or decrypt.
First, line 17 loops through every symbol in the ciphertext string stored in message. On each iteration of this loop, line 18 checks whether symbol is an uppercase letter (that is, whether it exists in the LETTERS constant variable, which contains only uppercase letters) and, if so, decrypts it. Line 19 locates where symbol is in LETTERS with the find() method and stores that position in a variable called num.
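The checks on lines 18 and 19 can be tried on their own. Testing membership with in first matters: find() returns -1 for a symbol that is not in the string, and if that -1 flowed into the decryption arithmetic, a space or period would be replaced by a letter instead of being left alone.

```python
LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

print('G' in LETTERS)     # True: G is an uppercase letter
print(LETTERS.find('G'))  # 6: its position, counting from 0
print('.' in LETTERS)     # False: the hacker leaves '.' unchanged
print(LETTERS.find('.'))  # -1: find() returns -1 when the symbol is missing
print('g' in LETTERS)     # False: LETTERS holds only uppercase letters
```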
Then line 20 subtracts the key from num. (Remember, in the Caesar cipher, subtracting the key decrypts and adding the key encrypts.) This may make num less than zero, which requires “wrap-around.” Line 23 checks for this case, and line 24 adds 26 (the value len(LETTERS) returns) if num is less than 0.
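A worked example of the wrap-around for a single letter, assuming symbol 'C' and key 5: find() gives 2, subtracting the key gives -3, and adding 26 brings the result back into range. The last lines show an alternative the chapter does not use, Python’s % operator, which performs the same wrap in one step:

```python
LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

symbol, key = 'C', 5
num = LETTERS.find(symbol) - key  # 2 - 5 = -3
if num < 0:
    num = num + len(LETTERS)      # -3 + 26 = 23
print(num, LETTERS[num])          # 23 X

# Alternative (not the approach described above): with a positive modulus,
# Python's % always returns a value from 0 to 25, so it wraps in one step.
alt = (LETTERS.find(symbol) - key) % len(LETTERS)
print(alt, LETTERS[alt])          # 23 X
```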
Now that num has been modified, LETTERS[num] will evaluate to the decrypted symbol. Line 27 adds this symbol to the end of the string stored in translated.
Of course, if line 18’s condition was False (that is, symbol was not in LETTERS), we don’t decrypt the symbol at all. If you look at the indentation of line 29’s else statement, you can see that this else statement matches the if statement on line 18.
Line 31 adds symbol to the end of translated unmodified.
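Putting lines 18 through 31 together, a hypothetical decrypt() function (a sketch, not part of the book’s program) decrypts the letters and copies spaces and punctuation through unchanged:

```python
LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

def decrypt(message, key):
    """Decrypt letters found in LETTERS; copy every other symbol unchanged."""
    translated = ''
    for symbol in message:
        if symbol in LETTERS:
            num = LETTERS.find(symbol) - key
            if num < 0:
                num = num + len(LETTERS)
            translated = translated + LETTERS[num]  # the decrypted letter
        else:
            translated = translated + symbol  # space, '.', etc. pass through
    return translated

print(decrypt('FRPERG ZRFFNTR.', 13))  # SECRET MESSAGE.
```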
Although line 34 is the only print() function call in our Caesar cipher hacker program, it prints 26 lines, because it is called once per iteration of line 8’s for loop (once for each key).
The argument for the print() function call is something we haven’t used before. It is a string value that makes use of string formatting (also called string interpolation). String formatting with the %s text is a way of placing one string inside another one. The first %s in the string is replaced by the first value in the parentheses after the % operator that follows the string, the second %s by the second value, and so on.
You can experiment with string formatting in the interactive shell.
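A minimal example of %s formatting, written as a script; each %s is filled in left to right from the tuple after the % operator:

```python
# Each %s is replaced, left to right, by the values in the tuple after %
greeting = 'Hello %s, my name is %s.' % ('Alice', 'Bob')
print(greeting)  # Hello Alice, my name is Bob.

# A single value doesn't need to be wrapped in a tuple
mood = 'I am feeling %s today.' % 'great'
print(mood)  # I am feeling great today.
```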
String formatting is often easier to type than string concatenation with the + operator, especially for larger strings. One benefit of string formatting is that, unlike string concatenation, it lets you insert non-string values such as integers into the string.
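The sketch below inserts an integer with %s and contrasts it with concatenation, which needs an explicit str() call and raises a TypeError without one:

```python
key = 13

# String formatting converts the integer to text automatically
line = 'The key is %s.' % key
print(line)  # The key is 13.

# Concatenation needs an explicit str() call to produce the same string
line2 = 'The key is ' + str(key) + '.'
print(line2)  # The key is 13.

# Without str(), + refuses to join a str and an int
raised = False
try:
    'The key is ' + key
except TypeError as err:
    raised = True
    print('TypeError:', err)
```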
Line 34 uses string formatting to create a string that has the values in both the key and translated variables. Because key stores an integer value, we’ll use string formatting to put it in a string value that is passed to print().
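A sketch of the kind of line this print() call builds, assuming the format string 'Key #%s: %s' (the exact wording in the book’s listing may differ):

```python
key = 13
translated = 'THIS IS MY SECRET MESSAGE.'

# key is an int and translated is a str; %s handles both
output = 'Key #%s: %s' % (key, translated)
print(output)  # Key #13: THIS IS MY SECRET MESSAGE.
```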
Practice exercises can be found at http://invpy.com/hackingpractice7A.
The critical failure of the Caesar cipher is that there aren’t many different possible keys that can be used to encrypt a message. Any computer can easily decrypt a message with all 26 possible keys, and it only takes the cryptanalyst a few seconds to look through the results to find the one that is in English. To make our messages more secure, we will need a cipher that has more possible keys. The transposition cipher in the next chapter can provide this for us.