“The internet is the most liberating tool for humanity ever invented, and also the best for surveillance. It’s not one or the other. It’s both.”
—John Perry Barlow, co-founder of the Electronic Frontier Foundation
In Chapter 15, you learned that the affine cipher has about a thousand possible keys but that computers can still brute-force through all of them easily. We need a cipher that has so many possible keys that no computer can brute-force through them all.
The simple substitution cipher is one such cipher that is effectively invulnerable to a brute-force attack because it has an enormous number of possible keys. Even if your computer could try a trillion keys every second, it would still take 12 million years for it to try every one! In this chapter, you’ll write a program to implement the simple substitution cipher and learn some useful Python functions and string methods as well.
To implement the simple substitution cipher, we choose a random letter to encrypt each letter of the alphabet, using each letter only once. The key for the simple substitution cipher is always a string of 26 letters of the alphabet in random order. There are 403,291,461,126,605,635,584,000,000 different possible key orderings for the simple substitution cipher. That’s a lot of keys! More important, this number is so large that it’s impossible to brute-force. (To see how this number was calculated, go to https://www.nostarch.com/crackingcodes/.)
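The key count is the number of ways to arrange 26 distinct letters, which is 26 factorial (26 × 25 × ... × 1). As a quick check, the following sketch computes that value with math.factorial() and estimates the brute-force time at a trillion keys per second. The variable names and the 365-day year are assumptions made for this illustration.

```python
import math

# Number of ways to arrange 26 distinct letters: 26!
possible_keys = math.factorial(26)
print(f'{possible_keys:,}')

# Assumed attack speed from the text: one trillion keys per second.
keys_per_second = 10 ** 12
seconds_per_year = 60 * 60 * 24 * 365  # Ignores leap years.

years = possible_keys / keys_per_second / seconds_per_year
print(f'About {years / 1_000_000:.1f} million years')
```

The result lands between 12 and 13 million years, which is consistent with the figure quoted earlier in the chapter.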
Let’s try using the simple substitution cipher with paper and pencil first. For this example, we’ll encrypt the message “Attack at dawn.” using the key VJZBGNFEPLITMXDWKQUCRYAHSO. First, write out the letters of the alphabet and the corresponding key underneath each letter, as in Figure 16-1.
Figure 16-1: Encryption letters for the example key
To encrypt a message, find the letter in the plaintext in the top row and substitute it with the letter in the bottom row. A encrypts to V, T encrypts to C, C encrypts to Z, and so on. So the message “Attack at dawn.” encrypts to “Vccvzi vc bvax.”
To decrypt the encrypted message, find the letter in the ciphertext in the bottom row and replace it with the corresponding letter in the top row. V decrypts to A, C decrypts to T, Z decrypts to C, and so on.
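The paper-and-pencil table can be mirrored in a few lines of Python. This is only a minimal sketch, not the chapter's program: the names KEY, encrypt_table, decrypt_table, and substitute() are hypothetical, and the two dictionaries stand in for the top and bottom rows of the table.

```python
LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
KEY = 'VJZBGNFEPLITMXDWKQUCRYAHSO'

# Top row -> bottom row for encrypting, bottom row -> top row for decrypting.
encrypt_table = dict(zip(LETTERS, KEY))
decrypt_table = dict(zip(KEY, LETTERS))

def substitute(text, table):
    result = []
    for ch in text:
        mapped = table.get(ch.upper())
        if mapped is None:
            result.append(ch)              # Not a letter: keep it as is.
        elif ch.isupper():
            result.append(mapped)          # Table values are uppercase.
        else:
            result.append(mapped.lower())  # Preserve lowercase input.
    return ''.join(result)

ciphertext = substitute('Attack at dawn.', encrypt_table)
plaintext = substitute(ciphertext, decrypt_table)
print(ciphertext)
print(plaintext)
```

Running the sketch should print the ciphertext from the worked example, followed by the original message.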
Unlike the Caesar cipher, in which the bottom row shifts but remains in alphabetical order, in the simple substitution cipher the bottom row is completely scrambled. This results in far more possible keys, which is a huge advantage of using the simple substitution cipher. The disadvantage is that the key is 26 characters long and more difficult to memorize. You may need to write down the key, but if you do, make sure no one else ever reads it!
Open a new file editor window by selecting File▸New File. Enter the following code into the file editor and save it as simpleSubCipher.py. Be sure to place the pyperclip.py file in the same directory as the simpleSubCipher.py file. Press F5 to run the program.
simpleSubCipher.py
1. # Simple Substitution Cipher
2. # https://www.nostarch.com/crackingcodes/ (BSD Licensed)
3.
4. import pyperclip, sys, random
5.
6.
7. LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
8.
9. def main():
10.     myMessage = 'If a man is offered a fact which goes against his
        instincts, he will scrutinize it closely, and unless the evidence
        is overwhelming, he will refuse to believe it. If, on the other
        hand, he is offered something which affords a reason for acting
        in accordance to his instincts, he will accept it even on the
        slightest evidence. The origin of myths is explained in this way.
        -Bertrand Russell'
11.     myKey = 'LFWOAYUISVKMNXPBDCRJTQEGHZ'
12.     myMode = 'encrypt' # Set to 'encrypt' or 'decrypt'.
13.
14.     if not keyIsValid(myKey):
15.         sys.exit('There is an error in the key or symbol set.')
16.     if myMode == 'encrypt':
17.         translated = encryptMessage(myKey, myMessage)
18.     elif myMode == 'decrypt':
19.         translated = decryptMessage(myKey, myMessage)
20.     print('Using key %s' % (myKey))
21.     print('The %sed message is:' % (myMode))
22.     print(translated)
23.     pyperclip.copy(translated)
24.     print()
25.     print('This message has been copied to the clipboard.')
26.
27.
28. def keyIsValid(key):
29.     keyList = list(key)
30.     lettersList = list(LETTERS)
31.     keyList.sort()
32.     lettersList.sort()
33.
34.     return keyList == lettersList
35.
36.
37. def encryptMessage(key, message):
38.     return translateMessage(key, message, 'encrypt')
39.
40.
41. def decryptMessage(key, message):
42.     return translateMessage(key, message, 'decrypt')
43.
44.
45. def translateMessage(key, message, mode):
46.     translated = ''
47.     charsA = LETTERS
48.     charsB = key
49.     if mode == 'decrypt':
50.         # For decrypting, we can use the same code as encrypting. We
51.         # just need to swap where the key and LETTERS strings are used.
52.         charsA, charsB = charsB, charsA
53.
54.     # Loop through each symbol in the message:
55.     for symbol in message:
56.         if symbol.upper() in charsA:
57.             # Encrypt/decrypt the symbol:
58.             symIndex = charsA.find(symbol.upper())
59.             if symbol.isupper():
60.                 translated += charsB[symIndex].upper()
61.             else:
62.                 translated += charsB[symIndex].lower()
63.         else:
64.             # Symbol is not in LETTERS; just add it:
65.             translated += symbol
66.
67.     return translated
68.
69.
70. def getRandomKey():
71.     key = list(LETTERS)
72.     random.shuffle(key)
73.     return ''.join(key)
74.
75.
76. if __name__ == '__main__':
77.     main()
When you run the simpleSubCipher.py program, the encrypted output should look like this:
Using key LFWOAYUISVKMNXPBDCRJTQEGHZ
The encrypted message is:
Sy l nlx sr pyyacao l ylwj eiswi upar lulsxrj isr sxrjsxwjr, ia esmm
rwctjsxsza sj wmpramh, lxo txmarr jia aqsoaxwa sr pqaceiamnsxu, ia esmm caytra
jp famsaqa sj. Sy, px jia pjiac ilxo, ia sr pyyacao rpnajisxu eiswi lyypcor
l calrpx ypc lwjsxu sx lwwpcolxwa jp isr sxrjsxwjr, ia esmm lwwabj sj aqax
px jia rmsuijarj aqsoaxwa. Jia pcsusx py nhjir sr agbmlsxao sx jisr elh.
-Facjclxo Ctrramm
This message has been copied to the clipboard.
Notice that if the letter in the plaintext is lowercase, it’s lowercase in the ciphertext. Likewise, if the letter is uppercase in the plaintext, it’s uppercase in the ciphertext. The simple substitution cipher doesn’t encrypt spaces or punctuation marks and simply returns those characters as is.
To decrypt this ciphertext, paste it as the value for the myMessage variable on line 10 and change myMode to the string 'decrypt'. When you run the program again, the decryption output should look like this:
Using key LFWOAYUISVKMNXPBDCRJTQEGHZ
The decrypted message is:
If a man is offered a fact which goes against his instincts, he will
scrutinize it closely, and unless the evidence is overwhelming, he will refuse
to believe it. If, on the other hand, he is offered something which affords
a reason for acting in accordance to his instincts, he will accept it even
on the slightest evidence. The origin of myths is explained in this way.
-Bertrand Russell
This message has been copied to the clipboard.
Let’s look at the first lines of the simple substitution cipher program’s source code.
1. # Simple Substitution Cipher
2. # https://www.nostarch.com/crackingcodes/ (BSD Licensed)
3.
4. import pyperclip, sys, random
5.
6.
7. LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
Line 4 imports the pyperclip, sys, and random modules. The LETTERS constant variable is set to a string of all the uppercase letters, which is the symbol set for the simple substitution cipher program.
The main() function in simpleSubCipher.py, which is similar to the main() function of cipher programs in the previous chapters, is called when the program is first run. It contains the variables that store the message, key, and mode used for the program.
9. def main():
10.     myMessage = 'If a man is offered a fact which goes against his
        instincts, he will scrutinize it closely, and unless the evidence
        is overwhelming, he will refuse to believe it. If, on the other
        hand, he is offered something which affords a reason for acting
        in accordance to his instincts, he will accept it even on the
        slightest evidence. The origin of myths is explained in this way.
        -Bertrand Russell'
11.     myKey = 'LFWOAYUISVKMNXPBDCRJTQEGHZ'
12.     myMode = 'encrypt' # Set to 'encrypt' or 'decrypt'.
The keys for simple substitution ciphers are easy to get wrong because they’re fairly long and need to have every letter in the alphabet. For example, it’s easy to enter a key that is missing a letter or a key that has the same letter twice. The keyIsValid() function checks that the key is usable by the encryption and decryption functions, and main() exits the program with an error message if the key is not valid:
14.     if not keyIsValid(myKey):
15.         sys.exit('There is an error in the key or symbol set.')
If keyIsValid() returns False on line 14, then myKey contains an invalid key. The not operator makes the condition True, and line 15 terminates the program.
Lines 16 through 19 check whether the myMode variable is set to 'encrypt' or 'decrypt' and call either encryptMessage() or decryptMessage() accordingly:
16.     if myMode == 'encrypt':
17.         translated = encryptMessage(myKey, myMessage)
18.     elif myMode == 'decrypt':
19.         translated = decryptMessage(myKey, myMessage)
The return value of encryptMessage() and decryptMessage() is a string of the encrypted or decrypted message that is stored in the translated variable.
Line 20 prints the key that was used to the screen. The encrypted or decrypted message is printed to the screen and also copied to the clipboard.
20.     print('Using key %s' % (myKey))
21.     print('The %sed message is:' % (myMode))
22.     print(translated)
23.     pyperclip.copy(translated)
24.     print()
25.     print('This message has been copied to the clipboard.')
Line 25 is the last line of code in the main() function, so after it runs, program execution returns to line 77, where main() was called. Because that call is the last line of the program, the program then exits.
Next, we’ll look at how the keyIsValid() function uses the sort() method to test whether the key is valid.
Lists have a sort() method that rearranges the list’s items into numerical or alphabetical order. This ability to sort items in a list comes in handy when you have to check whether two lists contain the same items but don’t list them in the same order.
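As a small illustration of that idea, the sketch below compares two lists that hold the same items in a different order, before and after sorting. The variable names spam, eggs, and returned are placeholders chosen for this example.

```python
spam = ['d', 'a', 'c', 'b']
eggs = ['b', 'c', 'a', 'd']

# Same items, different order, so the lists are not equal yet.
equal_before = spam == eggs
print(equal_before)

# sort() reorders each list in place and returns None.
returned = spam.sort()
eggs.sort()

print(returned)
print(spam)
print(spam == eggs)
```

Because sort() returns None, writing something like spam = spam.sort() would throw the list away, which is a common mistake worth watching for.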
In simpleSubCipher.py, a simple substitution key string value is valid only if it has each of the characters in the symbol set with no duplicate or missing letters. We can check whether a string value is a valid key by sorting it and checking whether it’s equal to the sorted LETTERS. But because we can sort only lists, not strings (recall that strings are immutable, meaning their values cannot be changed), we’ll obtain list versions of the string values by passing them to list(). Then, after sorting these lists, we can compare the two to see whether or not they’re equal. Although LETTERS is already in alphabetical order, we’ll sort it because we’ll expand it to contain other characters later on.
28. def keyIsValid(key):
29.     keyList = list(key)
30.     lettersList = list(LETTERS)
31.     keyList.sort()
32.     lettersList.sort()
The string in key is passed to list() on line 29. The list value returned is stored in a variable named keyList.
On line 30, the LETTERS constant variable (which contains the string 'ABCDEFGHIJKLMNOPQRSTUVWXYZ') is passed to list(), which returns the list in the following format: ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z'].
On lines 31 and 32, the lists in keyList and lettersList are then sorted in alphabetical order by calling the sort() list method on them. Note that similar to the append() list method, the sort() list method modifies the list in place and doesn’t have a return value.
When sorted, the keyList and lettersList values should be the same, because keyList was simply the characters in LETTERS with the order scrambled. Line 34 checks whether the values keyList and lettersList are equal:
34.     return keyList == lettersList
If keyList and lettersList are equal, you can be sure that keyList and the key parameter don’t have any duplicated characters, because LETTERS doesn’t have duplicates in it. In that case, line 34 returns True. But if keyList and lettersList don’t match, the key is invalid and line 34 returns False.
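To see the check reject bad keys, here is a minimal sketch that swaps in Python’s built-in sorted() function (which returns a new sorted list) in place of the list() and sort() calls used by the program. The function name key_is_valid() and the two broken keys are made up for this illustration.

```python
LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

def key_is_valid(key):
    # sorted() accepts a string and returns a new sorted list of characters.
    return sorted(key) == sorted(LETTERS)

good_key = 'LFWOAYUISVKMNXPBDCRJTQEGHZ'
missing_z = 'LFWOAYUISVKMNXPBDCRJTQEGH'     # Only 25 letters.
duplicate_l = 'LFWOAYUISVKMNXPBDCRJTQEGHL'  # 'L' twice, no 'Z'.

print(key_is_valid(good_key))
print(key_is_valid(missing_z))
print(key_is_valid(duplicate_l))
```

Only the first key passes. The other two fail because their sorted characters no longer match the sorted symbol set.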
The encryption code and the decryption code in the simpleSubCipher.py program are almost identical. When you have two very similar pieces of code, it’s best to put them into a function and call it twice rather than enter the code twice. Not only does this save time, but more important, it avoids introducing bugs while copying and pasting code. It’s also advantageous because if there’s ever a bug in the code, you only have to fix the bug in one place instead of in multiple places.
Wrapper functions help you avoid having to enter duplicate code by wrapping the code of another function and returning the value the wrapped function returns. Often, the wrapper function makes a slight change to the arguments or return value of the wrapped function. Otherwise, there would be no need for wrapping because you could just call the function directly.
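The pattern may be easier to see in isolation. The sketch below is not part of simpleSubCipher.py: shiftMessage(), encryptShift(), and decryptShift() are hypothetical names for a simple letter-shifting example in which one wrapper passes its key through unchanged and the other negates it.

```python
LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

def shiftMessage(message, shift):
    # Wrapped function: shifts each uppercase letter by shift positions.
    shifted = ''
    for symbol in message:
        if symbol in LETTERS:
            newIndex = (LETTERS.find(symbol) + shift) % len(LETTERS)
            shifted += LETTERS[newIndex]
        else:
            shifted += symbol  # Leave other characters untouched.
    return shifted

def encryptShift(key, message):
    # Wrapper: passes the key through unchanged.
    return shiftMessage(message, key)

def decryptShift(key, message):
    # Wrapper: negates the key so the same code undoes the shift.
    return shiftMessage(message, -key)

ciphertext = encryptShift(3, 'HELLO, WORLD')
print(ciphertext)
print(decryptShift(3, ciphertext))
```

All of the real work lives in one function, so a bug fix there applies to both encrypting and decrypting.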
Let’s look at an example of using wrapper functions in our code to understand how they work. In this case, encryptMessage() and decryptMessage() on lines 37 and 41 are the wrapper functions:
37. def encryptMessage(key, message):
38.     return translateMessage(key, message, 'encrypt')
39.
40.
41. def decryptMessage(key, message):
42.     return translateMessage(key, message, 'decrypt')
Each of these wrapper functions calls translateMessage(), which is the wrapped function, and returns the value that translateMessage() returns. (We’ll look at the translateMessage() function in the next section.) Because both wrapper functions use the same translateMessage() function, we need to modify only that one function instead of the encryptMessage() and decryptMessage() functions if we need to make any changes to the cipher.
With these wrapper functions, someone who imports the program simpleSubCipher.py can call the functions named encryptMessage() and decryptMessage() just as they can with all the other cipher programs in this book. The wrapper functions have clear names that tell others who use the functions what they do without having to look at the code. As a result, if we want to share our code, others can use it more easily.
Other programs can encrypt a message in various ciphers by importing the cipher programs and calling their encryptMessage() functions, as shown here:
import affineCipher, simpleSubCipher, transpositionCipher
--snip--
ciphertext1 = affineCipher.encryptMessage(encKey1, 'Hello!')
ciphertext2 = transpositionCipher.encryptMessage(encKey2, 'Hello!')
ciphertext3 = simpleSubCipher.encryptMessage(encKey3, 'Hello!')
Naming consistency is helpful, because it makes it easier for someone familiar with one of the cipher programs to use the other cipher programs. For example, you can see that the first parameter is always the key and the second parameter is always the message, which is the convention used for most of the cipher programs in this book. Using the translateMessage() function instead of separate encryptMessage() and decryptMessage() functions would be inconsistent with the other programs.
Let’s look at the translateMessage() function next.
The translateMessage() function is used for both encryption and decryption.
45. def translateMessage(key, message, mode):
46.     translated = ''
47.     charsA = LETTERS
48.     charsB = key
49.     if mode == 'decrypt':
50.         # For decrypting, we can use the same code as encrypting. We
51.         # just need to swap where the key and LETTERS strings are used.
52.         charsA, charsB = charsB, charsA
Notice that translateMessage() has the parameters key and message but also a third parameter named mode. When we call translateMessage(), the call in the encryptMessage() function passes 'encrypt' for the mode parameter, and the call in the decryptMessage() function passes 'decrypt'. This is how the translateMessage() function knows whether it should encrypt or decrypt the message passed to it.
The actual encryption process is simple: for each letter in the message parameter, the function looks up that letter’s index in LETTERS and replaces the character with the letter at that same index in the key parameter. Decryption does the opposite: it looks up the index in key and replaces the character with the letter at the same index in LETTERS.
Instead of using LETTERS and key, the program uses the variables charsA and charsB, which allow it to replace the letter in charsA with the letter at the same index in charsB. Being able to change which values are assigned to charsA and charsB makes it easy for the program to switch between encrypting and decrypting. Line 47 sets the characters in charsA to the characters in LETTERS, and line 48 sets the characters in charsB to the characters in key.
The following figures show how the same code can be used to either encrypt or decrypt a letter. Figure 16-2 illustrates the encryption process. The top row in this figure shows the characters in charsA (set to LETTERS), the middle row shows the characters in charsB (set to key), and the bottom row shows the integer indexes corresponding to the characters.
Figure 16-2: Using the index to encrypt plaintext
The code in translateMessage() always looks up the message character’s index in charsA and replaces it with the corresponding character in charsB at that index. So to encrypt, we just leave charsA and charsB as they are. Using the variables charsA and charsB replaces the character in LETTERS with the character in key, because charsA is set to LETTERS and charsB is set to key.
To decrypt, the values in charsA and charsB are switched using charsA, charsB = charsB, charsA on line 52. Figure 16-3 shows the decryption process.
Figure 16-3: Using the index to decrypt ciphertext
Keep in mind that the code in translateMessage() always replaces the character in charsA with the character at that same index in charsB. So when line 52 swaps the values, the code in translateMessage() does the decryption process instead of the encryption process.
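A single-letter version of this lookup makes the swap concrete. The following sketch assumes the letter passed in is uppercase and present in the symbol set; translate_letter() is a hypothetical helper, not a function from the program.

```python
LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
key = 'LFWOAYUISVKMNXPBDCRJTQEGHZ'

def translate_letter(letter, mode):
    charsA, charsB = LETTERS, key
    if mode == 'decrypt':
        # Swapping the two strings reverses the direction of the lookup.
        charsA, charsB = charsB, charsA
    index = charsA.find(letter)  # Position of the letter in charsA.
    return charsB[index]         # Letter at the same position in charsB.

print(translate_letter('I', 'encrypt'))
print(translate_letter('S', 'decrypt'))
```

With this key, 'I' sits at index 8 of LETTERS and 'S' sits at index 8 of key, so each letter translates to the other.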
The next lines of code show how the program finds the index to use for encryption and decryption.
54.     # Loop through each symbol in the message:
55.     for symbol in message:
56.         if symbol.upper() in charsA:
57.             # Encrypt/decrypt the symbol:
58.             symIndex = charsA.find(symbol.upper())
The for loop on line 55 sets the symbol variable to a character in the message string on each iteration through the loop. If the uppercase form of this symbol exists in charsA (recall that key and LETTERS have only uppercase characters in them), line 58 finds the index of the uppercase form of symbol in charsA. The symIndex variable stores this index.
We already know that the find() method would never return -1 (a -1 from the find() method means the argument could not be found in the string) because the if statement on line 56 guarantees that symbol.upper() exists in charsA. Otherwise, line 58 wouldn’t have been executed.
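The guard on line 56 matters more than it might seem. The short sketch below shows what find() returns for a present and an absent character, and what would happen if a -1 result were used as an index. The variable names found and missing are chosen for this example.

```python
charsA = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
charsB = 'LFWOAYUISVKMNXPBDCRJTQEGHZ'

found = charsA.find('C')    # 'C' is at index 2.
missing = charsA.find('?')  # '?' is not in charsA.
print(found, missing)

# A negative index counts from the end of the string, so without the
# `in` check, -1 would silently select the last character of charsB.
print(charsB[missing])
```

Python raises no error for charsB[-1], so an unguarded lookup would quietly turn every unknown symbol into the last letter of the key.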
Next, we’ll use each encrypted or decrypted symbol to build the string that is returned by the translateMessage() function. But because key and LETTERS are both only in uppercase, we’ll need to check whether the original symbol in message was lowercase and then adjust the decrypted or encrypted symbol to lowercase if it was. To do this, you need to learn two string methods: isupper() and islower().
The isupper() and islower() methods check whether a string is in uppercase or lowercase.
More specifically, the isupper() string method returns True if both of these conditions are met:
The string has at least one uppercase letter.
The string does not have any lowercase letters in it.
The islower() string method returns True if both of these conditions are met:
The string has at least one lowercase letter.
The string does not have any uppercase letters in it.
Non-letter characters in the string don’t affect whether these methods return True or False, although both methods evaluate to False if only non-letter characters exist in the string. Enter the following into the interactive shell to see how these methods work:
>>> 'HELLO'.isupper()
True
➊ >>> 'HELLO WORLD 123'.isupper()
True
➋ >>> 'hello'.islower()
True
>>> '123'.isupper()
False
>>> ''.islower()
False
The example at ➊ returns True because 'HELLO WORLD 123' has at least one uppercase letter in it and no lowercase letters. The numbers in that string don’t affect the evaluation. At ➋, 'hello'.islower() returns True because the string 'hello' has at least one lowercase letter in it and no uppercase letters.
Let’s return to our code to see how it uses the isupper() and islower() string methods.
The simpleSubCipher.py program uses the isupper() and islower() string methods to help ensure that the cases of the plaintext are reflected in the ciphertext.
59.             if symbol.isupper():
60.                 translated += charsB[symIndex].upper()
61.             else:
62.                 translated += charsB[symIndex].lower()
Line 59 tests whether symbol has an uppercase letter. If it does, line 60 concatenates the uppercase version of the character at charsB[symIndex] to translated. This results in the uppercase version of the key character corresponding to the uppercase input. If symbol instead has a lowercase letter, line 62 concatenates the lowercase version of the character at charsB[symIndex] to translated.
With the current symbol set, a non-letter symbol such as '5' or '?' never reaches line 59, because the check on line 56 sends it to the else block on line 63. But if LETTERS were expanded to include such characters, the condition on line 59 would be False for them, and line 62 would execute instead of line 60. The reason is that the conditions for isupper() wouldn’t be met because those strings don’t have at least one uppercase letter. In this case, the lower() method call on line 62 would have no effect, because lower() doesn’t change non-letter characters like '5' and '?'. It simply returns them unchanged.
Line 62 in the else block accounts for any lowercase characters and non-letter characters in our symbol string.
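The case-matching step on lines 59 to 62 can be tried on its own. In this sketch, match_case() is a hypothetical helper that takes the original symbol and the replacement character looked up in charsB.

```python
def match_case(symbol, replacement):
    # Uppercase input gives uppercase output; anything else goes
    # through lower(), which leaves non-letter characters unchanged.
    if symbol.isupper():
        return replacement.upper()
    else:
        return replacement.lower()

print(match_case('I', 'S'))
print(match_case('i', 'S'))
print('5'.isupper(), '5'.lower())
```

The last line shows why the else branch is safe for non-letters: isupper() is False for '5', and lower() hands the character back as is.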
The indentation on line 63 indicates that the else statement is paired with the if symbol.upper() in charsA: statement on line 56, so line 63 executes if symbol is not in LETTERS.
63.         else:
64.             # Symbol is not in LETTERS; just add it:
65.             translated += symbol
If symbol is not in LETTERS, line 65 executes. This means we cannot encrypt or decrypt the character in symbol, so we simply concatenate it to the end of translated as is.
At the end of the translateMessage() function, line 67 returns the value in the translated variable, which contains the encrypted or decrypted message:
67.     return translated
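For comparison, the same letter-for-letter replacement can be sketched with Python’s built-in str.maketrans() and translate() string methods instead of an explicit loop. This is an alternative technique, not the approach the program uses, and translate_with_table() is a name made up for this example.

```python
LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

def translate_with_table(key, message, mode):
    # Cover both cases so the case of each letter is preserved.
    source = LETTERS + LETTERS.lower()
    target = key.upper() + key.lower()
    if mode == 'decrypt':
        source, target = target, source
    # maketrans() maps each character in source to the character at the
    # same position in target; unmapped characters pass through unchanged.
    table = str.maketrans(source, target)
    return message.translate(table)

key = 'LFWOAYUISVKMNXPBDCRJTQEGHZ'
ciphertext = translate_with_table(key, 'If a man is offered a fact', 'encrypt')
print(ciphertext)
print(translate_with_table(key, ciphertext, 'decrypt'))
```

The ciphertext should match the start of the program’s encrypted output shown earlier, and decrypting it should return the original phrase.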
Next, we’ll look at how to use the getRandomKey() function to generate a valid key for the simple substitution cipher.
Typing a string for a key that contains each letter of the alphabet can be difficult. To help us with this, the getRandomKey() function returns a valid key to use. Lines 71 to 73 randomly scramble the characters in the LETTERS constant.
70. def getRandomKey():
71.     key = list(LETTERS)
72.     random.shuffle(key)
73.     return ''.join(key)
NOTE
Read “Randomly Scrambling a String” on page 123 for an explanation of how to scramble a string using the list() and random.shuffle() functions and the join() string method.
To use the getRandomKey() function, we need to change line 11 from myKey = 'LFWOAYUISVKMNXPBDCRJTQEGHZ' to this:
11.     myKey = getRandomKey()
Because line 20 in our simple substitution cipher program prints the key being used, you’ll be able to see the key the getRandomKey() function returned.
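As a quick sanity check, the sketch below repeats the getRandomKey() function from the program and confirms that each key it produces passes the same sorted-comparison test that keyIsValid() applies. The loop count of three and the generated_keys list are arbitrary choices for this example.

```python
import random

LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

def getRandomKey():
    key = list(LETTERS)  # Strings are immutable, so shuffle a list copy.
    random.shuffle(key)  # Shuffles in place and returns None.
    return ''.join(key)

generated_keys = [getRandomKey() for _ in range(3)]
for key in generated_keys:
    # Every shuffled key should contain each letter exactly once.
    print(key, sorted(key) == sorted(LETTERS))
```

The keys differ from run to run, but the validity check should print True for every one of them.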
Lines 76 and 77 at the end of the program call main() if simpleSubCipher.py is being run as a program instead of being imported as a module by another program.
76. if __name__ == '__main__':
77.     main()
This concludes our study of the simple substitution cipher program.
In this chapter, you learned how to use the sort() list method to order items in a list and how to compare two sorted lists to check for duplicate or missing characters in a string. You also learned about the isupper() and islower() string methods, which check whether a string value is made up of uppercase or lowercase letters. You learned about wrapper functions, which are functions that call other functions, usually adding only slight changes or different arguments.
The simple substitution cipher has far too many possible keys to brute-force through. This makes it impervious to the techniques you used to hack previous cipher programs. You’ll have to make smarter programs to break this code.
In Chapter 17, you’ll learn how to hack the simple substitution cipher. Instead of brute-forcing through all the keys, you’ll use a more intelligent and sophisticated algorithm.