“Encryption is fundamentally a private act.
The act of encryption, in fact, removes information from the public realm. Even laws against cryptography reach only so far as a nation’s border and the arm of its violence.”
—Eric Hughes, “A Cypherpunk’s Manifesto” (1993)
In Chapter 16, you learned that the simple substitution cipher is impossible to crack using brute force because it has too many possible keys. To hack the simple substitution cipher, we need to create a more sophisticated program that uses dictionary values to map the potential decryption letters of a ciphertext. In this chapter, we’ll write such a program to narrow down the list of potential decryption outputs to the right one.
In brute-force attacks, we try each possible key to check whether it can decrypt the ciphertext. If the key is correct, the decryption results in readable English. But by analyzing the ciphertext first, we can reduce the number of possible keys to try and maybe even find a full or partial key.
Let’s assume the original plaintext consists mostly of words in an English dictionary file, like the one we used in Chapter 11. Although a ciphertext won’t be made of real English words, it will still contain groups of letters broken up by spaces, just like words in regular sentences. We’ll call these cipherwords in this book. In a substitution cipher, every letter of the alphabet has exactly one unique corresponding encryption letter. We’ll call the letters in the ciphertext cipherletters. Because each plaintext letter can encrypt to only one cipherletter, and we’re not encrypting spaces in this version of the cipher, the plaintext and ciphertext will share the same word patterns.
For example, if we had the plaintext MISSISSIPPI SPILL, the corresponding ciphertext might be RJBBJBBJXXJ BXJHH. The number of letters in the first word of the plaintext and the first cipherword are the same. The same is true for the second plaintext word and the second cipherword. The plaintext and ciphertext share the same pattern of letters and spaces. Also notice that letters that repeat in the plaintext repeat the same number of times and in the same places as the ciphertext.
We could therefore assume that a cipherword corresponds to a word in the English dictionary file and that their word patterns would match. Then, if we can find which word in the dictionary the cipherword decrypts to, we can figure out the decryption of each cipherletter in that word. And if we figure out enough cipherletter decryptions using this technique, we may be able to decrypt the entire message.
Let’s examine the word pattern of the cipherword HGHHU. You can see that the cipherword has certain characteristics, which the original plaintext word must share. Both words must have the following in common.
They should be five letters long.
The first, third, and fourth letters should be the same.
They should have exactly three different letters; the first, second, and fifth letters should all be different.
Let’s think of words in the English language that fit this pattern. Puppy is one such word, which is five letters long (P, U, P, P, Y) and uses three different letters (P, U, Y) arranged in that same pattern (P for the first, third, and fourth letter; U for the second letter; and Y for the fifth letter). Mommy, bobby, lulls, and nanny fit the pattern, too. These words, along with any other word in the English dictionary file that matches the criteria, are all possible decryptions of HGHHU.
To represent a word pattern in a way the program can understand, we’ll make each pattern into a set of numbers separated by periods that indicates the pattern of letters.
Creating word patterns is easy: the first letter gets the number 0, and the first occurrence of each different letter thereafter gets the next number. For example, the word pattern for cat is 0.1.2, and the word pattern for classification is 0.1.2.3.3.4.5.4.0.2.6.4.7.8.
In simple substitution ciphers, no matter which key is used to encrypt, a plaintext word and its cipherword always have the same word pattern. The word pattern for the cipherword HGHHU is 0.1.0.0.2, which means the word pattern of the plaintext corresponding to HGHHU is also 0.1.0.0.2.
To decrypt HGHHU, we need to find all the words in an English dictionary file whose word pattern is also 0.1.0.0.2. In this book, we’ll call the plaintext words that have the same word pattern as the cipherword the candidates for that cipherword. Here is a list of candidates for HGHHU:
puppy
mommy
bobby
lulls
nanny
Using word patterns, we can guess which plaintext letters cipherletters might decrypt to, which we’ll call the cipherletter’s potential decryption letters. To crack a message encrypted with the simple substitution cipher, we need to find all the potential decryption letters of each word in the message and determine the actual decryption letters through the process of elimination. Table 17-1 lists the potential decryption letters for HGHHU.
Table 17-1: Potential Decryption Letters of the Cipherletters in HGHHU
Cipherletters |
H |
G |
H |
H |
U |
Potential decryption letters |
P |
U |
P |
P |
Y |
M |
O |
M |
M |
Y |
|
B |
O |
B |
B |
Y |
|
L |
U |
L |
L |
S |
|
N |
A |
N |
N |
Y |
The following is a cipherletter mapping created using Table 17-1:
H has the potential decryption letters P, M, B, L, and N.
G has the potential decryption letters U, O, and A.
U has the potential decryption letters Y and S.
All of the other cipherletters besides H, G, and U have no potential decryption letters in this example.
A cipherletter mapping shows all the letters of the alphabet and their potential decryption letters. As we start to gather encrypted messages, we’ll find potential decryption letters for every letter in the alphabet, but because only the cipherletters H, G, and U were part of our example ciphertext, we don’t have the potential decryption letters of other cipherletters.
Notice also that U has only two potential decryption letters (Y and S) because there are overlaps between the candidates, many of which end in the letter Y. The more overlaps there are, the fewer potential decryption letters there will be, and the easier it will be to figure out what that cipherletter decrypts to.
To represent Table 17-1 in Python code, we’ll use a dictionary value to represent cipherletter mappings as follows (the key-value pairs for 'H', 'G', and 'U' are in bold):
{'A': [], 'B': [], 'C': [], 'D': [], 'E': [], 'F': [], 'G': ['U', 'O', 'A'],
'H': ['P', 'M', 'B', 'L', 'N'], 'I': [], 'J': [], 'K': [], 'L': [], 'M': [],
'N': [], 'O': [], 'P': [], 'Q': [], 'R': [], 'S': [], 'T': [], 'U': ['Y',
'S'], 'V': [], 'W': [], 'X': [], 'Y': [], 'Z': []}
This dictionary has 26 key-value pairs, one key for each letter of the alphabet and a list of potential decryption letters for each letter. It shows potential decryption letters for keys 'H', 'G', and 'U'. The other keys have empty lists, [], for values, because they have no potential decryption letters so far.
If we can reduce the number of potential decryption letters for a cipherletter to just one letter by cross-referencing cipherletter mappings of other encrypted words, we can find what that cipherletter decrypts to. Even if we can’t solve all 26 cipherletters, we might be able to hack most of the cipherletter mappings to decrypt most of the ciphertext.
Now that we’ve covered some of the basic concepts and terminology we’ll be using in this chapter, let’s look at the steps involved in the hacking process.
Hacking the simple substitution cipher is pretty easy using word patterns. We can summarize the major steps of the hacking process as follows:
Find the word pattern for each cipherword in the ciphertext.
Find the English word candidates that each cipherword could decrypt to.
Create a dictionary showing potential decryption letters for each cipherletter to act as the cipherletter mapping for each cipherword.
Combine the cipherletter mappings into a single mapping, which we’ll call an intersected mapping.
Remove any solved cipherletters from the combined mapping.
Decrypt the ciphertext with the solved cipherletters.
The more cipherwords in a ciphertext, the more likely it is for the mappings to overlap with one another and the fewer the potential decryption letters for each cipherletter. This means that in the simple substitution cipher, the longer the ciphertext message, the easier it is to hack.
Before diving into the source code, let’s look at how we can make the first two steps of the hacking process easier. We’ll use the dictionary file we used in Chapter 11 and a module called wordPatterns.py to get the word pattern for every word in the dictionary file and sort them in a list.
To calculate word patterns for every word in the dictionary.txt dictionary file, download makeWordPatterns.py from https://www.nostarch.com/crackingcodes/. Make sure this program and dictionary.txt are both in the folder where you’ll be saving this chapter’s simpleSubHacker.py program.
The makeWordPatterns.py program has a getWordPattern() function that takes a string (such as 'puppy') and returns its word pattern (such as '0.1.0.0.2'). When you run makeWordPatterns.py, it should create the Python module wordPatterns.py. The module contains a single variable assignment statement, as shown here, and is more than 43,000 lines long:
allPatterns = {'0.0.1': ['EEL'],
'0.0.1.2': ['EELS', 'OOZE'],
'0.0.1.2.0': ['EERIE'],
'0.0.1.2.3': ['AARON', 'LLOYD', 'OOZED'],
--snip--
The allPatterns variable contains a dictionary value with the word pattern strings as keys and a list of English words that match the pattern as its values. For example, to find all the English words with the pattern 0.1.2.1.3.4.5.4.6.7.8, enter the following into the interactive shell:
>>> import wordPatterns
>>> wordPatterns.allPatterns['0.1.2.1.3.4.5.4.6.7.8']
['BENEFICIARY', 'HOMOGENEITY', 'MOTORCYCLES']
In the allPatterns dictionary, the key '0.1.2.1.3.4.5.4.6.7.8' has the list value ['BENEFICIARY', 'HOMOGENEITY', 'MOTORCYCLES'], which contains three English words with this particular word pattern.
Now let’s import the wordPatterns.py module to start building the simple substitution hacking program!
NOTE
If you get a ModuleNotFoundError error message when importing wordPatterns into the interactive shell, enter the following into the interactive shell first:
>>> import sys
>>> sys.path.append('name_of_folder')
Replace name_of_folder with the location where wordPatterns.py is saved. This tells the interactive shell to look for modules in the folder you specify.
Open a file editor window by selecting File▸New File. Enter the following code into the file editor and save it as simpleSubHacker.py. Be sure to place the pyperclip.py, simpleSubCipher.py, and wordPatterns.py files in the same directory as simpleSubHacker.py. Press F5 to run the program.
simpleSub
Hacker.py
1. # Simple Substitution Cipher Hacker
2. # https://www.nostarch.com/crackingcodes/ (BSD Licensed)
3.
4. import os, re, copy, pyperclip, simpleSubCipher, wordPatterns,
makeWordPatterns
5.
6.
7.
8.
9.
10. LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
11. nonLettersOrSpacePattern = re.compile('[^A-Z\s]')
12.
13. def main():
14. message = 'Sy l nlx sr pyyacao l ylwj eiswi upar lulsxrj isr
sxrjsxwjr, ia esmm rwctjsxsza sj wmpramh, lxo txmarr jia aqsoaxwa
sr pqaceiamnsxu, ia esmm caytra jp famsaqa sj. Sy, px jia pjiac
ilxo, ia sr pyyacao rpnajisxu eiswi lyypcor l calrpx ypc lwjsxu sx
lwwpcolxwa jp isr sxrjsxwjr, ia esmm lwwabj sj aqax px jia
rmsuijarj aqsoaxwa. Jia pcsusx py nhjir sr agbmlsxao sx jisr elh.
-Facjclxo Ctrramm'
15.
16. # Determine the possible valid ciphertext translations:
17. print('Hacking...')
18. letterMapping = hackSimpleSub(message)
19.
20. # Display the results to the user:
21. print('Mapping:')
22. print(letterMapping)
23. print()
24. print('Original ciphertext:')
25. print(message)
26. print()
27. print('Copying hacked message to clipboard:')
28. hackedMessage = decryptWithCipherletterMapping(message, letterMapping)
29. pyperclip.copy(hackedMessage)
30. print(hackedMessage)
31.
32.
33. def getBlankCipherletterMapping():
34. # Returns a dictionary value that is a blank cipherletter mapping:
35. return {'A': [], 'B': [], 'C': [], 'D': [], 'E': [], 'F': [], 'G': [],
'H': [], 'I': [], 'J': [], 'K': [], 'L': [], 'M': [], 'N': [],
'O': [], 'P': [], 'Q': [], 'R': [], 'S': [], 'T': [], 'U': [],
'V': [], 'W': [], 'X': [], 'Y': [], 'Z': []}
36.
37.
38. def addLettersToMapping(letterMapping, cipherword, candidate):
39. # The letterMapping parameter takes a dictionary value that
40. # stores a cipherletter mapping, which is copied by the function.
41. # The cipherword parameter is a string value of the ciphertext word.
42. # The candidate parameter is a possible English word that the
43. # cipherword could decrypt to.
44.
45. # This function adds the letters in the candidate as potential
46. # decryption letters for the cipherletters in the cipherletter
47. # mapping.
48.
49.
50. for i in range(len(cipherword)):
51. if candidate[i] not in letterMapping[cipherword[i]]:
52. letterMapping[cipherword[i]].append(candidate[i])
53.
54.
55.
56. def intersectMappings(mapA, mapB):
57. # To intersect two maps, create a blank map and then add only the
58. # potential decryption letters if they exist in BOTH maps:
59. intersectedMapping = getBlankCipherletterMapping()
60. for letter in LETTERS:
61.
62. # An empty list means "any letter is possible". In this case just
63. # copy the other map entirely:
64. if mapA[letter] == []:
65. intersectedMapping[letter] = copy.deepcopy(mapB[letter])
66. elif mapB[letter] == []:
67. intersectedMapping[letter] = copy.deepcopy(mapA[letter])
68. else:
69. # If a letter in mapA[letter] exists in mapB[letter],
70. # add that letter to intersectedMapping[letter]:
71. for mappedLetter in mapA[letter]:
72. if mappedLetter in mapB[letter]:
73. intersectedMapping[letter].append(mappedLetter)
74.
75. return intersectedMapping
76.
77.
78. def removeSolvedLettersFromMapping(letterMapping):
79. # Cipherletters in the mapping that map to only one letter are
80. # "solved" and can be removed from the other letters.
81. # For example, if 'A' maps to potential letters ['M', 'N'], and 'B'
82. # maps to ['N'], then we know that 'B' must map to 'N', so we can
83. # remove 'N' from the list of what 'A' could map to. So 'A' then maps
84. # to ['M']. Note that now that 'A' maps to only one letter, we can
85. # remove 'M' from the list of letters for every other letter.
86. # (This is why there is a loop that keeps reducing the map.)
87.
88. loopAgain = True
89. while loopAgain:
90. # First assume that we will not loop again:
91. loopAgain = False
92.
93. # solvedLetters will be a list of uppercase letters that have one
94. # and only one possible mapping in letterMapping:
95. solvedLetters = []
96. for cipherletter in LETTERS:
97. if len(letterMapping[cipherletter]) == 1:
98. solvedLetters.append(letterMapping[cipherletter][0])
99.
100. # If a letter is solved, then it cannot possibly be a potential
101. # decryption letter for a different ciphertext letter, so we
102. # should remove it from those other lists:
103. for cipherletter in LETTERS:
104. for s in solvedLetters:
105. if len(letterMapping[cipherletter]) != 1 and s in
letterMapping[cipherletter]:
106. letterMapping[cipherletter].remove(s)
107. if len(letterMapping[cipherletter]) == 1:
108. # A new letter is now solved, so loop again:
109. loopAgain = True
110. return letterMapping
111.
112.
113. def hackSimpleSub(message):
114. intersectedMap = getBlankCipherletterMapping()
115. cipherwordList = nonLettersOrSpacePattern.sub('',
message.upper()).split()
116. for cipherword in cipherwordList:
117. # Get a new cipherletter mapping for each ciphertext word:
118. candidateMap = getBlankCipherletterMapping()
119.
120. wordPattern = makeWordPatterns.getWordPattern(cipherword)
121. if wordPattern not in wordPatterns.allPatterns:
122. continue # This word was not in our dictionary, so continue.
123.
124. # Add the letters of each candidate to the mapping:
125. for candidate in wordPatterns.allPatterns[wordPattern]:
126. addLettersToMapping(candidateMap, cipherword, candidate)
127.
128. # Intersect the new mapping with the existing intersected mapping:
129. intersectedMap = intersectMappings(intersectedMap, candidateMap)
130.
131. # Remove any solved letters from the other lists:
132. return removeSolvedLettersFromMapping(intersectedMap)
133.
134.
135. def decryptWithCipherletterMapping(ciphertext, letterMapping):
136. # Return a string of the ciphertext decrypted with the letter mapping,
137. # with any ambiguous decrypted letters replaced with an underscore.
138.
139. # First create a simple sub key from the letterMapping mapping:
140. key = ['x'] * len(LETTERS)
141. for cipherletter in LETTERS:
142. if len(letterMapping[cipherletter]) == 1:
143. # If there's only one letter, add it to the key:
144. keyIndex = LETTERS.find(letterMapping[cipherletter][0])
145. key[keyIndex] = cipherletter
146. else:
147. ciphertext = ciphertext.replace(cipherletter.lower(), '_')
148. ciphertext = ciphertext.replace(cipherletter.upper(), '_')
149. key = ''.join(key)
150.
151. # With the key we've created, decrypt the ciphertext:
152. return simpleSubCipher.decryptMessage(key, ciphertext)
153.
154.
155. if __name__ == '__main__':
156. main()
When you run this program, it attempts to hack the ciphertext in the message variable. Its output should look like this:
Hacking...
Mapping:
{'A': ['E'], 'B': ['Y', 'P', 'B'], 'C': ['R'], 'D': [], 'E': ['W'], 'F':
['B', 'P'], 'G': ['B', 'Q', 'X', 'P', 'Y'], 'H': ['P', 'Y', 'K', 'X', 'B'],
'I': ['H'], 'J': ['T'], 'K': [], 'L': ['A'], 'M': ['L'], 'N': ['M'], 'O':
['D'], 'P': ['O'], 'Q': ['V'], 'R': ['S'], 'S': ['I'], 'T': ['U'], 'U': ['G'],
'V': [], 'W': ['C'], 'X': ['N'], 'Y': ['F'], 'Z': ['Z']}
Original ciphertext:
Sy l nlx sr pyyacao l ylwj eiswi upar lulsxrj isr sxrjsxwjr, ia esmm
rwctjsxsza sj wmpramh, lxo txmarr jia aqsoaxwa sr pqaceiamnsxu, ia esmm caytra
jp famsaqa sj. Sy, px jia pjiac ilxo, ia sr pyyacao rpnajisxu eiswi lyypcor
l calrpx ypc lwjsxu sx lwwpcolxwa jp isr sxrjsxwjr, ia esmm lwwabj sj aqax
px jia rmsuijarj aqsoaxwa. Jia pcsusx py nhjir sr agbmlsxao sx jisr elh.
-Facjclxo Ctrramm
Copying hacked message to clipboard:
If a man is offered a fact which goes against his instincts, he will
scrutinize it closel_, and unless the evidence is overwhelming, he will refuse
to _elieve it. If, on the other hand, he is offered something which affords
a reason for acting in accordance to his instincts, he will acce_t it even
on the slightest evidence. The origin of m_ths is e__lained in this wa_.
-_ertrand Russell
Now let’s explore the source code in detail.
Let’s look at the first few lines of the simple substitution hacking program. Line 4 imports seven different modules, more than any other program so far. The global variable LETTERS on line 10 stores the symbol set, which consists of the uppercase letters of the alphabet.
1. # Simple Substitution Cipher Hacker
2. # https://www.nostarch.com/crackingcodes/ (BSD Licensed)
3.
4. import os, re, copy, pyperclip, simpleSubCipher, wordPatterns,
makeWordPatterns
--snip--
10. LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
The re module is the regular expression module, which allows for sophisticated string manipulation using regular expressions. Let’s look at how regular expressions work.
Regular expressions are strings that define a specific pattern that matches certain strings. For example, the string '[^A-Z\s]' on line 11 is a regular expression that tells Python to find any character that is not an uppercase letter from A to Z or a whitespace character (such as a space, tab, or newline character).
11. nonLettersOrSpacePattern = re.compile('[^A-Z\s]')
The re.compile() function creates a regular expression pattern object (abbreviated as regex object or pattern object) that the re module can use. We’ll use this object to remove any non-letter characters from the ciphertext in “The hackSimpleSub() Function” on page 241.
You can perform many sophisticated string manipulations with regular expressions. To learn more about regular expressions, go to https://www.nostarch.com/crackingcodes/.
As with the previous hacking programs in this book, the main() function stores the ciphertext in the message variable, and line 18 passes this variable to the hackSimpleSub() function:
13. def main():
14. message = 'Sy l nlx sr pyyacao l ylwj eiswi upar lulsxrj isr
sxrjsxwjr, ia esmm rwctjsxsza sj wmpramh, lxo txmarr jia aqsoaxwa
sr pqaceiamnsxu, ia esmm caytra jp famsaqa sj. Sy, px jia pjiac
ilxo, ia sr pyyacao rpnajisxu eiswi lyypcor l calrpx ypc lwjsxu sx
lwwpcolxwa jp isr sxrjsxwjr, ia esmm lwwabj sj aqax px jia
rmsuijarj aqsoaxwa. Jia pcsusx py nhjir sr agbmlsxao sx jisr elh.
-Facjclxo Ctrramm'
15.
16. # Determine the possible valid ciphertext translations:
17. print('Hacking...')
18. letterMapping = hackSimpleSub(message)
Instead of returning the decrypted message or None if unable to decrypt it, hackSimpleSub() returns an intersected cipherletter mapping with the decrypted letters removed. (We’ll look at how to create an intersected mapping in “Intersecting Two Mappings” on page 234.) This intersected cipherletter mapping then gets passed to decryptWithCipherletterMapping() to decrypt the ciphertext stored in message into a readable format, as you’ll see in more detail in “Decrypting the Message” on page 243.
The cipherletter mapping stored in letterMapping is a dictionary value that has 26 uppercase single-letter strings as keys that represent the cipherletters. It also lists the uppercase letters of potential decryption letters for each cipherletter as the dictionary’s values. When every cipherletter has just one potential decryption letter associated with it, we have a fully solved mapping and can decrypt any ciphertext using the same cipher and key.
Each cipherletter mapping generated depends on the ciphertext used. In some cases, we’ll have only a partially solved mapping in which some cipherletters have no potential decryptions and other cipherletters have multiple potential decryptions. Shorter ciphertexts that don’t contain every letter of the alphabet are more likely to result in incomplete mappings.
The program then calls the print() function to display letterMapping, the original message, and the decrypted message on the screen:
20. # Display the results to the user:
21. print('Mapping:')
22. print(letterMapping)
23. print()
24. print('Original ciphertext:')
25. print(message)
26. print()
27. print('Copying hacked message to clipboard:')
28. hackedMessage = decryptWithCipherletterMapping(message, letterMapping)
29. pyperclip.copy(hackedMessage)
30. print(hackedMessage)
Line 28 stores the decrypted message in the variable hackedMessage, which is copied to the clipboard and printed to the screen so the user can compare it to the original message. We use decryptWithCipherletterMapping() to find the decrypted message, which is defined later in the program.
Next, let’s look at all the functions that create the cipherletter mappings.
The program needs a cipherletter mapping for each cipherword in the ciphertext. To create a complete mapping, we’ll need several helper functions. One of those helper functions will set up a new cipherletter mapping so we can call it for every cipherword.
Another function will take a cipherword, its current letter mapping, and a candidate decryption word to find all the candidate decryption words. We’ll call this function for each cipherword and each candidate. The function will then add all the potential decryption letters from the candidate word to the cipherword’s letter mapping and return the letter mapping.
When we have letter mappings for several words from the ciphertext, we’ll use a function to merge them together. Then, we’ll use one final helper function to solve as many cipherletters’ decryptions as we can by matching one decryption letter to each cipherletter. As noted, we won’t always be able to solve all the cipherletters, but you’ll find out how to deal with this issue in “Decrypting the Message” on page 243.
First, we’ll need to create a blank cipherletter mapping.
33. def getBlankCipherletterMapping():
34. # Returns a dictionary value that is a blank cipherletter mapping:
35. return {'A': [], 'B': [], 'C': [], 'D': [], 'E': [], 'F': [], 'G': [],
'H': [], 'I': [], 'J': [], 'K': [], 'L': [], 'M': [], 'N': [],
'O': [], 'P': [], 'Q': [], 'R': [], 'S': [], 'T': [], 'U': [],
'V': [], 'W': [], 'X': [], 'Y': [], 'Z': []}
When called, the getBlankCipherletterMapping() function returns a dictionary with the keys set to one-character strings of the 26 letters of the alphabet.
To add letters to a mapping, we define the addLettersToMapping() function on line 38.
38. def addLettersToMapping(letterMapping, cipherword, candidate):
This function takes three parameters: a cipherletter mapping (letterMapping), a cipherword to map (cipherword), and a candidate decryption word the cipherword could decrypt to (candidate). The function maps every letter in candidate to the cipherletter at the corresponding index position in the cipherword and adds that letter to letterMapping if it isn’t already there.
For example, if 'PUPPY' is the candidate for the cipherword 'HGHHU', the addLettersToMapping() function adds the value 'P' to the key 'H' in letterMapping. Then the function moves to the next letter and appends 'U' to the list value paired with the key 'G', and so on.
If the letter is already in the list of potential decryption letters, then addLettersToMapping() doesn’t add that letter to the list again. For example, in 'PUPPY' it would skip adding 'P' to the 'H' key for the next two instances of 'P' because it’s already there. Finally, the function changes the value for key 'U' so it has 'Y' in its list of potential decryption letters.
The code in addLettersToMapping() assumes that len(cipherword) is the same as len(candidate) because we should only pass a cipherword and candidate pair with matching word patterns.
Then the program iterates over each index in the string in cipherword to check if a letter has already been added to the list of potential decryption letters:
50. for i in range(len(cipherword)):
51. if candidate[i] not in letterMapping[cipherword[i]]:
52. letterMapping[cipherword[i]].append(candidate[i])
We’ll use the variable i to iterate through each letter of cipherword and its corresponding potential decryption letter in candidate through indexing. We can do this because the potential decryption letter to be added is candidate[i] for the cipherletter cipherword[i]. For example, if the cipherword was 'HGHHU' and the candidate was 'PUPPY', i would start at index 0, and we would use cipherword[0] and candidate[0] to access the first letters in each string. Then the execution would move on to the if statement on line 51.
The if statement checks that the potential decryption letter, candidate[i], is not already in the list of potential decryption letters for the cipherletter and doesn’t add it if it’s already in the list. It does this by accessing the cipherletter in the mapping with letterMapping[cipherword[i]], because cipherword[i] is the key in letterMapping that needs to be accessed. This check prevents duplicate letters in the list of potential decryption letters.
For example, the first 'P' in 'PUPPY' might be added to the letterMapping at the first iteration of the loop, but when i is equal to 2 in the third iteration, the 'P' from candidate[2] wouldn’t be added to the mapping because it was already added at the first iteration.
If the potential decryption letter isn’t already in the mapping, line 52 adds the new letter, candidate[i], to the list of potential decryption letters in the cipherletter mapping at letterMapping[cipherword[i]].
Recall that because Python passes a copy of the reference to a dictionary passed for the parameter, instead of a copy of the dictionary itself, any changes made to letterMapping in this function will be done outside the addLettersToMapping() function as well. This is because both copies of the reference still refer to the same dictionary passed for the letterMapping parameter in the call to addLettersToMapping() on line 126.
After looping through all the indexes in cipherword, the function is done adding letters to the mapping in the letterMapping variable. Now let’s look at how the program compares this mapping to that of other cipherwords to check for overlaps.
The hackSimpleSub() function uses the intersectMappings() function to take two cipherletter mappings passed as its mapA and mapB parameters and return a merged mapping of mapA and mapB. The intersectMappings() function instructs the program to combine mapA and mapB, create a blank map, and then add the potential decryption letters to the blank map only if they exist in both maps to prevent duplicates.
56. def intersectMappings(mapA, mapB):
57. # To intersect two maps, create a blank map and then add only the
58. # potential decryption letters if they exist in BOTH maps:
59. intersectedMapping = getBlankCipherletterMapping()
First, line 59 creates a cipherletter mapping to store the merged mapping by calling getBlankCipherletterMapping() and storing the returned value in the intersectedMapping variable.
The for loop on line 60 loops through the uppercase letters in the LETTERS constant variable and uses the letter variable as the keys of the mapA and mapB dictionaries:
60. for letter in LETTERS:
61.
62. # An empty list means "any letter is possible". In this case just
63. # copy the other map entirely:
64. if mapA[letter] == []:
65. intersectedMapping[letter] = copy.deepcopy(mapB[letter])
66. elif mapB[letter] == []:
67. intersectedMapping[letter] = copy.deepcopy(mapA[letter])
Line 64 checks whether the list of potential decryption letters for mapA is blank. A blank list means that this cipherletter could potentially decrypt to any letter. In this case, the intersected cipherletter mapping just copies the other mapping’s list of potential decryption letters. For example, if the list of potential decryption letters in mapA is blank, then line 65 sets the intersected mapping’s list to be a copy of the list in mapB, and vice versa on line 67. Note that if both mappings’ lists are blank, the condition on line 64 is still True, and then line 65 simply copies the blank list in mapB to the intersected mapping.
The else block on line 68 handles the case in which neither mapA nor mapB is blank:
68. else:
69. # If a letter in mapA[letter] exists in mapB[letter],
70. # add that letter to intersectedMapping[letter]:
71. for mappedLetter in mapA[letter]:
72. if mappedLetter in mapB[letter]:
73. intersectedMapping[letter].append(mappedLetter)
74.
75. return intersectedMapping
When the maps are not blank, line 71 loops through the uppercase letter strings in the list at mapA[letter]. Line 72 checks whether the uppercase letter in mapA[letter] also exists in the list of uppercase letter strings in mapB[letter]. If it does, then intersectedMapping[letter] on line 73 adds this common letter to the list of potential decryption letters.
After the for loop that started on line 60 has finished, the cipherletter mapping in intersectedMapping should only have the potential decryption letters that exist in the lists of potential decryption letters of both mapA and mapB. Line 75 returns this completely intersected cipherletter mapping. Next, let’s look at an example output of an intersected mapping.
Now that we’ve defined the letter-mapping helper functions, let’s try using them in the interactive shell to better understand how they work together. Let’s create an intersected cipherletter map for the ciphertext 'OLQIHXIRCKGNZ PLQRZKBZB MPBKSSIPLC', which contains just three cipherwords. We’ll do this by creating a mapping for each word and then combining the mappings.
Import simpleSubHacker.py into the interactive shell:
>>> import simpleSubHacker
Next, we call getBlankCipherletterMapping() to create a blank letter mapping and store this mapping in a variable named letterMapping1:
>>> letterMapping1 = simpleSubHacker.getBlankCipherletterMapping()
>>> letterMapping1
{'A': [], 'C': [], 'B': [], 'E': [], 'D': [], 'G': [], 'F': [], 'I': [],
'H': [], 'K': [], 'J': [], 'M': [], 'L': [], 'O': [], 'N': [], 'Q': [],
'P': [], 'S': [], 'R': [], 'U': [], 'T': [], 'W': [], 'V': [], 'Y': [],
'X': [], 'Z': []}
Let’s start hacking the first cipherword, 'OLQIHXIRCKGNZ'. First, we need to get the word pattern for this cipherword by calling the makeWordPattern module’s getWordPattern() function, as shown here:
>>> import makeWordPatterns
>>> makeWordPatterns.getWordPattern('OLQIHXIRCKGNZ')
0.1.2.3.4.5.3.6.7.8.9.10.11
To figure out which English words in the dictionary have the word pattern 0.1.2.3.4.5.3.6.7.8.9.10.11 (that is, to figure out the candidates for the cipherword 'OLQIHXIRCKGNZ'), we import the wordPatterns module and look up this pattern:
>>> import wordPatterns
>>> candidates = wordPatterns.allPatterns['0.1.2.3.4.5.3.6.7.8.9.10.11']
>>> candidates
['UNCOMFORTABLE', 'UNCOMFORTABLY']
Two English words match the word pattern for 'OLQIHXIRCKGNZ'; therefore, the only two words the first cipherword could decrypt to are 'UNCOMFORTABLE' and 'UNCOMFORTABLY'. These words are our candidates, so we’ll store them in the candidates variable (not to be confused with the candidate parameter in the addLettersToMapping() function) as a list.
Next, we need to map their letters to the cipherword’s letters using addLettersToMapping(). First, we’ll map 'UNCOMFORTABLE' by accessing the first member of the candidates list, like so:
>>> simpleSubHacker.addLettersToMapping(letterMapping1,
'OLQIHXIRCKGNZ', candidates[0])
>>> letterMapping1
{'A': [], 'C': ['T'], 'B': [], 'E': [], 'D': [], 'G': ['B'], 'F': [], 'I':
['O'], 'H': ['M'], 'K': ['A'], 'J': [], 'M': [], 'L': ['N'], 'O': ['U'], 'N':
['L'], 'Q': ['C'], 'P': [], 'S': [], 'R': ['R'], 'U': [], 'T': [], 'W': [],
'V': [], 'Y': [], 'X': ['F'], 'Z': ['E']}
From the letterMapping1 value, you can see that the letters in 'OLQIHXIRCKGNZ' map to the letters in 'UNCOMFORTABLE': 'O' maps to ['U'], 'L' maps to ['N'], 'Q' maps to ['C'], and so on.
But because the letters in 'OLQIHXIRCKGNZ' could also possibly decrypt to 'UNCOMFORTABLY', we also need to add it to the cipherletter mapping. Enter the following into the interactive shell:
>>> simpleSubHacker.addLettersToMapping(letterMapping1,
'OLQIHXIRCKGNZ', candidates[1])
>>> letterMapping1
{'A': [], 'C': ['T'], 'B': [], 'E': [], 'D': [], 'G': ['B'], 'F': [],
'I': ['O'], 'H': ['M'], 'K': ['A'], 'J': [], 'M': [], 'L': ['N'], 'O': ['U'],
'N': ['L'], 'Q': ['C'], 'P': [], 'S': [], 'R': ['R'], 'U': [], 'T': [],
'W': [], 'V': [], 'Y': [], 'X': ['F'], 'Z': ['E', 'Y']}
Notice that not much has changed in letterMapping1 except the cipherletter mapping in letterMapping1 now has 'Z' map to 'Y' in addition to 'E'. That’s because addLettersToMapping() adds the letter to the list only if the letter is not already there.
Now we have a cipherletter mapping for the first of the three cipherwords. We need to get a new mapping for the second cipherword, 'PLQRZKBZB’, and repeat the process:
>>> letterMapping2 = simpleSubHacker.getBlankCipherletterMapping()
>>> wordPat = makeWordPatterns.getWordPattern('PLQRZKBZB')
>>> candidates = wordPatterns.allPatterns[wordPat]
>>> candidates
['CONVERSES', 'INCREASES', 'PORTENDED', 'UNIVERSES']
>>> for candidate in candidates:
... simpleSubHacker.addLettersToMapping(letterMapping2,
'PLQRZKBZB', candidate)
...
>>> letterMapping2
{'A': [], 'C': [], 'B': ['S', 'D'], 'E': [], 'D': [], 'G': [], 'F': [], 'I':
[], 'H': [], 'K': ['R', 'A', 'N'], 'J': [], 'M': [], 'L': ['O', 'N'], 'O': [],
'N': [], 'Q': ['N', 'C', 'R', 'I'], 'P': ['C', 'I', 'P', 'U'], 'S': [], 'R':
['V', 'R', 'T'], 'U': [], 'T': [], 'W': [], 'V': [], 'Y': [], 'X': [], 'Z':
['E']}
Instead of entering four calls to addLettersToMapping() for each of these four candidate words, we can write a for loop that goes through the list in candidates and calls addLettersToMapping() on each of them. This finishes the cipherletter mapping for the second cipherword.
Next, we need to get the intersection of the cipherletter mappings in letterMapping1 and letterMapping2 by passing them to intersectMappings(). Enter the following into the interactive shell:
>>> intersectedMapping = simpleSubHacker.intersectMappings(letterMapping1,
letterMapping2)
>>> intersectedMapping
{'A': [], 'C': ['T'], 'B': ['S', 'D'], 'E': [], 'D': [], 'G': ['B'], 'F': [],
'I': ['O'], 'H': ['M'], 'K': ['A'], 'J': [], 'M': [], 'L': ['N'], 'O': ['U'],
'N': ['L'], 'Q': ['C'], 'P': ['C', 'I', 'P', 'U'], 'S': [], 'R': ['R'],
'U': [], 'T': [], 'W': [], 'V': [], 'Y': [], 'X': ['F'], 'Z': ['E']}
Now the list of potential decryption letters for any cipherletter in the intersected mapping should be only the potential decryption letters that are in both letterMapping1 and letterMapping2.
For example, the list in intersectedMapping for the 'Z' key is just ['E'] because letterMapping1 had ['E', 'Y'] but letterMapping2 had only ['E'].
Next, we repeat all the preceding steps for the third cipherword, 'MPBKSSIPLC', as follows:
>>> letterMapping3 = simpleSubHacker.getBlankCipherletterMapping()
>>> wordPat = makeWordPatterns.getWordPattern('MPBKSSIPLC')
>>> candidates = wordPatterns.allPatterns[wordPat]
>>> for i in range(len(candidates)):
... simpleSubHacker.addLettersToMapping(letterMapping3,
'MPBKSSIPLC', candidates[i])
...
>>> letterMapping3
{'A': [], 'C': ['Y', 'T'], 'B': ['M', 'S'], 'E': [], 'D': [], 'G': [],
'F': [], 'I': ['E', 'O'], 'H': [], 'K': ['I', 'A'], 'J': [], 'M': ['A', 'D'],
'L': ['L', 'N'], 'O': [], 'N': [], 'Q': [], 'P': ['D', 'I'], 'S': ['T', 'P'],
'R': [], 'U': [], 'T': [], 'W': [], 'V': [], 'Y': [], 'X': [], 'Z': []}
Enter the following into the interactive shell to intersect letterMapping3 with intersectedMapping, which is the intersected mapping of letterMapping1 and letterMapping2:
>>> intersectedMapping = simpleSubHacker.intersectMappings(intersectedMapping,
letterMapping3)
>>> intersectedMapping
{'A': [], 'C': ['T'], 'B': ['S'], 'E': [], 'D': [], 'G': ['B'], 'F': [],
'I': ['O'], 'H': ['M'], 'K': ['A'], 'J': [], 'M': ['A', 'D'], 'L': ['N'],
'O': ['U'], 'N': ['L'], 'Q': ['C'], 'P': ['I'], 'S': ['T', 'P'], 'R': ['R'],
'U': [], 'T': [], 'W': [], 'V': [], 'Y': [], 'X': ['F'], 'Z': ['E']}
In this example, we’re able to find solutions for the keys that have only one value in their list. For example, 'K' decrypts to 'A'. But notice that key 'M' could decrypt to 'A' or 'D'. Because we know that 'K' decrypts to 'A', we can deduce that key 'M' must decrypt to 'D', not 'A'. After all, if the solved letter is used by one cipherletter, it can’t be used by another cipherletter, because the simple substitution cipher encrypts a plaintext letter to exactly one cipherletter.
Let’s look at how the removeSolvedLettersFromMapping() function finds these solved letters and removes them from the list of potential decryption letters. We’ll need the intersectedMapping we just created, so don’t close the IDLE window just yet.
The removeSolvedLettersFromMapping() function searches for any cipherletters in the letterMapping parameter that have only one potential decryption letter. These cipherletters are considered solved, which means that any other cipherletters with this solved letter in their list of potential decryption letters can’t possibly decrypt to this letter. This could cause a chain reaction, because when one potential decryption letter is removed from other lists of potential decryption letters holding just two letters, the result could be a new solved cipherletter. The program handles this situation by looping and removing the newly solved letter from the entire cipherletter mapping.
78. def removeSolvedLettersFromMapping(letterMapping):
--snip--
88. loopAgain = True
89. while loopAgain:
90. # First assume that we will not loop again:
91. loopAgain = False
Because a reference to a dictionary is passed for the letterMapping parameter, that dictionary will contain the changes made in the function removeSolvedLettersFromMapping() even after the function returns. Line 88 creates loopAgain, a variable that holds a Boolean value, which determines whether the code needs to loop again when it finds another solved letter.
If the loopAgain variable is set to True on line 88, the program execution enters the while loop on line 89. At the beginning of the loop, line 91 sets loopAgain to False. The code assumes that this is the last iteration through the while loop on line 89. The loopAgain variable is only set to True if the program finds a new solved cipherletter during this iteration.
The next part of the code creates a list of cipherletters that have exactly one potential decryption letter. These are the solved letters that will be removed from the mapping.
93. # solvedLetters will be a list of uppercase letters that have one
94. # and only one possible mapping in letterMapping:
95. solvedLetters = []
96. for cipherletter in LETTERS:
97. if len(letterMapping[cipherletter]) == 1:
98. solvedLetters.append(letterMapping[cipherletter][0])
The for loop on line 96 goes through all 26 possible cipherletters and looks at the cipherletter mapping’s list of potential decryption letters for that cipherletter (that is, the list at letterMapping[cipherletter]).
Line 97 checks whether the length of this list is 1. If it is, we know there’s only one letter that the cipherletter could decrypt to and the cipherletter is solved. Line 98 adds the solved decryption letter to the solvedLetters list. The solved letter is always at letterMapping[cipherletter][0] because letterMapping[cipherletter] is a list of potential decryption letters that has only one string value in it at index 0 of the list.
After the previous for loop that started on line 96 has finished, the solvedLetters variable should contain a list of all the decryptions of a cipherletter. Line 98 stores these decrypted strings in solvedLetters as a list.
At this point, the program is done identifying all the solved letters. Then it checks whether they’re listed as potential decryption letters for other cipherletters and removes them.
To do this, the for loop on line 103 loops through all 26 possible cipherletters and looks at the cipherletter mapping’s list of potential decryption letters.
103. for cipherletter in LETTERS:
104. for s in solvedLetters:
105. if len(letterMapping[cipherletter]) != 1 and s in
letterMapping[cipherletter]:
106. letterMapping[cipherletter].remove(s)
107. if len(letterMapping[cipherletter]) == 1:
108. # A new letter is now solved, so loop again:
109. loopAgain = True
110. return letterMapping
For each cipherletter examined, line 104 loops through the letters in solvedLetters to check whether any of them exists in the list of potential decryption letters for letterMapping[cipherletter].
Line 105 checks whether a list of potential decryption letters isn’t solved by checking whether len(letterMapping[cipherletter]) != 1 and by checking whether the solved letter exists in the list of potential decryption letters. If both criteria are met, this condition returns True, and line 106 removes the solved letter in s from the list of potential decryption letters.
If this removal leaves only one letter in the list of potential decryption letters, line 109 sets the loopAgain variable to True so the code can remove this newly solved letter from the cipherletter mapping on the next iteration of the loop.
After the while loop on line 89 has gone through a full iteration without loopAgain being set to True, the program moves beyond the loop and line 110 returns the cipherletter mapping stored in letterMapping.
The variable letterMapping should now contain a partially or potentially fully solved cipherletter mapping.
Let’s see removeSolvedLetterFromMapping() in action by testing it in the interactive shell. Return to the interactive shell window you had open when you created intersectedMapping. (If you closed the window, don’t worry; you can just reenter the instructions in “How the Letter-Mapping Helper Functions Work” on page 235 and then follow along with this example.)
To remove the solved letters from intersectedMapping, enter the following into the interactive shell:
>>> letterMapping = simpleSubHacker.removeSolvedLettersFromMapping(
intersectedMapping)
>>> intersectedMapping
{'A': [], 'C': ['T'], 'B': ['S'], 'E': [], 'D': [], 'G': ['B'], 'F': [],
'I': ['O'], 'H': ['M'], 'K': ['A'], 'J': [], 'M': ['D'], 'L': ['N'], 'O':
['U'], 'N': ['L'], 'Q': ['C'], 'P': ['I'], 'S': ['P'], 'R': ['R'], 'U': [],
'T': [], 'W': [], 'V': [], 'Y': [], 'X': ['F'], 'Z': ['E']}
When you remove the solved letters from intersectedMapping, notice that 'M' now has just one potential decryption letter, 'D', which is what we predicted would be the case. Now each cipherletter has just one potential decryption letter, so we can use the cipherletter mapping to start decrypting. We’ll need to return to this interactive shell example one more time, so keep its window open.
Now that you’ve seen how the functions getBlankCipherletterMapping(), addLettersToMapping(), intersectMappings(), and removeSolvedLettersFromMapping() manipulate the cipherletter mappings you pass them, let’s use them in our simpleSubHacker.py program to decrypt a message.
Line 113 defines the hackSimpleSub() function, which takes a ciphertext message and uses the letter-mapping helper functions to return a partially or fully solved cipherletter mapping:
113. def hackSimpleSub(message):
114. intersectedMap = getBlankCipherletterMapping()
115. cipherwordList = nonLettersOrSpacePattern.sub('',
message.upper()).split()
On line 114, we create a new cipherletter mapping that we store in the intersectedMap variable. This variable will eventually hold the intersected mappings of each of the cipherwords.
On line 115, we remove any non-letter characters from message. The regex object in nonLettersOrSpacePattern matches any string that isn’t a letter or whitespace character. The sub() method is called on a regular expression and takes two arguments. The function searches the string in the second argument for matches, and it replaces those matches with the string in the first argument. Then it returns a string with all these replacements. In this example, the sub() method tells the program to go through the uppercased message string and replace all the non-letter characters with a blank string (''). This makes sub() return a string with all punctuation and number characters removed, and this string is stored in the cipherwordList variable.
After line 115 executes, the cipherwordList variable should contain a list of uppercase strings of the individual cipherwords previously in message.
The for loop on line 116 assigns each string in the message list to the cipherword variable. Inside this loop, the code creates a blank map, gets the cipherword’s candidates, adds the candidates’ letters to a cipherletter mapping, and then intersects this mapping with intersectedMap.
116. for cipherword in cipherwordList:
117. # Get a new cipherletter mapping for each ciphertext word:
118. candidateMap = getBlankCipherletterMapping()
120. wordPattern = makeWordPatterns.getWordPattern(cipherword)
121. if wordPattern not in wordPatterns.allPatterns:
122. continue # This word was not in our dictionary, so continue.
124. # Add the letters of each candidate to the mapping:
125. for candidate in wordPatterns.allPatterns[wordPattern]:
126. addLettersToMapping(candidateMap, cipherword, candidate)
128. # Intersect the new mapping with the existing intersected mapping:
129. intersectedMap = intersectMappings(intersectedMap, candidateMap)
Line 118 gets a new, blank cipherletter mapping from the function getBlankCipherletterMapping() and stores it in the candidateMap variable.
To find the candidates for the current cipherword, line 120 calls getWordPattern() in the makeWordPatterns module. In some cases, the cipherword may be a name or a very uncommon word that doesn’t exist in the dictionary, in which case, its word pattern likely won’t exist in wordPatterns either. If the word pattern of the cipherword doesn’t exist in the keys of the wordPatterns.allPatterns dictionary, the original plaintext word doesn’t exist in the dictionary file. In that case, the cipherword doesn’t get a mapping, and the continue statement on line 122 returns to the next cipherword in the list on line 116.
If the execution reaches line 125, we know the word pattern exists in wordPatterns.allPatterns. The values in the allPatterns dictionary are lists of strings of the English words with the pattern in wordPattern. Because the values are in the form of a list, we use a for loop to iterate over them. The variable candidate is set to each of these English word strings on each iteration of the loop.
The for loop on line 125 calls addLettersToMapping() on line 126 to update the cipherletter mapping in candidateMap using the letters in each of the candidates. The addLettersToMapping() function modifies the list directly, so candidateMap is modified by the time the function call returns.
After all the letters in the candidates are added to the cipherletter mapping in candidateMap, line 129 intersects candidateMap with intersectedMap and returns the new value of intersectedMap.
At this point, the program execution returns to the beginning of the for loop on line 116 to create a new mapping for the next cipherword in the cipherwordList list, and the mapping for the next cipherword is also intersected with intersectedMap. The loop continues mapping cipherwords until it reaches the last word in cipherWordList.
When we have the final intersected cipherletter mapping that contains the mappings of all the cipherwords in the ciphertext, we pass it to removeSolvedLettersFromMapping() on line 132 to remove any solved letters.
131. # Remove any solved letters from the other lists:
132. return removeSolvedLettersFromMapping(intersectedMap)
The cipherletter mapping returned from removeSolvedLettersFromMapping() is then returned for the hackSimpleSub() function. Now we have part of the cipher’s solution, so we can start decrypting the message.
The replace() string method returns a new string with replaced characters. The first argument is the substring to look for, and the second argument is the string to replace those substrings with. Enter the following into the interactive shell to see an example:
>>> 'mississippi'.replace('s', 'X')
'miXXiXXippi'
>>> 'dog'.replace('d', 'bl')
'blog'
>>> 'jogger'.replace('ger', 's')
'jogs'
We’ll use the replace() string method in decryptMessage() in the simpleSubHacker.py program.
To decrypt our message, we’ll use the function simpleSubstitutionCipher.decryptMessage() that we already programmed in simpleSubstitutionCipher.py. But simpleSubstitutionCipher.decryptMessage() decrypts using keys only, not letter mappings, so we can’t use the function directly. To address this issue, we’ll create a decryptWithCipherletterMapping() function that takes a letter mapping, converts the mapping into a key, and then passes the key and message to simpleSubstitutionCipher.decryptMessage(). The function decryptWithCipherletterMapping() will return a decrypted string. Recall that the simple substitution keys are strings of 26 characters and the character at index 0 in the key string is the encrypted character for A, the character at index 1 is the encrypted character for B, and so on.
To convert a mapping into a decryption output we can read easily, we’ll need to first create a placeholder key, which will look like this: ['x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x']. The lowercase 'x' can be used in the placeholder key because the actual key uses only uppercase letters. (You can use any character that isn’t an uppercase letter as a placeholder.) Because not all the letters will have a decryption, we need to be able to distinguish between parts of the key list that have been filled with the decryption letters and those where the decryption hasn’t been solved. The 'x' indicates letters that haven’t been solved.
Let’s see how this all comes together in the source code:
135. def decryptWithCipherletterMapping(ciphertext, letterMapping):
136. # Return a string of the ciphertext decrypted with the letter mapping,
137. # with any ambiguous decrypted letters replaced with an underscore.
138.
139. # First create a simple sub key from the letterMapping mapping:
140. key = ['x'] * len(LETTERS)
141. for cipherletter in LETTERS:
142. if len(letterMapping[cipherletter]) == 1:
143. # If there's only one letter, add it to the key:
144. keyIndex = LETTERS.find(letterMapping[cipherletter][0])
145. key[keyIndex] = cipherletter
Line 140 creates the placeholder list by replicating the single-item list ['x'] 26 times. Because LETTERS is a string of the letters of the alphabet, len(LETTERS) evaluates to 26. When used on a list and integer, the multiplication operator (*) performs list replication.
The for loop on line 141 checks each of the letters in LETTERS for the cipherletter variable, and if the cipherletter is solved (that is, letterMapping[cipherletter] has only one letter in it), it replaces the 'x' placeholder with that letter.
The letterMapping[cipherletter][0] on line 144 is the decryption letter, and keyIndex is the index of the decryption letter in LETTERS, which is returned from the find() call. Line 145 sets this index in the key list to the decryption letter.
However, if the cipherletter doesn’t have a solution, the function inserts an underscore for that cipherletter to indicate which characters remain unsolved. Line 147 replaces the lowercase letters in cipherletter with an underscore, and line 148 replaces the uppercase letters with an underscore:
146. else:
147. ciphertext = ciphertext.replace(cipherletter.lower(), '_')
148. ciphertext = ciphertext.replace(cipherletter.upper(), '_')
After replacing all the parts in the list in key with the solved letters, the function combines the list of strings into a single string using the join() method to create a simple substitution key. This string is passed to the decryptMessage() function in the simpleSubCipher.py program.
149. key = ''.join(key)
150.
151. # With the key we've created, decrypt the ciphertext:
152. return simpleSubCipher.decryptMessage(key, ciphertext)
Finally, line 152 returns the decrypted message string from the decryptMessage() function. We now have all the functions we need to find an intersected letter mapping, hack a key, and decrypt a message. Let’s look at a quick example of how these functions work in the interactive shell.
Let’s return to the example we used in “How the Letter-Mapping Helper Functions Work” on page 235. We’ll use the intersectedMapping variable we created in our earlier shell examples to decrypt the ciphertext message 'OLQIHXIRCKGNZ PLQRZKBZB MPBKSSIPLC'.
Enter the following into the interactive shell:
>>> simpleSubHacker.decryptWithCipherletterMapping('OLQIHXIRCKGNZ PLQRZKBZB
MPBKSSIPLC', intersectedMapping)
UNCOMFORTABLE INCREASES DISAPPOINT
The ciphertext decrypts to the message “Uncomfortable increases disappoint”. As you can see, the decryptWithCipherletterMapping() function worked perfectly and returned the fully decrypted string. But this example doesn’t show what happens when we don’t have all the letters that appear in the ciphertext solved. To see what happens when we’re missing a cipherletter’s decryption, let’s remove the solution for the cipherletters 'M' and 'S' from intersectedMapping by using the following instructions:
>>> intersectedMapping['M'] = []
>>> intersectedMapping['S'] = []
Then try to decrypt the ciphertext with intersectedMapping again:
>>> simpleSubHacker.decryptWithCipherletterMapping('OLQIHXIRCKGNZ PLQRZKBZB
MPBKSSIPLC', intersectedMapping)
UNCOMFORTABLE INCREASES _ISA__OINT
This time, part of the ciphertext wasn’t decrypted. The cipherletters without a decryption letter were replaced with underscores.
This is a rather short ciphertext to hack. Normally, encrypted messages would be much longer. (This example was specifically chosen to be hackable. Messages as short as this example usually cannot be hacked using the word pattern method.) To hack longer encryptions, you’ll need to create a cipherletter mapping for each cipherword in the longer messages and then intersect them all together. The hackSimpleSub() function calls the other functions in our program to do exactly this.
Lines 155 and 156 call the main() function to run simpleSubHacker.py if it’s being run directly instead of being imported as a module by another Python program:
155. if __name__ == '__main__':
156. main()
That completes our discussion of all the functions the simpleSubHacker.py program uses.
NOTE
Our hacking approach works only if the spaces are not encrypted. You can expand the symbol set so the cipher program encrypts spaces, numbers, and punctuation characters as well as letters, making your encrypted messages even harder (but not impossible) to hack. Hacking such messages would involve updating the frequencies of not just letters, but all the symbols in the symbol set. This makes hacking more complicated, which is the reason this book encrypted letters only.
Whew! The simpleSubHacker.py program is fairly complicated. You learned how to use cipherletter mapping to model the possible decryption letters for each ciphertext letter. You also learned how to narrow down the number of possible keys by adding potential letters to the mapping, intersecting them, and removing solved letters from other lists of potential decryption letters. Instead of brute-forcing 403,291,461,126,605,635,584,000,000 possible keys, you can use some sophisticated Python code to figure out most (if not all) of the original simple substitution key.
The main advantage of the simple substitution cipher is its large number of possible keys. The disadvantage is that it’s relatively easy to compare cipherwords to words in a dictionary file to determine which cipherletters decrypt to which letters. In Chapter 18, we’ll explore a more powerful polyalphabetic substitution cipher called the Vigenère cipher, which was considered impossible to break for several hundred years.