Note: The second edition of this book is available under the title Cracking Codes with Python

Hacking the Simple Substitution Cipher

Topics Covered In This Chapter:

· Word patterns, candidates, potential decryption letters, and cipherletter mappings.

· The pprint.pprint() and pprint.pformat() functions

· Building strings using the list-append-join process

· Regular expressions

· The sub() regex method

“Cypherpunks deplore regulations on cryptography, for encryption is fundamentally a private act. The act of encryption, in fact, removes information from the public realm. Even laws against cryptography reach only so far as a nation’s border and the arm of its violence.”

Eric Hughes, “A Cypherpunk’s Manifesto”, 1993

http://invpy.com/cypherpunk

Computing Word Patterns

There are too many possible keys to brute-force a simple substitution cipher-encrypted message. We need to employ a more intelligent attack if we want to crack a substitution ciphertext. Let’s examine one possible word from an example ciphertext:

HGHHU

Think about what we can learn from this one word of ciphertext (which we will call a cipherword in this book). We can tell that whatever the original plaintext word is, it must:

1. Be five letters long.

2. Have the first, third, and fourth letters be the same.

3. Have exactly three different letters in the word, where the first, second, and fifth letters in the word are all different from each other.

What words in the English language fit this pattern? “Puppy” is one word that fits this pattern. It is five letters long (P, U, P, P, Y) using three different letters (P, U, Y) in that same pattern (P for the first, third, and fourth letter and U for the second letter and Y for the fifth letter). “Mommy”, “Bobby”, “lulls”, “nanny”, and “lilly” fit the pattern too. (“Lilly” is a name, not to be confused with “Lily” the flower. But since “Lilly” can appear in an Engish message it is a possible word that fits the pattern.) If we had a lot of time on our hands, we could go through an entire dictionary and find all the words that fit this pattern. Even better, we could have a computer go through a dictionary file for us.

In this book a word pattern will be a set of numbers with periods in between the numbers that tells us the pattern of letters for a word, in either ciphertext or plaintext.

Creating word patterns for cipherwords is easy: the first letter gets the number 0 and the first occurrence of each different letter after that gets the next number. For example:

· The word pattern for “cat” is 0.1.2.

· The word pattern for “catty” is 0.1.2.2.3.

· The word pattern for “roofer” is 0.1.1.2.3.0.

· The word pattern for “blimp” is 0.1.2.3.4.

· The word pattern for “classification” is 0.1.2.3.3.4.5.4.0.2.6.4.7.8.

A plaintext word and its cipherword will always have the same word pattern, no matter which simple substitution key was used to do the encryption.

Getting a List of Candidates for a Cipherword

To take a guess at what HGHHU could decrypt to, we can go through the dictionary file and find all of the words that also have a word pattern of 0.1.0.0.2. In this book, we will call these plaintext words (that have the same word pattern as the cipherword) the candidates for that cipherword:

Ciphertext word:	H G H H U
Word pattern:	0.1.0.0.2
Candidates:	p u p p y
	m o m m y
	b o b b y
	l u l l s
	n a n n y
	l i l l y

So if we look at the letters in the cipherword (which will be called cipherletters in this book), we can guess which letters they may decrypt to (we will call these letters the cipherletter’s potential decryption letters in this book):

Cipherletters:	H	G	U
Potential decryption letters:	p	u	y
	m	o	y
	b	o	y
	l	u	s
	n	a	y
	l	i	y

From this table we can create a cipherletter mapping:

· The cipher letter H has the potential decryption letters P, M, B, L, and N

· The cipher letter G has the potential decryption letters U, O, A, and I

· The cipher letter U has the potential decryption letters Y and S

· All of the other cipher letters besides H, G, and U will have no potential decryption letters.

When we represent a cipherletter mapping in Python code, we will use a dictionary value:

{'A': [], 'B': [], 'C': [], 'D': [], 'E': [], 'F': [], 'G': ['U', 'O', 'A', 'I'], 'H': ['P', 'B', 'L', 'N'], 'I': [], 'J': [], 'K': [], 'L': [], 'M': [], 'N': [], 'O': [], 'P': [], 'Q': [], 'R': [], 'S': [], 'T': [], 'U': ['Y', 'S'], 'V': [], 'W': [], 'X': [], 'Y': [], 'Z': []}

In our program, a cipherletter mapping dictionary will have 26 keys, one for each letter. The mapping above has potential decryption letters for 'H', 'G', and 'U' above. The other keys have no potential decryption letters, which is why they have empty lists for values.

If we reduce the number of potential decryption letters for a cipherletter to just one letter, then we have solved what that cipherletter decrypts to. Even if we do not solve all 26 cipherletters, we might be able to hack most of the ciphertext’s cipherletters.

But first we must find the pattern for every word in the dictionary file and sort them in a list so it will be easy to get a list of all the candidates for a given cipherword’s word pattern. We can use the same dictionary file from Chapter 12, which you can download from http://invpy.com/dictionary.txt.

(Note that the terms “word pattern”, “candidate”, and “cipherletter mapping” are terms I came up with to describe things in this particular hacking program. These are not general cryptography terms.)

Practice Exercises, Chapter 18, Set A

Practice exercises can be found at http://invpy.com/hackingpractice18A.

Source Code of the Word Pattern Module

Since the word patterns for words never change, we can just calculate the word pattern for every word in a dictionary file once and store them in another file. Our makeWordPatterns.py program creates a file named wordPatterns.py that will contain a dictionary value with the word pattern for every word in the dictionary file. Our hacking program can then just import wordPatterns to look up the candidates for a certain word pattern.

Source code for makeWordPatterns.py

1. # Makes the wordPatterns.py File

2. # http://inventwithpython.com/hacking (BSD Licensed)

4. # Creates wordPatterns.py based on the words in our dictionary

5. # text file, dictionary.txt. (Download this file from

6. # http://invpy.com/dictionary.txt)

8. import pprint

10.

11. def getWordPattern(word):

12. # Returns a string of the pattern form of the given word.

13. # e.g. '0.1.2.3.4.1.2.3.5.6' for 'DUSTBUSTER'

14. word = word.upper()

15. nextNum = 0

16. letterNums = {}

17. wordPattern = []

18.

19. for letter in word:

20. if letter not in letterNums:

21. letterNums[letter] = str(nextNum)

22. nextNum += 1

23. wordPattern.append(letterNums[letter])

24. return '.'.join(wordPattern)

25.

26.

27. def main():

28. allPatterns = {}

29.

30. fo = open('dictionary.txt')

31. wordList = fo.read().split('\n')

32. fo.close()

33.

34. for word in wordList:

35. # Get the pattern for each string in wordList.

36. pattern = getWordPattern(word)

37.

38. if pattern not in allPatterns:

39. allPatterns[pattern] = [word]

40. else:

41. allPatterns[pattern].append(word)

42.

43. # This is code that writes code. The wordPatterns.py file contains

44. # one very, very large assignment statement.

45. fo = open('wordPatterns.py', 'w')

46. fo.write('allPatterns = ')

47. fo.write(pprint.pformat(allPatterns))

48. fo.close()

49.

50.

51. if __name__ == '__main__':

52. main()

Sample Run of the Word Pattern Module

Running this program doesn’t print anything out to the screen. Instead it silently creates a file named wordPatterns.py in the same folder as makeWordPatterns.py. Open this file in IDLE’s file editor, and you will see it looks like this:

allPatterns = {'0.0.1': ['EEL'],

'0.0.1.2': ['EELS', 'OOZE'],

'0.0.1.2.0': ['EERIE'],

'0.0.1.2.3': ['AARON', 'LLOYD', 'OOZED'],

...the rest has been cut for brevity...

The makeWordPatterns.py program creates wordPatterns.py. Our Python program creates a Python program! The entire wordPatterns.py program is just one (very big) assignment statement for a variable named allPatterns. Even though this assignment statement stretches over many lines in the file, it is considered one “line of code” because Python knows that if a line ends with a comma but it is currently in the middle of a dictionary value, it ignores the indentation of the next line and just considers it part of the previous line. (This is a rare exception for Python’s significant indentation rules.)

The allPatterns variable contains a dictionary value where the keys are all the word patterns made from the English words in the dictionary file. The keys’ values are lists of strings of English words with that pattern. When wordPatterns.py is imported as a module, our program will be able to look up all the English words for any given word pattern.

After running the makeWordPatterns.py program to create the wordPatterns.py file, try typing the following into the interactive shell:

>>> import wordPatterns

>>> wordPatterns.allPatterns['0.1.2.1.1.3.4']

['BAZAARS', 'BESEECH', 'REDEEMS', 'STUTTER']

>>>

>>> wordPatterns.allPatterns['0.1.2.2.3.2.4.1.5.5']

['CANNONBALL']

>>>

>>> wordPatterns.allPatterns['0.1.0.1.0.1']

Traceback (most recent call last):

File "<stdin>", line 1, in <module>

KeyError: '0.1.0.1.0.1'

>>>

>>> '0.1.0.1.0.1' in wordPatterns.allPatterns

False

>>>

The pattern '0.1.0.1.0.1' does not exist in the dictionary. This is why the expression wordPatterns.allPatterns['0.1.0.1.0.1'] causes an error (because there is no '0.1.0.1.0.1' key in allPatterns) and why '0.1.0.1.0.1' in wordPatterns.allPatterns evaluates to False.

How the Program Works

makeWordPatterns.py

1. # Makes the wordPatterns.py File

2. # http://inventwithpython.com/hacking (BSD Licensed)

4. # Creates wordPatterns.py based on the words in our dictionary

5. # text file, dictionary.txt. (Download this file from

6. # http://invpy.com/dictionary.txt)

The top part of this file has the usual comments describing what the program is.

The pprint.pprint() and pprint.pformat() Functions

makeWordPatterns.py

8. import pprint

The pprint module has functions for pretty printing values, which is useful for printing dictionary and list values on the screen. The print() function simply prints these values going left to right:

>>> print(someListOfListsVar))

[['ant'], ['baboon', 'badger', 'bat', 'bear', 'beaver'], ['camel', 'cat', 'clam', 'cobra', 'cougar', 'coyote', 'crow'], ['deer', 'dog', 'donkey', 'duck'], ['eagle'], ['ferret', 'fox', 'frog'], ['goat']]

The pprint module has a function named pprint(). The value passed to pprint.pprint() will be “pretty printed” to the screen so that it is easier to read:

>>> import pprint

>>> pprint.pprint(someListOfListsVar))

[['ant'],

['baboon', 'badger', 'bat', 'bear', 'beaver'],

['camel', 'cat', 'clam', 'cobra', 'cougar', 'coyote', 'crow'],

['deer', 'dog', 'donkey', 'duck'],

['eagle'],

['ferret', 'fox', 'frog'],

['goat']]

However, if you want to have this “prettified” text as a string value instead of displaying it on the screen, you can use the pprint.pformat() function, which returns the prettified string:

>>> import pprint

>>> prettifiedString = pprint.pformat(someListOfListsVar)

>>> print(prettifiedString)

[['ant'],

['baboon', 'badger', 'bat', 'bear', 'beaver'],

['camel', 'cat', 'clam', 'cobra', 'cougar', 'coyote', 'crow'],

['deer', 'dog', 'donkey', 'duck'],

['eagle'],

['ferret', 'fox', 'frog'],

['goat']]

>>>

When we write the value of allPatterns to the wordPatterns.py file, we will use the pprint module to prevent it from being printed crammed together all on one line.

Building Strings in Python with Lists

Almost all of our programs have done some form of “building a string” code. That is, a variable will start as a blank string and then new characters are added with string concatenation. (We’ve done this in many previous cipher programs with the translated variable.) This is usually done with the + operator to do string concatenation, as in the following short program:

# The slow way to build a string using string concatenation.

building = ''

for c in 'Hello world!':

building += c

print(building)

The above program loops through each character in the string 'Hello world!' and concatenates it to the end of the string stored in building. At the end of the loop, building holds the complete string.

This seems like a straightforward way to do this. However, it is very inefficient for Python to concatenate strings. The reasons are technical and beyond the scope of this book, but it is much faster to start with a blank list instead of a blank string, and then use the append() list method instead of string concatenation. After you are done building the list of strings, you can convert the list of strings to a single string value with the join() method. The following short program does exactly the same thing as the previous example, but faster:

# The fast way to build a string using a list, append(), and join().

building = []

for c in 'Hello world!':

building.append(c)

building = ''.join(building)

print(building)

Using this approach for building up strings in your code will result in much faster programs. We will be using this list-append-join process to build strings in the remaining programs of this book.

Calculating the Word Pattern

makeWordPatterns.py

11. def getWordPattern(word):

12. # Returns a string of the pattern form of the given word.

13. # e.g. '0.1.2.3.4.1.2.3.5.6' for 'DUSTBUSTER'

14. word = word.upper()

15. nextNum = 0

16. letterNums = {}

17. wordPattern = []

The getWordPattern() function takes one string argument and returns a string of that word’s pattern. For example, if getWordPattern() were passed the string 'Buffoon' as an argument then getWordPattern() would return the string '0.1.2.2.3.3.4'.

First, in order to make sure all the letters have the same case, line 14 changes the word parameter to an uppercase version of itself. We then need three variables:

· nextNum stores the next number used when a new letter is found.

· letterNums stores a dictionary with keys of single-letter strings of single letters, and values of the integer number for that letter. As we find new letters in the word, the letter and its number are stored in letterNums.

· wordPattern will be the string that is returned from this function. But we will be building this string one character at a time, so we will use the list-append-join process to do this. This is why wordPattern starts as a blank list instead of a blank string.

· makeWordPatterns.py

19. for letter in word:

20. if letter not in letterNums:

21. letterNums[letter] = str(nextNum)

22. nextNum += 1

Line 19’s for loop will loop through each character in the word parameter, assigning each character to a variable named letter.

Line 20 checks if letter has not been seen before by checking that letter does not exist as a key in the letterNums dictionary. (On the first iteration of the loop, the condition on line 20 will always be True because letterNums will be a blank dictionary that doesn’t have anything in it.)

If we have not seen this letter before, line 21 adds this letter as the key and the string form of nextNum as the key’s value to the letterNums dictionary. For the next new letter we find we want to use the next integer after the one currently in nextNum anymore, so line 22 increments the integer in nextNum by 1.

makeWordPatterns.py

23. wordPattern.append(letterNums[letter])

On line 23, letterNums[letter] evaluates to the integer used for the letter in the letter variable, so this is appended to the end of wordPattern. The letterNums dictionary is guaranteed to have letter for a key, because if it hadn’t, then lines 20 to 22 would have handled adding it to letterNums before line 23.

makeWordPatterns.py

24. return '.'.join(wordPattern)

After the for loop on line 19 is finished looping, the wordPattern list will contain all the strings of the complete word pattern. Our word patterns have periods separating the integers, so that we could tell the difference between “1.12” and “11.2”. To put these periods in between each of the strings in the wordPattern list, line 24 calls the join() method on the string '.'. This will evaluate to a string such as '0.1.2.2.3.3.4'. The completely-built string that join() returns will be the return value of getWordPattern().

The Word Pattern Program’s main() Function

makeWordPatterns.py

27. def main():

28. allPatterns = {}

The value stored in allPatterns is what we will write to the wordPatterns.py file. It is a dictionary whose keys are strings of word patterns (such as '0.1.2.3.0.4.5' or '0.1.1.2') and the keys’ values are a list of strings of English words that match that pattern. For example, here’s one of the key-value pairs that will end up in allPatterns:

'0.1.0.2.3.1.4': ['DEDUCER', 'DEDUCES', 'GIGABIT', 'RARITAN']

But at the beginning of the main() function on line 28, the allPatterns variable will start off as a blank dictionary value.

makeWordPatterns.py

30. fo = open('dictionary.txt')

31. wordList = fo.read().split('\n')

32. fo.close()

Lines 30 to 32 read in the contents of the dictionary file into wordList. Chapter 11 covered these file-related functions in more detail. Line 30 opens the dictionary.txt file in “reading” mode and returns a file object. Line 31 calls the file object’s read() method which returns a string of all text from this file. The rest of line 31 splits it up whenever there is a \n newline character, and returns a list of strings: one string per line in the file. This list value returned from split() is stored in the wordList variable. At this point we are done reading the file, so line 34 calls the file object’s close() method.

The wordList variable will contain a list of tens of thousands of strings. Since the dictionary.txt file has one English word per line of text, each string in the wordList variable will be one English word.

makeWordPatterns.py

34. for word in wordList:

35. # Get the pattern for each string in wordList.

36. pattern = getWordPattern(word)

The for loop on line 34 will iterate over each string in the wordList list and store it in the word variable. The word variable is passed to the getWordPattern() function, which returns a word pattern string for the string in word. The word pattern string is stored in a variable named pattern.

makeWordPatterns.py

38. if pattern not in allPatterns:

39. allPatterns[pattern] = [word]

40. else:

41. allPatterns[pattern].append(word)

There must be a value for the pattern key first before we can append word to allPatterns[pattern], otherwise this would cause an error. So, first line 38 will check if the pattern is not already in allPatterns. If pattern is not a key in allPatterns yet, line 39 creates a list with word in it to store in allPatterns[pattern].

If the pattern already is in allPatterns, we do not have to create the list. Line 41 will just append the word to the list value that is already there.

By the time the for loop that started on line 34 finishes, the allPatterns dictionary will contain the word pattern of each English word that was in wordList as its keys. Each of these keys has a value that is a list of the words that produce the word pattern. With our data organized this way, given a word pattern we can easily look up all the English words that produce that particular pattern.

makeWordPatterns.py

43. # This is code that writes code. The wordPatterns.py file contains

44. # one very, very large assignment statement.

45. fo = open('wordPatterns.py', 'w')

46. fo.write('allPatterns = ')

47. fo.write(pprint.pformat(allPatterns))

48. fo.close()

Now that we have this very large dictionary in allPatterns, we want to save it to a file on the hard drive. The last part of the main() function will create a file called wordPatterns.py which will just have one huge assignment statement in it.

Line 45 creates a new file by passing the 'wordPatterns.py' string for the filename and 'w' to indicate that this file will be opened in “write” mode. If there is already a file with the name 'wordPatterns.py', opening it in write mode will cause the file to be deleted to make way for the new file we are creating.

Line 46 starts the file off with 'allPatterns = ', which is the first part of the assignment statement. Line 47 finishes it by writing a prettified version of allPatterns to the file. Line 48 closes the file since we are done writing to it.

makeWordPatterns.py

51. if __name__ == '__main__':

52. main()

Lines 51 and 52 call the main() function if this program was run by itself (to create the wordPattern.py file) rather than imported by another program that wants to use its getWordPattern() function.

Hacking the Simple Substitution Cipher

The hacking program uses the abstract concepts of “word patterns” and “cipherletter mappings”. But don’t worry, in our Python program “word patterns” are represented by string values and “cipherletter mappings” are represented with dictionary values. The previous sections explained what word patterns are and how to generate them from a string. Cipherletter mappings are used in the hacking program to keep track of the possible letters that each of the 26 cipherletters could decrypt to. Go ahead and type in the source code for the simpleSubHacker.py program.

Source Code of the Simple Substitution Hacking Program

Source code for simpleSubHacker.py

1. # Simple Substitution Cipher Hacker

2. # http://inventwithpython.com/hacking (BSD Licensed)

4. import os, re, copy, pprint, pyperclip, simpleSubCipher, makeWordPatterns

6. if not os.path.exists('wordPatterns.py'):

7. makeWordPatterns.main() # create the wordPatterns.py file

8. import wordPatterns

10. LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

11. nonLettersOrSpacePattern = re.compile('[^A-Z\s]')

12.

13. def main():

14. message = 'Sy l nlx sr pyyacao l ylwj eiswi upar lulsxrj isr sxrjsxwjr, ia esmm rwctjsxsza sj wmpramh, lxo txmarr jia aqsoaxwa sr pqaceiamnsxu, ia esmm caytra jp famsaqa sj. Sy, px jia pjiac ilxo, ia sr pyyacao rpnajisxu eiswi lyypcor l calrpx ypc lwjsxu sx lwwpcolxwa jp isr sxrjsxwjr, ia esmm lwwabj sj aqax px jia rmsuijarj aqsoaxwa. Jia pcsusx py nhjir sr agbmlsxao sx jisr elh. -Facjclxo Ctrramm'

15.

16. # Determine the possible valid ciphertext translations.

17. print('Hacking...')

18. letterMapping = hackSimpleSub(message)

19.

20. # Display the results to the user.

21. print('Mapping:')

22. pprint.pprint(letterMapping)

23. print()

24. print('Original ciphertext:')

25. print(message)

26. print()

27. print('Copying hacked message to clipboard:')

28. hackedMessage = decryptWithCipherletterMapping(message, letterMapping)

29. pyperclip.copy(hackedMessage)

30. print(hackedMessage)

31.

32.

33. def getBlankCipherletterMapping():

34. # Returns a dictionary value that is a blank cipherletter mapping.

35. return {'A': [], 'B': [], 'C': [], 'D': [], 'E': [], 'F': [], 'G': [], 'H': [], 'I': [], 'J': [], 'K': [], 'L': [], 'M': [], 'N': [], 'O': [], 'P': [], 'Q': [], 'R': [], 'S': [], 'T': [], 'U': [], 'V': [], 'W': [], 'X': [], 'Y': [], 'Z': []}

36.

37.

38. def addLettersToMapping(letterMapping, cipherword, candidate):

39. # The letterMapping parameter is a "cipherletter mapping" dictionary

40. # value that the return value of this function starts as a copy of.

41. # The cipherword parameter is a string value of the ciphertext word.

42. # The candidate parameter is a possible English word that the

43. # cipherword could decrypt to.

44.

45. # This function adds the letters of the candidate as potential

46. # decryption letters for the cipherletters in the cipherletter

47. # mapping.

48.

49. letterMapping = copy.deepcopy(letterMapping)

50. for i in range(len(cipherword)):

51. if candidate[i] not in letterMapping[cipherword[i]]:

52. letterMapping[cipherword[i]].append(candidate[i])

53. return letterMapping

54.

55.

56. def intersectMappings(mapA, mapB):

57. # To intersect two maps, create a blank map, and then add only the

58. # potential decryption letters if they exist in BOTH maps.

59. intersectedMapping = getBlankCipherletterMapping()

60. for letter in LETTERS:

61.

62. # An empty list means "any letter is possible". In this case just

63. # copy the other map entirely.

64. if mapA[letter] == []:

65. intersectedMapping[letter] = copy.deepcopy(mapB[letter])

66. elif mapB[letter] == []:

67. intersectedMapping[letter] = copy.deepcopy(mapA[letter])

68. else:

69. # If a letter in mapA[letter] exists in mapB[letter], add

70. # that letter to intersectedMapping[letter].

71. for mappedLetter in mapA[letter]:

72. if mappedLetter in mapB[letter]:

73. intersectedMapping[letter].append(mappedLetter)

74.

75. return intersectedMapping

76.

77.

78. def removeSolvedLettersFromMapping(letterMapping):

79. # Cipher letters in the mapping that map to only one letter are

80. # "solved" and can be removed from the other letters.

81. # For example, if 'A' maps to potential letters ['M', 'N'], and 'B'

82. # maps to ['N'], then we know that 'B' must map to 'N', so we can

83. # remove 'N' from the list of what 'A' could map to. So 'A' then maps

84. # to ['M']. Note that now that 'A' maps to only one letter, we can

85. # remove 'M' from the list of letters for every other

86. # letter. (This is why there is a loop that keeps reducing the map.)

87. letterMapping = copy.deepcopy(letterMapping)

88. loopAgain = True

89. while loopAgain:

90. # First assume that we will not loop again:

91. loopAgain = False

92.

93. # solvedLetters will be a list of uppercase letters that have one

94. # and only one possible mapping in letterMapping

95. solvedLetters = []

96. for cipherletter in LETTERS:

97. if len(letterMapping[cipherletter]) == 1:

98. solvedLetters.append(letterMapping[cipherletter][0])

99.

100. # If a letter is solved, than it cannot possibly be a potential

101. # decryption letter for a different ciphertext letter, so we

102. # should remove it from those other lists.

103. for cipherletter in LETTERS:

104. for s in solvedLetters:

105. if len(letterMapping[cipherletter]) != 1 and s in letterMapping[cipherletter]:

106. letterMapping[cipherletter].remove(s)

107. if len(letterMapping[cipherletter]) == 1:

108. # A new letter is now solved, so loop again.

109. loopAgain = True

110. return letterMapping

111.

112.

113. def hackSimpleSub(message):

114. intersectedMap = getBlankCipherletterMapping()

115. cipherwordList = nonLettersOrSpacePattern.sub('', message.upper()).split()

116. for cipherword in cipherwordList:

117. # Get a new cipherletter mapping for each ciphertext word.

118. newMap = getBlankCipherletterMapping()

119.

120. wordPattern = makeWordPatterns.getWordPattern(cipherword)

121. if wordPattern not in wordPatterns.allPatterns:

122. continue # This word was not in our dictionary, so continue.

123.

124. # Add the letters of each candidate to the mapping.

125. for candidate in wordPatterns.allPatterns[wordPattern]:

126. newMap = addLettersToMapping(newMap, cipherword, candidate)

127.

128. # Intersect the new mapping with the existing intersected mapping.

129. intersectedMap = intersectMappings(intersectedMap, newMap)

130.

131. # Remove any solved letters from the other lists.

132. return removeSolvedLettersFromMapping(intersectedMap)

133.

134.

135. def decryptWithCipherletterMapping(ciphertext, letterMapping):

136. # Return a string of the ciphertext decrypted with the letter mapping,

137. # with any ambiguous decrypted letters replaced with an _ underscore.

138.

139. # First create a simple sub key from the letterMapping mapping.

140. key = ['x'] * len(LETTERS)

141. for cipherletter in LETTERS:

142. if len(letterMapping[cipherletter]) == 1:

143. # If there's only one letter, add it to the key.

144. keyIndex = LETTERS.find(letterMapping[cipherletter][0])

145. key[keyIndex] = cipherletter

146. else:

147. ciphertext = ciphertext.replace(cipherletter.lower(), '_')

148. ciphertext = ciphertext.replace(cipherletter.upper(), '_')

149. key = ''.join(key)

150.

151. # With the key we've created, decrypt the ciphertext.

152. return simpleSubCipher.decryptMessage(key, ciphertext)

153.

154.

155. if __name__ == '__main__':

156. main()

Hacking the Simple Substitution Cipher (in Theory)

Hacking the simple substitution cipher is pretty easy. The five steps are:

1. Find the word pattern for each cipherword in the ciphertext.

2. Find the list of English word candidates that each cipherword could decrypt to.

3. Create one cipherletter mapping for each cipherword using the cipherword’s list of candidates. (A cipherletter mapping is just a dictionary value.)

4. Intersect each of the cipherletter mappings into a single intersected cipherletter mapping.

5. Remove any solved letters from the intersected cipherletter mapping.

The more cipher words that are in the ciphertext, the more cipherletter mappings we have that can be intersected. The more cipherletter mappings we intersect together, the fewer the number of potential decryption letters there will be for each cipher letter. This means that the longer the ciphertext message, the more likely we are to hack and decrypt it.

Explore the Hacking Functions with the Interactive Shell

We’ve already described the steps used to hack a simple substitution encrypted message by using word patterns. Before we learn how the code in these functions works, let’s use the interactive shell to call them and see what values they return depending on what arguments we pass them.

Here is the example we will hack: OLQIHXIRCKGNZ PLQRZKBZB MPBKSSIPLC

The getBlankCipherletterMapping() function returns a cipherletter mapping. A cipherletter mapping is just a dictionary with 26 keys of uppercase single-letter strings and values of lists of single-letter uppercase strings like 'A' or 'Q'. We will store this blank cipherletter mapping in a variable named letterMapping1. Try typing the following into the interactive shell:

>>> letterMapping1 = simpleSubHacker.getBlankCipherletterMapping()

>>> letterMapping1

{'A': [], 'C': [], 'B': [], 'E': [], 'D': [], 'G': [], 'F': [], 'I': [], 'H': [], 'K': [], 'J': [], 'M': [], 'L': [], 'O': [], 'N': [], 'Q': [], 'P': [], 'S': [], 'R': [], 'U': [], 'T': [], 'W': [], 'V': [], 'Y': [], 'X': [], 'Z': []}

>>>

Let’s start hacking the first cipherword, OLQIHXIRCKGNZ. First we will need to get the word pattern for this cipherword by calling the makeWordPattern module’s getWordPattern() function. Try typing the following into the interactive shell:

>>> import makeWordPatterns

>>> wordPat = makeWordPatterns.getWordPattern('OLQIHXIRCKGNZ')

>>> wordPat

0.1.2.3.4.5.3.6.7.8.9.10.11

>>>

To figure out which English words in the dictionary have the word pattern 0.1.2.3.4.5.3.6.7.8.9.10.11 (that is, to figure out the candidates for the cipherword OLQIHXIRCKGNZ) we will import the wordPatterns module and look up this pattern. Try typing the following into the interactive shell:

>>> import wordPatterns

>>> candidates = wordPatterns.allPatterns['0.1.2.3.4.5.3.6.7.8.9.10.11']

>>> candidates

['UNCOMFORTABLE', 'UNCOMFORTABLY']

>>>

There are two English words that OLQIHXIRCKGNZ could decrypt to (that is, only two English words that have the same word pattern that OLQIHXIRCKGNZ does): UNCOMFORTABLE and UNCOMFORTABLY. (It’s also possible that the cipherword decrypts to a word that does not exist in our dictionary, but we will just have to assume that’s not the case.) We need to create a cipherletter mapping that has the cipherletters in OLQIHXIRCKGNZ map to letters in UNCOMFORTABLE and UNCOMFORTABLY as potential decryption letters. That is, O maps to U, L maps to N, Q maps to C, and so on. Z will map to two different letters: E and Y.

We can do this with the addLettersToMapping() function. We will need to pass it our (currently blank) cipherletter mapping in letterMapping1, the string 'OLQIHXIRCKGNZ', and the string 'UNCOMFORTABLE' (which is the first string in the candidates list). Try typing the following into the interactive shell:

>>> letterMapping1 = simpleSubHacker.addLettersToMapping(letterMapping1, 'OLQIHXIRCKGNZ', candidates[0])

>>> letterMapping1

{'A': [], 'C': ['T'], 'B': [], 'E': [], 'D': [], 'G': ['B'], 'F': [], 'I': ['O'], 'H': ['M'], 'K': ['A'], 'J': [], 'M': [], 'L': ['N'], 'O': ['U'], 'N': ['L'], 'Q': ['C'], 'P': [], 'S': [], 'R': ['R'], 'U': [], 'T': [], 'W': [], 'V': [], 'Y': [], 'X': ['F'], 'Z': ['E']}

>>>

From the letterMapping1 value, you can see that the letters in OLQIHXIRCKGNZ map to the letters in UNCOMFORTABLE: 'O' maps to ['U'], 'L' maps to ['N'], 'Q' maps to ['C'], and so on.

But since the letters in OLQIHXIRCKGNZ could also possibly decrypt to UNCOMFORTABLY, we also need to add UNCOMFORTABLY to the cipherletter mapping. Try typing the following into the interactive shell:

>>> letterMapping1 = simpleSubHacker.addLettersToMapping(letterMapping1, 'OLQIHXIRCKGNZ', candidates[1])

>>> letterMapping1

>>>

You’ll notice that not much has changed in letterMapping1. The cipherletter mapping in letterMapping1 now has 'Z' map to both 'E' and 'Y'. That’s because the candidates for OLQIHXIRCKGNZ (that is, UNCOMFORTABLE and UNCOMFORTABLY) are very similar to each other and addLettersToMapping() only adds the letter to the list if the letter is not already there. This is why 'O' maps to ['U'] instead of ['U', 'U'].

We now have a cipherletter mapping for the first of the three cipherwords. We need to get a new mapping for the second cipherword, PLQRZKBZB. Call getBlankCipherletterMapping() and store the returned dictionary value in a variable named letterMapping2. Get the word pattern for PLQRZKBZB and use it to look up all the candidates in wordPatterns.allPatterns. This is done by typing the following into the interactive shell:

>>> letterMapping2 = simpleSubHacker.getBlankCipherletterMapping()

>>> wordPat = makeWordPatterns.getWordPattern('PLQRZKBZB')

>>> candidates = wordPatterns.allPatterns[wordPat]

>>> candidates

['CONVERSES', 'INCREASES', 'PORTENDED', 'UNIVERSES']

>>>

Instead of typing out four calls to addLettersToMapping() for each of these four candidate words, we can write a for loop that will go through the list in candidates and call addLettersToMapping() each time.

>>> for candidate in candidates:

... letterMapping2 = simpleSubHacker.addLettersToMapping(letterMapping2, 'PLQRZKBZB', candidate)

...

>>> letterMapping2

{'A': [], 'C': [], 'B': ['S', 'D'], 'E': [], 'D': [], 'G': [], 'F': [], 'I': [], 'H': [], 'K': ['R', 'A', 'N'], 'J': [], 'M': [], 'L': ['O', 'N'], 'O': [], 'N': [], 'Q': ['N', 'C', 'R', 'I'], 'P': ['C', 'I', 'P', 'U'], 'S': [], 'R': ['V', 'R', 'T'], 'U': [], 'T': [], 'W': [], 'V': [], 'Y': [], 'X': [], 'Z': ['E']}

>>>

This finishes the cipherletter mapping for our second cipherword. Now we need to get the intersection of the cipherletter mappings in letterMapping1 and letterMapping2 by passing them to intersectMappings(). Try typing the following into the interactive shell:

>>> intersectedMapping = simpleSubHacker.intersectMappings(letterMapping1, letterMapping2)

>>> intersectedMapping

{'A': [], 'C': ['T'], 'B': ['S', 'D'], 'E': [], 'D': [], 'G': ['B'], 'F': [], 'I': ['O'], 'H': ['M'], 'K': ['A'], 'J': [], 'M': [], 'L': ['N'], 'O': ['U'], 'N': ['L'], 'Q': ['C'], 'P': ['C', 'I', 'P', 'U'], 'S': [], 'R': ['R'], 'U': [], 'T': [], 'W': [], 'V': [], 'Y': [], 'X': ['F'], 'Z': ['E']}

>>>

The intersected mapping is just a cipherletter mapping. The list of potential decryption letters for any cipherletter in the intersected mapping will only be the potential decryption letters that were in the cipherletter’s list in both letterMapping1 and letterMapping2.

For example, this is why intersectedMapping’s list for the 'Z' key is just ['E']: because letterMapping1 had ['E', 'Y'] but letterMapping2 had ['E']. The intersection of ['E', 'Y'] and ['E'] is just the potential decryption letters that exist in both mappings: ['E']

There is an exception. If one of the mapping’s lists was blank, then all of the potential decryption letters in the other mapping are put into the intersected mapping. This is because in our program a blank map represents any possible letter can be used since nothing is known about the mapping.

Then we do all these steps for the third cipherword, MPBKSSIPLC. Try typing the following into the interactive shell:

>>> letterMapping3 = simpleSubHacker.getBlankCipherletterMapping()

>>> wordPat = makeWordPatterns.getWordPattern('MPBKSSIPLC')

>>> candidates = wordPatterns.allPatterns[wordPat]

>>> candidates

['ADMITTEDLY', 'DISAPPOINT']

>>> for i in range(len(candidates)):

... letterMapping3 = simpleSubHacker.addLettersToMapping(letterMapping3, 'MPBKSSIPLC', candidates[i])

...

>>> letterMapping3

{'A': [], 'C': ['Y', 'T'], 'B': ['M', 'S'], 'E': [], 'D': [], 'G': [], 'F': [], 'I': ['E', 'O'], 'H': [], 'K': ['I', 'A'], 'J': [], 'M': ['A', 'D'], 'L': ['L', 'N'], 'O': [], 'N': [], 'Q': [], 'P': ['D', 'I'], 'S': ['T', 'P'], 'R': [], 'U': [], 'T': [], 'W': [], 'V': [], 'Y': [], 'X': [], 'Z': []}

We intersect letterMapping3 with intersectedMapping. This also ends up indirectly intersecting letterMapping3 with letterMapping1 and letterMapping2, since intersectedMapping is currently the intersection of letterMapping1 and letterMapping2. Try typing the following into the interactive shell:

>>> intersectedMapping = simpleSubHacker.intersectMappings(intersectedMapping, letterMapping3)

>>> intersectedMapping

{'A': [], 'C': ['T'], 'B': ['S'], 'E': [], 'D': [], 'G': ['B'], 'F': [], 'I': ['O'], 'H': ['M'], 'K': ['A'], 'J': [], 'M': ['A', 'D'], 'L': ['N'], 'O': ['U'], 'N': ['L'], 'Q': ['C'], 'P': ['I'], 'S': ['T', 'P'], 'R': ['R'], 'U': [], 'T': [], 'W': [], 'V': [], 'Y': [], 'X': ['F'], 'Z': ['E']}

>>>

We can now pass the intersected cipherletter mapping to decryptWithCipherletterMapping() to decrypt the ciphertext. Try typing the following into the interactive shell:

>>> simpleSubHacker.decryptWithCipherletterMapping('OLQIHXIRCKGNZ PLQRZKBZB MPBKSSIPLC', intersectedMapping)

UNCOMFORTABLE INCREASES _ISA__OINT

>>>

The intersected mapping is not yet complete. Notice how the intersected mapping has a solution for the cipherletter K, because the key 'K'’s value to a list with just one string in it: ['A']. Because we know that the K cipherletters will decrypt to A, no other cipherletter can possibly decrypt to A.

In the intersected mapping, the cipherletter M maps to ['A', 'D']. This means that judging from the candidates for the cipherwords in our encrypted message, the cipherletter M could decrypt to A or D.

But since we know K decrypts to A, we can remove A from the list of potential decryption letters for cipherletter M. This shortens the list down to just ['D']. Because this new list only has one string in it, we’ve also solved the cipherletter M!

The removeSolvedLettersFromMapping() function takes a cipherletter mapping and removes these solved potential decryption letters from the other cipherletters’ lists. Try typing the following into the interactive shell:

>>> letterMapping = simpleSubHacker.removeSolvedLettersFromMapping(letterMapping)

>>> intersectedMapping

{'A': [], 'C': ['T'], 'B': ['S'], 'E': [], 'D': [], 'G': ['B'], 'F': [], 'I': ['O'], 'H': ['M'], 'K': ['A'], 'J': [], 'M

': ['D'], 'L': ['N'], 'O': ['U'], 'N': ['L'], 'Q': ['C'], 'P': ['I'], 'S': ['P'], 'R': ['R'], 'U': [], 'T': [], 'W': [],

'V': [], 'Y': [], 'X': ['F'], 'Z': ['E']}

>>>

Now when we pass the intersected mapping to decryptWithCipherletterMapping(), it gives us the full solution. Try typing the following into the interactive shell:

>>> simpleSubHacker.decryptWithCipherletterMapping('OLQIHXIRCKGNZ PLQRZKBZB MPBKSSIPLC', intersectedMapping)

UNCOMFORTABLE INCREASES DISAPPOINT

>>>

The ciphertext OLQIHXIRCKGNZ PLQRZKBZB MPBKSSIPLC decrypts to the message, “Uncomfortable increases disappoint”.

This is a rather short ciphertext to hack. Normally the encrypted messages we hack will be much longer. (Messages as short as our example usually cannot be hacked with our word pattern method.) We’ll have to create a cipherletter mapping for each cipherword in these longer messages and then intersect them all together, which is exactly what the hackSimpleSub() function does.

Now that we know the basic steps and what each function does, let’s learn how the code in these functions work.

How the Program Works

simpleSubHacker.py

1. # Simple Substitution Cipher Hacker

2. # http://inventwithpython.com/hacking (BSD Licensed)

The comments at the top of the source code explain what the program is.

Import All the Things

simpleSubHacker.py

4. import os, re, copy, pprint, pyperclip, simpleSubCipher, makeWordPatterns

Our simple substitution hacking program imports eight different modules, more than any other program so far. By reusing the code in these modules, our hacking program becomes much shorter and easier to write.

The re module is a module we haven’t seen before. This is the regular expression module which lets our code do sophisticated string manipulation. Regular expressions are explained in the next section.

simpleSubHacker.py

6. if not os.path.exists('wordPatterns.py'):

7. makeWordPatterns.main() # create the wordPatterns.py file

8. import wordPatterns

The simple substitution cipher also needs the wordPatterns module. The .py file for this module is created when the makeWordPatterns.py program is run. But makeWordPatterns.py might not have been run before our hacking program has. In this case, our hacking program checks if this file exists on line 6 and if it doesn’t, the makeWordPatterns.main() function is called.

Remember, the main() function is the function that is run in our programs when they are run as programs (rather than just imported with an import statement.) When we imported the makeWordPatterns module on line 4, the main() function in makeWordPatterns.py was not run. Since main() is the function that creates the wordPatterns.py file, we will call makeWordPatterns.main() if wordPatterns.py does not yet exist.

Either way, by the time the program execution reaches line 8, the wordPatterns module will exist and can be imported.

A Brief Intro to Regular Expressions and the sub() Regex Method

simpleSubHacker.py

10. LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

11. nonLettersOrSpacePattern = re.compile('[^A-Z\s]')

The simple substitution hacking program will have a LETTERS global variable like many of our previous cipher programs.

The re.compile() function is new. This function compiles (that is, creates) a new regular expression pattern object, or “regex object” or “pattern object” for short. Regular expressions are strings that define a specific pattern that matches certain strings. Regular expressions can do many special things with strings that are beyond the scope of this book, but you can learn about them at http://invpy.com/regex.

The string '[^A-Za-z\s]' is a regular expression that matches any character that is not a letter from A to Z or a “whitespace” character (e.g. a space, tab, or newline character). The pattern object has a sub() method (short for “substitute”) that works very similar to the replace() string method. The first argument to sub() is the string that replaces any instances of the pattern in the second string argument. Try typing the following into the interactive shell:

>>> pat = re.compile('[^A-Z\s]')

>>> pat.sub('abc', 'ALL! NON!LETTERS? AND123 NONSPACES. REPLACED')

'ALLabc NONabcLETTERSabc ANDabcabcabc NONSPACESabc REPLACED'

>>> pat.sub('', 'ALL! NON!LETTERS? AND123 NONSPACES. REPLACED')

'ALL NONLETTERS AND NONSPACES REPLACED'

>>>

There are many sophisticated string manipulations you can perform if you learn more about regular expressions, but we will only use them in this book to remove characters from a string that are not uppercase letters or spaces.

The Hacking Program’s main() Function

simpleSubHacker.py

13. def main():

15.

16. # Determine the possible valid ciphertext translations.

17. print('Hacking...')

18. letterMapping = hackSimpleSub(message)

Like all our previous hacking programs, the main() function will store the ciphertext to be hacked in a variable named message. We will pass this variable to the hackSimpleSub() function. However, unlike our previous hacking programs, the hacking function will not return a string of the decrypted message (or None if it was unable to decrypt it).

Instead, hackSimpleSub() will return a cipherletter mapping (specifically, an intersected cipherletter mapping that had the solved letters removed, like the kind we made in our interactive shell exercise). This returned cipherletter mapping will be passed to decryptWithCipherletterMapping() to decrypt the ciphertext in message.

Partially Hacking the Cipher

simpleSubHacker.py

20. # Display the results to the user.

21. print('Mapping:')

22. pprint.pprint(letterMapping)

23. print()

Since the cipherletter mapping stored in letterMapping is a dictionary, we can use the pprint.pprint() “pretty print” function to display it on the screen. It will look something like this:

{'A': ['E'],

'B': ['B', 'W', 'P'],

'C': ['R'],

'D': [],

'E': ['K', 'W'],

'F': ['B', 'P'],

'G': ['B', 'Q', 'X', 'Y', 'P', 'W'],

'H': ['B', 'K', 'P', 'W', 'X', 'Y'],

'I': ['H'],

'J': ['T'],

'K': [],

'L': ['A'],

'M': ['L'],

'N': ['M'],

'O': ['D'],

'P': ['O'],

'Q': ['V'],

'R': ['S'],

'S': ['I'],

'T': ['U'],

'U': ['G'],

'V': [],

'W': ['C'],

'X': ['N'],

'Y': ['F'],

'Z': ['Z']}

In the above example, the cipherletters A, C, I, J, L, M, N, O, P, Q, R, S, T, U, X, Y, and Z all have one and only one potential decryption letter. These cipher letters have been solved. The decryptWithCipherletterMapping() function, explained later, will print underscores for any cipherletters that have not been solved (that is, B, D, E, F, G, H, K, and V.)

simpleSubHacker.py

24. print('Original ciphertext:')

25. print(message)

26. print()

First the original encrypted message is displayed on the screen so the programmer can compare it to the decryption.

simpleSubHacker.py

27. print('Copying hacked message to clipboard:')

28. hackedMessage = decryptWithCipherletterMapping(message, letterMapping)

29. pyperclip.copy(hackedMessage)

30. print(hackedMessage)

Next the decrypted message is returned from the decryptWithCipherletterMapping() function on line 28. This hacked message is copied to the clipboard on line 29 and printed to the screen on line 30.

Next, let’s look at all the functions that are called by main().

Blank Cipherletter Mappings

simpleSubHacker.py

33. def getBlankCipherletterMapping():

34. # Returns a dictionary value that is a blank cipherletter mapping.

Our program will need a cipherletter mapping for each cipherword in the ciphertext, so we will create the getBlankCipherletterMapping() function which can return a new, blank mapping when called.

Adding Letters to a Cipherletter Mapping

simpleSubHacker.py

38. def addLettersToMapping(letterMapping, cipherword, candidate):

The addLettersToMapping() function attempts to make sure that every letter in the candidate can be mapped to a letter in the cipherword. It checks over each letter in candidate and adds its corresponding letter in cipherword to letterMapping if it wasn't already there.

For example, if 'PUPPY' is our candidate word for the 'HGHHU' cipherword, the addLettersToMapping() function will change letterMapping so that the key 'H' has 'P' added to its list of potential decryption letters. Then the function will change the key 'G' so that its list has 'U' appended to it.

If the letter is already in the list of potential decryption letters, the addLettersToMapping() will not add a letter to the list. We can skip adding 'P' to the 'H' key the next two times since it’s already been done. Finally, the function will change the key 'U' so that it has 'Y' in its list of potential decryption letters.

The code in this function assumes that len(cipherword) is the same as len(candidate).

simpleSubHacker.py

49. letterMapping = copy.deepcopy(letterMapping)

To avoid changing the original dictionary value passed for the letterMapping parameter, line 49 will copy the dictionary in letterMapping and make this copy the new value in letterMapping. (We have to do this because letterMapping was passed a copy of a dictionary reference value, instead of a copy of the dictionary value. See the “List Reference” section in Chapter 10 for an explanation of references.)

simpleSubHacker.py

50. for i in range(len(cipherword)):

Line 50 will iterate over each index in the string in cipherword. We need the index (which is stored in the variable i) because the potential decryption letter to be added will be candidate[i] for the cipherletter cipherword[i].

simpleSubHacker.py

51. if candidate[i] not in letterMapping[cipherword[i]]:

52. letterMapping[cipherword[i]].append(candidate[i])

The if statement on line 51 checks that the potential decryption letter is not already in the list of potential decryption letters for the cipherletter. This prevents the list of potential decryption letters in the cipherletter mapping from having duplicate letters in it. For example, we never want the list to be a value like ['U', 'U'].

Line 52 adds the new potential decryption letter (that is, candidate[i]) to the list of potential decryption letters in the cipherletter mapping (that is, letterMapping[cipherword[i]]).

simpleSubHacker.py

53. return letterMapping

After looping through all the indexes in cipherword, the additions to the cipherletter mapping are complete and the dictionary in letterMapping is returned.

Intersecting Two Letter Mappings

simpleSubHacker.py

56. def intersectMappings(mapA, mapB):

57. # To intersect two maps, create a blank map, and then add only the

58. # potential decryption letters if they exist in BOTH maps.

59. intersectedMapping = getBlankCipherletterMapping()

60. for letter in LETTERS:

The intersectMappings() function will return a new cipherletter mapping that is an intersection of the two cipherletter mappings passed for the mapA and mapB parameters. Line 59 creates a new cipherletter mapping by calling getBlankCipherletterMapping() and storing the returned value in the intersectedMapping variable.

The for loop will loop through each of the uppercase letters in the LETTERS constant variable, and the letter variable can be used for the keys of the mapA and mapB dictionaries.

simpleSubHacker.py

62. # An empty list means "any letter is possible". In this case just

63. # copy the other map entirely.

64. if mapA[letter] == []:

65. intersectedMapping[letter] = copy.deepcopy(mapB[letter])

66. elif mapB[letter] == []:

67. intersectedMapping[letter] = copy.deepcopy(mapA[letter])

If the list of potential decryption letters for a cipherletter in a cipherletter mapping is blank, this means that this cipherletter could potentially decrypt to any letter. In this case, the intersected cipherletter mapping will just be a copy of the other mapping’s list of potential decryption letters.

That is, if mapA’s list of potential decryption letters is blank, then set the intersected mapping’s list to be a copy of mapB’s list. Or if mapB’s list is blank, then set the intersected mapping’s list to be a copy of mapA’s list.

(If both mappings’ lists were blank, then line 65 will simply copy mapB’s blank list to the intersected mapping. This is the behavior we want: if both lists are blank then the intersected mapping will have a blank list.)

simpleSubHacker.py

68. else:

69. # If a letter in mapA[letter] exists in mapB[letter], add

70. # that letter to intersectedMapping[letter].

71. for mappedLetter in mapA[letter]:

72. if mappedLetter in mapB[letter]:

73. intersectedMapping[letter].append(mappedLetter)

The else block handles the case where neither mapA nor mapB are blank. In this case, line 71 loops through the uppercase letter strings in the list at mapA[letter]. Line 72 checks if this uppercase letter in mapA[letter] also exists in the list of uppercase letter strings in mapB[letter]. If it does, then line 73 will add this common letter to the list of potential decryption letters at intersectedMapping[letter].

simpleSubHacker.py

75. return intersectedMapping

Once the for loop that started on line 60 has finished, the cipherletter mapping in intersectedMapping will only have the potential decryption letters that exist in the lists of potential decryption letters of both mapA and mapB. This completely intersected cipherletter mapping is returned on line 75.

Removing Solved Letters from the Letter Mapping

simpleSubHacker.py

78. def removeSolvedLettersFromMapping(letterMapping):

79. # Cipher letters in the mapping that map to only one letter are

80. # "solved" and can be removed from the other letters.

81. # For example, if 'A' maps to potential letters ['M', 'N'], and 'B'

82. # maps to ['N'], then we know that 'B' must map to 'N', so we can

83. # remove 'N' from the list of what 'A' could map to. So 'A' then maps

84. # to ['M']. Note that now that 'A' maps to only one letter, we can

85. # remove 'M' from the list of potential letters for every other

86. # key. (This is why there is a loop that keeps reducing the map.)

87. letterMapping = copy.deepcopy(letterMapping)

88. loopAgain = True

The removeSolvedLettersFromMapping() function searches for any cipherletters in the letterMapping parameter which have one and only one potential decryption letter. These cipherletters are considered solved: the cipherletter must decrypt to that one potential decryption letter. This means that any other cipherletters that have this solved letter can have that letter removed from their lists of potential decryption letters.

This could cause a chain reaction, because when the one potential decryption letter is removed from other lists of potential decryption letters, it could result in a new solved cipherletter. In that case, the program will loop and perform the solved letter removal over the whole cipherletter mapping again.

The cipherletter mapping in letterMapping is copied on line 87 so that changes made to it in the function do not affect the dictionary value outside the function. Line 88 creates loopAgain, which is a variable that holds a Boolean value that tells us if the code found a new solved letter and needs to loop again. In that case the loopAgain variable is set to True on line 88 so that the program execution will enter the while loop on line 89.

simpleSubHacker.py

89. while loopAgain:

90. # First assume that we will not loop again:

91. loopAgain = False

At the very beginning of the loop, line 91 will set loopAgain to False. The code assumes that this will be the last iteration through line 89’s while loop. The loopAgain variable is only set to True if we find a new solved cipherletter during this iteration.

simpleSubHacker.py

93. # solvedLetters will be a list of uppercase letters that have one

94. # and only one possible mapping in letterMapping

95. solvedLetters = []

96. for cipherletter in LETTERS:

97. if len(letterMapping[cipherletter]) == 1:

98. solvedLetters.append(letterMapping[cipherletter][0])

The next part of the code creates a list of cipherletters that have exactly one potential decryption letter. We will put these cipherletter strings in a list that is in solvedLetters. The solvedLetters variable starts off as a blank list on line 95.

The for loop on line 96 goes through all 26 possible cipherletters and looks at the cipherletter mapping’s list of potential decryption letters for that cipherletter. (That is, the list is at letterMapping[cipherletter].)

If the length of this list is 1 (which is checked on line 97), then we know that there is only one letter that the cipherletter could decrypt to and the cipherletter is solved. We will add the letter (the potential decryption letter, not the cipherletter) to the solvedLetters list on line 98. The solved letter will always be at letterMapping[cipherletter][0] because letterMapping[cipherletter] is a list of potential decryption letters that only has one string value in it at index 0 of the list.

simpleSubHacker.py

100. # If a letter is solved, than it cannot possibly be a potential

101. # decryption letter for a different ciphertext letter, so we

102. # should remove it from those other lists.

103. for cipherletter in LETTERS:

104. for s in solvedLetters:

105. if len(letterMapping[cipherletter]) != 1 and s in letterMapping[cipherletter]:

106. letterMapping[cipherletter].remove(s)

After the previous for loop that started on line 96 has finished, the solvedLetters variable will be a list of all the letters that are solved decryptions of a cipherletter. The for loop on line 103 loops through all 26 possible cipherletters and looks at the cipherletter mapping’s list of potential decryption letters.

For each cipherletter that is examined, the letters in solvedLetters are looped through on line 104 to check if each of them exist in the list of potential decryption letters for letterMapping[cipherletter]. Line 105 checks if a list of potential decryption letters is not solved (that is, if len(letterMapping[cipherletter]) != 1) and the solved letter exists in the list of potential decryption letters. If this condition is True, then the solved letter in s is removed from the list of potential decryption letters on line 106.

simpleSubHacker.py

107. if len(letterMapping[cipherletter]) == 1:

108. # A new letter is now solved, so loop again.

109. loopAgain = True

If by chance this removal caused the list of potential decryption letters to now only have one letter in it, then the loopAgain variable is set to True on line 109 so that the code will check for this new solved letter in the cipherletter mapping on the next iteration.

simpleSubHacker.py

110. return letterMapping

After the code in line 89’s while loop has gone through a full iteration without loopAgain being set to True, the program execution goes past the loop and returns the cipherletter mapping stored in letterMapping.

Hacking the Simple Substitution Cipher

simpleSubHacker.py

113. def hackSimpleSub(message):

114. intersectedMap = getBlankCipherletterMapping()

Now that we’ve created the getBlankCipherletterMapping(), addLettersToMapping(), intersectMappings(), and removeSolvedLettersFromMapping() functions that can manipulate the cipherletter mappings we pass them, we can use them all together to hack a simple substitution message.

Remember the steps from our interactive shell exercise for hacking a simple substitution cipher message: for each cipherword, get all the candidates based on the cipherword’s word pattern, then add these candidates to a cipherletter mapping. Then take the cipherletter mapping for each cipherword and intersect them together.

The intersectedMap variable will hold the intersected cipherletter mapping of each cipherword’s cipherletter mapping. At the beginning of the hackSimpleSub() function, it will start as a blank cipherletter mapping.

simpleSubHacker.py

115. cipherwordList = nonLettersOrSpacePattern.sub('', message.upper()).split()

The sub() regex method will substitute (that is, replace) any occurrences of the string pattern in the second argument (message.upper()) with the first argument (a blank string). Regular expressions and the sub() method were explained earlier in this chapter.

On line 115, the regex object in nonLettersOrSpacePattern matches any string that is not a letter or whitespace character. The sub() method will return a string that is the message variable with all non-letter and non-space characters replaced by a blank string. This effectively returns a string that has all punctuation and number characters removed from message.

This string then has the upper() method called on it to return an uppercase version of the string, and that string has the split() method called on it to return the individual words in the string in a list. To see what each part of line 115 does, type the following into the interactive shell:

>>> import re

>>> nonLettersOrSpacePattern = re.compile('[^A-Z\s]')

>>> message = 'Hello, this is my 1st test message.'

>>> message = nonLettersOrSpacePattern.sub('', message.upper())

>>> message

'HELLO THIS IS MY ST TEST MESSAGE'

>>> cipherwordList = message.split()

>>> cipherwordList

['HELLO', 'THIS', 'IS', 'MY', 'ST', 'TEST', 'MESSAGE']

After line 115 executes, the cipherwordList variable will contain a list of uppercase strings of the individual words that were previously in message.

simpleSubHacker.py

116. for cipherword in cipherwordList:

117. # Get a new cipherletter mapping for each ciphertext word.

118. newMap = getBlankCipherletterMapping()

The for loop on line 116 will assign each string in the message list to the cipherword variable. Inside this loop we will get the cipherword’s candidates, add the candidates to a cipherletter mapping, and then intersect this mapping with intersectedMap.

First, line 118 will get a new, blank cipherletter mapping from getBlankCipherletterMapping() and store it in the newMap variable.

simpleSubHacker.py

120. wordPattern = makeWordPatterns.getWordPattern(cipherword)

121. if wordPattern not in wordPatterns.allPatterns:

122. continue # This word was not in our dictionary, so continue.

To find the candidates for the current cipherword, we call getWordPattern() in the makeWordPatterns module on line 120. If the word pattern of the cipherword does not exist in the keys of the wordPatterns.allPatterns dictionary, then whatever the cipherword decrypts to does not exist in our dictionary file. In that case the continue statement on line 122 will skip back to line 116, to the next cipherword in the list.

simpleSubHacker.py

124. # Add the letters of each candidate to the mapping.

125. for candidate in wordPatterns.allPatterns[wordPattern]:

126. newMap = addLettersToMapping(newMap, cipherword, candidate)

On line 125, we know the word pattern exists in wordPatterns.allPatterns. The values in the allPatterns dictionary are lists of strings of the English words with the pattern in wordPattern. Since it is a list, we can use a for loop to iterate over this list. The variable candidate will be set to each of these English word strings on each iteration of the loop.

The only line inside line 125’s for loop is the call to addLettersToMapping() on line 126. We will use this to update the cipherletter mapping in newMap with the letters in each of the candidates.

simpleSubHacker.py

128. # Intersect the new mapping with the existing intersected mapping.

129. intersectedMap = intersectMappings(intersectedMap, newMap)

Once all of the letters in the candidates are added to the cipherletter mapping in newMap, line 129 will intersect newMap with intersectedMap, and make the return value the new value of intersectedMap.

At this point the program execution jumps back to the beginning of the for loop on line 116 to run the code on the next cipherword in the cipherwordList list.

simpleSubHacker.py

131. # Remove any solved letters from the other lists.

132. return removeSolvedLettersFromMapping(intersectedMap)

Once we have the final intersected cipherletter mapping, we can remove any solved letters from it by passing it to removeSolvedLettersFromMapping(). The cipherletter mapping that is returned from the function will be the return value for hackSimpleSubstitution().

Creating a Key from a Letter Mapping

simpleSubHacker.py

135. def decryptWithCipherletterMapping(ciphertext, letterMapping):

136. # Return a string of the ciphertext decrypted with the letter mapping,

137. # with any ambiguous decrypted letters replaced with an _ underscore.

138.

139. # First create a simple sub key from the letterMapping mapping.

140. key = ['x'] * len(LETTERS)

Since the simpleSubstitutionCipher.decryptMessage() function only decrypts with keys instead of letter mappings, we need the decryptWithCipherletterMapping() function to convert a letter mapping into a string key.

The simple substitution keys are strings of 26 characters. The character at index 0 in the key string is the substitution for A, the character at index 1 is the substitution for B, and so on.

Since the letter mapping might only have solutions for some of the letters, we will start out with a key of ['x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x']. This list is created on line 140 by using list replication to replicate the list ['x'] 26 times. Since LETTERS is a string of the letters of the alphabet, len(LETTERS) evaluates to 26. When the multiplication operator is used on a list and integer, it does list replication.

We don’t have to use 'x', we can use any lowercase letter. The reason we need to use a lowercase letter is because it acts as a “placeholder” for the simple substitution key. The way simpleSubCipher.py works, since LETTERS only contains uppercase letters, any lowercase letters in the key will not be used to decrypt a message.

The 26-item list in key will be joined together into a 26-character string at the end of the decryptWithCipherletterMapping() function.

simpleSubHacker.py

141. for cipherletter in LETTERS:

142. if len(letterMapping[cipherletter]) == 1:

143. # If there's only one letter, add it to the key.

144. keyIndex = LETTERS.find(letterMapping[cipherletter][0])

145. key[keyIndex] = cipherletter

The for loop on line 141 will let us go through each of the letters in LETTERS for the cipherletter variable, and if the cipherletter is solved (that is, letterMapping[cipherletter] has only one letter in it) then we can replace an 'x' in the key with the letter.

So on line 144 letterMapping[cipherletter][0] is the decryption letter, and keyIndex is the index of the decryption letter in LETTERS (which is returned from the find() call). This index in the key list is set to the decryption letter on line 145.

simpleSubHacker.py

146. else:

147. ciphertext = ciphertext.replace(cipherletter.lower(), '_')

148. ciphertext = ciphertext.replace(cipherletter.upper(), '_')

Or else, if the cipherletter does not have a solution, then we want to replace everywhere that cipherletter appears in the ciphertext with an underscore so the user can tell which characters were unsolved. Line 147 handles replacing the lowercase form of cipherletter with an underscore and line 148 handles replacing the uppercase form of cipherletter with an underscore.

simpleSubHacker.py

149. key = ''.join(key)

150.

151. # With the key we've created, decrypt the ciphertext.

152. return simpleSubCipher.decryptMessage(key, ciphertext)

When we have finally replaced all the parts in the list in key with the solved letters, we convert the list of strings into a single string with the join() method to create a simple substitution key. This string is passed to the decryptMessage() function in our simpleSubCipher.py program.

The decrypted message string returned from decryptMessage() is then returned from decryptWithCipherletterMapping() on line 152.

simpleSubHacker.py

155. if __name__ == '__main__':

156. main()

That completes all the functions our hacking program needs. Lines 155 and 156 just call the main() function to run the program if simpleSubHacker.py is being run directly, instead of being imported as a module by another Python program.

Couldn’t We Just Encrypt the Spaces Too?

Yes. Our hacking approach only works if the spaces were not encrypted. You can modify the simple substitution cipher from the previous chapter to encrypt spaces, numbers, and punctuation characters as well as letters, and it would make your encrypted messages harder (but not impossible) to hack. However, since the spaces will probably be the most common symbol in the ciphertext, you can write a program to replace it back to spaces, and then hack the ciphertext as normal. So encrypting the space characters would not offer much more protection.

Summary

Whew! This hacking program was fairly complicated. The cipherletter mapping is the main tool for modeling the possible letters that each ciphertext letter can decrypt to. By adding letters (based on the candidates for each cipherword) to the mapping, and then intersecting mappings and removing solved letters from other lists of potential decryption letters, we can narrow down the number of possible keys. Instead of trying all 403,291,461,126,605,635,584,000,000 possible keys we can use some sophisticated Python code to figure out exactly what most (if not all) of the original simple substitution key was.

The main strength of the simple substitution cipher is the large number of possible keys. But the downfall is that it is easy enough to compare the cipherwords to words in a dictionary file to slowly figure out which cipherletters decrypt to which letters. The next chapter’s cipher is much more powerful. For several hundred years, it was considered impossible to break. It is a “polyalphabetic” substitution cipher called the Vigenère cipher.