“It is poor civic hygiene to install technologies that could someday facilitate a police state.”
—Bruce Schneier, Secrets and Lies
The transposition programs seem to work pretty well at encrypting and decrypting different messages with various keys, but how do you know they always work? You can’t be absolutely sure the programs always work unless you test the encryptMessage() and decryptMessage() functions with all sorts of message and key parameter values. But this would take a lot of time because you’d have to type a message in the encryption program, set the key, run the encryption program, paste the ciphertext into the decryption program, set the key, and then run the decryption program. You’d also need to repeat that process with several different keys and messages, resulting in a lot of boring work!
Instead, let’s write another program that generates random messages and tries many keys on each one to test the cipher programs. This new program will encrypt the message with encryptMessage() from transpositionEncrypt.py and then pass the ciphertext to decryptMessage() from transpositionDecrypt.py. If the plaintext returned by decryptMessage() is the same as the original message, the program will know that the encryption and decryption programs work. The process of testing a program automatically using another program is called automated testing.
Several different message and key combinations need to be tried, but it takes the computer only a minute or so to test thousands of combinations. If all of those tests pass, you can be more certain that your code works.
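The round-trip idea can be sketched in a few lines. This is only an illustration: toyEncrypt() and toyDecrypt() are hypothetical stand-ins (a simple string rotation), not the transposition cipher functions, so that the sketch runs on its own.

```python
def toyEncrypt(key, message):
    # Hypothetical stand-in cipher: rotate a non-empty message left by key characters.
    shift = key % len(message)
    return message[shift:] + message[:shift]

def toyDecrypt(key, ciphertext):
    # Undo toyEncrypt() by rotating right by the same amount.
    split = len(ciphertext) - (key % len(ciphertext))
    return ciphertext[split:] + ciphertext[:split]

def roundTripOk(key, message):
    # A test passes when decrypting the ciphertext gives back the original.
    return toyDecrypt(key, toyEncrypt(key, message)) == message

results = [roundTripOk(key, 'Common sense is not so common.') for key in range(1, 15)]
print(all(results))
```

Each call to roundTripOk() is one automated test; the real tester applies the same check to the transposition cipher functions.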
Open a new file editor window by selecting File▸New File. Enter the following code into the file editor and save it as transpositionTest.py. Then press F5 to run the program.
transpositionTest.py
 1. # Transposition Cipher Test
 2. # https://www.nostarch.com/crackingcodes/ (BSD Licensed)
 3.
 4. import random, sys, transpositionEncrypt, transpositionDecrypt
 5.
 6. def main():
 7.     random.seed(42) # Set the random "seed" to a static value.
 8.
 9.     for i in range(20): # Run 20 tests.
10.         # Generate random messages to test.
11.
12.         # The message will have a random length:
13.         message = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' * random.randint(4, 40)
14.
15.         # Convert the message string to a list to shuffle it:
16.         message = list(message)
17.         random.shuffle(message)
18.         message = ''.join(message) # Convert the list back to a string.
19.
20.         print('Test #%s: "%s..."' % (i + 1, message[:50]))
21.
22.         # Check all possible keys for each message:
23.         for key in range(1, int(len(message)/2)):
24.             encrypted = transpositionEncrypt.encryptMessage(key, message)
25.             decrypted = transpositionDecrypt.decryptMessage(key, encrypted)
26.
27.             # If the decryption doesn't match the original message, display
28.             # an error message and quit:
29.             if message != decrypted:
30.                 print('Mismatch with key %s and message %s.' % (key, message))
31.                 print('Decrypted as: ' + decrypted)
32.                 sys.exit()
33.
34.     print('Transposition cipher test passed.')
35.
36.
37. # If transpositionTest.py is run (instead of imported as a module) call
38. # the main() function:
39. if __name__ == '__main__':
40.     main()
When you run the transpositionTest.py program, the output should look like this:
Test #1: "JEQLDFKJZWALCOYACUPLTRRMLWHOBXQNEAWSLGWAGQQSRSIUIQ..."
Test #2: "SWRCLUCRDOMLWZKOMAGVOTXUVVEPIOJMSBEQRQOFRGCCKENINV..."
Test #3: "BIZBPZUIWDUFXAPJTHCMDWEGHYOWKWWWSJYKDQVSFWCJNCOZZA..."
Test #4: "JEWBCEXVZAILLCHDZJCUTXASSZZRKRPMYGTGHBXPQPBEBVCODM..."
--snip--
Test #17: "KPKHHLPUWPSSIOULGKVEFHZOKBFHXUKVSEOWOENOZSNIDELAWR..."
Test #18: "OYLFXXZENDFGSXTEAHGHPBNORCFEPBMITILSSJRGDVMNSOMURV..."
Test #19: "SOCLYBRVDPLNVJKAFDGHCQMXIOPEJSXEAAXNWCCYAGZGLZGZHK..."
Test #20: "JXJGRBCKZXPUIEXOJUNZEYYSEAEGVOJWIRTSSGPUWPNZUBQNDA..."
Transposition cipher test passed.
The tester program works by importing the transpositionEncrypt.py and transpositionDecrypt.py programs as modules. Then the tester program calls encryptMessage() and decryptMessage() from the encryption and decryption programs. The tester program creates a random message and then tries a range of keys on it. It doesn’t matter that the message is just random letters, because the program only needs to check that encrypting and then decrypting the message results in the original message.
Using a loop, the program repeats this test 20 times. If at any point the string returned from decryptMessage() isn’t the same as the original message, the program prints an error and exits.
Let’s explore the source code in more detail.
The program starts by importing modules, including two you’ve already seen that come with Python, random and sys:
1. # Transposition Cipher Test
2. # https://www.nostarch.com/crackingcodes/ (BSD Licensed)
3.
4. import random, sys, transpositionEncrypt, transpositionDecrypt
We also need to import the transposition cipher programs (that is, transpositionEncrypt.py and transpositionDecrypt.py) by just typing their names without the .py extension.
To create random numbers to generate the messages and keys, we’ll use the random module’s seed() function. Before we delve into what the seed does, let’s look at how random numbers work in Python by trying out the random.randint() function. The random.randint() function that we’ll use later in the program takes two integer arguments and returns a random integer between those two integers (including the integers). Enter the following into the interactive shell:
>>> import random
>>> random.randint(1, 20)
20
>>> random.randint(1, 20)
18
>>> random.randint(100, 200)
107
Of course, the numbers you get will probably be different from those shown here because they’re random numbers.
But the numbers generated by Python’s random.randint() function are not truly random. They’re produced from a pseudorandom number generator algorithm, which takes an initial number and produces other numbers based on a formula.
The initial number that the pseudorandom number generator starts with is called the seed. If you know the seed, the rest of the numbers the generator produces are predictable, because when you set the seed to a specific number, the same numbers will be generated in the same order. These random-looking but predictable numbers are called pseudorandom numbers. When a program doesn’t set a seed, Python seeds the generator itself, using a randomness source from the operating system if one is available and the computer’s current clock time otherwise. You can reset Python’s random seed by calling the random.seed() function.
To see proof that the pseudorandom numbers aren’t completely random, enter the following into the interactive shell:
>>> import random
➊ >>> random.seed(42)
➋ >>> numbers = []
>>> for i in range(20):
...     numbers.append(random.randint(1, 10))
...
>>> numbers
➌ [2, 1, 5, 4, 4, 3, 2, 9, 2, 10, 7, 1, 1, 2, 4, 4, 9, 10, 1, 9]
>>> random.seed(42)
>>> numbers = []
>>> for i in range(20):
...     numbers.append(random.randint(1, 10))
...
>>> numbers
➍ [2, 1, 5, 4, 4, 3, 2, 9, 2, 10, 7, 1, 1, 2, 4, 4, 9, 10, 1, 9]
In this code, we generate 20 numbers twice using the same seed. First, we import random and set the seed to 42 ➊. Then we set up a list called numbers ➋ where we’ll store our generated numbers. We use a for loop to generate 20 numbers and append each one to the numbers list, which we print so we can see every number that was generated ➌.
When the seed for Python’s pseudorandom number generator is set to 42, the first “random” number between 1 and 10 will always be 2. The second number will always be 1, the third number will always be 5, and so on. When you reset the seed to 42 and generate numbers with the seed again, the same set of pseudorandom numbers is returned from random.randint(), as you can see by comparing the numbers list at ➌ and ➍.
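The same effect can be seen without reseeding the shared generator. As a small sketch, the random module also provides a Random class for creating separate generator objects; the variable names below are chosen only for illustration.

```python
import random

# Two independent generators started from the same seed.
first = random.Random(42)
second = random.Random(42)

firstNumbers = [first.randint(1, 10) for i in range(20)]
secondNumbers = [second.randint(1, 10) for i in range(20)]

print(firstNumbers == secondNumbers)  # True: same seed, same sequence
```

Because each generator keeps its own state, drawing numbers from one does not disturb the sequence produced by the other.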
Random numbers will become important for ciphers in later chapters, because they’re used not only for testing ciphers but also for encrypting and decrypting in more complex ciphers. Random numbers are so important that one common security flaw in encryption software is using predictable random numbers. If the random numbers in your programs can be predicted, a cryptanalyst can use this information to break your cipher.
Selecting encryption keys in a truly random manner is necessary for the security of a cipher, but for other uses, such as this code test, pseudorandom numbers are fine. We’ll use pseudorandom numbers to generate test strings for our tester program. You can generate truly random numbers with Python by using the random.SystemRandom().randint() function, which you can learn more about at https://www.nostarch.com/crackingcodes/.
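As a brief sketch of the difference, random.SystemRandom() creates a generator that draws from the operating system’s randomness source instead of a seeded formula. The range 1 to 20 and the variable names are arbitrary choices for this example.

```python
import random

# SystemRandom draws from the operating system's randomness source.
secure = random.SystemRandom()

random.seed(42)  # Affects only the module-level pseudorandom generator.
key = secure.randint(1, 20)
print(key)  # Varies from run to run; it can't be reproduced from a seed.
```

Since a seed has no influence on this generator, it is a poor fit for reproducible tests but a better fit for choosing keys.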
Now that you’ve learned how to use random.randint() and random.seed() to create random numbers, let’s return to the source code. To completely automate the testing of our encryption and decryption programs, we’ll need to automatically generate random string messages.
To do this, we’ll take a string of characters to use in the messages, duplicate it a random number of times, and store that as a string. Then, we’ll take the string of the duplicated characters and scramble them to make them more random. We’ll generate a new random string for each test so we can try many different letter combinations.
First, let’s set up the main() function, which contains code that tests the cipher programs. It starts by setting a seed for the pseudorandom string:
6. def main():
7.     random.seed(42) # Set the random "seed" to a static value.
Setting the random seed by calling random.seed() is useful for the tester program because you want predictable numbers so the same pseudorandom messages are generated each time the program is run. As a result, if you notice one message fails to encrypt and decrypt properly, you’ll be able to reproduce this failing test case.
Next, we’ll use a for loop to run 20 tests, generating a new random message for each one:
 9.     for i in range(20): # Run 20 tests.
10.         # Generate random messages to test.
11.
12.         # The message will have a random length:
13.         message = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' * random.randint(4, 40)
Each time the for loop iterates, the program will create and test a new message. We want this program to run multiple tests because the more tests we try, the more certain we’ll be that the programs work.
Line 13 is the first line of testing code and creates a message of a random length. It takes a string of uppercase letters and uses randint() and string replication to duplicate the string a random number of times between 4 and 40. Then it stores the new string in the message variable.
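To make the replication step concrete, here is a short sketch. The alphabet and copies variables are names introduced only for this example.

```python
import random

alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
print('AB' * 3)   # ABABAB

copies = random.randint(4, 40)   # 4 and 40 are both possible results.
message = alphabet * copies
print(len(message) == 26 * copies)   # True
```

The resulting message is therefore always between 104 and 1,040 characters long.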
If we leave the message string as it is now, it will always just be the alphabet string repeated a random number of times. Because we want to test different combinations of characters, we’ll need to take things a step further and scramble the characters in message. To do that, let’s first learn a bit more about lists.
Variables store lists differently than they store other values. A variable will contain a reference to the list, rather than the list itself. A reference is a value that points to some bit of data, and a list reference is a value that points to a list. This results in slightly different behavior for your code.
You already know that variables store strings and integer values. Enter the following into the interactive shell:
>>> spam = 42
>>> cheese = spam
>>> spam = 100
>>> spam
100
>>> cheese
42
We assign 42 to the spam variable, and then copy the value in spam and assign it to the variable cheese. When we later change the value in spam to 100, the new number doesn’t affect the value in cheese because spam and cheese are different variables that store different values.
But lists don’t work this way. When we assign a list to a variable, we are actually assigning a list reference to the variable. The following code makes this distinction easier to understand. Enter this code into the interactive shell:
➊ >>> spam = [0, 1, 2, 3, 4, 5]
➋ >>> cheese = spam
➌ >>> cheese[1] = 'Hello!'
>>> spam
[0, 'Hello!', 2, 3, 4, 5]
>>> cheese
[0, 'Hello!', 2, 3, 4, 5]
This code might look odd to you. The code changed only the cheese list, but both the cheese and spam lists have changed.
When we create the list ➊, we assign a reference to it in the spam variable. But the next line ➋ copies only the list reference in spam to cheese, not the list value. This means the values stored in spam and cheese now both refer to the same list. There is only one underlying list because the list itself was never copied. So when we modify the item at index 1 of cheese ➌, we are modifying the same list that spam refers to.
Remember that variables are like boxes that contain values. But list variables don’t actually contain lists—they contain references to lists. (These references will have ID numbers that Python uses internally, but you can ignore them.) Using boxes as a metaphor for variables, Figure 9-1 shows what happens when a list is assigned to the spam variable.
Figure 9-1: spam = [0, 1, 2, 3, 4, 5] stores a reference to a list, not the actual list.
Then, in Figure 9-2, the reference in spam is copied to cheese. Only a new reference was created and stored in cheese, not a new list. Notice that both references refer to the same list.
Figure 9-2: cheese = spam copies the reference, not the list.
When we alter the list that cheese refers to, the list that spam refers to also changes, because both cheese and spam refer to the same list. You can see this in Figure 9-3.
Although Python variables technically contain references to list values, people often casually say that the variable “contains the list.”
Figure 9-3: cheese[1] = 'Hello!' modifies the list that both variables refer to.
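One way to check whether two variables refer to the same list is Python’s is operator, which compares references instead of contents. In this sketch, bacon is an extra variable introduced only for comparison.

```python
spam = [0, 1, 2, 3, 4, 5]
cheese = spam              # Copies the reference, not the list.
print(cheese is spam)      # True: one list, two names

cheese[1] = 'Hello!'
print(spam[1])             # Hello!

bacon = [0, 'Hello!', 2, 3, 4, 5]   # A separate list with equal contents
print(bacon == spam)       # True: same contents
print(bacon is spam)       # False: different list objects
```

The == operator asks whether two lists hold equal items, while is asks whether they are the very same list.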
References are particularly important for understanding how arguments are passed to functions. When a function is called, the arguments’ values are copied to the parameter variables. For lists, this means a copy of the reference is used for the parameter. To see the consequences of this action, open a new file editor window, enter the following code, and save it as passingReference.py. Press F5 to run the code.
passingReference.py
def eggs(someParameter):
    someParameter.append('Hello')

spam = [1, 2, 3]
eggs(spam)
print(spam)
When you run the code, notice that when eggs() is called, a return value isn’t used to assign a new value to spam. Instead, the list is modified directly. When run, this program produces the following output:
[1, 2, 3, 'Hello']
Even though spam and someParameter contain separate references, they both refer to the same list. This is why the append('Hello') method call inside the function affects the list even after the function call has returned.
Keep this behavior in mind: forgetting that Python handles list variables this way can lead to confusing bugs.
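A common source of such bugs is mixing up modifying a list with reassigning the parameter. The following sketch contrasts the two; the function names appendHello() and replaceList() are made up for this example.

```python
def appendHello(someParameter):
    # Modifies the list that the caller's variable also refers to.
    someParameter.append('Hello')

def replaceList(someParameter):
    # Rebinds only the local parameter to a brand-new list.
    someParameter = someParameter + ['Hello']

spam = [1, 2, 3]
replaceList(spam)
print(spam)   # [1, 2, 3]
appendHello(spam)
print(spam)   # [1, 2, 3, 'Hello']
```

Reassigning someParameter only changes which list the local variable refers to, so the caller’s list is untouched.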
If you want to copy a list value, you can import the copy module to call the copy.deepcopy() function, which returns a separate copy of the list it is passed:
>>> spam = [0, 1, 2, 3, 4, 5]
>>> import copy
>>> cheese = copy.deepcopy(spam)
>>> cheese[1] = 'Hello!'
>>> spam
[0, 1, 2, 3, 4, 5]
>>> cheese
[0, 'Hello!', 2, 3, 4, 5]
Because the copy.deepcopy() function was used to copy the list in spam to cheese, when an item in cheese is changed, spam is unaffected.
We’ll use this function in Chapter 17 when we hack the simple substitution cipher.
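The “deep” in deepcopy() matters when a list contains other lists. As a sketch of the difference, the copy module also has a copy.copy() function that duplicates only the outer list; the nested list used here is an invented example.

```python
import copy

spam = [[0, 1], [2, 3]]
shallow = copy.copy(spam)      # New outer list, same inner lists
deep = copy.deepcopy(spam)     # New outer list and new inner lists

shallow[0][0] = 'Hello!'
print(spam)   # [['Hello!', 1], [2, 3]]
print(deep)   # [[0, 1], [2, 3]]
```

The shallow copy still shares its inner lists with spam, so only the deep copy is fully independent.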
With a foundation in how references work, you can now understand how the random.shuffle() function that we’ll use next works. The random.shuffle() function is part of the random module and accepts a list argument whose items it randomly rearranges. Enter the following into the interactive shell to see how random.shuffle() works:
>>> import random
>>> spam = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> spam
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> random.shuffle(spam)
>>> spam
[3, 0, 5, 9, 6, 8, 2, 4, 1, 7]
>>> random.shuffle(spam)
>>> spam
[1, 2, 5, 9, 4, 7, 0, 3, 6, 8]
An important detail to note is that shuffle() does not return a list value. Instead, it modifies in place the list that the reference it is passed points to, and it returns None. This is why we execute random.shuffle(spam) instead of spam = random.shuffle(spam), which would replace the list in spam with None.
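The return value is easy to inspect directly. The sketch below also shows random.sample(), which is one way to get a shuffled copy while leaving the original list alone; it is offered as an alternative, not as something the tester program uses.

```python
import random

spam = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
result = random.shuffle(spam)
print(result)    # None: the list in spam was rearranged in place.

original = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
shuffledCopy = random.sample(original, len(original))
print(original)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Asking random.sample() for as many items as the list contains yields every item once, in a random order, as a new list.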
Let’s return to transpositionTest.py. To shuffle the characters in a string value, we first need to convert the string to a list using list():
15.         # Convert the message string to a list to shuffle it:
16.         message = list(message)
17.         random.shuffle(message)
18.         message = ''.join(message) # Convert the list back to a string.
The return value from list() is a list value with one-character strings of each character in the string passed to it; so on line 16, we’re reassigning message to be a list of its characters. Next, shuffle() randomizes the order of the items in message. Then the program converts the list of strings back to a string value using the join() string method. This shuffling of the message string allows us to test many different messages.
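The replicate, shuffle, and join steps can be gathered into one helper, sketched here. The makeRandomMessage() function and its rng parameter are hypothetical names; the tester program performs these steps inline with the module-level random functions.

```python
import random

def makeRandomMessage(rng):
    # Repeat the alphabet a random number of times, then scramble it.
    message = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' * rng.randint(4, 40)
    letters = list(message)     # Strings can't be shuffled, but lists can.
    rng.shuffle(letters)
    return ''.join(letters)     # Convert the list back to a string.

message = makeRandomMessage(random.Random(42))
print(message[:50] + '...')
```

Shuffling changes only the order of the letters, so each letter still appears the same number of times as in the unshuffled string.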
Now that the random message has been made, the program tests the encryption and decryption functions with it. We’ll have the program print some feedback so we can see what it’s doing while it’s testing:
20. print('Test #%s: "%s..."' % (i + 1, message[:50]))
Line 20 has a print() call that displays which test number the program is on (we need to add 1 to i because i starts at 0 and the test numbers should start at 1). Because the string in message can be long, we use string slicing to show only the first 50 characters of message.
Line 20 also uses string interpolation. The value that i + 1 evaluates to replaces the first %s in the string, and the value that message[:50] evaluates to replaces the second %s. When using string interpolation, be sure the number of %s in the string matches the number of values that are between the parentheses after it.
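Here is a small sketch of what happens when the counts match and when they don’t. The message value is shortened sample text, and the try block exists only to display the error without stopping the example.

```python
i = 0
message = 'JEQLDFKJZWALCOYACUPLTRRMLWHOBXQNEAWSLGWAGQQSRSIUIQTRGJHDVCZ'

# Two %s markers, two values in the parentheses:
line = 'Test #%s: "%s..."' % (i + 1, message[:50])
print(line)

# Two %s markers but only one value raises a TypeError:
try:
    'Test #%s: "%s..."' % (i + 1,)
except TypeError as error:
    mismatch = str(error)
    print(mismatch)
```

The slice message[:50] keeps the first 50 characters, which is why the printed line ends right before the extra letters in the sample text.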
Next, we’ll test all the possible keys. Although the key for the Caesar cipher could be an integer from 0 to 65 (one less than the length of the symbol set), the key for the transposition cipher can be between 1 and half the length of the message. The for loop on line 23 runs the test code with the keys 1 up to (but not including) the length of the message divided by two.
22.         # Check all possible keys for each message:
23.         for key in range(1, int(len(message)/2)):
24.             encrypted = transpositionEncrypt.encryptMessage(key, message)
25.             decrypted = transpositionDecrypt.decryptMessage(key, encrypted)
Line 24 encrypts the string in message using the encryptMessage() function. Because this function is inside the transpositionEncrypt.py file, we need to add transpositionEncrypt. (with the period at the end) to the front of the function name.
The encrypted string that is returned from encryptMessage() is then passed to decryptMessage(). We need to use the same key for both function calls. The return value from decryptMessage() is stored in a variable named decrypted. If the functions worked, the string in message should be the same as the string in decrypted. We’ll look at how the program checks this next.
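To see exactly which keys the loop on line 23 visits, consider the shortest message the tester can generate, which is 104 characters long. The keys variable is introduced only for this sketch.

```python
message = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' * 4   # 104 characters
keys = range(1, int(len(message)/2))

print(keys[0], keys[-1])    # 1 51
print(len(keys))            # 51
print(len(message) // 2)    # 52, the same value as int(len(message)/2) here
```

Because range() stops before its second argument, the key equal to exactly half the message length (52 here) is not tested.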
After we’ve encrypted and decrypted the message, we need to check whether both processes worked correctly. To do that, we simply need to check whether the original message is the same as the decrypted message.
27.             # If the decryption doesn't match the original message, display
28.             # an error message and quit:
29.             if message != decrypted:
30.                 print('Mismatch with key %s and message %s.' % (key, message))
31.                 print('Decrypted as: ' + decrypted)
32.                 sys.exit()
33.
34.     print('Transposition cipher test passed.')
Line 29 tests whether message and decrypted are equal. If they aren’t, Python displays an error message on the screen. Lines 30 and 31 print the key, message, and decrypted values as feedback to help us figure out what went wrong. Then the program exits.
Normally, programs exit when the execution reaches the end of the code and there are no more lines to execute. However, when sys.exit() is called, the program ends immediately and stops testing new messages (because you’ll want to fix your cipher programs if even one test fails!).
But if the values in message and decrypted are equal, the program execution skips the if statement’s block and the call to sys.exit(). The program continues looping until it finishes running all of its tests. After the loop ends, the program runs line 34, which you know is outside of line 9’s loop because it has different indentation. Line 34 prints 'Transposition cipher test passed.'.
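Under the hood, sys.exit() stops the program by raising a SystemExit exception. The sketch below catches that exception only so the behavior can be observed; checkMatch() is a hypothetical helper, and passing the status 1 is an addition for illustration, since the tester calls sys.exit() with no argument.

```python
import sys

def checkMatch(message, decrypted):
    # Stop the program as soon as one round trip fails.
    if message != decrypted:
        print('Mismatch!')
        sys.exit(1)   # A nonzero status conventionally signals failure.
    return True

try:
    checkMatch('HELLO', 'HLELO')
except SystemExit as stop:
    exitCode = stop.code
    print('Exit requested with status', exitCode)
```

When nothing catches the exception, as in the tester, the program simply ends at the sys.exit() call.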
As with our other programs, we want to check whether the program is being imported as a module or being run as the main program.
37. # If transpositionTest.py is run (instead of imported as a module) call
38. # the main() function:
39. if __name__ == '__main__':
40.     main()
Lines 39 and 40 do the trick, checking whether the special variable __name__ is set to '__main__' and if so, calling the main() function.
We’ve written a program that tests the transposition cipher programs, but how do we know that the test program works? What if the test program has a bug, and it just indicates that the transposition cipher programs work when they really don’t?
We can test the test program by purposely adding bugs to the encryption or decryption functions. Then, if the test program doesn’t detect a problem, we know that it isn’t running as expected.
To add a bug to the program, we open transpositionEncrypt.py and add + 1 to line 36:
transpositionEncrypt.py
35.             # Move currentIndex over:
36.             currentIndex += key + 1
Now that the encryption code is broken, when we run the test program, it should print an error, like this:
Test #1: "JEQLDFKJZWALCOYACUPLTRRMLWHOBXQNEAWSLGWAGQQSRSIUIQ..."
Mismatch with key 1 and message
JEQLDFKJZWALCOYACUPLTRRMLWHOBXQNEAWSLGWAGQQSRSIUIQTRGJHDVCZECRESZJARAVIPFOBWZ
XXTBFOFHVSIGBWIBBHGKUWHEUUDYONYTZVKNVVTYZPDDMIDKBHTYJAHBNDVJUZDCEMFMLUXEONCZX
WAWGXZSFTMJNLJOKKIJXLWAPCQNYCIQOFTEAUHRJODKLGRIZSJBXQPBMQPPFGMVUZHKFWPGNMRYXR
OMSCEEXLUSCFHNELYPYKCNYTOUQGBFSRDDMVIGXNYPHVPQISTATKVKM.
Decrypted as:
JQDKZACYCPTRLHBQEWLWGQRIITGHVZCEZAAIFBZXBOHSGWBHKWEUYNTVNVYPDIKHYABDJZCMMUENZ
WWXSTJLOKJLACNCQFEUROKGISBQBQPGVZKWGMYRMCELSFNLPKNTUGFRDVGNPVQSAKK
The test program failed at the first message after we purposely inserted a bug, so we know that it’s working exactly as we planned!
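The whole experiment can be reproduced in one self-contained sketch. The encrypt() and decrypt() functions below are compact stand-ins written for this example, not the code from transpositionEncrypt.py and transpositionDecrypt.py, and the bug parameter is a made-up switch that mimics the + 1 mistake.

```python
import random

def encrypt(key, message, bug=0):
    # Column c holds every key-th character starting at index c.
    # A nonzero bug widens the step, mimicking the "+ 1" mistake.
    return ''.join(message[column::key + bug] for column in range(key))

def decrypt(key, ciphertext):
    # Split the ciphertext back into columns, then read across the rows.
    full, extra = divmod(len(ciphertext), key)
    columns = []
    start = 0
    for column in range(key):
        length = full + (1 if column < extra else 0)
        columns.append(ciphertext[start:start + length])
        start += length
    rows = full + (1 if extra else 0)
    return ''.join(col[row] for row in range(rows) for col in columns if row < len(col))

def buggyEncrypt(key, message):
    return encrypt(key, message, bug=1)

def findFailure(encryptFunction, rng):
    # Return the first key whose round trip fails, or None if all pass.
    letters = list('ABCDEFGHIJKLMNOPQRSTUVWXYZ' * rng.randint(4, 40))
    rng.shuffle(letters)
    message = ''.join(letters)
    for key in range(1, len(message) // 2):
        if decrypt(key, encryptFunction(key, message)) != message:
            return key
    return None

print(findFailure(encrypt, random.Random(42)))        # None
print(findFailure(buggyEncrypt, random.Random(42)))   # 1
```

With the bug in place, the very first key already produces a ciphertext that is too short to decrypt back to the original, which is consistent with the mismatch reported for key 1 above.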
You can use your new programming skills for more than just writing programs. You can also program the computer to test programs you write to make sure they work for different inputs. Writing code to test code is a common practice.
In this chapter, you learned how to use the random.randint() function to produce pseudorandom numbers and how to use random.seed() to set the seed so that the same sequence of pseudorandom numbers can be reproduced. Although pseudorandom numbers aren’t random enough to use in cryptography programs, they’re good enough to use in this chapter’s testing program.
You also learned the difference between a list and a list reference and that the copy.deepcopy() function will create copies of list values instead of reference values. Additionally, you learned how the random.shuffle() function can scramble the order of items in a list value by shuffling list items in place using references.
All of the programs we’ve created so far encrypt only short messages. In Chapter 10, you’ll learn how to encrypt and decrypt entire files.