Chapter 24 - Programming the Public Key Cipher

24
PROGRAMMING THE PUBLIC KEY CIPHER

“A colleague once told me that the world was full of bad security systems designed by people who read Applied Cryptography.”
—Bruce Schneier, author of Applied Cryptography

In Chapter 23, you learned how public key cryptography works and how to generate public and private key files using the public key generation program. Now you’re ready to send your public key file to others (or post it online) so they can

use it to encrypt their messages before sending them to you. In this chapter, you’ll write the public key cipher program that encrypts and decrypts messages using the public and private keys you generated.

WARNING

The implementation of public key cipher in this book is based on the RSA cipher. However, many additional details that we won’t cover here make this cipher unsuitable for real-world use. Although the public key cipher is impervious to the cryptanalysis techniques in this book, it is vulnerable to the advanced techniques professional cryptographers use today.

How the Public Key Cipher Works

Similar to the previous ciphers we’ve programmed, the public key cipher converts characters into numbers and then performs math on those numbers to encrypt them. What sets the public key cipher apart from other ciphers you’ve learned about is that it converts multiple characters into one integer called a block and then encrypts one block at a time.

The reason the public key cipher needs to work on a block that represents multiple characters is that if we used the public key encryption algorithm on a single character, the same plaintext characters would always encrypt to the same ciphertext characters. Therefore, the public key cipher would be no different from a simple substitution cipher with fancy mathematics.

Creating Blocks

In cryptography, a block is a large integer that represents a fixed number of text characters. The publicKeyCipher.py program we’ll create in this chapter converts the message string value into blocks, and each block is an integer that represents 169 text characters. The maximum block size depends on the symbol set size and key size. In our program, the symbol set, which contains the only characters a message can have, will be the 66 character string 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890 !?.'.

The equation 2^{key size} > symbol set size^{block size} must hold true. For example, when you use a 1024-bit key and a symbol set of 66 characters, the maximum block size is an integer up to 169 characters because 2¹⁰²⁴ is greater than 66¹⁶⁹. However, 2¹⁰²⁴ is not greater than 66¹⁷⁰. If you use blocks that are too large, the math of the public key cipher won’t work, and you won’t be able to decrypt the ciphertext the program produced.

Let’s explore how to convert a message string into a large integer block.

Converting a String into a Block

As in our previous cipher programs, we can use the index of the symbol set string to convert text characters to integers and vice versa. We’ll store the symbol set in a constant named SYMBOLS where the character at index 0 is 'A', the character at index 1 is 'B', and so on. Enter the following into the interactive shell to see how this conversion works:

>>> SYMBOLS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz123456
7890 !?.'
>>> len(SYMBOLS)
66
>>> SYMBOLS[0]
'A'
>>> SYMBOLS[30]
'e'

We can represent text characters by their integer indexes in the symbol set. However, we also need a way to combine these small integers into a large integer that represents a block.

To create a block, we multiply the symbol set index of a character by the symbol set size raised to increasing powers. The block is the sum of all of these numbers. Let’s look at an example of combining small integers into one large block for the string 'Howdy' using the following steps.

The integer for the block starts at 0, and the symbol set is 66 characters long. Enter the following into the interactive shell using these numbers as an example:

>>> blockInteger = 0
>>> len(SYMBOLS)
66

The first character in the 'Howdy' message is 'H', so we find the symbol set index for that character, like this:

>>> SYMBOLS.index('H')
7

Because this is the first character of the message, we multiply its symbol set index by 66⁰ (in Python, the exponent operator is **), which evaluates to 7. We add that value to the block:

>>> 7 * (66 ** 0)
7
>>> blockInteger = blockInteger + 7

Then we find the symbol set index for 'o', the second character in 'Howdy'. Because this is the second character of the message, we multiply the symbol set index for 'o' by 66¹ instead of 66⁰, and then add it to the block:

>>> SYMBOLS.index('o')
40
>>> blockInteger += 40 * (66 ** 1)
>>> blockInteger
2647

The block is now 2647. We can shorten the process of finding the symbol set index for each character using a single line of code:

>>> blockInteger += SYMBOLS.index('w') * (len(SYMBOLS) ** 2)
>>> blockInteger += SYMBOLS.index('d') * (len(SYMBOLS) ** 3)
>>> blockInteger += SYMBOLS.index('y') * (len(SYMBOLS) ** 4)
>>> blockInteger
957285919

Encoding 'Howdy' into a single large integer block yields the integer 957,285,919, which uniquely refers to the string. By continuing to use larger and larger powers of 66, we can use a large integer to represent a string of any length up to the block size. For example, 277,981 is a block that represents the string '42!' and 10,627,106,169,278,065,987,481,042,235,655,809,080,528 represents the string 'I named my cat Zophie.'.

Because our block size is 169, we can only encrypt up to 169 characters in a single block. If the message we want to encode is longer than 169 characters, we can just use more blocks. In the publicKeyCipher.py program, we’ll use commas to separate the blocks so the user can identify when one block ends and the next one begins.

Table 24-1 contains an example message split into blocks and shows the integer that represents each block. Each block can store at most 169 characters of the message.

Table 24-1: A Message Split into Blocks

	Message	Block integer
1st block (169 characters)	Alan Mathison Turing was a British cryptanalyst and computer scientist. He was highly influential in the development of computer science and provided a formalisation of	3013810338120027658120611166332270159047154 7608326152595431391575797140707837485089852 659286061395648657712401264848061468979996 871106525448961558640277994456848107158423 162065952633246425985956987627719631460939 2565956887693059829154012923414594664511373 093526087354321666137736234609864038110994 85392482698
2nd block (169 characters)	the concepts of algorithm and computation with the Turing machine. Turing is widely considered to be the father of computer science and artificial intelligence. During W	1106890780922147455215935080195634373132680 102708192713651484085475402677752791958075 872272026708702634070281109709555761008584 1376819190225258032442691476944762174257333 902148064107269871669093655004577014280290 4244524711751435049117398986044838791597315 078937194860112574798016587564452792451567 15863348631
3rd block (82 characters)	orld War II he worked for the Government Code and Cypher School at Bletchley Park.	1583679754961601914428952447217583697875837 635974864128047509439056559022732095918077 290541944859809053286915764228326887495095 27709935741799076979034

In this example, the 420-character message consists of two 169-character blocks and needs a third block for the remaining 82 characters.

The Mathematics of Public Key Cipher Encryption and Decryption

Now that you know how to convert characters into block integers, let’s explore how the public key cipher uses math to encrypt each block.

Here are the general equations for the public key cipher:

C = M^e mod n
M = C^d mod n

We use the first equation to encrypt each integer block and the second equation to decrypt. M represents a message block integer, and C is a ciphertext block integer. The numbers e and n make up the public key for encryption, and the numbers d and n make up the private key. Recall that everyone, including the cryptanalyst, has access to the public key (e, n).

Typically, we create the encrypted message by raising every block integer, like those we calculated in the previous section, to the e power and modding the result by n. This calculation results in an integer that represents the encrypted block C. Combining all the blocks results in the complete encrypted message.

For example, let’s encrypt the five-character string 'Howdy' and send it to Alice. When converted to an integer block, the message is [957285919] (the full message fits into one block, so there is only one integer in the list value). Alice’s public key is 64 bits, which is too small to be secure, but we’ll use it to simplify our output in this example. Its n is 116,284,564,958,604,315,258,674,918,142,848,831,759 and e is 13,805,220,545,651,593,223. (These numbers would be much larger for 1024-bit keys.)

To encrypt, we calculate (957,285,919^{13,805,220,545,651,593,223}) % 116,284,564,958,604,315,258,674,918,142,848,831,759 by passing these numbers to Python’s pow() function, like this:

>>> pow(957285919, 13805220545651593223,
116284564958604315258674918142848831759)
43924807641574602969334176505118775186

Python’s pow() function uses a mathematical trick called modular exponentiation to quickly calculate such a large exponent. In fact, evaluating the expression (957285919 ** 13805220545651593223) % 116284564958604315258674918142848831759 would yield the same answer but take hours to complete. The integer that pow() returns is a block that represents the encrypted message.

To decrypt, the encrypted message’s recipient needs to have the private key (d, n), raise each encrypted block integer to the d power, and then mod by n. When all the decrypted blocks are decoded into characters and combined, the recipient would get the original plaintext message.

For example, Alice tries to decrypt the block integer 43,924,807,641,574,602,969,334,176,505,118,775,186. Her private key’s n is the same as her public key’s n, but her private key has a d of 72,424,475,949,690,145,396,970,707,764,378,340,583. To decrypt, she runs the following:

>>> pow(43924807641574602969334176505118775186,
72424475949690145396970707764378340583,
116284564958604315258674918142848831759)
957285919

When we convert the block integer 957285919 to a string, we get 'Howdy', which is the original message. Next, you’ll learn how to convert a block into a string.

Converting a Block to a String

To decrypt a block to the original block integer, the first step is to convert it to the small integers for each text character. This process starts with the last character that was added to the block. We use floor division and mod operators to calculate the small integers for each text character.

Recall that the block integer in the previous 'Howdy' example was 957285919; the original message was five characters long, making the last character’s index 4; and the symbol set used for the message was 66 characters long. To determine the symbol set index of the last character, we calculate 957,285,919 / 66⁴ and round down, which results in 50. We can use the integer division operator (//) to divide and round down. The character at index 50 in the symbol set (SYMBOLS[50]) is 'y', which is indeed the last character of the 'Howdy' message.

In the interactive shell, we calculate this block integer using the following code:

>>> blockInteger = 957285919
>>> SYMBOLS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz123456
7890 !?.'
>>> blockInteger // (66 ** 4)
50
>>> SYMBOLS[50]
'y'

The next step is to mod the block integer by 66⁴ to get the next block integer. Calculating 957,285,919 % (66⁴) results in 8,549,119, which happens to be the block integer value for the string 'Howd'. We can determine the last character of this block by using floor division of (66³). Enter the following into the interactive shell to do this:

>>> blockInteger = 8549119
>>> SYMBOLS[blockInteger // (len(SYMBOLS) ** 3)]
'd'

The last character of this block is 'd', making the converted string so far 'dy'. We can remove this character from the block integer as we did before:

>>> blockInteger = blockInteger % (len(SYMBOLS) ** 3)
>>> blockInteger
211735

The integer 211735 is the block for the string 'How'. By continuing the process, we can recover the full string from the block, like so:

>>> SYMBOLS[blockInteger // (len(SYMBOLS) ** 2)]
'w'
>>> blockInteger = blockInteger % (len(SYMBOLS) ** 2)
>>> SYMBOLS[blockInteger // (len(SYMBOLS) ** 1)]
'o'
>>> blockInteger = blockInteger % (len(SYMBOLS) ** 1)
>>> SYMBOLS[blockInteger // (len(SYMBOLS) ** 0)]
'H'

Now you know how the characters from the string 'Howdy' are retrieved from the original block integer value 957285919.

Why We Can’t Hack the Public Key Cipher

All the different types of cryptographic attacks we’ve used in this book are useless against the public key cipher when it’s implemented correctly. Here are a few reasons why:

The brute-force attack won’t work because there are too many possible keys to check.
A dictionary attack won’t work because the keys are based on numbers, not words.
A word pattern attack won’t work because the same plaintext word can be encrypted differently depending on where in the block it appears.
Frequency analysis won’t work because a single encrypted block represents several characters; we can’t get a frequency count of the individual characters.

Because the public key (e, n) is known to all, if a cryptanalyst can intercept the ciphertext, they would know e, n, and C. But without knowing d, it’s mathematically impossible to solve for M, the original message.

Recall from Chapter 23 that e is relatively prime with the number (p – 1) × (q – 1) and that d is the modular inverse of e and (p – 1) × (q – 1). In Chapter 13, you learned that the modular inverse of two numbers is calculated by finding i for the equation (ai) % m = 1, where a and m are two numbers in the modular problem a mod m. This means that the cryptanalyst knows that d is the inverse of e mod (p – 1) × (q – 1), so we can find d to get the whole decryption key by solving the equation (ed) mod (p – 1) × (q – 1) = 1; however, there’s no way of knowing what (p – 1) × (q – 1) is.

We know the key sizes from the public key file, so the cryptanalyst knows that p and q are less than 2¹⁰²⁴ and that e is relatively prime with (p – 1) × (q – 1). But e is relatively prime with a lot of numbers, and finding (p – 1) × (q – 1) from a range of 0 to 2¹⁰²⁴ possible numbers is too large a problem to brute-force.

Although it isn’t enough to crack the code, the cryptanalyst can glean another hint from the public key. The public key is made up of the two numbers (e, n), and we know n = p × q because that was how we calculated n when we created the public and private keys in Chapter 23. And because p and q are prime numbers, for a given number n, there can be exactly two numbers that can be p and q.

Recall that a prime number has no factors besides 1 and itself. Therefore, if you multiply two prime numbers, the product will have 1 and itself and the two prime numbers you started with as its only factors.

Therefore, to hack the public key cipher, all we need to do is figure out the factors of n. Because we know that two and only two numbers can be multiplied to get n, we won’t have too many different numbers to choose from. After we figure out which two prime numbers (p and q) when multiplied evaluate to n, we can calculate (p – 1) × (q – 1) and then use that result to calculate d. This calculation seems pretty easy to do. Let’s use the isPrime() function we wrote in the primeNum.py program in Chapter 22 to do the calculation.

We can modify isPrime() to return the first factors it finds, because we know that there can be only two factors of n besides 1 and n:

def isPrime(num):
    # Returns (p,q) where p and q are factors of num.
    # See if num is divisible by any number up to the square root of num:
    for i in range(2, int(math.sqrt(num)) + 1):
        if num % i == 0:
            return (i, num / i)
    return None # No factors exist for num; num must be prime.

If we wrote a public key cipher hacker program, we could just call this function, pass n to it (which we would get from the public key file), and wait for it to find the factors p and q. Then we could find what (p – 1) × (q – 1) is, which means that we could calculate the mod inverse of e mod (p – 1) × (q – 1) to get d, the decryption key. Then it would be easy to calculate M, the plaintext message.

But there’s a problem. Recall that n is a number that is approximately 600 digits long. Python’s math.sqrt() function can’t handle a number that big, so it gives you an error message. But even if it could process this number, Python would be executing that for loop for a very long time. For example, even if your computer continued to run 5 billion years from now, there’s still almost no chance that it would find the factors of n. That’s how big these numbers are.

And this is exactly the strength of the public key cipher: mathematically, there is no shortcut to finding the factors of a number. It’s easy to come up with two prime numbers p and q and multiply them together to get n. But it’s nearly impossible to take a number n and figure out what p and q would be. For example, when you look at a small number like 15, you can easily say that 5 and 3 are two numbers that when multiplied equal 15. But it’s another thing entirely to try to figure out the factors of a number like 178,565,887,643,607,245,654,502,737. This fact makes the public key cipher virtually impossible to break.

Source Code for the Public Key Cipher Program

Open a new file editor window by selecting File▸New File. Enter the following code into the file editor and save it as publicKeyCipher.py.

publicKey
Cipher.py

  1. # Public Key Cipher
  2. # https://www.nostarch.com/crackingcodes/ (BSD Licensed)
  3.
  4. import sys, math
  5.
  6. # The public and private keys for this program are created by
  7. # the makePublicPrivateKeys.py program.
  8. # This program must be run in the same folder as the key files.
  9.
10. SYMBOLS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz12345
       67890 !?.'
11.
12. def main():
13.     # Runs a test that encrypts a message to a file or decrypts a message
14.     # from a file.
15.     filename = 'encrypted_file.txt' # The file to write to/read from.
16.     mode = 'encrypt' # Set to either 'encrypt' or 'decrypt'.
17.
18.     if mode == 'encrypt':
19.         message = 'Journalists belong in the gutter because that is where
               the ruling classes throw their guilty secrets. Gerald Priestland.
               The Founding Fathers gave the free press the protection it must
               have to bare the secrets of government and inform the people.
               Hugo Black.'
20.         pubKeyFilename = 'al_sweigart_pubkey.txt'
21.         print('Encrypting and writing to %s...' % (filename))
22.         encryptedText = encryptAndWriteToFile(filename, pubKeyFilename,
               message)
23.
24.         print('Encrypted text:')
25.         print(encryptedText)
26.
27.     elif mode == 'decrypt':
28.         privKeyFilename = 'al_sweigart_privkey.txt'
29.         print('Reading from %s and decrypting...' % (filename))
30.         decryptedText = readFromFileAndDecrypt(filename, privKeyFilename)
31.
32.         print('Decrypted text:')
33.         print(decryptedText)
34.
35.
36. def getBlocksFromText(message, blockSize):
37.     # Converts a string message to a list of block integers.
38.     for character in message:
39.         if character not in SYMBOLS:
40.             print('ERROR: The symbol set does not have the character %s' %
                   (character))
41.             sys.exit()
42.     blockInts = []
43.     for blockStart in range(0, len(message), blockSize):
44.         # Calculate the block integer for this block of text:
45.         blockInt = 0
46.         for i in range(blockStart, min(blockStart + blockSize,
               len(message))):
47.             blockInt += (SYMBOLS.index(message[i])) * (len(SYMBOLS) **
                   (i % blockSize))
48.         blockInts.append(blockInt)
49.     return blockInts
50.
51.
52. def getTextFromBlocks(blockInts, messageLength, blockSize):
53.     # Converts a list of block integers to the original message string.
54.     # The original message length is needed to properly convert the last
55.     # block integer.
56.     message = []
57.     for blockInt in blockInts:
58.         blockMessage = []
59.         for i in range(blockSize - 1, -1, -1):
60.             if len(message) + i < messageLength:
61.                 # Decode the message string for the 128 (or whatever
62.                 # blockSize is set to) characters from this block integer:
63.                 charIndex = blockInt // (len(SYMBOLS) ** i)
64.                 blockInt = blockInt % (len(SYMBOLS) ** i)
65.                 blockMessage.insert(0, SYMBOLS[charIndex])
66.         message.extend(blockMessage)
67.     return ''.join(message)
68.
69.
70. def encryptMessage(message, key, blockSize):
71.     # Converts the message string into a list of block integers, and then
72.     # encrypts each block integer. Pass the PUBLIC key to encrypt.
73.     encryptedBlocks = []
74.     n, e = key
75.
76.     for block in getBlocksFromText(message, blockSize):
77.         # ciphertext = plaintext ^ e mod n
78.         encryptedBlocks.append(pow(block, e, n))
79.     return encryptedBlocks
80.
81.
82. def decryptMessage(encryptedBlocks, messageLength, key, blockSize):
83.     # Decrypts a list of encrypted block ints into the original message
84.     # string. The original message length is required to properly decrypt
85.     # the last block. Be sure to pass the PRIVATE key to decrypt.
86.     decryptedBlocks = []
87.     n, d = key
88.     for block in encryptedBlocks:
89.         # plaintext = ciphertext ^ d mod n
90.         decryptedBlocks.append(pow(block, d, n))
91.     return getTextFromBlocks(decryptedBlocks, messageLength, blockSize)
92.
93.
94. def readKeyFile(keyFilename):
95.     # Given the filename of a file that contains a public or private key,
96.     # return the key as a (n,e) or (n,d) tuple value.
97.     fo = open(keyFilename)
98.     content = fo.read()
99.     fo.close()
100.     keySize, n, EorD = content.split(',')
101.     return (int(keySize), int(n), int(EorD))
102.
103.
104. def encryptAndWriteToFile(messageFilename, keyFilename, message,
       blockSize=None):
105.     # Using a key from a key file, encrypt the message and save it to a
106.     # file. Returns the encrypted message string.
107.     keySize, n, e = readKeyFile(keyFilename)
108.     if blockSize == None:
109.         # If blockSize isn't given, set it to the largest size allowed by
               the key size and symbol set size.
110.         blockSize = int(math.log(2 ** keySize, len(SYMBOLS)))
111.     # Check that key size is large enough for the block size:
112.     if not (math.log(2 ** keySize, len(SYMBOLS)) >= blockSize):
113.         sys.exit('ERROR: Block size is too large for the key and symbol
               set size. Did you specify the correct key file and encrypted
               file?')
114.     # Encrypt the message:
115.     encryptedBlocks = encryptMessage(message, (n, e), blockSize)
116.
117.     # Convert the large int values to one string value:
118.     for i in range(len(encryptedBlocks)):
119.         encryptedBlocks[i] = str(encryptedBlocks[i])
120.     encryptedContent = ','.join(encryptedBlocks)
121.
122.     # Write out the encrypted string to the output file:
123.     encryptedContent = '%s_%s_%s' % (len(message), blockSize,
           encryptedContent)
124.     fo = open(messageFilename, 'w')
125.     fo.write(encryptedContent)
126.     fo.close()
127.     # Also return the encrypted string:
128.     return encryptedContent
129.
130.
131. def readFromFileAndDecrypt(messageFilename, keyFilename):
132.     # Using a key from a key file, read an encrypted message from a file
133.     # and then decrypt it. Returns the decrypted message string.
134.     keySize, n, d = readKeyFile(keyFilename)
135.
136.
137.     # Read in the message length and the encrypted message from the file:
138.     fo = open(messageFilename)
139.     content = fo.read()
140.     messageLength, blockSize, encryptedMessage = content.split('_')
141.     messageLength = int(messageLength)
142.     blockSize = int(blockSize)
143.
144.     # Check that key size is large enough for the block size:
145.     if not (math.log(2 ** keySize, len(SYMBOLS)) >= blockSize):
146.         sys.exit('ERROR: Block size is too large for the key and symbol
               set size. Did you specify the correct key file and encrypted
               file?')
147.
148.     # Convert the encrypted message into large int values:
149.     encryptedBlocks = []
150.     for block in encryptedMessage.split(','):
151.         encryptedBlocks.append(int(block))
152.
153.     # Decrypt the large int values:
154.     return decryptMessage(encryptedBlocks, messageLength, (n, d),
           blockSize)
155.
156.
157. # If publicKeyCipher.py is run (instead of imported as a module), call
158. # the main() function.
159. if __name__ == '__main__':
160.     main()

Sample Run of the Public Key Cipher Program

Let’s try running the publicKeyCipher.py program to encrypt a secret message. To send a secret message to someone using this program, get that person’s public key file and place it in the same directory as the program file.

To encrypt a message, make sure the mode variable on line 16 is set to the string 'encrypt'. Update the message variable on line 19 to the message string you want to encrypt. Then set the pubKeyFilename variable on line 20 to the public key file’s filename. The filename variable on line 21 holds a filename that the ciphertext is written to. The filename, pubKeyFilename, and message variables are all passed to encryptAndWriteToFile() to encrypt the message and save it to a file.

When you run the program, the output should look like this:

Encrypting and writing to encrypted_file.txt...
Encrypted text:
258_169_45108451524907138236859816039483721219475907590237903918239237768643699
4856660301323157253724978022861702098324427738284225530186213380188880577329634
8339229890890464969556937797072434314916522839692277034579463594713843559898418
9307234650088689850744361262707129971782407610450208047927129687841621734776965
7018277918490297215785759257290855812221088907016904983025542174471606494779673
6015310089155876234277883381345247353680624585629672939709557016107275469388284
5124192568409483737233497304087969624043516158221689454148096020738754656357140
74772465708958607695479122809498585662785064751254235489968738346795649,1253384
3336975115539761332250402699868835150623017582438116840049236083573741817645933
3719456453133658476271176035248597021972316454526545069452838766387599839340542
4066877721135511313454252589733971962219016066614978390378611175964456773669860
9429545605901714339082542725015140530985685117232598778176545638141403657010435
3859244660091910391099621028192177415196156469972977305212676293746827002983231
4668240693230032141097312556400629961518635799478652196072316424918648787555631
6339424948975804660923616682767242948296301678312041828934473786824809308122356
133539825048880814063389057192492939651199537310635280371

The program writes this output to a file named encrypted_file.txt. This is the encryption of the string in the message variable on line 19. Because the public key you’re using is likely different from mine, the output you get may be different, but the output’s format should be the same. As you can see in this example, the encryption is divided into two blocks, or two large integers, separated by a comma.

The number 258 at the start of the encryption represents the original message length and is followed by an underscore and another number 169, which represents the block size. To decrypt this message, change the mode variable to 'decrypt' and run the program again. As with encryption, make sure privKeyFilename on line 28 is set to the private key filename and that this file is in the same folder as publicKeyCipher.py. In addition, make sure that the encrypted file, encrypted_file.txt, is in the same folder as publicKeyCipher.py. When you run the program, the encrypted message in encrypted_file.txt is decrypted, and the output should look like this:

Reading from encrypted_file.txt and decrypting...
Decrypted text:
Journalists belong in the gutter because that is where the ruling classes throw
their guilty secrets. Gerald Priestland. The Founding Fathers gave the free
press the protection it must have to bare the secrets of government and inform
the people. Hugo Black.

Note that the publicKeyCipher.py program can only encrypt and decrypt plain (simple) text files.

Let’s take a closer look at the source code of the publicKeyCipher.py program.

Setting Up the Program

The public key cipher works with numbers, so we’ll convert our string message into an integer. This integer is calculated based on indexes in the symbol set, which is stored in the SYMBOLS variable on line 10.

How the Program Determines Whether to Encrypt or Decrypt

The publicKeyCipher.py program determines whether to encrypt or decrypt a file, and which key file to use, by storing values in variables. While we’re looking at how these variables work, we’ll also look at how the program prints the encryption and decryption output.

We tell the program whether it should encrypt or decrypt inside main():

12. def main():
13.     # Runs a test that encrypts a message to a file or decrypts a message
14.     # from a file.
15.     filename = 'encrypted_file.txt' # The file to write to/read from.
16.     mode = 'encrypt' # Set to either 'encrypt' or 'decrypt'.

If mode on line 16 is set to 'encrypt', the program encrypts a message by writing it to the file specified in filename. If mode is set to 'decrypt', the program reads the contents of an encrypted file (specified by filename) to decrypt it.

Lines 18 to 25 specify what the program should do if it confirms that the user wants to encrypt a file.

18.     if mode == 'encrypt':
19.         message = 'Journalists belong in the gutter because that is where
               the ruling classes throw their guilty secrets. Gerald Priestland.
               The Founding Fathers gave the free press the protection it must
               have to bare the secrets of government and inform the people.
               Hugo Black.'
20.         pubKeyFilename = 'al_sweigart_pubkey.txt'
21.         print('Encrypting and writing to %s...' % (filename))
22.         encryptedText = encryptAndWriteToFile(filename, pubKeyFilename,
               message)
23.
24.         print('Encrypted text:')
25.         print(encryptedText)

The message variable on line 19 contains the text to be encrypted, and pubKeyFilename on line 20 contains the filename of the public key file, which is al_sweigart_pubkey.txt in this example. Keep in mind that the message can only contain characters in the SYMBOLS variable, the symbol set for this cipher. Line 22 calls the encryptAndWriteToFile() function, which encrypts message using the public key and writes the encrypted message to the file specified by filename.

Lines 27 to 28 tell the program what to do if mode is set to 'decrypt'. Instead of encrypting, the program reads from the private key file in privKeyFilename on line 28.

27.     elif mode == 'decrypt':
28.         privKeyFilename = 'al_sweigart_privkey.txt'
29.         print('Reading from %s and decrypting...' % (filename))
30.         decryptedText = readFromFileAndDecrypt(filename, privKeyFilename)
31.
32.         print('Decrypted text:')
33.         print(decryptedText)

Then we pass the filename and privKeyFilename variables to the readFromFileAndDecrypt() function (defined later in the program), which returns the decrypted message. Line 30 stores the return value from readFromFileAndDecrypt() in decryptedText, and line 33 prints it to the screen. This is the end of the main() function.

Now let’s look at how to perform the other steps of the public key cipher, such as converting the message into blocks.

Converting Strings to Blocks with getBlocksFromText()

Let’s look at how the program converts a message string into 128-byte blocks. The getBlocksFromText() function on line 36 takes a message and a block size as parameters to return a list of blocks, or a list of large integer values, that represents the message.

36. def getBlocksFromText(message, blockSize):
37.     # Converts a string message to a list of block integers.
38.     for character in message:
39.         if character not in SYMBOLS:
40.             print('ERROR: The symbol set does not have the character %s' %
                   (character))
41.             sys.exit()

Lines 38 to 41 ensure that the message parameter contains only text characters that are in the symbol set in the SYMBOLS variable. The blockSize parameter is optional and can take any block size. To create blocks, we first convert the string to bytes.

To make a block, we combine all the symbol set indexes into one large integer, as we did in “Converting a String into a Block” on page 350. We’ll use the blockInts empty list on line 42 to store the blocks when we create them.

42. blockInts = []

We want the blocks to be blockSize bytes long, but when a message isn’t evenly divisible by blockSize, the last block will be less than blockSize characters long. To handle that situation, we use the min() function.

The min() and max() Functions

The min() function returns the smallest value of its arguments. Enter the following into the interactive shell to see how the min() function works:

>>> min(13, 32, 13, 15, 17, 39)
13

You can also pass a single list or tuple value as an argument to min(). Enter the following into the interactive shell to see an example:

>>> min([31, 26, 20, 13, 12, 36])
12
>>> spam = (10, 37, 37, 43, 3)
>>> min(spam)
3

In this case, min(spam) returns the smallest value in the list or tuple. The opposite of min() is max(), which returns the largest value of its arguments, like this:

>>> max(18, 15, 22, 30, 31, 34)
34

Let’s return to our code to see how the publicKeyCipher.py program uses min() to make sure the last block of message is truncated to the appropriate size.

Storing Blocks in blockInt

The code inside the for loop on line 43 creates the integers for each block by setting the value in blockStart to the index of the block being created.

43.     for blockStart in range(0, len(message), blockSize):
44.         # Calculate the block integer for this block of text:
45.         blockInt = 0
46.         for i in range(blockStart, min(blockStart + blockSize,
               len(message))):

We’ll store the block we create in blockInt, which we initially set to 0 on line 45. The for loop on line 46 sets i to be the indexes of all the characters that will be in the block from message. The indexes should start at blockStart and go up to blockStart + blockSize or len(message), whichever is smaller. The min() call on line 46 returns the smaller of these two expressions.

The second argument to range() on line 46 should be the smaller of blockStart + blockSize and len(message) because each block is always made up of 128 (or whatever value is in blockSize) characters except for the last block. The last block might be exactly 128 characters, but it’s more likely that it will be less than the full 128 characters. In that case, we want i to stop at len(message) because that is the last index in message.

After we have the characters that make up the block, we use math to turn the characters into one large integer. Recall in “Converting a String into a Block” on page 350 that we created a large integer by multiplying the symbol set index integer value of each character by 66^{index-of-character} (66 is the length of the SYMBOLS string). To do this in code, we calculate SYMBOLS.index(message[i]) (the symbol set index integer value of the character) multiplied by (len(SYMBOLS) ** (i % blockSize)) for each character and add each result to blockInt.

47. blockInt += (SYMBOLS.index(message[i])) * (len(SYMBOLS) **
(i % blockSize))

We want the exponent to be the index relative to the current iteration’s block, which is always from 0 to blockSize. We can’t use the variable i directly as the index-of-character part of the equation, because it refers to the index in the entire message string, which has indexes from 0 up to len(message). Using i would result in an integer much larger than 66. By modding i by blockSize, we can get the index relative to the block, which is why line 47 is len(SYMBOLS) ** (i % blockSize) instead of simply len(SYMBOLS) ** i.

After the for loop on line 46 completes, the integer for the block has been calculated. We use the code on line 48 to append this block integer to the blockInts list. The next iteration of the for loop on line 43 calculates the block integer for the next block of the message.

48. blockInts.append(blockInt)
49. return blockInts

After the for loop on line 43 finishes, all the block integers should have been calculated and stored in the blockInts list. Line 49 returns blockInts from getBlocksFromText().

At this point, we’ve converted the entire message string into block integers, but we also need a way to turn block integers back into the original plaintext message for the decryption process, which is what we’ll do next.

Using getTextFromBlocks() to Decrypt

The getTextFromBlocks() function on line 52 does the opposite of getBlocksFromText(). This function takes a list of block integers as the blockInts parameter, the message’s length, and the blockSize to return the string value that these blocks represent. We need the length of the encoded message in messageLength, because the getTextFromBlocks() function uses this information to get the string from the last block integer when it’s not blockSize characters in size. This process was described in “Converting a Block to a String” on page 354.

52. def getTextFromBlocks(blockInts, messageLength, blockSize):
53.     # Converts a list of block integers to the original message string.
54.     # The original message length is needed to properly convert the last
55.     # block integer.
56.     message = []

The message list, which is created as a blank list on line 56, stores a string value for each character, which we’ll compute from the block integers in blockInts.

The for loop on line 57 iterates over each block integer in the blockInts list. Inside the for loop, the code on lines 58 to 65 calculates the letters that are in the current iteration’s block.

57.     for blockInt in blockInts:
58.         blockMessage = []
59.         for i in range(blockSize - 1, -1, -1):

The code in getTextFromBlocks() splits each block integer into blockSize integers, where each represents the symbol set index for a character. We must work backward to extract the symbol set indexes from blockInt because when we encrypted the message, we started with the smaller exponents (66⁰, 66¹, 66², and so on), but when decrypting, we must divide and mod using the larger exponents first. This is why the for loop on line 59 starts at blockSize - 1 and then subtracts 1 on each iteration down to, but not including, -1. This means the value of i on the last iteration is 0.

Before we convert the symbol set index to a character, we need to make sure we aren’t decoding blocks past the length of the message. To do this, we check that the number of characters that have been translated from blocks so far, len(message) + i, is still less than messageLength on line 60.

60.             if len(message) + i < messageLength:
61.                 # Decode the message string for the 128 (or whatever
62.                 # blockSize is set to) characters from this block integer.
63.                 charIndex = blockInt // (len(SYMBOLS) ** i)
64.                 blockInt = blockInt % (len(SYMBOLS) ** i)

To get the characters from the block, we follow the process described in “Converting a Block to a String” on page 354. We put each character into the message list. Encoding each block actually reverses the characters, which you saw earlier, so we can’t just append the decoded character to message. Instead, we insert the character at the front of message, which we’ll need to do with the insert() list method.

Using the insert() List Method

The append() list method only adds values to the end of a list, but the insert() list method can add a value anywhere in the list. The insert() method takes an integer index of where in the list to insert the value and the value to be inserted as its arguments. Enter the following into the interactive shell to see how the insert() method works:

>>> spam = [2, 4, 6, 8]
>>> spam.insert(0, 'hello')
>>> spam
['hello', 2, 4, 6, 8]
>>> spam.insert(2, 'world')
>>> spam
['hello', 2, 'world', 4, 6, 8]

In this example, we create a spam list and then insert the string 'hello' at the 0 index. As you can see, we can insert values at any existing index in the list, such as at index 2.

Merging the Message List into One String

We can use the SYMBOLS string to convert the symbol set index in charIndex to its corresponding character and insert that character at the beginning of the list at index 0.

65.                 blockMessage.insert(0, SYMBOLS[charIndex])
66.         message.extend(blockMessage)
67.     return ''.join(message)

This string is then returned from getTextFromBlocks().

Writing the encryptMessage() Function

The encryptMessage() function encrypts each block using the plaintext string in message along with the two-integer tuple of the public key stored in key, which is created with the readKeyFile() function we’ll write later in this chapter. The encryptMessage() function returns a list of encrypted blocks.

70. def encryptMessage(message, key, blockSize):
71.     # Converts the message string into a list of block integers, and then
72.     # encrypts each block integer. Pass the PUBLIC key to encrypt.
73.     encryptedBlocks = []
74.     n, e = key

Line 73 creates the encryptedBlocks variable, which starts as an empty list that will hold the integer blocks. Then line 74 assigns the two integers in key to the variables n and e. Now that we have the public key variables set up, we can perform math on each message block to encrypt.

To encrypt each block, we perform some math operations on it that result in a new integer, which is the encrypted block. We raise the block to the e power and then mod it by n using pow(block, e, n) on line 78.

76.     for block in getBlocksFromText(message, blockSize):
77.         # ciphertext = plaintext ^ e mod n
78.         encryptedBlocks.append(pow(block, e, n))
79.     return encryptedBlocks

The encrypted block integer is then appended to encryptedBlocks.

Writing the decryptMessage() Function

The decryptMessage() function on line 82 decrypts the blocks and returns the decrypted message string. It takes the list of encrypted blocks, the message length, the private key, and the block size as parameters.

The decryptedBlocks variable we set up on line 86 stores a list of the decrypted blocks, and using the multiple assignment trick, the two integers of the key tuple are placed in n and d, respectively.

82. def decryptMessage(encryptedBlocks, messageLength, key, blockSize):
83.     # Decrypts a list of encrypted block ints into the original message
84.     # string. The original message length is required to properly decrypt
85.     # the last block. Be sure to pass the PRIVATE key to decrypt.
86.     decryptedBlocks = []
87.     n, d = key

The math for decryption is the same as the encryption’s math except the integer block is being raised to d instead of e, as you can see on line 90.

88.     for block in encryptedBlocks:
89.         # plaintext = ciphertext ^ d mod n
90.         decryptedBlocks.append(pow(block, d, n))

The decrypted blocks along with the messageLength and blockSize parameters are passed to getTextFromBlocks() so that decryptMessage() returns the decrypted plaintext as a string on line 91.

91. return getTextFromBlocks(decryptedBlocks, messageLength, blockSize)

Now that you’ve learned about the math that makes encryption and decryption possible, let’s look at how the readKeyFile() function reads in the public and private key files to create tuple values that we passed to encryptMessage() and decryptMessage().

Reading in the Public and Private Keys from Their Key Files

The readKeyFile() function is called to read values from key files created with the makePublicPrivateKeys.py program, which we created in Chapter 23. The filename to open is passed to keyFilename, and the file must be in the same folder as the publicKeyCipher.py program.

Lines 97 to 99 open this file and read in the contents as a string into the content variable.

94. def readKeyFile(keyFilename):
95.     # Given the filename of a file that contains a public or private key,
96.     # return the key as a (n,e) or (n,d) tuple value.
97.     fo = open(keyFilename)
98.     content = fo.read()
99.     fo.close()
100.     keySize, n, EorD = content.split(',')
101.     return (int(keySize), int(n), int(EorD))

The key file stores the key size in bytes as n, and either e or d, depending on whether the key file is for an encryption key or decryption key. As you learned in the previous chapter, these values were stored as text and separated by commas, so we use the split() string method to split the string in content at the commas. The list that split() returns has three items in it, and the multiple assignment on line 100 places each of these items into the keySize, n, and EorD variables, respectively.

Recall that content was a string when it was read from the file, and the items in the list that split() returns will also be string values. To change these string values into integers, we pass the keySize, n, and EorD values to int(). The readKeyFile() function then returns three integers, int(keySize), int(n), and int(EorD), which you’ll use for encryption or decryption.

Writing the Encryption to a File

On line 104, the encryptAndWriteToFile() function calls encryptMessage() to encrypt the string with the key and creates the file that contains the encrypted contents.

104. def encryptAndWriteToFile(messageFilename, keyFilename, message,
       blockSize=None):
105.     # Using a key from a key file, encrypt the message and save it to a
106.     # file. Returns the encrypted message string.
107.     keySize, n, e = readKeyFile(keyFilename)

The encryptAndWriteToFile() function takes three string arguments: a filename to write the encrypted message in (messageFilename), a filename of the public key to use (keyFilename), and a message to be encrypted (message). The blockSize parameter is specified as the fourth argument.

The first step of the encryption process is to read in the values for keySize, n, and e from the key file by calling readKeyFile() on line 107.

The blockSize parameter has a default argument of None:

108.     if blockSize == None:
109.         # If blockSize isn't given, set it to the largest size allowed by
               the key size and symbol set size.
110.         blockSize = int(math.log(2 ** keySize, len(SYMBOLS)))

If no argument is passed for the blockSize parameter, the blockSize will be set to the largest size that the symbol set and key size will allow. Keep in mind that the equation 2^{key size} > symbol set size ^{block size} must be true. To calculate the largest possible block size, Python’s math.log() function is called to calculate the logarithm of 2^{key size} with a base of len(SYMBOLS) on line 110.

The mathematics of the public key cipher work correctly only if the key size is equal to or greater than the block size, so it’s essential that we check this on line 112 before proceeding.

111.     # Check that key size is large enough for the block size:
112.     if not (math.log(2 ** keySize, len(SYMBOLS)) >= blockSize):
113.         sys.exit('ERROR: Block size is too large for the key and symbol
             set size. Did you specify the correct key file and encrypted
             file?')

If keySize is too small, the program exits with an error message. The user must either decrease the value passed for blockSize or use a larger key.

Now that we have the n and e values for the key, we call the function encryptMessage() on line 115, which returns a list of integer blocks.

114. # Encrypt the message
115. encryptedBlocks = encryptMessage(message, (n, e), blockSize)

The encryptMessage() function expects a two-integer tuple for the key, which is why the n and e variables are placed inside a tuple that is then passed as the second argument for encryptMessage().

Next, we convert the encrypted blocks into a string that we can write to a file. We do this by joining the blocks into a string with each block separated with a comma. Using ','.join(encryptedBlocks) to do this won’t work because join() only works on lists with string values. Because encryptedBlocks is a list of integers, we have to first convert these integers to strings:

117.     # Convert the large int values to one string value:
118.     for i in range(len(encryptedBlocks)):
119.         encryptedBlocks[i] = str(encryptedBlocks[i])
120.     encryptedContent = ','.join(encryptedBlocks)

The for loop on line 118 iterates through each index in encryptedBlocks, replacing the integer at encryptedBlocks[i] with a string form of the integer. When the loop completes, encryptedBlocks should contain a list of string values instead of a list of integer values.

Then we can pass the list of string values in encryptedBlocks to the join() method, which returns the list’s strings joined together into a single string with each block separated by commas. Line 120 stores this combined string in the encryptedContent variable.

We also write the length of the message and the block size to the file in addition to the encrypted integer blocks:

122.     # Write out the encrypted string to the output file:
123.     encryptedContent = '%s_%s_%s' % (len(message), blockSize,
           encryptedContent)

Line 123 changes the encryptedContent variable to include the size of the message as an integer, len(message), followed by an underscore, the blockSize, another underscore, and finally, the encrypted integer blocks (encryptedContent).

The last step of the encryption process is to write the contents to the file. The filename provided by the messageFilename parameter is created with the call to open() on line 124. Note that if a file with this name already exists, the new file will overwrite it.

124.     fo = open(messageFilename, 'w')
125.     fo.write(encryptedContent)
126.     fo.close()
127.     # Also return the encrypted string:
128.     return encryptedContent

The string in encryptedContent is written to the file by calling the write() method on line 125. After the program is done writing the file’s contents, line 126 closes the file object in fo.

Finally, the string in encryptedContent is returned from the function encryptAndWriteToFile() on line 128. (This is so the code that calls the function can use this string to, for example, print it onscreen.)

Now you know how the encryptAndWriteToFile() function encrypts a message string and writes the results to a file. Let’s look at how the program uses the readFromFileAndDecrypt() function to decrypt an encrypted message.

Decrypting from a File

Similar to encryptAndWriteToFile(), the readFromFileAndDecrypt() function has parameters for the encrypted message file’s filename and the key file’s filename. Be sure to pass the filename of the private key for keyFilename, not the public key.

131. def readFromFileAndDecrypt(messageFilename, keyFilename):
132.     # Using a key from a key file, read an encrypted message from a file
133.     # and then decrypt it. Returns the decrypted message string.
134.     keySize, n, d = readKeyFile(keyFilename)

The first step is the same as encryptAndWriteToFile(): the readKeyFile() function is called to get the values for the keySize, n, and d variables.

The second step is to read in the contents of the file. Line 138 opens the messageFilename file for reading.

137.     # Read in the message length and the encrypted message from the file:
138.     fo = open(messageFilename)
139.     content = fo.read()
140.     messageLength, blockSize, encryptedMessage = content.split('_')
141.     messageLength = int(messageLength)
142.     blockSize = int(blockSize)

The read() method call on line 139 returns a string of the full contents of the file, which is what you would see if you opened the text file in a program like Notepad or TextEdit, copied the entire contents, and pasted it as a string value into your program.

Recall that the encrypted file’s format has three integers separated by underscores: an integer representing the message length, an integer for the block size used, and the encrypted integer blocks. Line 140 calls the split() method to return a list of these three values, and the multiple assignment trick places the three values into the messageLength, blockSize, and encryptedMessage variables, respectively.

Because the values returned by split() are strings, lines 141 and 142 use int() to change messageLength and blockSize to their integer form, respectively.

The readFromFileAndDecrypt() function also checks, on line 145, that the block size is equal to or less than the key size.

144.     # Check that key size is large enough for the block size:
145.     if not (math.log(2 ** keySize, len(SYMBOLS)) >= blockSize):
146.         sys.exit('ERROR: Block size is too large for the key and symbol
             set size. Did you specify the correct key file and encrypted
             file?')

This check should always pass, because if the block size was too large, it would have been impossible to create the encrypted file in the first place. Most likely the wrong private key file was specified for the keyFilename parameter, which means that the key would not have decrypted the file correctly anyway.

The encryptedMessage string contains many blocks joined together with commas, which we convert back to integers and store in the variable encryptedBlocks.

148.     # Convert the encrypted message into large int values:
149.     encryptedBlocks = []
150.     for block in encryptedMessage.split(','):
151.         encryptedBlocks.append(int(block))

The for loop on line 150 iterates over the list created from calling the split() method on encryptedMessage. This list contains strings of individual blocks. The integer form of these strings is appended to the encryptedBlocks list (which was an empty list on line 149) each time line 151 is executed. After the for loop on line 150 completes, the encryptedBlocks list should contain integer values of the numbers that were in the encryptedMessage string.

On line 154, the list in encryptedBlocks is passed to the decryptMessage() function along with messageLength, the private key (a tuple value of two integers n and d), and the block size.

153.     # Decrypt the large int values:
154.     return decryptMessage(encryptedBlocks, messageLength, (n, d),
           blockSize)

The decryptMessage() function on line 154 returns a single string value of the decrypted message, which itself is a value returned from readFileAndDecrypt().

Calling the main() Function

Finally, lines 159 and 160 call the main() function if publicKeyCipher.py is being run as a program instead of being imported as a module by another program.

157. # If publicKeyCipher.py is run (instead of imported as a module), call
158. # the main() function.
159. if __name__ == '__main__':
160. main()

That completes our discussion of how the publicKeyCipher.py program performs encryption and decryption using the public key cipher.

Summary

Congratulations, you’re done with the book! There’s no “Hacking the Public Key Cipher” chapter because there’s no simple attack against the public key cipher that wouldn’t take trillions of years.

In this chapter, the RSA algorithm was greatly simplified, but it’s still a real cipher used in professional encryption software. When you log into a website or buy something on the internet, for example, ciphers like this keep passwords and credit card numbers secret from anyone who might be intercepting your network traffic.

Although the basic mathematics used for professional encryption software is the same as that described in this chapter, you shouldn’t use this program to secure your secret files. The hacks against an encryption program like publicKeyCipher.py are very sophisticated, but they do exist. (For example, because the random numbers random.randint() created aren’t truly random and can be predicted, a hacker could figure out which numbers were used for the prime numbers of your private key.)

All the previous ciphers discussed in this book can be hacked and rendered worthless. In general, avoid writing your own cryptography code to secure your secrets, because you’ll probably make subtle mistakes in the implementation of these programs. Hackers and spy agencies can exploit these mistakes to hack your encrypted messages.

A cipher is secure only if everything but the key can be revealed while still keeping the message a secret. You cannot rely on a cryptanalyst’s not having access to the same encryption software or not knowing which cipher you used. Always assume that the enemy knows the system!

Professional encryption software is written by cryptographers who have spent years studying the mathematics and potential weaknesses of various ciphers. Even then, the software they write is inspected by other cryptographers to check for mistakes or potential weaknesses. You’re perfectly capable of learning about these cipher systems and cryptographic mathematics. It’s not about being the smartest hacker, but spending the time to study to become the most knowledgeable hacker.

I hope this book gave you the foundations you need to become an elite hacker and programmer. There’s a lot more to programming and cryptography than what this book covers, so I encourage you explore and learn more! I highly recommend The Code Book: The Science of Secrecy from Ancient Egypt to Quantum Cryptography by Simon Singh, which is a great book about the general history of cryptography. You can go to https://www.nostarch.com/crackingcodes/ for a list of other books and websites to learn more about cryptography. Feel free to email me your programming or cryptography questions at [email protected] or post them on https://reddit.com/r/inventwithpython/.

Good luck!

24PROGRAMMING THE PUBLIC KEY CIPHER