Help Review Code for the Cryptography and Code Breaking Book

I have the source code written for my next book, "Hacking Secret Ciphers with Python" (formerly "Become a Code Breaker with Python"). (You can download the rough draft.)

The programs all implement classic cryptographic ciphers (Caesar cipher, simple substitution, etc.) as well as programs that can crack them. I'm posting it for public review so people can make any suggestions for edits. The code is loaded on github here:

https://github.com/asweigart/codebreaker

These programs are for a book that teaches computer programming in Python 3 to complete beginners, as well as teaching amateur cryptography. My aim for the programs are:

  • Simple - Is the code arranged in a way that requires the least amount of prerequisite knowledge? Is there a chunk of code that should have more comments? Is there a comment that doesn't make sense?
  • Correctness - Is the cipher implemented correctly according to descriptions of the cipher?
  • Consistency - Is the naming and spacing of the code consistent with the other programs?

Things that are not goals for the source:

  • Security is not a primary concern. These are educational examples to teach basic cryptography concepts. Please withhold comments or criticisms that would be more appropriate for commercial encryption software. I understand that Python's random module is not suitable for crypto. I understand that the RSA block cipher should use CBC mode instead of ECB mode. These decisions have been made because the point of this book is to teach and be simple for the reader to understand without going into the arcana of cryptography.
  • Efficient and elegant code is also not a primary concern. For example, I don't use list comprehensions because it would add another programming topic to explain in the book. The code uses more lines than it probably needs too, but I'd rather have more-but-readable code than less-but-elegant code. If there's an order of magnitude improvement that I can make, I'll do it. Otherwise, a few extra seconds of processing time is worth having simpler code.
  • The code doesn't dogmatically pass PEP-8. "Meh."

Thanks to anyone who can offer their advice!

16 comments.

  1. Awesome. Any ideas when you’re planning to release this book? I’d be interested.

  2. I’m trying to shoot for a December 1st availability so it’ll be out for Christmas.

  3. Great concept. I’ll try to review more of the code over the next few days. I have an initial comment which should improve the performance of many of the brute force attacks.

    In detectEnglish.py, you use a Python list to store the English words. Using a dictionary will give a constant lookup time rather than linear as the set of words grows.

    ENGLISH_WORDS = {}
    with open(“dictionary.txt”) as dict_file:
    for word in dict_file.readlines():
    ENGLISH_WORDS[word.upper()] = None

    Using a test with the Frankenstein text gives an improvement from 250 seconds to 0.5 seconds on my laptop.

    if __name__ == “__main__”:
    import time
    with open(“frankenstein.txt”) as f:
    data = f.read()

    start = time.time()
    print isEnglish(data)
    end = time.time()
    print end – start
    print type(ENGLISH_WORDS)

    True
    0.414999961853

  4. Thanks! I had made the change to a dict before, but I guess somehow some old code got copied.

  5. Hi Al,

    In the detectEnglish.py,

    def isEnglish(message, wordPercentage=80):
    wordPercentage /= 100

    I think this should be
    wordPercentage /= 100.0
    otherwise Python’s integer division will just set wordPercentage to be 0 and all texts will be returned as True for the English check.

  6. Similar to the above comment,

    The return statement in getEnglishCount() also needs to use float division …
    # Return the fraction of matching words out of total words.
    return (matches * 1.0 / len(words))

    I also think the line
    message = nonLettersOrSpacePattern.sub(”, message)
    followed by
    message.split()
    will work as intended because you are removing the spaces before trying to split on whitespace and then lookup words.

    I think this illustrates that these functions would benefit from simple inline unit-testing.

    Even something as simple as …
    assert (isEnglish(“Only the English eat lots of Puddings”))
    assert (!isEnglish(“XXX YYY ZZZ SSS CCC”))

  7. Use functions for the encryptions!

    In caesarCipher.py, making a function for the cipher would make things cleaner, and allow people to experiment with it easily (python -i).

    Your “main” could be: print caesarEncrypt(“This is my secret message”)

    Since your key=13, this will allow caesarEncrypt(caesarEncrypt(“secret message”)) as well.

  8. Wish I could work out how to get formatted Python code in comments, but I agree wholeheartedly with Chris that the code would benefit from tweaks to make it easier to test.

    The implementation of the Caesar algorithm is probably clearer if modulo arithmetic is used …
    (substitute ~ with spaces)

    def caeser(message, key=13, mode=”encrypt”):
    ~# capitalize the string in message
    ~message = message.upper()

    ~translated = “”
    ~# run the encryption/decryption code on each symbol in the message string
    ~for symbol in message:
    ~~# get the number of the symbol
    ~~num = SYMBOLS.find(symbol)
    ~
    ~~# -1 means the symbol in the message was not found in SYMBOLS
    ~~if num != -1:
    ~~~# get the encrypted (or decrypted) number for this symbol
    ~~~if mode == ‘encrypt’:
    ~~~~num = (num + key) % len(SYMBOLS)
    ~~~else: # decrypt
    ~~~~num = (num – key) % len(SYMBOLS)

    ~~~translated += SYMBOLS[num]

    ~~else:
    ~~~# just add the symbol without encrypting/decrypting
    ~~~translated += symbol

    ~return translated

  9. Your statement “Security is not a primary concern” sounds questionable. Surely a book about cryptography needs to teach the reader about security. Otherwise it will further encourage naïvety and prolong the poor security that continues to plague the industry. Read what Bruce Schneier says about security and snake-oil security products.

  10. As I said above, “These are educational examples to teach basic cryptography concepts. Please withhold comments or criticisms that would be more appropriate for commercial encryption software.”

    I have read what Schneier says about security and snake-oil security products. This book isn’t a security product, it’s an educational tool to provide an introduction to basic cryptography concepts. Further, all but one of the cipher programs come with programs that can break those ciphers. No one is going to use a cipher they learn in one chapter when the next chapter demonstrates how broken it is.

    I’m going to guess you haven’t actually taken a serious look at the code before writing your comment.

  11. Nothing useful as I am a newbie, just wanted to say THIS LOOKS AWESOME. I am beyond excited.

    Thanks for all you do (directed at those offering comments, too!),

    Meg.

  12. “Efficient and elegant code is also not a primary concern”

    I think that you should also give the “less-but-elegant code” to make readers explore by themselves.

  13. I just wanted to let you know that this book looks great! Can’t wait to buy it!

  14. Your RSA related stuff makes me cry.

    First you call RSA a block cipher, then you suggest using CBC, and use ECB.
    It’s extremely unusual to encrypt the plaintext with RSA, since RSA does not support reasonable modes, the ciphertext is bigger than the plaintext, and RSA is very slow. In almost every case one should use hybrid encryption. i.e. encrypt a symmetric key (e.g. for AES) with RSA, and the actual message with AES.

    Your code doesn’t even use RSA encryption. It used textbook RSA, which lacks padding. RSA isn’t encryption if you leave out the padding.

    Reply from Al:

    W, thank you for raising the exact criticisms that I foresaw, pointed out as irrelevant ahead of time, and awaited for some too-busy-to-bother-reading-the-objectives nerd to make anyway.

  15. in coinFlips.py why not use

    for flips in range(0, 100):

    instead of the ‘while’ loop? The ‘for’ with range() is clear.

  16. Al, I just ran across this blog and I would like to help. I plan to read a chapter per day and pick up any errors that stand out.

    So far all I found in Chapter 1, in the practice set, I think from the context the word used should be decrypt, rather than encrypt.

    Chapter 1 – Page 12

    Using the cipher wheel or St. Cyr Slide, [encrypt: original] decrypt the following sentences with key 17.

    I hope this helps.

Post a comment.