The Invent with Python Blog

Wed 05 June 2019

Esoteric Python Oddities

Posted by Al Sweigart in misc   

You’re unlikely to actually run into these cases, but these esoteric Python language oddities make for fun Python trivia. While not common in real-world code, they’re fun (ab)uses of the Python syntax to know about. Let’s explore some esoteric gotchas.

Why 256 Is 256 But 257 Is Not 257

Remember from Chapter XX that the == operator compares value (that is, compares for equality) but the is operator compares identity. So, while the int value 42 and the float value 42.0 are equal values, they are two different objects held in separate places in the computer’s memory. This is confirmed by their different IDs from the id() function:

>>> 42 == 42.0  # The integer and float are considered to be equal.
True
>>> 42 is 42.0  # The integer object and the float object are separate objects.
False
>>> id(42), id(42.0)
(140718571382896, 2526629638888)

Creating a new integer object and storing it in memory takes a small amount of time. As a small optimization, CPython (the Python interpreter from https://python.org) preallocates integer objects for -5 to 256 at the start of every program. These small integers are fairly common: a program is more likely to use the integer 0 or 2 than, say, 1729. Whenever CPython needs an integer object, it first checks whether the value is between -5 and 256. If so, CPython saves time by returning the existing integer object instead of creating a new one. This also saves memory, because duplicate small integers don’t need to be stored, as shown in Figure 7-3.

Figure 7-3: Python saves memory by using multiple references to a single integer object (left) rather than separate, duplicate integer objects for each reference (right).

Because of this optimization, you can see some bizarre results.
Enter the following into the interactive shell:

>>> spam1 = 256
>>> spam2 = 256
>>> spam1 is spam2  # All `256` objects are really the same object.
True
>>> eggs1 = 257
>>> eggs2 = 257
>>> eggs1 is eggs2  # Python created separate integer objects for `257`.
False

However, 257 is 257 evaluates to True because CPython reuses the 257 integer object made for identical literals in the same statement:

>>> 257 is 257
True

Of course, you should never use the is operator to compare integers (or floats, strings, and bools) because real-world programs should only be concerned with comparing for value equality, not identity equality. The exception to this is when you use is None instead of == None, as explained in the “Use “is” to Compare with None Instead of ==” section in Chapter XX.

String Interning

Similarly, Python also reuses string objects for identical string literals in your code, rather than making separate copies. For example, enter the following into the interactive shell:

>>> spam = 'cat'
>>> eggs = 'cat'
>>> spam is eggs  # spam and eggs refer to the same string object.
True
>>> id(spam), id(eggs)  # spam and eggs have the same ID.
(1285806577904, 1285806577904)

Python notices that the second 'cat' string literal is the same as the 'cat' string literal for spam, so instead of making a second, redundant string object, it just assigns eggs a reference to the same string object that spam uses. This explains why the IDs of their strings are the same: it’s the same string object.

This optimization is called string interning. Like the preallocated integers, it is just an implementation detail of CPython, and you should never write code that relies on it. This optimization won’t catch every possible identical string either (trying to catch every possible time an optimization can be used could take up more time than the optimization saves).
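
The advice so far (compare values with ==, and reserve is for None) can be sketched as a short script. This is only an illustration: the names big1, big2, built, and is_missing are made up for this example, and the values are built at runtime so that the result doesn’t depend on whether CPython happens to reuse an object.

```python
# Illustrative sketch: compare values with ==, and reserve `is` for None.
# The names big1, big2, built, and is_missing are hypothetical.

big1 = int('257')  # Built at runtime, outside the preallocated -5 to 256 range.
big2 = int('257')
print(big1 == big2)  # True: value comparison ignores object identity.

built = ''.join(['c', 'at'])  # A 'cat' string pieced together at runtime.
print(built == 'cat')  # True: equal value, whether or not it was interned.


def is_missing(value):
    """Return True only for None, the one recommended use of `is`."""
    return value is None


print(is_missing(None), is_missing(0))  # True False
```

The script deliberately never checks big1 is big2 or the identity of built, because those results depend on the interpreter and are not something a program should rely on.
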
Try creating the 'cat' string from 'c' and 'at' in the interactive shell, and you’ll notice that the final 'cat' string was created as a new string object, rather than reusing the string object made for spam previously:

>>> bacon = 'c'
>>> bacon += 'at'
>>> spam is bacon  # The pieced together 'cat' is a separate string object.
False
>>> id(spam), id(bacon)
(1285806577904, 1285808207384)

String interning is an optimization technique used by interpreters and compilers for many different languages. Further details can be found at https://en.wikipedia.org/wiki/String_interning.

Python’s Fake Increment and Decrement Operators

In Python, you can increase the value of a variable by 1 or reduce it by 1 using the augmented assignment operators. The code spam += 1 and spam -= 1 increments and decrements the numeric value in spam by 1, respectively.

Other languages such as C++ and JavaScript have the ++ and -- operators for incrementing and decrementing variables. (The name of C++ itself reflects this; the name is a tongue-in-cheek joke that indicates it’s an enhanced form of the C language.) Code in C++ and JavaScript could have ++spam or spam++. Python wisely doesn’t include these operators because they’re notoriously susceptible to subtle bugs (as discussed at https://softwareengineering.stackexchange.com/q/59880).

However, it is perfectly legal to have the following Python code:

>>> spam = 42
>>> spam = ++spam
>>> spam
42
>>> spam = --spam
>>> spam
42

The first thing you notice is that the ++ and -- “operators” in Python don’t actually increment or decrement the value in spam. Rather, the leading - is Python’s unary negation operator. It allows you to have code like this:

>>> spam = 42
>>> -spam
-42

It’s legal to have multiple unary negation operators in front of a value.
With two of them, you’d get the negative of the negative of the value, which for integer values just evaluates to the original value:

>>> spam = 42
>>> -(-spam)
42

This is quite a silly thing to do, and you won’t ever see a unary negation operator used twice in real-world code. (Though if you did, it’s probably because the programmer learned to program in another language and has just written buggy Python code!)

There is also a + unary operator. It evaluates an integer value to the same sign as the original value, which is to say, it does absolutely nothing:

>>> spam = 42
>>> +spam
42
>>> spam = -42
>>> +spam
-42

Being able to write +42 (or ++42) seems just as silly as --42, so why does Python even have this unary operator? It exists only to complement the - operator if you need to overload these operators for your own classes. (That’s a lot of terms you might not be familiar with! Operator overloading is explained more in Chapter XX.)

The + and - unary operators are only valid when in front of a Python value, not after it. While spam++ and spam-- might be legal code in C++ or JavaScript, they produce syntax errors in Python:

>>> spam++
  File "", line 1
    spam++
         ^
SyntaxError: invalid syntax

Python doesn’t have increment and decrement operators; it’s only a quirk of the language syntax that can make it seem like it does.

All of Nothing

The all() built-in function accepts a sequence value such as a list and returns True if all of the values in that sequence are “truthy”. It returns False if one or more values are “falsey”. You can think of the function call all([False, True, True]) as equivalent to the expression False and True and True. You can use list comprehensions to create a list of Boolean values based on another list.
Enter the following into the interactive shell:

>>> spam = [67, 39, 20, 55, 13, 45, 44]
>>> [i > 42 for i in spam]
[True, False, False, True, False, True, True]
>>> all([i > 42 for i in spam])  # True if all numbers in spam are > 42
False
>>> eggs = [43, 44, 45, 46]
>>> all([i > 42 for i in eggs])  # True if all numbers in eggs are > 42
True

However, if you pass an empty sequence to all(), it always returns True. Enter the following into the interactive shell:

>>> all([])
True

Think of all([]) not as “all of the items in this list are truthy” but rather “none of the items in this list are falsey”. But this can still lead to some odd results. Enter the following into the interactive shell:

>>> spam = []
>>> all([i > 42 for i in spam])  # All numbers in spam are > 42
True
>>> all([i < 42 for i in spam])  # All numbers in spam are also < 42
True
>>> all([i == 42 for i in spam])  # And all numbers in spam are == 42!
True

This code seems to be saying that not only are all the values in spam (an empty list) greater than 42, but also less than 42 and exactly equal to 42 at the same time! This seems logically impossible, but consider that each of these three list comprehensions evaluates to the empty list, which is why the all() function returns True every time.

Boolean Values Are Integer Values

Just as the float value 42.0 compares as equal to the integer 42, Python considers the Boolean values True and False to be equivalent to 1 and 0, respectively. In Python, the bool data type is a subclass of the int data type. (Classes and subclasses are covered in Chapter XX.) You can use isinstance() to confirm that a Boolean value is considered a type of integer:

>>> int(False)
0
>>> int(True)
1
>>> False == 0
True
>>> True == 1
True
>>> isinstance(True, bool)  # True is a value of the bool data type.
True
>>> isinstance(True, int)  # bool is a subclass of int, so True is also an int
True

This means you can use True and False in almost any place you can use integers.
This can lead to some bizarre code:

>>> True + False + True + True  # Same as 1 + 0 + 1 + 1
3
>>> 42 * True  # Same as 42 * 1 mathematical multiplication.
42
>>> 'hello' * False  # Same as 'hello' * 0 string replication.
''
>>> 'hello'[False]  # Same as 'hello'[0]
'h'
>>> 'hello'[True]  # Same as 'hello'[1]
'e'
>>> 'hello'[-True]  # Same as 'hello'[-1]
'o'

Of course, just because you can use bool values as numbers doesn’t mean you should. The previous examples are all unreadable and should never be used in real-world code. Originally, Python didn’t have a bool data type; it was only added in Python 2.3! It was made a subclass of int to ease the implementation. The history of how the bool data type came to be this way is recorded in Python Enhancement Proposal 285 (“PEP 285”) at https://www.python.org/dev/peps/pep-0285/.

Incidentally, True and False were only made keywords in Python 3. This means that in Python 2 it was possible to use True and False as variable names, leading to seemingly paradoxical code like this:

Python 2.7.14 (v2.7.14:84471935ed, Sep 16 2017, 20:25:58) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> True is False
False
>>> True = False
>>> True is False
True

Thankfully, this sort of confusing code isn’t possible in Python 3, which will raise a syntax error if you try to use True or False as a variable name.

Chaining Multiple Kinds of Operators

Like chaining the != operator, chaining different kinds of operators in the same expression can produce unexpected bugs. For example, this (admittedly unrealistic and contrived) example uses the == and in operators in a single expression:

>>> False == False in [False]
True

This True result is surprising, because you would expect it to evaluate as either:

(False == False) in [False], which is False.
False == (False in [False]), which is also False.

But False == False in [False] isn’t equivalent to either of these expressions.
Rather, it’s equivalent to (False == False) and (False in [False]), just as 42 < spam < 99 is equivalent to (42 < spam) and (spam < 99). The False == False in [False] expression is a fun Python riddle, but it’s unlikely to come up in any real-world code.

Python’s Antigravity Feature

To enable Python’s antigravity feature, enter the following into the interactive shell:

>>> import antigravity

This line is just a fun Easter egg in Python that opens the web browser to a classic XKCD comic strip at https://xkcd.com/353/. Python’s webbrowser module has an open() function that finds your operating system’s default web browser and opens a browser window to a specific URL. Enter the following into the interactive shell:

>>> import webbrowser
>>> webbrowser.open('https://xkcd.com/353/')

The webbrowser module is limited, but it can be useful for directing the user to further information on the internet.

Summary

It’s easy to forget that computers and programming languages are designed by humans and have their own limitations. Because so much software is built on top of and relies upon their creations, language designers and hardware engineers work incredibly hard to make sure that if you have a bug in your program, it’s because of your program and not the interpreter software or CPU hardware running it. We can end up taking them for granted.

But this is why there is value in learning the odd nooks and crannies of computers and software. You can get very far just by writing code or gluing major software systems together without understanding them. But when things go wrong (or even just act weirdly and make you think, “That’s odd”), you’ll need to understand the common “gotchas” to debug these problems. You almost certainly won’t run into any of the issues brought up in this chapter, but being aware of these small details is what makes a Python programmer an experienced Python programmer.