In the XKCD comic “Up Goer Five” (https://xkcd.com/1133/), the webcomic’s artist Randall Munroe created a technical schematic for the Saturn V rocket using only the 1,000 most common English words. The comic breaks down all the technical jargon into sentences a young child could understand. But it also highlights why we can’t explain everything using simple terms: The explanation “Thing to help people escape really fast if there’s a problem and everything is on fire so they decide not to go to space” might be easier to understand for a lay audience than “Launch Escape System.” But it’s too verbose for NASA engineers to say in their day-to-day work. Even then, they’d probably rather use the acronym LES.
Although computer jargon can be confusing and intimidating for new programmers, it’s a necessary shorthand. Several terms in Python and software development have subtle differences in meaning, and even experienced developers sometimes carelessly use them interchangeably. The technical definitions for these terms can vary between programming languages, but this chapter covers the terms as they relate to Python. You’ll get a broad, albeit not deep, understanding of the programming language concepts behind them.
This chapter assumes you aren’t yet familiar with classes and object-oriented programming (OOP). I’ve limited the explanations for classes and other OOP jargon here, but the jargon is explained in more detail in Chapters 15 to 17.
As the number of programmers in a room approaches two, the likelihood of an argument about semantics approaches 100 percent. Language is fluid and humans are the masters of words rather than the other way around. Some developers might use terms slightly differently, but becoming familiar with these terms is still useful. This chapter explores these terms and how they compare with each other. If you need a glossary of terms in alphabetical order, you can rely on the official Python glossary at https://docs.python.org/3/glossary.html to provide canonical definitions.
No doubt, some programmers will read the definitions in this chapter and bring up special cases or exceptions that can be endlessly nitpicked. Rather than being a definitive guide, this chapter is intended to give you accessible definitions, even if they’re not comprehensive. As with everything in programming, there’s always more to learn.
The word python can have multiple meanings. The Python programming language gets its name from the British comedy group Monty Python, rather than the snake (although Python tutorials and documentation use both Monty Python and snake references). Similarly, Python can have two meanings in regard to computer programming.
When we say, “Python runs a program” or “Python will raise an exception,” we’re talking about the Python interpreter—the actual software that reads the text of a .py file and carries out its instructions. When we say, “the Python interpreter,” we’re almost always talking about CPython, the Python interpreter maintained by the Python Software Foundation, available at https://www.python.org. CPython is an implementation of the Python language—that is, software created to follow a specification—but there are others. Although CPython is written in the C programming language, Jython is written in Java for running Python scripts that are interoperable with Java programs. PyPy, a just-in-time compiler for Python that compiles as programs execute, is written in Python.
All of these implementations run source code written in the Python programming language, which is what we mean when we say, “This is a Python program” or “I’m learning Python.” Ideally, any Python interpreter can run any source code written in the Python language; however, in the real world there’ll be some slight incompatibilities and differences between interpreters. CPython is called the Python language’s reference implementation because if there’s a difference between how CPython and another interpreter interpret Python code, CPython’s behavior is considered canonical and correct.
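If you're curious which implementation is running your code, the standard library can tell you. The following is a minimal sketch using the platform and sys modules; the variable names are my own, and the output depends on your installation.

```python
import platform
import sys

# Name of the implementation running this script, such as 'CPython' or 'PyPy'.
implementation = platform.python_implementation()

# Version of the Python language that this interpreter implements.
language_version = '.'.join(str(part) for part in sys.version_info[:3])

print(f'Interpreter: {implementation}')
print(f'Language version: {language_version}')
```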
In many early programming languages, a programmer had to instruct the program to allocate and then deallocate, or free, memory for data structures as needed. Manual memory allocation was the source of numerous bugs, such as memory leaks (where programmers forgot to free memory) or double-free bugs (where programmers freed the same memory twice, leading to data corruption).
To avoid these bugs, Python has garbage collection, a form of automatic memory management that tracks when to allocate and free memory so the programmer doesn’t have to. You can think of garbage collection as memory recycling, because it makes memory available for new data. For example, enter the following into the interactive shell:
>>> def someFunction():
...     print('someFunction() called.')
...     spam = ['cat', 'dog', 'moose']
...
>>> someFunction()
someFunction() called.
When someFunction() is called, Python allocates memory for the list ['cat', 'dog', 'moose']. The programmer doesn’t need to figure out how many bytes of memory to request because Python manages this automatically. Python’s garbage collector will free the local variables when the function call returns to make that memory available for other data. Garbage collection makes programming much easier and less bug-prone.
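You normally can't see memory being freed, but a weak reference lets you check whether an object still exists. The sketch below is only an illustration: the Payload class and the function name are hypothetical, classes aren't covered until later chapters, and the immediate cleanup shown here is typical of CPython rather than something the language guarantees.

```python
import gc
import weakref


class Payload:
    """Hypothetical stand-in for data that a function allocates."""


def some_function():
    data = Payload()           # memory is allocated for this object
    return weakref.ref(data)   # a weak reference doesn't keep it alive


watcher = some_function()
gc.collect()                   # usually unnecessary in CPython; helps other interpreters
was_freed = watcher() is None  # a dead weak reference returns None when called
print(was_freed)
```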
A literal is text in the source code for a fixed, typed-out value. In the following code example
>>> age = 42 + len('Zophie')
the 42 and 'Zophie' text are integer and string literals. Think of a literal as a value that literally appears in source code text. Only the built-in data types can have literal values in Python source code, so the variable age isn’t a literal value. Table 7-1 lists some example Python literals.
Table 7-1: Examples of Literals in Python
Literal | Data type |
42 | Integer |
3.14 | Float |
1.4886191506362924e+36 | Float |
"""Howdy!""" | String |
r'Green\Blue' | String |
[] | List |
{'name': 'Zophie'} | Dictionary |
b'\x41' | Bytes |
True | Boolean |
None | NoneType |
Nitpickers will argue that some of my choices aren’t literals based on the official Python language documentation. Technically, -5 isn’t a literal in Python because the language defines the negative symbol (-) as an operator that operates on the 5 literal. In addition, True, False, and None are considered Python keywords rather than literals, whereas [] and {} are called displays or atoms depending on what part of the official documentation you’re looking at. Regardless, literal is a common term that software professionals will use for all of these examples.
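The standard library's ast.literal_eval() function takes a similarly loose view of what counts as a literal. As a rough sketch (the list of sample strings is my own), it converts source text for literal-like values into objects but rejects names and function calls:

```python
import ast

# Each string holds source code text for a single literal-like value.
sources = ['42', '3.14', '"""Howdy!"""', "{'name': 'Zophie'}", 'None', '-5']
values = [ast.literal_eval(source) for source in sources]
print(values)

# A function call isn't a literal, so literal_eval() refuses to evaluate it.
try:
    ast.literal_eval("len('Zophie')")
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```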
Every programming language has its own keywords. The Python keywords are a set of names reserved for use as part of the language and cannot be used as variable names (that is, as identifiers). For example, you cannot have a variable named while because while is a keyword reserved for use in while loops. The following are the Python keywords as of Python 3.9.
and | continue | finally | is | raise |
as | def | for | lambda | return |
assert | del | from | None | True |
async | elif | global | nonlocal | try |
await | else | if | not | while |
break | except | import | or | with |
class | False | in | pass | yield |
Note that the Python keywords are always in English and aren’t available in alternative languages. For example, the following function has identifiers written in Spanish, but the def and return keywords remain in English.
def agregarDosNúmeros(primerNúmero, segundoNúmero):
    return primerNúmero + segundoNúmero
Unfortunately for the 6.5 billion people who don’t speak it, English dominates the programming field.
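You don't have to memorize the keyword list, because the standard library's keyword module can check names for you. Here is a small sketch; the count it prints depends on the Python version you run it with.

```python
import keyword

print(keyword.iskeyword('while'))   # a reserved word
print(keyword.iskeyword('spam'))    # an ordinary identifier
print(len(keyword.kwlist))          # the count depends on the Python version

# Trying to use a keyword as a variable name is a syntax error.
try:
    compile('while = 5', '<example>', 'exec')
    assignment_allowed = True
except SyntaxError:
    assignment_allowed = False
print(assignment_allowed)
```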
An object is a representation of a piece of data, such as a number, some text, or a more complicated data structure, such as a list or dictionary. All objects can be stored in variables, passed as arguments to function calls, and returned from function calls.
All objects have a value, identity, and data type. The value is the data the object represents, such as the integer 42 or the string 'hello'. Somewhat confusingly, some programmers use the term value as a synonym for object, especially for simple data types like integers or strings. For example, a variable that contains 42 is a variable that contains an integer value, but we can also say it’s a variable that contains an integer object with a value of 42.
An object is created with an identity that is a unique integer you can view by calling the id() function. For example, enter the following code into the interactive shell:
>>> spam = ['cat', 'dog', 'moose']
>>> id(spam)
33805656
The variable spam stores an object of the list data type. Its value is ['cat', 'dog', 'moose']. Its identity is 33805656, although the integer ID varies each time a program runs so you’ll likely get a different ID on your computer. Once created, an object’s identity won’t change for as long as the object exists. Although the data type and the object’s identity will never change, an object’s value can change, as we’ll see in this example:
>>> spam.append('snake')
>>> spam
['cat', 'dog', 'moose', 'snake']
>>> id(spam)
33805656
Now the list also contains 'snake'. But as you can see from the id(spam) call, its identity hasn’t changed and it’s still the same list. But let’s see what happens when you enter this code:
>>> spam = [1, 2, 3]
>>> id(spam)
33838544
The value in spam has been overwritten by a new list object with a new identity: 33838544 instead of 33805656. An identifier like spam isn’t the same as an identity because multiple identifiers can refer to the same object, as is the case in this example of two variables that are assigned to the same dictionary:
>>> spam = {'name': 'Zophie'}
>>> id(spam)
33861824
>>> eggs = spam
>>> id(eggs)
33861824
The identities of the spam and eggs identifiers are both 33861824 because they refer to the same dictionary object. Now change the value of spam in the interactive shell:
>>> spam = {'name': 'Zophie'}
>>> eggs = spam
1 >>> spam['name'] = 'Al'
>>> spam
{'name': 'Al'}
>>> eggs
2 {'name': 'Al'}
You’ll see that changes to spam 1 mysteriously also appear in eggs 2. The reason is that they both refer to the same object.
Without understanding that the = assignment operator always copies the reference, not the object, you might introduce bugs by thinking that you’re making a duplicate copy of an object when really you’re copying the reference to the original object. Fortunately, this isn’t an issue for immutable values like integers, strings, and tuples for reasons that I’ll explain in “Mutable and Immutable” on page 114.
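When you really do want a duplicate, you have to ask for one explicitly. The sketch below uses the standard library's copy module; the 'toys' key and the variable names bacon and ham are just illustrations I've added. Note how a shallow copy still shares the inner list with the original.

```python
import copy

spam = {'name': 'Zophie', 'toys': ['ball']}
eggs = spam                  # copies the reference only
bacon = copy.copy(spam)      # new dictionary, but the inner list is still shared
ham = copy.deepcopy(spam)    # new dictionary and a new inner list

spam['name'] = 'Al'
spam['toys'].append('mouse')

print(eggs['name'], bacon['name'], ham['name'])
print(eggs['toys'], bacon['toys'], ham['toys'])
```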
You can use the is operator to compare whether two objects have the same identity. In contrast, the == operator checks only whether object values are the same. You can consider x is y to be shorthand for id(x) == id(y). Enter the following into the interactive shell to see the difference:
>>> spam = {'name': 'Zophie'}
1 >>> eggs = spam
>>> spam is eggs
True
>>> spam == eggs
True
2 >>> bacon = {'name': 'Zophie'}
>>> spam == bacon
True
>>> spam is bacon
False
The variables spam and eggs refer to the same dictionary object 1, so their identities and values are the same. But bacon refers to a separate dictionary object 2, even though it contains data identical to spam and eggs. The identical data means bacon has the same value as spam and eggs, but they’re two different objects with two different identities.
In Python, an object that is inside a container object, like a list or dictionary, is also called an item or an element. For example, the strings in the list ['dog', 'cat', 'moose'] are objects but are also called items.
As noted earlier, all objects in Python have a value, data type, and identity, and of these only the value can change. If you can change the object’s value, it’s a mutable object. If you can’t change its value, it’s an immutable object. Table 7-2 lists some mutable and immutable data types in Python.
Table 7-2: Some of Python’s Mutable and Immutable Data Types
Mutable data types | Immutable data types |
List | Integer |
Dictionary | Floating-point number |
Set | Boolean |
Bytearray | String |
Array | Frozen set |
 | Bytes |
 | Tuple |
When you overwrite a variable, it might look like you’re changing the object’s value, as in this interactive shell example:
>>> spam = 'hello'
>>> spam
'hello'
>>> spam = 'goodbye'
>>> spam
'goodbye'
But in this code, you haven’t changed the 'hello' object’s value from 'hello' to 'goodbye'. They’re two separate objects. You’ve only switched spam from referring to the 'hello' object to the 'goodbye' object. You can check whether this is true by using the id() function to show the two objects’ identities:
>>> spam = 'hello'
>>> id(spam)
40718944
>>> spam = 'goodbye'
>>> id(spam)
40719224
These two string objects have different identities (40718944 and 40719224) because they’re different objects. But variables that refer to mutable objects can have their values modified in-place. For example, enter the following into the interactive shell:
>>> spam = ['cat', 'dog']
>>> id(spam)
33805576
1 >>> spam.append('moose')
2 >>> spam[0] = 'snake'
>>> spam
['snake', 'dog', 'moose']
>>> id(spam)
33805576
The append() method 1 and item assignment by indexing 2 both modify the value of the list in-place. Even though the list’s value has changed, its identity remains the same (33805576). But when you concatenate a list using the + operator, you create a new object (with a new identity) that overwrites the old list:
>>> spam = spam + ['rat']
>>> spam
['snake', 'dog', 'moose', 'rat']
>>> id(spam)
33840064
List concatenation creates a new list with a new identity. When this happens, the old list will eventually be freed from memory by the garbage collector. You’ll have to consult the Python documentation to see which methods and operations modify objects in-place and which overwrite objects. A good rule to keep in mind is that if you see a literal in the source code, such as ['rat'] in the previous example, Python will most likely create a new object. A method that is called on the object, such as append(), often modifies the object in-place.
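One case that catches people out is the += augmented assignment operator, which for lists modifies the object in-place even though it looks like concatenation. The following sketch uses a second variable, which I've named alias, to show whether spam still refers to the original list:

```python
spam = ['cat', 'dog']
alias = spam

spam += ['moose']              # augmented assignment extends the list in-place
same_after_iadd = alias is spam

spam = spam + ['rat']          # concatenation builds a new list object
same_after_concat = alias is spam

print(same_after_iadd, same_after_concat)
print(alias)
print(spam)
```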
Assignment is simpler for objects of immutable data types like integers, strings, or tuples. For example, enter the following into the interactive shell:
>>> bacon = 'Goodbye'
>>> id(bacon)
33827584
1 >>> bacon = 'Hello'
>>> id(bacon)
33863820
2 >>> bacon = bacon + ', world!'
>>> bacon
'Hello, world!'
>>> id(bacon)
33870056
3 >>> bacon[0] = 'J'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignment
Strings are immutable, so you cannot change their value. Although it looks like the string’s value in bacon is being changed from 'Goodbye' to 'Hello' 1, it’s actually being overwritten by a new string object with a new identity. Similarly, an expression using string concatenation creates a new string object 2 with a new identity. Attempting to modify the string in-place with item assignment 3 raises a TypeError.
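To get the effect of changing a character, you build a new string out of pieces of the old one. This is a minimal sketch; the original and shouted variable names are my own additions.

```python
bacon = 'Hello, world!'
original = bacon

bacon = 'J' + bacon[1:]        # builds a new string; the old one is untouched
shouted = original.upper()     # string methods return new strings as well

print(bacon)
print(original)
print(shouted)
```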
A tuple’s value is defined as the objects it contains and the order of those objects. Tuples are immutable sequence objects that enclose values in parentheses. This means that items in a tuple can’t be overwritten:
>>> eggs = ('cat', 'dog', [2, 4, 6])
>>> id(eggs)
39560896
>>> id(eggs[2])
40654152
>>> eggs[2] = eggs[2] + [8, 10]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
But a mutable list inside an immutable tuple can still be modified in-place:
>>> eggs[2].append(8)
>>> eggs[2].append(10)
>>> eggs
('cat', 'dog', [2, 4, 6, 8, 10])
>>> id(eggs)
39560896
>>> id(eggs[2])
40654152
Although this is an obscure special case, it’s important to keep in mind. The tuple still refers to the same objects, as depicted in Figure 7-3. But if a tuple contains a mutable object and that object changes its value—that is, if the object mutates—the value of the tuple also changes.
I, and almost every Pythonista, call tuples immutable. But whether some tuples can be called mutable depends on your definition. I explore this topic more in my PyCascades 2019 talk, “The Amazing Mutable, Immutable Tuple,” at https://invpy.com/amazingtuple/. You can also read Luciano Ramalho’s explanation in Chapter 2 of Fluent Python (O’Reilly Media, 2015).
Python lists and dictionaries are values that can contain multiple other values. To access these values, you use an index operator, which is composed of a pair of square brackets ([ ]) and an integer called an index to specify which value you want to access. Enter the following into the interactive shell to see how indexing works with lists:
>>> spam = ['cat', 'dog', 'moose']
>>> spam[0]
'cat'
>>> spam[-2]
'dog'
In this example, 0 is an index. The first index is 0, not 1, because Python (as most languages do) uses zero-based indexing. Languages that use one-based indexing are rare: Lua and R are the most prominent. Python also supports negative indexes, where -1 refers to the last item in a list, -2 refers to the second-to-last item, and so on. You can think of a negative index spam[-n] as being the same as spam[len(spam) - n].
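You can check this equivalence yourself. The loop below is a small sketch that compares both forms for every valid negative index of a three-item list, and then shows that going one step too far raises an IndexError:

```python
spam = ['cat', 'dog', 'moose']

for n in range(1, len(spam) + 1):
    # Both expressions pick the same item, counting from the end of the list.
    assert spam[-n] == spam[len(spam) - n]
    print(-n, len(spam) - n, spam[-n])

# An index that is too far in either direction raises IndexError.
try:
    spam[-4]
    out_of_range = False
except IndexError:
    out_of_range = True
print(out_of_range)
```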
You can also use the index operator on a list literal, although all those square brackets can look confusing and unnecessary in real-world code:
>>> ['cat', 'dog', 'moose'][2]
'moose'
Indexing can also be used for values other than lists, such as on a string to obtain individual characters:
>>> 'Hello, world'[0]
'H'
Python dictionaries are organized into key-value pairs:
>>> spam = {'name': 'Zophie'}
>>> spam['name']
'Zophie'
Although list indexes are limited to integers, the value in a dictionary’s index operator is a key, which can be any hashable object. A hash is an integer that acts as a sort of fingerprint for a value. An object’s hash never changes for the lifetime of the object, and objects with the same value must have the same hash. The string 'name' in this instance is the key for the value 'Zophie'. The hash() function will return an object’s hash if the object is hashable. Immutable objects, such as strings, integers, floats, and tuples, can be hashable. Lists (as well as other mutable objects) aren’t hashable. Enter the following into the interactive shell:
>>> hash('hello')
-1734230105925061914
>>> hash(42)
42
>>> hash(3.14)
322818021289917443
>>> hash((1, 2, 3))
2528502973977326415
>>> hash([1, 2, 3])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
Although the details are beyond the scope of this book, the key’s hash is used to find items stored in a dictionary and set data structures. That’s why you can’t use a mutable list for a dictionary’s keys:
>>> d = {}
>>> d[[1, 2, 3]] = 'some value'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
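The rule that equal values must have equal hashes has a visible consequence for dictionary keys. In this short sketch (the lookup variable name is my own), the integer 1 and the float 1.0 are equal, so a dictionary treats them as the same key:

```python
print(1 == 1.0)                  # equal values...
print(hash(1) == hash(1.0))      # ...so their hashes must match

lookup = {}
lookup[1] = 'integer key'
lookup[1.0] = 'float key'        # same hash and equal value, so the same entry
print(lookup)
print(len(lookup))
```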
A hash is different from an identity. Two different objects with the same value will have different identities but the same hash. For example, enter the following into the interactive shell:
>>> a = ('cat', 'dog', 'moose')
>>> b = ('cat', 'dog', 'moose')
>>> id(a), id(b)
(37111992, 37112136)
1 >>> id(a) == id(b)
False
>>> hash(a), hash(b)
(-3478972040190420094, -3478972040190420094)
2 >>> hash(a) == hash(b)
True
The tuples referred to by a and b have different identities 1, but their identical values mean they’ll have identical hashes 2. Note that a tuple is hashable if it contains only hashable items. Because you can use only hashable items as keys in a dictionary, you can’t use a tuple that contains an unhashable list as a key. Enter the following into the interactive shell:
>>> tuple1 = ('cat', 'dog')
>>> tuple2 = ('cat', ['apple', 'orange'])
>>> spam = {}
1 >>> spam[tuple1] = 'a value'
2 >>> spam[tuple2] = 'another value'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
Notice that tuple1 is hashable 1, but tuple2 contains an unhashable list 2 and so is also unhashable.
The words container, sequence, and mapping have meanings in Python that don’t necessarily apply to other programming languages. In Python, a container is an object of any data type that can contain multiple other objects. Lists and dictionaries are common container types used in Python.
A sequence is an object of any container data type with ordered values accessible through integer indexes. Strings, tuples, lists, and bytes objects are sequence data types. Objects of these types can access values using integer indexes in the index operator (the [ and ] brackets) and can also be passed to the len() function. By “ordered,” we mean that there is a first value, second value, and so on in the sequence. For example, the following two list values aren’t considered equal because their values are ordered differently:
>>> [1, 2, 3] == [3, 2, 1]
False
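If you want to test whether an object behaves like a sequence, the collections.abc module offers one way to ask. This is only a sketch: the abstract base classes there use a narrower, more technical definition than the informal ones in this chapter (Container, for example, only checks for support of the in operator), and the samples list is my own.

```python
from collections.abc import Container, Sequence

samples = ['hello', ('cat', 'dog'), [1, 2, 3], b'\x41', {1, 2, 3}]
for sample in samples:
    # Prints the type name, then whether it is a Container and a Sequence.
    print(type(sample).__name__,
          isinstance(sample, Container),
          isinstance(sample, Sequence))
```

The set at the end of the list is a container but not a sequence, because its items have no integer positions.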
A mapping is an object of any container data type that uses keys instead of an index. A mapping can be ordered or unordered. Dictionaries in Python 3.5 and earlier are unordered because there is no first or last key-value pair in a dictionary:
>>> spam = {'a': 1, 'b': 2, 'c': 3, 'd': 4} # This is run from CPython 3.5.
>>> list(spam.keys())
['a', 'c', 'd', 'b']
>>> spam['e'] = 5
>>> list(spam.keys())
['e', 'a', 'c', 'd', 'b']
You have no guarantee of getting items in a consistent order from dictionaries in early versions of Python. As a result of dictionaries’ unordered nature, two dictionary literals written with different orders for their key-value pairs are still considered equal:
>>> {'a': 1, 'b': 2, 'c': 3} == {'c': 3, 'a': 1, 'b': 2}
True
But starting in CPython 3.6, dictionaries do retain the insertion order of their key-value pairs:
>>> spam = {'a': 1, 'b': 2, 'c': 3, 'd': 4} # This is run from CPython 3.6.
>>> list(spam)
['a', 'b', 'c', 'd']
>>> spam['e'] = 5
>>> list(spam)
['a', 'b', 'c', 'd', 'e']
This is a feature in the CPython 3.6 interpreter but not in other interpreters for Python 3.6. All Python 3.7 interpreters support ordered dictionaries, which became standard in the Python language in 3.7. But just because a dictionary is ordered doesn’t mean that its items are accessible through integer indexes: spam[0] won’t evaluate to the first item in an ordered dictionary (unless by coincidence there is a key 0 for the first item). Ordered dictionaries are also considered the same if they contain the same key-value pairs, even if the key-value pairs are in a different order in each dictionary.
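If you do need the first item of an ordered dictionary, you can take it from an iterator rather than an index. The following sketch assumes Python 3.7 or later; the first_key, first_pair, and found names are my own.

```python
spam = {'a': 1, 'b': 2, 'c': 3}

first_key = next(iter(spam))             # first key in insertion order
first_pair = next(iter(spam.items()))    # first key-value pair
print(first_key, first_pair)

try:
    spam[0]                              # 0 is looked up as a key, not a position
    found = True
except KeyError:
    found = False
print(found)

print(spam == {'c': 3, 'b': 2, 'a': 1})  # order is ignored by ==
```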
The collections module contains many other mapping types, including OrderedDict, ChainMap, Counter, and UserDict, which are described in the online documentation at https://docs.python.org/3/library/collections.html.
Dunder methods, also called magic methods, are special methods in Python whose names begin and end with two underscores. These methods are used for operator overloading. Dunder is short for double underscore. The most familiar dunder method is __init__() (pronounced “dunder init dunder,” or simply “init”), which initializes objects. Python has a few dozen dunder methods, and Chapter 17 explains them in detail.
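You can get a feel for dunder methods without writing a class, because the built-in types already have them. As a rough sketch, each line below pairs an ordinary operation with the dunder method that roughly corresponds to it; you would rarely call these methods directly in real code.

```python
# len() relies on __len__()
print(len('hello'), 'hello'.__len__())

# The + operator relies on __add__()
print(2 + 3, (2).__add__(3))

# The in operator relies on __contains__()
print('cat' in ['cat', 'dog'], ['cat', 'dog'].__contains__('cat'))
```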
A module is a Python program that other Python programs can import so they can use the module’s code. The modules that come with Python are collectively called the Python Standard Library, but you can create your own modules as well. If you save a Python program as, for example, spam.py, other programs can run import spam to access the spam.py program’s functions, classes, and top-level variables.
A package is a collection of modules that you form by placing a file named __init__.py inside a folder. You use the folder’s name as the name of the package. Packages can contain multiple modules (that is, .py files) or other packages (other folders containing __init__.py files).
For more explanation and detail about modules and packages, check out the official Python documentation at https://docs.python.org/3/tutorial/modules.html.
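One way to tell a package from a plain module at runtime is to look for a __path__ attribute, which imported packages have. The sketch below assumes a standard CPython installation, where json is a package and math is a single module.

```python
import json     # a standard library package
import math     # a standard library module

# Packages have a __path__ attribute; plain modules don't.
print(json.__name__, hasattr(json, '__path__'))
print(math.__name__, hasattr(math, '__path__'))
```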
Functions and methods aren’t the only things that you can call in Python. Any object that implements the callable operator (the two parentheses, ()) is a callable object. For example, if you have a def hello(): statement, you can think of the code as a variable named hello that contains a function object. Using the callable operator on this variable calls the function in the variable: hello().
Classes are an OOP concept, and a class is an example of a callable object that isn’t a function or method. For example, the date class in the datetime module is called using the callable operator, as in the code datetime.date(2020, 1, 1). When the class object is called, the code inside the class’s __init__() method is run. Chapter 15 has more details about classes.
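The built-in callable() function reports whether an object can be called. Here is a brief sketch that tries it on a function, a class, a method, and an integer; the hello() function is the hypothetical one from the text, given a body for illustration.

```python
import datetime


def hello():
    return 'Hello!'


print(callable(hello))            # a function
print(callable(datetime.date))    # a class
print(callable('Hello!'.upper))   # a method
print(callable(42))               # a plain integer isn't callable

new_year = datetime.date(2020, 1, 1)   # calling the class creates an object
print(new_year)
```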
Functions are first-class objects in Python, meaning you can store them in variables, pass them as arguments in function calls, return them from function calls, and do anything else you can do with an object. Think of a def statement as assigning a function object to a variable. For example, you could create a spam() function that you can then call:
>>> def spam():
...     print('Spam! Spam! Spam!')
...
>>> spam()
Spam! Spam! Spam!
You can also assign the spam() function object to other variables. When you call the variable you’ve assigned the function object to, Python executes the function:
>>> eggs = spam
>>> eggs()
Spam! Spam! Spam!
These are called aliases, which are different names for existing functions. They’re often used when you need to rename a function but a large amount of existing code uses the old name, and it would be too much work to change it.
The most common use of first-class functions is so you can pass functions to other functions. For example, we can define a callTwice() function, which can be passed a function that needs to be called twice:
>>> def callTwice(func):
...     func()
...     func()
...
>>> callTwice(spam)
Spam! Spam! Spam!
Spam! Spam! Spam!
You could just write spam() twice in your source code. But you can pass any function to the callTwice() function at runtime rather than having to type the function call twice into the source code beforehand.
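The standard library relies on this ability all the time. In the sketch below, the built-in len function is handed to sorted() as its key argument, and two string methods are stored in a dictionary so the program can choose one at runtime. The animals list, the transforms dictionary, and the choice variable are illustrations of my own.

```python
animals = ['moose', 'cat', 'Zophie']

# The built-in len function is passed as an argument, not called here.
by_length = sorted(animals, key=len)
print(by_length)

# Functions can be stored in a dictionary and chosen at runtime.
transforms = {'upper': str.upper, 'lower': str.lower}
choice = 'upper'
print(transforms[choice]('spam'))
```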
Technical jargon is confusing enough, especially for terms that have related but distinct definitions. To make matters worse, languages, operating systems, and fields in computing might use different terms to mean the same thing or the same terms to mean different things. To communicate clearly with other programmers, you’ll need to learn the difference between the following terms.
Expressions are the instructions made up of operators and values that evaluate to a single value. A value can be a variable (which contains a value) or a function call (which returns a value). So, 2 + 2 is an expression that evaluates down to the single value of 4. But len(myName) > 4 is an expression as well, and so is myName.isupper() or myName == 'Zophie'. A value by itself is also an expression that evaluates to itself.
Statements are, effectively, all other instructions in Python. These include if statements, for statements, def statements, return statements, and so on. Statements do not evaluate to a value. Some statements can include expressions, such as an assignment statement like spam = 2 + 2 or an if statement like if myName == 'Zophie':.
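Python's built-in eval() and exec() functions draw the same line: eval() accepts only an expression and returns its value, whereas exec() runs statements. The following sketch shows the difference; the accepted and namespace names are my own, and you should never pass untrusted text to either function.

```python
print(eval('2 + 2'))                 # an expression evaluates to a value

try:
    eval('spam = 2 + 2')             # an assignment statement has no value
    accepted = True
except SyntaxError:
    accepted = False
print(accepted)

namespace = {}
exec('spam = 2 + 2', namespace)      # exec() runs statements
print(namespace['spam'])
```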
Although Python 3 uses a print() function, Python 2 instead has a print statement. The difference might seem like just the introduction of parentheses, but it’s important to note that the Python 3 print() function has a return value (which is always None), can be passed as an argument to other functions, and can be assigned to a variable. None of these actions are possible with statements. However, you can still use the parentheses in Python 2, as in the following interactive shell example:
>>> print 'Hello, world!' # run in Python 2
Hello, world!
1 >>> print('Hello, world!') # run in Python 2
Hello, world!
Although this looks like a function call 1, it’s actually a print statement with a string value wrapped in parentheses, the same way assigning spam = (2 + 2) is equivalent to spam = 2 + 2. In Python 2 and 3, you can pass multiple values to the print statement or print() function, respectively. In Python 3, this would look like the following:
>>> print('Hello', 'world') # run in Python 3
Hello world
But using this same code in Python 2 would be interpreted as passing a tuple of two string values in a print statement, producing this output:
>>> print('Hello', 'world') # run in Python 2
('Hello', 'world')
A statement and an expression composed of a function call have subtle but real differences.
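Because print() is an ordinary function in Python 3, you can do things with it that a statement never allowed. This sketch, run in Python 3, captures its return value and assigns the function to another name; the result and say names are my own.

```python
result = print('Hello, world!')    # a function call is an expression
print(result)                      # the return value of print() is None

say = print                        # the function object can be assigned
say('Hello', 'world', sep=', ')

for func in (print, len):          # functions can be stored in a tuple
    print(func.__name__)
```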
The terms block, clause, and body are often used interchangeably to refer to a group of Python instructions. A block begins with indentation and ends when that indentation returns to the previous indent level. For example, the code that follows an if or for statement is called the statement’s block. A new block is required following statements that end with a colon, such as if, else, for, while, def, class, and so on.
But Python does allow one-line blocks. This is valid, although not recommended, Python syntax:
if name == 'Zophie': print('Hello, kitty!')
By using the semicolon, you can also have multiple instructions in the if statement’s block:
if name == 'Zophie': print('Hello, kitty!'); print('Do you want a treat?')
But you can’t have one-liners with other statements that require new blocks. The following isn’t valid Python code:
if name == 'Zophie': if age < 2: print('Hello, kitten!')
This is invalid because if an else statement appeared on the next line, it would be ambiguous which if statement the else statement belonged to.
The official Python documentation prefers the term clause rather than block (https://docs.python.org/3/reference/compound_stmts.html). The following code is a clause:
if name == 'Zophie':
    print('Hello, kitty!')
    print('Do you want a treat?')
The if statement is the clause header, and the two print() calls nested in the if are the clause suite or body. The official Python documentation uses block to refer to a piece of Python code that executes as a unit, such as a module, a function, or a class definition (https://docs.python.org/3/reference/executionmodel.html).
Variables are simply names that refer to objects. Attributes are, to quote the official documentation, “any name following a dot” (https://docs.python.org/3/tutorial/classes.html#python-scopes-and-namespaces). Attributes are associated with objects (the name before the dot/period). For example, enter the following into the interactive shell:
>>> import datetime
>>> spam = datetime.datetime.now()
>>> spam.year
2018
>>> spam.month
1
In this code example, spam is a variable that contains a datetime object (returned from datetime.datetime.now()), and year and month are attributes of that object. Even in the case of, say, sys.exit(), the exit() function is considered an attribute of the sys module object.
Other languages call attributes fields, properties, or member variables.
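Attributes can also be looked up by name with the built-in getattr() and hasattr() functions. The sketch below uses a fixed date rather than now() so the output is predictable; the 'colour' attribute is a made-up name that date objects don't have.

```python
import datetime

spam = datetime.date(2020, 1, 1)
print(spam.year)                    # attribute access with a dot
print(getattr(spam, 'month'))       # the same lookup, with the name as a string
print(hasattr(spam, 'day'), hasattr(spam, 'colour'))
```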
A function is a collection of code that runs when called. A method is a function (or a callable, described earlier in this chapter) that is associated with a class, just as an attribute is a variable associated with an object. Functions include built-in functions or functions associated with a module. For example, enter the following into the interactive shell:
>>> len('Hello')
5
>>> 'Hello'.upper()
'HELLO'
>>> import math
>>> math.sqrt(25)
5.0
In this example, len() is a function and upper() is a string method. Methods are also considered attributes of the objects they’re associated with. Note that a period doesn’t necessarily mean you’re dealing with a method instead of a function. The sqrt() function is associated with math, which is a module, not a class.
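You can see the link between a method and its class by reaching the method through the class itself. This is a small sketch of that idea, not something you'd normally write in real code:

```python
import math

print('Hello'.upper())         # method called through a string object
print(str.upper('Hello'))      # the same method, reached through the str class
print(hasattr(str, 'upper'))   # upper() is an attribute of the class

print(math.sqrt(25))           # function reached through a module
print(isinstance(math, type))  # a module object is not a class
```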
Python’s for loops are versatile. The statement for i in range(3): will run a block of code three times. The range(3) call isn’t just Python’s way of telling a for loop, “repeat some code three times.” Calling range(3) returns a range object, just like calling list('cat') returns a list object. Both of these objects are examples of iterable objects (or simply, iterables).
You use iterables in for
loops. Enter the following into the interactive shell to see a for
loop iterate over a range object and a list object:
>>> for i in range(3):
...     print(i) # body of the for loop
...
0
1
2
>>> for i in ['c', 'a', 't']:
...     print(i) # body of the for loop
...
c
a
t
Iterables include all sequence types, such as range, list, tuple, and string objects, as well as some container objects, such as dictionary, set, and file objects.
However, more is going on under the hood in these for loop examples. Behind the scenes, Python is calling the built-in iter() and next() functions for the for loop. When used in a for loop, iterable objects are passed to the built-in iter() function, which returns iterator objects. Although the iterable object contains the items, the iterator object keeps track of which item is next to be used in a loop. On each iteration of the loop, the iterator object is passed to the built-in next() function to return the next item in the iterable. We can call the iter() and next() functions manually to directly see how for loops work. Enter the following into the interactive shell to perform the same instructions as the previous loop example:
>>> iterableObj = range(3)
>>> iterableObj
range(0, 3)
>>> iteratorObj = iter(iterableObj)
>>> i = next(iteratorObj)
>>> print(i) # body of the for loop
0
>>> i = next(iteratorObj)
>>> print(i) # body of the for loop
1
>>> i = next(iteratorObj)
>>> print(i) # body of the for loop
2
>>> i = next(iteratorObj)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
Notice that if you call next() after the last item in the iterable has been returned, Python raises a StopIteration exception. Instead of crashing your programs with this error message, Python’s for loops catch this exception to know when they should stop looping.
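To make this concrete, here is a rough sketch of what a for loop does on your behalf, written with a while loop and a try-except statement. The manual_for_loop() function is a hypothetical helper made up for this illustration; it isn’t how the interpreter is actually implemented.

```python
def manual_for_loop(iterable):
    """Collect the items of an iterable in the order a for loop would visit them."""
    items = []
    iterator = iter(iterable)      # the for loop calls iter() once
    while True:
        try:
            item = next(iterator)  # and calls next() on every iteration
        except StopIteration:
            break                  # StopIteration ends the loop quietly
        items.append(item)         # stands in for the body of the for loop
    return items

print(manual_for_loop(range(3)))  # [0, 1, 2]
```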
An iterator can only iterate over the items in an iterable once. This is similar to how you can only use open() and readlines() to read the contents of a file once before having to reopen the file to read its contents again. If you want to iterate over the iterable again, you must call iter() again to create another iterator object. You can create as many iterator objects as you want; each will independently track the next item it should return. Enter the following into the interactive shell to see how this works:
>>> iterableObj = list('cat')
>>> iterableObj
['c', 'a', 't']
>>> iteratorObj1 = iter(iterableObj)
>>> iteratorObj2 = iter(iterableObj)
>>> next(iteratorObj1)
'c'
>>> next(iteratorObj1)
'a'
>>> next(iteratorObj2)
'c'
Remember that iterable objects are passed as an argument to the iter() function, whereas the object returned from iter() calls is an iterator object. Iterator objects are passed to the next() function. When you create your own data types with class statements, you can implement the __iter__() and __next__() special methods to use your objects in for loops.
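The following is one possible sketch of such a data type. The Countdown and CountdownIterator classes are hypothetical names I made up for this illustration. The iterable’s __iter__() method returns a fresh iterator each time, which is why you can loop over the same Countdown object more than once.

```python
class Countdown:  # hypothetical iterable class
    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # Called by iter(); returns a new iterator that tracks its own position.
        return CountdownIterator(self.start)

class CountdownIterator:  # hypothetical iterator class
    def __init__(self, current):
        self.current = current

    def __iter__(self):
        # Iterators conventionally return themselves from __iter__().
        return self

    def __next__(self):
        # Called by next(); raises StopIteration when the items run out.
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

countdown = Countdown(3)
first_pass = list(countdown)
second_pass = list(countdown)
print(first_pass, second_pass)  # [3, 2, 1] [3, 2, 1]
```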
There are many ways to categorize bugs. But at a high level you could divide programming errors into three types: syntax errors, runtime errors, and semantic errors.
Syntax is the set of rules for the valid instructions in a given programming language. A syntax error, such as a missing parenthesis, a period instead of a comma, or some other typo, will immediately generate a SyntaxError. Syntax errors are also known as parsing errors, which occur when the Python interpreter can’t parse the text of the source code into valid instructions. In English, this error would be the equivalent of having incorrect grammar or a string of nonsense words like, “by uncontaminated cheese certainly it’s.” Computers require specific instructions and can’t read the programmer’s mind to determine what the program should do, so a program with a syntax error won’t even run.
A runtime error is when a running program fails to perform some task, such as trying to open a file that doesn’t exist or dividing a number by zero. In English, a runtime error is the equivalent of giving an impossible instruction like, “Draw a square with three sides.” If a runtime error isn’t addressed, the program will crash and display a traceback. But you can catch runtime errors using try-except statements that run error handling code. For example, enter the following into the interactive shell:
>>> slices = 8
>>> eaters = 0
>>> print('Each person eats', slices / eaters, 'slices.')
This code will display this traceback when you run it:
Traceback (most recent call last):
  File "<pyshell#4>", line 1, in <module>
    print('Each person eats', slices / eaters, 'slices.')
ZeroDivisionError: division by zero
It’s helpful to remember that the line number the traceback mentions is only the point at which the Python interpreter detected an error. The true cause of the error might be on the previous line of code or even much earlier in the program.
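As a small sketch of the try-except approach mentioned earlier, the same division can be wrapped so that the program handles the ZeroDivisionError instead of crashing. The fallback message is my own wording, chosen only for illustration.

```python
slices = 8
eaters = 0

try:
    # This division raises ZeroDivisionError because eaters is 0.
    message = 'Each person eats ' + str(slices / eaters) + ' slices.'
except ZeroDivisionError:
    # The error handling code runs instead of the program crashing.
    message = 'Nobody is here to eat the slices.'

print(message)  # Nobody is here to eat the slices.
```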
Syntax errors in the source code are caught by the interpreter before the program runs, but syntax errors can also happen at runtime. The eval() function can take a string of Python code and run it, which might produce a SyntaxError at runtime. For example, eval('print("Hello, world)') is missing a closing double quote, which the program won’t encounter until the code calls eval().
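Because this SyntaxError is raised while the program is running, a try-except statement can catch it like any other exception. Here is a minimal sketch; the exact wording of the error message varies between Python versions, so the code prints it rather than assuming what it says.

```python
code_string = 'print("Hello, world)'  # missing the closing double quote

try:
    eval(code_string)
    caught = False
except SyntaxError as error:
    # The string is only parsed when eval() is called, so the
    # SyntaxError shows up at runtime and can be handled here.
    caught = True
    print('SyntaxError at runtime:', error.msg)
```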
A semantic error (also called a logical error) is a more subtle bug. Semantic errors won’t cause error messages or crashes, but the computer carries out instructions in a way the programmer didn’t intend. In English, the equivalent of a semantic error would be telling the computer, “Buy a carton of milk from the store and if they have eggs, buy a dozen.” The computer would then buy 13 cartons of milk because the store had eggs. For better or worse, computers do exactly what you tell them to. For example, enter the following into the interactive shell:
>>> print('The sum of 4 and 2 is', '4' + '2')
You would get the following output:
The sum of 4 and 2 is 42
Obviously, 42 isn’t the answer. But notice that the program didn’t crash. Because Python’s + operator adds integer values and concatenates string values, mistakenly using the string values '4' and '2' instead of integers caused unintended behavior.
Parameters are the variable names between the parentheses in a def statement. Arguments are the values passed in a function call, which are then assigned to the parameters. For example, enter the following into the interactive shell:
>>> def greeting(name, species):
...     print(name + ' is a ' + species)
...
>>> greeting('Zophie', 'cat')
Zophie is a cat
In the def statement, name and species are parameters. In the function call, 'Zophie' and 'cat' are arguments. These two terms are often confused with each other. Remember that parameters and arguments are just other names for variables and values, respectively, when they are used in this context.
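One way to see the difference is to ask Python for a function’s parameters with inspect.signature() from the standard library. In this sketch, greeting() returns its string instead of printing it, a small change of mine that makes the result easier to inspect.

```python
import inspect

def greeting(name, species):
    # name and species are parameters: names in the def statement.
    return name + ' is a ' + species

# Parameters belong to the function definition itself.
parameter_names = list(inspect.signature(greeting).parameters)

# Arguments are the values supplied by a particular function call.
result = greeting('Zophie', 'cat')

print(parameter_names)  # ['name', 'species']
print(result)           # Zophie is a cat
```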
You can convert an object of one type to an object of another type. For example, int('42') converts a string '42' to an integer 42. In actuality, the string object '42' isn’t converted so much as the int() function creates a new integer object based on the original object. When conversion is done explicitly like this, we’re casting the object, although programmers often still refer to this process as converting the object.
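A short sketch can confirm that the original object is left alone. The variable names here are hypothetical and chosen only for this illustration.

```python
original = '42'
converted = int(original)  # creates a new int object based on the string

print(type(original).__name__, type(converted).__name__)  # str int
print(original == '42')  # True: the string object itself is unchanged
```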
Python will often implicitly do a type conversion, such as when evaluating the expression 2 + 3.0 to 5.0. Values, such as the 2 and 3.0, are coerced to a common data type that the operator can work with. This conversion, which is done implicitly, is called type coercion.
Coercion can sometimes lead to surprising results. The Boolean True and False values in Python can be coerced to the integer values 1 and 0, respectively. Although you’d never write Booleans as those values in real-world code, this means that the expression True + False + True is the equivalent of 1 + 0 + 1 and evaluates to 2. After learning this, you might think that passing a list of Booleans to sum() would be a good way to count the number of True values in a list. But it turns out that calling the count() list method is faster.
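The following sketch shows these coercions side by side. The flags list is a hypothetical example; both counting approaches give the same answer for a list that contains only Booleans.

```python
mixed = 2 + 3.0               # the int 2 is coerced to a float
total = True + False + True   # Booleans are coerced to 1 and 0

flags = [True, False, True, True]  # hypothetical sample data
count_with_sum = sum(flags)
count_with_method = flags.count(True)

print(mixed, total)                       # 5.0 2
print(count_with_sum, count_with_method)  # 3 3
```

Keep in mind that because 1 == True in Python, count(True) would also count any integer 1 values in a list that mixes types.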
In many languages, the terms property and attribute are used synonymously, but in Python these words have distinct meanings. An attribute, explained in “Variable vs. Attribute” earlier in this chapter, is a name associated with an object. Attributes include the object’s member variables and methods.
Other languages, such as Java, have getter and setter methods for classes. Instead of being able to directly assign an attribute a (potentially invalid) value, a program must call the setter method for that attribute. The code inside the setter method can ensure that the member variable only has a valid value assigned to it. The getter method reads an attribute’s value. If an attribute is named, say, accountBalance, the setter and getter methods are usually named setAccountBalance() and getAccountBalance(), respectively.
In Python, properties allow programmers to use getters and setters with much cleaner syntax. Chapter 17 explores Python properties in more detail.
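As a brief preview, and only a sketch, here is roughly what a property can look like. The Account class and its validation rule are hypothetical and made up for this illustration. Code that uses the class reads and assigns balance like an ordinary attribute, but the getter and setter methods run behind the scenes.

```python
class Account:  # hypothetical class, for illustration only
    def __init__(self, balance):
        self.balance = balance  # this assignment goes through the setter below

    @property
    def balance(self):
        # Getter: runs when code reads account.balance.
        return self._balance

    @balance.setter
    def balance(self, value):
        # Setter: runs when code assigns to account.balance.
        if value < 0:
            raise ValueError('balance cannot be negative')
        self._balance = value

account = Account(100)
account.balance = 250  # looks like plain attribute assignment

try:
    account.balance = -1  # the setter rejects the invalid value
    rejected = False
except ValueError:
    rejected = True

print(account.balance, rejected)  # 250 True
```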
Source code is compiled into a form of instructions called machine code that the CPU directly carries out. Machine code is composed of instructions from the CPU’s instruction set, the computer’s built-in set of commands. A compiled program composed of machine code is called a binary. A venerable language like C has compiler software that can compile C source code into binaries for almost every CPU available. But if a language such as Python wants to run on the same set of CPUs, a large amount of work would have to go into writing Python compilers for each of them.
There is another way of turning source code into machine-usable code. Instead of creating machine code that is carried out directly by CPU hardware, you could create bytecode. Also called portable code or p-code, bytecode is carried out by a software interpreter program instead of directly by the CPU. Python bytecode is composed of instructions from an instruction set, although no real-world hardware CPU carries out these instructions. Instead, the software interpreter executes the bytecode. Python bytecode is stored in the .pyc files you sometimes see alongside your .py source files (in Python 3, inside a __pycache__ folder). The CPython interpreter, which is written in C, can compile Python source code into Python bytecode and then carry out the instructions. (The same goes for the Java Virtual Machine [JVM] software, which carries out Java bytecode.) Because CPython is written in C, it can be compiled for any CPU that already has a C compiler.
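If you’re curious what bytecode looks like, the standard library’s dis module can list the instructions CPython compiled for a function. This is only a sketch: add_one() is a hypothetical function, and the exact instruction names differ between Python versions, so the code prints them rather than assuming particular ones.

```python
import dis

def add_one(number):  # hypothetical function to disassemble
    return number + 1

# Each instruction comes from the interpreter's instruction set, not a CPU's.
instructions = list(dis.get_instructions(add_one))
opnames = [instruction.opname for instruction in instructions]
print(opnames)

# The raw bytecode is stored as a bytes object on the function's code object.
raw_bytecode = add_one.__code__.co_code
print(type(raw_bytecode).__name__, len(raw_bytecode))
```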
The PyCon 2016 talk, “Playing with Python Bytecode” by Scott Sanderson and Joe Jevnik, is an excellent resource to learn more about this topic (https://youtu.be/mxjv9KqzwjI).
The differences between a script and a program, or even a scripting language and a programming language, are vague and arbitrary. It’s fair to say that all scripts are programs and all scripting languages are programming languages. But scripting languages are sometimes regarded as easier or “not real” programming languages.
One way to distinguish scripts from programs is by how the code executes. Scripts written in scripting languages are interpreted directly from the source code, whereas programs written in programming languages are compiled into binaries. But Python is commonly thought of as a scripting language, even though there is a compilation step to bytecode when a Python program is run. Meanwhile, Java isn’t commonly thought of as a scripting language, even though it produces bytecode instead of machine code binaries, just like Python. Technically, languages aren’t compiled or interpreted; rather, there are compiler or interpreter implementations of a language, and it’s possible to create a compiler or interpreter for any language.
The differences can be argued but ultimately aren’t very important. Scripting languages aren’t necessarily less powerful, nor are compiled programming languages more difficult to work with.
Using other people’s code is a great time-saver. You can often find code to use packaged as libraries, frameworks, SDKs, engines, or APIs. The differences between these entities are subtle but important.
A library is a generic term for a collection of code made by a third party. A library can contain functions, classes, or other pieces of code for a developer to use. A Python library might take the form of a package, a set of packages, or even just a single module. Libraries are often specific to a particular language. The developer doesn’t need to know how the library code works; they only need to know how to call or interface with the code in a library. A standard library, such as the Python standard library, is a code library that is assumed to be available to all implementations of a programming language.
A framework is a collection of code that operates with inversion of control; the developer creates functions that the framework will call as needed, as opposed to the developer’s code calling functions in the framework. Inversion of control is often described as “don’t call us, we’ll call you.” For example, writing code for a web app framework involves creating functions for the web pages that the framework will call when a web request comes in.
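To illustrate inversion of control, here is a toy sketch of a web-style framework. TinyFramework and everything in it are hypothetical names invented for this example; it isn’t the API of any real framework. The developer only writes home_page(), and the framework decides when to call it.

```python
class TinyFramework:  # hypothetical framework, for illustration only
    def __init__(self):
        self.routes = {}

    def route(self, path):
        # Returns a decorator that registers the developer's function.
        def register(handler):
            self.routes[path] = handler
            return handler
        return register

    def handle_request(self, path):
        # The framework, not the developer's code, calls the handler.
        handler = self.routes.get(path)
        if handler is None:
            return '404 Not Found'
        return handler()

app = TinyFramework()

@app.route('/')
def home_page():
    # Developer-written function: "don't call us, we'll call you."
    return 'Welcome!'

print(app.handle_request('/'))         # Welcome!
print(app.handle_request('/missing'))  # 404 Not Found
```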
A software development kit (SDK) includes code libraries, documentation, and software tools to assist in creating applications for a particular operating system or platform. For example, the Android SDK and iOS SDK are used to create mobile apps for Android and iOS, respectively. The Java Development Kit (JDK) is an SDK for creating applications for the JVM.
An engine is a large, self-contained system that can be externally controlled by the developer’s software. Developers usually call functions in an engine to perform a large, complex task. Examples of engines include game engines, physics engines, recommendation engines, database engines, chess engines, and search engines.
An application programming interface (API) is the public-facing interface for a library, SDK, framework, or engine. The API specifies how to call the functions or make requests of the library to access resources. The library creators will (hopefully) make documentation for the API available. Many popular social networks and websites make an HTTP API available so that programs, rather than a human with a web browser, can access their services. Using these APIs allows you to write programs that can, for example, automatically post on Facebook or read Twitter timelines.
It’s easy to program for years and still be unfamiliar with certain programming terms. But most major software applications are created by teams of software developers, not individuals. So being able to communicate unambiguously is important when you’re working with a team.
This chapter explained that Python programs are made up of identifiers, variables, literals, keywords, and objects, and that all Python objects have a value, data type, and identity. Although every object has a data type, there are also several broad categories of types, such as container, sequence, mapping, set, built-in, and user-defined.
Some terms, like values, variables, and functions, have different names in specific contexts, such as items, parameters, arguments, and methods. Several terms are also easy to confuse with each other. It’s not a big deal to confuse some of these terms in day-to-day programming: for example, property versus attribute, block versus body, exception versus error, or the subtle differences between library, framework, SDK, engine, and API. Other misunderstandings won’t make the code you write wrong but might make you look unprofessional: for example, statement and expression, function and method, and parameter and argument are commonly used interchangeably by beginners. But other terms, such as iterable versus iterator, syntax error versus semantic error, and bytecode versus machine code, have distinct meanings that you should never confuse with each other unless you want to confuse your colleagues.
You’ll still find that the use of terms varies from language to language and even programmer to programmer. With time, experience, and frequent web searches, you’ll become more familiar with the jargon.
The official Python glossary at https://docs.python.org/3/glossary.html lists short but helpful definitions the Python ecosystem uses. The official Python documentation at https://docs.python.org/3/reference/datamodel.html describes Python objects in greater detail.
Nina Zakharenko’s PyCon 2016 talk, “Memory Management in Python—The Basics,” at https://youtu.be/F6u5rhUQ6dU, explains many details about how Python’s garbage collector works. The official Python documentation at https://docs.python.org/3/library/gc.html has more information about the garbage collector.
The Python mailing list discussion about making dictionaries ordered in Python 3.6 makes for good reading as well and is at https://mail.python.org/pipermail/python-dev/2016-September/146327.html.