A Curriculum for Python Packaging
Mon 22 October 2018 Al Sweigart
Python's packaging ecosystem contains multitudes. It can be intimidating for new Python developers to try to crack into, especially given the rapid evolution of Python packaging. Writing a *helloworld.py* file and running it on your computer is simple, but getting it to run on someone else's computer (and doing this the "right" way) involves a tangle of terms, tools, and techniques. What are wheel files? What is distutils? Do I use distutils or easy_install or pip?
To get to the bottom of this myself, I've compiled a curriculum of PyCon talks, online documentation, and my own personal notes to finally get a complete handle on Python packaging. The following list is my recommended order: start from the beginning, and continue until whatever level of satisfaction you require. I've tried to put the easy/simple/quick resources first, and the rest become progressively more detailed and specialized.
I've also added my own summary and notes, which you may or may not find useful. If you have any comments about this curriculum, please email me or message me (and follow me!) on Twitter. (Note that this blog post was written in October 2018. I might update this every few months (or years).)
NOTE: The information in the official Python documentation for the distutils module is out of date (note that it says "Legacy version" in the title), as well as the Installing Python Modules (Legacy version) documentation. The latter was replaced by Installing Python Modules.
Python Packaging Curriculum
- Dustin Ingram - Inside the Cheeseshop: How Python Packaging Works - PyCon 2018
- Dave Forgac - Share Your Code! Python Packaging Without Complication - PyCon 2017
- Packaging a python library - Ionel Cristian Mărieș's Blog
- Testing & Packaging - Hynek Schlawack's Blog
- Grug make fire! Grug make wheel! by Russell Keith-Magee - PyCon Australia 2014
- Glyph - Shipping Software To Users With Python - PyCon 2016
- Python Packaging User Guide
- Mahmoud Hashemi - BayPiggies September 2017 at LinkedIn: The Packaging Gradient
- Kenneth Reitz - Pipenv: The Future of Python Dependency Management - PyCon 2018
The following are also packaging-related, but I haven't had time to organize them into this curriculum:
- Elana Hashman - The Black Magic of Python Wheels - PyCon 2019
- Dave Forgac - Python Packaging from Init to Deploy - PyOhio 2015
- Samuel Roeca - Poetry: "dependency management and packaging made easy" - PyGotham 2019
- Andrew T Baker - 5 ways to deploy your Python web app in 2017 - PyCon 2017
- Katie McLaughlin - What is deployment, anyway? - DjangoCon 2021
- Chris Wilcox - Shipping your first Python package and automating future publishing - PyCon 2019
- Caitlin Rubin - Intentional Deployment: Best Practices for Feature Flag Management - PyCon 2019
- Hynek Schlawack - How to Write Deployment-friendly Applications - PyCon 2018
- Asheesh Laroia: Python packaging simplified, for end users, app developers - PyCon 2014
- Domen Kožar - Rethinking packaging, development and deployment - PyCon 2015
1 - Dustin Ingram - Inside the Cheeseshop: How Python Packaging Works - PyCon 2018
My summary of this talk: A great history lesson on the Python Package Index (PyPI), its previous incarnations, and the distutils/setuptools/easy_install/twine tools for getting your packages onto it.
- History: The first package index was a web site called The Vaults of Parnassus, which was a simple, fairly unstructured, GeoCities-looking website.
- Then came distutils (“distribution utilities”) in Python 1.6, which gave us `python setup.py ...`. It had a “just use Python code instead of a domain-specific language or anything fancy” attitude; very barebones. `python setup.py build` became the standard, makefile-like way to build packages.
- To package source code for sharing, we used the now-familiar `python setup.py sdist` (“source distribution”). `python setup.py bdist` (“built distribution”) would package up pre-built code for your OS/platform for you.
- The problem was, before running `python setup.py bdist`, how do you get everything together for the bdist part? This was solved on Linux by package managers like RPM, but not on Windows/macOS/obscure platforms.
- To solve this, the Python Package Index was made as a general Python package manager for all platforms.
- PyPI is “Pie-pea-eye”, not “Pie-pie”. “Cheese shop” was the old name for PyPI.
- `setuptools` was a monkey-patch (bad idea) to distutils to install package dependencies. `easy_install` introduced the “Egg Distribution”. Egg files are just zip files with some metadata files included in them. (Pythons, the snakes, lay eggs.)
- `easy_install` couldn’t uninstall stuff or tell you which packages you had installed. `pyinstall` was created to solve these problems. You’ve probably never heard of it, because it was soon renamed to `pip`, which stands for “Pip Installs Packages”.
- Pip ignores eggs, and only handles source distributions. Pip begins to be used to install applications, not just library modules.
- Pip introduces requirements.txt, which lists the packages (and the specific versions) that your project depends on: `pip install -r requirements.txt` installs all the dependencies listed in the file.
- Pip mainly depended on installing from whatever website was hosting the package. This was a problem: this random person’s website could be slow, or could be hacked so that it spread malware when you installed from it. So PyPI began hosting modules on its own website.
- There were still problems with source distributions that built distributions solved, but the Egg format was poorly defined, so the Wheel file became the new Egg file: a way to distribute built modules. Wheel files (named after cheese wheels, a reference to the Cheese Shop) are like Egg files in that they are just zip files, but they are better defined (see PEP 427) and learned from the mistakes of easy_install and Egg files.
- Twine came out to solve the problem that `python setup.py upload` doesn’t use HTTPS. The name comes from using twine to bundle up boxes & packages. The name doesn’t really make a lot of sense: Twine doesn’t bundle stuff so much as upload already-bundled packages over encrypted HTTPS.
- PyPI was showing its age, so it was rewritten from scratch. This was the “Warehouse” project, which started in 2011 and became the new PyPI in 2018. Yay!
- Current problems: Packaging is still kinda hard. There’s tons of tools and history. Read the Python Packaging Guide at https://packaging.python.org, and sampleproject at https://github.com/pypa/sampleproject (a nice skeleton project that follows best practices).
- Current problems: Packaging is a little too easy; typo squatting and spamming PyPI. Solution: conda, a Python-agnostic packaging tool.
- Current problems: Reproducible environments. Solution: pipfile and pipfile.lock from https://github.com/pypa/pipfile
- Current problem: setup.py executes arbitrary code, which is a security/standards/maintenance concern.
- Current problem: The “distutils/setuptools” dance; these are old/hard-to-maintain standard library modules. Updating them is hard to do. PEP 517 and PEP 518 detail solutions, which is where pyproject.toml files come in. These are files that specify dependencies, etc. in an agnostic way: distutils & setuptools can use pyproject.toml files to install dependencies, or some other tool can use them.
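As a sketch of where PEP 517/518 lead, here is what a minimal build-system table in a pyproject.toml might look like (the choice of setuptools as the backend is just one example):

```toml
# Minimal PEP 518 build-system table: tells pip what it needs to
# install before it can build this package, replacing the implicit
# assumption that setuptools is always present.
[build-system]
requires = ["setuptools", "wheel"]
build-backend = "setuptools.build_meta"
```

Because this is declarative data rather than executable code, any tool can read it, which addresses the “setup.py executes arbitrary code” concern above.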
2 - Dave Forgac - Share Your Code! Python Packaging Without Complication - PyCon 2017
My summary of this talk: This video goes into detail about setup.py files, but otherwise it’s great in listing things that you should go off and learn more about. It also touches on the associated things you need to get together for a package (docs, tests, continuous integration, etc). The slides for this talk are available at daveops.com/pycon2017
- The things you need to share your code with other people:
- Package the code
- Documentation
- Source Hosting (GitHub, etc.)
- Tests
- Continuous Integration (CI)
- License
- Contributing guide
- This is a lot of stuff, but you can just use cookiecutter to generate all of it for you.
- Terminology:
- Module (Python code saved in a file)
- Import Package (a folder that stores Python modules)
- Distribution Package (a file that has bundled up your code into a shareable/installable file)
- Source Distribution (source code that is shared with other people, including C source code for C extensions; these are built/compiled when the package is downloaded/installed)
- Built Distribution (eggs and wheel files, wheel is the modern current one, these are pre-compiled so they only need to be unpacked when downloaded/installed)
- Types of wheel files:
- Universal wheels (contains only Python code that works with Python 2 and 3 and can be installed anywhere.)
- Pure Python wheels (contains only Python code, but only for Python 2 or Python 3.)
- Platform wheels (files that contain compiled code for a target platform/OS)
- History (this was covered more in Dustin Ingram’s talk).
- PyPA (Python Packaging Authority) starts creating standards for packaging in the Python ecosystem. PyPUG is the Python Packaging User Guide at https://packaging.python.org/
- setup.py is just a Python file, but don’t do anything clever or add custom logic to it.
- setup.cfg holds wheel settings and various other settings
- MANIFEST.in will list any non-Python files (data files, configuration files, documentation, etc.) that need to be packaged up too.
- README.rst is the Restructured Text file (note, not Markdown) that will be used for the package’s PyPI page.
- PEP 440 covers the format to use for version strings.
- setup.py is a file that basically just calls `setuptools.setup()`.
- (There’s a lot of detail about the various keyword arguments passed in setup.py’s `setuptools.setup()` call.)
- Use pip-tools to manage the requirements.txt file.
- Don’t use setup.py's own upload feature, use twine instead.
- Develop Mode (file changes to your source take effect without reinstalling with pip): `pip install -e .`
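To make the setup.py discussion concrete, a barebones file along the lines the talk describes might look like this; the name, version, and dependency values are made-up placeholders, not from the talk:

```python
# A sketch of a minimal setup.py; every metadata value here is a
# placeholder. Real keyword arguments are documented by setuptools.
from setuptools import setup, find_packages

setup(
    name="mypackage",                  # distribution name on PyPI
    version="0.1.0",                   # PEP 440-style version string
    description="An example package",
    packages=find_packages(),          # discover import packages
    install_requires=["requests"],     # abstract dependencies
)
```

Remember the advice above: keep this file boring, with no clever custom logic.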
3 - Packaging a python library - Ionel Cristian Mărieș's Blog
My summary of this blog post: This is about packaging libraries for other Python software developers to use in their applications, not on packaging applications. It’s a good list of how to layout all the different parts of a library.
- Put your Python module in /src/packagename, not in /packagename. This gives you “import parity”: it forces you to install the package just like your users would have to. Otherwise, if you run Python from the root of your repo, `import packagename` would import the local /packagename folder just fine, but this means you don’t go through the install process that your users would, and you could miss potential errors they’d face.
- Don’t import your package from your setup.py file, i.e. your setup.py shouldn’t have `import packagename` in it. If your package imports other dependencies, those dependencies might not be installed yet, and this causes distribution installation errors.
- Having a /src folder lets you just add `graft src` to your MANIFEST.in file, which is simple.
- “Flat is better than nested,” but not for data.
- Don’t put your tests folder inside your src or packagename folder.
- Whenever the setup.py file opens a file, it specifies an encoding. For Python 2/3 compatibility, it uses `io.open()` instead of `open()`, since Python 2’s `open()` doesn’t have an `encoding` keyword argument.
- Don’t use `python setup.py test` to run your tests; that’s outdated. Travis-CI has become the de facto standard for running tests after checking in code. Tox is a good way to run tests locally.
- Ionel has a sample project layout here: https://github.com/ionelmc/python-nameless
- Use `cookiecutter` to generate these files (tox.ini, MANIFEST.in, etc.)
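The `io.open()` pattern from the list above, sketched as a tiny round-trip; the filename and text are made up for the example:

```python
# Python 2/3-compatible file I/O with an explicit encoding, as the
# setup.py pattern described above does: io.open() accepts encoding=
# on both Python 2 and Python 3, while Python 2's open() does not.
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "README.rst")

with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"caf\u00e9")  # a non-ASCII string

with io.open(path, "r", encoding="utf-8") as f:
    text = f.read()

print(text == u"caf\u00e9")  # True
```

On Python 3 alone, plain `open(path, encoding="utf-8")` does the same thing.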
4 - Testing & Packaging - Hynek Schlawack's Blog
My summary of this blog post: This blog article sort of assumes you are already familiar with tox and coverage.py. It doesn't really hold your hand with its examples. Though it did introduce me to detox, which is a drop-in replacement (rather, an addition) to tox that lets you run the different tox environment tests in parallel.
Put your modules in a separate src folder. This simplifies what you need in your setup.py:
```python
setup(
    ...
    packages=find_packages(where='src'),
    package_dir={'': 'src'},
)
```
"Combined coverage" means not just measuring your code coverage with one version of Python, but across all the versions you test with tox. If you have 100% coverage in Python 3, but less with Python 2, then that affects your overall combined coverage percentage.
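A sketch of the coverage.py configuration this kind of combined measurement relies on, placed in setup.cfg (the package name and the exact paths are illustrative; coverage.py reads sections prefixed with `coverage:` from setup.cfg):

```ini
; Run coverage separately in each tox environment, then map the files
; measured inside each virtualenv back onto the src/ tree so
; "coverage combine" can merge them into one report.
[coverage:run]
branch = True
parallel = True
source = mypackage

[coverage:paths]
source =
    src
    .tox/*/lib/python*/site-packages
```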
5 - Grug make fire! Grug make wheel! by Russell Keith-Magee - PyCon Australia 2014
My summary of this talk: Kind of old, being from 2014, but has some good basic information about setup.py. He also covers the difference between universal wheels and platform wheels. A good, basic overview of Python packaging, and I think this talk touched on things the other basic Python packaging talks didn’t touch on, so it makes sense to watch this along with other basic packaging talks.
- Wheel files - Contains binary files & compiled code. This is much more ready-to-use than a source distribution (sdist).
- Eggs were intended to be an executable file, not just a distribution file. Wheels are just for distribution, you don’t “execute a wheel file” or “run a wheel file”.
- Russ goes into a project folder layout, which includes: README, LICENSE, package folder, docs folder, tests folder.
- Other tools you might find useful: tox (for running unit tests on different versions of Python all at once), sphinx (for creating documentation, which can then be uploaded to readthedocs.org).
- To put your module into a package, you need: setup.py, setup.cfg, MANIFEST.in.
- Note: `python setup.py bdist_wheel` might come up with an error on some setups, because the bdist_wheel command comes from the separate wheel package; you can run `pip install wheel` to get the bdist_wheel command.
- setup.cfg is an optional config file that holds options for setup.py so you don’t need to specify them as command line arguments.
- MANIFEST.in describes what non-.py files (docs, tests, etc.) need to be in the distributable.
- Use check-manifest to check the MANIFEST.in file.
- Use bumpversion to be sure to update the version number in your files.
- If your code is pure Python, runs on both Python 2 & 3, and has no C extensions, you can create a “universal wheel” with `python setup.py bdist_wheel --universal` (or you can set universal to 1 in your setup.cfg).
- If you do have C extensions, you need to create a platform wheel.
- If all else fails, you can use a source distribution (`python setup.py sdist`), but any C extensions will have to be compiled by the user.
- Don’t use `python setup.py register`, because it passes your username/password unencrypted. Instead, log in to PyPI with your web browser and register the package through the website.
- Twine uses encryption, and it also lets you test the package you created with `python setup.py` before uploading it with Twine.
- Bootstrapping pip: If you don’t have it, then (for pre-Python 3.4) you can run `python get-pip.py`, or (for Python 3.4 and later) you can run `python -m ensurepip`. But pip should automatically be included with Python 3.4 and later.
- Python 3.4 also includes virtualenv (as the venv module).
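The setup.cfg equivalent of the `--universal` flag mentioned in the list above:

```ini
; Marks the built wheel as universal: pure Python, installable on
; both Python 2 and Python 3, on any platform.
[bdist_wheel]
universal = 1
```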
6 - Glyph - Shipping Software To Users With Python - PyCon 2016
My summary of this talk: Getting software onto a production server or running on end user’s desktop still requires a few different steps, but the tools for doing this are out there and exist. This talk touches on many of those tools and the issues in using them. This talk also has a great set of “bad practices” that can help you avoid making some common mistakes.
- “Package” in most software senses (like a Red Hat Package Manager package) is called a “distribution” in Python (specifically a “distutils distribution”). “Package” in Python refers to those folders that contain an __init__.py and __main__.py and so on. They’re more of a namespace.
- A distutils distribution comes, for example, in a wheel file.
- PYTHONPATH was used in the past, but don’t use it today.
- Don’t use `sudo pip install`. Don’t install stuff using your operating system’s Python installation (i.e. “System Python”).
- Don’t use `python setup.py install`; use pip instead. For one thing, there’s no `python setup.py uninstall`.
- Virtualenv creates lightweight Python environments. They’re about 1/10th the size of a full Python environment but safely isolated (for the most part).
- Don’t have a C compiler on the server machine, since it’ll compile to different compiled bits in each deployment. The “Anna Karenina principle” comes from a line in Tolstoy’s novel: “All happy families are alike; each unhappy family is unhappy in its own way.” Also true of unhappy servers. Having a C compiler can also slow down your deployments. Don’t have build tools on your production server, which means you don’t want to ship source distributions to your servers.
- For library module developers: requirements.txt should have exact versions specified.
- Shipping software to end users is rockier. (Let’s assume that Python is easy to install, and that we can get pip.)
- pip install --user can install stuff to the user’s home directory, but now they also have to configure their shell so that Python can find it. This is less than ideal for deploying applications.
- GUI applications in Python: using py2app to get Python GUI applications on Macs. Py2App builds effectively a self-contained Python environment that is an application by itself.
- py2exe and PyInstaller do the same thing on Windows.
- cx_Freeze is cross-platform.
- These tools that distribute GUI desktop applications to users tend to over-optimize: they read the source for import statements so they only include the modules that get imported, not everything. This goes against Python’s dynamic nature; namespace packages (`import zope.interface`, `import flufl.enum`) will likely break.
- You can just import all of these modules in your initial script. It’s kind of tedious.
- PyBee’s Briefcase project is good for packaging up stuff for various platforms.
- pynsist is great for making Windows installers.
- My takeaway: The Python packaging ecosystem is large and intimidating, but there’s a lot of really cool tools that have been made for taking on this problem from a variety of angles. It can be hard to know where to start, and there seems to be a “last mile” problem: it’s still not obvious to non-developers how to install Python software.
7 - Python Packaging User Guide
My summary of this website: This is the official Python packaging documentation, so it has complete information but may be a bit much all at once. I'd start here: Installing Packages
8 - Mahmoud Hashemi - BayPiggies September 2017 at LinkedIn: The Packaging Gradient
My summary of this talk: With some humor, Mahmoud covers a lot of the issues devs face when they don’t know about Python’s packaging ecosystem and start reinventing things from scratch. “The Packaging Gradient” is his concept of how deep into the packaging ecosystem you need to go depending on how complicated your software is. There’s info about Anaconda & packaging that I didn’t see anywhere else. But in general, it’s not directly about how to do Python packaging.
- https://speakerdeck.com/mhashemi/the-packaging-gradient-extended-edition
- “Packaging” is getting your code into some kind of file to send to other people.
- Packaging is something developers don’t consider until the end of development, but this is a mistake and leads to reinventing existing tools.
- Packaging a standalone Python module (one .py file). “Standalone” means it only imports from the standard library. For example, bottle.py (which inspired Flask (and the name “Flask”)) is a single file. Standalone modules are easy to distribute and integrate, you basically just copy the file.
- “Vendoring” is including software you didn’t write in your software.
- “Artifacts” are the files you want your build process to produce: .dll or .exe for compiled stuff, or a .zip file or .whl file for a Python distribution.
- Packaging a pure-Python package: Note that “package” in Python means a folder with an __init__.py file in it. Pure-Python packages only have Python files in them. For example: Django, requests. These are easy to install with pip.
- PIP stands for “PIP Installs Packages”
- “Distribution” is a file (like a zip file) that contains zero or more Python packages. (“Distribution” is what Python calls what most package managers like RPM would call a “package”.)
- Distributions are built by setuptools with a setup.py file. Making a distribution file is great for pure-Python modules.
- “You can have multiple distributions providing the same package”: for example, PIL and Pillow, where someone forked PIL to make Pillow. PIL wasn’t updated, and Pillow has PIL’s code under a new distribution name on PyPI but the same package name. That is, you can run `pip install pillow` but still use the old `from PIL import Image` code you’ve always been using before Pillow.
- “Python is slow” is misleading: One of Python’s strengths is its interoperability: after all, numpy isn’t slow.
- If you distribute an sdist (“source distribution”) you’ll be sending out uncompiled C code, which isn’t ready to use on the platform you’re installing to. That’s why we have wheel files: they are binary distributions (bdist) with compiled C code, supporting most Windows, Mac, and Linux platforms. The “manylinux1” tag is used for the generic Linux platforms that wheel files target.
- (Wheel files replace egg files, so you can forget about egg files. Egg files are obsolete.)
- Note: He gives `python setup.py sdist bdist_wheel upload` as the modern way to build and upload a Python package, but these days Twine replaces the upload command. (This talk was from September 2017.)
- “PyPI is not an app store” (it’s free for one thing, but also Python is too general purpose for just apps). Installing by pip from PyPI requires a working Python & pip installation, an internet connection, preinstalled system libraries (like lxml requires libxml2), and build tools for target packages (like gcc/clang). But more than anything, pip requires a knowledgeable software developer. Pip isn’t that great for end users who don’t know how to debug pip’s install error messages.
- So how do we ship applications like Sublime Text or EVE Online or whatever?
- PEX (Python EXecutable) results in a single runnable file. No setup or install step: you can wget this file and then just run it. It uses Python’s zipimport module. (zipapp is more or less PEX.) PEX is good for distributing standalone Python scripts. A 15-minute lightning talk on PEX: “WTF is PEX?” https://www.youtube.com/watch?v=NmpnGhRwsu0 (The lightning talk is a bit hard to follow, but I did learn that `python .` runs the __main__.py file in the current folder.)
runs the main.py file in the current folder) - Anaconda has
conda install your_application
, for example, you canconda install postgres
andconda install nginx
or even install other versions of Python! Anaconda is a whole new ecosystem. It’s not just Python-specific. - Anaconda is a cross-platform, Python-first package management and social app store. It’s like a Python version of Steam, pkgsrc, and Nix/Enlambda.
- Mahmoud likes to recommend `conda install` over `pip install` for non-developers.
for non-developers. - Freezers: Provide installers, include Python. Dropbox, EVE Online, Civilization IV, any Kivy apps have a full runtime Python included with the app. Examples: cx_Freeze, PyInstaller, osnap, bbFreee, py2exe, py2app, pynsist, nuitka.
- Freezers represent the best option for consumer software.
- “Enterprise” Freezer like Omnibus. Example: GitLab. Best for shipping applications with multiple components (services, queues, etc)
- Userspace images - include their own environment.
- Containers - like userspace images but also with sandboxing. Examples: Flatpak/Snappy, Docker. These only work on Linux, require a container runtime, and have a “pull” step. Can easily be gigabytes in size.
- Virtual machine - includes an entire OS kernel, along with everything containers have. Can work on any OS, is gigabytes in size. There is no “VM Image” app store, users have to download it directly from your site.
- Hardware - ha! They just ship you the computer: they’re basically appliances that sit on a server rack. Think: routers that let you include your own Python code.
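The PEX/zipapp idea from the list above can be sketched with the standard library’s zipapp module; the directory name and message here are made up for the example:

```python
# Build and run a single-file Python executable archive with the
# stdlib zipapp module -- conceptually similar to what PEX produces.
import pathlib
import subprocess
import sys
import tempfile
import zipapp

tmp = tempfile.mkdtemp()
src = pathlib.Path(tmp, "myapp")
src.mkdir()
# __main__.py is what runs when the archive itself is executed.
(src / "__main__.py").write_text('print("hello from a zipapp")\n')

target = pathlib.Path(tmp, "myapp.pyz")
zipapp.create_archive(str(src), str(target))

# The resulting .pyz runs directly; no install step is needed.
result = subprocess.run([sys.executable, str(target)],
                        capture_output=True, text=True)
print(result.stdout.strip())  # hello from a zipapp
```

This only bundles your own pure-Python code, though; tools like PEX build on the same zipimport mechanism but also handle dependencies.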
9 - Kenneth Reitz - Pipenv: The Future of Python Dependency Management - PyCon 2018
- Packaging history: We had the Cheeseshop, which was only an index of packages; you had to host your packages on your own website. Packages were installed with `python setup.py install`, but you couldn’t uninstall them. easy_install made this process easier, but there was still no uninstall. From 2010 onward, pip & virtualenv & requirements.txt files were the norm.
but you couldn’t uninstall them. Easy_install made this process easier, but there was still no uninstall. From 2010 onward, pip & virtualenv & requirements.txt files were the norm. - Ruby doesn’t have virtual environments because they can have multiple versions of a package installed at the same time, unlike Python.
- Other communities (Node.js’s yarn & npm, PHP’s Composer, Rust’s Cargo, Ruby’s Bundler) all use a lockfile.
- Venv has downsides: it’s hard for newcomers to understand, and it’s a manual process (though virtualenv-wrapper helps with this).
- The requirements.txt file has a problem: it can represent what you want installed, and it can represent what you need installed. (pip-tools was created to ease the pain of installing packages you need.)
- `pip freeze` shows you a pre-flattened list of required packages: it includes your project’s dependencies, but also the dependencies’ dependencies. So it can be hard to see which packages your project really requires and which are installed only because they are dependencies of those required packages. (This is what “pre-flattened” means: you can’t tell which is which.)
- You could just specify the few packages you need, but then you have non-deterministic builds: the subdependencies will be whatever the latest versions are at install time. So building one day may install different (that is, newer) subdependency packages than another. This uncertainty and different behavior can cause hard-to-debug bugs.
- The requirements.txt file can be a lockfile and not a lockfile depending on how you use it. We need to split this up into two files.
- The pipfile is a new standard from the PyPA (Python Packaging Authority) (specifically from Donald Stufft) that is replacing the requirements.txt file. It’s written in TOML (an .ini-like language)
- Example pipfile:
```toml
[[source]]
url = "https://pypi.python.org/simple"
verify_ssl = true
name = "pypi"

[packages]
flask = "*"

[dev-packages]
pytest = "*"
```
- Packages listed under [dev-packages] are installed when `pipenv install --dev` is run.
- `pipenv install requests` creates the virtual environment, the Pipfile, and the Pipfile.lock file.
- `pipenv --venv` will show you where the virtual environment files are stored.
- Run `pipenv install` and it will use the existing Pipfile.lock file to install all the modules again.
- `pipenv graph` will give a nice dependency tree.
- `pipenv check` will check for security vulnerabilities.
- `pipenv install --deploy` will do ??? It also checks if the Pipfile and Pipfile.lock are in sync (they must be kept in sync with each other).
- If there’s a `.venv` folder in your project folder, then `pipenv install` will use it instead of the central folder for all virtual environments.
- `pipenv --three` creates a virtual environment with Python 3. You can later run `pipenv --two` and it destroys the old virtual environment and creates a new one with Python 2.
- `pipenv lock -r` will output a requirements.txt file.
- `pipenv sync` will uninstall any packages you no longer need.

Updated 2021/12/12: New recorded talks added.