The Invent with Python Blog

Mon 22 October 2018

A Curriculum for Python Packaging

Posted by Al Sweigart in python   

Python's packaging ecosystem contains multitudes. It can be intimidating for new Python developers to try to crack into, especially since the history of Python packaging is one of rapid evolution. Writing a *helloworld.py* file and running it on your computer is simple, but getting it to run on someone else's computer (and doing this the "right" way) involves a tangle of terms, tools, and techniques. What are wheel files? What is distutils? Do I use distutils or easy_install or pip?

To get to the bottom of this myself, I've compiled a curriculum of PyCon talks, online documentation, and my own personal notes to finally get a complete handle on Python packaging. The following list is my recommended order: start from the beginning, and continue until whatever level of satisfaction you require. I've tried to put the easy/simple/quick resources first, and the rest become progressively more detailed and specialized.

I've also added my own summary and notes, which you may or may not find useful. If you have any comments about this curriculum, please email me or message me (and follow me!) on Twitter.

NOTE: The information in the official Python documentation for the distutils module is out of date (note that it says "Legacy version" in the title), as well as the Installing Python Modules (Legacy version) documentation. The latter was replaced by Installing Python Modules.

Python Packaging Curriculum

  1. Dustin Ingram - Inside the Cheeseshop: How Python Packaging Works - PyCon 2018
  2. Dave Forgac Share Your Code! Python Packaging Without Complication PyCon 2017
  3. Packaging a python library - Ionel Cristian Mărieș's Blog
  4. Testing & Packaging - Hynek Schlawack's Blog
  5. Grug make fire! Grug make wheel! by Russell Keith-Magee - PyCon Australia 2014
  6. Glyph - Shipping Software To Users With Python - PyCon 2016
  7. Python Packaging User Guide
  8. Mahmoud Hashemi - BayPiggies September 2017 at LinkedIn: The Packaging Gradient
  9. Kenneth Reitz - Pipenv: The Future of Python Dependency Management - PyCon 2018

1 - Dustin Ingram - Inside the Cheeseshop: How Python Packaging Works - PyCon 2018

Link to YouTube video.

My summary of this talk: A great history lesson on the Python Package Index (PyPI), its previous incarnations, and the distutils/setuptools/easy_install/twine tools for getting your packages on it.

  • History: The first package index was a web site called The Vaults of Parnassus, which was a simple, fairly unstructured, GeoCities-looking website.
  • Then came distutils (“distribution utilities”) in Python 1.6, which gave us python setup.py ... It had a “just use Python code instead of a domain-specific language or anything fancy” attitude; very barebones.
  • python setup.py build became the standard, makefile-like way to build packages. To package source code for sharing, we used the now-familiar python setup.py sdist (“source distributions”)
  • python setup.py bdist (“built distributions”) would package up pre-built code for your OS/platform for you.
  • The problem was, before running python setup.py bdist, how do you get everything together for the bdist part? This was solved on Linux by package managers like RPM, but not on Windows/macOS/obscure platforms.
  • To solve this, the Python Package Index was made as a general Python package manager for all platforms.
  • PyPI is “Pie-pea-eye”, not “Pie-pie”. “Cheese shop” was the old name for PyPI.
  • setuptools was a monkey-patch (bad idea) to distutils to install package dependencies.
  • easy_install introduced the “Egg Distribution”. Egg files are just zip files with some metadata files included in them. (Pythons, the snake, lay eggs.)
  • easy_install couldn’t uninstall stuff or tell you the packages you had installed.
  • pyinstall was created to solve these problems. You’ve probably never heard of this, because it was soon renamed to pip, which stands for “PIP Installs Packages”.
  • Pip ignores eggs, and only handles source distributions. Pip begins to be used to install applications, not just library modules.
  • Pip introduces requirements.txt, which lists the packages (and the specific version of each) that a project depends on: pip install -r requirements.txt installs all the dependencies listed in the file.
  • Pip mainly depended on installing from whoever was hosting the package from their website. This was a problem; this random person’s website could be slow, or could be hacked so that it spread malware when you installed from it. So PyPI begins hosting modules on its own website.
  • There are still problems with source distributions that were solved with built distributions, but the Egg format was poorly defined, so the Wheel file became the new Egg file: a way to distribute built modules. Wheel files (named after cheese wheels, a reference to the Cheese Shop) are like Egg files in that they are just zip files, but they are better defined (see PEP 427) and learned from the mistakes of easy_install and Egg files.
  • Twine came out to solve the problem that python setup.py upload doesn’t use HTTPS. The name comes from using twine to bundle up boxes & packages. The name doesn’t really make a lot of sense. Twine doesn’t bundle stuff, so much as just upload already-bundled packages using encrypted HTTPS.
  • PyPI was showing its age, and so it was rewritten from scratch. This was the “Warehouse” project, which started in 2011 and became the new PyPI in 2018. Yay!
  • Current problems: Packaging is still kinda hard. There’s tons of tools and history. Read the Python Packaging Guide at https://packaging.python.org, and sampleproject at https://github.com/pypa/sampleproject (a nice skeleton project that follows best practices).
  • Current problems: Packaging is a little too easy; typo squatting and spamming PyPI. Solution: conda, a Python-agnostic packaging tool.
  • Current problems: Reproducible environments. Solution: pipfile and pipfile.lock from https://github.com/pypa/pipfile
  • Current problem: setup.py executes arbitrary code, which is a security/standards/maintenance concern.
  • Current problem: The “distutils/setuptools” dance; these are old/hard-to-maintain standard library modules. Updating them is hard to do. PEP 517 and PEP 518 detail solutions, which is where pyproject.toml files come in. These are files that specify dependencies, etc. in an agnostic way: distutils & setuptools can use pyproject.toml files to install dependencies, or some other tool can use them.
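
To make the "wheel files are just zip files" point concrete, here is a minimal sketch that builds a wheel-like archive in memory with the standard library's zipfile module and lists what's inside. The project name example_pkg, its version, and the metadata values are made up for illustration, and the result is not a complete, installable wheel (a real one also carries a RECORD file, among other things).

```python
import io
import zipfile

# Hypothetical project name and version, used only for illustration.
NAME, VERSION = "example_pkg", "1.0"
DIST_INFO = "{}-{}.dist-info".format(NAME, VERSION)


def build_wheel_like_archive():
    """Return the bytes of a zip archive laid out roughly like a wheel."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # The importable code sits at the top level of the archive.
        archive.writestr(NAME + "/__init__.py", "GREETING = 'hello'\n")
        # Metadata lives in a .dist-info folder next to the code.
        archive.writestr(
            DIST_INFO + "/METADATA",
            "Metadata-Version: 2.1\nName: {}\nVersion: {}\n".format(NAME, VERSION),
        )
        archive.writestr(
            DIST_INFO + "/WHEEL",
            "Wheel-Version: 1.0\nRoot-Is-Purelib: true\nTag: py3-none-any\n",
        )
    return buffer.getvalue()


def list_archive(data):
    """List the file names inside a zip archive given as bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return sorted(archive.namelist())


wheel_bytes = build_wheel_like_archive()
for name in list_archive(wheel_bytes):
    print(name)
```

You can point list_archive() at the bytes of any real .whl file you've downloaded to see the same kind of layout.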

2 - Dave Forgac Share Your Code! Python Packaging Without Complication PyCon 2017

Link to YouTube video.

My summary of this talk: This video goes into detail about setup.py files, but otherwise it’s great in listing things that you should go off and learn more about. It also touches on the associated things you need to get together for a package (docs, tests, continuous integration, etc). The slides for this talk are available at daveops.com/pycon2017

  • The things you need to share your code with other people:
    • Package the code
    • Documentation
    • Source Hosting (GitHub, etc.)
    • Tests
    • Continuous Integration (CI)
    • License
    • Contributing guide
  • This is a lot of stuff, but you can just use cookiecutter to make all of this stuff for you.
  • Terminology:
    • Module (Python code saved in a file)
    • Import Package (a folder that stores Python modules)
    • Distribution Package (a file that has bundled up your code into a shareable/installable file)
    • Source Distribution (source code that is shared with other people, including C source code for C extensions; these are built/compiled when the package is downloaded/installed)
    • Built Distribution (eggs and wheel files, wheel is the modern current one, these are pre-compiled so they only need to be unpacked when downloaded/installed)
  • Types of wheel files:
    • Universal wheels (contains only Python code that works with Python 2 and 3 and can be installed anywhere.)
    • Pure Python wheels (contains only Python code, but only for Python 2 or Python 3.)
    • Platform wheels (files that contain compiled code for a target platform/OS)
  • History (this was covered more in Dustin Ingram’s talk).
  • PyPA (Python Packaging Authority) starts creating standards for packaging in the Python ecosystem. PyPUG is the Python Packaging User Guide at https://packaging.python.org/
  • setup.py is just a Python file, but don’t do anything clever or add custom logic to it.
  • setup.cfg holds wheel settings and various other settings
  • MANIFEST.in will list any non-Python files (data files, configuration files, documentation, etc.) that need to be packaged up too.
  • README.rst is the Restructured Text file (note, not Markdown) that will be used for the package’s PyPI page.
  • PEP 440 covers the format to use for version strings.
  • setup.py is a file that basically just calls setuptools.setup().
  • (There’s a lot of details about the various keyword arguments in setup.py's setuptools.setup() function call.)
  • Use pip-tools to manage the requirements.txt file.
  • Don’t use setup.py's own upload feature, use twine instead.
  • Develop Mode (any file changes made to your source won’t have to be reinstalled with pip): pip install -e .
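
The three types of wheel files listed above can be told apart from the file name alone, since PEP 427 ends every wheel name with a Python tag, an ABI tag, and a platform tag. Here's a rough sketch of that idea. The classify_wheel() function and the example_pkg file names are my own inventions for illustration, not part of any packaging tool.

```python
def classify_wheel(filename):
    """Classify a wheel file name as 'universal', 'pure Python', or 'platform'."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file name: " + filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) < 5:
        raise ValueError("too few fields in wheel file name: " + filename)
    # The last three dash-separated fields are always the tags.
    python_tag, abi_tag, platform_tag = parts[-3:]
    if abi_tag == "none" and platform_tag == "any":
        # Compressed tag sets such as "py2.py3" are joined with dots.
        if {"py2", "py3"} <= set(python_tag.split(".")):
            return "universal"
        return "pure Python"
    return "platform"


# Hypothetical file names for a made-up project.
EXAMPLES = [
    "example_pkg-1.0-py2.py3-none-any.whl",
    "example_pkg-1.0-py3-none-any.whl",
    "example_pkg-1.0-cp36-cp36m-win_amd64.whl",
]
for name in EXAMPLES:
    print(name, "->", classify_wheel(name))
```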

3 - Packaging a python library - Ionel Cristian Mărieș's Blog

Link to the blog post.

My summary of this blog post: This is about packaging libraries for other Python software developers to use in their applications, not about packaging applications. It’s a good list of how to lay out all the different parts of a library.

  • Put your Python module in /src/packagename, not in /packagename. This gives you “import parity”: it forces you to install the package just like your users would have to. Otherwise, if you run Python from the root of your repo, import packagename would import /packagename just fine, but this means you don’t have to go through the install process that your users would and you could miss potential errors they’d face.
  • Don’t import your package from your setup.py file, i.e. your setup.py shouldn’t have import packagename in it. You don’t want to do this, because if your package imports other dependencies, those dependencies might not be installed yet and this causes distribution installation errors.
  • Having a /src folder lets you just add graft src to your MANIFEST.in file, which is simple.
  • “Flat is better than nested” but not for data.
  • Don’t put your tests folder inside your src or packagename folder.
  • Whenever the setup.py file opens a file, it specifies an encoding. Note that for python 2/3 compatibility, it uses io.open() instead of open(), since Python 2’s open() doesn’t have an encoding keyword argument.
  • Don’t use python setup.py test to run your tests; that’s outdated. Travis-CI has become the de facto standard for running tests after checking in code. Tox is a good way to run tests locally.
  • Ionel has a sample project layout here: https://github.com/ionelmc/python-nameless
  • Use cookiecutter to generate these files (tox.ini, MANIFEST.in, etc.)
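
The point about always specifying an encoding can be sketched with a small helper. This is my own minimal version, not Ionel's exact code: the read_text() name, its base_dir parameter, and the throwaway README are assumptions made for the demonstration.

```python
import io
import os
import tempfile


def read_text(base_dir, *parts, **kwargs):
    """Read a text file below base_dir, decoding it with an explicit encoding."""
    encoding = kwargs.get("encoding", "utf-8")
    path = os.path.join(base_dir, *parts)
    # io.open() accepts encoding= on both Python 2 and Python 3.
    with io.open(path, encoding=encoding) as handle:
        return handle.read()


# Demonstration with a throwaway README containing non-ASCII characters.
demo_dir = tempfile.mkdtemp()
with io.open(os.path.join(demo_dir, "README.rst"), "w", encoding="utf-8") as handle:
    handle.write(u"Written by Ionel Cristian M\u0103rie\u0219\n")

long_description = read_text(demo_dir, "README.rst")
print(len(long_description))
```

In a real setup.py, base_dir would typically be os.path.dirname(__file__), and the returned text would be passed along as the long_description argument.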

4 - Testing & Packaging - Hynek Schlawack's Blog

Link to the blog post.

My summary of this blog post: This blog article sort of assumes you are already familiar with tox and coverage.py. It doesn't really hold your hand with its examples. Though it did introduce me to detox, which is a drop-in replacement (rather, an addition) to tox that lets you run the different tox environment tests in parallel.

Put your modules in a separate src folder. This simplifies what you need in your setup.py:

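    from setuptools import find_packages, setup

    setup(
        # ... other arguments (name, version, etc.) elided
        packages=find_packages(where='src'),  # find packages under src/
        package_dir={'': 'src'},  # the root package namespace maps to src/
    )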

"Combined coverage" means not just measuring your code coverage with one version of Python, but across all the versions you test with tox. If you have 100% coverage in Python 3, but less with Python 2, then that affects your overall combined coverage percentage.

5 - Grug make fire! Grug make wheel! by Russell Keith-Magee - PyCon Australia 2014

Link to YouTube video.

My summary of this talk: Kind of old, being from 2014, but has some good basic information about setup.py. He also covers the difference between universal wheels and platform wheels. A good, basic overview of Python packaging, and I think this talk touched on things the other basic Python packaging talks didn’t touch on, so it makes sense to watch this along with other basic packaging talks.

  • Wheel files - Contains binary files & compiled code. This is much more ready-to-use than a source distribution (sdist).
  • Eggs were intended to be an executable file, not just a distribution file. Wheels are just for distribution, you don’t “execute a wheel file” or “run a wheel file”.
  • Russ goes into a project folder layout, which includes: README, LICENSE, package folder, docs folder, tests folder.
  • Other tools you might find useful: tox (for running unit tests on different versions of Python all at once), sphinx (for creating documentation, which can then be uploaded to readthedocs.org).
  • To put your module into a package, you need: setup.py, setup.cfg, MANIFEST.in.
  • Note: python setup.py bdist_wheel might fail with an error if the wheel package isn’t installed, because that package is what provides the bdist_wheel command. Running pip install wheel makes the bdist_wheel command available.
  • setup.cfg is an optional config file that holds options for setup.py so you don’t need to specify them as command line arguments.
  • MANIFEST.in describes which non-.py files (docs, tests, etc.) need to be in the distributable.
  • Use check-manifest to check the MANIFEST.in file.
  • Use bumpversion to be sure to update the version number in your file.
  • If your code is pure-Python, runs on both Python 2 & 3, and has no C extensions, you can create a “universal wheel” with python setup.py bdist_wheel --universal (or you can have a setup.cfg with universal set to 1 there).
  • If you do have C extensions, you need to create a platform wheel file.
  • If all else fails, you can use a source distribution python setup.py sdist, but any C extensions will have to be compiled by the user.
  • Don’t use python setup.py register because it passes in your username/password unencrypted. Instead, log in to PyPI with your web browser and create it through the website.
  • Twine uses encryption, but also lets you test your package that you created with python setup.py before uploading it with Twine.
  • Bootstrapping pip: If you don’t have it, then (for pre Python 3.4) you can run python get-pip.py or (for Python 3.4 and later) you can run python -m ensurepip. But pip should automatically be included with Python 3.4 and later.
  • Python 3.4 also includes venv, the standard library’s equivalent of virtualenv.
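
As a small illustration of the last two bullets, the standard library's venv module can create a virtual environment from Python code. This is just a sketch: the create_environment() helper and the temporary folder are my own choices, and I pass with_pip=False to keep it quick (with_pip=True asks ensurepip to install pip into the new environment).

```python
import os
import tempfile
import venv


def create_environment(parent_dir, name="env"):
    """Create a virtual environment under parent_dir and return its path."""
    env_dir = os.path.join(parent_dir, name)
    # with_pip=False skips bootstrapping pip, which keeps this sketch fast;
    # pass with_pip=True to have ensurepip install pip into the environment.
    venv.create(env_dir, with_pip=False)
    return env_dir


workspace = tempfile.mkdtemp()
env_path = create_environment(workspace)
# Every environment made by venv contains a pyvenv.cfg file at its root.
print(os.path.isfile(os.path.join(env_path, "pyvenv.cfg")))
```

From the command line, python -m venv env does the same job.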

6 - Glyph - Shipping Software To Users With Python - PyCon 2016

Link to YouTube video.

My summary of this talk: Getting software onto a production server or running on end user’s desktop still requires a few different steps, but the tools for doing this are out there and exist. This talk touches on many of those tools and the issues in using them. This talk also has a great set of “bad practices” that can help you avoid making some common mistakes.

  • “Package” in most software senses (like a Red Hat Package Manager package) is called a “distribution” in Python (specifically a “distutils distribution”). “Package” in Python refers to those folders that contain an __init__.py and __main__.py and so on. They’re more of a namespace.
  • A distutils distribution comes, for example, in a wheel file.
  • PYTHONPATH was used in the past, but don’t use it today.
  • Don’t use sudo pip install. Don’t install stuff using your operating system’s Python installation (i.e. “System Python”).
  • Don’t use python setup.py install, instead use pip. For one, there’s no python setup.py uninstall
  • Virtualenv creates lightweight Python environments. They’re about a 1/10 of the full Python environment but safely isolated (for the most part).
  • Don’t have a C compiler on the server machine, since it’ll compile to different compiled bits in each deployment. The “Anna Karenina Principle” comes from a line in Tolstoy’s novel: “All happy families are alike, but all unhappy families are unhappy in different ways.” Also true of unhappy servers. Having a C compiler can also slow down your deployments. Don’t have build tools on your production server, which means you don’t want to ship source distributions to your servers.
  • For library module developers, requirements.txt should have exact versions specified.
  • Shipping software to end users is more rocky. (Let’s assume that Python is easy to install, and we can get pip.)
  • pip install --user can install stuff to the user’s home directory, but now they also have to configure their shell so that Python can find it. This is less than ideal for deploying applications.
  • GUI applications in Python: using py2app to get Python GUI applications on Macs. Py2App builds effectively a self-contained Python environment that is an application by itself.
  • py2exe and PyInstaller do the same thing on Windows.
  • cx_Freeze is cross-platform.
  • These tools that can distribute GUI desktop applications to users tend to over-optimize: they read the source for import statements so they only include them and not everything. This goes against Python’s dynamic nature; namespace packages (import zope.interface, import flufl.enum) will likely break.
  • You can just import all of these modules in your initial script. It’s kind of tedious.
  • PyBee’s Briefcase project is good for packaging up stuff for various platforms.
  • pynsist is great for making Windows installers.
  • My takeaway: The Python packaging ecosystem is large and intimidating, but there’s a lot of really cool tools that have been made for taking on this problem from a variety of angles. It can be hard to know where to start, and there seems to be a “last mile” problem: it’s still not obvious to non-developers how to install Python software.
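
To see why scanning source code for import statements goes against Python's dynamic nature, here's a simplified sketch using the ast module. It is not how py2app, py2exe, or cx_Freeze are actually implemented; the find_static_imports() function and the sample source are my own, and they only show that a purely static scan can't see a module that's imported dynamically.

```python
import ast

SAMPLE_SOURCE = '''
import json
from os import path
import importlib

plugin = importlib.import_module("csv")  # dynamic import, invisible to the scan
'''


def find_static_imports(source):
    """Return the top-level module names named in import statements."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module.split(".")[0])
    return found


print(sorted(find_static_imports(SAMPLE_SOURCE)))
```

The csv module never shows up in the result, which is the kind of gap that the "just import all of these modules in your initial script" workaround papers over.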

7 - Python Packaging User Guide

Link to documentation.

My summary of this website: This is the official Python packaging documentation, so it has complete information but may be a bit much all at once. I'd start here: Installing Packages

8 - Mahmoud Hashemi - BayPiggies September 2017 at LinkedIn: The Packaging Gradient

Link to YouTube video.

My summary of this talk: With some humor, Mahmoud covers a lot of the issues devs face when they don’t know about Python’s packaging ecosystem and start reinventing things from scratch. “The Packaging Gradient” is his concept of how deep into the packaging ecosystem you need to get depending on how complicated your software is. There’s info about Anaconda & packaging that I didn’t see anywhere else. But in general, not directly related to how to do Python packaging.

  • https://speakerdeck.com/mhashemi/the-packaging-gradient-extended-edition
  • “Packaging” is getting your code into some kind of file to send to other people.
  • Packaging is something developers don’t consider until the end of development, but this is a mistake and leads to reinventing existing tools.
  • Packaging a standalone Python module (one .py file). “Standalone” means it only imports from the standard library. For example, bottle.py (which inspired Flask (and the name “Flask”)) is a single file. Standalone modules are easy to distribute and integrate, you basically just copy the file.
  • “Vendoring” is including software you didn’t write in your software.
  • “Artifacts” are the files you want your build process to produce: .dll or .exe for compiled stuff, or a .zip file or .whl file for a Python distribution.
  • Packaging a pure-Python package: Note that “package” in Python means a folder with an __init__.py file in it. They only have Python files in them. For example: Django, requests. These are easy to install with pip.
  • PIP stands for “PIP Installs Packages”
  • “Distribution” is a file (like a zip file) that contains zero or more Python packages. (“Distribution” is what Python calls what most package managers like RPM would call a “package”.)
  • Distributions are built by setuptools with a setup.py file. Making a distribution file is great for pure-Python modules.
  • “You can have multiple distributions providing the same package”, for example, PIL and Pillow where someone forked PIL to make Pillow. PIL wasn’t updated and Pillow has PIL’s code but with a new distribution name on PyPI but the same package names. That is, you can run pip install pillow but still use the old from PIL import Image code you’ve always been using before Pillow.
  • He mentions the left-pad incident which is a good case study to read on: https://www.theregister.co.uk/2016/03/23/npm_left_pad_chaos/
  • “Python is slow” is misleading: One of Python’s strengths is its interoperability: after all, numpy isn’t slow.
  • If you distribute sdist (“source distributions”) you’ll be sending out uncompiled C code, which isn’t ready to use on the platform you’re installing to. We have wheel files: they have binary distributions (bdist) for compiled C code. Supports most Windows, Mac, and Linux. The “manylinux1” tag is used for the generic Linux platform that wheel files target.
  • (Wheel files replace egg files, so you can forget about egg files. Egg files are obsolete.)
  • Note: He has python setup.py sdist bdist_wheel upload as the modern way to build and upload a Python package, but these days Twine replaces the upload command. (This talk was in September 2017.)
  • But wheel files aren’t what you use to install, say, applications on your phone or whatever.
  • “PyPI is not an app store” (it’s free for one thing, but also Python is too general purpose for just apps). Installing by pip from PyPI requires a working Python & pip installation, an internet connection, preinstalled system libraries (like lxml requires libxml2), and build tools for target packages (like gcc/clang). But more than anything, pip requires a knowledgeable software developer. Pip isn’t that great for end users who don’t know how to debug pip’s install error messages.
  • So how do we ship applications like Sublime Text or EVE Online or whatever?
  • PEX (Python EXecutable) results in a single runnable file. No setup or install step. You can wget this file and then just run it. It uses Python’s zipimport module. (zipapp is more or less PEX.) PEX is good for distributing standalone Python scripts. A 15-minute lightning talk on PEX: “WTF is PEX?” https://www.youtube.com/watch?v=NmpnGhRwsu0 (The lightning talk is a bit hard to follow, but I did learn that python . runs the __main__.py file in the current folder.)
  • Anaconda has conda install your_application, for example, you can conda install postgres and conda install nginx or even install other versions of Python! Anaconda is a whole new ecosystem. It’s not just Python-specific.
  • Anaconda is a cross-platform, Python-first package management and social app store. It’s like a Python version of Steam, pkgsrc, and Nix/Enlambda.
  • Mahmoud likes to recommend conda install over pip install for non-developers.
  • Freezers: Provide installers, include Python. Dropbox, EVE Online, Civilization IV, any Kivy apps have a full runtime Python included with the app. Examples: cx_Freeze, PyInstaller, osnap, bbFreeze, py2exe, py2app, pynsist, nuitka.
  • Freezers represent the best option for consumer software.
  • “Enterprise” Freezer like Omnibus. Example: GitLab. Best for shipping applications with multiple components (services, queues, etc)
  • Userspace images - include their own environment.
  • Containers - like userspace images but also with sandboxing. Example: Flatpak/Snappy, Docker. These only work on Linux, require a container runtime, and have a “pull” step. Can easily be gigabytes in size.
  • Virtual machine - includes an entire OS kernel, along with everything containers have. Can work on any OS, is gigabytes in size. There is no “VM Image” app store, users have to download it directly from your site.
  • Hardware - ha! They just ship you the computer: they’re basically appliances that sit on a server rack. Think: routers that let you include your own Python code.
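
The "single runnable file" idea from the PEX bullet can be tried out with the standard library's zipapp module (this is zipapp, not PEX itself, so treat it as a sketch of the general concept). The build_and_run() helper, the app folder name, and the message are all made up for this example.

```python
import os
import subprocess
import sys
import tempfile
import zipapp


def build_and_run(message):
    """Bundle a one-file application into a .pyz archive, run it, return its output."""
    workspace = tempfile.mkdtemp()
    source_dir = os.path.join(workspace, "app")
    os.mkdir(source_dir)
    # __main__.py is the entry point Python runs for a folder or zip archive.
    with open(os.path.join(source_dir, "__main__.py"), "w") as handle:
        handle.write("print({!r})\n".format(message))
    archive_path = os.path.join(workspace, "app.pyz")
    zipapp.create_archive(source_dir, archive_path)
    # Running the archive is equivalent to: python app.pyz
    result = subprocess.run(
        [sys.executable, archive_path],
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.stdout.strip()


print(build_and_run("hello from a zipapp"))
```

The archive still needs a Python interpreter on the target machine, which is the difference between this approach and the freezers further down the gradient.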

9 - Kenneth Reitz - Pipenv: The Future of Python Dependency Management - PyCon 2018

Link to YouTube video.

  • Packaging history: We had the Cheeseshop, which was only an index of packages; you had to host your packages on your own website. Packages were installed with python setup.py install but you couldn’t uninstall them. Easy_install made this process easier, but there was still no uninstall. From 2010 onward, pip & virtualenv & requirements.txt files were the norm.
  • Ruby doesn’t have virtual environments because they can have multiple versions of a package installed at the same time, unlike Python.
  • Other communities (Node.js’s yarn & npm, PHP’s Composer, Rust’s Cargo, Ruby’s Bundler) all use a lockfile.
  • Venv has downsides: it’s hard for newcomers to understand, and it’s a manual process (though virtualenvwrapper helps with this).
  • The requirements.txt file has a problem: it can represent what you want installed, and it can represent what you need installed. (pip-tools was created to ease the pain of installing packages you need.)
  • pip freeze shows you a pre-flattened list of required packages: it includes the project’s dependencies, but also the dependencies’ dependencies. So it can be hard to see which packages are really required for your project and what packages are installed because they are dependencies of those required packages. (This is what we mean by “pre-flattened”; you can’t tell which is which between dependencies and dependencies’ dependencies.)
  • You could just specify the few packages you need, but then you have non-deterministic builds: the subdependencies will be whatever latest version is out there. So building one day may have different (that is, newer) subdependency packages installed. This uncertainty and different behavior can cause hard-to-debug bugs.
  • The requirements.txt file can be a lockfile and not a lockfile depending on how you use it. We need to split this up into two files.
  • The pipfile is a new standard from the PyPA (Python Packaging Authority) (specifically from Donald Stufft) that is replacing the requirements.txt file. It’s written in TOML (an .ini-like language)
  • Example pipfile:
    [[source]]
    url = "https://pypi.python.org/simple"
    verify_ssl = true
    name = "pypi"
    
    [packages]
    flask = "*"
    
    [dev-packages]
    pytest = "*"
    
  • Pipenv lets you use pipfile & pipfile.lock, it uses virtualenv but hides all the details of it from you. Pipenv mostly replaces virtualenvwrapper. The [dev-packages] section specifies what will be installed when pipenv install --dev is run.
  • Demo:
  • pipenv install requests (Creates the virtual environment, the pipfile, and the pipfile.lock file.)
    
  • Once this virtual environment is created by pipenv, you can run pipenv shell to activate it. The virtual environment exists in HOME/.virtualenvs. Running pipenv --venv will show you where the virtual environment files are stored.
  • The created pipfile file only has the requests library in it, while the pipfile.lock file has all of its subdependencies with exact version numbers (and hashes, in case the package maintainer changes it but reuses the version number) for each one.
  • According to https://github.com/pypa/pipenv/issues/598 you should commit pipfile.lock to source control.
  • Pipenv is very folder-specific. If you move the project folder to a new location, you’ll lose the virtual environment. But that’s okay, because you can just run pipenv install and it will use the existing Pipfile.lock file to install all the modules again.
  • Running pipenv graph will give a nice dependency tree.
  • Running pipenv check will check for security vulnerabilities.
  • Running pipenv install --deploy will do ???. It also checks if Pipfile and Pipfile.lock are in sync (they must be kept in sync with each other).
  • If you have an existing .venv folder in your project folder, then pipenv install will use it instead of the central folder for all virtual environments.
  • Running pipenv --three creates a virtual environment with Python 3. You can later run pipenv --two and it destroys the old virtual environment and creates a new one with Python 2.
  • Running pipenv lock -r will output a requirements.txt file.
  • Running pipenv sync will uninstall any packages you no longer need.
  • Pipenv does NOT replace setup.py. Pipenv is for applications, while setup.py is used for libraries. Kenneth would not check in the lockfile for libraries that could be targeting multiple versions of Python.
  • Don’t get pipenv confused with pyenv.
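
The difference between "what you want installed" and the flattened list that pip freeze shows can be sketched with a toy dependency graph. The graph below is hand-written and simplified for illustration (it is not read from PyPI or from a real lockfile), and the flatten() function is my own, not part of pip or pipenv.

```python
# Hand-written, simplified dependency graph for illustration only.
DEPENDENCIES = {
    "flask": ["werkzeug", "jinja2", "itsdangerous", "click"],
    "jinja2": ["markupsafe"],
    "werkzeug": [],
    "itsdangerous": [],
    "click": [],
    "markupsafe": [],
}


def flatten(top_level, graph):
    """Return every package reachable from the top-level requirements."""
    seen = set()
    stack = list(top_level)
    while stack:
        package = stack.pop()
        if package not in seen:
            seen.add(package)
            stack.extend(graph.get(package, []))
    return sorted(seen)


wanted = ["flask"]  # the kind of short list a Pipfile holds
installed = flatten(wanted, DEPENDENCIES)  # the kind of list pip freeze shows
print("wanted:   ", wanted)
print("installed:", installed)
```

Looking only at the flattened list, you can't tell that flask was the one package you actually asked for, which is the problem that splitting things into a Pipfile and a Pipfile.lock is meant to solve.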