No Executable Is An Island

Single-file executables are neither necessary nor sufficient to provide a good end-user software installation experience. They don’t work at all on macOS today, but they don't really work great anywhere else either. The focus of Python packaging tool development ought to be elsewhere.

One of the perennial talking points in the Python packaging discourse is that it’s unnecessarily difficult to create a simple, single-file binary that you can hand to users.

This complaint is understandable. In most other programming languages, the first thing you sit down to do is to invoke the compiler, get an executable, and run it. Other, more recently created programming languages — particularly Go and Rust — have really excellent toolchains for doing this which eliminate a lot of classes of error during that build-and-run process. A single, dependency-free, binary executable file that you can run is an eminently comprehensible artifact, an obvious endpoint for “packaging” as an endeavor.

While Rust and Go produce these artifacts as effectively their only output, Python has such a dizzying array of tooling complexity that even attempting to describe “the problem” takes many thousands of words, and one may never get around to fully describing the issues involved in the course of those words. All the more understandable, then, that we should urgently add this functionality to Python.

A programmer might see Python produce wheels and virtualenvs which can break when anything in their environment changes, and see the complexity of that situation. Then, they see Rust produce a statically-linked executable which “just works”, and they see its toolchain simplicity. I agree that this shouldn’t be so hard, and some of the architectural decisions that make this difficult in Python are indeed unfortunate.

But then, I think, our hypothetical programmer gets confused. They think that Rust is simple because it produces an executable, and they think Python’s complexity comes from all the packaging standards and tools. But this is not entirely true.

Python’s packaging complexity, and indeed some of those packaging standards, arises from the fact that it is often used as a glue language. Packaging pure Python is really not all that hard. And although the tools aren’t included with the language and the DX isn’t as smooth, even packaging pure Python into a single-file executable is pretty trivial.
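To give a sense of how little is involved in the pure-Python case, here is a minimal sketch using the standard library's zipapp module. This is only one way to do it, and not necessarily the tool the paragraph above has in mind. The hello_app directory, the hello.pyz file name and the greeting are hypothetical, invented for the example, and the result still needs a Python interpreter on the machine that runs it.

```python
import subprocess
import sys
import tempfile
import zipapp
from pathlib import Path


def build_single_file_app(source_dir: Path, target: Path) -> Path:
    """Bundle a directory of pure-Python code into one runnable .pyz archive."""
    # The directory must contain a __main__.py, which becomes the entry point.
    zipapp.create_archive(source_dir, target=target, interpreter="/usr/bin/env python3")
    return target


def demo() -> str:
    """Build a throwaway app in a temp directory, run it, and return its output."""
    with tempfile.TemporaryDirectory() as tmp:
        source_dir = Path(tmp) / "hello_app"  # hypothetical project directory
        source_dir.mkdir()
        (source_dir / "__main__.py").write_text('print("hello from one file")\n')
        archive = build_single_file_app(source_dir, Path(tmp) / "hello.pyz")
        # Run it with the current interpreter so the demo doesn't rely on the shebang.
        result = subprocess.run(
            [sys.executable, str(archive)], capture_output=True, text=True, check=True
        )
        return result.stdout


if __name__ == "__main__":
    print(demo(), end="")
```

The sketch only works this easily because nothing in it has a compiled dependency or looks for data files next to its own source.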

But almost nobody wants to package a single pure-python script. The whole reason you’re writing in Python is because you want to rely on an enormous ecosystem of libraries, and some low but critical percentage of those libraries include things like their own statically-linked copies of OpenSSL or a few megabytes of FORTRAN code with its own extremely finicky build system you don’t want to have to interact with.

When you look aside to other ecosystems, while Python still definitely has some unique challenges, shipping Rust with a ton of FFI, or Go with a bunch of Cgo, is substantially more complex than the default out-of-the-box single-file “it just works” executable you get at the start.¹

Still, all things being equal, I think single-file executable builds would be nice for Python to have as a building block. It’s certainly easier to produce a package for a platform if your starting point is that you have a known-good, functioning single-file executable and all you need to do is embed it in some kind of environment-specific envelope. Certainly if what you want is to copy a simple microservice executable into a container image, you might really want to have this rather than setting up what is functionally a full Python development environment in your Dockerfile. After team-wide philosophical debates over what virtual environment manager to use, those Golang Dockerfiles that seem to be nothing but the following 4 lines are really appealing:

FROM golang
COPY *.go ./
RUN go build -o /app
ENTRYPOINT ["/app"]

All things are rarely equal, however.

The issue that we, as a community, ought to be trying to address with build tools is to get the software into users’ hands, not to produce a specific file format. In my opinion, single-file binary builds are not a great tool for this. They’re fundamentally not how people, even quite tech-savvy programming people, find and manage their tools.

A brief summary of the problems with single-file distributions:

They’re not discoverable. A single file linked on your website will not be found via something like brew search, apt search, choco search or searching in a platform’s GUI app store’s search bar.
They’re not updatable. People expect their system package manager to update stuff for them. Standalone binaries might add their own updaters, but now you’re shipping a whole software-update system inside your binary. More likely, it’ll go stale forever while better-packaged software will be updated and managed properly.
They have trouble packaging resources. Once you’ve got your code stuffed into a binary, how do you distribute images, configuration files, or other data resources along with it? This isn’t impossible to solve, but in other programming languages which do have a great single-file binary story, this problem is typically solved by third-party tooling. While that tooling might work fine, it still generally exists in multiple alternative forms, which adds its own complexity.

So while it might be a useful building-block that simplifies those annoying container builds a lot, it hardly solves the problem comprehensively.

If we were to build a big new tool, the tool we need is something that standardizes the input format to produce a variety of different complex, multi-file output formats, including things like:

deb packages (including uploading to PPA archives so people can add an apt line; a manual dpkg -i has many of the same issues as a single file)
container images (including the upload to a registry so that people can "$(shuf -n 1 -e nerdctl docker podman)" pull or FROM it)
Flatpak apps
Snaps
macOS apps
Microsoft store apps
MSI installers
Chocolatey / NuGet packages
Homebrew formulae

In other words, ask yourself, as a user of an application, how do you want to consume it? It depends what kind of tool it is, and there is no one-size-fits-all answer.

In any software ecosystem, if a feature is a building block which doesn’t fully solve the problem, that is an issue with the feature, but in many cases, that’s fine. We need lots of building blocks to get to full solutions. This is the story of open source.

However, if I had to take a crack at summing up the infinite-headed hydra of the Problem With Python Packaging, I’d put it like this:

Python has a wide array of tools which can be used to package your Python code for almost any platform, in almost any form, if you are sufficiently determined. The problem is that the end-to-end experience of shipping an application to end users who are not Python programmers² for any particular platform has a terrible user experience. What we need are more holistic solutions, not more building blocks.³

This makes me want to push back against this tendency whenever I see it, and to try to point towards more efficient ways of achieving a good user experience, with the relatively scarce community resources at our collective disposal⁴. Efficiency isn’t exclusively about ideal outcomes, though; it’s the optimization of a cost/benefit ratio. In terms of benefits, a single-file executable packer ranks relatively low, as I hope I’ve shown above.

Building a tool that makes arbitrary Python code into a fully self-contained executable is also very high-cost, in terms of community effort, for a bunch of reasons. For starters, in any moderately-large collection of popular dependencies from PyPI, at least a few of them are going to want to find their own resources via __file__, and you need to hack in a way to find those, which is itself error prone. Python also expects dynamic linking in a lot of places, and messing around with C linkers to change that around is a complex process with its own huge pile of failure modes. You need to do this on pre-existing binaries built with options you can’t necessarily control, because making everyone rebuild all the binary wheels they find on PyPI is a huge step backwards in terms of exposing app developers to confusing infrastructure complexity.

Now, none of this is impossible. There are even existing tools to do some of the scarier low-level parts of these problems. But one of the reasons that all the existing tools for doing similar things have folk-wisdom reputations and even official documentation expecting constant pain is that part of the project here is conducting a full audit of every usage of __file__ on PyPI and replacing it with some resource-locating API which we haven’t even got a mature version of yet⁵.
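To make the __file__ problem concrete, here is a minimal sketch contrasting the two ways of locating a bundled data file when a package is imported straight out of a zip archive. The package name demo_pkg and the file greeting.txt are hypothetical, created on the fly for the demonstration, and importlib.resources.files assumes Python 3.9 or later.

```python
import importlib
import importlib.resources
import sys
import tempfile
import zipfile
from pathlib import Path


def make_zipped_package(zip_path: Path) -> None:
    """Write a tiny hypothetical package, with one data file, into a zip archive."""
    with zipfile.ZipFile(zip_path, "w") as archive:
        archive.writestr("demo_pkg/__init__.py", "")
        archive.writestr("demo_pkg/greeting.txt", "hello from inside the archive")


def read_via_dunder_file(package) -> str:
    """The common idiom: assumes the package lives in a real directory on disk."""
    return (Path(package.__file__).parent / "greeting.txt").read_text(encoding="utf-8")


def read_via_importlib_resources(package_name: str) -> str:
    """Ask the import system for the resource, wherever the package is stored."""
    resource = importlib.resources.files(package_name).joinpath("greeting.txt")
    return resource.read_text(encoding="utf-8")


def demo() -> tuple:
    """Return (did the __file__ idiom work?, text read via importlib.resources)."""
    with tempfile.TemporaryDirectory() as tmp:
        zip_path = Path(tmp) / "bundle.zip"
        make_zipped_package(zip_path)
        sys.path.insert(0, str(zip_path))  # import straight from the archive
        try:
            package = importlib.import_module("demo_pkg")
            try:
                read_via_dunder_file(package)
                dunder_file_worked = True
            except OSError:
                # .../bundle.zip/demo_pkg/greeting.txt is not a real filesystem path
                dunder_file_worked = False
            text = read_via_importlib_resources("demo_pkg")
        finally:
            sys.path.remove(str(zip_path))
            sys.modules.pop("demo_pkg", None)
        return dunder_file_worked, text


if __name__ == "__main__":
    print(demo())
```

The second function is the easy part; the expensive part is finding and rewriting every instance of the first one in code you don’t control.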

Whereas copying all the files into the right spots in an archive file that can be directly deployed to an existing platform is tedious, but only moderately challenging. It usually doesn’t involve fundamentally changing how the code being packaged works, only where it is placed.

To the extent that we have a choice between “make it easy to produce a single-file binary without needing to understand the complexities of binaries” or “make it easy to produce a Homebrew formula / Flatpak build / etc without the user needing to understand Homebrew / Flatpak / etc”, we should always choose the latter.

If this is giving you déjà vu, I’ve gestured at this general concept more vaguely in a few places, including tweeting⁶ about it in 2019, saying vaguely similar stuff:

If you're making a packaging tool for Python, stop trying to make single-file executables. They are a pointless technical flourish. You need to build:
• .app bundles
• .deb/.rpm/flatpack packages
• container images
• .msi installers
• Homebrew formulæ
• a codesigning pipeline
— glyph ⎷⃣ (@glyph) August 5, 2019

Everything I’ve written here so far is debatable.

You can find that debate both in replies to that original tweet and in various other comments and posts elsewhere where I’ve grumbled about this. I still don’t agree with that criticism, but there are very clever people working on complex tools which are slowly gaining popularity and might be making the overall packaging situation better.

So while I think we should in general direct efforts more towards integrating with full-featured packaging standards, I don’t want to yuck anybody’s yum when it comes to producing clean single-file executables in general. If you want to build that tool and it turns out to be a more useful building block than I’m giving it credit for, knock yourself out.

However, in addition to having a comprehensive write-up of my previously-stated opinions here, I want to impart a more concrete, less debatable issue. To wit: single-file executables as a distribution mechanism, specifically on macOS, are not only sub-optimal, but a complete waste of time.

Late last year, Hynek wrote a great post about his desire for, and experience of, packaging a single-file binary for multiple platforms. This should serve as an illustrative example of my point here. I don’t want to pick on Hynek. Prominent Python programmers wish for this all the time. In fact, Hynek also did the thing I said is a good idea here: he created a Homebrew tap, and that’s the one the README recommends.

So since he kindly supplied a perfect case-study of the contrasting options, let’s contrast them!

The first thing I notice is that the Homebrew version is Apple Silicon native, whereas the single-file binary is still x86_64. The brew build and test infrastructure apparently deals with architectural differences (probably pretty easy given it can use Homebrew’s existing Python build), but the more hand-rolled PyOxidizer setup builds only for the host platform, which in this case is still an Intel Mac thanks to GitHub dragging their feet.

The second is that the Homebrew version runs as I expect it to. I run doc2dash in my terminal and I see Usage: doc2dash [OPTIONS] SOURCE, as I should.

So, A+ on the Homebrew tap. No notes. I did not have to know anything about Python being in the loop at all here, it “just works” like every Ruby, Clojure, Rust, or Go tool I’ve installed with the same toolchain.

Over to the single-file brew-less version.

Beyond the architecture being emulated and having to download Rosetta2⁸, I have to note that this “single file” binary already comes in a zip file, since it needs to include the license in a separate file.⁷ Now that it’s unarchived, I have some choices to make about where to put it on my $PATH. But let’s ignore that for now and focus on the experience of running it. I fire up a terminal, and run cd Downloads/doc2dash.x86_64-apple-darwin/ and then ./doc2dash.

Now we hit the more intractable problem:

The executable does not launch because it is neither code-signed nor notarized. I’m not going to go through the actual demonstration here, because you already know how annoying this is, and also, you can’t actually do it.

Code-signing is more or less fine. The codesign tool will do its thing, and that will change the wording in the angry dialog box from something about an “unidentified developer” to being “unable to check for malware”, which is not much of a help. You still need to notarize it, and notarization can’t work.

macOS really wants your executable code to be in a bundle (i.e., an App) so that it can know various things about its provenance and structure. CLI tools are expected to be in the operating system, or managed by a tool like brew that acts like a sort of bootleg secondary operating-system-ish thing and knows how to manage binaries.

If it isn’t in a bundle, then it needs to be in a platform-specific .pkg file, which is installed with the built-in Installer app. This is because Apple cannot notarize a stand-alone binary executable file.

Part of the notarization process involves stapling an external “notarization ticket” to your code, and if you’ve only got a single file, it has nowhere to put that ticket. You can’t even submit a stand-alone binary; you have to package it in a format that is legible to Apple’s notarization service, which for a pure-CLI tool, means a .pkg.

What about corporate distributions of proprietary command-line tools, like the 1Password CLI? Oh look, their official instructions also tell you to use their Homebrew formula. Homebrew really is the standard developer-CLI platform at this point for macOS. When 1Password distributes stuff outside of Homebrew, as with their beta builds, it’s stuff that lives in a .pkg as well.

It is possible to work around all of this.

I could open the unzipped file, right-click on the CLI tool, go to “Open”, get a subtly differently worded error dialog, like this…

…watch it open Terminal for me and then exit, then wait multiple seconds for it to run each time I want to re-run it at the command line. Did I mention that? The single-file option takes 2-3 seconds doing who-knows-what (maybe some kind of security check, maybe PyOxidizer overhead, I don’t know) but the Homebrew version starts imperceptibly instantly.

Also, I then need to manually re-do this process in the GUI every time I want to update it.

If you know about the magic of how this all actually works, you can also do xattr -d com.apple.quarantine doc2dash by hand, but I feel like xattr -d is a step lower down in the user-friendliness hierarchy than python3 -m pip install⁹, and not only because making a habit of clearing quarantine attributes manually is a little like cutting the brake lines on Gatekeeper’s ability to keep out malware.

But the point of distributing a single-file binary is to make it “easy” for end users, and is explaining Gatekeeper’s integrity verification accomplishing that goal?

Apple’s effectively mandatory code-signing verification on macOS is far out ahead of other desktop platforms right now, both in terms of its security and in terms of its obnoxiousness. But every mobile platform is like this, and I think that as everyone gets more and more panicked about malicious interference with software delivery, we’re going to see more and more official requirements that software must come packaged in one of these containers.

Microsoft will probably fix their absolute trash-fire of a codesigning system one day too. I predict that something vaguely like this will eventually even come to most Linux distributions. Not necessarily a prohibition on individual binaries like this, or like a GUI launch-prevention tool, but some sort of requirement imposed by the OS that every binary file be traceable to some sort of package, maybe enforced with some sort of annoying AppArmor profile if you don’t do it.

The practical, immediate message today is: “don’t bother producing a single-file binary for macOS users, we don’t want it and we will have a hard time using it”. But the longer term message is that focusing on creating single-file binaries is, in general, skating to where the puck used to be.

If we want Python to have a good platform-specific distribution mechanism for every platform, so it’s easy for developers to get their tools to users without having to teach all their users a bunch of nonsense about setuptools and virtualenvs first, we need to build that, and not get hung up on making a single-file executable packer a core part of the developer experience.

Thanks very much to my patrons for their support of writing like this, and software like these.

Oh, right. This is where I put the marketing “call to action”. Still getting the hang of these.

Did you enjoy this post and want me to write more like it, and/or did you hate it and want the psychological leverage and moral authority to tell me to stop and do something else? You can sign up here!

¹ I remember one escapade in particular where someone had to ship a bunch of PKCS#11 providers along with a Go executable in their application and it was, to put it lightly, not a barrel of laughs.
² Shipping to Python programmers in a Python environment is kind of fine now, and has been for a while.
³ Yet, even given my advance awareness of this criticism, despite my best efforts, I can’t seem to stop building little tools that poorly solve only one problem in isolation.
⁴ And it does have to be at our collective disposal. Even the minuscule corner of this problem I’ve worked on, the aforementioned Mac code-signing and notarization stuff, is tedious and exhausting; nobody can take on the whole problem space, which is why I find writing about this is such an important part of the problem. Thanks Pradyun, and everyone else who has written about this at length!
⁵ One of the sources of my anti-single-file stance here is that I tried very hard, for many years, to ensure that everything in Twisted was carefully zipimport-agnostic, even before pkg_resources existed, by using the from twisted.python.modules import getModule, getModule(__name__).filePath.sibling(...) idiom, where that .filePath attribute might be anything FilePath-like, specifically including ZipPath. It didn’t work; fundamentally, nobody cared, other contributors wouldn’t bother to enforce this, or even remember that it might be desirable, because they’d never worked in an environment where it was. Today, a quick git grep __file__ in Twisted turns up tons of usages that will make at least the tests fail to run in a single-file or zipimport environment. Despite the availability of zipimport itself since 2001, I have never seen tox or a tool like it support running with a zipimport-style deployment to ensure that this sort of configuration is easily, properly testable across entire libraries or applications. If someone really did want to care about single-file deployments, fixing this problem comprehensively across the ecosystem is probably one of the main things to start with, beginning with an international evangelism tour for importlib.resources.
⁶ This is where the historical document is, given that I was using it at the time, but if you want to follow me now, please follow me on Mastodon.
⁷ Oops, I guess we might as well not have bothered to make a single-file executable anyway! Once you need two files, you can put whatever you want in the zip file...
⁸ Just kidding. Of course it’s installed already. It’s almost cute how Apple shows you the install progress to remind you that one day you might not need to download it whenever you get a new mac.
⁹ There’s still technically a Python included in Xcode and the Xcode CLT, so functionally macs do have a /usr/bin/python3 that is sort of a python3.9. You shouldn’t really use it. Instead, download the python installer from python.org. But probably you should use it before you start disabling code integrity verification everywhere.

Mac Python Distribution Post Updated for Catalina and Notarization

Notarize your Python apps for macOS Catalina.

python packaging Sunday October 13, 2019

I previously wrote a post about shipping a PyGame app to users on macOS. It’s now substantially updated for the new Notarization requirements in Catalina. I hope it’s useful to somebody!

Careful With That PyPI

PyPI credentials are important. Here are some tips for securing them a little better.

python programming security packaging pypi Sunday October 22, 2017

Too Many Secrets

A wise man once said, “you shouldn’t use ENV variables for secret data”. In large part, he was right, for all the reasons he gives (and you should read them). Filesystem locations are usually a better operating system interface to communicate secrets than environment variables; fewer things can intercept an open() than can read your process’s command-line or calling environment.

One might say that files are “more secure” than environment variables. To his credit, Diogo doesn’t, for good reason: one shouldn’t refer to the superiority of such a mechanism as being “more secure” in general, but rather, as better for a specific reason in some specific circumstance.

Supplying your PyPI password to tools you run on your personal machine is a very different case than providing a cryptographic key to a containerized application in a remote datacenter. In this case, based on the constraints of the software presently available, I believe an environment variable provides better security, if you use it correctly.

Popping A Shell By Any Other Name

If you upload packages to the Python Package Index, and people use those packages, your PyPI password is an extremely high-privilege credential: effectively, it grants a time-delayed arbitrary code execution privilege on all of the systems where anyone might pip install your packages.

Unfortunately, the suggested mechanism to manage this crucial, potentially world-destroying credential is to just stick it in an unencrypted file.

The authors of this documentation know this is a problem; the authors of the tooling know too (and, given that these tools are all open source and we all could have fixed them to be better about this, we should all feel bad).

Leaving the secret lying around on the filesystem is a form of ambient authority; a permission you always have, but only sometimes want. One of the worst things about this is that you can easily forget it’s there if you don’t use these credentials very often.

The keyring is a much better place, but even it can be a slightly scary place to put such a thing, because it’s still easy to put it into a state where some random command could upload a PyPI release without prompting you. PyPI is forever, so we want to measure twice and cut once.

Luckily, even more secure places exist: password managers. If you use https://1password.com or https://www.lastpass.com, both offer command-line interfaces that integrate nicely with PyPI. If you use 1Password, you’ll really want https://stedolan.github.io/jq/ (apt-get install jq, brew install jq) to slice & dice its command-line output.

The way that I manage my PyPI credentials is that I never put them on my filesystem, or even into my keyring; instead, I leave them in my password manager, and very briefly toss them into the tools that need them via an environment variable.

First, I have the following shell function, to prevent any mistakes:

function twine () {
    echo "Use dev.twine or prod.twine depending on where you want to upload.";
    return 1;
}

For dev.twine, I configure twine to always only talk to my local DevPI instance:

function dev.twine () {
    env TWINE_USERNAME=root \
        TWINE_PASSWORD= \
        TWINE_REPOSITORY_URL=http://127.0.0.1:3141/root/plus/ \
        twine "$@";
}

This way I can debug Twine, my setup.py, and various test-upload things without ever needing real credentials at all.

But, OK. Eventually, I need to actually get the credentials and do the thing. How does that work?

1Password

1Password’s command line is a little tricky to log in to (you have to eval its output, it’s not just a command), so here’s a handy shell function that will do it.

function opme () {
    # Log this shell in to 1password.
    if ! env | grep -q OP_SESSION; then
        eval "$(op signin "$(jq -r '.latest_signin' ~/.op/config)")";
    fi;
}

Then, I have this little helper for slicing out a particular field from the OP JSON structure:

function _op_field () {
    jq -r '.details.fields[] | select(.name == "'"${1}"'") | .value';
}

And finally, I use this to grab the item I want (named, memorably enough, “PyPI”) and invoke Twine:

function prod.twine () {
    opme;
    local pypi_item="$(op get item PyPI)";
    env TWINE_USERNAME="$(echo "${pypi_item}" | _op_field username)" \
        TWINE_PASSWORD="$(echo "${pypi_item}" | _op_field password)" \
        twine "$@";
}

LastPass

For LastPass, you can just log in (for all shells; it’s a little less secure) via lpass login; if you’ve logged in before you often don’t even have to do that, and it will just prompt you when running commands that require you to be logged in; so we don’t need the preamble that 1Password’s command line did.

Its version of prod.twine looks quite similar, but its plaintext output obviates the need for jq:

function prod.twine () {
    env TWINE_USERNAME="$(lpass show PyPI --username)" \
        TWINE_PASSWORD="$(lpass show PyPI --password)" \
        twine "$@";
}

In Conclusion

“Keep secrets out of your environment” is generally a good idea, and you should always do it when you can. But, better a moment in your process environment than an eternity on your filesystem. Environment-based configuration can be a very useful stopgap for limiting the lifetimes of credentials when your tools don’t support more sophisticated approaches to secret storage.¹
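The same “a moment in your process environment” idea can be sketched in Python rather than shell. This is a minimal illustration, not how Twine or any password manager actually works: fetch_secret is a hypothetical stand-in for a call to a password manager’s command line, and the child process here just reports the length of the value it received.

```python
import os
import subprocess
import sys


def fetch_secret() -> str:
    """Hypothetical stand-in for asking a password manager for a credential."""
    return "not-a-real-password"


def run_with_secret(command: list, variable: str, secret: str) -> subprocess.CompletedProcess:
    """Run one command with the secret in its environment, and nowhere else."""
    # Copy the current environment so the secret never lands in os.environ,
    # where every later subprocess would inherit it.
    child_env = dict(os.environ)
    child_env[variable] = secret
    return subprocess.run(command, env=child_env, capture_output=True, text=True, check=True)


def demo() -> tuple:
    """Return (what the child saw, whether the secret leaked into this process)."""
    # The child prints only the length of the value, to avoid echoing the secret.
    child_code = "import os; print(len(os.environ['TWINE_PASSWORD']))"
    result = run_with_secret(
        [sys.executable, "-c", child_code], "TWINE_PASSWORD", fetch_secret()
    )
    return result.stdout.strip(), "TWINE_PASSWORD" in os.environ


if __name__ == "__main__":
    print(demo())
```

As with the shell functions, the credential exists only in the environment of the one child process that needs it, and disappears when that process exits.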

Post Script

If you are interested in secure secret storage, my micro-project secretly might be of interest. Right now it doesn’t do a whole lot; it’s just a small wrapper around the excellent keyring module and the pinentry / pinentry-mac password prompt tools. secretly presents an interface both for prompting users for their credentials without requiring the command-line or env vars, and for saving them away in keychain, for tools that need to pull in an API key and don’t want to make the user manually edit a config file first.

¹ Really, PyPI should have API keys that last for some short amount of time, that automatically expire so you don’t have to freak out if you gave somebody a 5-year-old laptop and forgot to wipe it first. But again, if I wanted that so bad, I should have implemented it myself...

Python Packaging Is Good Now

setup.py is your friend. It’s real sorry about what happened last time.

python programming packaging desiderata Sunday August 14, 2016

Okay folks. Time’s up. It’s too late to say that Python’s packaging ecosystem is terrible any more. I’m calling it.

Python packaging is not bad any more. If you’re a developer, and you’re trying to create or consume Python libraries, it can be a tractable, even pleasant experience.

I need to say this, because for a long time, Python’s packaging toolchain was … problematic. It isn’t any more, but a lot of people still seem to think that it is, so it’s time to set the record straight.

If you’re not familiar with the history it went something like this:

The Dawn

Python first shipped in an era when adding a dependency meant a veritable Odyssey into cyberspace. First, you’d wait until nobody in your whole family was using the phone line. Then you’d dial your ISP. Once you’d finished fighting your SLIP or PPP client, you’d ask a netnews group if anyone knew of a good gopher site to find a library that could solve your problem. Once you were done with that task, you’d sign off the Internet for the night, and wait about 48 hours to see if anyone responded. If you were lucky enough to get a reply, you’d set up a download at the end of your night’s web-surfing.

pip search it wasn’t.

For the time, Python’s approach to dependency-handling was incredibly forward-looking. The import statement, and the pluggable module import system, made it easy to get dependencies from wherever made sense.

In Python 2.0¹, Distutils was introduced. This let Python developers describe their collections of modules abstractly, and added tool support to producing redistributable collections of modules and packages. Again, this was tremendously forward-looking, if somewhat primitive; there was very little to compare it to at the time.

Fast forwarding to 2004, setuptools was created to address some of the increasingly-common tasks that open source software maintainers were facing with distributing their modules over the internet. In 2005, it added easy_install, in order to provide a tool to automate resolving dependencies and downloading them into the right locations.

The Dark Age

Unfortunately, in addition to providing basic utilities for expressing dependencies, setuptools also dragged in a tremendous amount of complexity. Its author felt that import should do something slightly different than what it does, so installing setuptools changed it. The main difference between normal import and setuptools import was that it facilitated having multiple different versions of the same library in the same program at the same time. It turns out that that’s a dumb idea, but in fairness, it wasn’t entirely clear at the time, and it is certainly useful (and necessary!) to be able to have multiple versions of a library installed onto a computer at the same time.

In addition to these idiosyncratic departures from standard Python semantics, setuptools suffered from being unmaintained. It became a critical part of the Python ecosystem at the same time as the author was moving on to other projects entirely outside of programming. No-one could agree on who the new maintainers should be for a long period of time. The project was forked, and many operating systems’ packaging toolchains calcified around a buggy, ancient version.

From 2008 to 2012 or so, Python packaging was a total mess. It was painful to use. It was not clear which libraries or tools to use, which ones were worth investing in or learning. Doing things the simple way was too tedious, and doing things the automated way involved lots of poorly-documented workarounds and inscrutable failure modes.

This is to say nothing of the fact that there were critical security flaws in various parts of this toolchain. There was no practical way to package and upload Python packages in such a way that users didn’t need a full compiler toolchain for their platform.

To make matters worse for the popular perception of Python’s packaging prowess², at this same time, newer languages and environments were getting a lot of buzz, ones that had packaging built in at the very beginning and had a much better binary distribution story. These environments learned lessons from the screw-ups of Python and Perl, and really got a lot of things right from the start.

Finally, the Python Package Index, the site which hosts all the open source packages uploaded by the Python community, was basically a proof-of-concept that went live way too early, had almost no operational resources, and was offline all the dang time.

Things were looking pretty bad for Python.

Intermission

Here is where we get to the point of this post: this is where popular opinion about Python packaging is stuck. Outdated information from this period abounds. Blog posts complaining about problems score high in web searches. Those who used Python during this time, but have now moved on to some other language, frequently scoff and dismiss Python as impossible to package, its packaging ecosystem as broken, PyPI as down all the time, and so on. Worst of all, bad advice for workarounds which are no longer necessary is still easy to find, which causes users to pre-emptively break their environments where they really don’t need to.

From The Ashes

In the midst of all this brokenness, there were some who were heroically, quietly, slowly fixing the mess, one gnarly bug-report at a time. pip was started, and its various maintainers fixed much of easy_install’s overcomplexity and many of its flaws. Donald Stufft stepped in on both pip and PyPI, improving the availability of the systems pip depended upon and fixing some pretty serious vulnerabilities in the tool itself. Daniel Holth wrote a PEP for the wheel format, which allows for binary redistribution of libraries. In other words, it lets authors of packages which need a C compiler to build give their users a way to not have one.
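To make the wheel idea a little more concrete: a wheel's filename records which interpreter, ABI and platform it was built for, which is how an installer can pick a prebuilt binary instead of invoking a compiler. Here is a minimal sketch of reading those tags, assuming the filename layout from the wheel PEP (PEP 427). The function names and the example_pkg filenames are made up for illustration, and real installers do considerably more validation than this.

```python
from collections import namedtuple

# Field order follows the wheel filename convention:
# {distribution}-{version}(-{build})?-{python}-{abi}-{platform}.whl
WheelName = namedtuple(
    "WheelName", "distribution version build python abi platform"
)


def parse_wheel_filename(filename):
    """Split a wheel filename into its tagged components (illustrative helper)."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: %r" % (filename,))
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        # The optional build tag is absent.
        name, version, python, abi, platform = parts
        build = None
    elif len(parts) == 6:
        name, version, build, python, abi, platform = parts
    else:
        raise ValueError("unexpected wheel filename: %r" % (filename,))
    return WheelName(name, version, build, python, abi, platform)


def is_pure_python(wheel):
    """A wheel tagged none/any contains no compiled, platform-specific code."""
    return wheel.abi == "none" and wheel.platform == "any"


# Hypothetical filenames, used only to exercise the parser.
binary = parse_wheel_filename("example_pkg-1.0-cp27-cp27mu-manylinux1_x86_64.whl")
pure = parse_wheel_filename("example_pkg-1.0-py2.py3-none-any.whl")
```

The platform tag on the first filename is the interesting part: it is the author's promise that somebody else already ran the C compiler for that platform.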

In 2013, setuptools and distribute un-forked, providing a path forward for operating system vendors to start updating their installations and allowing users to use something modern.

Python Core started distributing the ensurepip module along with both Python 2.7.9 and 3.4, allowing any user with a recent Python installed to quickly bootstrap into a sensible Python development environment with a one-liner.
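If you want to know whether that bootstrap step is even needed on a given interpreter, here is a rough sketch of a check. It assumes Python 3, since it leans on importlib.util, and the helper names are my own invention rather than anything pip or ensurepip provides. Some operating systems ship Python without ensurepip, so the sketch allows for that case too.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(name) is not None


def bootstrap_hint():
    """Suggest the next step for getting a working pip (illustrative only)."""
    if module_available("pip"):
        return "pip is already installed"
    if module_available("ensurepip"):
        return "run: python -m ensurepip --user"
    # Some OS vendors split ensurepip out into a separate package.
    return "install pip with your operating system's package manager"


hint = bootstrap_hint()
```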

A New Renaissance

I won’t give you a full run-down of the state of the packaging art. There’s already a website for that. I will, however, give you a précis of how much easier it is to get started nowadays. Today, if you want to get a sensible, up-to-date Python development environment, without administrative privileges, all you have to do is:

$ python -m ensurepip --user
$ python -m pip install --user --upgrade pip
$ python -m pip install --user --upgrade virtualenv

Then, for each project you want to do, make a new virtualenv:

$ python -m virtualenv lets-go
$ . ./lets-go/bin/activate
(lets-go) $ _

From here on out, the world is your oyster; you can pip install to your heart’s content, and you probably won’t even need to compile any C for most packages. These instructions don’t depend on Python version, either: as long as it’s up-to-date, the same steps work on Python 2, Python 3, PyPy and even Jython. In fact, often the ensurepip step isn’t even necessary since pip comes preinstalled. Running it if it’s unnecessary is harmless, even!
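If you are ever unsure whether the activate step actually took effect, the interpreter can tell you. This is a minimal sketch of the commonly used check; the function name is mine, and it assumes that older virtualenv releases set sys.real_prefix while the standard-library venv module and newer virtualenv releases make sys.base_prefix differ from sys.prefix.

```python
import sys


def in_virtualenv():
    """Best-effort guess at whether this interpreter runs inside a virtualenv."""
    # Older virtualenv releases record the original prefix as sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # venv and newer virtualenv releases leave sys.base_prefix pointing at the
    # underlying installation, while sys.prefix points at the environment.
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


active = in_virtualenv()
```

Running this with the (lets-go) prompt showing should report an active environment; running it from a fresh shell should not.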

Other, more advanced packaging operations are much simpler than they used to be, too.

Need a C compiler? OS vendors have been working with the open source community to make this easier across the board:

$ apt install build-essential python-dev # ubuntu
$ xcode-select --install # macOS
$ dnf install @development-tools python-devel # fedora
C:\> REM windows
C:\> start https://www.microsoft.com/en-us/download/details.aspx?id=44266

Okay, that last one’s not as obvious as it ought to be, but they did at least make it freely available!
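For a quick sanity check that one of those installs worked, you can look for a compiler on your PATH. This is only a rough sketch: the list of candidate names is my assumption, and it is not how setuptools itself finds a compiler (on Windows in particular, the compiler is usually not on PATH at all and is located by other means).

```python
import shutil

# Common compiler executable names; an illustrative list, not an exhaustive one.
CANDIDATE_COMPILERS = ("cc", "gcc", "clang", "cl")


def find_c_compiler(candidates=CANDIDATE_COMPILERS):
    """Return the path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


compiler = find_c_compiler()
```

A result of None here does not prove that builds will fail, but it is a decent hint about why one just did.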

Want to upload some stuff to PyPI? This should do it for almost any project:

$ pip install twine
$ python setup.py sdist bdist_wheel
$ twine upload dist/*

Want to build wheels for the wild and wooly world of Linux? There’s an app⁴ for that.

Importantly, PyPI will almost certainly be online. Not only that, but a new, revamped site will be “launching” any day now³.

Again, this isn’t a comprehensive resource; I just want to give you an idea of what’s possible. But, as a deeply experienced Python expert, I used to swear at these tools six times a day for years; by contrast, the most serious Python packaging issue I’ve had this year to date was fixed by cleaning up my git repo to delete a cache file.

Work Still To Do

While the current situation is good, it’s still not great.

Here are just a few of my desiderata:

We still need better and more universally agreed-upon tooling for end-user deployments.
Pip should have a GUI frontend so that users can write Python stuff without learning as much command-line arcana.
There should be tools that help you write and update a setup.py. Or a setup.python.json or something, so you don’t actually need to write code just to ship some metadata.
The error messages that you get when you try to build something that needs a C compiler and it doesn’t work should be clearer and more actionable for users who don’t already know what they mean.
PyPI should automatically build wheels for all platforms by default when you upload sdists; this is a huge project, of course, but it would be super awesome default behavior.

I could go on. There are lots of ways that Python packaging could be better.

The Bottom Line

The real takeaway here, though, is that although it’s still not perfect, other languages are no longer doing appreciably better. Go is still working through a number of different options regarding dependency management and vendoring, and, like Python extensions that require C dependencies, CGo is sometimes necessary and always a problem. Node has had its own well-publicized problems with its dependency management culture and package manager. Hackage is cool and all but everything takes a literal geological epoch to compile.

As always, I’m sure none of this applies to Rust and Cargo is basically perfect, but that doesn’t matter, because nobody reading this is actually using Rust.

My point is not that packaging in any of these languages is particularly bad. They’re all actually doing pretty well, especially compared to the state of the general programming ecosystem a few years ago; many of them are making regular progress towards user-facing improvements.

My point is that any commentary suggesting they’re meaningfully better than Python at this point is probably just out of date. Working with Python packaging is more or less fine right now. It could be better, but lots of people are working on improving it, and the structural problems that prevented those improvements from being adopted by the community in a timely manner have almost all been addressed.

Go! Make some virtualenvs! Hack some setup.pys! If it’s been a while and your last experience was really miserable, I promise, it’s better now.

Am I wrong? Did I screw up a detail of your favorite language? Did I forget to mention the one language environment that has a completely perfect, flawless packaging story? Do you feel the need to just yell at a stranger on the Internet about picayune details? Feel free to get in touch!

¹ released in October, 2000
² say that five times fast.
³ although I’m not sure what it means to “launch” when the site is online, and running against the production data-store, and you can use it for pretty much everything...
⁴ “app” meaning of course “docker container”