DBXS 0.1.0

Today there is a new release of my database access and query organizer library with support for MySQL, PostgreSQL, and asyncio.

New Release

Yesterday I published a new release of DBXS for you all. It’s still ZeroVer, but it has graduated from double-ZeroVer as this is the first nonzero minor version.

More to the point though, the meaning of that version increment is that this version introduces some critical features that I think most people would need in order to give it a spin on a hobby project.

What’s New

It has support for MySQL and PostgreSQL using native asyncio drivers, which means you don’t need to take a Twisted dependency in production.
While Twisted is still used for some of the testing internals, Deferred is no longer exposed anywhere in the public API, either; your tests can happily pretend that they’re doing asyncio, as long as they can run against SQLite.
There is a new repository convenience function that automatically wires together multiple accessors and transaction discipline. Have a look at the docstring for a sense of how to use it.
Several papercuts, like confusing error messages when messing up query result handling, and lack of proper handling of default arguments in access protocols, are now addressed.

It’s A Good Time To Contribute!

If you’ve been looking for an open source project to try your hand at contributing to, DBXS might be a great opportunity, for a few reasons:

The team is quite small (just me, right now!), so it’s easy to get involved.
It’s quite generally useful, so there’s a potential for an audience, but right now it doesn’t really have any production users; there’s still time to change things without a lot of ceremony.
Unlike many other small starter projects, it’s got a test suite with 100% coverage, so you can contribute with confidence that you’re not breaking anything.
There’s not that much code (a bit over 2 thousand SLOC), so it’s not hard to get your head around.
There are a few obvious next steps for improvement, which I’ve filed as issues if you want to pick one up.

Share and enjoy, and please let me know if you do something fun with it.

Acknowledgments

Thank you to my patrons who are supporting my writing on this blog. If you like what you’ve read here and you’d like to read more of it, or you’d like to support my various open-source endeavors, you can support my work as a sponsor! I am also available for consulting work if you think your organization could benefit from expertise on topics such as “How do I shot SQL?”.

Annotated At Runtime

PEP 593 is a bit vague on how you’re supposed to actually consume arguments to Annotated; here is my proposal.

python programming mypy supported Thursday December 07, 2023

PEP 0593 added the ability to add arbitrary user-defined metadata to type annotations in Python.

At type-check time, such annotations are… inert. They don’t do anything. Annotated[int, X] just means int to the type-checker, regardless of the value of X. So the entire purpose of Annotated is to provide a run-time API to consume metadata, which integrates with the type checker syntactically, but does not otherwise disturb it.
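To make the "inert" part concrete, here is a minimal sketch using only the standard library. The function name and the metadata string are made up for illustration. By default, get_type_hints strips the Annotated wrapper, which is roughly the view the type checker takes; you have to ask for the extras explicitly.

```python
from typing import Annotated, get_type_hints


def scale(factor: Annotated[int, "units: percent"]) -> None:  # hypothetical function
    ...


# Default behaviour: the Annotated wrapper is stripped, leaving just the type.
plain = get_type_hints(scale)

# include_extras=True keeps the wrapper, so the metadata is still reachable.
with_extras = get_type_hints(scale, include_extras=True)

print(plain["factor"])  # <class 'int'>
print(with_extras["factor"] == Annotated[int, "units: percent"])  # True
```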

Yet, the documentation for this central purpose seems, while not exactly absent, oddly incomplete.

The PEP itself simply says:

A tool or library encountering an Annotated type can scan through the annotations to determine if they are of interest (e.g., using isinstance()).

But it’s not clear where “the annotations” are, given that the PEP’s entire “consuming annotations” section does not even mention the __metadata__ attribute where the annotation’s arguments go, which was only even added to CPython’s documentation. Its list of examples just shows the repr() of the relevant type.

There’s also a bit of an open question of what, exactly, we are supposed to be isinstance()-ing here. If we want to find arguments to Annotated, presumably we need to be able to detect if an annotation is an Annotated. But isinstance(Annotated[int, "hello"], Annotated) is both False at runtime, and also a type-checking error, that looks like this:

Argument 2 to "isinstance" has incompatible type "<typing special form>"; expected "_ClassInfo"

The actual type of these objects, typing._AnnotatedAlias, does not seem to have a publicly available or documented alias, so that seems like the wrong route too.
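For reference, here is a small sketch of where the metadata actually lives at runtime, and of the public typing helpers that can identify an Annotated without touching the private alias class. The variable names are mine, and a type checker may complain about the direct __metadata__ access.

```python
from typing import Annotated, get_args, get_origin

hint = Annotated[int, "hello"]

# The raw attribute that holds the arguments after the first one.
raw_metadata = hint.__metadata__

# The public helpers: get_origin identifies the special form, and
# get_args returns the underlying type followed by the metadata.
is_annotated = get_origin(hint) is Annotated
underlying, *metadata = get_args(hint)

print(raw_metadata)   # ('hello',)
print(is_annotated)   # True
print(underlying)     # <class 'int'>
```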

Now, it certainly works to escape-hatch your way out of all of this with an Any, build some version-specific special-case hacks to dig around in the relevant namespaces, access __metadata__ and call it a day. But this solution is … unsatisfying.

What are you looking for?

Upon encountering these quirks, it is understandable to want to simply ask the question “is this annotation that I’m looking at an Annotated?” and to be frustrated that it seems so obscure to straightforwardly get an answer to that question without disabling all type-checking in your meta-programming code.

However, I think that this is a slight misframing of the problem. Code that is inspecting parameters for an annotation is going to do something with that annotation, which means that it must necessarily be looking for a specific set of annotations. Therefore the thing we want to pass to isinstance is not some obscure part of the annotations’ internals, but the actual interesting annotation type from your framework or application.

When consuming an Annotated parameter, there are 3 things you probably want to know:

What was the parameter itself? (type: The type you passed in.)
What was the name of the annotated object (i.e.: the parameter name, the attribute name) being passed the parameter? (type: str)
What was the actual type being annotated? (type: type)

And the things that we have are the type of the Annotated we’re querying for, and the object with annotations we are interrogating. So that gives us this function signature:

def annotated_by(
    annotated: object,
    kind: type[T],
) -> Iterable[tuple[str, T, type]]:
    ...

To extract this information, all we need are get_args and get_type_hints; no need for __metadata__ or get_origin or any other metaprogramming. Here’s a recipe:

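from typing import Iterable, TypeVar, get_args, get_type_hints

T = TypeVar("T")

def annotated_by(
    annotated: object,
    kind: type[T],
) -> Iterable[tuple[str, T, type]]:
    # include_extras=True keeps the Annotated wrappers in the returned hints.
    for k, v in get_type_hints(annotated, include_extras=True).items():
        all_args = get_args(v)
        if not all_args:
            continue
        # For an Annotated, the first argument is the underlying type.
        actual, *rest = all_args
        for arg in rest:
            if isinstance(arg, kind):
                yield k, arg, actual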

It might seem a little odd to be blindly assuming that get_args(...)[0] will always be the relevant type, when that is not true of unions or generics. Note, however, that we are only yielding results when we have found the instance type in the argument list; our arbitrary user-defined instance isn’t valid as a type annotation argument in any other context. It can’t be part of a Union or a Generic, so we can rely on it to be an Annotated argument, and from there, we can make that assumption about the format of get_args(...).
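As a quick check of that reasoning, here is a sketch that mixes unions, generics, and Annotated parameters in one signature. Marker and mixed are hypothetical names for illustration, and the recipe is repeated so the block runs on its own. Only the parameters whose argument list actually contains a Marker instance come back.

```python
from dataclasses import dataclass
from typing import Annotated, Iterable, TypeVar, Union, get_args, get_type_hints

T = TypeVar("T")


def annotated_by(annotated: object, kind: type[T]) -> Iterable[tuple[str, T, type]]:
    # Same logic as the recipe above.
    for k, v in get_type_hints(annotated, include_extras=True).items():
        all_args = get_args(v)
        if not all_args:
            continue
        actual, *rest = all_args
        for arg in rest:
            if isinstance(arg, kind):
                yield k, arg, actual


@dataclass
class Marker:  # hypothetical annotation type
    label: str


def mixed(
    a: Union[int, str],  # get_args gives (int, str): no Marker instance
    b: list[float],      # get_args gives (float,): nothing after the first item
    c: Annotated[bytes, Marker("c")],
    d: Annotated[int, "a plain string", Marker("d")],
) -> None:
    ...


found = list(annotated_by(mixed, Marker))
print(found)
```

The union and the generic both have arguments, but none of them is an instance of Marker, so they are skipped without any special-casing.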

This can give us back the annotations that we’re looking for in a handy format that’s easy to consume. Here’s a quick example of how you might use it:

@dataclass
class AnAnnotation:
    name: str

def a_function(
    a: str,
    b: Annotated[int, AnAnnotation("b")],
    c: Annotated[float, AnAnnotation("c")],
) -> None:
    ...

print(list(annotated_by(a_function, AnAnnotation)))

# [('b', AnAnnotation(name='b'), <class 'int'>),
#  ('c', AnAnnotation(name='c'), <class 'float'>)]
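The same function should work for annotated attributes as well as parameters, since get_type_hints also accepts classes. Here is a sketch under that assumption; EnvVar and Settings are invented for the example, and the recipe is repeated to keep the block self-contained.

```python
from dataclasses import dataclass
from typing import Annotated, Iterable, TypeVar, get_args, get_type_hints

T = TypeVar("T")


def annotated_by(annotated: object, kind: type[T]) -> Iterable[tuple[str, T, type]]:
    # Same logic as the recipe above.
    for k, v in get_type_hints(annotated, include_extras=True).items():
        all_args = get_args(v)
        if not all_args:
            continue
        actual, *rest = all_args
        for arg in rest:
            if isinstance(arg, kind):
                yield k, arg, actual


@dataclass(frozen=True)
class EnvVar:  # hypothetical annotation: which environment variable feeds a field
    name: str


class Settings:  # hypothetical class with annotated attributes
    host: Annotated[str, EnvVar("APP_HOST")]
    port: Annotated[int, EnvVar("APP_PORT")]
    debug: bool


fields = list(annotated_by(Settings, EnvVar))
for attribute, env, expected_type in fields:
    print(f"{attribute}: read ${env.name} as {expected_type.__name__}")
```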

Get Your Mac Python From Python.org

There are many ways to get Python installed on macOS, but for most people the version that you download from Python.org is best.

python programming supported Tuesday August 29, 2023

One of the most unfortunate things about learning Python is that there are so many different ways to get it installed, and you need to choose one before you even begin. The differences can also be subtle and require technical depth to truly understand, which you don’t have yet.¹ Even experts can be missing information about which one to use and why.

There are perhaps more of these on macOS than on any other platform, and that’s the platform I primarily use these days. If you’re using macOS, I’d like to make it simple for you.

The One You Probably Want: Python.org

My recommendation is to use an official build from python.org.

I recommend the official installer for most uses, and if you were just looking for a choice about which one to use, you can stop reading now. Thanks for your time, and have fun with Python.

If you want to get into the nerdy nuances, read on.

For starters, the official builds are compiled in such a way that they will run on a wide range of macs, both new and old. They are universal2 binaries, unlike some other builds, which means you can distribute them as part of a mac application.
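If you are curious which kind of build you are running, here is a rough sketch using the standard library. It assumes that a universal2 build reports a platform tag ending in "universal2", which is what I would expect from the python.org installers but is not something to rely on blindly. The function names are mine.

```python
import platform
import sysconfig


def build_summary() -> dict[str, str]:
    return {
        # The platform tag this interpreter was built for.
        "build_platform": sysconfig.get_platform(),
        # The architecture the current process is actually running as.
        "running_as": platform.machine(),
    }


def looks_universal2() -> bool:
    # Assumption: universal2 builds have a platform tag ending in "universal2".
    return sysconfig.get_platform().endswith("universal2")


print(build_summary())
print(looks_universal2())
```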

The main advantage that the Python.org build has, though, is very subtle, and not any concrete technical detail. It’s a social, structural issue: the Python.org builds are produced by the people who make CPython, who are more likely to know about the nuances of what options it can be built with, and who are more likely to adopt their own improvements as they are released. Third party builders who are focused on a more niche use-case may not realize that there are build options or environment requirements that could make their Pythons better.

I’m being a bit vague deliberately here, because at any particular moment in time, this may not be an advantage at all. Third party integrators generally catch up to changes, and eventually achieve parity. But for a specific upcoming example, PEP 703 will have extensive build-time implications, and I would trust the python.org team to be keeping pace with all those subtle details immediately as releases happen.

(And Auto-Update It)

The one downside of the official build is that you have to return to the website to check for security updates. Unlike other options described below, there’s no built-in auto-updater for security patches. If you follow the normal process, you still have to click around in a GUI installer to update it once you’ve clicked around on the website to get the file.

I have written a micro-tool to address this: you can pip install mopup and then periodically run mopup, and it will install any security updates for your current version of Python, with no interaction besides entering your admin password.

(And Always Use Virtual Environments)

Once you have installed Python from python.org, never pip install anything globally into that Python, even using the --user flag. Always, always use a virtual environment of some kind. In fact, I recommend configuring it so that it is not even possible to do so, by putting this in your ~/.pip/pip.conf:

[global]
require-virtualenv = true

This will avoid damaging your Python installation by polluting it with libraries that you install and then forget about. Any time you need to do something new, you should make a fresh virtual environment, and then you don’t have to worry about library conflicts between different projects that you may work on.
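If you want your own scripts to enforce the same discipline, here is a minimal sketch of the usual check. In a virtual environment, sys.prefix points at the environment while sys.base_prefix points at the interpreter it was created from. The function name is hypothetical.

```python
import sys


def in_virtual_environment() -> bool:
    # Outside a virtual environment the two prefixes are the same path.
    return sys.prefix != sys.base_prefix


if not in_virtual_environment():
    print("No virtual environment is active; consider creating one first.")
```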

If you need to install tools written in Python, don’t manage those environments directly; install the tools with pipx. By using pipx, you allow each tool to maintain its own set of dependencies, which means you don’t need to worry about whether two tools you use have conflicting version requirements, or whether the tools conflict with your own code.²

The Others

There are, of course, several other ways to install Python, which you probably don’t want to use.

The One For Running Other People’s Code, Not Yours: Homebrew

In general, Homebrew Python is not for you.

The purpose of Homebrew’s python is to support applications packaged within Homebrew, which have all been tested against the versions of python libraries also packaged within Homebrew. It may upgrade without warning on just about any brew operation, and you can’t downgrade it without breaking other parts of your install.

Specifically for creating redistributable binaries, Homebrew python is typically compiled only for your specific architecture, and thus will not create binaries that can be used on Intel macs if you have an Apple Silicon machine, or will run slower on Apple Silicon machines if you have an Intel mac. Also, if there are prebuilt wheels which don’t yet exist for Apple Silicon, you cannot easily arch -x86_64 python ... and just install them; you have to install a whole second copy of Homebrew in a different location, which is a headache.

In other words, homebrew is an alternative to pipx, not to Python. For that purpose, it’s fine.

The One For When You Need 20 Different Pythons For Debugging: pyenv

Like Homebrew, pyenv will default to building a single-architecture binary. Even worse, it will not build a Framework build of Python, which means several things related to being a mac app just won’t work properly. Remember those build-time esoterica that the core team is on top of but third parties may not be? “Should I use a Framework build” is an enduring piece of said esoterica.
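One way to check whether the interpreter you are running is a Framework build is to look at its build configuration. This is a sketch that assumes the PYTHONFRAMEWORK config variable is non-empty only for framework builds; the function name is mine.

```python
import sysconfig


def is_framework_build() -> bool:
    # Assumption: PYTHONFRAMEWORK holds the framework name for framework
    # builds and is empty or missing otherwise.
    return bool(sysconfig.get_config_var("PYTHONFRAMEWORK"))


print(is_framework_build())
```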

The purpose of pyenv is to provide a large matrix of different, precise legacy versions of python for library authors to test compatibility against those older Pythons. If you need to do that, particularly if you work on different projects where you may need to install some random super-old version of Python that you would not normally use to test something on, then pyenv is great. But if you only need one version of Python, it’s not a great way to get it.

The Other One That’s Exactly Like pyenv: asdf-python

The issues are exactly the same as with pyenv, as the tool is a straightforward alternative for the exact same purpose. It’s a bit less focused on Python than pyenv, which has pros and cons; it has broader community support, but it’s less specifically tuned for Python. But a comparative exploration of their differences is beyond the scope of this post.

The Built-In One That Isn’t Really Built-In: `/usr/bin/python3`

There is a binary in /usr/bin/python3 which might seem like an appealing option — it comes from Apple, after all! — but it is provided as a developer tool, for running things like build scripts. It isn’t for building applications with.

That binary is not a “system python”; the thing in the operating system itself is only a shim, which will determine if you have development tools, and shell out to a tool that will download the development tools for you if you don’t. There is unfortunately a lot of folk wisdom among older Python programmers who remember a time when Apple did actually package an antediluvian version of the interpreter that seemed to be supported forever, and might suggest it for things intended to be self-contained or have minimal bundled dependencies, but this is exactly the reason that Apple stopped shipping that.

If you use this option, it means that your Python might come from the Xcode Command Line Tools, or the Xcode application, depending on the state of xcode-select in your current environment and the order in which you installed them.

Upgrading Xcode via the app store or a developer.apple.com manual download — or its command-line tools, which are installed separately, and updated via the “settings” application in a completely different workflow — therefore also upgrades your version of Python without an easy way to downgrade, unless you manage multiple Xcode installs. Which, at 12G per install, is probably not an appealing option.³
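If you are not sure which of these you ended up with, here is a rough heuristic sketch. The two paths are the default install locations as far as I know; a renamed or additional Xcode install will not match, and the function name is made up.

```python
import sys
from typing import Optional

# Assumed default locations; nonstandard setups will differ.
APPLE_LOCATIONS = {
    "Xcode Command Line Tools": "/Library/Developer/CommandLineTools",
    "Xcode application": "/Applications/Xcode.app",
}


def apple_python_source(base_prefix: str) -> Optional[str]:
    # Report which Apple developer-tools bundle a prefix lives in, if any.
    for label, root in APPLE_LOCATIONS.items():
        if base_prefix.startswith(root + "/"):
            return label
    return None


print(apple_python_source(sys.base_prefix))
```

A result of None only means the interpreter does not live under either of those two directories; it says nothing about where it did come from.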

The One With The Data And The Science: Conda

As someone with a limited understanding of data science and scientific computing, I’m not really qualified to go into the detailed pros and cons here, but luckily, Itamar Turner-Trauring is, and he did.

My one coda to his detailed exploration here is that while there are good reasons to want to use Anaconda — particularly if you are managing a data-science workload across multiple platforms and you want a consistent, holistic development experience across a large team supporting heterogeneous platforms — some people will tell you that you need Conda to get you your libraries if you want to do data science or numerical work with Python at all, because Conda is how you install those libraries, and otherwise things just won’t work.

This is a historical artifact that is no longer true. Over the last decade, Python Wheels have been comprehensively adopted across the Python community, and almost every popular library with an extension module ships pre-built binaries to multiple platforms. There may be some libraries that only have prebuilt binaries for conda, but they are sufficiently specialized that I don’t know what they are.

The One for Being Consistent With Your Cloud Hosting

Another way to run Python on macOS is to not run it on macOS, but to get another computer inside your computer that isn’t running macOS, and instead run Python inside that, usually using Docker.⁴

There are good reasons to want to use a containerized configuration for development, but they start to drift away from the point of this post and into more complicated stuff about how to get your Python into the cloud.

So rather than saying “use Python.org native Python instead of Docker”, I am specifically not covering Docker as a replacement for a native mac Python here because in a lot of cases, it can’t be one. Many tools require native mac facilities like displaying GUIs or scripting applications, or want to be able to take a path name to a file without elaborate pre-work to allow the program to access it.

Summary

If you didn’t want to read all of that, here’s the summary.

If you use a mac:

Get your Python interpreter from python.org.
Update it with mopup so you don’t fall behind on security updates.
Always use venvs for specific projects, never pip install anything directly.
Use pipx to manage your Python applications so you don’t have to worry about dependency conflicts.
Don’t worry if Homebrew also installs a python executable, but don’t use it for your own stuff.
You might need a different Python interpreter if you have any specialized requirements, but you’ll probably know if you do.

If somebody sent you this article because you’re trying to get into Python and you got stuck on this point, let me first reassure you that all the information about this really is highly complex and confusing; if you’re feeling overwhelmed, that’s normal. But the good news is that you can really ignore most of it. Just read the next little bit. ↩
Some tools need to be installed in the same environment as the code they’re operating on, so you may want to have multiple installs of, for example, Mypy, PuDB, or sphinx. But for things that just do something useful but don’t need to load your code — such as this small selection of examples from my own collection: certbot, pgcli, asciinema, gister, speedtest-cli — pipx means you won’t have to debug wonky dependency interactions. ↩
The command-line tools are a lot smaller, but cannot have multiple versions installed at once, and are updated through a different mechanism. There are odd little details like the fact that the default bundle identifier for the framework differs, being either org.python.python or com.apple.python3. They’re generally different in a bunch of small subtle ways that don’t really matter in 95% of cases until they suddenly matter a lot in that last 5%. ↩
Or minikube, or podman, or colima or whatever I guess, there’s way too many of these containerization Pokémon running around for me to keep track of them all these days. ↩

Post-PyCon-US 2023 Notes

Some stream of consciousness post-conference notes.

python pycon supported Sunday April 30, 2023

PyCon 2023 was last week, and I wanted to write some notes on it while the memory is fresh. Much of this was jotted down on the plane ride home and edited a few days later.

Health & Safety

Even given my smaller practice run at PyBay, it was a bit weird for me to be back around so many people, given that it was all indoors.

However, it was very nice that everyone took masking seriously. I personally witnessed very few violations of the masking rules, and they all seemed to be momentary, unintentional slip-ups after eating or drinking something.

As a result, I’ve now been home for 4 full days, am COVID negative and did not pick up any more generic con crud. It’s really nice to be feeling healthy after a conference!

Overall Vibe

I was a bit surprised to find the conference much more overwhelming than I remembered it being. It’s been 4 years since my last PyCon; I was out of practice! It was also odd since last year was in person and at the same venue, so most folks had a sense of Salt Lake, and I really didn’t.

I think this was good, since I’ll remember this experience and have a fresher sense of what it feels like (at least a little bit) to be a new attendee next year.

The Schedule

I only managed to attend a few talks, but every one was excellent. In case you were not aware, un-edited livestream VODs of the talks are available with your online ticket, in advance of the release of the final videos on the YouTube channel, so if you missed these but you attended the conference you can still watch them.¹

Ned Batchelder’s keynote was awesome, and was a great kickoff to the conference. It helped set a tone of kindness and thoughtfulness throughout.
Russell Keith-Magee gave a great update on the state of BeeWare, which I hope will help to popularize the excellent work that he, and his collaborators, are doing: You can take it with you: Packaging your Python code with Briefcase. Anaconda sponsoring time for people to work on it has really pushed it forward.
Sumana Harihareswara and Jacob Kaplan-Moss put on an excellent stage play — Argument Clinic: What Healthy Professional Conflict Looks Like — highlighting how to have a productive professional conflict in a way which was both illuminatingly structured and viscerally real.

My Talk

My talk, “How To Keep A Secret”, seemed to be very well received.²

I got to talk to a lot of people who said they learned things from it. I had the idea to respond to audience feedback by asking “will you be doing anything differently as a result of seeing the talk?” and so I got to hear about which specific information was actually useful to help improve the audience’s security posture. I highly recommend this follow-up question to other speakers in the future.

As part of the talk, I released and announced 2 projects related to its topic of better security posture around secrets-management:

PINPal, a little spaced-repetition tool to help you safely rotate your “core” passwords, the ones you actually need to memorize.
TokenRing, a backend for the keyring module which uses a hardware token to require user presence for any secret access, by encrypting your vault and passwords as Fernet tokens.

I also called for donations to a local LGBT+ charity in Salt Lake City and made a small matching donation, to try to help the conference have a bit of a positive impact on the area’s trans population, given the extremely bigoted law passed by the state legislature in the run-up to the conference.

We raised $330 in total³, and I think other speakers were making similar calls. Nobody wanted any credit; everyone who got in touch and donated just wanted to help out.

Open Spaces

I went to a couple of open spaces that were really engaging and thought-provoking.

Hynek hosted one based on his talk (which is based on this blog post) where we explored some really interesting case-studies in replacing subclassing with composition.
There was a “web framework maintainers” open-space hosted by David Lord, which turned into a bit of a group therapy session amongst framework maintainers from Flask, Django, Klein (i.e. Twisted), and Sanic. I had a few key takeaways from this one:
- We should try to keep our users in the loop with what is going on in the project. Every project should have a project blog so that users have a single point of contact.
  - It turns out Twisted does actually have one of these. But we should actually post updates to that blog so that users can see new developments. We have forgotten to even post.
  - We should repeatedly drive users to those posts, from every communications channel; social media (mastodon, twitter), chat (discord, IRC, matrix, gitter), or mailing lists. We should not be afraid to repeat ourselves a bit. We’re often afraid to spam our users but there’s a lot of room between where we are now — i.e. “users never hear from us” — and spamming them.
- We should regularly remind ourselves, and each other, that any work doing things like ticket triage, code review, writing for the project blog, and writing the project website are valuable work. We all kinda know this already, but psychologically it just feels like ancillary “stuff” that isn’t as real as the coding itself.
- We should find ways to recognize contributions, especially the aforementioned less-visible stuff, like people who hang out in chat and patiently direct users to the appropriate documentation or support channels.

The Sprints

The sprints were not what I expected. I sat down thinking I’d be slogging through some Twisted org GitHub Actions breakage on Klein and Treq, but what I actually did was:

Requested an org on the recently-released PyPI “Organizations” feature, got it approved, and started adding a few core contributors.
Had some lovely conversations with PyCon and PSF staff about several potential projects that I think could really help the ecosystem. I don’t want to imply anyone has committed to anything here, so I’ll leave a description of exactly what those were for later.
Filed a series of issues against BeeWare™ Briefcase™ detailing exactly what I needed from Encrust that wasn’t already provided by Briefcase’s existing Mac support.
I also did much more than I expected on Pomodouroboros, including:
I talked to my first in-the-wild Pomodouroboros user, someone who started using the app early enough to get bitten by a Pickle data-migration bug and couldn’t upgrade! I’d forgotten that I’d released a version that modeled time as a float rather than a datetime.
Started working on a design with Moshe Zadka for integrations for external time-tracking services and task-management services.
I had the opportunity to review datetype with Paul Ganssle and explore options for integrating it with or recommending it from the standard library, to hopefully start to address both the datetime-shouldn’t-subclass-date problem and the how-do-you-know-if-a-datetime-is-timezone-aware problem.
Speaking of Twisted infrastructure maintenance, special thanks to Paul Kehrer, who noticed that pyasn1 was breaking Twisted’s CI, and submitted a PR to fix it. I finally managed to do a review a few days after the conference and that’s landed now.
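For anyone who has not run into the two datetime problems mentioned above, here is a small standard-library sketch of both. It does not show how datetype addresses them; it only shows the behavior being discussed. The is_aware helper is my own name, following the definition of awareness in the datetime documentation.

```python
from datetime import date, datetime, timezone


def is_aware(moment: datetime) -> bool:
    # A datetime is aware when it has a tzinfo that yields a UTC offset.
    return moment.tzinfo is not None and moment.tzinfo.utcoffset(moment) is not None


naive = datetime(2023, 4, 30, 12, 0)
aware = datetime(2023, 4, 30, 12, 0, tzinfo=timezone.utc)

# Problem 1: every datetime passes as a date, so a function that wants
# a plain date will quietly accept a datetime as well.
print(isinstance(naive, date))  # True

# Problem 2: naive and aware values share one static type, so the
# difference only shows up at runtime.
print(type(naive) is type(aware))       # True
print(is_aware(naive), is_aware(aware))  # False True
```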

Everything Else

I’m sure I’m forgetting at least a half a dozen other meaningful interactions that I had; the week was packed, and I talked to lots of interesting people as always.

See you next year in Pittsburgh!

Go to your dashboard and click the “Join PyCon US 2023 Online Now!” button at the top of the page, then look for the talk on the “agenda” tab or the speaker in the search box on the right. ↩
Talks like these and software like PINPal and TokenRing are the sorts of things that I hope to get support for from my Patreon, so please go there if you’d like to support my continuing to do this sort of work. ↩
If you’d like to make that number bigger, ~~I’ll do another $100 match on this blog post, and update that paragraph if I receive anything; just send the receipt to encircle@glyph.im.~~ A reader sent in another matching donation and I made a contribution, so the total raised is now $530. ↩

No Executable Is An Island

Single-file executables are neither necessary nor sufficient to provide a good end-user software installation experience. They don’t work at all on macOS today, but they don't really work great anywhere else either. The focus of Python packaging tool development ought to be elsewhere.

python packaging supported Thursday March 30, 2023

One of the perennial talking points in the Python packaging discourse is that it’s unnecessarily difficult to create a simple, single-file binary that you can hand to users.

This complaint is understandable. In most other programming languages, the first thing you sit down to do is to invoke the compiler, get an executable, and run it. Other, more recently created programming languages — particularly Go and Rust — have really excellent toolchains for doing this which eliminate a lot of classes of error during that build-and-run process. A single, dependency-free, binary executable file that you can run is an eminently comprehensible artifact, an obvious endpoint for “packaging” as an endeavor.

While Rust and Go are producing these artifacts as effectively their only output, Python has such a dizzying array of tooling complexity that even attempting to describe “the problem” takes many thousands of words and one may not ever even get around to fully describing the complexity of the issues involved in the course of those words. All the more understandable, then, that we should urgently add this functionality to Python.

A programmer might see Python produce wheels and virtualenvs which can break when anything in their environment changes, and see the complexity of that situation. Then, they see Rust produce a statically-linked executable which “just works”, and they see its toolchain simplicity. I agree that this shouldn’t be so hard, and some of the architectural decisions that make this difficult in Python are indeed unfortunate.

But then, I think, our hypothetical programmer gets confused. They think that Rust is simple because it produces an executable, and they think Python’s complexity comes from all the packaging standards and tools. But this is not entirely true.

Python’s packaging complexity, and indeed some of those packaging standards, arises from the fact that it is often used as a glue language. Packaging pure Python is really not all that hard. And although the tools aren’t included with the language and the DX isn’t as smooth, even packaging pure Python into a single-file executable is pretty trivial.
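As one illustration of how small the pure-Python case can be, here is a sketch using the standard library's zipapp module. This is my choice of tool for the example, not necessarily the one the paragraph above has in mind, and the resulting .pyz file still needs a Python interpreter on the machine that runs it. The helper name is made up.

```python
import subprocess
import sys
import tempfile
import zipapp
from pathlib import Path


def build_and_run(message: str) -> str:
    with tempfile.TemporaryDirectory() as tmp:
        # A one-file "application": a directory containing only __main__.py.
        source = Path(tmp) / "app"
        source.mkdir()
        (source / "__main__.py").write_text(f"print({message!r})\n")

        # Bundle the directory into a single runnable archive.
        target = Path(tmp) / "app.pyz"
        zipapp.create_archive(source, target)

        # Run the archive with the current interpreter and capture its output.
        result = subprocess.run(
            [sys.executable, str(target)],
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.strip()


print(build_and_run("hello from a single file"))
```

This only covers pure Python. Once a dependency with compiled extension modules is involved, the archive approach gets considerably less simple, which is the point of the paragraphs that follow.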

But almost nobody wants to package a single pure-python script. The whole reason you’re writing in Python is because you want to rely on an enormous ecosystem of libraries, and some low but critical percentage of those libraries include things like their own statically-linked copies of OpenSSL or a few megabytes of FORTRAN code with its own extremely finicky build system you don’t want to have to interact with.

When you look aside to other ecosystems, while Python still definitely has some unique challenges, shipping Rust with a ton of FFI, or Go with a bunch of Cgo is substantially more complex than the default out-of-the-box single-file “it just works” executable you get at the start.¹

Still, all things being equal, I think single-file executable builds would be nice for Python to have as a building block. It’s certainly easier to produce a package for a platform if your starting point is that you have a known-good, functioning single-file executable and all you need to do is embed it in some kind of environment-specific envelope. Certainly if what you want is to copy a simple microservice executable into a container image, you might really want to have this rather than setting up what is functionally a full Python development environment in your Dockerfile. After team-wide philosophical debates over what virtual environment manager to use, those Golang Dockerfiles that seem to be nothing but the following 4 lines are really appealing:

FROM golang
COPY *.go ./
RUN go build -o /app
ENTRYPOINT ["/app"]

All things are rarely equal, however.

The issue that we, as a community, ought to be trying to address with build tools is to get the software into users’ hands, not to produce a specific file format. In my opinion, single-file binary builds are not a great tool for this. They’re fundamentally not how people, even quite tech-savvy programming people, find and manage their tools.

A brief summary of the problems with single-file distributions:

They’re not discoverable. A single file linked on your website will not be found via something like brew search, apt search, choco search or searching in a platform’s GUI app store’s search bar.
They’re not updatable. People expect their system package manager to update stuff for them. Standalone binaries might add their own updaters, but now you’re shipping a whole software-update system inside your binary. More likely, it’ll go stale forever while better-packaged software will be updated and managed properly.
They have trouble packaging resources. Once you’ve got your code stuffed into a binary, how do you distribute images, configuration files, or other data resources along with it? This isn’t impossible to solve, but in other programming languages which do have a great single-file binary story, this problem is typically solved by third party tooling which, while it might work fine, will still generally exist in multiple alternative forms which adds its own complexity.

So while it might be a useful building-block that simplifies those annoying container builds a lot, it hardly solves the problem comprehensively.

If we were to build a big new tool, the tool we need is something that standardizes the input format to produce a variety of different complex, multi-file output formats, including things like:

deb packages (including uploading to PPA archives so people can add an apt line; a manual dpkg -i has many of the same issues as a single file)
container images (including the upload to a registry so that people can "$(shuf -n 1 -e nerdctl docker podman)" pull or FROM it)
Flatpak apps
Snaps
macOS apps
Microsoft store apps
MSI installers
Chocolatey / NuGet packages
Homebrew formulae

In other words, ask yourself, as a user of an application, how do you want to consume it? It depends what kind of tool it is, and there is no one-size-fits-all answer.
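To make the "one standardized input, many output formats" shape a little more concrete, here is a minimal sketch. Everything in it is hypothetical: `AppSpec`, the `plan_*` functions, and the `BACKENDS` registry are names I made up for illustration, and the file lists are placeholders rather than the complete layout of any of these formats. No such tool exists in this form.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class AppSpec:
    """One standardized description of an application (hypothetical fields)."""

    name: str
    version: str
    entry_point: str  # e.g. "mytool.cli:main"


# Each backend maps the same AppSpec to the files its platform expects.
# These lists are illustrative placeholders, not complete package layouts.
def plan_deb(spec: AppSpec) -> List[str]:
    return [f"{spec.name}_{spec.version}_all.deb"]


def plan_homebrew(spec: AppSpec) -> List[str]:
    return [f"Formula/{spec.name}.rb"]


def plan_macos_app(spec: AppSpec) -> List[str]:
    bundle = f"{spec.name}.app/Contents"
    return [f"{bundle}/Info.plist", f"{bundle}/MacOS/{spec.name}"]


# Hypothetical registry: adding a platform means adding one entry here.
BACKENDS: Dict[str, Callable[[AppSpec], List[str]]] = {
    "deb": plan_deb,
    "homebrew": plan_homebrew,
    "macos-app": plan_macos_app,
}


def plan_outputs(spec: AppSpec, targets: List[str]) -> Dict[str, List[str]]:
    """Return the planned output files for each requested target."""
    unknown = [t for t in targets if t not in BACKENDS]
    if unknown:
        raise ValueError(f"no backend for: {', '.join(unknown)}")
    return {target: BACKENDS[target](spec) for target in targets}


spec = AppSpec(name="mytool", version="1.0", entry_point="mytool.cli:main")
plan = plan_outputs(spec, ["deb", "homebrew"])
```

The point of the shape is that the developer only ever writes the `AppSpec`; knowledge of each platform lives in a backend that somebody else maintains.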

In any software ecosystem, if a feature is a building block which doesn’t fully solve the problem, that is an issue with the feature, but in many cases, that’s fine. We need lots of building blocks to get to full solutions. This is the story of open source.

However, if I had to take a crack at summing up the infinite-headed hydra of the Problem With Python Packaging, I’d put it like this:

Python has a wide array of tools which can be used to package your Python code for almost any platform, in almost any form, if you are sufficiently determined. The problem is that the end-to-end experience of shipping an application to end users who are not Python programmers² for any particular platform has a terrible user experience. What we need are more holistic solutions, not more building blocks.³

This makes me want to push back against this tendency whenever I see it, and to try to point towards more efficient ways of achieving a good user experience, with the relatively scarce community resources at our collective disposal⁴. Efficiency isn’t exclusively about ideal outcomes, though; it’s the optimization of a cost/benefit ratio. In terms of benefits, a single-file executable builder ranks relatively low, as I hope I’ve shown above.

Building a tool that makes arbitrary Python code into a fully self-contained executable is also very high-cost, in terms of community effort, for a bunch of reasons. For starters, in any moderately-large collection of popular dependencies from PyPI, at least a few of them are going to want to find their own resources via __file__, and you need to hack in a way to find those, which is itself error prone. Python also expects dynamic linking in a lot of places, and messing around with C linkers to change that around is a complex process with its own huge pile of failure modes. You need to do this on pre-existing binaries built with options you can’t necessarily control, because making everyone rebuild all the binary wheels they find on PyPI is a huge step backwards in terms of exposing app developers to confusing infrastructure complexity.

Now, none of this is impossible. There are even existing tools to do some of the scarier low-level parts of these problems. But one of the reasons that all the existing tools for doing similar things have folk-wisdom reputations and even official documentation expecting constant pain is that part of the project here is conducting a full audit of every usage of __file__ on PyPI and replacing it with some resource-locating API which we haven’t even got a mature version of yet⁵.
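Here is a small sketch of why `__file__` is the sticking point. It builds a throwaway package inside a zip archive and imports it from there; zipimport is standing in for whatever non-filesystem importer a single-file packer might use, and real packers may behave differently. The package name `demo_pkg` and its two loader functions are made up for the demonstration, and it assumes Python 3.9 or later for `importlib.resources.files`.

```python
import importlib
import os
import sys
import tempfile
import zipfile

# Source for a made-up package that ships one data file next to its code.
PKG_INIT = '''
import os
from importlib.resources import files

def load_via_file():
    # The common idiom: assumes the package lives in a real directory.
    path = os.path.join(os.path.dirname(__file__), "greeting.txt")
    with open(path) as f:
        return f.read()

def load_via_resources():
    # Asks the import system for the data instead of the filesystem.
    return files(__name__).joinpath("greeting.txt").read_text()
'''

# Put the package and its data file inside a zip archive.
tmpdir = tempfile.mkdtemp()
archive = os.path.join(tmpdir, "bundle.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("demo_pkg/__init__.py", PKG_INIT)
    zf.writestr("demo_pkg/greeting.txt", "hello from inside the zip")

# Import it straight out of the archive.
sys.path.insert(0, archive)
demo_pkg = importlib.import_module("demo_pkg")

# importlib.resources can read the data from inside the archive.
resources_result = demo_pkg.load_via_resources()

# The __file__ idiom builds a path *through* the zip file, which open()
# cannot follow, so it fails with some flavor of OSError.
try:
    demo_pkg.load_via_file()
    file_result = "worked"
except OSError as exc:
    file_result = type(exc).__name__
```

The library code is two lines different, but every dependency that uses the first idiom has to be found and changed before a bundle like this works.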

Whereas copying all the files into the right spots in an archive file that can be directly deployed to an existing platform is tedious, but only moderately challenging. It usually doesn’t involve fundamentally changing how the code being packaged works, only where it is placed.

To the extent that we have a choice between “make it easy to produce a single-file binary without needing to understand the complexities of binaries” or “make it easy to produce a Homebrew formula / Flatpak build / etc without the user needing to understand Homebrew / Flatpak / etc”, we should always choose the latter.

If this is giving you déjà vu, I’ve gestured at this general concept more vaguely in a few places, including tweeting⁶ about it in 2019, saying vaguely similar stuff:

If you're making a packaging tool for Python, stop trying to make single-file executables. They are a pointless technical flourish. You need to build:
• .app bundles
• .deb/.rpm/flatpack packages
• container images
• .msi installers
• Homebrew formulæ
• a codesigning pipeline
— glyph ⎷⃣ (@glyph) August 5, 2019

Everything I’ve written here so far is debatable.

You can find that debate both in replies to that original tweet and in various other comments and posts elsewhere that I’ve grumbled about this. I still don’t agree with that criticism, but there are very clever people working on complex tools which are slowly gaining popularity and might be making the overall packaging situation better.

So while I think we should in general direct efforts more towards integrating with full-featured packaging standards, I don’t want to yuck anybody’s yum when it comes to producing clean single-file executables in general. If you want to build that tool and it turns out to be a more useful building block than I’m giving it credit for, knock yourself out.

However, in addition to having a comprehensive write-up of my previously-stated opinions here, I want to impart a more concrete, less debatable issue. To wit: single-file executables as a distribution mechanism, specifically on macOS, are not only sub-optimal, but a complete waste of time.

Late last year, Hynek wrote a great post about his desire for, and experience of, packaging a single-file binary for multiple platforms. This should serve as an illustrative example of my point here. I don’t want to pick on Hynek. Prominent Python programmers wish for this all the time. In fact, Hynek also did the thing I said is a good idea here: he created a Homebrew tap, and that’s the one the README recommends.

So since he kindly supplied a perfect case-study of the contrasting options, let’s contrast them!

The first thing I notice is that the Homebrew version is Apple Silicon native, whereas the single-file binary is still x86_64, as the brew build and test infrastructure apparently deals with architectural differences (probably pretty easy given it can use Homebrew’s existing Python build) but the more hand-rolled PyOxidizer setup builds only for the host platform, which in this case is still an Intel mac thanks to GitHub dragging their feet.

The second is that the Homebrew version runs as I expect it to. I run doc2dash in my terminal and I see Usage: doc2dash [OPTIONS] SOURCE, as I should.

So, A+ on the Homebrew tap. No notes. I did not have to know anything about Python being in the loop at all here, it “just works” like every Ruby, Clojure, Rust, or Go tool I’ve installed with the same toolchain.

Over to the single-file brew-less version.

Beyond the architecture being emulated and having to download Rosetta2⁸, I have to note that this “single file” binary already comes in a zip file, since it needs to include the license in a separate file.⁷ Now that it’s unarchived, I have some choices to make about where to put it on my $PATH. But let’s ignore that for now and focus on the experience of running it. I fire up a terminal, and run cd Downloads/doc2dash.x86_64-apple-darwin/ and then ./doc2dash.

Now we hit the more intractable problem:

The executable does not launch because it is neither code-signed nor notarized. I’m not going to go through the actual demonstration here, because you already know how annoying this is, and also, you can’t actually do it.

Code-signing is more or less fine. The codesign tool will do its thing, and that will change the wording in the angry dialog box from something about an “unidentified developer” to being “unable to check for malware”, which is not much of a help. You still need to notarize it, and notarization can’t work.

macOS really wants your executable code to be in a bundle (i.e., an App) so that it can know various things about its provenance and structure. CLI tools are expected to be in the operating system, or managed by a tool like brew that acts like a sort of bootleg secondary operating-system-ish thing and knows how to manage binaries.

If it isn’t in a bundle, then it needs to be in a platform-specific .pkg file, which is installed with the built-in Installer app. This is because Apple cannot notarize a stand-alone binary executable file.

Part of the notarization process involves stapling an external “notarization ticket” to your code, and if you’ve only got a single file, it has nowhere to put that ticket. You can’t even submit a stand-alone binary; you have to package it in a format that is legible to Apple’s notarization service, which for a pure-CLI tool, means a .pkg.
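For a sense of what "package it in a .pkg" involves, here is a rough sketch of the command sequence, written as a Python function that only builds the argument lists and does not run anything. The tools (`codesign`, `pkgbuild`, `xcrun notarytool`, `xcrun stapler`) are Apple's own, but the signing identities, bundle identifier, keychain profile, and paths below are placeholders, and the exact flags you need may differ; check the man pages before relying on any of this.

```python
from typing import List


def notarization_commands(
    payload_dir: str,
    binary_name: str,
    pkg_path: str,
    bundle_id: str,
    version: str,
    app_identity: str,
    installer_identity: str,
    keychain_profile: str,
) -> List[List[str]]:
    """Build (but do not run) the commands to sign, package, and notarize a CLI tool."""
    binary = f"{payload_dir}/{binary_name}"
    return [
        # 1. Sign the executable itself, with the hardened runtime enabled.
        ["codesign", "--force", "--options", "runtime", "--timestamp",
         "--sign", app_identity, binary],
        # 2. Wrap the directory containing it in a signed installer package.
        ["pkgbuild", "--root", payload_dir, "--identifier", bundle_id,
         "--version", version, "--install-location", "/usr/local/bin",
         "--sign", installer_identity, pkg_path],
        # 3. Submit the package, not the bare binary, to the notary service.
        ["xcrun", "notarytool", "submit", pkg_path,
         "--keychain-profile", keychain_profile, "--wait"],
        # 4. Staple the resulting ticket onto the package.
        ["xcrun", "stapler", "staple", pkg_path],
    ]


# All of these values are hypothetical placeholders.
commands = notarization_commands(
    payload_dir="build/payload",
    binary_name="mytool",
    pkg_path="dist/mytool.pkg",
    bundle_id="org.example.mytool",
    version="1.0",
    app_identity="Developer ID Application: Example Dev (TEAMID)",
    installer_identity="Developer ID Installer: Example Dev (TEAMID)",
    keychain_profile="example-notary-profile",
)
```

Note that the stapling step operates on the .pkg: the ticket is attached to the container, which is exactly what a lone executable does not have.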

What about corporate distributions of proprietary command-line tools, like the 1Password CLI? Oh look, their official instructions tell you to use their Homebrew formula too. Homebrew really is the standard developer-CLI platform at this point for macOS. When 1Password distributes stuff outside of Homebrew, as with their beta builds, it’s stuff that lives in a .pkg as well.

It is possible to work around all of this.

I could open the unzipped file, right-click on the CLI tool, go to “Open”, get a subtly differently worded error dialog, like this…

…watch it open Terminal for me and then exit, then wait multiple seconds for it to run each time I want to re-run it at the command line. Did I mention that? The single-file option takes 2-3 seconds doing who knows what (maybe some kind of security check, maybe PyOxidizer overhead, I don’t know) but the Homebrew version starts imperceptibly instantly.

Also, I then need to manually re-do this process in the GUI every time I want to update it.

If you know about the magic of how this all actually works, you can also do xattr -d com.apple.quarantine doc2dash by hand, but I feel like xattr -d is a step lower down in the user-friendliness hierarchy than python3 -m pip install⁹, and not only because making a habit of clearing quarantine attributes manually is a little like cutting the brake lines on Gatekeeper’s ability to keep out malware.

But the point of distributing a single-file binary is to make it “easy” for end users, and is explaining Gatekeeper’s integrity verification accomplishing that goal?

Apple’s effectively mandatory code-signing verification on macOS is far out ahead of other desktop platforms right now, both in terms of its security and in terms of its obnoxiousness. But every mobile platform is like this, and I think that as everyone gets more and more panicked about malicious interference with software delivery, we’re going to see more and more official requirements that software must come packaged in one of these containers.

Microsoft will probably fix their absolute trash-fire of a codesigning system one day too. I predict that something vaguely like this will eventually even come to most Linux distributions. Not necessarily a prohibition on individual binaries like this, or like a GUI launch-prevention tool, but some sort of requirement imposed by the OS that every binary file be traceable to some sort of package, maybe enforced with some sort of annoying AppArmor profile if you don’t do it.

The practical, immediate message today is: “don’t bother producing a single-file binary for macOS users, we don’t want it and we will have a hard time using it”. But the longer term message is that focusing on creating single-file binaries is, in general, skating to where the puck used to be.

If we want Python to have a good platform-specific distribution mechanism for every platform, so it’s easy for developers to get their tools to users without having to teach all their users a bunch of nonsense about setuptools and virtualenvs first, we need to build that, and not get hung up on making a single-file executable packer a core part of the developer experience.

Thanks very much to my patrons for their support of writing like this, and software like these.

Oh, right. This is where I put the marketing “call to action”. Still getting the hang of these.

Did you enjoy this post and want me to write more like it, and/or did you hate it and want the psychological leverage and moral authority to tell me to stop and do something else? You can sign up here!

¹ I remember one escapade in particular where someone had to ship a bunch of PKCS#11 providers along with a Go executable in their application and it was, to put it lightly, not a barrel of laughs. ↩
² Shipping to Python programmers in a Python environment is kind of fine now, and has been for a while. ↩
³ Yet, even given my advance awareness of this criticism, despite my best efforts, I can’t seem to stop building little tools that poorly solve only one problem in isolation. ↩
⁴ And it does have to be at our collective disposal. Even the minuscule corner of this problem I’ve worked on, the aforementioned Mac code-signing and notarization stuff, is tedious and exhausting; nobody can take on the whole problem space, which is why I find writing about this is such an important part of the problem. Thanks Pradyun, and everyone else who has written about this at length! ↩
⁵ One of the sources of my anti-single-file stance here is that I tried very hard, for many years, to ensure that everything in Twisted was carefully zipimport-agnostic, even before pkg_resources existed, by using the from twisted.python.modules import getModule, getModule(__name__).filePath.sibling(...) idiom, where that .filePath attribute might be anything FilePath-like, specifically including ZipPath. It didn’t work; fundamentally, nobody cared, other contributors wouldn’t bother to enforce this, or even remember that it might be desirable, because they’d never worked in an environment where it was. Today, a quick git grep __file__ in Twisted turns up tons of usages that will make at least the tests fail to run in a single-file or zipimport environment. Despite the availability of zipimport itself since 2001, I have never seen tox or a tool like it support running with a zipimport-style deployment to ensure that this sort of configuration is easily, properly testable across entire libraries or applications. If someone really did want to care about single-file deployments, fixing this problem comprehensively across the ecosystem is probably one of the main things to start with, beginning with an international evangelism tour for importlib.resources. ↩
⁶ This is where the historical document is, given that I was using it at the time, but if you want to follow me now, please follow me on Mastodon. ↩
⁷ Oops, I guess we might as well not have bothered to make a single-file executable anyway! Once you need two files, you can put whatever you want in the zip file... ↩
⁸ Just kidding. Of course it’s installed already. It’s almost cute how Apple shows you the install progress to remind you that one day you might not need to download it whenever you get a new mac. ↩
⁹ There’s still technically a Python included in Xcode and the Xcode CLT, so functionally macs do have a /usr/bin/python3 that is sort of a python3.9. You shouldn’t really use it. Instead, download the Python installer from python.org. But probably you should use it before you start disabling code integrity verification everywhere. ↩
