Safer, Not Later

How “Move Fast and Break Things” ruined the world by escaping the context that it was intended for.

Facebook — and by extension, most of Silicon Valley — rightly gets a lot of shit for its old motto, “Move Fast and Break Things”.

As a general principle for living your life, it is obviously terrible advice, and it leads to a lot of the horrific outcomes of Facebook’s business.

I don’t want to be an apologist for Facebook. I also do not want to excuse the worldview that leads to those kinds of outcomes. However, I do want to try to help laypeople understand what software engineers, particularly those situated at the point in history where this motto became popular, actually meant by it. I would like more people in the general public to understand why, to engineers, it was supposed to mean roughly the same thing as Facebook’s newer, goofier-sounding “Move fast with stable infrastructure”.

Move Slow

In the bad old days, circa 2005, two worlds within the software industry were colliding.

The old world was the world of integrated hardware/software companies, like IBM and Apple, and shrink-wrapped software companies like Microsoft and WordPerfect. The new world was software-as-a-service companies like Google, and, yes, Facebook.

In the old world, you delivered software in a physical, shrink-wrapped box, on a yearly release cycle. If you were really aggressive you might ship updates as often as quarterly, but faster than that and your physical shipping infrastructure would not be able to keep pace with new versions. As such, development could proceed in long phases based on those schedules.

In practice what this meant was that in the old world, when development began on a new version, programmers would go absolutely wild adding incredibly buggy, experimental code to see what sorts of things might be possible in a new version, then slowly transition to less coding and more testing, eventually settling into a testing and bug-fixing mode in the last few months before the release.

This is where the idea of “alpha” (development testing) and “beta” (user testing) versions came from. Software in that initial surge of unstable development was extremely likely to malfunction or even crash. Everyone understood that. How could it be otherwise? In an alpha test, the engineers hadn’t even started bug-fixing yet!

In the new world, the idea of a 6-month-long “beta test” was incoherent. If your software was a website, you shipped it to users every time they hit “refresh”. The software was running 24/7, on hardware that you controlled. You could be adding features at every minute of every day. And, now that this was possible, you needed to be adding those features, or your users would get bored and leave for your competitors, who would do it.

But this came along with a new attitude towards quality and reliability. If you needed to ship a feature within 24 hours, you couldn’t write a buggy version that crashed all the time, see how your carefully-selected group of users used it, collect crash reports, fix all the bugs, have a feature-freeze and do nothing but fix bugs for a few months. You needed to be able to ship a stable version of your software on Monday and then have another stable version on Tuesday.

To support this novel sort of development workflow, the industry developed new technologies. I am tempted to tell you about them all. Unit testing, continuous integration servers, error telemetry, system monitoring dashboards, feature flags... this is where a lot of my personal expertise lies. I was very much on the front lines of the “new world” in this conflict, trying to move companies to shorter and shorter development cycles, and move away from the legacy worldview of Big Release Day engineering.

Old habits die hard, though. Most engineers at this point were trained in a world where they had months of continuous quality assurance processes after writing their first rough draft. Such engineers feel understandably nervous about being required to ship their probably-buggy code to paying customers every day. So they would try to slow things down.

Of course, when one is deploying all the time, all other things being equal, it’s easy to ship a show-stopping bug to customers. Organizations would do this, and they’d get burned. And when they’d get burned, they would introduce Processes to slow things down. Some of these would look like:

  1. Let’s keep a special version of our code set aside for testing, and then we’ll test that for a few weeks before sending it to users.
  2. The heads of every department need to sign off on every deployed version, so everyone needs to spend a day writing up an explanation of their changes.
  3. QA should sign off too, so let’s have an extensive sign-off process where each individual tester fills out a sign-off form.

Then there’s my favorite version of this pattern, where management decides that deploys are inherently dangerous, and everyone should probably just stop doing them. It typically proceeds in stages:

  1. Let’s have a deploy freeze, and not deploy on Fridays; don’t want to mess up the weekend debugging an outage.
  2. Actually, let’s extend that freeze for all of December, we don’t want to mess up the holiday shopping season.
  3. Actually why not have the freeze extend into the end of November? Don’t want to mess with Thanksgiving and the Black Friday weekend.
  4. Some of our customers are in India, and Diwali’s also a big deal. Why not extend the freeze from the end of October?
  5. But, come to think of it, we do a fair amount of seasonal sales for Halloween too. How about no deployments from October 10 onward?
  6. You know what, sometimes people like to use our shop for Valentine’s day too. Let’s just never deploy again.

This same anti-pattern can repeat itself with an endlessly proliferating list of “environments”, whose main role ends up being to ensure that no code ever makes it to actual users.

… and break things anyway

As you may have begun to suspect, there are a few problems with this style of software development.

Even back in the bad old days of the 90s when you had to ship disks in boxes, this methodology contained within itself the seeds of its own destruction. As Joel Spolsky memorably put it, Microsoft discovered that this idea that you could introduce a ton of bugs and then just fix them later came along with some massive disadvantages:

The very first version of Microsoft Word for Windows was considered a “death march” project. It took forever. It kept slipping. The whole team was working ridiculous hours, the project was delayed again, and again, and again, and the stress was incredible. [...] The story goes that one programmer, who had to write the code to calculate the height of a line of text, simply wrote “return 12;” and waited for the bug report to come in [...]. The schedule was merely a checklist of features waiting to be turned into bugs. In the post-mortem, this was referred to as “infinite defects methodology”.

Which led them to what is perhaps the most ironclad law of software engineering:

In general, the longer you wait before fixing a bug, the costlier (in time and money) it is to fix.

A corollary to this is that the longer you wait to discover a bug, the costlier it is to fix.

Some bugs can be found by code review. So you should do code review. Some bugs can be found by automated tests. So you should do automated testing. Some bugs will be found by monitoring dashboards, so you should have monitoring dashboards.
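As a minimal sketch of the automated-testing point, here is the kind of check that would flag the hard-coded “return 12;” from the Word anecdote on the first test run, rather than in a bug report months later. The function names, the 1.2 leading factor, and the check itself are all hypothetical, invented purely for illustration; they are not from Word or from any real codebase.

```python
def line_height(font_size_pt: float, leading: float = 1.2) -> float:
    """Hypothetical line-height calculation: font size times a leading factor."""
    return font_size_pt * leading


def placeholder_line_height(font_size_pt: float) -> float:
    """The "fix it later" version from the anecdote: always returns 12."""
    return 12


def scales_with_font_size(height_fn) -> bool:
    """A tiny automated check: a bigger font must produce a taller line."""
    return height_fn(20) > height_fn(10)
```

Running scales_with_font_size against both functions passes for the real calculation and fails for the placeholder, which is the whole point: the bug is found minutes after it is written, when it is cheapest to fix.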

So why not move fast?

But here is where Facebook’s old motto comes into play. All of those principles above are true, but here are two more things that are true:

  1. No matter how much code review, automated testing, and monitoring you have, some bugs can only be found by users interacting with your software.
  2. No bugs can be found merely by slowing down and putting the deploy off another day.

Once you have made the process of releasing software to users sufficiently safe that the potential damage of any given deployment can be reliably limited, it is always best to release your changes to users as quickly as possible.
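One common way to make “the potential damage of any given deployment” limited in practice is a percentage-based feature flag, where new code ships to everyone but is only switched on for a small, stable slice of users. What follows is a minimal sketch of that idea, not a description of Facebook’s infrastructure or of any particular library; the function names and the hashing scheme are my own assumptions.

```python
import hashlib


def rollout_bucket(flag_name: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket from 0 to 99."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag_name: str, user_id: str, percent: int) -> bool:
    """Return True if this user falls inside the first `percent` buckets.

    Because the bucket is derived from a hash rather than a random number,
    a given user sees the same answer on every request, and raising
    `percent` only ever adds users to the enabled group.
    """
    return rollout_bucket(flag_name, user_id) < percent
```

With something like this in place, a change can go to 1% of users on Monday and 100% on Tuesday, and turning it off again is a configuration change rather than a new deployment.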

More importantly, as an engineer, you will naturally have an inherent fear of breaking things. If you make no changes, you cannot be blamed for whatever goes wrong. Particularly if you grew up in the Old World, there is an ever-present temptation to slow down, to avoid shipping, to hold back your changes, just in case.

You will want to move slow, to avoid breaking things. Better to do nothing, to be useless, than to do harm.

For all its faults as an organization, Facebook did, and does, have some excellent infrastructure to avoid breaking their software systems in response to features being deployed to production. In that sense, they’d already done the work to avoid the “harm” of an individual engineer’s changes. If future work needed to be performed to increase safety, then that work should be done by the infrastructure team to make things safer, not by every other engineer slowing down.

The problem is that slowing down is not actually value neutral. To quote myself here:

If you can’t ship a feature, you can’t fix a bug.

When you slow down just for the sake of slowing down, you create more problems.

The first problem that you create is smashing together far too many changes at once.

You’ve got a development team. Every engineer on that team is adding features at some rate. You want them to be doing that work. Necessarily, they’re all integrating them into the codebase to be deployed whenever the next deployment happens.

If a problem occurs with one of those changes, and you want to quickly know which change caused that problem, ideally you want to compare two versions of the software with the smallest number of changes possible between them. Ideally, every individual change would be released on its own, so you can see differences in behavior between versions which contain one change each, not a gigantic avalanche of changes where any one of a hundred different features might be the culprit.
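To make the cost of that avalanche concrete, here is a minimal sketch of hunting for the culprit by bisection. It assumes the changes are ordered, that the software works with none of them applied, and that once a change breaks things they stay broken; the function name and the is_broken_after callback are hypothetical stand-ins for “build this version and check whether the problem reproduces”.

```python
from typing import Callable, Sequence


def first_bad_change(
    changes: Sequence[str], is_broken_after: Callable[[int], bool]
) -> tuple[str, int]:
    """Binary-search for the first change that breaks the software.

    `is_broken_after(n)` reports whether the software is broken with the
    first `n` changes applied. Returns the culprit and the number of
    intermediate versions that had to be built and tested to find it.
    """
    low, high = 0, len(changes)  # known-good count, known-bad count
    tests = 0
    while high - low > 1:
        mid = (low + high) // 2
        tests += 1
        if is_broken_after(mid):
            high = mid
        else:
            low = mid
    return changes[high - 1], tests
```

With a hundred changes in one release, this needs six or seven extra build-and-test cycles before you even know where to look. With one change per release, the loop never runs: the culprit is the only thing that changed.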

If you slow down for the sake of slowing down, you also create a process that cannot respond to failures of the existing code.

I’ve been writing thus far as if a system in a steady state is inherently fine, and each change carries the possibility of benefit but also the risk of failure. This is not always true. Changes don’t just occur in your software. They can happen in the world as well, and your software needs to be able to respond to them.

Back to that holiday shopping season example from earlier: if your deploy freeze prevents all deployments during the holiday season to prevent breakages, what happens when your small but growing e-commerce site encounters a catastrophic bug that has always been there, but only occurs when you have more than 10,000 concurrent users? The breakage is coming from new, never-before-seen levels of traffic. The breakage is coming from your success, not your code. You’d better be able to ship a fix for that bug real fast, because your only alternative to a fast turn-around bug-fix is shutting down the site entirely.

And if you see this failure for the first time on Black Friday, that is not the moment where you want to suddenly develop a new process for deploying on Friday. The only way to ensure that shipping that fix is easy is to ensure that shipping any fix is easy. That it’s a thing your whole team does quickly, all the time.

The motto “Move Fast And Break Things” caught on with a lot of the rest of Silicon Valley because we are all familiar with this toxic, paralyzing fear.

After we have the safety mechanisms in place to make changes as safe as they can be, we just need to push through it, and accept that things might break, but that’s OK.

Some Important Words are Missing

The motto has an implicit preamble, “Once you have done the work to make broken things safe enough, then you should move fast and break things”.

When you are in a conflict about whether to “go fast” or “go slow”, the motto is not supposed to be telling you that the answer is an unqualified “GOTTA GO FAST”. Rather, it is an exhortation to take a beat and to go through a process of interrogating your motivation for slowing down. There are three possible things that a person saying “slow down” could mean about making a change:

  1. It is broken in a way you already understand. If this is the problem, then you should not make the change, because you know it’s not ready. If you already know it’s broken, then the change simply isn’t done. Finish the work, and ship it to users when it’s finished.
  2. It is risky in a way that you don’t have a way to defend against. As far as you know, the change works, but there’s a risk embedded in it that you don’t have any safety tools to deal with. If this is the issue, then what you should do is pause working on this change, and build the safety first.
  3. It is making you nervous in a way you can’t articulate. If you can’t describe a known defect as in point 1, and you can’t outline an improved safety control as in point 2, then this is the time to let go, accept that you might break something, and move fast.

The implied context for “move fast and break things” is only in that third condition. If you’ve already built all the infrastructure that you can think of to build, and you’ve already fixed all the bugs in the change that you need to fix, any further delay will not serve you. Do not have any further delays.

Unfortunately, as you probably already know,

This motto did a lot of good in its appropriate context, at its appropriate time. It’s still a useful heuristic for engineers, if the appropriate context is generally understood within the conversation where it is used.

However, it has clearly been taken to mean a lot of significantly more damaging things.

Purely from an engineering perspective, it has been reasonably successful. It’s less and less common to see people in the industry pushing back against tight deployment cycles. It’s also less common to see the basic safety mechanisms (version control, continuous integration, unit testing) get ignored. And many ex-Facebook engineers have used this motto very clearly under the understanding I’ve described here.

Even in the narrow domain of software engineering it is misused. I’ve seen it used to argue a project didn’t need tests; that a deploy could be forced through a safety process; that users did not need to be informed of a change that could potentially impact them personally.

Outside that domain, it’s far worse. It’s generally understood to mean that no safety mechanisms are required at all, that any change a software company wants to make is inherently justified because it’s OK to “move fast”. You can see this interpretation in the way that it has leaked out of Facebook’s engineering culture and suffused its entire management strategy, blundering through market after market and issue after issue, making catastrophic mistakes, making a perfunctory apology and moving on to the next massive harm.

In the decade since it has been retired as Facebook’s official motto, it has been used to defend some truly horrific abuses within the tech industry. You only need to visit the orange website to see it still being used this way.

Even at its best, “move fast and break things” is an engineering heuristic, not an ethical principle. Even within the context I’ve described, it’s only okay to move fast and break things. It is never okay to move fast and harm people.

So, while I do think that it is broadly misunderstood by the public, it’s still not a thing I’d ever say again. Instead, I propose this:

Make it safer, don’t make it later.

Acknowledgments

Thank you to my patrons who are supporting my writing on this blog. If you like what you’ve read here and you’d like to read more of it, or you’d like to support my various open-source endeavors, you can support my work as a sponsor! I am also available for consulting work if you think your organization could benefit from expertise on topics like “how do I make changes to my codebase, but, like, good ones”.

iOS Mail To Omnifocus Task

Convert messages in the Mail app built in to iOS into tasks in OmniFocus.

One of my longest-running frustrations with iOS is that the default mail app does not have a “share” action, making it impossible to do the one thing that a mail client needs to be able to do for me, which is to selectively turn messages into tasks. This deficiency has multiple components, which make it difficult to work around:

  1. There is no UI to “share” a message from within the mail app, so you can’t share it to OmniFocus or a shortcut.
  2. There is no way to query for a mail message in Shortcuts and then get some kind of “message” object, so there’s no way you can start in Shortcuts and manually invoke something.
  3. There’s no such thing as MailKit on iOS, so third-party apps can’t query your mail database either.

To work around this, I’ve long subscribed to the “AirMail” app, which has a “message to omnifocus” action but is otherwise kind of a buggy mess.

But today, I read that you can set up an iPad in a split-screen view and drag messages from the built-in Mail app’s message list view into the OmniFocus inbox, and I extrapolated, and discovered how to get Mail-message-to-OmniFocus-task to work on an iPhone.

I’m thrilled that this functionality exists, but it is a bit of a masterclass in how to get a terrible UX out of a series of decisions that were probably locally reasonable. So, without any further ado, here’s how you do it:

  1. Open up mail.app, and find the message you want to share in the message list. Here, you have two choices:
    1. With one finger, press and hold on a message until you feel a haptic. When you feel the haptic, immediately move your finger a little bit, before you see the preview come up. This lets you operate directly from the message list, but is very fiddly.
    2. Tap the message to open the detail view, then press and hold on the sent date in the top right.
  2. Continue holding down with your first finger. With a second finger, swipe up from the bottom to enter the multitasking view, or to go back to your home screen. While holding your first finger in place, either switch to or launch OmniFocus.
  3. With your second finger, navigate to the Inbox, drag your first finger to the bottom of the list, and release it. Voila! You should have a task with a brief summary and a link back to the message.
  4. Swipe up from the bottom to switch back to Mail, then archive the message.

Get Your Mac Python From Python.org

There are many ways to get Python installed on macOS, but for most people the version that you download from Python.org is best.

One of the most unfortunate things about learning Python is that there are so many different ways to get it installed, and you need to choose one before you even begin. The differences can also be subtle and require technical depth to truly understand, which you don’t have yet.1 Even experts can be missing information about which one to use and why.

There are perhaps more of these on macOS than on any other platform, and that’s the platform I primarily use these days. If you’re using macOS, I’d like to make it simple for you.

The One You Probably Want: Python.org

My recommendation is to use an official build from python.org.

I recommend the official installer for most uses, and if you were just looking for a choice about which one to use, you can stop reading now. Thanks for your time, and have fun with Python.

If you want to get into the nerdy nuances, read on.

For starters, the official builds are compiled in such a way that they will run on a wide range of macs, both new and old. They are universal2 binaries, unlike some other builds, which means you can distribute them as part of a mac application.
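If you are curious which kind of build you are currently running, here is a rough sketch of a check. It relies on an assumption: that universal2 builds report a platform tag ending in “universal2” (something like macosx-10.9-universal2), while single-architecture builds end in arm64 or x86_64. The exact tag varies by Python version and builder, so treat this as a hint rather than a guarantee. The function name is my own.

```python
import sysconfig
from typing import Optional


def is_universal2(platform_tag: Optional[str] = None) -> bool:
    """Guess whether a macOS Python build is universal2 from its platform tag.

    Assumption: universal2 builds report a tag such as
    "macosx-10.9-universal2"; single-architecture builds report a tag
    ending in "arm64" or "x86_64" instead.
    """
    if platform_tag is None:
        # Ask the running interpreter how it describes its own build.
        platform_tag = sysconfig.get_platform()
    return platform_tag.startswith("macosx") and platform_tag.endswith("universal2")
```

Calling is_universal2() with no arguments inspects the interpreter you are running; passing a tag explicitly is mostly useful for experimenting.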

The main advantage that the Python.org build has, though, is very subtle, and not any concrete technical detail. It’s a social, structural issue: the Python.org builds are produced by the people who make CPython, who are more likely to know about the nuances of what options it can be built with, and who are more likely to adopt their own improvements as they are released. Third party builders who are focused on a more niche use-case may not realize that there are build options or environment requirements that could make their Pythons better.

I’m being a bit vague deliberately here, because at any particular moment in time, this may not be an advantage at all. Third party integrators generally catch up to changes, and eventually achieve parity. But for a specific upcoming example, PEP 703 will have extensive build-time implications, and I would trust the python.org team to be keeping pace with all those subtle details immediately as releases happen.

(And Auto-Update It)

The one downside of the official build is that you have to return to the website to check for security updates. Unlike other options described below, there’s no built-in auto-updater for security patches. If you follow the normal process, you still have to click around in a GUI installer to update it once you’ve clicked around on the website to get the file.

I have written a micro-tool to address this: you can pip install mopup and then periodically run mopup, and it will install any security updates for your current version of Python, with no interaction besides entering your admin password.

(And Always Use Virtual Environments)

Once you have installed Python from python.org, never pip install anything globally into that Python, even using the --user flag. Always, always use a virtual environment of some kind. In fact, I recommend configuring it so that it is not even possible to do so, by putting this in your ~/.pip/pip.conf:

[global]
require-virtualenv = true

This will avoid damaging your Python installation by polluting it with libraries that you install and then forget about. Any time you need to do something new, you should make a fresh virtual environment, and then you don’t have to worry about library conflicts between different projects that you may work on.
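If you want your own scripts to refuse to run outside a virtual environment too, a minimal sketch of the usual check looks like this. It compares the interpreter’s prefix with the prefix of the base installation it was created from, which differ inside an environment made by the standard venv module. I am not claiming this is exactly what pip does internally; the function name and the injectable parameters are my own, added so the logic can be tried out with made-up paths.

```python
import sys


def in_virtual_environment(
    prefix: str = sys.prefix, base_prefix: str = sys.base_prefix
) -> bool:
    """True when the interpreter's prefix differs from its base installation.

    Inside an environment created by `python3 -m venv`, sys.prefix points
    at the environment while sys.base_prefix points at the Python that
    created it. Outside of one, the two are the same.
    """
    return prefix != base_prefix
```

A script could call in_virtual_environment() at startup and exit with an explanatory message if it returns False.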

If you need to install tools written in Python, don’t manage those environments directly; install the tools with pipx. By using pipx, you allow each tool to maintain its own set of dependencies, which means you don’t need to worry about whether two tools you use have conflicting version requirements, or whether the tools conflict with your own code.2

The Others

There are, of course, several other ways to install Python, which you probably don’t want to use.

The One For Running Other People’s Code, Not Yours: Homebrew

In general, Homebrew Python is not for you.

The purpose of Homebrew’s python is to support applications packaged within Homebrew, which have all been tested against the versions of python libraries also packaged within Homebrew. It may upgrade without warning on just about any brew operation, and you can’t downgrade it without breaking other parts of your install.

Specifically for creating redistributable binaries, Homebrew python is typically compiled only for your specific architecture, and thus will not create binaries that can be used on Intel macs if you have an Apple Silicon machine, or will run slower on Apple Silicon machines if you have an Intel mac. Also, if there are prebuilt wheels which don’t yet exist for Apple Silicon, you cannot easily arch -x86_64 python ... and just install them; you have to install a whole second copy of Homebrew in a different location, which is a headache.

In other words, Homebrew is an alternative to pipx, not to Python. For that purpose, it’s fine.

The One For When You Need 20 Different Pythons For Debugging: pyenv

Like Homebrew, pyenv will default to building a single-architecture binary. Even worse, it will not build a Framework build of Python, which means several things related to being a mac app just won’t work properly. Remember those build-time esoterica that the core team is on top of but third parties may not be? “Should I use a Framework build” is an enduring piece of said esoterica.
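For the curious, here is a small sketch of how you might check whether a given interpreter is a Framework build. It assumes the PYTHONFRAMEWORK build-time configuration variable is a non-empty name (for example “Python”) on Framework builds and empty otherwise; that is my understanding of how CPython records it, but treat it as an assumption. The function name is my own.

```python
import sysconfig
from typing import Optional


def is_framework_build(framework_name: Optional[str] = None) -> bool:
    """Guess whether this Python was built as a macOS Framework.

    Assumption: the PYTHONFRAMEWORK config variable holds the framework's
    name on Framework builds and is empty (or missing) on other builds.
    """
    if framework_name is None:
        # Read the value recorded when the running interpreter was built.
        framework_name = sysconfig.get_config_var("PYTHONFRAMEWORK") or ""
    return bool(framework_name)
```

Calling is_framework_build() with no arguments checks the interpreter you are running it with.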

The purpose of pyenv is to provide a large matrix of different, precise legacy versions of python for library authors to test compatibility against those older Pythons. If you need to do that, particularly if you work on different projects where you may need to install some random super-old version of Python that you would not normally use to test something on, then pyenv is great. But if you only need one version of Python, it’s not a great way to get it.

The Other One That’s Exactly Like pyenv: asdf-python

The issues are exactly the same as with pyenv, as the tool is a straightforward alternative for the exact same purpose. It’s a bit less focused on Python than pyenv, which has pros and cons; it has broader community support, but it’s less specifically tuned for Python. But a comparative exploration of their differences is beyond the scope of this post.

The Built-In One That Isn’t Really Built-In: /usr/bin/python3

There is a binary in /usr/bin/python3 which might seem like an appealing option — it comes from Apple, after all! — but it is provided as a developer tool, for running things like build scripts. It isn’t for building applications with.

That binary is not a “system python”; the thing in the operating system itself is only a shim, which will determine if you have development tools, and shell out to a tool that will download the development tools for you if you don’t. There is unfortunately a lot of folk wisdom among older Python programmers who remember a time when Apple did actually package an antediluvian version of the interpreter that seemed to be supported forever, and might suggest it for things intended to be self-contained or have minimal bundled dependencies, but this is exactly the reason that Apple stopped shipping that.

If you use this option, it means that your Python might come from the Xcode Command Line Tools, or the Xcode application, depending on the state of xcode-select in your current environment and the order in which you installed them.

Upgrading Xcode via the app store or a developer.apple.com manual download — or its command-line tools, which are installed separately, and updated via the “settings” application in a completely different workflow — therefore also upgrades your version of Python without an easy way to downgrade, unless you manage multiple Xcode installs. Which, at 12G per install, is probably not an appealing option.3

The One With The Data And The Science: Conda

As someone with a limited understanding of data science and scientific computing, I’m not really qualified to go into the detailed pros and cons here, but luckily, Itamar Turner-Trauring is, and he did.

My one coda to his detailed exploration here is that while there are good reasons to want to use Anaconda (particularly if you are managing a data-science workload across multiple platforms and you want a consistent, holistic development experience across a large team supporting heterogeneous platforms), some people will tell you that you need Conda to get you your libraries if you want to do data science or numerical work with Python at all, because Conda is how you install those libraries, and otherwise things just won’t work.

This is a historical artifact that is no longer true. Over the last decade, Python Wheels have been comprehensively adopted across the Python community, and almost every popular library with an extension module ships pre-built binaries to multiple platforms. There may be some libraries that only have prebuilt binaries for conda, but they are sufficiently specialized that I don’t know what they are.

The One for Being Consistent With Your Cloud Hosting

Another way to run Python on macOS is to not run it on macOS, but to get another computer inside your computer that isn’t running macOS, and instead run Python inside that, usually using Docker.4

There are good reasons to want to use a containerized configuration for development, but they start to drift away from the point of this post and into more complicated stuff about how to get your Python into the cloud.

So rather than saying “use Python.org native Python instead of Docker”, I am specifically not covering Docker as a replacement for a native mac Python here because in a lot of cases, it can’t be one. Many tools require native mac facilities like displaying GUIs or scripting applications, or want to be able to take a path name to a file without elaborate pre-work to allow the program to access it.

Summary

If you didn’t want to read all of that, here’s the summary.

If you use a mac:

  1. Get your Python interpreter from python.org.
  2. Update it with mopup so you don’t fall behind on security updates.
  3. Always use venvs for specific projects, never pip install anything directly.
  4. Use pipx to manage your Python applications so you don’t have to worry about dependency conflicts.
  5. Don’t worry if Homebrew also installs a python executable, but don’t use it for your own stuff.
  6. You might need a different Python interpreter if you have any specialized requirements, but you’ll probably know if you do.

Acknowledgements

Thank you to my patrons who are supporting my writing on this blog. If you like what you’ve read here and you’d like to read more of it, or you’d like to support my various open-source endeavors, you can support my work as a sponsor! I am also available for consulting work if you think your organization could benefit from expertise on topics like “which Python is the really good one”.


  1. If somebody sent you this article because you’re trying to get into Python and you got stuck on this point, let me first reassure you that all the information about this really is highly complex and confusing; if you’re feeling overwhelmed, that’s normal. But the good news is that you can really ignore most of it. Just read the next little bit. 

  2. Some tools need to be installed in the same environment as the code they’re operating on, so you may want to have multiple installs of, for example, Mypy, PuDB, or sphinx. But for things that just do something useful but don’t need to load your code (such as this small selection of examples from my own collection: certbot, pgcli, asciinema, gister, speedtest-cli), pipx means you won’t have to debug wonky dependency interactions. 

  3. The command-line tools are a lot smaller, but cannot have multiple versions installed at once, and are updated through a different mechanism. There are odd little details like the fact that the default bundle identifier for the framework differs, being either org.python.python or com.apple.python3. They’re generally different in a bunch of small subtle ways that don’t really matter in 95% of cases until they suddenly matter a lot in that last 5%. 

  4. Or minikube, or podman, or colima or whatever I guess, there’s way too many of these containerization Pokémon running around for me to keep track of them all these days. 

Bilithification

Not sure how to do microservices? Split your monolith in half.

Several years ago at O’Reilly’s Software Architecture conference, within a comprehensive talk on refactoring, “Technical Debt: A Masterclass”, r0ml1 presented a concept that I think should be highlighted.

If you have access to O’Reilly Safari, I think the video is available there, or you can get the slides here. It’s well worth watching in its own right. The talk contains a lot of hard-won wisdom from a decades-long career, but in slides 75-87, he articulates a concept that I believe resolves the perennial pendulum-swing between microservices and monoliths that we see in the Software as a Service world.

I will refer to this concept as “the bilithification strategy”.

Background

Personally, I have long been a microservice skeptic. I would generally articulate this skepticism in terms of “YAGNI”.

Here’s the way I would advise people asking about microservices before encountering this concept:

Microservices are often adopted by small teams due to their advertised benefits. Advocates from very large organizations, ones that have been very successful with microservices, frequently give talks claiming that microservices are more modular, more scalable, and more fault-tolerant than their monolithic progenitors. But these teams rarely appreciate the costs, particularly the costs for smaller orgs. Specifically, there is a fixed operational marginal cost to each new service, and a fairly large fixed operational overhead to the infrastructure for an organization deploying microservices at all.

With a large enough team, the operational cost is easy to absorb. As the overhead is fixed, its share per person trends towards zero as your total team size and system complexity trend towards infinity. Also, in very large teams, the enforced isolation of components in separate services reduces complexity. It does so specifically by intentionally causing the software architecture to mirror the organizational structure of the team that deploys it. This, at the cost of increased operational overhead and decreased efficiency, allows independent parts of the organization to make progress independently, without blocking on each other. Therefore, in smaller teams, as you’re building, you should bias towards building a monolith until the complexity costs of the monolith become apparent. Then you should build the infrastructure to switch to microservices.
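
To put rough numbers on that overhead argument, here is a minimal sketch. The function name and both cost figures are invented for illustration, and the number of services is held constant so that only the effect of team size shows; only the shape of the result matters, not the values.

```python
def overhead_per_engineer(team_size, services,
                          platform_cost=400.0, cost_per_service=40.0):
    """Monthly hours of operational work carried by each engineer.

    platform_cost:    hypothetical fixed cost of running microservice
                      infrastructure at all.
    cost_per_service: hypothetical fixed marginal cost of each service.
    """
    total = platform_cost + cost_per_service * services
    return total / team_size


# Same ten services, three very different team sizes.
for team_size in (5, 50, 500):
    print(team_size, overhead_per_engineer(team_size, services=10))
```

With these made-up figures, the five-person team spends most of its time on operations, while the five-hundred-person team barely notices the same bill.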

I still stand by all of this. However, it’s incomplete.

What does it mean to “switch to microservices”?

The biggest thing that this advice leaves out is a clear understanding of the “micro” in “microservice”. In this framing, I’m implicitly understanding “micro” services to be services that are too small — or at least, too small for your team. But if they do work for large organizations, then at some point, you need to have them. This leaves open several questions:

  • What size is the right size for a service?
  • When should you split your monolith up into smaller services?
  • Wait, how do you even measure “size” of a service? Lines of code? Gigabytes of memory? Number of team members?

In a specific situation I could probably look at these questions for that situation, and make suggestions as to the appropriate course of action, but that’s based largely on vibes. There’s just a lot of drawing on complex experiences, not a repeatable pattern that a team could apply on their own.

We can be clear that you should always start with a monolith. But what should you do when that’s no longer working? How do you even tell when it’s no longer working?

Bilithification

Every codebase begins as a monolith. That is a single (mono) rock (lith). Here’s what it looks like.

a circle with the word “monolith” on it

Let’s say that the monolith, and the attendant team, is getting big enough that we’re beginning to consider microservices. We might now ask, “what is the appropriate number of services to split the monolith into?” and that could provoke endless debate even among a team with total consensus that it might need to be split into some number of services.

Rather than beginning with the premise that there is a correct number, we may observe instead that splitting the service into N services where N is more than one may be accomplished by splitting the service in half N-1 times.

So let’s bi (two) lithify (rock) this monolith, and take it from 1 to 2 rocks.

The task of splitting the service into two parts ought to be a manageable amount of work; two is a definitively finite number, as opposed to the infinite point-cloud of “microservices”. Thus, we should search, first, for a single logical seam along which we might cleave the monolith.

a circle with the word “monolith” on it and a line through it

In many cases—as in the specific case that r0ml gave—the easiest way to articulate a boundary between two parts of a piece of software is to conceptualize a “frontend” and a “backend”. In the absence of any other clear boundary, the question “does this functionality belong in the front end or the back end” can serve as a simple razor for separating the service.

Remember: the reason we’re splitting this software up is because we are also splitting the team up. You need to think about this in terms of people as well as in terms of functionality. What division between halves would most reduce the number of lines of communication, to reduce the quadratic increase in required communication relationships that comes along with the linear increase in team size? Can you identify two groups who need to talk amongst themselves, but do not need to talk with all of each other?2
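
The quadratic growth mentioned above is easy to check with a little arithmetic. This is only a sketch: it assumes that everyone inside a group talks to everyone else in that group, and the `liaisons` parameter (the number of cross-team links that survive the split) is a hypothetical knob, not something from the talk.

```python
def lines_of_communication(people):
    """Distinct pairs in a group where everyone talks to everyone."""
    return people * (people - 1) // 2


def after_split(group_a, group_b, liaisons=1):
    """Pairs inside each half, plus a few remaining cross-team links."""
    return (lines_of_communication(group_a)
            + lines_of_communication(group_b)
            + liaisons)


print(lines_of_communication(20))  # one team of 20
print(after_split(10, 10))         # two teams of 10 joined by one liaison link
```

Under those assumptions a single team of 20 has 190 possible pairings, and two teams of 10 with a single link between them have 91.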

two circles with the word “hemilith” on them and a double-headed arrow between them

Once you’ve achieved this separation, you no longer have a single rock; you have two half-rocks: hemiliths, to borrow from the same Greek root that gave us “monolith”.

But we are not finished, of course. Two may not be the correct number of services to end up with. Now, we ask: can we split the frontend into a frontend and backend? Can we split the backend? If so, then we now have four rocks in place of our original one:

four circles with the word “tetartolith” on them and double-headed arrows connecting them all

You might think that this would be a “tetralith” for “four”, but as they are of a set, they are more properly a tetartolith.

Repeat As Necessary

At some point, you’ll hit a point where you’re looking at a service and asking “what are the two pieces I could split this service into?”, and the answer will be “none, it makes sense as a single piece”. At that point, you will know that you’ve achieved services of the correct size.
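
The whole procedure is a simple recursion, which can be sketched as follows. The `find_seam` callback stands in for the human judgment described above, and the service names in `SEAMS` are hypothetical; in real life that judgment is a conversation, not a lookup table.

```python
def bilithify(service, find_seam):
    """Split `service` in half repeatedly until no sensible seam remains.

    find_seam(service) returns a (left, right) pair, or None when the
    service makes sense as a single piece.
    """
    halves = find_seam(service)
    if halves is None:
        return [service]
    left, right = halves
    return bilithify(left, find_seam) + bilithify(right, find_seam)


# Hypothetical seams for an imaginary product.
SEAMS = {
    "monolith": ("frontend", "backend"),
    "frontend": ("web-ui", "mobile-api"),
    "backend": ("orders", "accounts"),
}

print(bilithify("monolith", SEAMS.get))
```

Here three splits produce four services, matching the observation that N services take N-1 splits, and the recursion stops by itself once no seam is offered.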

One thing about this insight that may disappoint some engineers is the realization that service-oriented architecture is much more an engineering management tool than it is an engineering tool. It’s fun to think that “microservices” will let you play around with weird technologies and niche programming languages consequence-free at your day job because those can all be “separate services”, but that was always a fantasy. Integrating multiple technologies is expensive, and introducing more moving parts always introduces more failure points.

Advanced Techniques: A Multi-Stack Microservice Environment

You’ll note that splitting a service heavily implies that the resulting services will still all be in the same programming language and the same tech stack as before. If you’re interested in deploying multiple stacks (languages, frameworks, libraries), you can proceed to that outcome via bilithification, but it is a multi-step process.

First, you have to complete the strategy that I’ve already outlined above. You need to get to a service that is sufficiently granular that it is atomic; you don’t want to split it up any further.

Let’s call that service “X”.

Second, you identify the additional complexity that would be introduced by using a different tech stack. It’s important to be realistic here! New technology always seems fun, and if you’re investigating this, you’re probably predisposed to think it would be an improvement. So identify your costs first and make sure you have them enumerated before you move on to the benefits.

Third, identify the concrete benefits to X’s problem domain that the new tech stack would provide.

Finally, do a cost-benefit analysis where you make sure that the costs from step 2 are clearly exceeded by the benefits from step 3. If you can’t readily identify that in advance (sometimes experimentation is required), then you need to treat this as an experiment, rather than as a strategic direction, until you’ve had a chance to answer whatever questions you have about the new technology’s benefits.

Note, also, that this cost-benefit analysis requires not only doing the technical analysis but getting buy-in from the entire team tasked with maintaining that component.

Conclusion

To summarize:

  1. Always start with a monolith.
  2. When the monolith is too big, both in terms of team and of codebase, split the monolith in half until it doesn’t make sense to split it in half any more.
  3. (Optional) Carefully evaluate services that want to adopt new technologies, and keep the costs of doing that in mind.

There is, of course, a world of complexity beyond this associated with managing the cost of a service-oriented architecture and solving specific technical problems that arise from that architecture.

If you remember the tetartolith, though, you should at least be able to get to the right number and size of services for your system.


Thank you to my patrons who are supporting my writing on this blog. If you like what you’ve read here and you’d like to read more of it, or you’d like to support my various open-source endeavors, you can support my work as a sponsor! I am also available for consulting work if you think your organization could benefit from more specificity on the sort of insight you've seen here.


  1. AKA “my father” 

  2. Denouncing “silos” within organizations is so common that it’s a tired trope at this point. There is no shortage of vaguely inspirational articles across the business trade-rag web and on LinkedIn exhorting us to “break down silos”, but silos are the point of having an organization. If everybody needs to talk to everybody else in your entire organization, if no silos exist to prevent the necessity of that communication, then you are definitionally operating that organization at minimal efficiency. What people actually want when they talk about “breaking down silos” is a re-org into a functional hierarchy rather than a role-oriented hierarchy (i.e., “this is the team that makes and markets the Foo product, this is the team that makes and markets the Bar product” as opposed to “this is the sales team”, “this is the engineering team”). But that’s a separate post, probably. 

Post-PyCon-US 2023 Notes

Some stream of consciousness post-conference notes.

PyCon 2023 was last week, and I wanted to write some notes on it while the memory is fresh. Much of this was jotted down on the plane ride home and edited a few days later.

Health & Safety

Even given my smaller practice run at PyBay, it was a bit weird for me to be back around so many people, given that it was all indoors.

However, it was very nice that everyone took masking seriously. I personally witnessed very few violations of the masking rules, and they all seemed to be momentary, unintentional slip-ups after eating or drinking something.

As a result, I’ve now been home for 4 full days, am COVID negative and did not pick up any more generic con crud. It’s really nice to be feeling healthy after a conference!

Overall Vibe

I was a bit surprised to find the conference much more overwhelming than I remembered it being. It’s been 4 years since my last PyCon; I was out of practice! It was also odd since last year was in person and at the same venue, so most folks had a sense of Salt Lake, and I really didn’t.

I think this was good, since I’ll remember this experience and have a fresher sense of what it feels like (at least a little bit) to be a new attendee next year.

The Schedule

I only managed to attend a few talks, but every one was excellent. In case you were not aware, un-edited livestream VODs of the talks are available with your online ticket, in advance of the release of the final videos on the YouTube channel, so if you missed these but you attended the conference you can still watch them.1

My Talk

My talk, “How To Keep A Secret”, seemed to be very well received.2

I got to talk to a lot of people who said they learned things from it. I had the idea to respond to audience feedback by asking “will you be doing anything differently as a result of seeing the talk?” and so I got to hear about which specific information was actually useful to help improve the audience’s security posture. I highly recommend this follow-up question to other speakers in the future.

As part of the talk, I released and announced 2 projects related to its topic of better security posture around secrets-management:

  • PINPal, a little spaced-repetition tool to help you safely rotate your “core” passwords, the ones you actually need to memorize.

  • TokenRing, a backend for the keyring module which uses a hardware token to require user presence for any secret access, by encrypting your vault and passwords as Fernet tokens.

I also called for donations to a local LGBT+ charity in Salt Lake City and made a small matching donation, to try to help the conference have a bit of a positive impact on the area’s trans population, given the extremely bigoted law passed by the state legislature in the run-up to the conference.

We raised $330 in total3, and I think other speakers were making similar calls. Nobody wanted any credit; everyone who got in touch and donated just wanted to help out.

Open Spaces

I went to a couple of open spaces that were really engaging and thought-provoking.

  • Hynek hosted one based on his talk (which is based on this blog post) where we explored some really interesting case-studies in replacing subclassing with composition.

  • There was a “web framework maintainers” open-space hosted by David Lord, which turned into a bit of a group therapy session amongst framework maintainers from Flask, Django, Klein (i.e. Twisted), and Sanic. I had a few key takeaways from this one:

    • We should try to keep our users in the loop with what is going on in the project. Every project should have a project blog so that users have a single point of contact.

      • It turns out Twisted does actually have one of these. But we should actually post updates to that blog so that users can see new developments. We have forgotten to even post.

      • We should repeatedly drive users to those posts, from every communications channel; social media (mastodon, twitter), chat (discord, IRC, matrix, gitter), or mailing lists. We should not be afraid to repeat ourselves a bit. We’re often afraid to spam our users but there’s a lot of room between where we are now — i.e. “users never hear from us” — and spamming them.

    • We should regularly remind ourselves, and each other, that any work doing things like ticket triage, code review, writing for the project blog, and writing the project website is valuable work. We all kinda know this already, but psychologically it just feels like ancillary “stuff” that isn’t as real as the coding itself.

    • We should find ways to recognize contributions, especially the aforementioned less-visible stuff, like people who hang out in chat and patiently direct users to the appropriate documentation or support channels.

The Sprints

The sprints were not what I expected. I sat down thinking I’d be slogging through some Twisted org GitHub Actions breakage on Klein and Treq, but what I actually did was:

  • Requested an org on the recently-released PyPI “Organizations” feature, got it approved, and started adding a few core contributors.

  • Had some lovely conversations with PyCon and PSF staff about several potential projects that I think could really help the ecosystem. I don’t want to imply anyone has committed to anything here, so I’ll leave a description of exactly what those were for later.

  • Filed a series of issues against BeeWare™ Briefcase™ detailing exactly what I needed from Encrust that wasn’t already provided by Briefcase’s existing Mac support.

  • I also did much more than I expected on Pomodouroboros, including:

    • I talked to my first in-the-wild Pomodouroboros user, someone who started using the app early enough to get bitten by a Pickle data-migration bug and couldn’t upgrade! I’d forgotten that I’d released a version that modeled time as a float rather than a datetime.

    • Started working on a design with Moshe Zadka for integrations for external time-tracking services and task-management services.

  • I had the opportunity to review datetype with Paul Ganssle and explore options for integrating it with or recommending it from the standard library, to hopefully start to address both the datetime-shouldn’t-subclass-date problem and the how-do-you-know-if-a-datetime-is-timezone-aware problem.

  • Speaking of Twisted infrastructure maintenance, special thanks to Paul Kehrer, who noticed that pyasn1 was breaking Twisted’s CI, and submitted a PR to fix it. I finally managed to do a review a few days after the conference and that’s landed now.
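
For anyone unfamiliar with the two datetime problems mentioned above, here is a small standard-library sketch of both. It does not use datetype itself, and the `is_aware` helper is my own name for the runtime check that the datetime documentation describes.

```python
from datetime import date, datetime, timezone


def is_aware(dt):
    """Aware means there is a tzinfo and it yields a UTC offset."""
    return dt.tzinfo is not None and dt.tzinfo.utcoffset(dt) is not None


naive = datetime(2023, 4, 21, 9, 0)
aware = datetime(2023, 4, 21, 9, 0, tzinfo=timezone.utc)

# Problem 1: every datetime passes an isinstance check for date.
print(isinstance(naive, date))

# Problem 2: both values have the same static type, so awareness
# can only be discovered by inspecting them at runtime.
print(type(naive) is type(aware), is_aware(naive), is_aware(aware))
```

Both values are plain `datetime` objects as far as a type checker is concerned, which is why the distinction tends to surface as a runtime error rather than something caught earlier.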

Everything Else

I’m sure I’m forgetting at least a half a dozen other meaningful interactions that I had; the week was packed, and I talked to lots of interesting people as always.

See you next year in Pittsburgh!


  1. Go to your dashboard and click the “Join PyCon US 2023 Online Now!” button at the top of the page, then look for the talk on the “agenda” tab or the speaker in the search box on the right. 

  2. Talks like these and software like PINPal and TokenRing are the sorts of things that I hope to get support for from my Patreon, so please go there if you’d like to support my continuing to do this sort of work. 

  3. If you’d like to make that number bigger, I’ll do another $100 match on this blog post, and update that paragraph if I receive anything; just send the receipt to encircle@glyph.im. A reader sent in another matching donation and I made a contribution, so the total raised is now $530.