Python Packaging Is Good Now

setup.py is your friend. It’s real sorry about what happened last time.

Okay folks. Time’s up. It’s too late to say that Python’s packaging ecosystem terrible any more. I’m calling it.

Python packaging is not bad any more. If you’re a developer, and you’re trying to create or consume Python libraries, it can be a tractable, even pleasant experience.

I need to say this, because for a long time, Python’s packaging toolchain was … problematic. It isn’t any more, but a lot of people still seem to think that it is, so it’s time to set the record straight.

If you’re not familiar with the history it went something like this:

The Dawn

Python first shipped in an era when adding a dependency meant a veritable Odyssey into cyberspace. First, you’d wait until nobody in your whole family was using the phone line. Then you’d dial your ISP. Once you’d finished fighting your SLIP or PPP client, you’d ask a netnews group if anyone knew of a good gopher site to find a library that could solve your problem. Once you were done with that task, you’d sign off the Internet for the night, and wait about 48 hours too see if anyone responded. If you were lucky enough to get a reply, you’d set up a download at the end of your night’s web-surfing.

pip search it wasn’t.

For the time, Python’s approach to dependency-handling was incredibly forward-looking. The import statement, and the pluggable module import system, made it easy to get dependencies from wherever made sense.

In Python 2.01, Distutils was introduced. This let Python developers describe their collections of modules abstractly, and added tool support to producing redistributable collections of modules and packages. Again, this was tremendously forward-looking, if somewhat primitive; there was very little to compare it to at the time.

Fast forwarding to 2004; setuptools was created to address some of the increasingly-common tasks that open source software maintainers were facing with distributing their modules over the internet. In 2005, it added easy_install, in order to provide a tool to automate resolving dependencies and downloading them into the right locations.

The Dark Age

Unfortunately, in addition to providing basic utilities for expressing dependencies, setuptools also dragged in a tremendous amount of complexity. Its author felt that import should do something slightly different than what it does, so installing setuptools changed it. The main difference between normal import and setuptools import was that it facilitated having multiple different versions of the same library in the same program at the same time. It turns out that that’s a dumb idea, but in fairness, it wasn’t entirely clear at the time, and it is certainly useful (and necessary!) to be able to have multiple versions of a library installed onto a computer at the same time.

In addition to these idiosyncratic departures from standard Python semantics, setuptools suffered from being unmaintained. It became a critical part of the Python ecosystem at the same time as the author was moving on to other projects entirely outside of programming. No-one could agree on who the new maintainers should be for a long period of time. The project was forked, and many operating systems’ packaging toolchains calcified around a buggy, ancient version.

From 2008 to 2012 or so, Python packaging was a total mess. It was painful to use. It was not clear which libraries or tools to use, which ones were worth investing in or learning. Doing things the simple way was too tedious, and doing things the automated way involved lots of poorly-documented workarounds and inscrutable failure modes.

This is to say nothing of the fact that there were critical security flaws in various parts of this toolchain. There was no practical way to package and upload Python packages in such a way that users didn’t need a full compiler toolchain for their platform.

To make matters worse for the popular perception of Python’s packaging prowess2, at this same time, newer languages and environments were getting a lot of buzz, ones that had packaging built in at the very beginning and had a much better binary distribution story. These environments learned lessons from the screw-ups of Python and Perl, and really got a lot of things right from the start.

Finally, the Python Package Index, the site which hosts all the open source packages uploaded by the Python community, was basically a proof-of-concept that went live way too early, had almost no operational resources, and was offline all the dang time.

Things were looking pretty bad for Python.


Intermission

Here is where we get to the point of this post - this is where popular opinion about Python packaging is stuck. Outdated information from this period abounds. Blog posts complaining about problems score high in web searches. Those who used Python during this time, but have now moved on to some other language, frequently scoff and dismiss Python as impossible to package, its packaging ecosystem as broken, PyPI as down all the time, and so on. Worst of all, bad advice for workarounds which are no longer necessary are still easy to find, which causes users to pre-emptively break their environments where they really don’t need to.


From The Ashes

In the midst of all this brokenness, there were some who were heroically, quietly, slowly fixing the mess, one gnarly bug-report at a time. pip was started, and its various maintainers fixed much of easy_install’s overcomplexity and many of its flaws. Donald Stufft stepped in both on Pip and PyPI and improved the availability of the systems it depended upon, as well as some pretty serious vulnerabilities in the tool itself. Daniel Holth wrote a PEP for the wheel format, which allows for binary redistribution of libraries. In other words, it lets authors of packages which need a C compiler to build give their users a way to not have one.

In 2013, setuptools and distribute un-forked, providing a path forward for operating system vendors to start updating their installations and allowing users to use something modern.

Python Core started distributing the ensurepip module along with both Python 2.7 and 3.3, allowing any user with a recent Python installed to quickly bootstrap into a sensible Python development environment with a one-liner.

A New Renaissance

I won’t give you a full run-down of the state of the packaging art. There’s already a website for that. I will, however, give you a précis of how much easier it is to get started nowadays. Today, if you want to get a sensible, up-to-date python development environment, without administrative privileges, all you have to do is:

1
2
3
$ python -m ensurepip --user
$ python -m pip install --user --upgrade pip
$ python -m pip install --user --upgrade virtualenv

Then, for each project you want to do, make a new virtualenv:

1
2
3
$ python -m virtualenv lets-go
$ . ./lets-go/bin/activate
(lets-go) $ _

From here on out, now the world is your oyster; you can pip install to your heart’s content, and you probably won’t even need to compile any C for most packages. These instructions don’t depend on Python version, either: as long as it’s up-to-date, the same steps work on Python 2, Python 3, PyPy and even Jython. In fact, often the ensurepip step isn’t even necessary since pip comes preinstalled. Running it if it’s unnecessary is harmless, even!

Other, more advanced packaging operations are much simpler than they used to be, too.

  • Need a C compiler? OS vendors have been working with the open source community to make this easier across the board:
    1
    2
    3
    4
    5
    $ apt install build-essential python-dev # ubuntu
    $ xcode-select --install # macOS
    $ dnf install @development-tools python-devel # fedora
    C:\> REM windows
    C:\> start https://www.microsoft.com/en-us/download/details.aspx?id=44266
    

Okay that last one’s not as obvious as it ought to be but they did at least make it freely available!

  • Want to upload some stuff to PyPI? This should do it for almost any project:

    1
    2
    3
    $ pip install twine
    $ python setup.py sdist bdist_wheel
    $ twine upload dist/*
    
  • Want to build wheels for the wild and wooly world of Linux? There’s an app4 for that.

Importantly, PyPI will almost certainly be online. Not only that, but a new, revamped site will be “launching” any day now3.

Again, this isn’t a comprehensive resource; I just want to give you an idea of what’s possible. But, as a deeply experienced Python expert I used to swear at these tools six times a day for years; the most serious Python packaging issue I’ve had this year to date was fixed by cleaning up my git repo to delete a cache file.

Work Still To Do

While the current situation is good, it’s still not great.

Here are just a few of my desiderata:

  • We still need better and more universally agreed-upon tooling for end-user deployments.
  • Pip should have a GUI frontend so that users can write Python stuff without learning as much command-line arcana.
  • There should be tools that help you write and update a setup.py. Or a setup.python.json or something, so you don’t actually need to write code just to ship some metadata.
  • The error messages that you get when you try to build something that needs a C compiler and it doesn’t work should be clearer and more actionable for users who don’t already know what they mean.
  • PyPI should automatically build wheels for all platforms by default when you upload sdists; this is a huge project, of course, but it would be super awesome default behavior.

I could go on. There are lots of ways that Python packaging could be better.

The Bottom Line

The real takeaway here though, is that although it’s still not perfect, other languages are no longer doing appreciably better. Go is still working through a number of different options regarding dependency management and vendoring, and, like Python extensions that require C dependencies, CGo is sometimes necessary and always a problem. Node has had its own well-publicized problems with their dependency management culture and package manager. Hackage is cool and all but everything takes a literal geological epoch to compile.

As always, I’m sure none of this applies to Rust and Cargo is basically perfect, but that doesn’t matter, because nobody reading this is actually using Rust.

My point is not that packaging in any of these languages is particularly bad. They’re all actually doing pretty well, especially compared to the state of the general programming ecosystem a few years ago; many of them are making regular progress towards user-facing improvements.

My point is that any commentary suggesting they’re meaningfully better than Python at this point is probably just out of date. Working with Python packaging is more or less fine right now. It could be better, but lots of people are working on improving it, and the structural problems that prevented those improvements from being adopted by the community in a timely manner have almost all been addressed.

Go! Make some virtualenvs! Hack some setup.pys! If it’s been a while and your last experience was really miserable, I promise, it’s better now.


Am I wrong? Did I screw up a detail of your favorite language? Did I forget to mention the one language environment that has a completely perfect, flawless packaging story? Do you feel the need to just yell at a stranger on the Internet about picayune details? Feel free to get in touch!


  1. released in October, 2000 

  2. say that five times fast. 

  3. although I’m not sure what it means to “launch” when the site is online, and running against the production data-store, and you can use it for pretty much everything... 

  4. “app” meaning of course “docker container” 

The One Python Library Everyone Needs

Use attrs. Use it. Use it for everything.

attrs is still an excellent library, and it still has much to recommend it over the standard library for many applications. However, as of this update in 2023, much new code — perhaps even most of it — should just be using dataclasses. If you want to see where my thinking has moved on to in the intervening 7 years, you may want to check out this more recent post. To make a long story short though, I was right, and attrs set us on the road to a major upgrade to best practices for class definition in Python.

Do you write programs in Python? You should be using attrs.

Why, you ask? Don’t ask. Just use it.

Okay, fine. Let me back up.

I love Python; it’s been my primary programming language for 10+ years and despite a number of interesting developments in the interim I have no plans to switch to anything else.

But Python is not without its problems. In some cases it encourages you to do the wrong thing. Particularly, there is a deeply unfortunate proliferation of class inheritance and the God-object anti-pattern in many libraries.

One cause for this might be that Python is a highly accessible language, so less experienced programmers make mistakes that they then have to live with forever.

But I think that perhaps a more significant reason is the fact that Python sometimes punishes you for trying to do the right thing.

The “right thing” in the context of object design is to make lots of small, self-contained classes that do one thing and do it well. For example, if you notice your object is starting to accrue a lot of private methods, perhaps you should be making those “public”1 methods of a private attribute. But if it’s tedious to do that, you probably won’t bother.

Another place you probably should be defining an object is when you have a bag of related data that needs its relationships, invariants, and behavior explained. Python makes it soooo easy to just define a tuple or a list. The first couple of times you type host, port = ... instead of address = ... it doesn’t seem like a big deal, but then soon enough you’re typing [(family, socktype, proto, canonname, sockaddr)] = ... everywhere and your life is filled with regret. That is, if you’re lucky. If you’re not lucky, you’re just maintaining code that does something like values[0][7][4][HOSTNAME][“canonical”] and your life is filled with garden-variety pain rather than the more complex and nuanced emotion of regret.


This raises the question: is it tedious to make a class in Python? Let’s look at a simple data structure: a 3-dimensional cartesian coordinate. It starts off simply enough:

1
class Point3D(object):

So far so good. We’ve got a 3 dimensional point. What next?

1
2
class Point3D(object):
    def __init__(self, x, y, z):

Well, that’s a bit unfortunate. I just want a holder for a little bit of data, and I’ve already had to override a special method from the Python runtime with an internal naming convention? Not too bad, I suppose; all programming is weird symbols after a fashion.

At least I see my attribute names in there, that makes sense.

1
2
3
class Point3D(object):
    def __init__(self, x, y, z):
        self.x

I already said I wanted an x, but now I have to assign it as an attribute...

1
2
3
class Point3D(object):
    def __init__(self, x, y, z):
        self.x = x

... to x? Uh, obviously ...

1
2
3
4
5
class Point3D(object):
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

... and now I have to do that once for every attribute, so this actually scales poorly? I have to type every attribute name 3 times?!?

Oh well. At least I’m done now.

1
2
3
4
5
6
class Point3D(object):
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z
    def __repr__(self):

Wait what do you mean I’m not done.

1
2
3
4
5
6
7
8
class Point3D(object):
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z
    def __repr__(self):
        return (self.__class__.__name__ +
                ("(x={}, y={}, z={})".format(self.x, self.y, self.z)))

Oh come on. So I have to type every attribute name 5 times, if I want to be able to see what the heck this thing is when I’m debugging, which a tuple would have given me for free?!?!?

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
class Point3D(object):
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z
    def __repr__(self):
        return (self.__class__.__name__ +
                ("(x={}, y={}, z={})".format(self.x, self.y, self.z)))
    def __eq__(self, other):
        if not isinstance(other, self.__class__):
            return NotImplemented
        return (self.x, self.y, self.z) == (other.x, other.y, other.z)

7 times?!?!?!?

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
class Point3D(object):
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z
    def __repr__(self):
        return (self.__class__.__name__ +
                ("(x={}, y={}, z={})".format(self.x, self.y, self.z)))
    def __eq__(self, other):
        if not isinstance(other, self.__class__):
            return NotImplemented
        return (self.x, self.y, self.z) == (other.x, other.y, other.z)
    def __lt__(self, other):
        if not isinstance(other, self.__class__):
            return NotImplemented
        return (self.x, self.y, self.z) < (other.x, other.y, other.z)

9 times?!?!?!?!?

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
from functools import total_ordering
@total_ordering
class Point3D(object):
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z
    def __repr__(self):
        return (self.__class__.__name__ +
                ("(x={}, y={}, z={})".format(self.x, self.y, self.z)))
    def __eq__(self, other):
        if not isinstance(other, self.__class__):
            return NotImplemented
        return (self.x, self.y, self.z) == (other.x, other.y, other.z)
    def __lt__(self, other):
        if not isinstance(other, self.__class__):
            return NotImplemented
        return (self.x, self.y, self.z) < (other.x, other.y, other.z)

Okay, whew - 2 more lines of code isn’t great, but now at least we don’t have to define all the other comparison methods. But now we’re done, right?

1
2
from unittest import TestCase
class Point3DTests(TestCase):

You know what? I’m done. 20 lines of code so far and we don’t even have a class that does anything; the hard part of this problem was supposed to be the quaternion solver, not “make a data structure which can be printed and compared”. I’m all in on piles of undocumented garbage tuples, lists, and dictionaries it is; defining proper data structures well is way too hard in Python.2


namedtuple to the (not really) rescue

The standard library’s answer to this conundrum is namedtuple. While a valiant first draft (it bears many similarities to my own somewhat embarrassing and antiquated entry in this genre) namedtuple is unfortunately unsalvageable. It exports a huge amount of undesirable public functionality which would be a huge compatibility nightmare to maintain, and it doesn’t address half the problems that one runs into. A full enumeration of its shortcomings would be tedious, but a few of the highlights:

  • Its fields are accessable as numbered indexes whether you want them to be or not. Among other things, this means you can’t have private attributes, because they’re exposed via the apparently public __getitem__ interface.
  • It compares equal to a raw tuple of the same values, so it’s easy to get into bizarre type confusion, especially if you’re trying to use it to migrate away from using tuples and lists.
  • It’s a tuple, so it’s always immutable. Sort of.

As to that last point, either you can use it like this:

1
Point3D = namedtuple('Point3D', ['x', 'y', 'z'])

in which case it doesn’t look like a type in your code; simple syntax-analysis tools without special cases won’t recognize it as one. You can’t give it any other behaviors this way, since there’s nowhere to put a method. Not to mention the fact that you had to type the class’s name twice.

Alternately you can use inheritance and do this:

1
2
class Point3D(namedtuple('_Point3DBase', 'x y z'.split()])):
    pass

This gives you a place you can put methods, and a docstring, and generally have it look like a class, which it is... but in return you now have a weird internal name (which, by the way, is what shows up in the repr, not the class’s actual name). However, you’ve also silently made the attributes not listed here mutable, a strange side-effect of adding the class declaration; that is, unless you add __slots__ = 'x y z'.split() to the class body, and then we’re just back to typing every attribute name twice.

And this doesn’t even mention the fact that science has proven that you shouldn’t use inheritance.

So, namedtuple can be an improvement if it’s all you’ve got, but only in some cases, and it has its own weird baggage.


Enter The attr

So here’s where my favorite mandatory Python library comes in.

Let’s re-examine the problem above. How do I make Point3D with attrs?

1
2
import attr
@attr.s

Since this isn’t built into the language, we do have to have 2 lines of boilerplate to get us started: the import and the decorator saying we’re about to use it.

1
2
3
import attr
@attr.s
class Point3D(object):

Look, no inheritance! By using a class decorator, Point3D remains a Plain Old Python Class (albeit with some helpful double-underscore methods tacked on, as we’ll see momentarily).

1
2
3
4
import attr
@attr.s
class Point3D(object):
    x = attr.ib()

It has an attribute called x.

1
2
3
4
5
6
import attr
@attr.s
class Point3D(object):
    x = attr.ib()
    y = attr.ib()
    z = attr.ib()

And one called y and one called z and we’re done.

We’re done? Wait. What about a nice string representation?

1
2
>>> Point3D(1, 2, 3)
Point3D(x=1, y=2, z=3)

Comparison?

1
2
3
4
5
6
>>> Point3D(1, 2, 3) == Point3D(1, 2, 3)
True
>>> Point3D(3, 2, 1) == Point3D(1, 2, 3)
False
>>> Point3D(3, 2, 3) > Point3D(1, 2, 3)
True

Okay sure but what if I want to extract the data defined in explicit attributes in a format appropriate for JSON serialization?

1
2
>>> attr.asdict(Point3D(1, 2, 3))
{'y': 2, 'x': 1, 'z': 3}

Maybe that last one was a little on the nose. But nevertheless, it’s one of many things that becomes easier because attrs lets you declare the fields on your class, along with lots of potentially interesting metadata about them, and then get that metadata back out.

1
2
3
4
5
>>> import pprint
>>> pprint.pprint(attr.fields(Point3D))
(Attribute(name='x', default=NOTHING, validator=None, repr=True, cmp=True, hash=True, init=True, convert=None),
 Attribute(name='y', default=NOTHING, validator=None, repr=True, cmp=True, hash=True, init=True, convert=None),
 Attribute(name='z', default=NOTHING, validator=None, repr=True, cmp=True, hash=True, init=True, convert=None))

I am not going to dive into every interesting feature of attrs here; you can read the documentation for that. Plus, it’s well-maintained, so new goodies show up every so often and I might miss something important. But attrs does a few key things that, once you have them, you realize that Python was sorely missing before:

  1. It lets you define types concisely, as opposed to the normally quite verbose manual def __init__.... Types without typing.
  2. It lets you say what you mean directly with a declaration rather than expressing it in a roundabout imperative recipe. Instead of “I have a type, it’s called MyType, it has a constructor, in the constructor I assign the property ‘A’ to the parameter ‘A’ (and so on)”, you say “I have a type, it’s called MyType, it has an attribute called a”, and behavior is derived from that fact, rather than having to later guess about the fact by reverse engineering it from behavior (for example, running dir on an instance, or looking at self.__class__.__dict__).
  3. It provides useful default behavior, as opposed to Python’s sometimes-useful but often-backwards defaults.
  4. It adds a place for you to put a more rigorous implementation later, while starting out simple.

Let’s explore that last point.

Progressive Enhancement

While I’m not going to talk about every feature, I’d be remiss if I didn’t mention a few of them. As you can see from those mile-long repr()s for Attribute above, there are a number of interesting ones.

For example: you can validate attributes when they are passed into an @attr.s-ified class. Our Point3D, for example, should probably contain numbers. For simplicity’s sake, we could say that that means instances of float, like so:

1
2
3
4
5
6
7
import attr
from attr.validators import instance_of
@attr.s
class Point3D(object):
    x = attr.ib(validator=instance_of(float))
    y = attr.ib(validator=instance_of(float))
    z = attr.ib(validator=instance_of(float))

The fact that we were using attrs means we have a place to put this extra validation: we can just add type information to each attribute as we need it. Some of these facilities let us avoid other common mistakes. For example, this is a popular “spot the bug” Python interview question:

1
2
3
4
5
6
7
class Bag:
    def __init__(self, contents=[]):
        self._contents = contents
    def add(self, something):
        self._contents.append(something)
    def get(self):
        return self._contents[:]

Fixing it, of course, becomes this:

1
2
3
4
5
class Bag:
    def __init__(self, contents=None):
        if contents is None:
            contents = []
        self._contents = contents

adding two extra lines of code.

contents inadvertently becomes a global varible here, making all Bag objects not provided with a different list share the same list. With attrs this instead becomes:

1
2
3
4
5
6
7
@attr.s
class Bag:
    _contents = attr.ib(default=attr.Factory(list))
    def add(self, something):
        self._contents.append(something)
    def get(self):
        return self._contents[:]

There are several other features that attrs provides you with opportunities to make your classes both more convenient and more correct. Another great example? If you want to be strict about extraneous attributes on your objects (or more memory-efficient on CPython), you can just pass slots=True at the class level - e.g. @attr.s(slots=True) - to automatically turn your existing attrs declarations a matching __slots__ attribute. All of these handy features allow you to make better and more powerful use of your attr.ib() declarations.


The Python Of The Future

Some people are excited about eventually being able to program in Python 3 everywhere. What I’m looking forward to is being able to program in Python-with-attrs everywhere. It exerts a subtle, but positive, design influence in all the codebases I’ve seen it used in.

Give it a try: you may find yourself surprised at places where you’ll now use a tidily explained class, where previously you might have used a sparsely-documented tuple, list, or a dict, and endure the occasional confusion from co-maintainers. Now that it’s so easy to have structured types that clearly point in the direction of their purpose (in their __repr__, in their __doc__, or even just in the names of their attributes), you might find you’ll use a lot more of them. Your code will be better for it; I know mine has been.


  1. Scare quotes here because the attributes aren’t meaningfully exposed to the caller, they’re just named publicly. This pattern, getting rid of private methods entirely and having only private attributes, probably deserves its own post... 

  2. And we hadn’t even gotten to the really exciting stuff yet: type validation on construction, default mutable values... 

Far too many things can stop the BLOB

Local mutable filesystem usage is a scalability problem.

It occurs to me that the lack of a standard, well-supported, memory-efficient interface for BLOBs in multiple programming languages is one of the primary driving factors of poor scalability characteristics of open source SaaS applications.

Applications like Gitlab, Redmine, Trac, Wordpress, and so on, all need to store potentially large files (“attachments”). Frequently, they elect to store these attachments (at least by default) in a dedicated filesystem directory. This leads to a number of tricky concurrency issues, as the filesystem has different (and divorced) concurrency semantics from the backend database, and resides only on the individual API nodes, rather than in the shared namespace of the attached database.

Some databases do support writing to BLOBs like files. Postgres, SQLite, and Oracle do, although it seems MySQL lags behind in this area (although I’d love to be corrected on this front). But many higher-level API bindings for these databases don’t expose support for BLOBs in an efficient way.

Directly using the filesystem, as opposed to a backing service, breaks the “expected” scaling behavior of the front-end portion of a web application. Using an object store, like Cloud Files or S3, is a good option to achieve high scalability for public-facing applications, but that creates additional deployment complexity.

So, as both a plea to others and a note to myself: if you’re writing a database-backed application that needs to store some data, please consider making “store it in the database as BLOBs” an option. And if your particular database client library doesn’t support it, consider filing a bug.

Stop Working So Hard

In response to a thoughtful reply from John Carmack, I share some thoughts on why we all need to stop working so damn hard.

Recently, I saw this tweet where John Carmack posted to a thread on Hacker News about working hours. As this post propagated a good many bad ideas about working hours, particularly in the software industry, I of course had to reply. After some further back-and-forth on Twitter, Carmack followed up.

First off, thanks to Mr. Carmack for writing such a thorough reply in good faith. I suppose internet arguments have made me a bit cynical in that I didn’t expect that. I still definitely don’t agree, but I think there’s a legitimate analysis of the available evidence there now, at least.

When trying to post this reply to HN, I was told that the comment was too long, and I suppose it is a bit long for a comment. So, without further ado, here are my further thoughts on working hours.

... if only the workers in Greece would ease up a bit, they would get the productivity of Germany. Would you make that statement?

Not as such, no. This is a hugely complex situation mixing together finance, culture, management, international politics, monetary policy, and a bunch of other things. That study, and most of the others I linked to, is interesting in that it confirms the general model of ability-to-work (i.e. “concentration” or “willpower”) as a finite resource that you exhaust throughout the day; not in that “reduction in working hours” is a panacea solution. Average productivity-per-hour-worked would definitely go up.

However, I do believe (and now we are firmly off into interpretation-of-results territory, I have nothing empirical to offer you here) that if the average Greek worker were less stressed to the degree of the average German one, combining issues like both overwork and the presence of a constant catastrophic financial crisis in the news, yes; they’d achieve equivalent productivity.

Total net productivity per worker, discounting for any increases in errors and negative side effects, continues increasing well past 40 hours per week. ... Only when you are so broken down that even when you come back the following day your productivity per hour is significantly impaired, do you open up the possibility of actually reducing your net output.

The trouble here is that you really cannot discount for errors and negative side effects, especially in the long term.

First of all, the effects of overwork (and attendant problems, like sleep deprivation) are cumulative. While productivity on a given day increases past 40 hours per week, if you continue to work more, you productivity will continue to degrade. So, the case where “you come back the following day ... impaired” is pretty common... eventually.

Since none of this epidemiological work tracks individual performance longitudinally there are few conclusive demonstrations of this fact, but lots of compelling indications; in the past, I’ve collected quantitative data on myself (and my reports, back when I used to be a manager) that strongly corroborates this hypothesis. So encouraging someone to work one sixty-hour week might be a completely reasonable trade-off to address a deadline; but building a culture where asking someone to work nights and weekends as a matter of course is inherently destructive. Once you get into the area where people are losing sleep (and for people with other responsibilities, it’s not hard to get to that point) overwork starts impacting stuff like the ability to form long-term memories, which means that not only do you do less work, you also consistently improve less.

Furthermore, errors and negative side effects can have a disproportionate impact.

Let me narrow the field here to two professions I know a bit about and are germane to this discussion; one, health care, which the original article here starts off by referencing, and two, software development, with which we are both familiar (since you already raised the Mythical Man Month).

In medicine, you can do a lot of valuable life-saving work in a continuous 100-hour shift. And in fact residents are often required to do so as a sort of professional hazing ritual. However, you can also make catastrophic mistakes that would cost a person their life; this happens routinely. Study after study confirms this, and minor reforms happen, but residents are still routinely abused and made to work in inhumane conditions that have catastrophic outcomes for their patients.

In software, defects can be extremely expensive to fix. Not only are they hard to fix, they can also be hard to detect. The phenomenon of the Net Negative Producing Programmer also indicates that not only can productivity drop to zero, it can drop below zero. On the anecdotal side, anyone who has had the unfortunate experience of cleaning up after a burnt-out co-worker can attest to this.

There are a great many tasks where inefficiency grows significantly with additional workers involved; the Mythical Man Month problem is real. In cases like these, you are better off with a smaller team of harder working people, even if their productivity-per-hour is somewhat lower.

The specific observation from the Mythical Man Month was that the number of communication links on a fully connected graph of employees increases geometrically whereas additional productivity (in the form of additional workers) increases linearly. If you have a well-designed organization, you can add people without requiring that your communication graph be fully connected.

But of course, you can’t always do that. And specifically you can’t do that when a project is already late: you already figured out how the work is going to be divided. Brooks’ Law is formulated as: “Adding manpower to a late software project makes it later.” This is indubitable. But one of the other famous quotes from this book is “The bearing of a child takes nine months, no matter how many women are assigned.”

The bearing of a child also takes nine months no matter how many hours a day the woman is assigned to work on it. So “in cases like these” my contention is that you are not “better off with ... harder working people”: you’re just screwed. Some projects are impossible and you are better off acknowledging the fact that you made unrealistic estimates and you are going to fail.

You called my post “so wrong, and so potentially destructive”, which leads me to believe that you hold an ideological position that the world would be better if people didn’t work as long. I don’t actually have a particularly strong position there; my point is purely about the effective output of an individual.

I do, in fact, hold such an ideological position, but I’d like to think that said position is strongly justified by the data available to me.

But, I suppose calling it “so potentially destructive” might have seemed glib, if you are really just looking at the microcosm of what one individual might do on one given week at work, and not at the broader cultural implications of this commentary. After all, as this discussion shows, if you are really restricting your commentary to a single person on a single work-week, the case is substantially more ambiguous. So let me explain why I believe it’s harmful, as opposed to merely being incorrect.

First of all, the problem is that you can’t actually ignore the broader cultural implications. This is Hacker News, and you are John Carmack; you are practically a cultural institution yourself, and by using this site you are posting directly into the broader cultural implications of the software industry.

Software development culture, especially in the USA, suffers from a long-standing culture of chronic overwork. Startup developers in their metaphorical (and sometimes literal) garages are lionized and then eventually mythologized for spending so many hours on their programs. Anywhere that it is celebrated, this mythology rapidly metastasizes into a severe problem; the Death March

Note that although the term “death march” is technically general to any project management, it applies “originally and especially in software development”, because this problem is worse in the software industry (although it has been improving in recent years) than almost anywhere else.

So when John Carmack says on Hacker News that “the effective output of an individual” will tend to increase with hours worked, that sends a message to many young and impressionable software developers. This is the exact same phenomenon that makes pop-sci writing terrible: your statement may be, in some limited context, and under some tight constraints, empirically correct, but it doesn’t matter because when you expand the parameters to the full spectrum of these people’s careers, it’s both totally false and also a reinforcement of an existing cognitive bias and cultural trope.

I can’t remember the name of this cognitive bias (and my Google-fu is failing me), but I know it exists. Let me call it the “I’m fine” bias. I know it exists because I have a friend who had the opportunity to go on a flight with NASA (on the Vomit Comet), and one of the more memorable parts of this experience that he related to me was the hypoxia test. The test involved basic math and spatial reasoning skills, but that test wasn’t the point: the real test was that they had to notice and indicate when the oxygen levels were dropping and indicate that to the proctor. Concentrating on the test, many people failed the first few times, because the “I’m fine” bias makes it very hard to notice that you are impaired.

This is true of people who are drunk, or people who are sleep deprived, too. Their abilities are quantifiably impaired, but they have to reach a pretty severe level of impairment before they notice.

So people who are overworked might feel generally bad but they don’t notice their productivity dropping until they’re way over the red line.

Combine this with the fact that most people, especially those already employed as developers, are actually quite hard-working and earnest (laziness is much more common as a rhetorical device than as an actual personality flaw) and you end up in a scenario where a good software development manager is responsible much more for telling people to slow down, to take breaks, and to be more realistic in their estimates, than to speed up, work harder, and put in more hours.

The trouble is this goes against the manager’s instincts as well. When you’re a manager you tend to think of things in terms of resources: hours worked, money to hire people, and so on. So there’s a constant nagging sensation for a manager to encourage people to work more hours in a day, so you can get more output (hours worked) out of your input (hiring budget). The problem here is that while all hours are equal, some hours are more equal than others. Managers have to fight against their own sense that a few more worked hours will be fine, and their employees’ tendency to overwork because they’re not noticing their own burnout, and upper management’s tendency to demand more.

It is into this roiling stew of the relentless impulse to “work, work, work” that we are throwing our commentary about whether it’s a good idea or not to work more hours in the week. The scales are weighted very heavily on one side already - which happens to be the wrong side in the first place - and while we’ve come back from the unethical and illegal brink we were at as an industry in the days of ea_spouse, software developers still generally work far too much.

If we were fighting an existential threat, say an asteroid that would hit the earth in a year, would you really tell everyone involved in the project that they should go home after 35 hours a week, because they are harming the project if they work longer?

Going back to my earlier explanation in this post about the cumulative impact of stress and sleep deprivation - if we were really fighting an existential threat, the equation changes somewhat. Specifically, the part of the equation where people can have meaningful downtime.

In such a situation, I would still want to make sure that people are as well-rested and as reasonably able to focus as they possibly can be. As you’ve already acknowledged, there are “increases in errors” when people are working too much, and we REALLY don’t want the asteroid-targeting program that is going to blow apart the asteroid that will wipe out all life on earth to have “increased errors”.

But there’s also the problem that, faced with such an existential crisis, nobody is really going to be able to go home and enjoy a fine craft beer and spend some time playing with their kids and come back refreshed at 100% the next morning. They’re going to be freaking out constantly about the comet, they’re going to be losing sleep over that whether they’re working or not. So, in such a situation, people should have the option to go home and relax if they’re psychologically capable of doing so, but if the option for spending their time that makes them feel the most sane is working constantly and sleeping under their desk, well, that’s the best one can do in that situation.

This metaphor is itself also misleading and out of place, though. There is also a strong cultural trend in software, especially in the startup ecosystem, to over-inflate the importance of what the company is doing - it is not “changing the world” to create a website for people to order room-service for their dogs - and thereby to catastrophize any threat to that goal. The vast majority of the time, it is inappropriate to either to sacrifice -- or to ask someone else to sacrifice -- health and well-being for short-term gains. Remember, given the cumulative effects of overwork, that’s all you even can get: short-term gains. This sacrifice often has a huge opportunity cost in other areas, as you can’t focus on more important things that might come along later.

In other words, while the overtime situation is complex and delicate in the case of an impending asteroid impact, there’s also the question of whether, at the beginning of Project Blow Up The Asteroid, I want everyone to be burnt out and overworked from their pet-hotel startup website. And in that case, I can say, unequivocally, no. I want them bright-eyed and bushy-tailed for what is sure to be a grueling project, no matter what the overtime policy is, that absolutely needs to happen. I want to make sure they didn’t waste their youth and health on somebody else’s stock valuation.

Python Option Types

Ask not for whom the NULL tolls; it tolls for thee.

Updated 2020-07-19: While many of the conclusions in this article remain valid and interesting, a few of them — particularly those related to raising run-time exceptions rather than using Nones to signal errors — have been subtly changed by the advent of mypy's ability to force the caller to check for None automatically. Effectively, when this was written, Python did not have real Optional[] types, and now, thanks to mypy, it does, so you should probably use them!

NULL has, rightly, been called a “billion dollar mistake”. If that is so, then None is a hundred million dollar mistake, at least.

Forgetting, for the moment, about the numerous pitfalls of a C-style NULL, Python’s None has a very significant problem of its own. Of course, the problem is not None itself; the fact that the default return value of a function is None is (in my humble opinion, at least) fine; it’s just a marker that means “nothing to see here, move along”. The problem arises from values which might be None, or might be some other, useful thing.

APIs present values in a number of ways. A value might be exposed as the return value of a method, an attribute of an object, or an entry in a collection data structure such as a list or dictionary. If a value presented in an API might be None or it might be something else, every single client of that API needs to check the type of the value that it’s calling before doing anything.

Since it is rude to use a “simple suite” (a line of code like if x: y() with no newline after the colon), that means the minimum number of lines of code for interacting with your API is now 4: one for the if statement, one for the then clause, and one for the else clause.

Worse than the code-bloat required here, the default behavior, if your forget to do this checking, is that it works sometimes (like when you’re testing it), and that other times (like when you put it into production), you get an unhelpful exception like this:

1
2
3
4
Traceback (most recent call last):
  File "<your code>", line 1, in <module>
    value.method()
AttributeError: 'NoneType' object has no attribute 'method'

Of course NoneType doesn’t have an attribute called method, but why is value a NoneType? Science may never know.

In languages with static type declarations, there’s a concept of an Option type. Simply put, in a language with option types, the API declares its result value as “maybe something, maybe null”, and then if the caller fails to account for the “null” case, it is a compile-time error.

Python doesn’t have this kind of ahead-of-time checking though, so what are we to do? In order of my own personal preference, here are three strategies for getting rid of maybe-None-maybe-not data types in your Python code.

1: Just Say No

Some APIs - especially those that require building deeply complex nested trees of data structures - use None as a way to provide a convenient mechanism for leaving a space for a future value to be filled out. Using such an API sometimes looks like this:

1
2
3
4
5
value = MyValue()
value.foo = 1
value.bar = 2
value.baz = 3
value.do_something()

In this case, the way to get rid of None is simple: just stop doing that. Instead of .foo having an implicit type of “int or None”, just make it always be int, like this:

1
2
3
4
value = MyValue(
    foo=1, bar=2, baz=3
)
value.do_something()

Or, if do_something is the only method you’re going to call with this data structure, opt for the even simpler:

1
do_something_with_value(foo=1, bar=2, baz=3)

If MyValue has dozens of fields that need to be initialized with different subsystems, so you actually want to pass around a partially-initialized value object, consider the Builder pattern, which would make this code look like the following:

1
2
3
4
5
6
builder = MyValueBuilder()
foo = builder.with_foo(1)
bar = foo.with_bar(2)
baz = bar.with_baz(3)
value = baz.build()
value.do_something()

This acknowledges that the partially-constructed MyValueBuilder is a different type than MyValue, and, crucially, if you look at its API documentation, it does not misleadingly appear to support the do_something operation which in fact requires foo, bar, and baz all be initialized.

Wherever possible, just require values of the appropriate type be passed in in the first place, and don’t ever default to None.

2: Make The Library Handle The Different States, Not The Caller

Sometimes, None is a placeholder indicating an implicit state machine, where the states are “initialized” and “not initialized”.

For example, imagine an RPC Client which may or may not be connected. You might have an API you have to use like this:

1
2
3
4
5
6
7
8
def ask_question(self):
    message = {"question":
               "what is the air speed velocity of an unladen swallow?"}
    if self.rpc_client.connection is None:
        self.outbound_messages.append(message)
        self.rpc_client.when_connected(self.flush_outbound_messages)
    else:
        self.rpc_client.send_message(message)

By leaking through the connection attribute, rpc_client is providing an incomplete abstraction and foisting off too much work to its callers. Instead, callers should just have to do this:

1
2
3
4
def ask_question(self):
    message = {"question":
               "what is the air speed velocity of an unladen swallow?"}
    self.rpc_client.send_message(message)

Internally, rpc_client still has to maintain a private _connection attribute which may or may not be present, but by hiding this implementation detail, we centralize the complexity associated with managing that state in one place, rather than polluting every caller with it, which makes for much better API design.

Hopefully you agree that this is a good idea, but this is more what to do rather than how to do it, so here are two strategies for achieving “make the library do it”:

2a: Use Placeholder Implementations

However, rather than using None as the _connection attribute, rpc_client’s internal implementation could instead use a placeholder which provides the same interface. Let’s the expected interface of _connection in this case is just a send method that takes some bytes. We could initialize it initially with this:

1
2
3
4
5
class NotConnectedConnection(object):
    def __init__(self):
        self._buffer = b""
    def send(self, data):
        self._buffer += b

This allows the code within rpc_client itself to blindly call self._connection.send whether it’s actually connected already or not; upon connection, it could un-buffer that data onto the ready connection.

2b: Use an Explicit State Machine

Sometimes, you actually have quite a few states you need to manage, and this starts looking like an ugly proliferation of lots of weird little flags; various values which may be True or False, or None or not-None.

In those cases it’s best to be clear about the fact that there are multiple states, and enumerating the valid transitions between them. Then, expose a method which always has the same signature and return type.

Using a state machine library like ClusterHQ’s “machinist” or my Automat can allow you to automate the process of checking all the states.

Automat, in particular, goes to great lengths to make your objects look like plain old Python objects. Providing an input is just calling a method on your object: my_state_machine.provide_an_input() and receiving an output is just examining its return value. So it’s possible to refactor your code away from having to check for None by using this library.

For example, the connection-handling example above could be dealt with in the RPC client using Automat like so:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
class RPCClient(object):
    _machine = MethodicalMachine()
    @_machine.state()
    def _connected(self):
        "We have a connection."
    @_machine.state()
    def _not_connected(self):
        "We have no connection."
    @_machine.input()
    def send_message(self, message):
        "Send a message now if we're connected, or later if not."
    @_machine.output()
    def _send_message_now(self, message):
        "Send a message immediately."
        # ...
    @_machine.output()
    def _send_message_later(self, message):
        "Enqueue a message for when we are connected."
        # ...
    @_machine.output()
    def _send_queued_messages(self, connection):
        "send all messages enqueued by _send_message_later"
        # ...
    @_machine.input()
    def connection_established(self, connection):
        "A connection was established."
    _connected.upon(send_message, enter=_connected, output=[_send_message_now])
    _not_connected.upon(send_message, enter=_connected,
                        output=[_send_message_later])
    _not_connected.upon(connection_established, enter=_connected,
                        output=[_send_queued_messages])

3: Make The Caller Account For All Cases With Callbacks

The absolute lowest-level way to deal with multiple possible states is to, instead of exposing an attribute that the caller has to retrieve and test, expose a function which takes multiple callbacks, one for each case. This way you can provide clear and immediate error feedback if the caller forgets to handle a case - meaning that they forgot to pass a callback. This is only really suitable if you can’t think of any other way to handle it, but it does at least provide a very clear expectation of the interface.

To re-use our connection-handling logic above, you might do something like this:

1
2
3
4
5
6
7
def try_to_send(self):
    def connection_present(connection):
        connection.send_message(my_message)
    def connection_not_present():
        self.enqueue_message_for_later(my_message)
    self.rpc_client.with_connection(connection_present,
                                    connection_not_present)

Notice that while this is slightly awkward, it has the nice property that the connection_present callback receives the value that it needs, whereas the connection_not_present callback doesn’t receive anything, because there’s nothing for it to receive.

The Zeroth Strategy

Of course, the best strategy, if you can get away with it, may be the non-strategy: refuse the temptation to provide a maybe-None, just raise an exception when you are in a state where you can’t handle. If you intentionally raise a specific, meaningful exception type with a good error message, it will be a lot more pleasant to use your API than if return codes that the caller has to check for pop up all over the place, None or otherwise.

The Principle Of The Thing

The underlying principle here is the same: when designing an API, always provide a consistent interface to your callers. An API is 1 to N: you have 1 API implementation to N callers. As N→∞, it becomes more important that any task that needs performing frequently is performed on the “1” side of the equation, and that you don’t force callers to repeat the same error checking over and over again. None is just one form of this, but it is a particularly egregious form.