Stop Writing `__init__` Methods

The History

Before dataclasses were added to Python in version 3.7 — in June of 2018 — the __init__ special method had an important use. If you had a class representing a data structure — for example a 2DCoordinate, with x and y attributes — you would want to be able to construct it as 2DCoordinate(x=1, y=2), which would require you to add an __init__ method with x and y parameters.

The other options available at the time all had pretty bad problems:

You could remove 2DCoordinate from your public API and instead expose a make_2d_coordinate function and make it non-importable, but then how would you document your return or parameter types?
You could document the x and y attributes and make the user assign each one themselves, but then 2DCoordinate() would return an invalid object.
You could default your coordinates to 0 with class attributes, and while that would fix the problem with option 2, this would now require all 2DCoordinate objects to be not just mutable, but mutated at every call site.
You could fix the problems with option 1 by adding a new abstract class that you could expose in your public API, but this would explode the complexity of every new public class, no matter how simple. To make matters worse, typing.Protocol didn’t even arrive until Python 3.8, so, in the pre-3.7 world this would condemn you to using concrete inheritance and declaring multiple classes even for the most basic data structure imaginable.

Also, an __init__ method that does nothing but assign a few attributes doesn’t have any significant problems, so it is an obvious choice in this case. Given all the problems that I just described with the alternatives, it makes sense that it became the obvious default choice, in most cases.

However, by accepting “define a custom __init__” as the default way to allow users to create your objects, we make a habit of beginning every class with a pile of arbitrary code that gets executed every time it is instantiated.

Wherever there is arbitrary code, there are arbitrary problems.

The Problems

Let’s consider a data structure more complex than one that simply holds a couple of attributes. We will create one that represents a reference to some I/O in the external world: a FileReader.

Of course Python has its own open-file object abstraction, but I will be ignoring that for the purposes of the example.

Let’s assume a world where we have the following functions, in an imaginary fileio module:

open(path: str) -> int
read(fileno: int, length: int)
close(fileno: int)

Our hypothetical fileio.open returns an integer representing a file descriptor¹, fileio.read allows us to read length bytes from an open file descriptor, and fileio.close closes that file descriptor, invalidating it for future use.

With the habit that we have built from writing thousands of __init__ methods, we might want to write our FileReader class like this:

class FileReader:
    def __init__(self, path: str) -> None:
        self._fd = fileio.open(path)
    def read(self, length: int) -> bytes:
        return fileio.read(self._fd, length)
    def close(self) -> None:
        fileio.close(self._fd)

For our initial use-case, this is fine. Client code creates a FileReader by doing something like FileReader("./config.json"), which always creates a FileReader that maintains its file descriptor int internally as private state. This is as it should be; we don’t want user code to see or mess with _fd, as that might violate FileReader’s invariants. All the necessary work to construct a valid FileReader — i.e. the call to open — is always taken care of for you by FileReader.__init__.

However, additional requirements will creep in, and as they do, FileReader.__init__ becomes increasingly awkward.

Initially we only care about fileio.open, but later, we may have to deal with a library that has its own reasons for managing the call to fileio.open by itself, and wants to give us an int that we use as our _fd, we now have to resort to weird workarounds like:

def reader_from_fd(fd: int) -> FileReader:
    fr = object.__new__(FileReader)
    fr._fd = fd
    return fr

Now, all those nice properties that we got from trying to force object construction to give us a valid object are gone. reader_from_fd’s type signature, which takes a plain int, has no way of even suggesting to client code how to ensure that it has passed in the right kind of int.

Testing is much more of a hassle, because we have to patch in our own copy of fileio.open any time we want an instance of a FileReader in a test without doing any real-life file I/O, even if we could (for example) share a single file descriptor among many FileReader s for testing purposes.

All of this also assumes a fileio.open that is synchronous. Although for literal file I/O this is more of a hypothetical concern, there are many types of networked resource which are really only available via an asynchronous (and thus: potentially slow, potentially error-prone) API. If you’ve ever found yourself wanting to type async def __init__(self): ... then you have seen this limitation in practice.

Comprehensively describing all the possible problems with this approach would end up being a book-length treatise on a philosophy of object oriented design, so I will sum up by saying that the cause of all these problems is the same: we are inextricably linking the act of creating a data structure with whatever side-effects are most often associated with that data structure. If they are “often” associated with it, then by definition they are not “always” associated with it, and all the cases where they aren’t associated become unweildy and potentially broken.

Defining an __init__ is an anti-pattern, and we need a replacement for it.

The Solutions

I believe this tripartite assemblage of design techniques will address the problems raised above:

using dataclass to define attributes,
replacing behavior that previously would have previously been in __init__ with a new classmethod that does the same thing, and
using precise types to describe what a valid instance looks like.

Using `dataclass` attributes to create an `init` for you

To begin, let’s refactor FileReader into a dataclass. This does get us an __init__ method, but it won’t be one an arbitrary one we define ourselves; it will get the useful constraint enforced on it that it will just assign attributes.

@dataclass
class FileReader:
    _fd: int
    def read(self, length: int) -> bytes:
        return fileio.read(self._fd, length)
    def close(self) -> None:
        fileio.close(self._fd)

Except... oops. In fixing the problems that we created with our custom __init__ that calls fileio.open, we have re-introduced several problems that it solved:

We have removed all the convenience of FileReader("path"). Now the user needs to import the low-level fileio.open again, making the most common type of construction both more verbose and less discoverable; if we want users to know how to build a FileReader in a practical scenario, we will have to add something in our documentation to point at a separate module entirely.
There’s no enforcement of the validity of _fd as a file descriptor; it’s just some integer, which the user could easily pass an incorrect instance of, with no error.

In isolation, dataclass by itself can’t solve all our problems, so let’s add in the second technique.

Using `classmethod` factories to create objects

We don’t want to require any additional imports, or require users to go looking at any other modules — or indeed anything other than FileReader itself — to figure out how to create a FileReader for its intended usage.

Luckily we have a tool that can easily address all of these concerns at once: @classmethod. Let’s define a FileReader.open class method:

from typing import Self
@dataclass
class FileReader:
    _fd: int
    @classmethod
    def open(cls, path: str) -> Self:
        return cls(fileio.open(path))

Now, your callers can replace FileReader("path") with FileReader.open("path"), and get all the same benefits.

Additionally, if we needed to await fileio.open(...), and thus we needed its signature to be @classmethod async def open, we are freed from the constraint of __init__ as a special method. There is nothing that would prevent a @classmethod from being async, or indeed, from having any other modification to its return value, such as returning a tuple of related values rather than just the object being constructed.

Using `NewType` to address object validity

Next, let’s address the slightly trickier issue of enforcing object validity.

Our type signature calls this thing an int, and indeed, that is unfortunately what the lower-level fileio.open gives us, and that’s beyond our control. But for our own purposes, we can be more precise in our definitions, using NewType:

from typing import NewType
FileDescriptor = NewType("FileDescriptor", int)

There are a few different ways to address the underlying library, but for the sake of brevity and to illustrate that this can be done with zero run-time overhead, let’s just insist to Mypy that we have versions of fileio.open, fileio.read, and fileio.write which actually already take FileDescriptor integers rather than regular ones.

from typing import Callable
_open: Callable[[str], FileDescriptor] = fileio.open  # type:ignore[assignment]
_read: Callable[[FileDescriptor, int], bytes] = fileio.read
_close: Callable[[FileDescriptor], None] = fileio.close

We do of course have to slightly adjust FileReader, too, but the changes are very small. Putting it all together, we get:

from typing import Self
@dataclass
class FileReader:
    _fd: FileDescriptor
    @classmethod
    def open(cls, path: str) -> Self:
        return cls(_open(path))
    def read(self, length: int) -> bytes:
        return _read(self._fd, length)
    def close(self) -> None:
        _close(self._fd)

Note that the main technique here is not necessarily using NewType specifically, but rather aligning an instance’s property of “has all attributes set” as closely as possible with an instance’s property of “fully valid instance of its class”; NewType is just a handy tool to enforce any necessary constraints on the places where you need to use a primitive type like int, str or bytes.

In Summary - The New Best Practice

From now on, when you’re defining a new Python class:

Make it a dataclass².
Use its default __init__ method³.
Add @classmethods to provide your users convenient and discoverable ways to build your objects.
Require that all dependencies be satisfied by attributes, so you always start with a valid object.
Use typing.NewType to enforce any constraints on primitive data types (like int and str) which might have magical external attributes, like needing to come from a particular library, needing to be random, and so on.

If you define all your classes this way, you will get all the benefits of a custom __init__ method:

All consumers of your data structures will receive valid objects, because an object with all its attributes populated correctly is inherently valid.
Users of your library will be presented with convenient ways to create your objects that do as much work as is necessary to make them easy to use, and they can discover these just by looking at the methods on your class itself.

Along with some nice new benefits:

You will be future-proofed against new requirements for different ways that users may need to construct your object.
If there are already multiple ways to instantiate your class, you can now give each of them a meaningful name; no need to have monstrosities like def __init__(self, maybe_a_filename: int | str | None = None):
Your test suite can always construct an object by satisfying all its dependencies; no need to monkey-patch anything when you can always call the type and never do any I/O or generate any side effects.

Before dataclasses, it was always a bit weird that such a basic feature of the Python language — giving data to a data structure to make it valid — required overriding a method with 4 underscores in its name. __init__ stuck out like a sore thumb. Other such methods like __add__ or even __repr__ were inherently customizing esoteric attributes of classes.

For many years now, that historical language wart has been resolved. @dataclass, @classmethod, and NewType give you everything you need to build classes which are convenient, idiomatic, flexible, testable, and robust.

Acknowledgments

Thank you to my patrons who are supporting my writing on this blog. If you like what you’ve read here and you’d like to read more of it, or you’d like to support my various open-source endeavors, you can support my work as a sponsor! I am also available for consulting work if you think your organization could benefit from expertise on topics like “but what is a ‘class’, really?”.

If you aren’t already familiar, a “file descriptor” is an integer which has meaning only within your program; you tell the operating system to open a file, it says “I have opened file 7 for you”, and then whenever you refer to “7” it is that file, until you close(7). ↩
Or an attrs class, if you’re nasty. ↩
Unless you have a really good reason to, of course. Backwards compatibility, or compatibility with another library, might be good reasons to do that. Or certain types of data-consistency validation which cannot be expressed within the type system. The most common example of these would be a class that requires consistency between two different fields, such as a “range” object where start must always be less than end. There are always exceptions to these types of rules. Still, it’s pretty much never a good idea to do any I/O in __init__, and nearly all of the remaining stuff that may sometimes be a good idea in edge-cases can be achieved with a __post_init__ rather than writing a literal __init__. ↩

Stop Writing `__init__` Methods

The History

The Problems

The Solutions

Using dataclass attributes to create an __init__ for you

Using classmethod factories to create objects

Using NewType to address object validity

In Summary - The New Best Practice

Acknowledgments

Stop Writing `init` Methods

Using `dataclass` attributes to create an `init` for you

Using `classmethod` factories to create objects

Using `NewType` to address object validity