Let’s say you’re writing a Python library.
In this library, you have some collection of state that represents “options” or “configuration” for a bunch of operations. Such a set of options is a bundle of potentially ever-increasing complexity. Thus, you will want it to have an extremely minimal compatibility surface, with a very carefully chosen public interface, that is either small, or perhaps nothing at all. Such an object conveys state and might have some private behavior, but all you want consumers to be able to do is build it in very constrained, specific ways, and then pass it along as a parameter to your own APIs.
By way of example, imagine that you’re wrapping a library that handles shipping physical packages.
There are a zillion ways to do it ship a package. There are different carriers who can ship it for you. There’s air freight, and ground freight, and sea freight. There’s overnight shipping. There’s the option to require a signature. There’s package tracking and certified mail. Suffice it to say, lots of stuff.
If you are starting out to implement such a library, you might need an object
called something like ShippingOptions that encapsulates some of this. At the
core of your library you might have a function like this:
1 2 3 4 5 | |
If you are starting out implementing such a library, you know that you’re
going to get the initial implementation of ShippingOptions wrong; or, at the
very least, if not “wrong”, then “incomplete”. You should not want to commit
to an expansive public API with a ton of different attributes until you really
understand the problem domain pretty well.
Yet, ShippingOptions is absolutely vital to the rest of your library. You’ll
need to construct it and pass it to various methods like estimateShippingCost
and shipPackage. So you’re not going to want a ton of complexity and churn
as you evolve it to be more complex.
Worse yet, this object has to hold a ton of state. It’s got attributes, maybe even quite complex internal attributes that relate to different shipping services.
Right now, today, you need to add something so you can have “no rush”, “standard” and “expedited” options. You can’t just put off implementing that indefinitely until you can come up with the perfect shape. What to do?
The tool you want here is the opaque data type design pattern. C is lousy
with such things (FILE, pthread_*_t, fd_set, etc). A typedef in a
header file can easily achieve this.
But in Python, if you expose a dataclass — or any class, really — even if
you keep all your fields private, the constructor is still, inherently,
public. You can make it raise an exception or something, but your type checker
still won’t help your users; it’ll still look like it’s a normal class.
Luckily, Python typing provides a tool for this:
typing.NewType.
Let’s review our requirements:
- We need a type that our client code can use in its type annotations; it needs to be public.
- They need to be able to consruct it somehow, even if they shouldn’t be able to see its attributes or its internal constructor arguments.
- To express high-level things (like “ship fast”) that should stay supported as we add more nuanced and complex configurations in the future (like “ship with the fastest possible option provided by the lowest-cost carrier that supports signature verification”).
In order to solve these problems respectively, we will use:
- a public
NewType, which gives us our public name... - which wraps a private class with entirely private attributes, to give us an actual data structure, while not exposing the constructor,
- a set of public constructor functions, which returns our
NewType.
When we put that all together, it looks like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | |
As a snapshot in time, this is not all that interesting; we could have just
exposed _RealShipOpts as a public class and saved ourselves some time. The
fact that this exposes a constructor that takes a string is not a big deal for
the present moment. For an initial quick and dirty implementation, we can just
do checks like if options._speed == "fast" in our shipping and estimation
code.
However, the main thing we are doing here is preserving our flexibility to evolve the related APIs into the future, so let’s see how we might do that. For example, let’s allow the shipping options to contain a concrete and specific carrier and freight method:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 | |
As a NewType, our public ShippingOptions type doesn’t have a constructor.
Since _RealShipOpts is private, and all its attributes are private, we can
completely remove the old versions.
Anything within our shipping library can still access the private variables
on ShippingOptions; as a NewType, it’s the same type as its base at
runtime, so it presents minimal1 overhead.
Clients outside our shipping library can still call all of our public
constructors: shipFast, shipNormal, and shipSlow all still work with the
same (as far as calling code knows) signature and behavior.
If you need to build and convey some state within your public API, while avoiding breakages associated with compatibility churn, hopefully this technique can help you do that!
Acknowledgments
Thanks for reading, and thank you to my patrons who are supporting my writing on this blog. If you like what you’ve read here and you’d like to read more of it, or you’d like to support my various open-source endeavors, you can support my work as a sponsor.
-
The overhead is minimal, but it is not completely zero. The suggested idiom for converting to a
NewTypeis to call it like a function, as I’ve done in these examples, but if you are wanting to use this pattern inside of a hot loop, you can use# type: ignore[return-value]comments to avoid that small cost. ↩



