Have you ever written some Python code that looks like this?
That is to say, have you written code that:
- defined an enum with several members
- associated custom behavior, or custom values, with each member of that enum,
- needed one or more `match`/`case` statements (or, if you’ve been programming in Python for more than a few weeks, probably a big `if`/`elif`/`elif`/`else` tree) to do that association?
In this post, I’d like to submit that this is an antipattern; let’s call it the “passive enum” antipattern.
For those of you having a generally positive experience organizing your discrete values with enums, it may seem odd to call this an “antipattern”, so let me first make something clear: the path to a passive enum is going in the correct direction.
Typically - particularly in legacy code that predates Python 3.4 - one begins with a value that is a bare `int` constant, or maybe a `str`, with some associated values sitting beside it in a few global `dict`s.
Starting from there, collecting all of your values into an enum at all is a great first step. Having an explicit listing of all valid values and verifying against them is great.
But, it is a mistake to stop there. There are problems with passive enums, too:
- The behavior can be defined somewhere far away from the data, making it difficult to:
  - maintain an inventory of everywhere it’s used,
  - update all the consumers of the data when the list of enum values changes, and
  - learn about the different usages as a consumer of the API
- Logic may be defined procedurally (via `if`/`elif` or `match`) or declaratively (via e.g. a `dict` whose keys are your enum and whose values are the required associated value).
  - If it’s defined procedurally, it can be difficult to build tools to interrogate it, because you need to parse the AST of your Python program. So it can be difficult to build interactive tools that look at the associated data without just calling the relevant functions.
  - If it’s defined declaratively, it can be difficult for existing tools that do know how to interrogate ASTs (mypy, flake8, Pyright, ruff, et al.) to make meaningful assertions about it. Does your linter know how to check that a `dict` whose keys should be every value of your enum is complete?
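To illustrate the declarative variant of the problem, here is a small sketch, assuming a hypothetical `RESULTS` table keyed by `SomeNumber`. As far as I know, a plain `dict[SomeNumber, int]` annotation does not ask a type checker to verify that every member appears as a key, so the omission below only surfaces if someone writes a check like `missing` by hand:

```python
from enum import Enum, auto


class SomeNumber(Enum):
    one = auto()
    two = auto()
    three = auto()


# Hypothetical lookup table living "beside" the enum.
# SomeNumber.three has been forgotten, and the annotation does not object.
RESULTS: dict[SomeNumber, int] = {
    SomeNumber.one: 1,
    SomeNumber.two: 2,
}

# A hand-rolled completeness check that somebody has to remember to run.
missing = [member for member in SomeNumber if member not in RESULTS]
```

Without such a check, the gap shows up as a `KeyError` at whatever call site first looks up `SomeNumber.three`.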
To refactor this, I would propose a further step towards organizing one’s enum-oriented code: the active enum.
An active enum is one which contains all the logic associated with the first-party provider of the enum itself.
You may recognize this as a more generalized restatement of the object-oriented lens on the principle of “separation of concerns”. The responsibilities of a class ought to be implemented as methods on that class, so that you can send messages to that class via method calls, and it’s up to the class internally to implement things. Enums are no different.
More specifically, you might notice it as a riff on the Active Nothing pattern described in this excellent talk by Sandi Metz, and, yeah, it’s the same thing.
The first refactoring that we can make is, thus, to mechanically move the logic from an external function living anywhere into a method on `SomeNumber`. At least like this, we present an API to consumers externally that shows that `SomeNumber` has a `behavior` method that can be invoked.
However, this still leaves us with a `match` statement that repeats all the values that we just defined, with no particular guarantee of completeness. To continue the refactoring, what we can do is change the value of the enum itself into a simple dataclass to structurally, by definition, contain all the fields we need:
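One way this could look, as a sketch: the names `NumberValue`, `result`, and `effect` come from the description in this post, while the `frozen=True` option, the `Callable[[], None]` signature, and the lambdas are assumptions made for illustration:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


@dataclass(frozen=True)
class NumberValue:
    result: int                  # the static value associated with a member
    effect: Callable[[], None]   # the behavior associated with a member


class SomeNumber(Enum):
    # Each member must supply both fields, or NumberValue() raises TypeError
    # when the class body is executed.
    one = NumberValue(1, lambda: print("one!"))
    two = NumberValue(2, lambda: print("two!"))
    three = NumberValue(3, lambda: print("three!"))

    def behavior(self) -> int:
        self.value.effect()
        return self.value.result
```

Forgetting a field for a new member now fails at import time instead of at the first call that happens to hit the missing case.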
Here, we give `SomeNumber` members a value of `NumberValue`, a dataclass that requires a `result: int` and an `effect: Callable` to be constructed. Mypy will properly notice that if `x` is a `SomeNumber`, then `x.value` will have the type `NumberValue`, and we will get proper type checking on its `result` (a static value) and `effect` (some associated behaviors).[1]
Note that the implementation of the `behavior` method - still conveniently discoverable for callers, and with its signature unchanged - is now vastly simpler.
But what about...
Lookups?
You may be noticing that I have hand-waved over something important to many `enum` users, which is to say, by-value lookup. `enum.auto` will have generated int values for `one`, `two`, and `three` already, and by transforming those into `NumberValue` instances, I can no longer do `SomeNumber(1)`.
For the simple, string-enum case, one where you might do `class MyEnum: value = "value"` so that you can do name lookups via `MyEnum("value")`, there’s a simple solution: use square brackets instead of round ones. In this case, with no matching strings in sight, `SomeNumber["one"]` still works.
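A short sketch of the difference between the two lookup styles, using a hypothetical string enum `Color` alongside a cut-down `SomeNumber` whose `NumberValue` carries only a `result` field for brevity:

```python
from dataclasses import dataclass
from enum import Enum


class Color(Enum):  # hypothetical string enum, names equal to values
    red = "red"
    green = "green"


@dataclass(frozen=True)
class NumberValue:
    result: int


class SomeNumber(Enum):
    one = NumberValue(1)
    two = NumberValue(2)


by_value = Color("red")         # round brackets: lookup by value
by_name = Color["red"]          # square brackets: lookup by member name
still_works = SomeNumber["one"]  # name lookup does not care what the value is
```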
But, if we want to do integer lookups with our dataclass version here, there’s a simple one-liner that will get them back for you; and, moreover, will let you do lookups on whatever attribute you want:
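A sketch of one such one-liner, assuming the dataclass-valued `SomeNumber` described above; the name `by_result` is hypothetical, and the same comprehension could key on any other attribute of the value:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


@dataclass(frozen=True)
class NumberValue:
    result: int
    effect: Callable[[], None]


class SomeNumber(Enum):
    one = NumberValue(1, lambda: print("one!"))
    two = NumberValue(2, lambda: print("two!"))
    three = NumberValue(3, lambda: print("three!"))


# The one-liner: build a reverse index from an attribute of the value
# back to the enum member that carries it.
by_result = {each.value.result: each for each in SomeNumber}
```

With this in place, `by_result[1]` plays the role that `SomeNumber(1)` used to.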
`enum.Flag`?
You can do this with `Flag` more or less unchanged, but in the same way that you can’t expect all your `list[T]` behaviors to be defined on `T`, the lack of a 1-to-1 correspondence between `Flag` instances and their values makes it more complex and out of scope for this pattern specifically.
3rd-party usage?
Sometimes an enum is defined in library L and used in application A, where L provides the data and A provides the behavior. If this is the case, then some amount of version shear is unavoidable; this is a situation where the data and behavior have different vendors, and this means that other means of abstraction are required to keep them in sync. Object-oriented modeling methods are for consolidating the responsibility for maintenance within a single vendor’s scope of responsibility. Once you’re not responsible for the entire model, you can’t do the modeling over all of it, and that is perfectly normal and to be expected.
The goal of the Active Enum pattern is to avoid creating the additional complexity of that shear when it does not serve a purpose, not to ban it entirely.
A Case Study
I was inspired to make this post by a recent refactoring I did from a more obscure and magical[2] version of this pattern into the version that I am presenting here, but if I am going to call passive enums an “antipattern” I feel like it behooves me to point at an example outside of my own solo work.
So, for a more realistic example, let’s consider a package that all Python developers will recognize from their day-to-day work, `python-hearthstone`, the Python library for parsing the data files associated with Blizzard’s popular computerized collectible card game Hearthstone.
As I’m sure you already know, there are a lot of enums in this library, but for one small case study, let’s look at a few of the methods in `hearthstone.enums.GameType`.
`GameType` has already taken “step 1” in the direction of an active enum, as I described above: `as_bnet` is an instance method on `GameType` itself, making it at least easy to see, by looking at the class definition, what operations it supports. However, in the implementation of that method (among many others) we can see the worst of both worlds:
We have procedural code mixed with a data lookup table; `raise ValueError` mixed together with value returns. Overall, it looks like this might be hard to maintain going forward, or to understand without comprehensive knowledge of the game being modeled. Of course, for most Python programmers that understanding can be assumed, but, still.
If `GameType` were refactored in the manner above[3], you’d be able to look at the member definition for `GT_RANKED` and see a mapping of `FormatType` to `BnetGameType`, or `GT_BATTLEGROUNDS_DUO_FRIENDLY` to see an unconditional value of `BGT_BATTLEGROUNDS_DUO_FRIENDLY`. Given that this enum has 40 elements, with several renamed or removed, it seems reasonable to expect that more will be added and removed as the game is developed.
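To make that concrete, here is a rough sketch of the shape such a refactoring might take. It does not import or reproduce `python-hearthstone`: the three enums below are tiny local stand-ins, the `GameTypeValue` dataclass is hypothetical, the `as_bnet` signature is an assumption, and all member names not mentioned above, along with every numeric value, are placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum, auto
from typing import Mapping, Union


class FormatType(Enum):  # stand-in, placeholder members
    FT_WILD = auto()
    FT_STANDARD = auto()


class BnetGameType(Enum):  # stand-in, placeholder members
    BGT_RANKED_WILD = auto()
    BGT_RANKED_STANDARD = auto()
    BGT_BATTLEGROUNDS_DUO_FRIENDLY = auto()


# eq=False keeps identity-based hashing, so a dict field causes no trouble.
@dataclass(frozen=True, eq=False)
class GameTypeValue:  # hypothetical value container
    number: int  # placeholder for the member's integer value
    # Either one unconditional answer, or an answer per format.
    bnet: Union[BnetGameType, Mapping[FormatType, BnetGameType]]


class GameType(Enum):
    GT_RANKED = GameTypeValue(
        1,
        {
            FormatType.FT_WILD: BnetGameType.BGT_RANKED_WILD,
            FormatType.FT_STANDARD: BnetGameType.BGT_RANKED_STANDARD,
        },
    )
    GT_BATTLEGROUNDS_DUO_FRIENDLY = GameTypeValue(
        2, BnetGameType.BGT_BATTLEGROUNDS_DUO_FRIENDLY
    )

    def as_bnet(self, format: FormatType) -> BnetGameType:
        target = self.value.bnet
        if isinstance(target, BnetGameType):
            return target      # unconditional mapping
        return target[format]  # per-format mapping
```

The point of the sketch is only that each member's definition states its own mapping, so adding or removing a member is a change in one place.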
Conclusion
If you have large enums that change over time, consider placing the responsibility for the behavior of the values alongside the values directly, and any logic for processing the values as methods of the enum. This will allow you to quickly validate that you have full coverage of any data that is required among all the different members of the enum, and it will allow API clients a convenient surface to discover the capabilities associated with that enum.
Acknowledgments
Thank you to my patrons who are supporting my writing on this blog. If you like what you’ve read here and you’d like to read more of it, or you’d like to support my various open-source endeavors, you can support my work as a sponsor!
1. You can get even fancier than this, defining a `typing.Protocol` as your enum’s value, but it’s best to keep things simple and use a very simple `dataclass` container if you can.
2. derogatory
3. I did not submit such a refactoring as a PR before writing this post because I don’t have full context for this library and I do not want to harass the maintainers or burden them with extra changes just to make a rhetorical point. If you do want to try that yourself, please file a bug first and clearly explain how you think it would benefit their project’s maintainability, and make sure that such a PR would be welcome.