Is there a place for non-`@dataclass` classes in Python any more?
I have previously, and somewhat famously, written favorably about `@dataclass`’s venerable progenitor, `attrs`, and how you should use it for pretty much everything.
At the time, `attrs` was an additional dependency, a piece of technology that you could bolt on to your Python stack to make your particular code better. While I advocated for it strongly, there were all the usual implicit reasons against using a new thing. It was an additional dependency, it might not interoperate with other convenience mechanisms for type declarations that you were already using (e.g. `NamedTuple`), it might look weird to other Python programmers familiar with existing tools, and so on. I don’t think that any of these were good counterpoints, but there was nevertheless a robust discussion to be had in addressing them all.
But for many years now (since Python 3.7), dataclasses have been built in to the standard library. They are increasingly integrated into the toolchain at a deep level that is difficult for application code, or even other specialized tools, to replicate. Everybody knows what they are. Few or none of those reasons apply any longer.
For example, classes defined with `@dataclass` are now optimized as a C structure might be when you compile them with `mypyc`, a trick that is extremely useful in some circumstances, and one which even `attrs` itself now has trouble keeping up with.
This all raises the question for me: beyond backwards compatibility, is there any point to having non-`@dataclass` classes any more? Is there any remaining justification for writing them in new code?
Consider my original example, translated from attrs to dataclasses. First, the non-dataclass version:
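A minimal sketch of the hand-written form, assuming a hypothetical three-coordinate class named `Point3D` (the name and fields are illustrative):

```python
class Point3D:
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z
```

Each field name appears three times, and two instances with identical coordinates still compare as unequal, because nothing beyond `__init__` has been written.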
And now the dataclass one:
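A minimal sketch of the same hypothetical `Point3D`, this time with the standard library decorator (the `int` annotations are an assumption made for illustration):

```python
from dataclasses import dataclass

@dataclass
class Point3D:
    x: int
    y: int
    z: int
```

With nothing else written, `Point3D(1, 2, 3)` now prints as `Point3D(x=1, y=2, z=3)` and compares equal to another instance with the same fields.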
Many of my original points still stand. It’s still less repetitive. In fewer characters, we’ve expressed considerably more information, and we get more functionality (`repr` and equality by default, with sorting and hashing available via `order=True` and `frozen=True`). There doesn’t seem to be much of a downside besides the strictness of the types, and if `typing.Any` were a builtin, `x: any` would be fine for those who don’t want to unduly constrain their code.
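To make the extra functionality concrete, here is a small sketch, again using the hypothetical `Point3D`. It assumes you opt in to ordering and hashing with `order=True` and `frozen=True`, and it uses `typing.Any` for the loosely typed fields, since a builtin `any` annotation does not exist today:

```python
from dataclasses import dataclass
from typing import Any

@dataclass(order=True, frozen=True)
class Point3D:
    x: Any
    y: Any
    z: Any

# Ordering compares the fields as a tuple, in declaration order.
points = sorted([Point3D(3, 1, 2), Point3D(1, 2, 3)])
# Frozen dataclasses with eq=True get a field-based __hash__.
seen = {Point3D(1, 2, 3), Point3D(1, 2, 3)}

print(points[0])   # Point3D(x=1, y=2, z=3)
print(len(seen))   # 1
```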
The one real downside of the latter over the former right now is the need for an import. Which, at this point, just seems… confusing? Wouldn’t it be nicer to be able to just write this:
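That is, presumably, the class body alone, with no decorator and no import. A sketch, reusing the hypothetical `Point3D`. This is already valid Python today, which is part of the problem: it records the annotations but generates no `__init__`, so it means something quite different from the dataclass version:

```python
class Point3D:
    x: int
    y: int
    z: int

# The annotations are recorded on the class...
print(sorted(Point3D.__annotations__))

# ...but no constructor is generated, so this fails today.
try:
    Point3D(1, 2, 3)
except TypeError as error:
    print("TypeError:", error)
```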
and not need to faff around with decorator semantics and fudging the difference between Mypy (or Pyright or Pyre) type-check time and Mypyc or Cython compile time? Or even better, to not need to explain the complexity of all these weird little distinctions to new learners of Python, or to cover `import` before `class`?
These tools all already treat the `@dataclass` decorator as a totally special language construct, not really like a decorator at all, so to really explore it you have to explain a special case and then a special case of a special case. The extension hook for this special case of the special case notwithstanding.
If we didn’t want any new syntax, we would need a `from __future__ import dataclassification` or some such for a while, but this doesn’t seem like an impossible bar to clear.
There are still some folks who don’t like type annotations at all, and there’s still the possibility of awkward implicit changes in meaning when transplanting code from a place with `dataclassification` enabled to one without, so perhaps an entirely new unambiguous syntax could be provided. One that more closely mirrors the meaning of parentheses in `def`, moving inheritance (a feature which, whether you like it or not, is clearly far less central to class definitions than ‘what fields do I have’) off to its own part of the syntax:
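The imagined syntax cannot run, so the sketch below shows it only in a comment, next to a rough equivalent that works today using `dataclasses.make_dataclass`. The `Vector` base class and the `from` spelling for inheritance are assumptions made for illustration, not a settled proposal:

```python
from dataclasses import make_dataclass


class Vector:
    """Hypothetical base class, here only to illustrate inheritance."""


# Imagined syntax (not valid Python; the `from` clause is an assumption):
#     data Point3D(x: int, y: int, z: int) from Vector:
#         ...
# A rough equivalent that runs today:
Point3D = make_dataclass(
    "Point3D",
    [("x", int), ("y", int), ("z", int)],
    bases=(Vector,),
)

print(Point3D(1, 2, 3))             # Point3D(x=1, y=2, z=3)
print(issubclass(Point3D, Vector))  # True
```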
which, for the “I don’t like types” contingent, could reduce to this in the minimal case:
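Again as a sketch only: the minimal form appears in a comment, and the closest thing available today is `make_dataclass` with bare field names, which it annotates as `typing.Any`. `Point3D` remains a hypothetical example class:

```python
from dataclasses import fields, make_dataclass

# Imagined minimal form (not valid Python today):
#     data Point3D(x, y, z): ...
# Field names given as bare strings are annotated as typing.Any.
Point3D = make_dataclass("Point3D", ["x", "y", "z"])

print(Point3D(1, 2, 3))                    # Point3D(x=1, y=2, z=3)
print([f.name for f in fields(Point3D)])   # ['x', 'y', 'z']
```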
Just thinking pedagogically, I find it super compelling to imagine moving from teaching `def foo(x, y, z): ...` to `data Foo(x, y, z): ...` as opposed to `@dataclass class Foo: x: int ...`.
I don’t have any desire for semantic changes to accompany this, just to make it possible for newcomers to ignore the circuitous historical route of the `@dataclass` syntax and get straight into defining their own types with legible `repr`s from the very beginning of their Python journey.
(And make it possible for me to skip a couple of lines of boilerplate in short examples, as a bonus.)
I’m curious to know what y’all think, though. Shoot me an email or a toot and let me know.
In particular:
- Do you think there’s some reason I’m missing why Python’s current method for defining classes via a bunch of dunder methods is still better than dataclasses, or should stick around into the future for reasons beyond “compatibility”?
- Do you think “compatibility” is sufficient reason to keep the syntax the way it is forever, and I’m underestimating the cost of adding a keyword like this?
- If you do think that a change should be made, would you prefer:
  - changing the meaning of `class` itself via a `__future__` import,
  - a new `data` keyword like the one I’ve proposed,
  - a new keyword that functions exactly like the one I have proposed, but you really want to bikeshed the word `data` a bunch,
  - something more incremental like just putting `dataclass` and `field` in builtins,
  - or an option I haven’t even contemplated here?
If I find I’m not alone in this, perhaps I will wander over to the Python discussion boards to have a more substantive conversation...
Thank you to my patrons who are helping me while I try to turn… whatever this is… along with open source maintenance and application development, into a real job. Do you want to see me pursue ideas like this one further? If so, you can support my work as a sponsor!