“Composition is better than inheritance.” This is a true statement.
“Inheritance is bad.” Also true. I’m a well-known compositional extremist.
There’s a great talk you can watch if I haven’t talked your ear off about it
already.
Which is why I was extremely surprised in a recent conversation when my
interlocutor said that while inheritance might be bad, composition is worse.
Once I understood what they meant by “composition”, I was even more surprised
to find that I agreed with this assertion.
Although inheritance is bad, it’s very important to understand why. In a
high-level language like Python, with first-class runtime datatypes (i.e.,
user-defined classes that are objects), the computational difference between what
we call “composition” and what we call “inheritance” is a matter of where we
put a pointer: is it on a type or on an instance? The important distinction
has to do with human factors.
First, a brief parable about real-life inheritance.
You find yourself in conversation with an indolent heiress-in-waiting. She
complains of her boredom whiling away the time until the dowager countess
finally leaves her her fortune.
“Inheritance is bad”, you opine. “It’s better to make your own way in life”.
“By George, you’re right!” she exclaims. You weren’t expecting such an
enthusiastic reversal.
“Well,” you sputter, “glad to see you are turning over a new leaf”.
She crosses the room to open a sturdy mahogany armoire, and draws forth a belt
holstering a pistol and a menacing-looking sabre.
“Auntie has only the dwindling remnants of a legacy fortune. The real money
has always been with my sister’s manufacturing concern. Why passively wait
for Auntie to die, when I can murder my dear sister now, and take what is
rightfully mine!”
Cinching the belt around her waist, she strides from the room animated and full
of purpose, no longer indolent or in-waiting, but you feel less than satisfied
with your advice.
It is, after all, important to understand what the problem with inheritance
is.
The primary reason inheritance is bad is confusion between namespaces.
The most important role of code organization (division of code into files,
modules, packages, subroutines, data structures, etc) is division of
responsibility. In other
words, Conway’s Law isn’t just an
unfortunate accident of budgeting, but a fundamental property of software
design.
For example, if we have a function called multiply(a, b), its presence in
our codebase suggests that if someone were to want to multiply two numbers
together, it is multiply’s responsibility to know how to do so. If there’s
a problem with multiplication, it’s the maintainers of multiply who need to
go fix it.
And, with this responsibility comes authority over a specific scope within the
code. So if we were to look at an implementation of multiply:
```python
def multiply(a, b):
    product = a * b
    return product
```
The maintainers of multiply get to decide what product means in the context
of their function. It’s possible, in Python, for some other function to
reach into multiply with frame objects and mangle the meaning of product
between its assignment and return, but it’s generally understood that it’s
none of your business what product is, and if you touch it, all bets are
off about the correctness of multiply. More importantly, if the maintainers
of multiply wanted to bind other names, or change around existing names, like
so, in a subsequent version:
```python
def multiply(a, b):
    factor1 = a
    factor2 = b
    result = a * b
    return result
```
It is the job of multiply’s maintainer, not its caller, to make those
decisions.
The same programmer may, at different times, be both a caller and a maintainer
of multiply. However, they have to know which hat they’re wearing at any
given time, so that they can know which stuff they’re still responsible for
when they hand over multiply to be maintained by a different team.
It’s important to be able to forget about the internals of the local
variables in the functions you call. Otherwise, abstractions give us no power:
if you have to know the internals of everything you’re using, you can never
build much beyond what’s already there, because you’ll be spending all your
time trying to understand all the layers below it.
Classes complicate this process of forgetting somewhat. Properties of class
instances “stick out”, and are visible to the callers. This can be powerful —
and can be a great way to represent shared data structures — but this is
exactly why we have the ._ convention in Python: if something starts with an
underscore, and it’s not in a namespace you own, you shouldn’t mess with it.
So: other._foo is not for you to touch, unless you’re maintaining
type(other). self._foo is where you should put your own private state.
So if we have a class like this:
```python
class A(object):
    def __init__(self):
        self._note = "a note"
```
we all know that A()._note is off-limits.
But then what happens here?
```python
class B(A):
    def __init__(self):
        super().__init__()
        self._note = "private state for B()"
```
B()._note is also off limits for everyone but B, except... as it turns out,
B doesn’t really own the namespace of self here, so it’s clashing with what
A wants _note to mean. Even if, right now, we were to change it to
_note2, the maintainer of A could, in any future release of A, add a new
_note2 variable which conflicts with something B is using. A’s
maintainers (rightfully) think they own self, B’s maintainers (reasonably)
think that they do. This could continue all the way until we get to _note7,
at which point it would explode violently.
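A minimal sketch of that collision, assuming a describe method on A (invented here) so the breakage is visible:

```python
class A(object):
    def __init__(self):
        self._note = "a note"

    def describe(self):
        # A relies on its own private state...
        return "A remembers: " + self._note

class B(A):
    def __init__(self):
        super().__init__()
        # ...which B silently clobbers, believing self is its own
        self._note = "private state for B()"

print(B().describe())  # "A remembers: private state for B()"
```

Nothing raises, nothing warns; A just quietly starts misbehaving, which is why this class of bug is so expensive to find.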
So that’s why inheritance is bad. It’s a bad way for two layers of a system to
communicate because it leaves each layer nowhere to put its internal state that
the other doesn’t need to know about. So what could be worse?
Let’s say we’ve convinced our junior programmer who wrote A that inheritance
is a bad interface, and they should instead use the panacea that cures all
inherited ills, composition. Great! Let’s just write a B that composes
in an A in a nice clean way, instead of doing any gross inheritance:
```python
class Bprime(object):
    def __init__(self, a):
        for var in dir(a):
            # skip dunders: some, like __weakref__, aren't writable
            if not var.startswith("__"):
                setattr(self, var, getattr(a, var))
```
Uh oh. Looks like composition is worse than inheritance.
Let’s enumerate some of the issues with this “solution” to the problem of
inheritance:
- How do we know what attributes Bprime has?
- How do we even know what type a is?
- How is anyone ever going to grep for relevant methods in this code and have
  them come up in the right place?
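Those complaints are easy to make concrete. This sketch re-creates Bprime with a small filter for dunder attributes (some, like __weakref__, aren’t writable) and gives A a hypothetical method, just to show what the copy loop drags along:

```python
class A(object):
    def __init__(self):
        self._note = "a note"

    def method(self):
        # hypothetical method, to show what the copy loop drags along
        return self._note

class Bprime(object):
    def __init__(self, a):
        for var in dir(a):
            # skip dunders: some, like __weakref__, aren't writable
            if not var.startswith("__"):
                setattr(self, var, getattr(a, var))

b = Bprime(A())
print(type(b).__name__)  # Bprime: the type says nothing about A
print(sorted(vars(b)))   # ['_note', 'method']: even A's private state leaks
print(b.method())        # works, but still bound to the original A instance
```

Note that the copied method is still bound to the original A instance, so b.method ignores any state set on b itself; no static analysis of Bprime can tell you that.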
We briefly reclaimed self for Bprime by removing the inheritance from A,
but what Bprime does in __init__ to replace it is much worse. At least
with normal, “vertical” inheritance, IDEs and code inspection tools can have
some idea where your parents are and what methods they declare. We have to
look aside to know what’s there, but at least it’s clear from the code’s
structure where exactly we have to look aside to.
When faced with a class like Bprime, though, what does one do? It’s just
shredding apart some apparently totally unrelated object; there’s nearly no way
for tooling to inspect this code to the point that it knows where
self.<something> comes from in a method defined on Bprime.
The goal of replacing inheritance with composition is to make it clear and
easy to understand what code owns each attribute on self. Sometimes that
clarity comes at the expense of a few extra keystrokes; an __init__ that
copies over a few specific attributes, or a method that does nothing but
forward a message, like def something(self): return self.other.something().
Automatic composition is just lateral inheritance. Magically auto-proxying all
methods, or auto-copying all attributes, saves a few keystrokes at the time
some new code is created at the expense of hours of debugging when it is being
maintained.
If readability counts, we should never privilege the writer over the reader.