The xUnit Paradox

Billions of years ago, at the dawn of the digital age, Kent Beck invented the idea of testing computer programs and wrote a paper about it called "Simple Smalltalk Testing: With Patterns".  In it, he expressed an API which is now colloquially known as "xUnit", because there are implementations for various languages called xunit, where x is a reference to the language of implementation: for example, Java's "JUnit", Python's "PyUnit", and the C++ "CppUnit".

Many test-driven developers who use an xUnit implementation eventually become test-framework developers.  After some time using an xUnit implementation, they realize that it lacks certain features which would make it more useful, and then write their own testing tool.  Such attempts, in Python, include:
  • Twisted's own trial tool, for whose genesis I am to blame.1
  • py.test
  • TestOOB
  • nose
  • doctest
Unfortunately, many of these test frameworks are written only looking at the problems with xUnit, and not realizing its benefits.

There are two things that every potential test framework developer needs to know about xUnit.
The xUnit API is great.

You need a structured, well-specified way to manipulate tests.  Maybe you don't realize it yet, but when you have a test suite with a hundred thousand tests, you will want to selectively run parts of it.  You will want to be able to arbitrarily group tests differently, change their grouping, ordering, and runtime environment.  Most importantly you'll want to do all this in a program, not necessarily with a GUI or command-line tool.  To write that program, you need a coherent, well-designed, stable API for interacting with tests as first-class objects.

The biggest thing that xUnit does right is that it exists.  It is a structured API for interacting with tests as first class objects.  Many attempts to implement specific features end up architecturally breaking or ignoring this API, or adding extra, implicit stuff, which will "just work" if you use the particular TestCase class that came with your tool of choice, but break if you try to customize it too heavily or start overriding internal methods.

xUnit also factors some important responsibilities away from each other.  A test case is different from a test result; a test suite contains multiple tests.  The magic often associated with the 'testXXX' naming convention aside, a single test case object represents a single test, and multiple runs of the same test must create multiple test objects.  These might seem like obvious points, but they are all important insights which must be preserved - and problems occur when they don't seem quite so obvious.  As James Newkirk said on his blog a few years ago: "I think one of the biggest screw-ups that was made when we wrote NUnit V2.0 was to not create a new instance of the test fixture class for each contained test method."  Trial had this screw-up as well, and while it has been fixed, the '-u' option will still re-use a TestCase object to run its own method again for the second invocation.
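The factoring described above can be sketched in a few lines.  This is an illustration of the xUnit separation of responsibilities - cases, suites, and results as distinct objects, with one fresh TestCase instance per test method - not any particular framework's real implementation; all the class and method names here are made up.

```python
class TestResult:
    """A result is a separate object from the tests that feed it."""
    def __init__(self):
        self.successes, self.failures = [], []

class TestCase:
    """One instance represents exactly one test."""
    def __init__(self, methodName):
        self.methodName = methodName
    def run(self, result):
        try:
            getattr(self, self.methodName)()
        except AssertionError:
            result.failures.append(self.methodName)
        else:
            result.successes.append(self.methodName)

class TestSuite:
    """A suite contains multiple tests; it is not itself a test."""
    def __init__(self, tests):
        self.tests = list(tests)
    def run(self, result):
        for test in self.tests:
            test.run(result)

class MathTests(TestCase):
    def test_add(self):
        assert 1 + 1 == 2
    def test_sub(self):
        assert 2 - 1 == 0   # deliberately failing, to exercise the result

# One instance per test method, as the text insists:
suite = TestSuite([MathTests("test_add"), MathTests("test_sub")])
result = TestResult()
suite.run(result)
print(result.successes, result.failures)  # ['test_add'] ['test_sub']
```

Because each method gets its own instance, state set by one test cannot leak into another through the fixture object - which is precisely the NUnit V2.0 screw-up Newkirk describes.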

Sadly, xUnit is not the solution to all of your problems.  There is something else you need to know about it.

The xUnit API is terrible.

The xUnit API is missing a lot of features that it really needs to be a generally useful test-manipulation layer.  It is because of these deficiencies that it is constantly being re-invented or worked around.  If it did some more of these things right, then the ad-hoc extensions which don't conceptually work with it properly wouldn't keep springing up.

The main problem with xUnit that is not simply a missing feature is that it uses the "composite" pattern, rather than the "visitor" pattern, to implement test suites.  If you want to discover which test cases a test suite contains, you are supposed to call 'run' and it will run them for you.  It is impossible to programmatically decompose a suite without stepping outside of the official xUnit API.

For example, let's say you wanted to have a "runner" object, which would selectively identify tests to be run in different subprocesses.  Without getting involved in the particular implementation details of an xUnit implementation, there's no way to figure out how many tests are going to be run once you have discovered and invoked a TestCase; you just call 'run' and hope for the best.  Of course, just about every actual xUnit implementation cheats a little bit in order to do things like generate progress bars that show how far along the test run is; but it's always possible - and intentionally supported, even - to generate tests on the fly within 'run' and run them.
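The composite problem can be shown in miniature.  In the sketch below (hypothetical classes, not any framework's actual API), 'run' is the only traversal interface, and nothing stops a suite from inventing tests on the fly inside it - so a runner has no way to know the test count before the run finishes:

```python
import random

class CountingResult:
    """The only window the runner has into the test run."""
    def __init__(self):
        self.ran = 0
    def addSuccess(self, name):
        self.ran += 1

class SurpriseSuite:
    """A 'suite' that decides what to run only inside run()."""
    def run(self, result):
        # Tests are created on the fly; no enumeration API exists.
        for i in range(random.randint(1, 5)):
            result.addSuccess("test_%d" % i)

result = CountingResult()
suite = SurpriseSuite()
# Before calling run(), there is no way to ask how many tests there are.
suite.run(result)
print("ran", result.ran, "tests")  # only knowable after the fact
```

A visitor-style API, by contrast, would let the runner walk the suite and collect the leaf tests before deciding how to partition them across subprocesses.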

xUnit has no intermediary representation of a test result.  A vanilla xUnit API assumes that a result is the same thing as a reporter: in other words, when you call a method on the result, it reports it to the user immediately.  This absence of a defined way to manipulate test results for later display means that you have to take over the running of the tests if you want to do something interesting with the report; it's not possible in a strict xUnit implementation to cooperatively define two separate tools which analyze or report data about the same test run.
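What the text says is missing might look something like the following: a result that records events instead of reporting them immediately, so that two independent tools can consume the same run afterward.  This is a hedged sketch with invented names, not an API from any real framework.

```python
class RecordingResult:
    """Accumulates test events for later analysis or display."""
    def __init__(self):
        self.events = []
    def addSuccess(self, test):
        self.events.append(("success", test))
    def addFailure(self, test, err):
        self.events.append(("failure", test, err))
    def replay(self, reporter):
        # Any number of reporters can replay the same recorded run.
        for event in self.events:
            if event[0] == "success":
                reporter.addSuccess(event[1])
            else:
                reporter.addFailure(event[1], event[2])

class TextReporter:
    """One of potentially many cooperating consumers of a run."""
    def __init__(self):
        self.lines = []
    def addSuccess(self, test):
        self.lines.append("[OK]   %s" % test)
    def addFailure(self, test, err):
        self.lines.append("[FAIL] %s: %s" % (test, err))

recorded = RecordingResult()
recorded.addSuccess("test_add")
recorded.addFailure("test_sub", "expected 1, got 0")

# Two separate tools can now look at the same test run cooperatively:
text = TextReporter()
recorded.replay(text)
print("\n".join(text.lines))
print("failures:", sum(1 for e in recorded.events if e[0] == "failure"))
```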

The use of the composite pattern is linked to the lack of an intermediary "test discovery" step.  In Beck's original framework, you have to manually construct your suites out of individual test case objects and call a top-level function that then calls them all, although most xUnit implementations accepted as "standard" these days will provide some level of convenience functionality there.  For example, all the implementations I'm aware of will introspect for methods that begin with "test" and automatically create suites that contain one test case instance per method.

xUnit has no notion of cooperating hooks.  If you want to provide some common steps for setUp and tearDown in both library A and library B, you have no recourse but to build complicated diamond inheritance structures around "a.testsupport.TestCase" and "b.testsupport.TestCase", or give up and manually call functions from setUp and tearDown.  This is where Trial has gotten into the most trouble, because it sets up and tears down Twisted-specific state as part of the tool rather than allowing test cases to independently set it up.
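The diamond described above looks roughly like this sketch (all names invented).  Cooperative super() calls make it work, but every class in the chain has to play along, which is exactly the fragile arrangement being criticized:

```python
class BaseCase:
    def setUp(self):
        self.log = []

class LibACase(BaseCase):
    """Stand-in for a.testsupport.TestCase."""
    def setUp(self):
        super().setUp()
        self.log.append("A state up")

class LibBCase(BaseCase):
    """Stand-in for b.testsupport.TestCase."""
    def setUp(self):
        super().setUp()
        self.log.append("B state up")

class MyCase(LibACase, LibBCase):
    """The user's test, needing both libraries' fixtures."""
    def setUp(self):
        super().setUp()       # runs B's setUp, then A's, via the MRO
        self.log.append("mine")

case = MyCase()
case.setUp()
print(case.log)  # ['B state up', 'A state up', 'mine']
```

If either library calls its parent's setUp directly instead of through super(), or forgets the upcall entirely, the chain silently breaks - there is no framework-level notion of independent, cooperating hooks to fall back on.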

The Alternative Is Worse

Those who have not learned from the benefits that xUnit provides, however, will be doomed to repeat its mistakes, and often make even more.  The alternative - let's call it "AdHocUnit" - is very popular.  It consists of glomming features onto random parts of the test-manipulation API to support specific use-cases, without attention to the overall design of the system.

Twisted's Trial, the Zope Test Runner, Nose, and I'm sure quite a few other testing tools in Python all do this.  The result is a situation where the concepts are all basically compatible, but you can't write a test that uses features from two of these systems at once - and if you're a person (as many people are) who writes code that relies heavily on both Twisted and Zope, this can be painful.

You can tell you've got an AdHocUnit implementation when your APIs are internally throwing around strings that represent file names, module names, and method names, without reference to test objects; when you've got global, non-optional start and stop steps (not in setUp or tearDown methods) which twiddle various pieces of framework state; or when you've started coupling the semantics of 'run' to some ad-hoc internal lists of tests or suites.  You can tell you have an AdHocUnit implementation when your test discovery is convenient, but hard-coded to a particular idiom of paths and filenames.  Most of all, you can tell you've got an AdHocUnit implementation when you've got objects called "TestCase" which claim to be xUnit objects and subclass from a TestCase class, but mysteriously cannot be run by another xUnit framework in the same language.

What To Do Now

The object-oriented programming community needs a better API which is as high-level and generic as xUnit.  Anyone looking to "fix" xUnit should be careful to create a well-defined structure for tests and document the API and the reasons for every decision in it.  It's interesting to note that the (very successful) Beck Testing Framework was originally presented as a paper, not as an open source project.

It might seem like testing APIs don't require this kind of rigor.  They seem deceptively simple to design: at first, all you need to do is run this bit of code, then that bit of code, and make sure they both work.  For a while, all you're doing is piling on more and more tests and making sure they all work.

Then, one day, you want to start doing things with all these tests.  You want to know how long they take to run, how much disk space they use, how many of them call a certain API.  You want to run them in different orders to make sure that they are in fact isolated, and don't interact with each other.  You'll find out, as I did, that huge numbers of your tests are badly written because your first ad-hoc attempt at the framework was wrong.

Unlike most frameworks, you can't gradually evolve them and depend on your tests to verify their behavior, because by changing the testing framework, you might have changed the meaning of the tests themselves.  Having a well-tested test framework can help, of course, but while you can test the test framework, you won't be testing your tests.

Of course, everything is possible in this world of infinite possibilities.  Test frameworks can, and do, evolve; but the process is slower, and more painful than other kinds of evolution.  So, when you're looking to write your own conveniences for testing, don't throw the baby out with the bathwater: keep what your xUnit implementation does well, retain compatibility with it, and build upon it.

Acknowledgments

I'd like to thank Jonathan Lange, who encouraged me to consider the benefits of xUnit in the first place, and the Massachusetts Bay Transportation Authority, without whose constant delays and breakdowns I wouldn't have had time to have written this at all.



1:  I didn't really start 'trial'; what I actually began was a few nasty hacks in a redistribution of pyunit.  In my defense, it predated 'unittest' as a standard-library module.  Jonathan Lange, its current maintainer, was the one who made it an independent tool.  Thanks to him, it is now actually compatible to a large extent with the standard 'unittest' module.

Thank You, Microsoft!

It looks like the ever-popular Vista may be the impetus which drives game developers towards open platforms.  Jonathan Blow, the author of the long-delayed, very original, and kind of quirky indie game "Braid", has decided that, thanks to Vista being so annoying, he is going to develop and release Braid on Ubuntu.

Until now, only relatively unknown developers such as Id Software have bothered to release or develop games on the Linux platform, but now that the more popular indie developers are starting to look at it, the commercial mainstream might not be far behind.  Hooray!

There is a flash of light! Your PYTHON has evolved into PSYDUCK!

I guess if you don't read any other python-related blogs (or news, or mailing lists, for that matter) you might not already know that the first alpha of Python 3.0 has been released.

A sample of the release notes:
  • There are a few memory leaks
  • SSL support is disabled
  • Platform support is reduced
  • There may be additional issues on 64-bit architectures
  • There are still some open issues on Windows
  • Some new features are very fresh, and probably contain bugs
  • IDLE still has some open issues
In other words, this is not a product for general consumption, and is not labeled as such.  Do not use it expecting to be able to get real work done with it.  This release is to help the Python development team find bugs and get feedback from the community.

I've already blogged about my inability to get excited about Python 3.0.  Now that it's begun to arrive, I can more clearly see the scope of the work required to get in sync with it, and the new features that it is actually going to encompass.  It's a staggering amount of work.

Jean-Paul Calderone has prepared a preliminary run of the 2to3 tool over the Twisted codebase.  This produced an 820-kilobyte diff, which took 12 minutes to generate (on a fairly fast, modern piece of hardware).  However, this is not even a complete run, because the 2to3 tool cannot even parse two of the files in the repository, despite the fact that they are all valid Python.  Many of the transformations (especially in the area of the unicode/str conversion) are almost certainly going to drastically change the semantics of our test suite, if not break it - although enough of Twisted's dependencies are missing on 3.0 that I haven't even had an opportunity to try.

I still hold out hope that the 3.0 branch will gradually be abandoned, as these changes are rolled back into older versions (2.6+) of Python and gradually phased in, with their deprecated alternatives being gradually phased out.  Right now, though, the plan is to continue parallel development in the 2.x series until 3.0 is ready to "take over", although I'm not sure how that determination will be made.

While I wish I could be more excited about something in the 3.0 roadmap, it worries me that some of the excitement I see from others is enthusiasm for the idea of using it as an excuse to break their own users' software too.  If you've written a library for Python, please consider that its users are going to be having a hard enough time upgrading from python 2.x to 3.x; you should really try to provide the smoothest migration path from here to there, and keep your APIs as compatible as possible.
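One concrete way to provide the smooth migration path this paragraph asks for is to opt into 3.x behavior from 2.x code via __future__ imports, so the same file runs under both interpreters without needing 2to3 to rewrite it.  A minimal sketch (the `describe` function is an invented example):

```python
# Valid in Python 2.6+ and 3.x alike: print becomes a function, and
# the / operator performs true division, matching 3.x semantics.
from __future__ import print_function, division

def describe(ratio):
    # 5 / 2 is 2.5 under both interpreters, thanks to the import above
    return "ratio is %s" % (ratio / 2)

print(describe(5))  # ratio is 2.5
```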

Although I'd like to say something nice and congratulatory, the thought of spending a year just pushing little piles of syntax into other little piles of syntax, even with the help of a tool like a hypothetically-much-more-advanced 2to3, is honestly just depressing.  I'd have to get Twisted (and Axiom and Nevow and Mantissa and Quotient and a handful of proprietary projects that I work on) to work on Python 3.0 before I can use it.  If I'm going to work on Twisted, I have a lot of other things I'd rather be doing.  So my plan, for the moment, is to ignore Python 3.0 as long as I possibly can.

But hey, that's what the magic of open source is supposed to be about, right?  Do you like Python 3.0?  Do you want to see Twisted (and the various Divmod projects) eventually support it?  A fairly substantial portion of the diff in question is a litany of non-controversial stylistic changes to update old, and sometimes creaky, parts of the codebase.  For example, there are a bunch of usages of the 'print' statement that need to be transformed; you might consider submitting a patch which simply removes all usages of "print", since we probably shouldn't be using that syntax anyway.  That will reduce the size of the changes that we need to consider, generate, and apply in order to be 3.0 compliant, and will probably improve the code's cleanliness quite a bit.  Once those parts are applied, and thereby removed from the output of 2to3, we can have a better view of what work is actually necessary to support 2.x and 3.0 versions simultaneously from the same codebase.

Of course, comprehensive test coverage is something else frequently brought up when talking about the 2-to-3 migration.  Any patches which increase or improve our test coverage will always be greatly appreciated, regardless of any migration issues.

The one prospect that appeals to me is that, if 3.0 successfully adheres to the vague promise of breaking backwards compatibility "just this once", Python may move towards being a real platform instead of simply a tool that you run on another platform.  Of course, backwards incompatible changes can be a bit like potato chips - "you can't eat just one" - but I trust that the Python team can stick to it if they see the value in it and have made the incompatible changes they think are significant and necessary.

A few days ago I ran Neverwinter Nights on my Ubuntu (feisty) machine and was pleased to discover that despite the fact that there are new versions of SDL, X11, the Linux kernel, GNOME, ALSA, and a dozen other dependencies (all of which are dynamically bound) since five years ago when it was written, it still runs beautifully, with no configuration or re-installation on my part.  Getting that kind of reliability from Python, and being able to provide it for Twisted, would be worth a fair amount of time spent overhauling syntax.

Pondering Python Path Programming Problems

Most Python programmers are at least vaguely aware of sys.path, PYTHONPATH, and the effect they have on importing modules.  However, there's a lot of confusion about how to use them properly, and about how powerful these concepts can be if you know how to apply them.  Twisted - and in particular its plugin system - makes very nuanced use of the python path, which can sometimes make things that use it a bit hard to explain, since there isn't a well-defined common terminology or good library support for working with paths, except to the extent that they are used by importers.

This article is really about two things: the general concept of paths, and the Twisted module "twisted.python.modules", which provides some specific implementations of my ideas about the python path.

First of all, why should you care about python paths?  To put it simply, because very bizarre problems can result if you use them incorrectly.  Also, you need to know about them in order to use Twisted's plugin system effectively, and of course you want to use Twisted, right?  :)

What kind of problems?  Even very popular, well-regarded Python packages by very experienced Python programmers sometimes mess this up pretty badly.  Here's a simple example of what can go wrong with a package you probably know of, the Python Imaging Library:
>>> import Image
>>> import PIL.Image
>>> img = PIL.Image.Image()
>>> Image.__file__
'/usr/lib/python2.5/site-packages/PIL/Image.pyc'
>>> PIL.Image.__file__
'/usr/lib/python2.5/site-packages/PIL/Image.pyc'
Here we can see that you can import PIL's "Image" module as either "PIL.Image" or simply "Image".  Both these modules are loaded from the same file.  On the face of it, this is simply a convenience.  But let's dig deeper:
>>> PIL.Image == Image
False
The modules aren't the same object!  This has some nasty practical repercussions:
>>> isinstance(img, Image.Image)
False
For example, Image objects created from one of these PIL modules do not register as instances from the other, even though they're all the same code.  Worse yet, this mistake can become "sticky" if you use them along with a module like pickle, which carries the module and class name into the data:
>>> from cPickle import dumps
>>> img2 = Image.Image()
>>> dumps(img)
"(iPIL.Image\nImage\n ...
>>> dumps(img2)
"(iImage\nImage\n ...
Many Python features and packages depend on matching types.  Zope Interface, for example, will not let you use adapters registered for one Image type with the other; the objects will not compare equal even when they really are equivalent; and so on.  And none of this is a bug in the code!  Why does it happen?

PIL is a package; that is, a directory named "PIL" containing Python source code and an "__init__.py".  However, it also installs a ".pth" file as part of its installation.  ".pth" files are one way to add entries to your sys.path.  This particular one adds the "PIL" directory itself to your path, which means the same code can be loaded via two different path entries: as the "PIL" package, from your "site-packages" directory, or as a collection of top-level modules, from the "PIL" directory that the ".pth" file added.

This isn't to pick on PIL or the Effbot; I've seen lots of projects which have a "lib" directory with an __init__.py and change its name at installation time, or inconsistently reference subpackages with relative and absolute imports, or do any number of things which are just as bad.  I hope that I've convinced you not to do the same thing with your project, but I won't dwell on the problem here, since I have a solution handy.
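The same aliasing can be reproduced from scratch with any package, which makes it clear that nothing here is PIL-specific.  This sketch builds a tiny throwaway package ("toy" and "leaf" are made-up names) and then puts both the site directory and the package directory itself on sys.path, just as PIL's ".pth" file does:

```python
import importlib
import os
import sys
import tempfile

# Build a tiny package on disk: site/toy/__init__.py and site/toy/leaf.py
site = tempfile.mkdtemp()
pkgdir = os.path.join(site, "toy")
os.makedirs(pkgdir)
open(os.path.join(pkgdir, "__init__.py"), "w").close()
with open(os.path.join(pkgdir, "leaf.py"), "w") as f:
    f.write("class Leaf:\n    pass\n")

# Both the site dir AND the package dir go on the path, like PIL's .pth:
sys.path[:0] = [site, pkgdir]
importlib.invalidate_caches()

import toy.leaf   # loaded via the 'site' path entry
import leaf       # the *same file*, loaded via the 'toy' path entry

print(toy.leaf is leaf)                        # False: two module objects
print(isinstance(toy.leaf.Leaf(), leaf.Leaf))  # False: two distinct classes
```

One file, two path entries, two module objects, two incompatible Leaf classes - exactly the PIL symptom shown in the transcript above.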

Unless you already know what is going on (although I'm sure many of you reading this already do), this can be a bit confusing to figure out.  You can use twisted.python.modules to ask this question rather directly.  Here's how:
>>> from twisted.python.modules import getModule
>>> imageModule = getModule("Image")
>>> pilImageModule = getModule("PIL.Image")
>>> imageModule.pathEntry
PathEntry<FilePath('/usr/lib/python2.5/site-packages/PIL')>
>>> pilImageModule.pathEntry
PathEntry<FilePath('/usr/lib/python2.5/site-packages')>
Here we're asking twisted.python.modules to give us objects that represent metadata about two modules, without actually loading them.  The attribute here is the 'pathEntry' attribute, which tells us what entry on sys.path the module would be loaded from, if it's imported.
>>> import sys
>>> pilImageModule.isLoaded()
False
>>> imageModule.isLoaded()
False
>>> 'PIL.Image' in sys.modules
False
>>> 'Image' in sys.modules
False
Look, no modules!

Of course, if we wanted to load those modules, it's easy enough:
>>> pilImageModule.load()
<module 'PIL.Image' from '/usr/lib/python2.5/site-packages/PIL/Image.pyc'>
>>> imageModule.load()
<module 'Image' from '/usr/lib/python2.5/site-packages/PIL/Image.pyc'>
You can also get lists of modules.  For example, you can see that the list of modules in the "PIL" package is suspiciously similar to the list of top-level modules that comes from the path entry where the "Image" module was loaded:
>>> from pprint import pprint
>>> pilModule = getModule("PIL")
>>> pprint(list(pilModule.iterModules())[:5])
[PythonModule<'PIL.ArgImagePlugin'>,
 PythonModule<'PIL.BdfFontFile'>,
 PythonModule<'PIL.BmpImagePlugin'>,
 PythonModule<'PIL.BufrStubImagePlugin'>,
 PythonModule<'PIL.ContainerIO'>]
>>> pprint(list(imageModule.pathEntry.iterModules())[:5])
[PythonModule<'ArgImagePlugin'>,
 PythonModule<'BdfFontFile'>,
 PythonModule<'BmpImagePlugin'>,
 PythonModule<'BufrStubImagePlugin'>,
 PythonModule<'ContainerIO'>]
As you might imagine, the ability to list modules and load the ones that seem interesting is a great way to load plugins - and that's exactly how Twisted's plugin system is implemented.  While the plugin system itself is a topic for another post (or perhaps you could just read the documentation) the way it finds plugins is interesting.

For example, let's take a look at the list of Mantissa plugin modules I have installed:
>>> xmplugins = getModule('xmantissa.plugins')
>>> pprint(list(xmplugins.iterModules()))
[PythonModule<'xmantissa.plugins.adminoff'>,
 PythonModule<'xmantissa.plugins.baseoff'>,
 PythonModule<'xmantissa.plugins.free_signup'>,
 PythonModule<'xmantissa.plugins.offerings'>]
This simple query actually yields an incomplete list.  It's just the modules that come with Mantissa itself.  Python has a special little-known rule when loading modules from packages, and twisted.python.modules honors it: if a package has a special variable called "__path__", it is a list of path names to load modules from.  However, twisted.python.modules doesn't load modules unless you ask it to, so it can't determine the value of that attribute.  As it so happens, twisted.plugins uses the __path__ attribute in order to allow you to keep your development installations separate, so twisted.python.modules can't determine all the places you might need to look for plugins without some help.  Let's just load that package so we can look at its __path__ attribute:
>>> xmplugins.load()
<module 'xmantissa.plugins' from '/home/glyph/Projects/Divmod/trunk/Mantissa/xmantissa/plugins/__init__.pyc'>
Now that we've loaded it, let's have a look at that list:
>>> pprint(list(xmplugins.iterModules()))
[PythonModule<'xmantissa.plugins.adminoff'>,
 PythonModule<'xmantissa.plugins.baseoff'>,
 PythonModule<'xmantissa.plugins.free_signup'>,
 PythonModule<'xmantissa.plugins.offerings'>,
 PythonModule<'xmantissa.plugins.mailoff'>,
 PythonModule<'xmantissa.plugins.radoff'>,
 PythonModule<'xmantissa.plugins.sineoff'>,
 PythonModule<'xmantissa.plugins.hyperbolaoff'>,
 PythonModule<'xmantissa.plugins.imaginaryoff'>,
 PythonModule<'xmantissa.plugins.blendix_offering'>,
 PythonModule<'xmantissa.plugins.billed_signup'>,
 PythonModule<'xmantissa.plugins.billoff'>,
 PythonModule<'xmantissa.plugins.derivoff'>]

That's my full list of Mantissa plugins, including my super secret Divmod proprietary plugins.

This list is generated because plugins packages use a feature (which was previously kind of a gross hack, but will be an officially supported feature of the next version of Twisted) to set their path to every directory with the same name as the plugin package which is not also a package on your python path.  In other words, if you have 2 sys.path entries, a/ and b/, and one package, x.plugins, in b/x/plugins/__init__.py with this trick in it, then a file a/x/plugins/foo.py will be considered to contain the module "x.plugins.foo".  This requires that you do not have a file a/x/__init__.py or a/x/plugins/__init__.py.  If you do, this hack will treat the two paths the same way that Python does - as duplicate packages on your path - so the package in a/ is loaded and the package in b/ is ignored.
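A self-contained sketch of that trick follows.  The package names ("x.plugins") and temp directories are invented for illustration, and the __init__.py body written here is my own approximation of the idea, not Twisted's actual implementation: it extends __path__ with every x/plugins directory found on sys.path that is not itself a package.

```python
import importlib
import os
import sys
import tempfile

base = tempfile.mkdtemp()
a = os.path.join(base, "a")   # will hold only a drop-in plugin module
b = os.path.join(base, "b")   # will hold the real x.plugins package

# b/ contains the real package, with the __path__ trick in its __init__.py:
os.makedirs(os.path.join(b, "x", "plugins"))
open(os.path.join(b, "x", "__init__.py"), "w").close()
init_src = "\n".join([
    "import os, sys",
    "# Extend __path__ with every x/plugins directory on sys.path that",
    "# is not itself a package (i.e. has no __init__.py).",
    "for entry in sys.path:",
    "    candidate = os.path.join(entry, 'x', 'plugins')",
    "    if (os.path.isdir(candidate)",
    "            and not os.path.exists(os.path.join(candidate, '__init__.py'))",
    "            and candidate not in __path__):",
    "        __path__.append(candidate)",
])
with open(os.path.join(b, "x", "plugins", "__init__.py"), "w") as f:
    f.write(init_src)

# a/ contains x/plugins/foo.py but *no* __init__.py files at all:
os.makedirs(os.path.join(a, "x", "plugins"))
with open(os.path.join(a, "x", "plugins", "foo.py"), "w") as f:
    f.write("NAME = 'foo plugin'\n")

sys.path[:0] = [b, a]
importlib.invalidate_caches()
import x.plugins.foo          # found via the extended __path__
print(x.plugins.foo.NAME)     # foo plugin
```

The drop-in module in a/ is picked up as "x.plugins.foo" even though the x.plugins package itself lives in b/, which is what lets development plugins stay in their own source tree.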

The distinction between packages and path entries is why all the Twisted and Divmod projects conventionally have capitalized directory names but lowercase package names.  "Twisted" is where your path entry should point; "twisted" is the python package that is loaded from that path entry.  "Twisted" should never have an __init__.py in it.  "twisted" always should.  This goes the same for "Axiom" and "axiom", "Mantissa" and (the unfortunately named) "xmantissa".  You will sometimes encounter other examples of this style of naming floating around the web.

When using Twisted and Divmod infrastructure, keeping this distinction clear is critical, because otherwise it is difficult to develop plugins independently.  You probably don't want to copy your development plugins into your Twisted installation - they're part of your source repository, after all, not ours.  Keeping the distinction clear in your mind will also help you avoid lots of obscure problems with duplicate classes and naming, so it's generally a good idea even if you don't like our naming conventions.

Please let me know in the comments which parts of this post you found useful, if any.  I know it's a bit rambling, and covers a number of different topics, some of which may be obvious and some of which might be inscrutable.  I've experienced quite a bit of confusion when talking to other python programmers about this stuff, but I'm not sure if it was my awkward explanation of Twisted's plugin system or some inherent issue in Python's path management.

Not Just The Faithful

As I've said before, Microsoft Windows Vista is a terrible disaster which I hope I never have to deal with in any capacity, professional or otherwise.  I suspect that it is inevitable, but I will resist it for as long as possible.

The FSF has a campaign, "BADVISTA", to educate end-users about the ways in which Vista is limiting your freedom more aggressively than any other commercial software product to date.  Unfortunately this can sometimes sound a bit ... overdramatic, even if it is pretty much all true.  For example, a prominently featured quotation:
Windows Vista includes an array of “features” that you don't want. These features will make your computer less reliable and less secure. They'll make your computer less stable and run slower. They will cause technical support problems. They may even require you to upgrade some of your peripheral hardware and existing software. And these features won't do anything useful. In fact, they're working against you.
I recently had the experience of talking to a Regular User in a consumer electronics store about his Vista "upgrade".  His "computer guy" had told him that Vista was like XP, but better.  Little did he know that the "better" would mean that the computer ran visibly slower, had reduced functionality, and required the purchase of newer, more expensive hardware.

Of course, I gave him my rant about the other reasons he shouldn't have upgraded, and the poor guy turned white as a sheet.  I don't think he's going to be purchasing any more "upgrades" from his "computer guy".

But, what does the other side have to say about this fancy new operating system?  Surely there are some worthwhile new conveniences that we are trading this freedom for?  Let's see what one ex-Microsoft employee and prominent Windows developer has to say about it:
"I've been using Vista on my home laptop since it shipped, and can say with some conviction that nobody should be using it as their primary operating system -- it simply has no redeeming merits to overcome the compatibility headaches it causes. Whenever anyone asks, my advice is to stay with Windows XP (and to purchase new systems with XP preinstalled)."
... and there you have it.  Friends don't let friends use Vista.