Diesel: A Case Study In That Thing I Just Said

Thanks to jamwt for the shout-out on the announcement of Diesel.

Since the reaction to my reaction to Tornado was so good (or at least so ... energetic), I figure I should comment on Diesel as well.  Spoiler alert: my reaction is ... largely similar, but since jamwt has been kind of nice to Twisted in the past, and didn't actually say anything mean this time, I'm somewhat reluctant to have that reaction.  Nevertheless, I swore a solemn oath to tell it like it is, keep it real, and so forth.  So I must.

Once again, I'm happy that event-driven programming is getting some love.  This time, I'm pleased that nobody is saying anything especially snarky or FUD-ish about Twisted.  I do feel like it's a little weird not to mention Twisted, or include some comparisons to Nevow or Orbited, both of which provide different, comprehensive approaches to COMET with Twisted.

(Worth noting: Orbited also originally started out using its own event-driven I/O layer, but switched to Twisted later, because Twisted is "crazy delicious".)

Diesel has many more interesting ideas at the level of async I/O than Tornado did.  I think the generator-based approach for implementing protocols is interesting and deserves some more exploration.  I'm not sold on it for every use-case, and I think the implementation might have some flaws, but it definitely has some advantages.

I'd give jamwt a hard time for not reporting issues and communicating with Twisted more before re-writing the core, but for three issues:
  1. jamwt's been around in the Twisted community for a while.  He's written a bunch of fairly deep Twisted code and he clearly knows what the framework is capable of.
  2. I've spoken with him on a number of occasions, and for all I know I might have discussed this with him.  I don't remember it, but it would be pretty embarrassing to write a big rant about how nobody talks to us only to have him paste some chat log where he explained why he was writing Diesel six months ago, and I said "oh, okay" ;-).
  3. Nobody is calling Twisted names or making vague, unsubstantiated accusations.  You're not obligated to examine Twisted, nor Nevow, nor Orbited, I just feel that you owe us some explanation if you publicly say that you tried it and found it wanting.  The tone on the Diesel announcement, in its one brief mention of Twisted, is "we tried it, but we kinda wanted to do our own thing".  So, good for them, they did their own thing, I hope they had fun.
Now, personally, I'd like to leave it at that, but there is a certain inevitable comparison that I think is going to take place.  Diesel has a nicer web page than Twisted.  They have entwittered ... twitified ... uh ... tweetened ... the project, and we haven't; we just have an old-fashioned "blog".  Diesel is smaller than Twisted, so it's easier to explain, and so the people approaching it will have a better idea of its scope.  This might give the immediate impression that it is a simpler, better, more "modern" replacement for Twisted's I/O layer, and this is not the case.  So I still feel it's important that I set the record straight.

Before I launch into my critique, I should say that I don't want to harsh on Diesel too bad. It's a neat little hack and you should go play with it.  And I feel bad pointing out problems with it, since as I mentioned above, nobody's dumping on Twisted.  So, Diesel fans, please take this in the spirit of a frank code-review, not a complaint about your behavior.

The interesting generator-munging bits could be easily adapted to run on top of Twisted's loop, which, arguably, they should have been in the first place; and the toy "hub" that they've written might be good enough for some simple applications where reliability under load is not a serious concern.  In fact, inlineCallbacks might provide a good deal of what is needed to support Diesel's programming style.  Alternately, Diesel might provide some hints as to how things like inlineCallbacks could be made more efficient.
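
To make the generator idea concrete, here is a minimal sketch of a generator-driven protocol that is deliberately independent of any particular loop. This is not Diesel's API and not Twisted's inlineCallbacks; the names `until`, `GeneratorProtocol`, and `echo_upper` are hypothetical, and `sent` merely stands in for whatever a real transport's write call would be. The point is only that the generator part needs nothing from the I/O layer except "here are some bytes".

```python
class until:
    """Request object: 'resume me once the buffer contains this delimiter'."""

    def __init__(self, delimiter):
        self.delimiter = delimiter


class GeneratorProtocol:
    """Drives a generator that yields `until` requests and `bytes` replies."""

    def __init__(self, handler):
        self.buffer = b""
        self.sent = []        # hypothetical stand-in for a transport's write()
        self.waiting = None   # the `until` request we are blocked on, if any
        self.gen = handler()
        self._advance(None)

    def _advance(self, value):
        # Run the generator until it asks for input we do not have yet.
        try:
            while True:
                request = self.gen.send(value)
                value = None
                if isinstance(request, bytes):
                    self.sent.append(request)
                else:
                    self.waiting = request
                    return
        except StopIteration:
            self.waiting = None

    def data_received(self, data):
        # Called by whatever event loop owns the socket.
        self.buffer += data
        while self.waiting is not None:
            line, sep, rest = self.buffer.partition(self.waiting.delimiter)
            if not sep:
                return  # delimiter not seen yet; wait for more bytes
            self.buffer = rest
            self.waiting = None
            self._advance(line)


def echo_upper():
    """Example handler: upper-case each line until the peer says 'quit'."""
    while True:
        line = yield until(b"\r\n")
        if line == b"quit":
            yield b"bye\r\n"
            return
        yield line.upper() + b"\r\n"
```

Because `data_received` is the only entry point, the same handler could be fed by a Twisted protocol's data callback, by a hand-rolled hub, or by a unit test that never opens a socket.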

That said, Diesel's I/O loop sucks.

It's disappointing to see the same mistakes getting made over and over again.  First and foremost: no tests.  Come on, Python community!  You can do better!  Write your damn tests first!

The #1 benefit that a brand-new I/O loop project could have over Twisted is that Twisted was written in the bad old days before everybody knew that TDD was the right way to write programs, so we don't have 100% test coverage.  But, we strive to get closer every day, while every new project decides that they don't need no stinking quality control.

Predictably, as it has no tests, Diesel's I/O layer is full of dead code, inaccurate documentation, and unhandled errors.  Consider this gem, which I found about 30 seconds into reading the code: KqueueEventHub is documented to be "an epoll-based event hub", and its initializer defines an inner function which is never used.  I'm not going to belabor the point by enumerating all the typo bugs I found, but you may find the output of 'pyflakes diesel' interesting.

Instead of Tornado's inaccurate handling of EINTR, Diesel has no handling of EINTR, as far as I can tell.  It also doesn't handle EPERM, ENOBUFS, EMFILE, or even EAGAIN on accept().  To be fair, it has a catch-all exception handler all the way at the top of the stack, so none of these will cause instant crashes, but they will cause surprising behavior in odd situations (and possibly infinite traceback-spewing loops).
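
For illustration, here is roughly what defensive handling of accept() looks like. This is a sketch under assumptions, not Twisted's or Diesel's code: `accept_ready`, `TRANSIENT`, and the `limit` parameter are hypothetical names, and the exact set of errors worth treating as transient varies by platform.

```python
import errno
import socket

# Errors that mean "try again later", not "the listening socket is broken".
# (Assumed set for this sketch; real platforms differ in the details.)
TRANSIENT = {errno.EPERM, errno.ENOBUFS, errno.ENFILE, errno.EMFILE,
             errno.ENOMEM, errno.ECONNABORTED}


def accept_ready(listener, limit=64):
    """Accept up to `limit` pending connections from a non-blocking listener."""
    accepted = []
    for _ in range(limit):
        try:
            conn, addr = listener.accept()
        except BlockingIOError:
            break  # EAGAIN/EWOULDBLOCK: the backlog is drained
        except InterruptedError:
            continue  # EINTR: Python 3.5+ retries on its own, older ones do not
        except OSError as e:
            if e.errno in TRANSIENT:
                break  # keep the listener registered; retry on the next wakeup
            raise  # anything else is a genuine bug or a dead socket
        conn.setblocking(False)
        accepted.append((conn, addr))
    return accepted
```

The design choice worth noticing is that every branch is an explicit decision; a catch-all handler at the top of the stack makes none of them.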

More surprisingly - I had to re-read the code about five times to make sure - it doesn't appear that sockets are ever set to be non-blocking, and EAGAIN is not handled from accept(), recv(), or send().  And yes, this can happen even if your multiplexor says your socket is ready for reading and/or writing.  The conditions are somewhat obscure, but nevertheless they do happen.  So, occasionally, Diesel will hiccup and block until some slow network client manages to send or receive some traffic.  In other words: Diesel is not really async.  It just fakes it convincingly, most of the time.
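
A minimal sketch of what "really async" means at the socket level: the socket is explicitly non-blocking, and "would block" is an ordinary return value rather than a stall. The helper names `try_recv` and `try_send` are hypothetical, chosen for this example.

```python
import socket


def try_recv(sock, size=4096):
    """Return the bytes read, b"" on EOF, or None if the read would block."""
    try:
        return sock.recv(size)
    except BlockingIOError:
        # EAGAIN/EWOULDBLOCK: possible even after the multiplexor said "ready".
        return None


def try_send(sock, data):
    """Return how many bytes the kernel accepted (possibly 0)."""
    try:
        return sock.send(data)
    except BlockingIOError:
        return 0  # send buffer full; caller keeps the data and waits
```

Both helpers assume the caller has already called `sock.setblocking(False)`; on a blocking socket they would simply hang in the cases described above.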

Once again, there's no way to asynchronously spawn a process, and no way to asynchronously connect a TCP client.  Sure, this looks like an asynchronous connect call, but it's misleading: it blocks on resolving the hostname, and it potentially blocks on the initial SYN/SYN+ACK/ACK exchange.  There's no asynchronous SSL support.  And no, that is not trivial.  Not to mention handling all the crazy errors that spew out of the Windows TCP stack.  And since the loop is implemented to be incompatible with Twisted, it's not obviously trivial to compatibly plug it in and get those features.
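
As a rough sketch of the TCP half of that complaint, a non-blocking connect has two steps: start it, then check the result once the socket becomes writable. The function names `start_connect` and `finish_connect` are hypothetical. Note the assumption that the address is already numeric; resolving a hostname without blocking is a separate problem that this sketch does not attempt.

```python
import errno
import os
import select
import socket


def start_connect(ip, port):
    """Begin a TCP connect without blocking; `ip` must already be numeric."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    err = sock.connect_ex((ip, port))
    # 0 means it completed at once; EINPROGRESS means the handshake is pending.
    if err not in (0, errno.EINPROGRESS, errno.EWOULDBLOCK):
        sock.close()
        raise OSError(err, os.strerror(err))
    return sock


def finish_connect(sock):
    """Call once the multiplexor reports `sock` writable."""
    err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
    if err != 0:
        sock.close()
        raise OSError(err, os.strerror(err))
    return sock


if __name__ == "__main__":
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    pending = start_connect(*listener.getsockname())
    select.select([], [pending], [], 5)  # a real loop would do other work here
    print("connected to", finish_connect(pending).getpeername())
```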

Again, I don't want to dump on Diesel here; for what it is, i.e. an experiment in how to idiomatically structure asynchronous applications, it's all right.  For that matter Twisted has its fair share of bugs too, which would be pretty easy to lay out in a similar post; you wouldn't even need to do the research yourself, just go look at our bug tracker.

But both Diesel and Tornado make the mistake of attempting to replace the years of trial-and-error, years of testing discipline, and years of portability and feature work that Twisted has accumulated with a few oversimplified, untested hacks.

What they could have done is contributed any extensions that they needed to Twisted's loop, or modifications to Twisted's packaging that would allow them to get a smaller sliver of Twisted's core to bootstrap, if that's what they needed.

My goal in pointing out all these flaws is not to illustrate any particular point about Diesel, but to reinforce the point I implicitly made in my Tornado post, which is that if you try to write a new mainloop (especially without tests) you will screw it up.  You will most likely screw it up in ways which will only surface later, under mysterious circumstances, when your servers are under load and you are under the gun for a deadline.

Or if I happen to get wind of it and write a blog post about it, of course.  Then you get to cheat a little.

It's not an indictment of Diesel that it screwed this up; everyone screws it up.  I would probably screw it up, if I didn't have Twisted sitting in front of me as a direct reference.  POSIX by itself is unreasonably subtle and difficult, but POSIX, plus the subtle variations in different platforms which implement it, plus the Windows APIs which are almost-but-not-quite-exactly-nothing-like the POSIX APIs, presents an inhuman challenge.

Hopefully Diesel will grow some tests.  Hopefully it will fix, or better yet shed, its somewhat unfortunate I/O hub.  I am hopeful that someone will follow Dustin's excellent lead (perhaps Dustin himself!) and port Diesel's API and generator system over to Twisted's I/O architecture and eliminate all these silly bugs.  Of course, if someone did that, you could use Dustin's Tornado port with Diesel.

With the silly bugs from the I/O loop out of the way, the Diesel team can write tests for the more interesting pieces, and fix the bugs which aren't entirely silly :-).

Making Twisted Specific

"pffft. twisted isn't specific."
          — W. Allen Short
The original goal of the Twisted project, as I have been frequently reminded of late, is to create a general, inter-operable mainloop that isn't specific to any particular protocol.  The main loop wasn't a goal in itself, as the point of making it general was to provide an opportunity for all protocols to have serious, production-quality implementations that any Twisted application could have access to.  Twisted itself ships with many different protocol implementations in furtherance of this goal, in an attempt to get critical mass.

This generality is a great strength.  It means that we've attracted a small crowd of generalists.  We have an excellent development process, ever-increasing quality of both code and documentation, and a wide variety of different protocol implementations and libraries for doing common networking and inter-process communication tasks.  We have recently been lucky to attract a few more excellent developers to help with this.

The one thing we haven't been so lucky about is attracting specifists.  Although we still need more people to make Twisted awesome as a library, our community is getting better and better at doing that.  What we need even more than that is individuals with a very specific, focused interest on just one thing that Twisted does.  Czars, if you will, to push the development of Twisted as a suite of interoperating applications.

Twisted already has within it the seeds of excellent replacements for Apache httpd, OpenSSH, BIND, hybrid ircd, Sendmail, imapd, pop3d, and a few other servers, not to mention clients like Pidgin and the OpenSSH command-line client.  In order to sprout and take root, those seeds each need a dedicated advocate, someone who cares deeply about the experience of a user or administrator who just wants Twisted to perform one particular function and doesn't want to write their own application code to make it do that.

Projects like the ones above - OpenSSH and BIND, for example - have an advantage in becoming useful: they have dedicated people who care deeply about satisfying a particular use-case, and are singularly focused on that case.  Since they only have the one problem to worry about, they can give it a much more direct treatment.

However, given the team of infrastructure programmers already working on Twisted, such a focused individual would have an incredible force multiplier.  Consider the statistics on Conch from our 2003 USENIX paper on Twisted: going just by line count, Conch was 4x easier to write than even J2SSH, which was itself substantially smaller than OpenSSH.  It was 10x easier to write than OpenSSH.  So, with the support of Twisted as infrastructure, one Twisted application programmer can do the work of ten merely mortal ones ;-).

It might seem to those of you looking to write a chat client, DNS server, or whatever open-source giant that you want to do battle with, that Twisted is just a library, and you want to write an application.  But we really want Twisted to be a comprehensive suite of applications; we're just stretched too thin already to make it realize that potential.

So please rest assured that we would love to have your help with turning Twisted itself into a worthy competitor for these open-source giants - or, for that matter, if you want to build your own competitor as a layer on top of Twisted (for whatever reason: you love .ini files and we don't, you want a more freewheeling development process, or you want a different shade of green on your web pages) we'd still love to help you out and support that effort by fixing whatever issues you have with Twisted's core or protocols.  There's even a super-project on Launchpad for Twisted-but-not-part-of-Twisted projects.  I invite all you application developers out there to join that group and help us with world domination.

(If all that stuff about being ten times more effective as a programmer wasn't enough for you, how about this?  On the Twisted Matrix Labs map of the post-revolutionary world, I'm pretty sure the Emancipated Territory of New Jersey is still missing an archduke and several viscounts.  I can't make any promises, but if you get in on the ground floor of this thing there's still a chance you could be a ruling member of the Twisted over-government!)

The Hole At The End Of The Pipe

Matt Campbell, a long-time fan of my ramblings, pointed out a post from John Resig that reads almost like a response to my ideas about the browser as a deployment target, despite the fact that it was written several years ago.

While Mr. Resig isn't adamantly against "language abstractions" - he notes many of their benefits - his counterpoint is summed up in this paragraph:

In the case of these language abstractions you are gaining none of the benefit of learning the JavaScript language. When a leak in the abstraction occurs (and it will occur - just as it's bound to occur in any abstraction) what resources do you have, as a developer, to correct the problem? If you've learned nothing about JavaScript then you stand no chance in trying to repair, or work around, the issue.


This is becoming a popular fallacy in programming language circles; treating Joel Spolsky's "Law of Leaky Abstractions" as if it were an actual law.

Let's examine the metaphor of the "leak".  In plumbing, a leak is a hole in a pipe where water gets out.  Joel has noticed that every pipe has a hole in it, and therefore all pipes are leaky.

But that's not quite accurate.  There's another hole in pipes where water gets out: it's called the "faucet", and without that part, the rest of the pipe is pretty useless.  To say that a pipe whose faucet is turned on is "leaky" is somewhat misleading, just as it's misleading to say that an abstraction that propagates errors in its lower levels is leaky.  Joel's entire original essay is based on a subtle (and, I suspect, intentional) misunderstanding of TCP: the error conditions that result from failures in the lower level, unreliable packet delivery mechanism are not leaks in the abstraction, they are very carefully specified and thoroughly documented.  They are part of the abstraction.  The abstraction of TCP does not try to pretend that connections are never broken, it just provides a unified idea of a "broken connection" that is clearly specified so you don't need to understand the five million ways that packet delivery can go wrong.

Put more simply: there are abstractions which do not leak.  The example that Joel provides is one of them: TCP is a comprehensive abstraction.
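
A small sketch of what that looks like from the application's side: however packet delivery went wrong underneath, the reader sees one of a handful of documented outcomes. The function name `read_all` and the returned status strings are hypothetical, picked for this example.

```python
import socket


def read_all(conn):
    """Read until the connection ends; report the data and how it ended."""
    chunks = []
    while True:
        try:
            data = conn.recv(4096)
        except ConnectionError as e:
            # Reset, abort, broken pipe: the specified "broken connection".
            return b"".join(chunks), type(e).__name__
        if not data:
            # An empty read is the specified "peer shut down cleanly" signal.
            return b"".join(chunks), "closed"
        chunks.append(data)
```

Nothing in the caller needs to know about retransmissions, routing, or duplicate packets; it only needs to handle the outcomes the abstraction itself defines.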

Then there are abstractions which really do leak.  Every object-relational mapper that provides a facility where you need to directly execute SQL, for example, is leaking the SQL through the abstraction.  Every web templating framework where you can directly generate strings is leaky: the browser speaks DOM, and if you're generating strings, then bytes are leaking through the abstraction.

But "language abstractions" — or as those of us who are not hip to the new web lingo call them, "compilers" — are generally accepted to be the kind of thing that work well enough that you can trust them.  I don't know the specifics of the current crop of javascript-targeting compilers.  Maybe GWT and Pyjamas have issues that would require some knowledge of JavaScript to use them correctly.  A well-written compiler, one that really lived up to the promise of treating the browser as a deployment target, wouldn't have those kinds of issues though.  Let's turn the wayback machine to 1969 and cast Mr. Resig's argument against the contemporary contender for moving up the abstraction stack:

In the case of UNIX, you are gaining none of the benefit of learning the PDP-11 instruction set. When a bug in the C compiler occurs (and it will occur - just as it's bound to occur in any compiler) what resources do you have, as a developer, to correct the problem? If you've learned nothing about PDP-11 assembler then you stand no chance in trying to repair, or work around, the issue.


So, for those of you who work on UNIX-like operating systems using that fancy "C" machine-code abstraction: how much PDP-11 assembler have you written recently?

Tornado + Twisted

Many kudos to Dustin Sallings, who has already created a branch of Tornado which uses Twisted for both networking and HTTP parsing, in probably less time than it took me to write my previous post about how somebody should do that.  Awesome!

(The method it uses is currently a little weird, where you create a "Site" object, but it looks like it would be pretty simple to use a Resource instead if you were so inclined.)

What I Wish Tornado Were

FriendFeed has released its web server, Tornado.  It seems like everyone's blogging about it, and it's obviously relevant to my interests, so I feel like I should say something.

Let me start with the good stuff.  First of all, I think it's great that we have yet another asynchronous contender in the Python world.  Every time something like this comes out, it means that Twisted has to fight that much less hard to get over the huge hump of event-driven programming being too hard, or too weird, or whatever.  It's good to have an endorsement of the general message "if you need a web server to handle COMET requests, it needs to be asynchronous to perform acceptably" from such a high-profile company as Facebook.

Unfortunately I think the larger picture here is a failure of communication in the open source community.  In the course of developing Tornado, there are several things that FriendFeed could have done to move the Twisted community forward, at no cost to themselves.  I don't want to rag on FriendFeed, or Bret Taylor, or Facebook here; they're not the first to re-write something without communicating.  In fact I recently had almost this exact same discussion with another project that did the same thing.  Since Tornado is such a high-profile example, though, I want to draw attention to the problem so that there's some hope that maybe the next project won't forget to communicate first.

My main point here is that if you're about to undergo a re-write of a major project because it didn't meet some requirements that you had, please tell the project that you are rewriting what you are doing.  In the best case scenario, someone involved with that project will say, "Oh, you've misunderstood the documentation, actually it does do that".  In the worst case, you go ahead with your rewrite anyway, but there is some hope that you might be able to cooperate in the future, as the project gradually evolves to meet your requirements.  Somewhere in the middle, you might be able to contribute a few small fixes rather than re-implementing the whole thing and maintaining it yourself.

This is especially important if you are later going to make claims about that project not living up to your vaguely-described requirements, and thereby damage its reputation.  Bret Taylor claims in his blog:

We ended up writing our own web server and framework after looking at existing servers and tools like Twisted because none matched both our performance requirements and our ease-of-use requirements.

First and foremost, it would have been great to hear from Bret when he started off using Twisted about any performance problems or ease-of-use problems.  I'm guessing that Twisted itself had only ease-of-use problems, and other "tools like Twisted" were the ones with performance problems, since later, in a comment on the same post, he says:

I can't imagine there is much of a performance difference [between Twisted Web and Tornado].  The bottom is not that complex in my opinion.

It would also be great if he had explicitly said that Twisted didn't have performance problems rather than making me guess, because I'm sure that is what lots of developers will take away from this.  When you have the bully pulpit, off-the-cuff comments like this can do serious damage to smaller projects.

More to the point, what is the problem with "ease of use", exactly?  The fact that he found Deferred tedious, in particular, seems very strange to me, given that it is so un-tedious that it has become a de-facto standard even in the JavaScript community.  We had no opportunity to help him or anyone else out, because as far as I can tell from searching our archives, we never heard from him or from anyone else at FriendFeed when they were trying out Twisted at first.  Even as he's saying that Twisted is hard to use and (maybe?) performs poorly, he isn't pointing to any particular example of what about it is hard to use, or what performs poorly.  There's still nothing we can do to address this criticism.  And there's still not much we can do to make sure that future potential Twisted users won't have this problem.

Later, in yet another comment, Bret points out the root problem:

... the HTTP/web support in Twisted is very chaotic (see http://twistedmatrix.com/trac/wiki/WebDevelopme... - even they acknowledge this)...

This is true.  However, as I frequently like to note, Twisted is starved for resources.  Reconciling the chaos described on the page about web development with Twisted is an ongoing process.  For a tiny fraction of the effort invested in Tornado, FriendFeed could have worked with us to resolve many of the issues creating that chaos.

This is the main thing I want to reinforce here.  If half a dozen occasional contributors with a real focused interest in web development showed up to help us on Twisted, we'd have an awesome, polished web story within a few months.  If even one person really took responsibility for twisted.web, things would pick up.  But if everyone who wants an asynchronous webserver either uses twisted.web (because it's great!) without talking to us or decides not to use it (because it doesn't meet their unstated requirements) without talking to us, it's going to continue to improve at the same sluggish pace.

Even at the current rate, by the time we have an excellent HTTP story, I somehow doubt that Tornado will have a good SSHv2 protocol story ;-).

In his comment, Bret also takes a couple of pot-shots at Twisted that I think are unnecessary, and I'd like to address those too.

In general, it seems like Twisted is full of demo-quality stuff, but most of the protocols have tons of bugs.

We're not talking about "most" of the protocols here, Tornado is only concerned with HTTP.  And the HTTP implementation(s) in Twisted do not have "tons of bugs".  They are production quality, used on lots of different websites, and have lots of automated tests.  While much of the code in twisted.web doesn't have complete test coverage, since it's old enough to predate our testing requirements, I note that Tornado appears to have zero test coverage.

There's a kernel of truth here — some of the older, less frequently used protocols have a few problems — but in most cases the "bugs" are really just a lack of functionality.  Twisted overall has very few protocol-related bugs, and again, our test policy makes sure that new bugs are introduced very rarely.

Given all those factors, it didn't seem to provide a lot of value. Our core I/O loop is actually pretty small and simple, and I think resulted in fewer bugs than would have come up if we had used Twisted.

I must respectfully disagree.  Again, I don't want to rag on FriendFeed here, but here are several features that Tornado would have, and bugs that it wouldn't have, if it used Twisted for the event loop and none of the HTTP stuff:
  1. EINTR wouldn't cause your application to exit if run in a non-US-english locale.
  2. You don't have the opportunity to forget to set a socket to be non-blocking and thereby make your entire application stop.
  3. It would be possible to run your application on Windows.
  4. Firewalled connections and running out of file descriptors wouldn't cause your server to spew errors forever (at least, it won't any more).
  5. You could write a TCP client that didn't block for an arbitrary amount of time in connect().
  6. Finally, of course, you could use all of Twisted's other protocols, client and server: IMAP, POP, SMTP, IRC, AIM, etc.  You could also use external protocol implementations like Thrift.
  7. You could spawn asynchronous subprocesses.
and this is a very short list, based on a cursory reading of the source code, not actually running tornado and not a particularly deep audit.  Some of these bugs might not be as serious as I think, and there might be plenty of other bugs.  But I can't really be sure what works for sure, since again: there are no automated tests.
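
To give a flavor of the last item, here is a minimal sketch of spawning a subprocess without blocking the loop, using only the standard library. It is not Twisted's implementation (Twisted exposes this as reactor.spawnProcess); `spawn` and `run_until_idle` are hypothetical names, the loop is a toy, and the sketch assumes a POSIX platform where pipes can be registered with a selector.

```python
import os
import selectors
import subprocess
import sys


def spawn(selector, argv, on_exit):
    """Start `argv` and collect its stdout without ever blocking on it."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE)
    fd = proc.stdout.fileno()
    os.set_blocking(fd, False)
    chunks = []

    def readable():
        try:
            data = os.read(fd, 65536)
        except BlockingIOError:
            return  # spurious wakeup; wait for the next event
        if data:
            chunks.append(data)
            return
        # EOF: the child closed stdout. Reap it and report. (wait() could
        # still block if the child closed stdout but kept running; a real
        # implementation would watch for SIGCHLD instead.)
        selector.unregister(proc.stdout)
        proc.stdout.close()
        on_exit(proc.wait(), b"".join(chunks))

    selector.register(proc.stdout, selectors.EVENT_READ, readable)
    return proc


def run_until_idle(selector):
    """A toy loop: dispatch callbacks until nothing is registered."""
    while selector.get_map():
        for key, _ in selector.select():
            key.data()


if __name__ == "__main__":
    sel = selectors.DefaultSelector()
    spawn(sel, [sys.executable, "-c", "print('hi')"],
          lambda code, out: print("exit", code, "output", out))
    run_until_idle(sel)
```

Even this toy has to make decisions about EOF, reaping, and spurious wakeups, which is rather the point of the list above.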

This list is a great example of why projects like Tornado really should use Twisted.  Tornado implements some innovative web-framework stuff, but absolutely nothing interesting that I can see at the level of async I/O.  Using Twisted would have allowed them to focus exclusively on cool web things and left the never-ending stream of incremental surprising platform-specific, only-happens-in-weird-situations bugfixes to a single, common source.

What To Do Now

I hope that someone at FriendFeed will be a little heavier on detail and a little lighter on FUD in some future conversation about Twisted.  However, I'm sure they're going to have their hands full maintaining their own code, so I don't have high expectations in this area.  I'm sure Bret wasn't intentionally slamming Twisted, either; it wasn't like he wrote a big screed about it, he just dropped a few unsubstantiated comments into a much larger post about Tornado. So I just want to be clear: I don't have sore feelings, I don't need anybody to apologize to me or to Twisted.

If any of you out there are fans of both Tornado and of Twisted, it would be great if you could contribute a patch to Tornado which would allow it to at least optionally use Twisted as an I/O back-end.  It would be great, of course, if lots of people interested in web stuff would help us out with our web situation, but supporting the Twisted event loop would be good regardless. It would mean that when people wanted to speak multiple protocols, they wouldn't need to re-write or kludge in their existing Tornado application, so it would increase the chances that we could get some help with our SSH, FTP, IRC, or XMPP code instead.  It would also open up a much wider multi-protocol landscape to users of Tornado, even if Tornado's default mode of operation still used ioloop.py.

Even better would be to hook up something that made a Tornado IResource implementation, so that Tornado applications and twisted.web and Nevow applications could all be seamlessly integrated into one server.

The whole point of Twisted is to have a common I/O layer that lots of different libraries can use, share, and build on, so that we can solidify the common and highly complex abstraction required of a comprehensive, cross-platform, event-driven I/O layer.  In order to realize that vision, we need help not just with the code; we need more Twisted ambassadors to go out into the community and help us integrate these disparate applications, help us find out where real users are finding the documentation inadequate or the organization confusing.

Tornado could be an excellent opportunity for those ambassadors to go out and introduce others to the wonders of Twisted, because its endorsement from FriendFeed guarantees it an audience of tens of thousands of developers, at least for its first few months of life.  If you've shied away from contributing to Twisted itself because of our aggressive testing and documentation requirements, well, Tornado apparently doesn't have any, so it would be a great place for you to start :).