Databases and Twisted: When Threads Are OK (For Some Purposes)

Last month, a thread on the twisted-web mailing list got me thinking about a frequently implemented, but seldom understood usage of Twisted: writing applications backed by a traditional database server.  I tried to write a timely reply on the mailing list, but found what I had to say on the topic was overflowing the polite bounds of an email message.  I've tried to write about this before, buried in the middle of a post about something else.  I don't think I really got my message across, though, because I believe this was quoted as saying that I find asynchronous data-access APIs "extremely painful".  Asynchronousness is not the point that I find difficult, as much as I do transactionality (and integration with existing database bindings).

So this time, please bear with me as I explain enough context to properly frame my opinion.

I think that concurrency is a difficult problem that affects every aspect of your code, and so it is important to have a comprehensive, consistent, and easy to understand plan to deal with it in any given system.

Twisted's "cooperatively multitasking / callbacks-scheduled-by-I/O-and-timers" idiom is one concurrency model.  Deferreds are a super important convenience mechanism in that model, but they're not completely necessary; you can do this with just dataReceived, connectionLost, callLater etc.

In general this model - let's call it something memorable, like "CM/CSBIOAT" - is a pretty easy concurrency model to work with once you know how it works.  In particular, it's pretty easy to avoid making a common variety of serious concurrency mistakes, since you don't need to remember to declare any locks, and the behavior of the system under load is unsurprising, if not necessarily ideal for performance.
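As a rough illustration of why this model needs no locks, here is a minimal sketch of a timer-driven callback loop. It is not Twisted: TinyReactor and call_later are hypothetical names loosely modelled on the reactor and callLater, and time is simulated rather than slept through. The point is only that each callback runs to completion before the next begins, so a read-modify-write on shared state cannot be interrupted.

```python
import heapq
import itertools


class TinyReactor:
    """Toy stand-in for a reactor: runs timed callbacks one at a time."""

    def __init__(self):
        self._queue = []
        self._order = itertools.count()  # tie-breaker for equal times
        self.now = 0.0  # simulated clock; nothing actually sleeps

    def call_later(self, delay, func, *args):
        heapq.heappush(
            self._queue, (self.now + delay, next(self._order), func, args)
        )

    def run(self):
        # Each callback runs to completion before the next one starts,
        # so callbacks never observe each other's half-finished work.
        while self._queue:
            self.now, _, func, args = heapq.heappop(self._queue)
            func(*args)


def demo():
    reactor = TinyReactor()
    state = {"count": 0}
    log = []

    def bump(name):
        # Read-modify-write with no lock: nothing else can run
        # between these two lines.
        value = state["count"]
        state["count"] = value + 1
        log.append(name)

    reactor.call_later(0.2, bump, "second")
    reactor.call_later(0.1, bump, "first")
    reactor.call_later(0.3, bump, "third")
    reactor.run()
    return state["count"], log
```

Calling demo() runs the three callbacks in timer order and every increment is counted.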

Threading is another concurrency model with which we are all familiar.  Shared-state multithreading is a pretty bad concurrency model for general use.  In particular, it's very easy to make mistakes that are impossible to diagnose or reproduce.  Despite its unsuitability for applications, threading can be a useful building-block as a low-level tool to construct higher-level concurrency models.  In many practical cases I am aware of, threading is the only available building-block at this level for building efficient implementations of other concurrency models, because operating systems and compilers don't provide anything better.

There is an antipattern that arises from a somewhat naive understanding of these two models.  The Twisted novitiate discovers that Twisted Is Good, and Threads Are Bad.  Experimentally they discover that this is indeed true, and that despite its eccentricities, writing and debugging "Twisted" code (whose benefits really come from the CM/CSBIOAT pattern) is a lot easier than writing and debugging threaded code.

So, our unfortunate Twisted novice now needs to write a database application: what to do?  Well, one way they can write it is to "just use threads" for data-access logic and communicate with Twisted some other how - for example, to put all their database logic in a function they pass to runInteraction.  The only other apparent option is to "use Deferreds" and invoke adbapi.ConnectionPool.runQuery or runOperation.  Deferreds are Twisted - Good!  Threads are ... threads!  Bad!  The answer seems obvious.

However, in choosing to use this facility, you've done far more than choosing between "twisted" and "threads".  If you use runInteraction, you can easily keep all of your work in a single transaction; since database APIs are blocking, you can only safely do a read followed by a write in the same transaction if you can block between those calls.  If you do a runQuery, take the result of that and pass it as input to a runOperation, you're sharing data between two different transactions and potentially two different cursors.  Whether Deferreds are good or not, this breaks the assumptions that the underlying database uses to keep its data consistent.  Consider incrementing a counter.  In the "threaded" case, you'd do something like this:

  def interaction(txn):
      # Runs in a worker thread, inside a single transaction.
      txn.execute("select thingy from foo where bar = baz;")
      x = txn.fetchone()[0]
      txn.execute("update foo set thingy = ? where bar = baz;", (x + 1,))
  cp.runInteraction(interaction)

This always results in foo.thingy being set to foo.thingy + 1.  If your database is set up properly (and most are by default) there's no opportunity for other code to execute between those two statements.
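For readers who want to try this without a database server, here is a minimal sketch of the same read-then-write, using only the standard library. Everything in it is an assumption made for illustration: sqlite3 stands in for the real database, plain threads stand in for the adbapi thread pool, and the helper names (increment_in_transaction, threaded_demo) are made up. "BEGIN IMMEDIATE" is SQLite's way of taking the write lock before the read; other databases get the same effect through their own isolation settings or "select ... for update".

```python
import os
import sqlite3
import tempfile
import threading


def increment_in_transaction(path):
    # isolation_level=None lets us issue BEGIN/COMMIT ourselves.
    conn = sqlite3.connect(path, timeout=30, isolation_level=None)
    try:
        # BEGIN IMMEDIATE takes SQLite's write lock before the read, so no
        # other writer can slip in between the select and the update.
        conn.execute("begin immediate")
        (x,) = conn.execute(
            "select thingy from foo where bar = 'baz'"
        ).fetchone()
        conn.execute("update foo set thingy = ? where bar = 'baz'", (x + 1,))
        conn.execute("commit")
    finally:
        conn.close()


def threaded_demo(workers=4, per_worker=10):
    fd, path = tempfile.mkstemp(suffix=".sqlite")
    os.close(fd)
    try:
        setup = sqlite3.connect(path)
        setup.execute("create table foo (bar text primary key, thingy integer)")
        setup.execute("insert into foo values ('baz', 0)")
        setup.commit()
        setup.close()

        def work():
            for _ in range(per_worker):
                increment_in_transaction(path)

        threads = [threading.Thread(target=work) for _ in range(workers)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

        check = sqlite3.connect(path)
        (total,) = check.execute(
            "select thingy from foo where bar = 'baz'"
        ).fetchone()
        check.close()
        return total
    finally:
        os.remove(path)
```

With the defaults, four threads each perform ten increments, and because each increment is one blocking transaction, none of them is lost.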

But in the "twisted" case you do something like this:

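  from twisted.internet.defer import inlineCallbacks

  @inlineCallbacks
  def stuff(cp):
      # runQuery fires with a list of rows, so unpack the single value.
      rows = yield cp.runQuery("select thingy from foo where bar = baz;")
      yield cp.runOperation("update foo set thingy = ? where bar = baz;", (rows[0][0] + 1,))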

As syntactically pleasant as that appears to be, and as convenient as it might seem to be able to call Twisted APIs as much as you want in the middle of this work, any amount of code can run between the first line and the second, thanks to those pesky 'yield's.  That means if you run 'stuff' twice, there's a good chance that your callbacks will stomp on each other and one of the increments will be lost.
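The lost increment can be reproduced deterministically without Twisted at all. The sketch below is a stand-in under stated assumptions, not the real machinery: plain generators play the role of inlineCallbacks, a hypothetical run_interleaved scheduler plays the reactor, and an in-memory sqlite3 connection that commits after every statement plays the connection pool.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("create table foo (bar text primary key, thingy integer)")
    conn.execute("insert into foo values ('baz', 0)")
    conn.commit()
    return conn


def increment(conn):
    # Each statement commits on its own, like separate
    # runQuery / runOperation calls.
    rows = conn.execute("select thingy from foo where bar = 'baz'").fetchall()
    conn.commit()
    yield  # stands in for 'yield cp.runQuery(...)': other code may run here
    conn.execute(
        "update foo set thingy = ? where bar = 'baz'", (rows[0][0] + 1,)
    )
    conn.commit()
    yield  # stands in for 'yield cp.runOperation(...)'


def run_interleaved(tasks):
    # Round-robin scheduler: resume each generator in turn until all finish.
    pending = list(tasks)
    while pending:
        for task in list(pending):
            try:
                next(task)
            except StopIteration:
                pending.remove(task)


def lost_update_demo():
    conn = make_db()
    run_interleaved([increment(conn), increment(conn)])
    return conn.execute(
        "select thingy from foo where bar = 'baz'"
    ).fetchone()[0]
```

Both generators read 0 before either writes, so both write 1: two increments ran, but the counter only moved once.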

Transactional relational database access is a really different concurrency model, all its own.  In many cases it appears to be the same as plain old shared-state multithreading, not least because it is implemented using threads, and the threads are completely exposed to application code and made part of the database interface's API.  However, using a transactional database to store your interesting state is much, much safer than just using threads to access any old datastructure.  An ACID database is specifically implemented to provide a consistent view of your data to any executing client at any time, and in the cases where that would be impossible, to schedule execution of various clients to provide an ordering where data is consistent.  (You'll notice that I have avoided saying "thread", but in practice an executing SQL client is a thread in your application.)  Caching middleware confuses this issue, making it more like regular multithreading; but in a good SQL database, using threads rather than just separate processes is just an optimization; one which should be completely transparent to your application code.

Axiom doesn't really have a concurrency model (it ought to, but that's a discussion for another day).  The idea there is that, like the rest of Twisted, you try hard never to block for too long.  It is possible — too easy, really — to screw this up and block for a long time waiting for the disk in an Axiom program, but to some extent that's true of any Python code.  Since Axiom is typically accessed by one, or at most two processes at a time, you won't end up blocking on your database for a long time because some other code is using it; the main thing Twisted's concurrency model is designed to prevent is your code blocking and getting stuck or being idle, not your code blocking at all.  So, I'm going to give you advice for using Storm or ADBAPI: the only advice for Axiom is "write fast queries".

Assuming that you're writing a traditional database application, here's my advice for you.

Let's assume that Storm (or ADBAPI) does not have any thread-safety bugs itself.  This assumption is unlikely to be completely true, but you have to make it anyway if you're going to use either of these things at all, whatever my advice :).  With that assumption, you can use Storm (or ADBAPI) with Twisted from a thread-pool and pretend, in your application, that the threads do not exist.  You should avoid accessing global state and pretend that your code might be run in a subprocess or a thread or even on a different computer.  If you're lucky, one day it will be, and your application will "magically" scale!  If you follow this simple discipline, you can cleanly interface between the Twisted concurrency model (where you do all of your non-database I/O) and the RDBMS concurrency model (where you interact with all of your "data" objects).

Don't touch any database objects in your Twisted mainloop.  Don't touch any Twisted objects in your database transactions.  This has the added benefit of not needing to worry that you're sending out information about partially-completed database operations to a network connection, or injecting potentially transient network state into a persistent database operation that may need to be re-tried.
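One way to picture that boundary is the sketch below. It is only an illustration under assumptions: concurrent.futures.ThreadPoolExecutor stands in for Twisted's thread pool, sqlite3 stands in for the real database, and the users table and rename_user function are hypothetical. The database function takes plain values, touches nothing global, and hands back plain values only after its transaction has committed, so the event-loop side never sees a cursor, a connection, or a half-finished write.

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor


def rename_user(db_path, user_id, new_name):
    """Runs in a worker thread. Takes plain values, returns plain values."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commit on success, roll back on exception
            conn.execute(
                "update users set name = ? where id = ?", (new_name, user_id)
            )
            row = conn.execute(
                "select id, name from users where id = ?", (user_id,)
            ).fetchone()
        # Plain data only: no cursor or connection escapes this function.
        return {"id": row[0], "name": row[1]}
    finally:
        conn.close()


def demo():
    fd, db_path = tempfile.mkstemp(suffix=".sqlite")
    os.close(fd)
    try:
        setup = sqlite3.connect(db_path)
        setup.execute("create table users (id integer primary key, name text)")
        setup.execute("insert into users values (1, 'alice')")
        setup.commit()
        setup.close()
        with ThreadPoolExecutor(max_workers=2) as pool:
            # The "main loop" side only sees the finished, committed result.
            future = pool.submit(rename_user, db_path, 1, "bob")
            return future.result()
    finally:
        os.remove(db_path)
```

Because rename_user depends only on its arguments, the same function could be retried after a transient failure, or moved to a subprocess, without the calling side changing.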

In theory, there's nothing stopping an asynchronous data-access API from doing all of the same stuff that I just described threads doing.  All you'd need is good non-blocking database infrastructure, non-blocking transactions, and a bunch of code to associate a running transaction object with a particular database transaction and cursor.  It is possible, if you go down to the database-protocol level, to write a database wrapper which actually integrates with the Twisted concurrency model and treats your database as just another source of input and output.  In terms of preventing errors and helping to make code testable and deterministic, I think this would be an improvement over the threaded version of this solution.

However, implementing such an improvement would likely take quite a bit of time.  Time that most small database-backed projects don't have, so it's unlikely someone will need to scratch this particular itch any time soon.  Even if someone did do all that work for one database, it's likely that a lot of it would need to be done over again for each subsequent set of database bindings; so, using a DB-API module in a thread would remain the only way to retain database portability.

For the moment, threads and threadpools are the tools that existing database bindings give us to manage transactions, and it's likely that they're adequate for a huge majority of applications.  The only real problem is that you can't completely hide threads from the application and make sure they're not being used for evil.

Search History: L

I'm a bit late to the party, but I just found an old screenshot of my search history beginning with "L", from my laptop.



An Underserved Market

I play video games.  Also, I'm married.  Ying also plays video games.  More than I do, even.  Where are the games — besides WoW — that we can play together?

I know a couple of other guys who like to play games with their significant others.  I really feel like the gaming generation has grown up at this point, but where are the grown-up games?

My favorite kind of video games are immersive, story-driven games with open worlds and a lot of flexibility.  I am really digging The Witcher: Fewer Bugs Edition right now, but despite its "mature" and "philosophical" themes, it feels like a game written for a "mature" and "philosophical" adolescent male misfit rather than the usual vanilla adolescent male misfit.  That's not really a black mark against this specific title - it in particular seems to pull off the stereotypical fantasy tropes very well.

While the independent gaming scene is a lot better in terms of raw originality, I haven't seen anything I can recall on TIGSource where I thought "That would be great to play with Ying."

This is mostly a rhetorical question (get to making those games, game-makers who are reading this!) but I would also be very happy to be proven wrong.  If you leave a comment with a game we end up enjoying I'll definitely blog about this again.

Installing Software on Linux Doesn't Need to be Terrible: A Photo Essay

Installing third-party software, especially end-user GUI software, on Linux is frequently a disaster.  This is frequently taken (especially by pundits) as some kind of inherent limitation of the platform, an indictment of its design and core principles, but it isn't.  In fact, the platform has had solutions to this problem for a long time, but it seems like the people who need the solutions the most, i.e. the people packaging commercial software, don't know about them.

I love to complain about this problem.  I'll take this opportunity to do so, as a matter of fact.  Installing the vmware-server management console is a typical commercial-software-on-linux experience: you download an archive, start up a terminal, run a shar executable (as root!) to install their crappy package which doesn't fucking work, you play with obscure tools that a real "end user" is never going to figure out, like ldd and ltrace, and eventually the damn thing starts up.  (Their solution to this problem?  A web-based management console which is uglier, less usable, and still doesn't fucking work.)

I love vmware - well, vmware-server 1.0, at least.  I really like the fact that the management console uses GTK, fits right into my desktop, and seems to behave like an actual application on linux.  (If the gross web UI is some kind of trick to get me to buy Workstation instead of continuing to use Server... well, it might work.)  The packaging of the software, though, is a masterful example of snatching defeat from the jaws of victory.  It's bad enough that every time I have to set up a new vmware installation I spend a few hours surveying the virtualization options on linux to see if anything might ease the pain.

The point of this little write-up isn't to stare unblinking into the gut-wrenching horror of a vmware install on Ubuntu, though.  The various ubuntu forums do that well enough.  Thank goodness for that, too, or I probably wouldn't have figured out how to use it.  No, my purpose here is to show what happens when you do it right.  Installing software on Ubuntu can be as nice as on a Mac.

I've been telling people for a long time to make Debian packages of their software, when they release builds for Linux, but I don't know if I've really communicated why having a package is better than anything else.  Thankfully, Ubuntu has now set up everything so that there is a really obvious reason why you should build a package: it's about a million times more user-friendly than doing the opposite.

Inform 7 has managed to provide an excellent example of how an installation on Linux should go, for an end user.  It just so happens that Inform 7 is not open source.  I installed the whole thing using only the mouse, except for typing my administrator password.  The story begins on their Download page.

http://www.twistedmatrix.com/users/glyph/images/content/blogowebs/i7essay/screenshot2.png
Notice how this page is clear about which thing I should click on, even if I can't read.  I just need to identify the little penguin, and the little three-hugging-people logo.  It would be pretty weird for an illiterate person to install Inform, but nevertheless, it saves us from reading lots of extra stuff.

So I click on the download link, and Firefox prompts me to decide what to do with this.
http://www.twistedmatrix.com/users/glyph/images/content/blogowebs/i7essay/screenshot3.png
In fact I would like the package installer!  I click OK (after waiting for it to activate, since firefox has helpfully prevented me from clicking that by accident...).  Now I wait for it to download...
http://www.twistedmatrix.com/users/glyph/images/content/blogowebs/i7essay/screenshot5.png
Done!  I don't even need to do anything before the Package Installer window helpfully pops up to help me install this package:
http://www.twistedmatrix.com/users/glyph/images/content/blogowebs/i7essay/screenshot6.png
Now, I click "Install Package", and my keyboard gets involved in this process for the first time:
http://www.twistedmatrix.com/users/glyph/images/content/blogowebs/i7essay/Screenshot.png
I'm pretty sure I do in fact want to install Inform 7, so I type it in, and the package installer does its magic...
http://www.twistedmatrix.com/users/glyph/images/content/blogowebs/i7essay/screenshot7.png
http://www.twistedmatrix.com/users/glyph/images/content/blogowebs/i7essay/screenshot8.png
Yay!  It's done!  I just need to look in my Applications menu to find it.
http://www.twistedmatrix.com/users/glyph/images/content/blogowebs/i7essay/Screenshot-2.png

And there you have it:

http://www.twistedmatrix.com/users/glyph/images/content/blogowebs/i7essay/screenshot9.png

This is what I want installing third-party and proprietary GUI software to be like on Linux.

Please do not use a shar archive.  I don't want to have to tell nautilus that yes I actually want to run this file which may be an executable text file, and I definitely don't want its EULA prompt on the console to be lost.

Please do not use a zip file.  I don't want to have to drag files out of archive manager or open a terminal.

I just want to click a button and have the whole thing work.  The work required to make this happen is seriously not that substantial.  You can even pay me to do it, but chances are if you've already built an application for linux, you have enough experienced people on staff to do this yourself.

Notice that with very little additional work you could also provide a similar experience on Fedora; I haven't done it myself, but I have to assume that clicking on the little blue icon on a fedora machine would be a better experience than a bunch of shell commands.

Numbers That Go Up

A few years ago I was talking to Ying about my aspirations to one day develop my own game (Iä! Divunal! May its slumber soon be ended!).  I was telling her about its design — naïve student of interactive fiction that I was, I had decided that there would be two salient features: permanent death, and "no numbers anywhere".  Everything would be relayed to the player by way of descriptive phrases, because that was, like, more real, man.

Never one to pull any punches on my account, Ying told me she didn't think it would be any fun.  I asked her why she thought this game (which, if I could pull it off, would be artful, like reading a novel that was written for you every day) would not be fun, whereas a grindy stat-monster like DragonRealms was fun.  She said:
"People like numbers that go up."
This is a phrase which I have now both said and heard in countless conversations with professional game designers.  At the time it struck me as insightful, but I didn't realize how insightful it was for many years.

Now, I am obsessed with the power of numbers that go up.  It's not just a trick for game designers.  It's a basic part of the human condition.  A power so great it can be used only for good, or evil.  A tool for positive social change.  A force which, every day, keeps little babies from dying.

You think that last bit was hyperbole, right?  Wrong.  Doctor Virginia Apgar developed the Apgar score, which is a way of rating how healthy a newborn baby is.  The development of the score itself, not any particular technique for improving the score, was responsible for drastically reducing infant mortality.  (I believe one of the largest drops in recorded history, but I can't find an online citation for that.)

Lest you think this had something to do with the culture around video games, she did this in 1952.

Many people have observed that there is also a dark side to this phenomenon.  In 1987, Alfie Kohn famously wrote an article that is now distributed with every copy of emacs which notes that giving people incentives gets them to focus on the incentive rather than the task; and enjoy the incentive more than the task.  In 2000, Joel Spolsky wrote "Incentive Pay Considered Harmful", which details the numerous problems with HR reviews and employee incentive pay.  Just this month, he expounded again in "How Hard Could It Be?", noting that when you pay people to optimize something, they will optimize it, whether that helps you or not.

I don't believe there's a contradiction here.  What these studies are observing is that, if you crudely design and crudely present an incentive, it will have crude effects.  There's an art and a science to designing incentives, and the people who write employee incentive plans (and incentive impact studies) are not really interested in using incentives in the way that makes them effective: to make activities more fun.  The only people who really get this right are game designers.  Unfortunately, the insight that game designers have is rarely shared with other disciplines.  Game studios make even software startups look tame by comparison, leaving their employees little time for professional development, or, you know, sleeping.  When their ideas are shared, they are frequently, almost implicitly dismissed; after all, it's "just a game".  The "serious" folks want measurable objectives, clearer research, not "fun".

But tell a "serious" cognitive scientist or a "businesslike" incentive plan designer to produce a scheme which will cause the user's brain to release large amounts of dopamine on demand, and they're not likely to deliver anything useful.

I certainly don't have as much experience with this as I'd like, but I've made plenty of observations.  My personal hypothesis is that the key factor here is subtlety.  The Apgar score is an arbitrary number.  It means nothing beyond what it means.  Your progress or health in a game has no meaning beyond the game.  It's an obvious yardstick by which you can judge your progress, but it doesn't really matter.

Stack Overflow is, I think, a great example of this type of subtlety.  Your reputation is arbitrary, and there are lots of arbitrary little landmarks you can achieve.  "Badges", originally from City of Heroes, and also known as "achievements" on Steam, are a great way to motivate users to stick around just a while longer.  "Oh, I'm done for the day, but I'm only 30 votes from civic duty.  Let me vote on a few more things."  Because the motivation is there, you stick around; but because it's subtle, it's not worth aggressively gaming the system (and thereby wrecking it).

You can see the flip side of that pretty quickly on similar sites that try to motivate participation based on money.  I can't tell how good Experts Exchange is, because it throws up roadblocks, to protect their precious "content" and make sure they can make money on it.  Those who I know who have tried it assure me that it is full of spam and fraud, largely because the incentive structure is all based around money.  The "score" doesn't represent progress or mark status, it is progress and it is status.  The point of Experts Exchange is to get money, and the questions are just there to provide you with a mechanism to do so.  The point of Stack Overflow is to provide good answers to questions, and the reputation score is just there to get a rough idea of who does that best.

I think that this principle can be applied in lots of places in everyday life.  To give you a hint of where I hope to apply it at Divmod, in Blendix: consider the feeling of getting a point in a game, and the feeling of checking off an item on a to-do list.  Compare beating a level to seeing a page full of "done" items.  Just imagine that page full of ticked-off checkmarks.  Makes you want to write a to-do list, doesn't it?