Axiom 0.3.0 - The ORM I Won't Shut Up About, Already

Wednesday November 02, 2005
We released Axiom again today.

It's progressing on an even keel these days.

A few minor issues: some bugs in upgraders were fixed, as were some deployment-order dependency issues with Epsilon. We added some more data integrity checks, and came up with a way to dodge a few more inheritance quirks with Item.

The major upgrade in this release is that queries are objects, rather than simply generators. This shift in API is a subtle step further towards exposing as many efficient SQL operations as possible without exposing any SQL.
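To make the difference between a generator and a query object concrete, here is a minimal sketch using Python's sqlite3 module rather than Axiom itself. The Query class, its count() method and the event table are hypothetical names invented for illustration; they are not Axiom's API.

```python
import sqlite3


class Query:
    """Hypothetical query object: it describes a SELECT instead of running it."""

    def __init__(self, conn, table, where="1", args=()):
        self.conn = conn
        self.table = table      # trusted identifier chosen by the programmer
        self.where = where      # SQL fragment that only uses ? placeholders
        self.args = tuple(args)

    def __iter__(self):
        # Still usable like the old generator: rows are fetched on iteration.
        sql = f"SELECT * FROM {self.table} WHERE {self.where}"
        return iter(self.conn.execute(sql, self.args))

    def count(self):
        # The database does the counting; no rows are loaded into Python.
        sql = f"SELECT COUNT(*) FROM {self.table} WHERE {self.where}"
        return self.conn.execute(sql, self.args).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event (name TEXT, start INTEGER)")
conn.executemany("INSERT INTO event VALUES (?, ?)",
                 [("lunch", 12), ("standup", 9), ("dinner", 19)])

afternoon = Query(conn, "event", "start >= ?", (12,))
```

A generator can only be consumed once and can only hand back rows. An object like this one can be iterated repeatedly and can also be asked questions, such as a count, that map onto a single efficient SQL statement.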

Since the release announcement is already over, you might have guessed - I tricked you. This is hardly a release announcement at all. It's an explanation of one of Axiom's goals: complete encapsulation of SQL in an object-oriented model.

Some people, most notably David Heinemeier Hansson, would say that's a pretty bad goal to have. Oddly enough, when it comes to the people they are generally arguing against, I'd agree with them. The relational model should get some respect. It's where your data actually lives. You shouldn't swaddle it to the point where it's suffocating.

The agreement ends pretty quickly though. I think of SQL in your application like fire on a cold night. It keeps me warm, sure, and I shouldn't douse it with water because I'm afraid of getting burned. I might freeze. That doesn't mean I stick my hand into the open flame.

You might ask, why such an intense metaphor? Isn't SQL just another tool in my programming toolbelt?

Metaprogramming is hard, and dress it up however you like, that's what using SQL is. Your code is generating other code, and evaluating its results.

Because metaprogramming is so hard, it is almost exclusively the province of frameworks, environments and operating systems. For good reason, too. Whenever code generates other code, there are potentially very serious mistakes that can get made. SQL is a hobbled language in most databases, so the damage is restricted. It's only restricted in the sense that it won't cause your server to segfault, though. Your database is probably where 99% of your application's (and possibly your company's) value lives anyway, not your server's heap memory. Mistakes in generating SQL that talks to that database can be very costly.

Getting away from SQL injection attacks is a minor feature of Axiom, so it's easy to forget that SQL injection is a really serious problem - unless you read Bugtraq, that is, in which case you are probably reminded of it somewhere between 2 and 20 times per week. People make this mistake all the time, with disastrous consequences.
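For anyone who hasn't seen the mistake up close, here is a small illustration using Python's sqlite3 module. The account table and the input string are made up for the example; the point is only the difference between pasting input into SQL text and passing it as a bound parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT, secret TEXT)")
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("alice", "a-secret"), ("bob", "b-secret")])

# Input supplied by an attacker instead of an account name.
user_input = "nobody' OR '1'='1"

# Unsafe: the input becomes part of the SQL text and changes what the query means.
unsafe_sql = "SELECT secret FROM account WHERE name = '" + user_input + "'"
leaked = conn.execute(unsafe_sql).fetchall()

# Safer: the input travels as a bound parameter and is only ever compared as data.
safe = conn.execute("SELECT secret FROM account WHERE name = ?",
                    (user_input,)).fetchall()

print(len(leaked), len(safe))
```

The concatenated query returns every secret in the table, while the parameterized one matches nothing. A layer that always generates the SQL for you never gives the first form a chance to exist.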

Having an object model for your SQL also helps you generalize things that might otherwise be overspecific, and test unrelated database features more independently. For example: there is a utility function in Axiom (go read the docstring, it's fun) which finds overlapping ranges of values. The query looks like this:

    OR(
        AND(startAttribute >= startValue,
            startAttribute <= endValue),
        AND(endAttribute >= startValue,
            endAttribute <= endValue),
        AND(startAttribute <= startValue,
            endAttribute >= endValue)
    )

If that had been accomplished with hand-generated SQL rather than objects, I'd have to quote every one of those attributes. I'd have to figure out how to make sure that the table's name was fully qualified whenever the user passed data into this function - if it needed to be. It would be hard to write a test for. Finally, it'd be hard to know how to properly involve it in a join. All the arguments would be strings, so I would have no idea if they were properly formatted or not until the database spit back a syntax error - which might not contain any useful information.

If I were generating SQL by hand though, I doubt it would have occurred to me to write such a function in the first place. I just would have written the SQL code necessary to do this for the calendar_event table and moved on.
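Here is a rough sketch of how comparison objects like these can turn into SQL without any string handling in the calling code. It is not Axiom's implementation: the Column, Compare and Combine classes, the overlapping() signature and the start_hour/end_hour columns are all assumptions made for illustration, and it runs against Python's sqlite3 module.

```python
import sqlite3


class Column:
    """Hypothetical column object: knows its table and quotes its own name."""

    def __init__(self, table, name):
        self.table, self.name = table, name

    def sql(self):
        return f'"{self.table}"."{self.name}"', []

    def __ge__(self, value):
        return Compare(self, ">=", value)

    def __le__(self, value):
        return Compare(self, "<=", value)


class Compare:
    """One comparison; the value is kept aside as a bound parameter."""

    def __init__(self, column, op, value):
        self.column, self.op, self.value = column, op, value

    def sql(self):
        text, args = self.column.sql()
        return f"{text} {self.op} ?", args + [self.value]


class Combine:
    """Several conditions joined by AND or OR."""

    def __init__(self, joiner, *parts):
        self.joiner, self.parts = joiner, parts

    def sql(self):
        texts, args = [], []
        for part in self.parts:
            text, part_args = part.sql()
            texts.append(text)
            args.extend(part_args)
        return "(" + f" {self.joiner} ".join(texts) + ")", args


def AND(*parts):
    return Combine("AND", *parts)


def OR(*parts):
    return Combine("OR", *parts)


def overlapping(start_attr, end_attr, start_value, end_value):
    # Same shape as the query shown above, for any pair of columns.
    return OR(
        AND(start_attr >= start_value, start_attr <= end_value),
        AND(end_attr >= start_value, end_attr <= end_value),
        AND(start_attr <= start_value, end_attr >= end_value),
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calendar_event "
             "(title TEXT, start_hour INTEGER, end_hour INTEGER)")
conn.executemany("INSERT INTO calendar_event VALUES (?, ?, ?)",
                 [("breakfast", 7, 8), ("meeting", 9, 11),
                  ("lunch", 12, 13), ("all-day", 0, 24)])

where, args = overlapping(Column("calendar_event", "start_hour"),
                          Column("calendar_event", "end_hour"), 10, 12).sql()
rows = conn.execute(
    f"SELECT title FROM calendar_event WHERE {where}", args).fetchall()
titles = sorted(title for (title,) in rows)
```

Quoting, table qualification and parameter binding each live in exactly one place, and overlapping() can be tested on its own by inspecting the text and arguments it produces, with or without a database.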

If your database wrapper is going to do caching, queries have to be introspective so that the cache manager knows when a given chunk of SQL might invalidate your cache. Managing caching plus concurrency is hard enough with help from every layer of your system.
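As a toy illustration of why introspection matters for caching, the sketch below keeps cached results keyed by query object and drops only the ones that read from a table that was just written. Select, Update and QueryCache are hypothetical names, not Axiom classes, and a real cache manager would have to handle far more than table names.

```python
class Select:
    """Hypothetical read query that can report which tables it reads."""

    def __init__(self, *tables):
        self.tables = frozenset(tables)


class Update:
    """Hypothetical write operation that can report which table it changes."""

    def __init__(self, table):
        self.table = table


class QueryCache:
    def __init__(self):
        self._results = {}   # Select object -> cached rows

    def store(self, select, rows):
        self._results[select] = rows

    def lookup(self, select):
        return self._results.get(select)

    def note_write(self, update):
        # Drop only the cached results that read from the written table.
        stale = [s for s in self._results if update.table in s.tables]
        for select in stale:
            del self._results[select]


cache = QueryCache()
events = Select("calendar_event")
people = Select("person")
cache.store(events, ["lunch"])
cache.store(people, ["alice"])

# A write to calendar_event invalidates the events query but not the people one.
cache.note_write(Update("calendar_event"))
```

If the write had arrived as an opaque SQL string, the cache would have to either parse it or throw everything away.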

There are also some features which are directly supported by a complete object model for queries.

The feature I'm going to be working on tomorrow, a generic browser for tabular data, is another example of where having as much data access as possible happen through objects is helpful. The tabular data browser, or "TDB" as we call it, takes a query-like object as an argument, so you can page through complex queries. Because that object encapsulates information about both the objects being queried and the database itself, the TDB can display the data appropriately and generate appropriate SQL easily and quickly. Without such an intermediary layer, the TDB would itself be a mess of string concatenations and quasi-generic SQL generation of its own.
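The TDB itself isn't written yet, so the following is only a sketch of the general idea, again on top of Python's sqlite3 module. TableQuery, its page() method, the render() function and the person table are all hypothetical names chosen for the example.

```python
import sqlite3


class TableQuery:
    """Hypothetical query-like object: knows its table, columns and sort order."""

    def __init__(self, conn, table, columns, sort):
        self.conn, self.table = conn, table
        self.columns, self.sort = columns, sort

    def page(self, number, size):
        # Identifiers come from the object; only the paging values are bound.
        cols = ", ".join(f'"{c}"' for c in self.columns)
        sql = (f'SELECT {cols} FROM "{self.table}" '
               f'ORDER BY "{self.sort}" LIMIT ? OFFSET ?')
        return self.conn.execute(sql, (size, number * size)).fetchall()


def render(query, number, size):
    # The browser takes its headings from the query object, not from SQL text.
    lines = [" | ".join(query.columns)]
    for row in query.page(number, size):
        lines.append(" | ".join(str(value) for value in row))
    return "\n".join(lines)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO person VALUES (?, ?)",
                 [("dave", 41), ("alice", 30), ("eve", 29),
                  ("carol", 33), ("bob", 25)])

people = TableQuery(conn, "person", ["name", "age"], sort="name")
second_page = people.page(1, 2)
```

The display code never sees SQL: it asks the query object for column names and for a page of rows, and the object decides what statement that requires.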

Eventually we hope for Axiom to monitor queries, inserts, deletes and updates as they happen to provide a "live" query interface, which takes the same objects that SQL generation would, but provides an active view onto those objects as they appear, change, and disappear. If you're issuing UPDATEs and INSERTs to the database without going through an intermediary layer, there is nowhere to catch this but triggers - and again I find a point of agreement with DHH: the database is not where the smarts belong.
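None of this exists in Axiom yet, so the following is purely a thought experiment about what "going through an intermediary layer" buys you. ObservedStore and LiveQuery are hypothetical, the store is an in-memory dictionary rather than a database, and a plain Python predicate stands in for the query objects described above.

```python
class LiveQuery:
    """Hypothetical live view: tracks the keys of items matching a predicate."""

    def __init__(self, predicate):
        self.predicate = predicate
        self.matches = set()

    def changed(self, key, item):
        # item is None when the key has been deleted.
        if item is not None and self.predicate(item):
            self.matches.add(key)
        else:
            self.matches.discard(key)


class ObservedStore:
    """Hypothetical store where every write passes through one layer."""

    def __init__(self):
        self.items = {}
        self.watchers = []

    def watch(self, predicate):
        live = LiveQuery(predicate)
        for key, item in self.items.items():
            live.changed(key, item)
        self.watchers.append(live)
        return live

    def put(self, key, item):
        # Covers both inserts and updates.
        self.items[key] = item
        self._notify(key, item)

    def delete(self, key):
        self.items.pop(key, None)
        self._notify(key, None)

    def _notify(self, key, item):
        for live in self.watchers:
            live.changed(key, item)


store = ObservedStore()
adults = store.watch(lambda person: person["age"] >= 18)
store.put("alice", {"age": 30})   # appears in the view
store.put("bob", {"age": 12})     # does not match yet
store.put("bob", {"age": 18})     # update makes it appear
store.delete("alice")             # disappears from the view
```

Because every write goes through put() or delete(), the view stays current without the database having to know anything about it.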