I didn't put the full title of this article into the "subject" because it
was super long.
It Is Pretty Much a Bad Idea to Expose Raw SQL Through Your Database
Access Layer
or
Fun Things I Found Out About Your Company With Administrator Access to
your Database
I was originally inspired to write something about this
when I read
Jonathan Ellis remarking that ORMs should include direct SQL access
because
Django
recently added a different, but still dodgy, 'OR'-operator support to a
syntactically disappointing ORM query syntax. However, it really came to
a boil for me when I found that Divmod had been running some
third-party software with an SQL
injection vulnerability. (Yes, we have since
patched it, no harm done.)
Security experts have long known that code-injection attacks are pretty easy
on many popular programming platforms, and you should take steps to prevent
them. It's easy to find commentary on this. Stephen Thorne has had many
amusing and
insightful things to say about PHP's vulnerability to SQL injection
attacks, as well as the occasional dig about
just injecting PHP code
itself. If you're looking for something more serious, Steve Friedl has
written a fairly comprehensive
guide to understanding,
executing, and preventing against SQL injection.
An ORM's job is to provide an alternative interface to a database.
Interfaces should be complete things, not broken fragments of utility which
require manual crank-turning to function. If you have to use a different
mechanism to access the database within your application, the ORM is
incomplete and should be fixed. Sure, many programmers who use ORMs also
know SQL, and that is a useful skill, because today these are in closely
related problem domains, but they should not have to
use SQL within
the same context that is using the ORM.
The python "os" module provides (among other things) an alternative
interface to a large portion of the POSIX C API. As I said, interfaces
should be complete things. If you use a different mechanism to access the
POSIX C API, the "os" module is incomplete and should be fixed. Again, sure:
many programmers who use Python's "os" module also know the POSIX C API, and
that is a useful skill, because these are related problem domains, but they
should not have to
use the POSIX C API within the same context that
is using the "os" module.
In both cases, you can generate the underlying code yourself, and in both
cases, people sometimes really need to, so the fact that you can is
important. However, Few C programmers ever
want to drop back down
to C when they're using Python, and will rightly avoid it (as a complexity
cost) when they can; yet many Python programmers who use ORMs frequently and
loudly declare that they want to use SQL all the time.
Not to pick on Mr. Ellis. The syntax he's reacting to really is abhorrent
(although that's no comment on Django as a whole), and the tremendous Ruby
on Rails movement seems to largely agree with his point. There are good
reasons that people want SQL access from within ORMs. It's simple: most ORMs
are really, really awful. They are heavy on the "object" and not so much on
the "database". There are a lot of features that SQL provides which they
don't expose.
If you use an open-source ORM and find yourself wanting to use SQL to get at
one of those features, consider not clamoring for (or not using, if one
exists) an SQL execution back-door included in the library. Instead,
consider ways to integrate that SQL feature with the existing structure of
the ORM, or an extension which wraps that SQL feature on top of the ORM and
only generates the SQL you need in one small place. Obviously this goes
double if you are a user of Axiom - I do my best to accept any patches that
expose new database features that were previously obscured.
Don't just accept the status quo and generate SQL strings from within your
application. Originally this post was going to be longer and talk about API
structure and communication between programmers and preconditions and
postconditions and all kinds of fancy computer-science garbage, but I think
it would be better to leave you with just this one thought - the security
implications alone are
more than enough reason to be extremely
sparing, and careful, with the places that your code generates SQL. Isolate
it, test it, audit it, and don't make it a habit.
Update: This article is confusingly titled. In fact, Axiom does
provide an API for getting at SQL.
Store.executeSQL. The point
I am stressing here is that axiom does not "expose" it in that it is not a
supported, public API, and if you have to call it, Axiom is broken and you
should let Divmod know what you needed it for. To attempt to totally deny
access to that layer would be unwise; as I said earlier in the article,
"you can generate the underlying code yourself, and in both cases,
people sometimes really need to, so the fact that you can is
important".