This Word, "Scaling"

You keep using that word.  I do not think it means what you think it means.
        — Inigo Montoya
It seems that everyone on the blogosphere, including Divmod, is talking about "scaling" these days.  I'd like to talk a bit about what we mean when we talk about "scaling" (and by "we" I mean both the Twisted community and Divmod, Inc.).

First, some background.

Google Versus Rails

Everyone knows that Scaling is a Good Thing.  It's bad that Rails "doesn't scale" — see Twitter.  It's good that the Google App Engine scales — see... well, Google.  These facts are practically received wisdom in the recent web 2.0 interblag.  The common definition of "scaling" which applies to these systems is the "ability to handle growing amounts of work in a graceful manner".

And yet (for all that I'd like to rag on Twitter), Twitter serves hojillions of users umptillions of bytes every month, and (despite significant growing pains) continues to grow.  So in what sense does it "not scale"?  While that's going on, Google App Engine has some pretty draconian restrictions on how much an application can actually do.  So it remains to be seen whether GAE will actually scale, and right now you're not even allowed to scale it.  Why, exactly, do we say that one system "scales" and the other doesn't, when the actual data available now says pretty much the opposite?

A GAE application may not scale today, but when Our Benefactors over at the big "G" see fit to turn on the juice, you won't have to re-write a single line of your code.  It will all magically scale out to their demonstrably amazing pile of computers, assuming you haven't done anything particularly dumb in your own code.  All you have to do is throw money at the problem.  Well, actually, you throw the money at Google and they will take the problem away for you, and you will never see it again.  GAE accomplishes this by providing you with an API for accessing your data, and forbidding most things that would cause your application to start depending on local state.  These restrictions are surprisingly strict if you are trying to write an application that does things other than display web pages and store data, but that functionality does cover a huge number of applications.

Rails, on the other hand, does not provide facilities for scaling.  For one thing, it doesn't provide you with a concurrency model.  Rails itself is not thread safe, nor does it allow any multiplexing of input and output, so you can't share state between multiple HTTP connections.  Yet, Rails encourages you to use "normal" Ruby data structures, not inter-process-communication-friendly data structures, to enforce model constraints and do other interesting things.  It's easy to add logic to your Rails application which is not amenable to splitting over multiple processes, and it's hard to make sure you haven't accidentally done so.  When you use the only concurrency model it really supports, i.e. locking and using transactions via the database, Rails strongly encourages you to consider your database connection global, so "sharding" your database requires significant reconsiderations of your application logic.

These technical details are interesting, but they all point to the same thing.  The key difference between Rails and GAE is the small matter of writing code.  If you write an application with Rails, you probably have to write a whole bunch of new code, or at least change around all of your old code, in order to get it to run on multiple computers.  With GAE, the code you start with is the code you scale with.

Economics of Scale

The key feature of "scalability" that most people care about is actually the ability of a system to efficiently convert money to increased capacity.  Nobody expects you to be able to run a networked system for a hundred million users on a desktop PC.  However, a lot of business people — especially investors — will expect you to be able to run a system for a hundred million users on a data-center with ten million cores in it.  Especially if they've just bought one for you.

Coding is an activity that is notoriously inefficient at converting money into other things.  It's difficult to predict.  It's slow.  But most unnervingly to people with money to invest, pouring money on a problematic software project is like pouring water on an oil fire: adding more manpower to a late software project makes it later.  If you have a hard software problem, you want to identify it early and add the manpower as soon as possible, because you won't be able to speed things along later if you start running into trouble.

So, the thing that pundits and entrepreneurs alike are thinking about when they start talking about "scalability" is eliminating this extra risky phase of programming.  Investors (and entrepreneurs) don't mind investing some money in a "scaling solution", but they don't want to do it when they are in the hockey-stick part of the growth curve, making first impressions with their largest number of customers, and having system failures.  So we're all talking about what hot new piece of technology will solve this problem.

At a coarse granularity, this is a useful framing of the issue.  Technology investment and third-party tools really can help with scaling.  Google and Amazon obviously know what they're doing when it comes to world-spanning scale, and if they're building tools for developers, those tools are going to help.

As you start breaking it down into details, though, problems emerge.  Front and center is the problem that scalability is actually a property of a system, not an individual layer of that system, infrastructure or no.  Even with the best, sexiest, most automatic scaling layer, you can easily write code that just doesn't scale.  As a soon-to-be purveyor of "scalability solutions" myself, I find this a scary thought: it's easy to imagine a horror story where a tiny but hard-to-discover error in code written on top of our infrastructure makes it difficult to scale up.

That error need not be in the application code.  The scaling infrastructure itself could have some small issue which causes problems at higher scales.  After all, you can do extensive testing, code review, profiling and load analysis and still miss something that comes up only under extremely high load.

Does Twisted Scale?

Just about any answer to this question that you can imagine is valid, so I'll go through them all and explain what they might mean.

No.

Applications written using Twisted can very easily share lots of state, require local configuration, and do all kinds of things which make them unfriendly to distribution over multiple nodes.  Since there is no 'canonical' Twisted application (in fact, you might say that the usual Twisted application is simply an application unusual enough to be unsuited to a more traditional LAMP-type infrastructure), there's no particular documented model for writing a Twisted application that scales up smoothly.  None of the included services do anything special to provide automatic scaling.  There are no state-management abstractions inside Twisted.  If you talk to a database in a Twisted application, the normal way to do it is to use a normal DB-API connection.

When I discussed Rails above, I said that the reason it doesn't scale is that it's too easy, by default, to write applications that don't scale.  Therefore we must conclude that Twisted doesn't scale.

Yes.

Twisted is mainly an I/O library, and it uses abstract interfaces to define application code's interface with sockets and timers.  Twisted itself includes several different implementations of different strategies for multiplexing between these sockets and timers, including several which are platform-specific (kqueue, IOCP), squeezing the maximum scale out of your deployment platform, even if it changes.

I said above that infrastructure is scalable if it lets you increase your scale without changing your code.  It would make sense to say that Twisted scales because it allows you to increase the number of connections that you're handling by changing your reactor without changing your code.
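To make the "change the multiplexer, keep the code" idea concrete, here is a minimal sketch. It deliberately uses the standard library's `selectors` module as a stand-in, not Twisted's reactor API; in Twisted the analogous step would be installing a different reactor. The names `echo_once` and `on_readable` are made up for this illustration.

```python
import selectors
import socket


def echo_once(selector_class):
    """Run one echo round-trip using the given multiplexing strategy."""
    sel = selector_class()
    client, server = socket.socketpair()
    server.setblocking(False)

    def on_readable(conn):
        # Application logic: it never mentions select, epoll, or kqueue.
        data = conn.recv(1024)
        conn.sendall(data.upper())

    sel.register(server, selectors.EVENT_READ, on_readable)
    client.sendall(b"hello")
    # A single turn of the event loop: dispatch whatever is ready.
    for key, _events in sel.select(timeout=1):
        key.data(key.fileobj)
    reply = client.recv(1024)
    sel.unregister(server)
    sel.close()
    client.close()
    server.close()
    return reply


# Same application code, two different multiplexing back ends.
print(echo_once(selectors.SelectSelector))
print(echo_once(selectors.DefaultSelector))
```

`DefaultSelector` picks the most efficient mechanism the platform offers, so the second call may use epoll or kqueue while `on_readable` stays untouched.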

You could also say that Twisted is scalable because it is an I/O library, and communication between different nodes is almost the definition of scale these days.  Not only can you write scalable systems easily using Twisted's facilities, you can use Twisted as a tool to make other systems scale, as part of a bespoke caching daemon or database proxy.  Several Twisted users use it this way.

Maybe.

Being mostly an I/O library, Twisted itself is rarely the component most in need of optimization.  Being mostly an implementation of mechanisms rather than policies, Twisted gives you what you need to achieve scale but doesn't force, or even encourage, you to use it that way.

For the most part, it's not really interesting to talk about whether Twisted scales or not.  The field of possibilities of what you can do with Twisted is too wide open to allow that sort of classification.

What about Divmod? Does Mantissa scale?

Mantissa, lest you have not heard of it already, is the application server that we are developing at Divmod.  Mantissa is based on Twisted, among other components.  However, the answer to the "scaling" question means something quite different for Mantissa than it does for Twisted.

Twisted is very general and can be used in almost any type of application, from embedded devices to web services to thick clients to system management consoles.  It's almost as general as Python itself — with the notable exception that you can't use Twisted on Google App Engine because they don't allow sockets.  As part of being general, Twisted doesn't dictate much about the structure of your application, except that it use an event loop.  You can manage persistent state however you want, deal with configuration however you want.

Mantissa, on the other hand, is only for one type of application: multi-user, server-side applications, with web interfaces.  You might be able to apply it to something else but you would be fighting it every step of the way.  (Although if you wanted to use Mantissa's components for other types of applications, the more general parts decompose neatly into Nevow and Axiom.)  So the question of "does it scale" is a bit more interesting, since we can talk about a specific type of application rather than a near-infinite expanse of possibilities.  Does Mantissa scale to large numbers of users for these types of "web 2.0" applications?

Unfortunately, the fact that the question is simpler doesn't make the answer that much simpler, so here it is:

Almost...

Mantissa has a few key ingredients that you need to build a system that scales out. The biggest one is a partitioned data-model.  Each user has their own database, where their data is stored.

A very common "web 2.0" scaling plan — perhaps the most common — is to have an increasing number of web servers, all pointed at a single giant database with an increasingly ridiculous configuration — gigabytes of RAM, terabytes of disk, fronted by a bank of caching servers.  This works for a while.  For many sites, it's actually sufficient.  But it has a few problems.

For one thing, it has a single point of failure.  If your database server goes down, your service goes down.  Your database server isn't a lightweight "glue" component, either, so it's not a single point of failure you can quickly recover if it goes down.  Even worse, it means that even in the good scenario, where you can scale to capacity, your downtime is increased.  Each time you upgrade the database, the whole site goes down.  This problem gets compounded because a lot of sites are append-only databases with increasingly large volumes of data to migrate for each upgrade.

Another issue is that it increases load on your administrators, because they are responsible for an increasingly finicky and stressed database server.  This may actually be a good thing — administrators are not programmers, after all, and are therefore a more reliable and easier resource to throw money at.  Unfortunately there are (almost by definition) fewer things that admins can do to improve the system.  Because the admins can't actually solve the root problems that make their lives difficult, it's easier for them to get frustrated and leave for an environment where they won't be so stressed.

The reason websites choose this scaling model is that popular frameworks, or even non-frameworks like "let's just do it in PHP", make it easy to just use a single database, and to write all the application logic to depend on that single database as the point of communication between system components.  So the scaling plan is just working with the code that was written before anybody thought about scaling.

If you write an application with Mantissa today, it's easiest to toss the data into different databases depending on who it is for, so when you get to dealing with the "scaling" part of the problem, you can put those databases onto different computers, and avoid the single point of failure.  Moreover, when you write an application with Mantissa, you get "global" services like signup and login as part of the application server, so your application code can avoid the usual schema pitfalls (the "users" table, for example) which require a site to have a single large database.
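A rough sketch of what "one database per user" can look like, using plain `sqlite3` from the standard library. This is not Axiom's or Mantissa's API; the `UserStores` class, the `note` table and the file naming are assumptions made purely for illustration.

```python
import os
import sqlite3
import tempfile


class UserStores:
    """Hypothetical helper: one SQLite file per user under a root directory."""

    def __init__(self, root):
        self.root = root

    def open(self, username):
        # Each user's data lives in its own file, so a file can later be
        # moved to another machine without touching anyone else's data.
        # (Assumes usernames are already validated and safe as file names.)
        path = os.path.join(self.root, username + ".sqlite")
        conn = sqlite3.connect(path)
        conn.execute("CREATE TABLE IF NOT EXISTS note (body TEXT)")
        return conn

    def add_note(self, username, body):
        conn = self.open(username)
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO note (body) VALUES (?)", (body,))
        conn.close()

    def notes(self, username):
        conn = self.open(username)
        rows = [row[0] for row in conn.execute("SELECT body FROM note")]
        conn.close()
        return rows


root = tempfile.mkdtemp()
stores = UserStores(root)
stores.add_note("alice", "buy milk")
stores.add_note("bob", "ship release")
print(stores.notes("alice"))
print(stores.notes("bob"))
```

Because no query ever spans two users, nothing in the application code assumes that Alice's and Bob's files sit on the same machine.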

There's only one problem with that plan.

... but not quite.

In my humble opinion, Mantissa offers some interesting ideas, but there are a few reasons you won't get scaling "for free" with Mantissa if you use it right now, today.

You may be noticing about now that I didn't mention any way to communicate between those partitioned chunks of data.  This is what I've been spending most of my last few weeks on.  I have been working on an implementation of an "eventually consistent" message-passing API for transferring messages between user databases in a Mantissa server.  You can see the progress of this work on the Divmod tracker, where the ticket is nearing the end of its year-long journey, and already in its (hopefully) final review.
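As a very loose illustration of the "eventually consistent" idea (and emphatically not the Mantissa API under review), the sketch below queues each message in the sender's own store, retries until the receiver acknowledges it, and de-duplicates on the receiving side. `Store`, `deliver` and the `network_up` flag are hypothetical names invented for this example.

```python
import itertools


class Store:
    """Hypothetical stand-in for one user's database."""

    _ids = itertools.count(1)

    def __init__(self, name):
        self.name = name
        self.outbox = []   # messages not yet acknowledged
        self.seen = set()  # ids already processed, for de-duplication
        self.inbox = []

    def send(self, target, body):
        # Queue locally first; delivery happens later and may be retried.
        self.outbox.append((next(Store._ids), target, body))

    def receive(self, message_id, body):
        # Delivery is at-least-once, so receiving must be idempotent.
        if message_id not in self.seen:
            self.seen.add(message_id)
            self.inbox.append(body)
        return True  # acknowledgement


def deliver(store, directory, network_up=True):
    """Try to flush the outbox; unacknowledged messages stay queued."""
    remaining = []
    for message_id, target, body in store.outbox:
        acked = network_up and directory[target].receive(message_id, body)
        if not acked:
            remaining.append((message_id, target, body))
    store.outbox = remaining


alice, bob = Store("alice"), Store("bob")
directory = {"alice": alice, "bob": bob}
alice.send("bob", "hello")
deliver(alice, directory, network_up=False)  # fails; the message is kept
deliver(alice, directory)                    # the retry succeeds
print(bob.inbox, alice.outbox)
```

The two databases disagree for a while (Alice has sent, Bob has not received), but they converge once delivery succeeds, which is all "eventual" promises.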

I'm particularly excited about this feature, because it completes the Mantissa programming model to the point where you can really use it.  It's the part of the system that most directly impacts your own code, and thereby allows you to more completely dodge the bullet of modifying a bunch of your application's logic when you want to scale.  There might be some dark corners — for example, a scalable API for interacting with the authentication system — but those should only affect a small portion of a small number of applications.  Unfortunately communication between databases is not the only issue we have remaining.

There's more to the scaling problem than getting the application code to be the right shape.  The infrastructure itself needs to present a container that does the heavy lifting of scalability for the code that it contains.  For example, Mantissa needs a name server and a load balancer that will direct requests to the appropriate server for the given chunk of data.  It also needs a sign-up and account management interface that will make an informed decision about where to locate a new user's data, and be able to transparently migrate users between servers if load patterns change.  There are also enhanced features, like replicating read-only data to multiple hosts, for applications (for example, a blogging system) which have heavy concentrations of readers on small portions of data.
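The pieces described above (placement at sign-up, routing per request, migration) can be sketched as a tiny in-memory directory. Everything here is hypothetical: the `Directory` class, the host names and the "least-loaded host" policy are assumptions for illustration, not a description of anything Mantissa ships.

```python
class Directory:
    """Hypothetical name service: maps each user to the host with their data."""

    def __init__(self, hosts):
        self.load = {host: 0 for host in hosts}  # users per host
        self.location = {}                       # username -> host

    def sign_up(self, username):
        # Placement policy (an assumption): pick the least-loaded host.
        host = min(self.load, key=self.load.get)
        self.location[username] = host
        self.load[host] += 1
        return host

    def route(self, username):
        # What a load balancer would consult for each request.
        return self.location[username]

    def migrate(self, username, new_host):
        # A real system would also copy the user's database first.
        old_host = self.location[username]
        self.load[old_host] -= 1
        self.load[new_host] += 1
        self.location[username] = new_host


directory = Directory(["db1", "db2"])
for name in ["alice", "bob", "carol"]:
    directory.sign_up(name)
directory.migrate("alice", "db2")
print(directory.route("alice"), directory.load)
```

Application code only ever asks "where does this user live?", so moving a user is a change to the directory rather than to the application.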

Finally there are problems of optimization.  We haven't had much time to optimize Mantissa or Athena, and already on small-scale systems we have seen performance issues, especially given the large number of requests that an Athena page can generate.  We need to make some time to implement the optimizations we know we need, and when we start scaling up our first really big system, I'm sure that we'll discover other areas that need tweaking.

Why Now?

I'm fond of saying that programming is like frisbee, and predictions more specific than "hey, watch this!" are dangerous.  So you might wonder why I'm talking about such a long-running future plan in such detail.  You might be wondering why I would think that you'd be interested in something that isn't finished yet.  Perhaps you think it's odd that I've described the challenges in such detail rather than being more positive about how awesome it is.

While I certainly don't want to publicly commit to a time-frame for any of this work to be finished, I do feel pretty comfortable saying that it's going to happen.  The design for scalability I've discussed here has been a core driving concern for Mantissa since its beginning, and it's something that's increasingly important to our business and our applications.

I'm being especially detailed about Mantissa's incompleteness because I want to make sure that potential users' expectations are set appropriately.  I don't want anyone coming to the Divmod stack after having heard me say vague things about "scalability", believing that they'll get an application that scales to the moon.

I do think that this is an exciting time for other developers to get involved though.  Mantissa is at a point where there are lots of bits of polish that need to be added to make it truly useful.  Starting to investigate it for your application now will give you the opportunity to provide feedback while it's still being formed, before a bunch of final decisions have been made and a lot of application code has been written to rely on them.

More Later...

I've got more to say about scaling, Twisted, and Mantissa, of course.  In particular I'd like to explain why I think Mantissa is an interesting scaling strategy and how it compares to the other ones.  At this rate, though, I'll only write one blog post this year!  I'm sure you hope as much as I do that the next one won't be so long...

Data In, Garbage Out

"The string is a stark data structure and everywhere it is passed there is much duplication of process. It is a perfect vehicle for hiding information." —Alan Perlis
I switched to blogger recently expecting a more "professional" blogging experience.  I thought I'd be able to use a GUI editor and not concern myself with the details of the blog engine.  Apparently I was wrong.

Writing that last post, I had some pretty serious problems with getting the formatting to come out right.  Blogger does a couple of really terrible things:
  • When you switch between "Compose" and "Edit HTML" views, some amount of whitespace (although not all of it) is destroyed.
  • Even when posting using the ATOM API, the posted HTML is mangled in semi-arbitrary ways.
    • Properly-quoted "<" and ">" (i.e. "&lt;" and "&gt;") are quoted again.
    • Additional line-breaks are added.
    • &nbsp; is converted to white-space, and then
    • white space is collapsed.
This is one of the reasons that I'm such a stickler for treating data as structured data, and not making arbitrary heuristic guesses about it.  It's not just a matter of handling obscure, nerdy edge cases that average users won't run into.  In fact, it's the opposite.  Nerds (like myself) can figure out whether you're double-quoting your HTML entities or doing improper whitespace conversions.  But what does a regular Joe do when a "frustrated" smiley (">.<") gets converted into some incomprehensible soup of HTML?
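The double-quoting problem is easy to reproduce. A small sketch using the standard library's `html` module (the variable names are just for illustration):

```python
import html

smiley = ">.<"
once = html.escape(smiley)   # correct: escape exactly once, on output
twice = html.escape(once)    # the bug: escaping already-escaped text

print(once)
print(twice)
# A browser undoes one level of escaping, so only `once` displays correctly.
print(html.unescape(once) == smiley)
print(html.unescape(twice) == smiley)
```

Text escaped once round-trips back to the smiley; text escaped twice shows the reader the entity soup instead.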

I was reminded of this same issue when reading a page on the Habari wiki:

"If you are going to produce real XHTML in a tool usable by ordinary users, then you cannot do it by string concatenation. You need to assemble your content by serializing an XML DOM tree.

If you want to allow plugins, then your plugin API cannot allow plugin authors to stick arbitrary strings in the output. Rather, they should be allowed to add nodes to the DOM tree, or to manipulate existing ones."

Strangely enough, this page concludes that the important thing for their next-generation blogging tool is not a technology that lets them produce valid output (serializing DOM trees), but string concatenation.  They very clearly put an implementation technique above a good experience for users.

(This is your brain.  This is your brain on PHP.  Any questions?)

I don't want to pick on the Habari developers overmuch.  After all, the problem that inspired this post was with Blogger, and Wordpress has the same issue.  In fact, the Habari guys are mostly notable for having considered the implications of their decision so carefully; it's just a surprise to me that they walked all the way up to the right answer, looked at it, made sure it was right, and then decided to ignore it and keep on going.

Here's the surprise for the Habari developers, and basically everyone else who writes web applications that process HTML: it has nothing to do with XHTML.  It is a general principle of software development.  The only reason you notice when you're doing XHTML is that the browser isn't correcting for hundreds of minor mistakes, and rather than screwing up immediately it screws up one time in a thousand when a user managed to type a "<" or a "&".

You know what else you can't build with string concatenation?  AVIs.  PNGs.  SWFs.  Lots of data on the web is treated as structured, but only because it's too hard for the people who generally build web applications to generate it.  If you want to write a program that takes input, processes it, and returns output, you need an intermediary structure to hold that data so that you can ensure its validity.
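Here is a minimal sketch of the difference, using `xml.etree.ElementTree` from the standard library as the "intermediary structure". The sample text and the helper `is_well_formed` are invented for the example.

```python
import xml.etree.ElementTree as ET

user_text = 'I am frustrated >.< & "tired"'

# String concatenation: the user's characters leak into the markup.
concatenated = "<p>" + user_text + "</p>"

# Structured approach: build a tree, then serialize it once at the end.
paragraph = ET.Element("p")
paragraph.text = user_text
serialized = ET.tostring(paragraph, encoding="unicode")


def is_well_formed(markup):
    """Return True if an XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(concatenated))
print(is_well_formed(serialized))
print(ET.fromstring(serialized).text == user_text)
```

The serializer decides what needs quoting, so the code that handles the user's text never has to.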

That's not to say that it's always a bad idea to have user interfaces that allow people to type in a syntax that they know and understand, like an "HTML" view.  Those interfaces might even be forgiving and correct for lots of errors.  Adding line-breaks so that people can type newlines in a mishmash of pseudo-HTML is okay, as long as you know where that ends and your actual structured content begins.  For example, if you include a WYSIWYG GUI editor, you should probably internally make sure that WYS really is WYG and you're not making the same kind of heuristic guesses about the data that your own tool generated as some stuff that a user with only a smattering of HTML knowledge typed in directly.

Keeping structured data structured is near and dear to my heart in large part because as systems get ultra large, the different pieces need to be able to talk to each other using clear and unambiguous formats.  These points of integration, the places where system A talks to system B (a blogging system talks to a web browser or a blogging client, for example) are absolutely the most critical pieces to test, test, and test again.  If you have a bug in your system, you can find it and fix it; but if you have a bug which only arises from an interaction between your system and two others, your test environment needs to be 3 times bigger, and the error is at least 3 times harder to catch.  But it gets worse.  If you're dealing with 4 systems, then your test environment is 4 times bigger - but the bug is 6 times harder to catch.  And so on.
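One way to read those numbers (an interpretation, not something stated outright above) is that the difficulty tracks the number of pairs of systems that can interact, which is n(n-1)/2 for n systems: 3 pairs for 3 systems, 6 pairs for 4.

```python
from math import comb

for systems in (2, 3, 4, 5, 10):
    # Each pair of systems is one interface where a bug can hide.
    print(systems, "systems ->", comb(systems, 2), "pairwise interactions")
```

The test environment grows linearly with the number of systems while the places a bug can hide grow quadratically.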

Fred Brooks observed that adding more programmers to a project running behind schedule makes it later.  This is because of the additional channels of communication.  Now imagine that one of your developers has a curious speech defect: when he says "lasagna" he actually means "critical bug", and vice versa.  When he hears one, he understands it as the other.  Working alone, this is a harmless eccentricity, but as soon as you put other developers into the mix, strange effects start taking place.  He desperately tries to tell them about the delicious lasagna he had last night, and they can't understand why he's losing sleep over it.  Or, he is sanguine as his fellow engineers tell him about all the Italian food they're eating, while the business is losing millions of dollars.

It's sort of like if every time he said "<" the other developers understood him to mean "&lt;".

If I ever have more than a few hours to work on it, eventually I'll deploy my own blogging platform and I'll know that it can handle HTML correctly.  Until then though, I've worked out a strategy for posting to blogger which seems to mostly preserve the formatting that I want to see.  I figure that other Python developers might be interested in this, since I frequently see posts to blogger which eat indentation.
  1. I use ScribeFire as my HTML editor.  It manages OK, except it doesn't include linebreaks, <p>s or <div>s to separate lines.  So, leave the "Convert Line Breaks" option on in your blog's settings.
  2. In "Settings -> Basic -> Global Settings", disable "show compose mode for all your blogs".  The compose view is destructive, and switching between it and "Edit HTML" will eat whitespace each time you do it; it also seems to sometimes eat bits of formatting when you publish even if it's just on the page.
  3. Edit a post in ScribeFire.  To save drafts, use the "save as note" functionality.  This doesn't publish it to be a blogger draft, but there's no way to get the data into blogger directly.  You can use the HTML tab as you normally would, to add tags that aren't supported (such as "<pre>").
  4. Switch to the HTML ("<A>") tab in ScribeFire.
    1. select all.
    2. copy.
  5. Click "New Post" in the blogger web UI.
    1. click in the text field.
    2. paste.
The presence of numerous properly-escaped HTML characters in this post should be an indication that it works.

Memeventory! Inventomeme? Uh, how about "inventory meme".

A long time ago (when these sorts of things were still a going concern commercially, if that dates it for you) I remember debating the "realism" of interactive fiction (at the time, "text adventures"). A key point in the discussion was the kleptocratic structure of the game. I wish I could remember this better. For example, I wish I remembered who I was talking to. I do, however, remember the critical line, "no actual adult walks around with a bag full of that much crap". I was reminded of this by Paul Swartz describing the contents of his bag. A typical "adventurer" will have at least a dozen small trinkets, gewgaws and baubles on their person at any given time. These are critical to working their way out of whatever jam they happen to be in at the moment. Memorably, sometimes they will also be carrying maple syrup, masking tape, other people's passports, and cat fur.


Granted, I don't have any syrup on me at the moment. I do have a few other bits of technological detritus on my person; more, I think, than the average adventure-game protagonist. Now that I'm an actual adult, to win this at-least-a-decade-old argument with someone I can't even remember, I'm going to prove it, and you're going to help me do it!


Okay, okay. You came here to memeplex the blogosphere or whatever the kids are calling it these days, not to listen to me ramble about text adventures or my vindictive streak. So here's the idea. Get ready to head out of your house, office, or apartment — wherever you happen to be reading this post — and then take an inventory of your personal flotsam. Then, post it to your weblog in the style of an adventure game. In true Glyph style (i.e. more complicated than necessary), I used Imaginary to assist me with this task. In order to see the list below in all three of its glorious colors, you'll have to grab the code from launchpad as well as figure out how to set up its dependencies.


Without further ado, here it is:


You are carrying:
a digital watch
a bluetooth stereo headset
a grey messenger bag
the grey messenger bag contains:
a mobile phone
a black macbook
a macbook MagSafe(TM) power brick
a small orange microfiber cloth
a mini-DVI to VGA adapter
a beige graph-paper notebook
a 4' USB mini-B to A cable
a wallet
the wallet contains:
a borders rewards card
a boloco frequent burrito-er card
a $35 in cash
a CharlieCard (TM)
a Massachusetts driver's license
a fortune from a fortune cookie
a keychain
the keychain contains:
a samsung MicroSD-HC reader
the samsung MicroSD-HC reader contains:
a 1GB MicroSD card
a RFID building key
an apartment key
a car key
a mailbox key
an office outer door key
an office inner door key
a shaw's rewards card
Rather than the usual fixed number of additional participants, I'm going to say that you can tag one person for each container in your list. I've got 4 (wallet, messenger bag, keychain, microSD reader), so I will tag radix, tenth, exarkun, and washort.

You get bonus points for generating the list using a program, double-bonus points for using Imaginary and quadruple-bucky-points for resolving an Imaginary ticket while you're at it. This goes for everyone tagged by this meme, not just my tag-ees.


If the resulting score increase causes you to gain a level, I will notify you by e-mail.


(Apologies for the repeated edits. Blogger seems to have a problem with <pre> tags, HTML, everything.)

Structured Python Editor

If you're not already familiar with Subtext, you should probably watch this video to learn about it.

I'm not a big fan of the Subtext programming model. But it does convey one idea that I really, really like. Programs are structured data. This is a very powerful idea, and I think it's a pity that it hasn't really caught on.

Most programmers subscribe to the idea of model/view separation; this phrase has especially come back into vogue with the popularity of systems like Django and Rails. But programmers are only fans of this as far as it comes to "end-user" applications. For our own tools, rigidly gluing the model to the view (and both the model and the view to the persistence format, execution model, and innumerable other details) is the order of the day. Indeed, Python's own popularity is due in large part to the relative beauty of its syntax.

One of the problems this causes is a language gap. Guido mentioned this in his Py3K talk: different programming communities are already choosing identifiers based on their natural languages. The conflation of real-names and binding-names also creates a more subtle problem in Python: when you want to deprecate a name, let's say, twisted.web.server, you have to choose another name — probably one which isn't as good. If the binding name were, as in Subtext, an internal identifier rather than the user interface accessible to everyone, this would be an easier thing to do. For that matter, a large part of the Py3K effort itself is a change to Python's user interface; if Python were an interactive program with a separated model and view, it would be much easier to change this without changing everyone's code at the same time.

IDEs like PyDev for Eclipse and Wing IDE don't really address the problem of "program as bag of bytes". They provide tools, it's true, but those tools still treat a program as semi-structured information. One of the things you do most frequently in an IDE like this is type some code, which at least temporarily puts your program into a totally invalid state. As you're typing "def ", your module is syntactically invalid. Once you've finished adding arguments, and a docstring or method body, it's valid again, but only until you make your next change. If you're using a tool to edit something other than a program, like, say, Inkscape, as you move between different states in your drawing (add a line, change a gradient, resize a shape) each one is a valid SVG document if you were to save it. This is one of the reasons that I don't really use IDEs; despite their features, the core of the experience is still hammering away on a bunch of text files, and for that, it is very hard to beat Emacs.
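The "temporarily invalid" states are easy to observe with the standard `ast` module. In this small sketch the `snapshots` list imitates a module at a few moments while a function is being typed; the helper name `is_valid_python` is made up for the example.

```python
import ast


def is_valid_python(source):
    """Return True if the text parses as a Python module."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


# Snapshots of a module while a function is being typed into it.
snapshots = [
    "x = 1\n",
    "x = 1\ndef ",
    "x = 1\ndef double(n):",
    "x = 1\ndef double(n):\n    return n * 2\n",
]
for source in snapshots:
    print(is_valid_python(source))
```

Only the first and last snapshots parse; every state in between is one a text editor happily lets you save.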

IDEs aren't the only tools which could benefit from a truly structured interpretation of code. Version control systems, for example, could benefit immensely by having higher level operations. How often do you really want to know "who changed this line of code last"? I don't know about you, but personally, I want to ask questions like "when was this method defined" and "was it ever moved from another module". These questions are difficult or impossible to ask of modern version control systems (even the really good ones).

Another issue with programs being effectively unstructured is that they're not discoverable. If you want to draw a line in Inkscape, you don't need to look up the SVG syntax for drawing a line; you can just hunt around for the "line" button. This is especially important for students, who frequently forget basic things like "how do you define a method" or "what does it mean when a function is outside a class" while learning. Squeak addresses this problem, somewhat: there's still a lot of text floating around, but your program itself is a bunch of objects you can look at.

One of my perpetual second week projects is to make an IDE that understands Python as the serialization format of a graph of objects — modules, classes, and functions — rather than text. This could work on existing Python programs, and it wouldn't need to introduce any wacky new programming paradigms in order to do it: simply treat Python as a runtime and a serialization format, and parse/serialize Python code as if it were any other type of data, like Inkscape does for SVG. Since I'm never realistically going to do it, does anyone else want to? Has somebody else done it already?
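As a small taste of treating Python as a serialization format, the sketch below parses source text into a tree, lists the definitions it contains, edits the tree, and serializes it back. It uses only the standard `ast` module (`ast.unparse` needs Python 3.9 or later). The `RenameMethod` transformer is a deliberately crude, hypothetical example: it renames by name alone, with no scope analysis, and comments and formatting are not preserved.

```python
import ast

source = '''
class Greeter:
    def greet(self, name):
        return "hello " + name

def helper():
    return Greeter().greet("world")
'''

tree = ast.parse(source)

# View the module as objects (classes and functions) rather than lines.
definitions = [
    (type(node).__name__, node.name)
    for node in ast.walk(tree)
    if isinstance(node, (ast.ClassDef, ast.FunctionDef))
]
print(definitions)


class RenameMethod(ast.NodeTransformer):
    """Rename 'greet' to 'salute' in definitions and attribute accesses."""

    def visit_FunctionDef(self, node):
        self.generic_visit(node)
        if node.name == "greet":
            node.name = "salute"
        return node

    def visit_Attribute(self, node):
        self.generic_visit(node)
        if node.attr == "greet":
            node.attr = "salute"
        return node


# Edit the structure, then serialize it back to Python text.
new_source = ast.unparse(RenameMethod().visit(tree))
print(new_source)
```

A real structured editor would need to keep comments, layout and scoping information, which is a large part of why this remains a "second week project".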

The App Engine Of Your Internet...?

I guess Google really does use Divmod as their source for new ideas!

First, as we were implementing our email system, they launched gmail. Then while we were working on VoIP they acquired Grand Central. Now that we're focusing on infrastructure... they've released their infrastructure!

(If you are familiar with Axiom, the appengine database model example should be good for a laugh.)

Of course, I'm wondering: should Divmod, or me by proxy, be worried about this? I think I've settled on "no", for a few reasons:
  1. Google's release of appengine is an indication that this is a hot group of ideas to be working on right now. It sets a precedent for our open source offering to be idiosyncratic. When describing Mantissa's scaling model, I've often heard the objection "But with EC2 I can run whatever code I want!" While that objection is somewhat inaccurate and I can explain with some effort that it's not true, now I have the much snappier retort: "but if you want to use Google's stuff you have to write to a specific API too".
  2. You can't get all the code to appengine, at least for now, so there's a big category of applications that can't use it. We can keep working to address it — and those were really the applications we were already focused on anyway.
  3. appengine doesn't include Athena. It doesn't include Vertex. And it doesn't include Imaginary. There's plenty of fun stuff we're working on outside the realm of simply deploying a web app.
All in all, I'm thrilled that AppEngine is in Python. I hope that we'll be able to get Twisted into the mix at some point, but there's always the possibility that we'll need to integrate with anything a major player releases, and I'm relieved that this (very) major player has decided to go with something that will be easy for us to talk to.