Resolving diverged Bazaar branches on the go with 'dead heads'.

If you're like me, occasionally you grab the latest version of a bzr branch onto your laptop just before going somewhere without network access. But, as you're about to leave, you glance over at your laptop screen, and you see the dreaded:
bzr: ERROR: These branches have diverged. Use the missing command to see how.
Use the merge command to reconcile them.
but you don't have time to do a merge, and wait for the (reliably agonizingly slow) network round trip to negotiate with the server about what the latest revision is - the train's about to leave, or you're late for your flight, or the cafe is closing and you need to shut your laptop right now.  Sadness!  You continue to work on a diverged branch and merge later.  Which is a shame, because mechanically dealing with merge conflicts or just making sure the tests still pass after what looks like a trivial merge is exactly the sort of thing which is convenient to do when you're stuck waiting at a network-access-free bus stop.
As it turns out, Bazaar has actually already done all the hard work necessary for you to just go ahead and do that merge when you get to your potentially non-networked destination.  The diverged revisions have already been pulled into your branch and are just sitting there, waiting to be merged, but you can't see them.  The 'bzrtools' plugin provides the 'heads' command, which you can use to reveal the previously invisible revision.  You can then just 'merge .' instead of merging from your usual pull location, as long as you specify the appropriate revision.
To demonstrate, here's a transcript of a sample session which simulates this common problem:
First, set up a branch:
you@computer:~$ mkdir tmp
you@computer:~$ cd tmp
you@computer:~/tmp$ mkdir a
you@computer:~/tmp$ cd a
you@computer:~/tmp/a$ bzr init
Created a standalone tree (format: 2a)
you@computer:~/tmp/a$ touch initial.txt
you@computer:~/tmp/a$ bzr add
adding initial.txt
you@computer:~/tmp/a$ bzr ci -m "initial revision"
Committing to: /Domicile/glyph/tmp/a/
added initial.txt
Committed revision 1.
We'll call 'a' the 'server' branch. Next, let's make a branch that represents the 'on the go' branch, your local working copy:
you@computer:~/tmp/a$ cd ..
you@computer:~/tmp$ bzr get a b
Branched 1 revision(s).
Now, it's time to diverge. Let's give each branch its own revision.
you@computer:~/tmp$ cd a
you@computer:~/tmp/a$ touch a.txt
you@computer:~/tmp/a$ bzr add
adding a.txt
you@computer:~/tmp/a$ bzr ci -m 'revision from a'
Committing to: /Domicile/glyph/tmp/a/
added a.txt
Committed revision 2.
you@computer:~/tmp/a$ cd ../b/
you@computer:~/tmp/b$ touch b.txt
you@computer:~/tmp/b$ bzr add
adding b.txt
you@computer:~/tmp/b$ bzr ci -m 'revision from b'
Committing to: /Domicile/glyph/tmp/b/
added b.txt
Committed revision 2.
Now, it's time to get on that sad, wifi-free train. Let's make sure we're up to date with 'a' first...
you@computer:~/tmp/b$ bzr pull ../a
bzr: ERROR: These branches have diverged. Use the missing command to see how.
Use the merge command to reconcile them.
[Error: 3]
Oh no! But, here comes 'bzr heads' to the rescue:
you@computer:~/tmp/b$ bzr heads --dead
HEAD: revision-id: you@computer-123456 (dead)
committer: You <you@computer>
branch nick: a
timestamp: now-ish
message:
revision from a
That's the revision ID of the already-pulled-but-not-visible revision - the tip of 'a', in other words. Now you just need to ask 'b' to merge it:
you@computer:~/tmp/b$ bzr merge . -r you@computer-123456
+N a.txt
All changes applied successfully.
you@computer:~/tmp/b$ bzr ci -m 'merge from a'
Committing to: /Domicile/glyph/tmp/b/
added a.txt
Committed revision 3.
Done! And when you get back to your cozy 10gigE fiber connection at home, or whatever you happen to have, you can see that the revision you've merged lines up neatly with 'a':
you@computer:~/tmp/b$ bzr pull ../a
No revisions to pull.
you@computer:~/tmp/b$
Et voilà. I hope this saves somebody some time when dealing with failed pulls.
For those of you who may be curious about the use-case, if you don't have it: I rarely encounter this with actual codebases I work on, as I tend to have a local trunk mirror, and features are neatly segregated into branches. It comes up more frequently in my personal configuration-files repository, where I make little changes to my desktop, little changes to my laptop, and then want to get out the door quickly with the latest merged copy. I was so happy when #bzr on freenode (thanks, spiv!) solved this problem for me that I just had to share.

Some Common Onomatological Errors

The open-source event-driven networking engine that I work on is called "Twisted".  If you're uncomfortable using something that sounds like an adjective in a place where a noun should go, the following noun phrases are equivalent:
  1. the Twisted project
  2. the Twisted engine
  3. the Twisted networking engine
  4. the Twisted framework
The unofficial group (of which I am a member) which works on that software is known as "Twisted Matrix Laboratories", sometimes shortened to "Twisted Matrix Labs" or "TMLabs".

I can understand that there is some confusion around this stuff, since these words often appear in close proximity, but to my knowledge there is nothing called "Python Twisted", "Twisted Python", or "Twisted Matrix".  There's "python-twisted", which is the package name that some operating systems use to package Twisted.  There is also "twisted.python", which is a Python package within Twisted itself.  Finally there is "twisted-python@twistedmatrix.com", which is the mailing list for discussing Twisted stuff in the Python programming language.  (This discussion list is so named to distinguish it from the possibility of not-quite-hypothetical discussion of Twisted implemented in other languages, although no other implementations are currently actively maintained.)

I just thought you'd all like to know that.  That is all.  (For now, anyway.)

Learn Twisted

Jean-Paul Calderone continues his excellent "Twisted Web In 60 Seconds" tutorial series.  If you haven't checked it out yet, you should!

Do you want WiFi to work at your conference?

I've been pretty busy for the last couple of weeks, so I've just had an opportunity to catch up with blog posts that have been piling up.  In particular I noticed this one: The “WiFi At Conferences” Problem, by Joel Spolsky.

Joel has a lot of what look like good recommendations.  However, I can provide a much-abridged list.

Some years, WiFi access at PyCon US has been provided by the venue, or by a contractor whose name I mercifully do not know.  Those years, it has not worked.  Some years, it has been provided, or at least managed, by tummy.com.  Those years, it has worked.  They are probably much more critical of their own efforts than I am, as you can see in this thorough write-up that they did of PyCon's 2008 WiFi situation.

My two-step plan for getting working WiFi at your conference is:
  1. e-mail somebody at tummy.com, telling them that you want a working wireless network, and
  2. give them whatever they ask for.
If you do these things, then when people open their laptops at your conference, their networks will work.

Hobgoblin History

I like Terry Jones; I think FluidDB has a lot of potential.  But, sometimes when he's talking about it, he gets a little carried away and forgets that the rest of us don't live in his future yet.  In his latest missive on the official FluidDB blog, "Digital Hobgoblins", he describes some of the problems that FluidDB sets out to solve.

The problem is, I already have solutions for all of these problems, and I don't quite understand why they don't (or shouldn't) work for me.  (Since he organizes the post in terms of problems that existing systems have, I'm going to take the liberty of re-labeling these in terms of the problems that he seems to be describing rather than the lead text he used.  Please post a comment if you think my labeling is wrong.)

In existing systems, Terry says:

"Things must be named, and have one name."  Specifically, Terry calls out file systems.  Except... file systems have lots of ways of introducing multiple names for the same thing.  Symbolic links.  Hard links, if you really want to allow for ambiguity.  If you want to track that ambiguity, Windows "shortcuts" and MacOS "aliases" can do that.  Overlay mounts, loopback mounts and chroot execution allow for semi-arbitrary renaming.  Lots of other systems support this, too.  Database systems have a specific provision for multiple names: the many-to-one relation.  Any programming language with pass-by-reference data structures allows for some level of multiple-naming.  In fact, there's a whole discipline for allowing things to have lots of different names: indexing.  Anywhere you have a full-text index or an object where multiple attributes are indexed in some kind of database, you've got objects with more than one name.
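To make the file-system point concrete, here's a minimal Python sketch (the file names and contents are invented) showing that hard links and symbolic links give one piece of data several names:

```python
import os
import tempfile

# One file, three names, on an ordinary POSIX file system.
d = tempfile.mkdtemp()
original = os.path.join(d, "thing.txt")
with open(original, "w") as f:
    f.write("one piece of data")

hard = os.path.join(d, "alias-hard.txt")
soft = os.path.join(d, "alias-soft.txt")
os.link(original, hard)      # a second, fully first-class name
os.symlink(original, soft)   # a name that points at the original name

# All three names reach the same bytes...
assert open(hard).read() == open(soft).read() == "one piece of data"
# ...and the hard link is literally the same inode, not a copy.
assert os.stat(original).st_ino == os.stat(hard).st_ino
```

(This assumes a POSIX system; on Windows the spelling differs, but shortcuts serve the same role.)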

"You have to be consistent and unambiguous."  As I mentioned on the first point, there are lots of ways to be slightly ambiguous at a human level.  You can refer to the same thing by different names, or, with mutable binding, you can refer to different things by the same name.  In some circumstances, you must be precise, but that's because fundamentally, algorithmic thinking requires a certain level of precision, not because of any specific problem with computers.  In fact, there is a word for inconsistency and ambiguity in programming languages: polymorphism.  Any time you invoke an interface rather than a concrete implementation (which is to say any time you do anything in a dynamic language like Python) you are being ambiguous and potentially inconsistent in your program's behavior.
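Here's a tiny Python sketch of that kind of deliberate ambiguity (the class and method names are made up):

```python
# "Ambiguity" as polymorphism: the call site names an interface
# (anything with a .speak() method), not a concrete type, so the
# same line of code means different things for different objects.
class Duck:
    def speak(self):
        return "quack"

class Robot:
    def speak(self):
        return "beep"

def announce(thing):
    # Deliberately "inconsistent": what this does depends entirely
    # on which object happens to arrive at runtime.
    return thing.speak()

print([announce(x) for x in (Duck(), Robot())])  # ['quack', 'beep']
```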

"You only get one way to organize stuff."  This is a pretty weak point, though, given that Terry himself immediately turns around and notices that tagging and other multiply-indexed database systems are becoming popular.  So he gives us two examples of exceptions, but no examples of the rule.  I'm not sure what I could add to that.

"Programmers are obsessed with "meaning"."  On this one, I'm going to agree, except I don't think it's a problem.  In the computational world, we are obsessed with the meaning of data, because if you get the meaning of the inputs wrong, then the meaning of the outputs is wrong too.  For example: if you have a number that represents the total liabilities that your company has accumulated, it's pretty important that you don't ever treat that as your total profit.  At a deeper level, if you have a sequence of bits that represents a floating-point number, it's important to know about its intended meaning, and not treat it as a string of characters, unless what you really want is a string.  "@H=N" is not as useful a concept as "3.1287417411804199" if you are trying to add it to something.

For what it's worth, I have my own, similar take on how we should treat computational objects that have multiple meanings: Imaginary. Even systems like Imaginary and FluidDB depend on a very rigid definition of some simpler concepts, like numbers consistently being numbers and words consistently being words.  In my view, even if we treat the book itself as multifaceted, it's important to know what the data representing the "readable object" part of a book is really "about", and make sure it stays distinct from the data representing the "paperweight" part of the book.

To be fair, FluidDB appears to do this itself — and this terminology is my least-favorite part of FluidDB — by having single-purpose, permission-controlled "objects" just like every other system, but calling them "tags", and re-using the word "objects" to refer instead to what others might call a "UUID" or "central index".  In Imaginary, the system is similar, although the centrality of the FluidDB "object" (in Imaginary's case, the "Thing") is less stark: using FluidDB's terminology, in Imaginary, a "tag" can have a "tag" of its own; in fact, there's nothing but tags ("Items") anywhere.
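You can watch those four bytes change meaning in a few lines of Python, using the standard struct module:

```python
import struct

# The same four bytes, read under two different "meanings".
data = b"@H=N"

# As text, it's an unhelpful jumble of characters:
as_text = data.decode("ascii")           # '@H=N'

# As a big-endian 32-bit float, it's a number you can add to things:
as_number = struct.unpack(">f", data)[0]

print(as_text)                           # @H=N
print(as_number + 1)                     # meaningful arithmetic
assert abs(as_number - 3.1287417411804199) < 1e-9
```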

"Metadata is separated from the data it describes."  This may be true in some systems, but the web is probably the system with the most data in it anywhere, and in that system, metadata is always available as part of the request and the response.  You can put any headers you want in a response, and there are lots of pieces of metadata (like content-type) which are almost always found along with the data.  In my opinion, the problem is more that we don't have enough of the previous problem.  Web developers haven't been obsessed enough with meaning: there aren't enough useful conventions around the HTTP request/response metadata, and so it's hard to bundle more metadata in with your response and have it faithfully propagated elsewhere.  We don't know what arbitrary headers might mean, because we don't have any way of expressing a schema for them.
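Here's a small Python sketch of that bundling, using the standard library's email parser (which understands the same header syntax HTTP uses; the header values here, especially the X- one, are invented):

```python
from email.parser import Parser

# In HTTP, metadata travels in the same message as the data:
# headers first, then a blank line, then the body.
raw = (
    "Content-Type: text/html; charset=utf-8\r\n"
    "Content-Language: en\r\n"
    "X-Made-Up-Provenance: alices-laptop\r\n"
    "\r\n"
    "<html><body>the data itself</body></html>"
)

msg = Parser().parsestr(raw)
print(msg["Content-Type"])   # text/html; charset=utf-8
print(msg.get_payload())     # <html><body>the data itself</body></html>
```

The machinery for carrying arbitrary metadata is right there; what's missing, as the paragraph above says, is any shared schema telling a recipient what "X-Made-Up-Provenance" means.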

Terry says he's going to write more about these problems, and the solutions that FluidDB provides for them.  I'm looking forward to it.  As part of that, I'd really like to see a clear description of how these problems affect me, or someone I know, either as a programmer or as a user.  What do I, or should I, really want to do with some application right now that these five problems are preventing me from doing?

The reason I felt compelled to write about this is that history — and particularly the history of websites like freshmeat and sourceforge — is littered with the corpses of projects which promised to fundamentally change the way we represent data.  A common problem with these projects is that they have expansive denunciations of current techniques to represent data, or manage persistence, and claim to provide an advance so significant that they will displace all current applications.  What most of the people working on these projects don't realize is that the current techniques for representing data have a history, and there are good reasons for their limitations.  Granted, not all of those reasons are currently relevant, and many are examples of path dependence, but it's still important to understand the reasons in order to escape the problems.

In FluidDB's case, I think that the problem isn't so much that Terry doesn't have the historical perspective, but that he assumes that we all do.  And that we can all make the cognitive leap to see why FluidDB is necessary.  But if I can't do it, I have to assume there are at least a few other programmers who aren't getting the message either.