Techniques for Actually Distributed Development with Git

It's all very wibbly wobbly and versiony wersiony.

The Setup

I have a lot of computers. Sometimes I'll start a project on one computer and get interrupted, then later find myself wanting to work on that same project, right where I left off, on another computer. Sometimes I'm not ready to publish my work, and I might still want to rebase it a few times (because if you're not arbitrarily rewriting history all the time you're not really using Git) so I don't want to push to @{upstream} (which is how you say "upstream" in Git).

I would like to be able to use Git to synchronize this work in progress between multiple computers. I would like to be able to easily automate this synchronization so that I don’t have to remember any special steps to sync up; one of the most focused times that I have available to get coding and writing work done is when I’m disconnected from the Internet, on a cross-country train or a plane trip, or sitting near a beach, for 5-10 hours. It is very frustrating to realize only once I’m settled in and unable to fetch the code, that I don’t actually have it on my laptop because I was last doing something on my desktop. I would particularly like to be able to that offline time to resolve tricky merge conflicts; if there are two versions of a feature, I want to have them both available locally while I'm disconnected.

Completely Fake, Made-Up History That Is A Lie

As everyone knows, Git is a centralized version control system created by the popular website GitHub as a command-line client for its "forking" HTML API. Alternate central Git authorities have been created by other startups following in the wave following GitHub's success, such as BitBucket and GitLab.

It may surprise many younger developers to know that when GitHub first created Git, it was originally intended to be a distributed version control system, where it was possible to share code with no particular central authority!

Although the feature has been carefully hidden from the casual user, with a bit of trickery you can enable re-enable it!

Technique 0: Understanding What's Going On

It's a bit confusing to have to actually set up multiple computers to test these things, so one useful thing to understand is that, to Git, the place you can fetch revisions from and push revisions to is a repository. Normally these are identified by URLs which identify hosts on the Internet, but you can also just indicate a path name on your computer. So for example, we can simulate a "network" of three computers with three clones of a repository like this:

1
2
3
4
5
6
7
8
$ mkdir tmp
$ cd tmp/
$ mkdir a b c
$ for repo in a b c; do (cd $repo; git init); done
Initialized empty Git repository in .../tmp/a/.git/
Initialized empty Git repository in .../tmp/b/.git/
Initialized empty Git repository in .../tmp/c/.git/
$ 

This creates three separate repositories. But since they're not clones of each other, none of them have any remotes, and none of them can push or pull from each other. So how do we teach them about each other?

1
2
3
4
5
6
7
8
9
$ cd a
$ git remote add b ../b
$ git remote add c ../c
$ cd ../b
$ git remote add a ../a
$ git remote add c ../c
$ cd ../c
$ git remote add a ../a
$ git remote add b ../b

Now, you can go into a and type git fetch --all and it will fetch from b and c, and similarly for git fetch --all in b and c.

To turn this into a practical multiple-machine scenario, rather than specifying a path like ../b, you would specify an SSH URL as your remote URL, and turn SSH on on each of your machines ("Remote Login" in the Sharing preference pane on the mac, if you aren't familiar with doing that on a mac).

So, for example, if you have a home desktop tweedledee and a work laptop tweedledum, you can do something like this:

1
2
3
4
5
6
7
8
tweedledee:~ neo$ mkdir foo; cd foo
tweedledee:foo neo$ git init .
# ...
tweedledum:~ m_anderson$ mkdir bar; cd bar
tweedledum:bar m_anderson$ git init .
tweedledum:bar m_anderson$ git remote add tweedledee neo@tweedledee.local:foo
# ...
tweedledee:foo neo$ git remote add tweedledum m_anderson$@tweedledum.local:foo

I don't know the names of the hosts on your network. So, in order to make it possible for you to follow along exactly, I'll use the repositories that I set up above, with path-based remotes, in the following examples.

Technique 1 (Simple): Always Only Fetch, Then Merge

Git repositories are pretty boring without any commits, so let's create a commit:

1
2
3
4
5
6
7
$ cd ../a
$ echo 'some data' > data.txt
$ git add data.txt
$ git ci -m "data"
[master (root-commit) 8dc3db4] data
 1 file changed, 1 insertion(+)
 create mode 100644 data.txt

Now on our "computers" b and c, we can easily retrieve this commit:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
$ cd ../b/
$ git fetch --all
Fetching a
remote: Counting objects: 3, done.
remote: Total 3 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
From ../a
 * [new branch]      master     -> a/master
Fetching c
$ git merge a/master
$ ls
data.txt
$ cd ../c
$ ls
$ git fetch --all
Fetching a
remote: Counting objects: 3, done.
remote: Total 3 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
From ../a
 * [new branch]      master     -> a/master
Fetching b
From ../b
 * [new branch]      master     -> b/master
$ git merge b/master
$ ls
data.txt

If we make a change on b, we can easily pull it into a as well.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
$ cd ../b
$ echo 'more data' > data.txt 
$ git commit data.txt -m "more data"
[master f3d4165] more data
 1 file changed, 1 insertion(+), 1 deletion(-)
$ cd ../a
$ git fetch --all
Fetching b
remote: Counting objects: 5, done.
remote: Total 3 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
From ../b
 * [new branch]      master     -> b/master
Fetching c
From ../c
 * [new branch]      master     -> c/master
$ git merge b/master
Updating 8dc3db4..f3d4165
Fast-forward
 data.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

This technique is quite workable, except for one minor problem.

The Minor Problem

Let's say you're sitting on your laptop and your battery is about to die. You want to push some changes from your laptop to your desktop. Your SSH key, however, is plugged in to your laptop. So you just figure you'll push from your laptop to your desktop. Unfortunately, if you try, you'll see something like this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
$ cd ../a/
$ echo 'even more data' >> data.txt 
$ git commit data.txt -m "even more data"
[master a9f3d89] even more data
 1 file changed, 1 insertion(+)
$ git push b master
Counting objects: 7, done.
Writing objects: 100% (3/3), 260 bytes | 0 bytes/s, done.
Total 3 (delta 0), reused 0 (delta 0)
remote: error: refusing to update checked out branch: refs/heads/master
remote: error: By default, updating the current branch in a non-bare repository
remote: error: is denied, because it will make the index and work tree inconsistent
remote: error: with what you pushed, and will require 'git reset --hard' to match
remote: error: the work tree to HEAD.
remote: error: 
remote: error: You can set 'receive.denyCurrentBranch' configuration variable to
remote: error: 'ignore' or 'warn' in the remote repository to allow pushing into
remote: error: its current branch; however, this is not recommended unless you
remote: error: arranged to update its work tree to match what you pushed in some
remote: error: other way.
remote: error: 
remote: error: To squelch this message and still keep the default behaviour, set
remote: error: 'receive.denyCurrentBranch' configuration variable to 'refuse'.
To ../b
 ! [remote rejected] master -> master (branch is currently checked out)
error: failed to push some refs to '../b'

While you're reading this, you fail your will save and become too bored to look at a computer any more.

Too late, your battery is dead! Hopefully you didn't lose any work.

In other words: sometimes it's nice to be able to push changes as well.

Technique 1.1: The Manual Workaround

The problem that you're facing here is that b has its master branch checked out, and is therefore rejecting changes to that branch. Your commits have actually all been "uploaded" to b, and are present in that repository, but there is no branch pointing to them. Doing either of those configuration things that Git warns you about in order to force it to allow it is a bad idea, though; if your working tree and your index and your your commits don't agree with each other, you're just asking for trouble. Git is confusing enough as it is.

In order work around this, you can just push your changes in master on a to a diferent branch on b, and then merge it later, like so:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
$ git push b master:master_from_a
Counting objects: 7, done.
Writing objects: 100% (3/3), 260 bytes | 0 bytes/s, done.
Total 3 (delta 0), reused 0 (delta 0)
To ../b
 * [new branch]      master -> master_from_a
$ cd ../b
$ git merge master_from_a 
Updating f3d4165..a9f3d89
Fast-forward
 data.txt | 1 +
 1 file changed, 1 insertion(+)
$ 

This works just fine, you just always have to manually remember which branches you want to push, and where, and where you're pushing from.

Technique 2: Push To Reverse Pull To Remote Remotes

The astute reader will have noticed at this point that git already has a way of tracking "other places that changes came from", they're called remotes! And in fact b already has a remote called a pointing at ../a. Once you're sitting in front of your b computer again, wouldn't you rather just have those changes already in the a remote, instead of in some branch you have to specifically look for?

What if you could just push your branches from a into that remote? Well, friend, I'm here today to tell you you can.

First, head back over to a...

1
$ cd ../a

And now, all you need is this entirely straightforward and obvious command:

1
$ git config remote.b.push '+refs/heads/*:refs/remotes/a/*'

and now, when you git push b from a, you will push those branches into b's "a" remote, as if you had done git fetch a while in b.

1
2
3
4
$ git push b
Total 0 (delta 0), reused 0 (delta 0)
To ../b
   8dc3db4..a9f3d89  master -> a/master

So, if we make more changes:

1
2
3
4
$ echo 'YET MORE data' >> data.txt
$ git commit data.txt -m "You get the idea."
[master c641a41] You get the idea.
 1 file changed, 1 insertion(+)

we can push them to b...

1
2
3
4
5
6
$ git push b
Counting objects: 5, done.
Writing objects: 100% (3/3), 272 bytes | 0 bytes/s, done.
Total 3 (delta 0), reused 0 (delta 0)
To ../b
   a9f3d89..c641a41  master -> a/master

and when we return to b...

1
$ cd ../b/

there's nothing to fetch, it's all been pre-fetched already, so

1
$ git fetch a

produces no output.

But there is some stuff to merge, so if we took b on a plane with us:

1
2
3
4
5
$ git merge a/master
Updating a9f3d89..c641a41
Fast-forward
 data.txt | 1 +
 1 file changed, 1 insertion(+)

we can merge those changes in whenever we please!

More importantly, unlike the manual-syncing solution, this allows us to push multiple branches on a to b without worrying about conflicts, since the a remote on b will always only be updated to reflect the present state of a and should therefore never have conflicts (and if it does, it's because you rewrote history and you should be able to force push with no particular repercussions).

Generalizing

Here's a shell function which takes 2 parameters, "here" and "there". "here" is the name of the current repository - meaning, the name of the remote in the other repository that refers to this one - and "there" is the name of the remote which refers to another repository.

1
2
3
4
5
6
function remoteremote () {
    local here="$1"; shift;
    local there="$1"; shift;

    git config "remote.$there.push" "+refs/heads/*:refs/remotes/$here/*";
}

In the above example, we could have used this shell function like so:

1
2
$ cd ../a
$ remoteremote a b

I now use this all the time when I check out a repository on multiple machines for the first time; I can then always easily push my code to whatever machine I’m going to be using next.

I really hope at least one person out there has equally bizarre usage patterns of version control systems and finds this post useful. Let me know!

Acknowledgements

Thanks very much to Tom Prince for the information that lead to this post being worth sharing, and Jenn Schiffer for teaching me that it is OK to write jokes sometimes, except about javascript which is very serious and should not be made fun of ever.

Harried

In a world…

Some things, one writes because one has something one wants the world to know.

Some things, one writes because, despite one’s better judgment, one can’t help it.

This is one of the latter type.

I have a recurring dream. I mean this in the most literal sense possible. I do not mean that I have an aspiration, or a hope. This is literally a dream that I’ve had, while I was asleep, multiple times. I’m not really sure why, since the apparent stimulus that this dream is responding to is many, many years in my past. Nevertheless, I have had it 3 or 4 times now.

It’s a movie trailer.

There’s a man driving a pickup truck along a highway near open plains.

Intercut are images of him driving a tractor over an apparently endless field, shot from a fixed 3rd-person perspective behind the tractor, driving towards the horizon. Close up of his face. He looks bored, resigned.

Image Credit: Ben Smith

He sees a falling star streaking down from the sky. As he watches, there’s a roaring noise, and a boom, and it crashes down into an adjacent field.

Cut to him cautiously exploring the crash site. There’s an object in the middle of the crash site, still glowing red hot. It’s a strange shape; conical, about a meter long, but with an indentation on its side and various protrusions, one of which looks like a human-sized handle.

Another cut, now he’s inside the truck again, and we can see that the object is under a tarp in the back of the truck.

Now he’s in his garage. He’s wearing a red work shirt and blue jeans. The object is hoisted up on a lift, and he’s walking around it, examining it. A prolonged sequence of him trying to cut it open with a plasma arc cutter, then wiping off the smudge to reveal it’s still shiny.

Finally, he lowers the lift, but as it goes towards the ground, the object beeps and stops dropping, at about waist height. It slowly rotates, swiveling to point towards the open garage door.

He walks over to the object and stands next to it, his hip near the indentation in its side, and he reaches out to grab the handle. The machine makes a powering-on whine and more protrusions start folding out of its sides; he yanks his hand back like he’s been burned, and it folds back into itself. More cautiously this time, he reaches out to grab the handle.

The machine’s tip opens like a claw, revealing a central spike surrounded by a bunch of curved radiating antennae. Sparks start shooting between the antennae and the spike, charging it. A view screen unfolds, initially displaying a targeting reticle pointed out of his garage door, then switching to text in an alien language. A grainy voice comes out of speakers in the side of the machine, initially speaking the alien language. Then the language on the screen switches to Japanese, then to French, then to German, and finally to English. In large, flashing, block letters, it reads, and the grainy voice from its speakers says, simply:

GET READY

Before the man has a chance to the react, the indentation in the side of the machine extrudes several parts of a metallic harness which surround his body. A thruster on the back part of the machine starts glowing a dull red color, and then the machine (with the man in tow) shoots out of his garage.

Initially, he is running furiously to keep up. He trips over a rock in the road, but the harness allows him to sort of gyroscopically spin while still attached to the machine, and then right himself again. Eventually he pulls back on the handle and he’s flying through the air, and then the protrusion on the nose of the machine shoots a beam in front of him and opens a portal, which he then flies through.

Space Harrier

The machine says “Welcome to the fantasy zone.” and then “Get ready!” again, and then we see a montage of him flying over alien landscapes, primarily plains checkered in garish primary colors, shooting giant plasma bullets out of the nose of the weapon at trees, tiny fighter jets, rocks, and basically anything that gets in his way. There are scenes of him talking to giant one-eyed mammoths, and a teaser-trailer shot of giant, horrible looking two-headed dragons.

COMING THIS SUMMER: SPACE HARRIER, THE MOTION PICTURE.

Then I wake up.

Sega: please, please get someone to make this movie. The dream is only ever the trailer, and I really want to see how it ends.

The Report Of Our Death

The report of Twisted’s death was an exaggeration.

tulips

Lots of folks are very excited about the Tulip project, recently released as the asyncio standard library module in Python 3.4.

This module is potentially exciting for two reasons:

  1. It provides an abstract, language-blessed, standard interface for event-driven network I/O. This means that instead of every event-loop library out there having to implement everything in terms of every other event-loop library’s ideas of how things work, each library can simply implement adapters from and to those standard interfaces. These interfaces substantially resemble the abstract interfaces that Twisted has provided for a long time, so it will be relatively easy for us to adapt to them.
  2. It provides a new, high-level coroutine scheduler, providing a slightly cleaned-up syntax (return works!) and more efficient implementation (no need for a manual trampoline, it’s built straight into the language runtime via yield from) of something like inlineCallbacks.

However, in their understandable enthusiasm, some observers of Tulip’s progress – links withheld to protect the guilty – have been forecasting Twisted’s inevitable death, or at least its inevitable consignment to the dustbin of “legacy code”.

At first I thought that this was just sour grapes from people who disliked Twisted for some reason or other, but then I started hearing this as a concern from users who had enjoyed using Twisted.

So let me reassure you that the idea that Twisted is going away is totally wrong. I’ll explain how.

The logic that leads to this belief seems to go like this:

Twisted is an async I/O thing, asyncio is an async I/O thing. Therefore they are the same kind of thing. I only need one kind of thing in each category of thing. Therefore I only need one of them, and the “standard” one is probably the better one to depend on. So I guess nobody will need Twisted any more!

The problem with this reasoning is that “an async I/O thing” is about as specific as “software that runs on a computer”. After all, Firefox is also “an async I/O thing” but I don’t think that anyone is forecasting the death of web browsers with the release of Python 3.4.

Which Is Better: OpenOffice or Linux?

Let’s begin with the most enduring reason that Twisted is not going anywhere any time soon. asyncio is an implementation of a transport layer and an event-loop API; it can move bytes into and out of your application, it can schedule timed calls to happen at some point in the future, and it can start and stop. It’s also an implementation of a coroutine scheduler; it can interleave apparently sequential logic with explicit yield points. There are also some experimental third-party extension modules available, including an event-driven HTTP server and client, and the community keeps building more stuff.

In other words, asyncio is a kernel for event-driven programming, with some applications starting to be developed.

Twisted is also an implementation of a transport layer and an event-loop API. It’s also got a coroutine scheduler, in the form of inlineCallbacks.

Twisted is also a production-quality event-driven HTTP server and client including its own event-driven templating engine, with third-party HTTP addons including server microframeworks, high-level client tools, API construction kits, robust, two-way browser communication, and automation for usually complex advanced security features.

Twisted is also an SSH client, both an API and a command-line replacement for OpenSSH. It’s also an SSH server which I have heard some people think is OK to use in production. Again, the SSH server is both an API and again as a daemon replacement for OpenSSH. (I'd say "drop-in replacement" except that neither the client nor server can parse OpenSSH configuration files. Yet.)

Twisted also has a native, symmetric event-driven message-passing protocol designed to be easy to implement in other languages and environments, making it incredibly easy to develop a custom protocol to propagate real-time events through multiple components; clients, servers, embedded devices, even web browsers.

Twisted is also a chat server you can deploy with one shell command. Twisted is also a construction kit for IRC bots. Twisted is also an XMPP client and server library. Twisted is also a DNS server and event-driven DNS client. Twisted is also a multi-protocol integrated authentication API. Twisted is also a pluggable system for creating transports to allow for third-party transport components to do things like allow you to run as a TOR hidden service with a single command-line switch and no modifications to your code.

Twisted is also a system for processing streams of geolocation data including from real hardware devices, via serial port support.

Twisted also natively implements GUI integration support for the Mac, Windows, and Linux.

I could go on.

If I were to include what third-party modules are available as well, I could go on at some considerable length.

The point is, while Twisted also has an existing kernel – largely compatible, at a conceptual level at least, with the way asyncio was designed – it also has a huge suite of functionality, both libraries and applications. Twisted is OpenOffice to asyncio’s Linux.

Of course this metaphor isn’t perfect. Of course the nascent asyncio community will come to supplant some of these things with other third-party tools. Of course there will be some duplication and some competing projects. That’s normal, and even healthy. But this is not a winner-take all existential Malthusian competition. Being able to leverage the existing functionality within the Twisted and Tornado ecosystems – and vice versa, allowing those ecosystems to leverage new things written with asyncio – was not an accident, it was an explicit, documented design goal of asyncio.

Now And Later

Python 3 is the future of Python.

While Twisted still has a ways to go to finish porting to Python 3, you will be able to pip install the portions that already work as of the upcoming 14.0 release (which should be out any day now). So contrary to some incorrect impressions I’ve heard, the Twisted team is working to support that future.

(One of the odder things that I’ve heard people say about asyncio is that now that Python 3 has asyncio, porting Twisted is now unnecessary. I'm not sure that Twisted is necessary per se – our planet has been orbiting that cursed orb for over 4 billion years without Twisted’s help – but if you want to create an XMPP to IMAP gateway in Python it's not clear to me how having just asyncio is going to help.)

However, while Python 3 may be the future of Python, right now it is sadly just the future, and not the present.

If you want to use asyncio today, that means foregoing the significant performance benefits of pypy. (Even the beta releases of pypy3, which still routinely segfault for me, only support the language version 3.2, so no “yield from” syntax.) It means cutting yourself off from a significant (albeit gradually diminishing) subset of available Python libraries.

You could use the Python 2 backport of Tulip, Trollius, but since idiomatic Tulip code relies heavily on the new yield from syntax, it’s possible to write code that works on both, but not idiomatically. Trollius actually works on Python 3 as well, but then you miss out on one of the real marquee features of Tulip.

Also, while Twisted has a strict compatibility policy, asyncio is still marked as having provisional status, meaning that unlike the rest of the standard library, its API may change incompatibly in the next version of Python. While it’s unlikely that this will mean major changes for asyncio, since Python 3.4 was just released, it will be in this status for at least the next 18 months, until Python 3.5 arrives.

As opposed to the huge laundry list of functionality above, all of these reasons will eventually be invalidated, hopefully sooner rather than later; if these were the only reasons that Twisted were going to stick around, I would definitely be worried. However, they’re still reasons why today, even if you only need the pieces of an asynchronous I/O system that the new asyncio module offers, you still might want to choose Twisted’s core event loop APIs. Keep in mind that using Twisted today doesn’t cut you off from using asyncio in the future: far from it, it makes it likely that you will be able to easily integrate whatever new asyncio code you write once you adopt it. Twisted’s goal, as Laurens van Houtven eloquently explained it this year in a PyCon talk, is to work with absolutely everything, and that very definitely includes asyncio and Python 3.

My Own Feelings

I feel like asyncio is a step forward for Python, and, despite the dire consequences some people seemed to expect, a tremendous potential benefit to Twisted.

For years, while we – the Twisted team – were trying to build the “engine of your Internet”, we were also having to make a constant, tedious sales pitch for the event-driven model of programming. Even today, we’re still stuck writing rambling, digressive rants explaining why you might not want three threads for every socket, giving conference talks where we try to trick the audience into writing a callback, and trying to explain basic stuff about networking protocols to an unreceptive, frustrated audience.

This audience was unreceptive because the broader python community has been less than excited about event-driven networking and cooperative task coordination in general. It’s a vicious cycle: programmers think events look “unpythonic”, so they write their code to block. Other programmers then just want to make use of the libraries suitable to their task, and they find ones which couple (blocking) I/O together with basic data-processing tasks like parsing.

Oddly enough, I noticed a drop in the frequency that I needed to have this sort of argument once node.js started to gain some traction. Once server-side Python programmers started to hear all the time about how writing callbacks wasn't a totally crazy thing to be doing on the server, there was a whole other community to answer their questions about why that was.

With the advent of asyncio, there is functionality available in the standard library to make event-driven implementations of things immediately useful. Perhaps even more important this functionality, there is guidance on how to make your own event-driven stuff. Every module that is written using asyncio rather than io is a module that at least can be made to work natively within Twisted without rewriting or monkeypatching it.

In other words, this has shifted the burden of arguing that event-driven programming is a worthwhile thing to do at all from Twisted to a module in the language’s core.

While it’ll be quite a while before most Python programmers are able to use asyncio on a day to day basis its mere existence justifies the conceptual basis of Twisted to our core consituency of Python programmers who want to put an object on a network. Which, in turn, means that we can dedicate more of our energy to doing cool stuff with Twisted, and we can dedicate more of the time we spend educating people to explaining how to do cool things with all the crazy features Twisted provides rather than explaining why you would even want to write all these weird callbacks in the first place.

So Tulip is a good thing for Python, a good thing for Twisted, I’m glad it exists, and it doesn't make me worried at all about Twisted’s future.

Quite the opposite, in fact.

Panopticon Rift

Why exactly is it that Oculus Rift fans hate Facebook so much?

Greg Price expresses a common opinion here:

I myself had a similar reaction; despite not being particularly invested in VR specifically (I was not an Oculus backer) I felt pretty negatively about the fact that Facebook was the acquirer. I wasn’t quite sure why, at first, and after some reflection I’d like to share my thoughts with you on both why this is a bad thing and also why gamers, in particular, were disturbed by it.

The Oculus Rift really captured the imagination of the gaming community’s intelligentsia. John Carmack’s imprimatur alone was enough to get people interested, but the real power of the Oculus was to finally deliver on the promise of all those unbelievably clunky virtual reality headsets that we’ve played around with at one time or another.

Virtual Boy

The promise of Virtual Reality is, of course, to transport us so completely to another place and time that we cease to even be aware of the real world. It is that experience, that complete and overwhelming sense of being in an imagined place, that many gamers are looking for when they sit down in front of an existing game. It is the aspiration to that experience that makes “immersive” such a high compliment in game criticism.

Personally, immersion in a virtual world was the beginning of my real interest in computers. I’ve never been the kind of gamer who felt the need to be intensely competitive. I didn’t really care about rules and mechanics that much, at least not for their own sake. In fact, the term “gamer” is a bit of a misnomer - I’m more a connoisseur of interactive experiences. The very first “game” I remember really enjoying was Zork. Although Zork is a goal-directed game you can “win”, that didn’t interest me. Instead, I enjoyed wandering around the environments it provided, and experimenting with different actions to see what the computer would do.

Computer games are not really just one thing. The same term, “games”, encompasses wildly divergent experiences including computerized Solitaire, Silent Hill, Dyad, and Myst. Nevertheless, I think that pursuit of immersion – on really, fully, presently paying attention to an interactive experience – is a primary reason that “gamers” feel the need to distinguish themselves (ourselves?) from the casual computing public. Obviously, the fans of Oculus are among those most focused on immersive experiences.

Gamers feel the need to set themselves apart because computing today is practically defined by a torrential cascade of things that we’re only barely paying attention to. What makes computing “today” different than the computing of yesteryear is that computing used to be about thinking, and now it’s about communication.

The advent of social media has left us not only focused on communication, but focused on constant, ephemeral, shallow communication. This is not an accident. In our present economy there is, after all, no such thing as a “social media business” (or, indeed, a “search engine business”); there are only ad agencies.

The purveyors of social media need you to be engaged enough with your friends that you won’t leave their sites, so there is some level of entertainment or interest they must bring to their transactions with you, of course; but they don’t want you to be so engaged that a distracting advertisement would be obviously crass and inappropriate. Lighthearted banter, pictures of shiba inus, and shallow gossip are fantastic fodder for this. Less so are long, soul-searching long-form writing explaining and examining your feelings.

An ad for cat food might seem fine if you’re chuckling at a picture of a dog in a hoodie saying “wow, very meme, such meta”. It’s less likely to drive you through to the terminus of that purchase conversion funnel if you’re intently focused on supporting a friend who is explaining to you how a cancer scare drove home how they’re doing nothing with their life, or how your friend’s sibling has run away from home and your friend doesn’t know if they’re safe.

Even if you’re a highly social extrovert, the dominant emotion that this torrent of ephemeral communication produces is, at best, sarcastic amusement. More likely, it produces constant anxiety. We do not experience our better selves when we’re not really paying focused attention to anything1. As Community Season 5 Episode 8 recently put it, somewhat more bluntly: “Mark Zuckerberg is Fidel Castro in flip-flops.”

I think that despite all the other reasons we’re all annoyed - the feelings of betrayal around the Kickstarter, protestations of Facebook’s creepyness, and so on - the root of the anger around the Facebook acquisition is the sense that this technology with so much potential to reverse the Balkanization of our attention has now been put directly into the hands of those who created the problem in the first place.

So now, instead of looking forward to a technology that will allow us to visit a world of pure imagination, we now eagerly await something that will shove distracting notifications and annoying advertisements literally an inch away from our eyeballs. Probably after charging us several hundred dollars for the privilege.

It seems to me that’s a pretty clear justification for a few hundred negative reddit comments.


  1. Paid link. See disclosures

Unyielding

Be as the reed, not the oak tree. Green threads are just threads.

The Oak and the Reed by Achille Michallon

… that which is hard and stiff
is the follower of death
that which is soft and yielding
is the follower of life …

the Tao Te Ching, chapter 76

Problem: Threads Are Bad

As we know, threads are a bad idea, (for most purposes). Threads make local reasoning difficult, and local reasoning is perhaps the most important thing in software development.

With the word “threads”, I am referring to shared-state multithreading, despite the fact that there are languages, like Erlang and Haskell which refer to concurrent processes – those which do not implicitly share state, and require explicit coordination – as “threads”.

My experience is mainly (although not exclusively) with Python but the ideas presented here should generalize to most languages which have global shared mutable state by default, which is to say, quite a lot of them: C (including Original Recipe, Sharp, Extra Crispy, Objective, and Plus Plus), JavaScript, Java, Scheme, Ruby, and PHP, just to name a few.

With the phrase “local reasoning”, I’m referring to the ability to understand the behavior (and thereby, the correctness) of a routine by examining the routine itself rather than examining the entire system.

When you’re looking at a routine that manipulates some state, in a single-tasking, nonconcurrent system, you only have to imagine the state at the beginning of the routine, and the state at the end of the routine. To imagine the different states, you need only to read the routine and imagine executing its instructions in order from top to bottom. This means that the number of instructions you must consider is n, where n is the number of instructions in the routine. By contrast, in a system with arbitrary concurrent execution – one where multiple threads might concurrently execute this routine with the same state – you have to read the method in every possible order, making the complexity nn.

Therefore it is – literally – exponentially more difficult to reason about a routine that may be executed from an arbitrary number of threads concurrently. Instead, you need to consider every possible caller across your program, understanding what threads they might be invoked from, or what state they might share. If you’re writing a library desgined to be thread-safe, then you must place some of the burden of this understanding on your caller.

The importance of local reasoning really cannot be overstated. Computer programs are, at least for the time being, constructed by human beings who are thinking thoughts. Correct computer programs are constructed by human beings who can simultaneously think thoughts about all the interactions that the portion of the system they’re developing will have with other portions.

A human being can only think about seven things at once, plus or minus two. Therefore, although we may develop software systems that contain thousands, millions, or billions of components over time, we must be able to make changes to that system while only holding in mind an average of seven things. Really bad systems will make us concentrate on nine things and we will only be able to correctly change them when we’re at our absolute best. Really good systems will require us to concentrate on only five, and we might be able to write correct code for them even when we’re tired.

Aside: “Oh Come On They’re Not That Bad”

Those of you who actually use threads to write real software are probably objecting at this point. “Nobody would actually try to write free-threading code like this,” I can hear you complain, “Of course we’d use a lock or a queue to introduce some critical sections if we’re manipulating state.”

Mutexes can help mitigate this combinatorial explosion, but they can’t eliminate it, and they come with their own cost; you need to develop strategies to ensure consistent ordering of their acquisition. Mutexes should really be used to build queues, and to avoid deadlocks those queues should be non-blocking but eventually a system which communicates exclusively through non-blocking queues effectively becomes a set of communicating event loops, and its problems revert to those of an event-driven system; it doesn’t look like regular programming with threads any more.

But even if you build such a system, if you’re using a language like Python (or the ones detailed above) where modules, classes, and methods are all globally shared, mutable state, it’s always possible to make an error that will affect the behavior of your whole program without even realizing that you’re interacting with state at all. You have to have a level of vigilance bordering on paranoia just to make sure that your conventions around where state can be manipulated and by whom are honored, because when such an interaction causes a bug it’s nearly impossible to tell where it came from.

Of course, threads are just one source of inscrutable, brain-bending bugs, and quite often you can make workable assumptions that preclude you from actually having to square the complexity of every single routine that you touch; for one thing, many computations don’t require manipulating state at all, and you can (and must) ignore lots of things that can happen on every line of code anyway. (If you think not, when was the last time you audited your code base for correct behavior in the face of memory allocation failures?) So, in a sense, it’s possible to write real systems with threads that perform more or less correctly for the same reasons it’s possible to write any software approximating correctness at all; we all need a little strength of will and faith in our holy cause sometimes.

Nevertheless I still think it’s a bad idea to make things harder for ourselves if we can avoid it.

Solution: Don’t Use Threads

So now I’ve convinced you that if you’re programming in Python (or one of its moral equivalents with respect to concurrency and state) you shouldn’t use threads. Great. What are you going to do instead?

There’s a lot of debate over the best way to do “asynchronous” programming - that is to say, “not threads”, four options are often presented.

  1. Straight callbacks: Twisted’s IProtocol, JavaScript’s on<foo> idiom, where you give a callback to something which will call it later and then return control to something (usually a main loop) which will execute those callbacks,
  2. “Managed” callbacks, or Futures: Twisted’s Deferred, JavaScript’s Promises/A[+], E’s Promises, where you create a dedicated result-that-will-be-available-in-the-future object and return it for the caller to add callbacks to,
  3. Explicit coroutines: Twisted’s @inlineCallbacks, Tulip’s yield from coroutines, C#’s async/await, where you have a syntactic feature that explicitly suspends the current routine,
  4. and finally, implicit coroutines: Java’s “green threads”, Twisted’s Corotwine, eventlet, gevent, where any function may switch the entire stack of the current thread of control by calling a function which suspends it.

One of these things is not like the others; one of these things just doesn’t belong.

Don’t Use Those Threads Either

Options 1-3 are all ways of representing the cooperative transfer of control within a stateful system. They are a semantic improvement over threads. Callbacks, Futures, and Yield-based coroutines all allow for local reasoning about concurrent operations.

So why does option 4 even show up in this list?

Unfortunately, “asynchronous” systems have often been evangelized by emphasizing a somewhat dubious optimization which allows for a higher level of I/O-bound concurrency than with preemptive threads, rather than the problems with threading as a programming model that I’ve explained above. By characterizing “asynchronousness” in this way, it makes sense to lump all 4 choices together.

I’ve been guilty of this myself, especially in years past: saying that a system using Twisted is more efficient than one using an alternative approach using threads. In many cases that’s been true, but:

  1. the situation is almost always more complicated than that, when it comes to performance,
  2. “context switching” is rarely a bottleneck in real-world programs, and
  3. it’s a bit of a distraction from the much bigger advantage of event-driven programming, which is simply that it’s easier to write programs at scale, in both senses (that is, programs containing lots of code as well as programs which have many concurrent users).

A system that presents “implicit coroutines” – those which may transfer control to another concurrent task at any layer of the stack without any syntactic indication that this may happen – are simply the dubious optimization by itself.

Despite the fact that implicit coroutines masquerade under many different names, many of which don’t include the word “thread” – for example, “greenlets”, “coroutines”, “fibers”, “tasks” – green or lightweight threads are indeed threads, in that they present these same problems. In the long run, when you build a system that relies upon them, you eventually have all the pitfalls and dangers of full-blown preemptive threads. Which, as shown above, are bad.

When you look at the implementation of a potentially concurrent routine written using callbacks or yielding coroutines, you can visually see exactly where it might yield control, either to other routines, or perhaps even re-enter the same routine concurrently. If you are using callbacks – managed or otherwise – you will see a return statement, or the termination of a routine, which allows execution of the main loop to potentially continue. If you’re using explicit coroutines, you’ll see a yield (or await) statement which suspends the coroutine. Because you can see these indications of potential concurrency, they’re outside of your mind, in your text editor, and you don’t need to actively remember them as you’re working on them.

You can think of these explicit yield-points as places where your program may gracefully bend to the needs of concurrent inputs. Crumple zones, or relief valves, for your logic, if you will: a single point where you have to consider the implications of a transfer of control to other parts of your program, rather than a rigid routine which might transfer (break) at any point beyond your control.

Like crumple zones, you shouldn’t have too many of them, or they lose their effectiveness. A long routine which has an explicit yield point before every single instruction requires just as much out-of-order reasoning, and is therefore just as error-prone as one which has none, but might context switch before any instruction anyway. The advantage of having to actually insert the yield point explicitly is that at least you can see when a routine has this problem, and start to clean up and consolidate the mangement of its concurrency.

But this is all pretty abstract; let me give you a specific practical example, and a small theoretical demonstration.

The Buggiest Bug

Brass Cockroach - Image Credit GlamourGirlBeads http://www.etsy.com/listing/62042780/large-antiqued-brass-cockroach1-ants3074

When we wrote the very first version of Twisted Reality in Python, the version we had previously written in Java was already using green threads; at the time, the JVM didn’t have any other kind of threads. The advantage to the new networking layer that we developed was not some massive leap forward in performance (the software in question was a multiplayer text adventure, which at the absolute height of its popularity might have been played by 30 people simultaneously) but rather the dramatic reduction in the number and severity of horrible, un-traceable concurrency bugs. One, in particular, involved a brass, mechanical cockroach which would crawl around on a timer, leaping out of a player’s hands if it was in their inventory, moving between rooms if not. In the multithreaded version, the cockroach would leap out of your hands but then also still stay in your hands. As the cockroach moved between rooms it would create shadow copies of itself, slowly but inexorably creating a cockroach apocalypse as tens of thousands of pointers to the cockroach, each somehow acquiring their own timer, scuttled their way into every player’s inventory dozens of times.

Given that the feeling that this particular narrative feature was supposed to inspire was eccentric whimsy and not existential terror, the non-determinism introduced by threads was a serious problem. Our hope for the even-driven re-write was simply that we’d be able to diagnose the bug by single-stepping through a debugger; instead, the bug simply disappeared. (Echoes of this persist, in that you may rarely hear a particularly grizzled Twisted old-timer refer to a particularly intractable bug as a “brass cockroach”.)

The original source of the bug was so completely intractable that the only workable solution was to re-write the entire system from scratch. Months of debugging and testing and experimenting could still reproduce it only intermittently, and several “fixes” (read: random, desperate changes to the code) never resulted in anything.

I’d rather not do that ever again.

Ca(sh|che Coherent) Money

Despite the (I hope) entertaining nature of that anecdote, it still might be somewhat hard to visualize how concurrency results in a bug like that, and the code for that example is far too sprawling to be useful as an explanation. So here's a smaller in vitro example. Take my word for it that the source of the above bug was the result of many, many intersecting examples of the problem described below.

As it happens, this is the same variety of example Guido van Rossum gives when he describes why chose to use explicit coroutines instead of green threads for the upcoming standard library asyncio module, born out of the “tulip” project, so it's happened to more than one person in real life.

Photo Credit: Ennor https://www.flickr.com/photos/ennor/441394582/sizes/l/

Let’s say we have this program:

1
2
3
4
5
6
7
8
9
def transfer(amount, payer, payee, server):
    if not payer.sufficient_funds_for_withdrawl(amount):
        raise InsufficientFunds()
    log("{payer} has sufficient funds.", payer=payer)
    payee.deposit(amount)
    log("{payee} received payment", payee=payee)
    payer.withdraw(amount)
    log("{payer} made payment", payer=payer)
    server.update_balances([payer, payee])

(I realize that the ordering of operations is a bit odd in this example, but it makes the point easier to demonstrate, so please bear with me.)

In a world without concurrency, this is of course correct. If you run transfer twice in a row, the balance of both accounts is always correct. But if we were to run transfer with the same two accounts in an arbitrary number of threads simultaneously, it is (obviously, I hope) wrong. One thread could update a payer’s balance below the funds-sufficient threshold after the check to see if they’re sufficient, but before issuing the withdrawl.

So, let’s make it concurrent, in the PEP 3156 style. That update_balances routine looks like it probably has to do some network communication and block, so let’s consider that it is as follows:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
@coroutine
def transfer(amount, payer, payee, server):
    if not payer.sufficient_funds_for_withdrawl(amount):
        raise InsufficientFunds()
    log("{payer} has sufficient funds.", payer=payer)
    payee.deposit(amount)
    log("{payee} received payment", payee=payee)
    payer.withdraw(amount)
    log("{payer} made payment", payer=payer)
    yield from server.update_balances([payer, payee])

So now we have a trivially concurrent, correct version of this routine, although we did have to update it a little. Regardless of what sufficient_funds_for_withdrawl, deposit and withdrawl do - even if they do network I/O - we know that we aren’t waiting for any of them to complete, so they can’t cause transfer to interfere with itself. For the sake of a brief example here, we’ll have to assume update_balances is a bit magical; for this to work our reads of the payer and payee’s balance must be consistent.

But if we were to use green threads as our “asynchronous” mechanism rather than coroutines and yields, we wouldn’t need to modify the program at all! Isn’t that better? And only update_balances blocks anyway, so isn’t it just as correct?

Sure: for now.

But now let’s make another, subtler code change: our hypothetical operations team has requested that we put all of our log messages into a networked log-gathering system for analysis. A reasonable request, so we alter the implementation of log to write to the network.

Now, what will we have to do to modify the green-threaded version of this code? Nothing! This is usually the point where fans of various green-threading systems will point and jeer, since once the logging system is modified to do its network I/O, you don’t even have to touch the code for the payments system. Separation of concerns! Less pointless busy-work! Looks like the green-threaded system is winning.

Oh well. Since I’m still a fan of explicit concurrency management, let’s do the clearly unnecessary busy-work of updating the ledger code.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
@coroutine
def transfer(amount, payer, payee, server):
    if not payer.sufficient_funds_for_withdrawl(amount):
        raise InsufficientFunds()
    yield from log("{payer} has sufficient funds.", payer=payer)
    payee.deposit(amount)
    yield from log("{payee} received payment", payee=payee)
    payer.withdraw(amount)
    yield from log("{payer} made payment", payer=payer)
    yield from server.update_balances([payer, payee])

Well okay, at least that wasn’t too hard, if somewhat tedious. Sigh. I guess we can go update all of the ledger’s callers now and update them too…

…wait a second.

In order to update this routine for a non-blocking version of log, we had to type a yield keyword between the sufficient_funds_for_withdrawl check and the withdraw call, between the deposit and the withdraw call, and between the withdraw and update_balances call. If we know a little about concurrency and a little about what this program is doing, we know that every one of those yield froms are a potential problem. If those log calls start to back up and block, a payer may have their account checked for sufficient funds, then funds could be deducted while a log message is going on, leaving them with a negative balance.

If we were in the middle of updating lots of code, we might have blindly added these yield keywords without noticing that mistake. I've certainly done that in the past, too. But just the mechanical act of typing these out is an opportunity to notice that something’s wrong, both now and later. Even if we get all the way through making the changes without realizing the problem, when we notice that balances are off, we can look only (reasoning locally!) at the transfer routine and realize, when we look at it, based on the presence of the yield from keywords, that there is something wrong with the transfer routine itself, regardless of the behavior of any of the things it’s calling.

In the process of making all these obviously broken modifications, another thought might occur to us: do we really need to wait before log messages are transmitted to the logging system before moving on with our application logic? The answer would almost always be “no”. A smart implementation of log could simply queue some outbound messages to the logging system, then discard if too many are buffered, removing any need for its caller to honor backpressure or slow down if the logging system can’t keep up. Consider the way syslog says “and N more” instead of logging certain messages repeatedly. That feature allows it to avoid filling up logs with repeated messages, and decreases the amount of stuff that needs to be buffered if writing the logs to disk is slow.

All the extra work you need to do when you update all the callers of log when you make it asynchronous is therefore a feature. Tedious as it may be, the asynchronousness of an individual function is, in fact, something that all of its callers must be aware of, just as they must be aware of its arguments and its return type.

In fact you are changing its return type: in Twisted, that return type would be Deferred, and in Tulip, that return type is a new flavor of generator. This new return type represents the new semantics that happen when you make a function start having concurrency implications.

Haskell does this as well, by embedding the IO monad in the return type of any function which needs to have side-effects. This is what certain people mean when they say Deferreds are a Monad.

The main difference between lightweight and heavyweight threads is that it is that, with rigorous application of strict principles like “never share any state unnecessarily”, and “always write tests for every routine at every point where it might suspend”, lightweight threads make it at least possible to write a program that will behave deterministically and correctly, assuming you understand it in its entirety. When you find a surprising bug in production, because a routine that is now suspending in a place it wasn’t before, it’s possible with a lightweight threading system to write a deterministic test that will exercise that code path. With heavyweight threads, any line could be the position of a context switch at any time, so it’s just not tractable to write tests for every possible order of execution.

However, with lightweight threads, you still can’t write a test to discover when a new yield point might be causing problems, so you're still always playing catch-up.

Although it’s possible to do this, it remains very challenging. As I described above, in languages like Python, Ruby, JavaScript, and PHP, even the code itself is shared, mutable state. Classes, types, functions, and namespaces are all shared, and all mutable. Libraries like object relational mappers commonly store state on classes.

No Shortcuts

Despite the great deal of badmouthing of threads above, my main purpose in writing this was not to convince you that threads are, in fact, bad. (Hopefully, you were convinced before you started reading this.) What I hope I’ve demonstrated is that if you agree with me that threading has problematic semantics, and is difficult to reason about, then there’s no particular advantage to using microthreads, beyond potentially optimizing your multithreaded code for a very specific I/O bound workload.

There are no shortcuts to making single-tasking code concurrent. It's just a hard problem, and some of that hard problem is reflected in the difficulty of typing a bunch of new concurrency-specific code.

So don’t be fooled: a thread is a thread regardless of its color. If you want your program to be supple and resilient in the face of concurrency, when the storm of concurrency blows, allow it to change. Tell it to yield, just like the reed. Otherwise, just like the steadfast and unchanging oak tree in the storm, your steadfast and unchanging algorithms will break right in half.