Unsigned Commits

I’m not going to cryptographically sign my git commits, and you shouldn’t either.

I am going to tell you why I don’t think you should sign your Git commits, even though doing so with SSH keys is now easier than ever. But first, to contextualize my objection, I have a brief hypothetical for you, and then a bit of history from the evolution of security on the web.

paper reading “Sign Here:” with a pen poised over it

It seems like these days, everybody’s signing all different kinds of papers.

Bank forms, permission slips, power of attorney; it seems like if you want to securely validate a document, you’ve gotta sign it.

So I have invented a machine that automatically signs every document on your desk, just in case it needs your signature. Signing is good for security, so you should probably get one, and turn it on, just in case something needs your signature on it.

We also want to make sure that verifying your signature is easy, so we will have them all notarized and duplicates stored permanently and publicly for future reference.

No? Not interested?

Hopefully, that sounded like a silly idea to you.

Most adults in modern civilization have learned that signing your name to a document has an effect. It is not merely decorative; the words in the document being signed have some specific meaning and can be enforced against you.

In some ways the metaphor of “signing” in cryptography is bad. One does not “sign” things with “keys” in real life. But here, it is spot on: a cryptographic signature can have an effect.

It should be an input to some software, one that is acted upon. Software does a thing differently depending on the presence or absence of a signature. If it doesn’t, the signature probably shouldn’t be there.

Consider the most venerable example of encryption and signing that we all deal with every day: HTTPS. Many years ago, browsers would happily display unencrypted web pages. The browser would also encrypt the connection, if the server operator had paid for an expensive certificate and correctly configured their server. If that operator messed up the encryption, it would pop up a helpful dialog box that would tell the user “This website did something wrong that you cannot possibly understand. Would you like to ignore this and keep working?” with buttons that said “Yes” and “No”.

Of course, these are not the precise words that were written. The words, as written, said things about “information you exchange” and “security certificate” and “certifying authorities” but “Yes” and “No” were the words that most users read. Predictably, most users just clicked “Yes”.

In the usual case, where users ignored these warnings, it meant that no user ever got meaningful security from HTTPS. It was a component of the web stack that did nothing but funnel money into the pockets of certificate authorities and occasionally present annoying interruptions to users.

In the case where the user carefully read and honored these warnings in the spirit they were intended, adding any sort of transport security to your website was a potential liability. If you got everything perfectly correct, nothing happened except the browser would display a picture of a small green purse. If you made any small mistake, it would scare users off and thereby directly harm your business. You would only want to do it if you were doing something that put a big enough target on your site that you became unusually interesting to attackers, or were required to do so by some contractual obligation like credit card companies.

Keep in mind that the second case here is the best case.

In 2016, the browser makers noticed this problem and started taking some pretty aggressive steps towards actually enforcing the security that HTTPS was supposed to provide, by fixing the user interface to do the right thing. If your site didn’t have security, it would be shown as “Not Secure”, a subtle warning that would gradually escalate in intensity as time went on, correctly incentivizing site operators to adopt transport security certificates. On the user interface side, certificate errors would be significantly harder to disregard, making it so that users who didn’t understand what they were seeing would actually be stopped from doing the dangerous thing.

Nothing fundamental¹ changed about the technical aspects of the cryptographic primitives or constructions being used by HTTPS in this time period, but socially, the meaning of an HTTP server signing and encrypting its requests changed a lot.

Now, let’s consider signing Git commits.

You may have heard that in some abstract sense you “should” be signing your commits. GitHub puts a little green “verified” badge next to commits that are signed, which is neat, I guess. They provide “security”. 1Password provides a nice UI for setting it up. If you’re not a 1Password user, GitHub itself recommends you put in just a few lines of configuration to do it with either a GPG, SSH, or even an S/MIME key.

But while GitHub’s documentation quite lucidly tells you how to sign your commits, its explanation of why is somewhat less clear. Their purse is the word “Verified”; it’s still green. If you enable “vigilant mode”, you can make the blank “no verification status” option say “Unverified”, but not much else changes.

This is like the old-style HTTPS verification “Yes”/“No” dialog, except that there is not even an interruption to your workflow. They might put the “Unverified” status on there, but they’ve gone ahead and clicked “Yes” for you.

It is tempting to think that the “HTTPS” metaphor will map neatly onto Git commit signatures. It was bad when the web wasn’t using HTTPS, and the next step in that process was for Let’s Encrypt to come along and for the browsers to fix their implementations. Getting your certificates properly set up in the meanwhile and becoming familiar with the tools for properly doing HTTPS was unambiguously a good thing for an engineer to do. I did, and I’m quite glad I did so!

However, there is a significant difference: signing and encrypting an HTTPS request is ephemeral; signing a Git commit is functionally permanent.

This ephemeral nature meant that errors in the early HTTPS landscape were easily fixable. Earlier I mentioned that there was a time where you might not want to set up HTTPS on your production web servers, because any small screw-up would break your site and thereby your business. But if you were really skilled and you could see the future coming, you could set up monitoring, avoid these mistakes, and rapidly recover. These mistakes didn’t need to badly break your site.

We can extend the analogy to HTTPS, but we have to take a detour into one of the more unpleasant mistakes in HTTPS’s history: HTTP Public Key Pinning, or “HPKP”. The idea with HPKP was that you could publish a record in an HTTP header where your site commits² to using certain certificate authorities for a period of time, where that period of time could be “forever”. Attackers gonna attack, and attack they did. Even without getting attacked, a site could easily commit “HPKP Suicide” where they would pin the wrong certificate authority with a long timeline, and their site was effectively gone for every browser that had ever seen those pins. As a result, after a few years, HPKP was completely removed from all browsers.

Git commit signing is even worse. With HPKP, you could easily make terrible mistakes with permanent consequences even though you knew the exact meaning of the data you were putting into the system at the time you were doing it. With signed commits, you are saying something permanently, but you don’t really know what it is that you’re saying.

Today, what is the benefit of signing a Git commit? GitHub might present it as “Verified”. It’s worth noting that only GitHub will do this, since they are the root of trust for this signing scheme. So, by signing commits and registering your keys with GitHub, you are, at best, helping to lock in GitHub as a permanent piece of infrastructure that is even harder to dislodge because they are not only where your code is stored, but also the arbiters of whether or not it is trustworthy.

In the future, what is the possible security benefit? If we all collectively decide we want Git to be more secure, then we will need to meaningfully treat signed commits differently from unsigned ones.

There’s a long tail of unsigned commits several billion entries long. And those are in the permanent record as much as the signed ones are, so future tooling will have to be able to deal with them. If, as stewards of Git, we wish to move towards a more secure Git, as the stewards of the web moved towards a more secure web, we do not have the option that the web did. In the browser, the meaning of a plain-text HTTP or incorrectly-signed HTTPS site changed, in order to encourage the site’s operator to change the site to be HTTPS.

In contrast, the meaning of an unsigned commit cannot change, because there are zillions of unsigned commits lying around in critical infrastructure and we need them to remain there. Commits cannot meaningfully be changed to become signed retroactively. Unlike an online website, they are part of a historical record, not an operating program. So we cannot establish the difference in treatment by changing how unsigned commits are treated.

That means that tooling maintainers will need to provide some difference in behavior that provides some incentive. With HTTPS where the binary choice was clear: don’t present sites with incorrect, potentially compromised configurations to users. The question was just how to achieve that. With Git commits, the difference in treatment of a “trusted” commit is far less clear.

If you will forgive me a slight straw-man here, one possible naive interpretation is that a “trusted” signed commit is that it’s OK to run in CI. Conveniently, it’s not simply “trusted” in a general sense. If you signed it, it’s trusted to be from you, specifically. Surely it’s fine if we bill the CI costs for validating the PR that includes that signed commit to your GitHub account?

Now, someone can piggy-back off a 1-line typo fix that you made on top of an unsigned commit to some large repo, making you implicitly responsible for transitively signing all unsigned parent commits, even though you haven’t looked at any of the code.

Remember, also, that the only central authority that is practically trustable at this point is your GitHub account. That means that if you are using a third-party CI system, even if you’re using a third-party Git host, you can only run “trusted” code if GitHub is online and responding to requests for its “get me the trusted signing keys for this user” API. This also adds a lot of value to a GitHub credential breach, strongly motivating attackers to sneakily attach their own keys to your account so that their commits in unrelated repos can be “Verified” by you.

Let’s review the pros and cons of turning on commit signing now, before you know what it is going to be used for:

Pro	Con
Green “Verified” badge	Unknown, possibly unlimited future liability for the consequences of running code in a commit you signed
	Further implicitly cementing GitHub as a centralized trust authority in the open source world
	Introducing unknown reliability problems into infrastructure that relies on commit signatures
	Temporary breach of your GitHub credentials now lead to potentially permanent consequences if someone can smuggle a new trusted key in there
	New kinds of ongoing process overhead as commit-signing keys become new permanent load-bearing infrastructure, like “what do I do with expired keys”, “how often should I rotate these”, and so on

I feel like the “Con” column is coming out ahead.

That probably seemed like increasingly unhinged hyperbole, and it was.

In reality, the consequences are unlikely to be nearly so dramatic. The status quo has a very high amount of inertia, and probably the “Verified” badge will remain the only visible difference, except for a few repo-specific esoteric workflows, like pushing trust verification into offline or sandboxed build systems. I do still think that there is some potential for nefariousness around the “unknown and unlimited” dimension of any future plans that might rely on verifying signed commits, but any flaws are likely to be subtle attack chains and not anything flashy and obvious.

But I think that one of the biggest problems in information security is a lack of threat modeling. We encrypt things, we sign things, we institute rotation policies and elaborate useless rules for passwords, because we are looking for a “best practice” that is going to save us from having to think about what our actual security problems are.

I think the actual harm of signing git commits is to perpetuate an engineering culture of unquestioningly cargo-culting sophisticated and complex tools like cryptographic signatures into new contexts where they have no use.

Just from a baseline utilitarian philosophical perspective, for a given action A, all else being equal, it’s always better not to do A, because taking an action always has some non-zero opportunity cost even if it is just the time taken to do it. Epsilon cost and zero benefit is still a net harm. This is even more true in the context of a complex system. Any action taken in response to a rule in a system is going to interact with all the other rules in that system. You have to pay complexity-rent on every new rule. So an apparently-useless embellishment like signing commits can have potentially far-reaching consequences in the future.

Git commit signing itself is not particularly consequential. I have probably spent more time writing this blog post than the sum total of all the time wasted by all programmers configuring their git clients to add useless signatures; even the relatively modest readership of this blog will likely transfer more data reading this post than all those signatures will take to transmit to the various git clients that will read them. If I just convince you not to sign your commits, I don’t think I’m coming out ahead in the felicific calculus here.

What I am actually trying to point out here is that it is useful to carefully consider how to avoid adding junk complexity to your systems. One area where junk tends to leak in to designs and to cultures particularly easily is in intimidating subjects like trust and safety, where it is easy to get anxious and convince ourselves that piling on more stuff is safer than leaving things simple.

If I can help you avoid adding even a little bit of unnecessary complexity, I think it will have been well worth the cost of the writing, and the reading.

Acknowledgments

Thank you to my patrons who are supporting my writing on this blog. If you like what you’ve read here and you’d like to read more of it, or you’d like to support my various open-source endeavors, you can support my work as a sponsor! I am also available for consulting work if you think your organization could benefit from expertise on topics such as “What else should I not apply a cryptographic signature to?”.

Yes yes I know about heartbleed and Bleichenbacher attacks and adoption of forward-secret ciphers and CRIME and BREACH and none of that is relevant here, okay? Jeez. ↩
Do you see what I did there. ↩

Techniques for Actually Distributed Development with Git

It's all very wibbly wobbly and versiony wersiony.

git version control decentralization Wednesday September 03, 2014

The Setup

I have a lot of computers. Sometimes I'll start a project on one computer and get interrupted, then later find myself wanting to work on that same project, right where I left off, on another computer. Sometimes I'm not ready to publish my work, and I might still want to rebase it a few times (because if you're not arbitrarily rewriting history all the time you're not really using Git) so I don't want to push to @{upstream} (which is how you say "upstream" in Git).

I would like to be able to use Git to synchronize this work in progress between multiple computers. I would like to be able to easily automate this synchronization so that I don’t have to remember any special steps to sync up; one of the most focused times that I have available to get coding and writing work done is when I’m disconnected from the Internet, on a cross-country train or a plane trip, or sitting near a beach, for 5-10 hours. It is very frustrating to realize only once I’m settled in and unable to fetch the code, that I don’t actually have it on my laptop because I was last doing something on my desktop. I would particularly like to be able to that offline time to resolve tricky merge conflicts; if there are two versions of a feature, I want to have them both available locally while I'm disconnected.

Completely Fake, Made-Up History That Is A Lie

As everyone knows, Git is a centralized version control system created by the popular website GitHub as a command-line client for its "forking" HTML API. Alternate central Git authorities have been created by other startups following in the wave following GitHub's success, such as BitBucket and GitLab.

It may surprise many younger developers to know that when GitHub first created Git, it was originally intended to be a distributed version control system, where it was possible to share code with no particular central authority!

Although the feature has been carefully hidden from the casual user, with a bit of trickery you can enable re-enable it!

Technique 0: Understanding What's Going On

It's a bit confusing to have to actually set up multiple computers to test these things, so one useful thing to understand is that, to Git, the place you can fetch revisions from and push revisions to is a repository. Normally these are identified by URLs which identify hosts on the Internet, but you can also just indicate a path name on your computer. So for example, we can simulate a "network" of three computers with three clones of a repository like this:

$ mkdir tmp
$ cd tmp/
$ mkdir a b c
$ for repo in a b c; do (cd $repo; git init); done
Initialized empty Git repository in .../tmp/a/.git/
Initialized empty Git repository in .../tmp/b/.git/
Initialized empty Git repository in .../tmp/c/.git/
$ 

This creates three separate repositories. But since they're not clones of each other, none of them have any remotes, and none of them can push or pull from each other. So how do we teach them about each other?

$ cd a
$ git remote add b ../b
$ git remote add c ../c
$ cd ../b
$ git remote add a ../a
$ git remote add c ../c
$ cd ../c
$ git remote add a ../a
$ git remote add b ../b

Now, you can go into a and type git fetch --all and it will fetch from b and c, and similarly for git fetch --all in b and c.

To turn this into a practical multiple-machine scenario, rather than specifying a path like ../b, you would specify an SSH URL as your remote URL, and turn SSH on on each of your machines ("Remote Login" in the Sharing preference pane on the mac, if you aren't familiar with doing that on a mac).

So, for example, if you have a home desktop tweedledee and a work laptop tweedledum, you can do something like this:

tweedledee:~ neo$ mkdir foo; cd foo
tweedledee:foo neo$ git init .
# ...
tweedledum:~ m_anderson$ mkdir bar; cd bar
tweedledum:bar m_anderson$ git init .
tweedledum:bar m_anderson$ git remote add tweedledee neo@tweedledee.local:foo
# ...
tweedledee:foo neo$ git remote add tweedledum m_anderson$@tweedledum.local:foo

I don't know the names of the hosts on your network. So, in order to make it possible for you to follow along exactly, I'll use the repositories that I set up above, with path-based remotes, in the following examples.

Technique 1 (Simple): Always Only Fetch, Then Merge

Git repositories are pretty boring without any commits, so let's create a commit:

$ cd ../a
$ echo 'some data' > data.txt
$ git add data.txt
$ git ci -m "data"
[master (root-commit) 8dc3db4] data
 1 file changed, 1 insertion(+)
 create mode 100644 data.txt

Now on our "computers" b and c, we can easily retrieve this commit:

$ cd ../b/
$ git fetch --all
Fetching a
remote: Counting objects: 3, done.
remote: Total 3 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
From ../a
 * [new branch]      master     -> a/master
Fetching c
$ git merge a/master
$ ls
data.txt
$ cd ../c
$ ls
$ git fetch --all
Fetching a
remote: Counting objects: 3, done.
remote: Total 3 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
From ../a
 * [new branch]      master     -> a/master
Fetching b
From ../b
 * [new branch]      master     -> b/master
$ git merge b/master
$ ls
data.txt

If we make a change on b, we can easily pull it into a as well.

$ cd ../b
$ echo 'more data' > data.txt 
$ git commit data.txt -m "more data"
[master f3d4165] more data
 1 file changed, 1 insertion(+), 1 deletion(-)
$ cd ../a
$ git fetch --all
Fetching b
remote: Counting objects: 5, done.
remote: Total 3 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
From ../b
 * [new branch]      master     -> b/master
Fetching c
From ../c
 * [new branch]      master     -> c/master
$ git merge b/master
Updating 8dc3db4..f3d4165
Fast-forward
 data.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

This technique is quite workable, except for one minor problem.

The Minor Problem

Let's say you're sitting on your laptop and your battery is about to die. You want to push some changes from your laptop to your desktop. Your SSH key, however, is plugged in to your laptop. So you just figure you'll push from your laptop to your desktop. Unfortunately, if you try, you'll see something like this:

$ cd ../a/
$ echo 'even more data' >> data.txt 
$ git commit data.txt -m "even more data"
[master a9f3d89] even more data
 1 file changed, 1 insertion(+)
$ git push b master
Counting objects: 7, done.
Writing objects: 100% (3/3), 260 bytes | 0 bytes/s, done.
Total 3 (delta 0), reused 0 (delta 0)
remote: error: refusing to update checked out branch: refs/heads/master
remote: error: By default, updating the current branch in a non-bare repository
remote: error: is denied, because it will make the index and work tree inconsistent
remote: error: with what you pushed, and will require 'git reset --hard' to match
remote: error: the work tree to HEAD.
remote: error: 
remote: error: You can set 'receive.denyCurrentBranch' configuration variable to
remote: error: 'ignore' or 'warn' in the remote repository to allow pushing into
remote: error: its current branch; however, this is not recommended unless you
remote: error: arranged to update its work tree to match what you pushed in some
remote: error: other way.
remote: error: 
remote: error: To squelch this message and still keep the default behaviour, set
remote: error: 'receive.denyCurrentBranch' configuration variable to 'refuse'.
To ../b
 ! [remote rejected] master -> master (branch is currently checked out)
error: failed to push some refs to '../b'

While you're reading this, you fail your will save and become too bored to look at a computer any more.

Too late, your battery is dead! Hopefully you didn't lose any work.

In other words: sometimes it's nice to be able to push changes as well.

Technique 1.1: The Manual Workaround

The problem that you're facing here is that b has its master branch checked out, and is therefore rejecting changes to that branch. Your commits have actually all been "uploaded" to b, and are present in that repository, but there is no branch pointing to them. Doing either of those configuration things that Git warns you about in order to force it to allow it is a bad idea, though; if your working tree and your index and your your commits don't agree with each other, you're just asking for trouble. Git is confusing enough as it is.

In order work around this, you can just push your changes in master on a to a diferent branch on b, and then merge it later, like so:

$ git push b master:master_from_a
Counting objects: 7, done.
Writing objects: 100% (3/3), 260 bytes | 0 bytes/s, done.
Total 3 (delta 0), reused 0 (delta 0)
To ../b
 * [new branch]      master -> master_from_a
$ cd ../b
$ git merge master_from_a 
Updating f3d4165..a9f3d89
Fast-forward
 data.txt | 1 +
 1 file changed, 1 insertion(+)
$ 

This works just fine, you just always have to manually remember which branches you want to push, and where, and where you're pushing from.

Technique 2: Push To Reverse Pull To Remote Remotes

The astute reader will have noticed at this point that git already has a way of tracking "other places that changes came from", they're called remotes! And in fact b already has a remote called a pointing at ../a. Once you're sitting in front of your b computer again, wouldn't you rather just have those changes already in the a remote, instead of in some branch you have to specifically look for?

What if you could just push your branches from a into that remote? Well, friend, I'm here today to tell you you can.

First, head back over to a...

$ cd ../a

And now, all you need is this entirely straightforward and obvious command:

$ git config remote.b.push '+refs/heads/*:refs/remotes/a/*'

and now, when you git push b from a, you will push those branches into b's "a" remote, as if you had done git fetch a while in b.

$ git push b
Total 0 (delta 0), reused 0 (delta 0)
To ../b
   8dc3db4..a9f3d89  master -> a/master

So, if we make more changes:

$ echo 'YET MORE data' >> data.txt
$ git commit data.txt -m "You get the idea."
[master c641a41] You get the idea.
 1 file changed, 1 insertion(+)

we can push them to b...

$ git push b
Counting objects: 5, done.
Writing objects: 100% (3/3), 272 bytes | 0 bytes/s, done.
Total 3 (delta 0), reused 0 (delta 0)
To ../b
   a9f3d89..c641a41  master -> a/master

and when we return to b...

$ cd ../b/

there's nothing to fetch, it's all been pre-fetched already, so

$ git fetch a

produces no output.

But there is some stuff to merge, so if we took b on a plane with us:

$ git merge a/master
Updating a9f3d89..c641a41
Fast-forward
 data.txt | 1 +
 1 file changed, 1 insertion(+)

we can merge those changes in whenever we please!

More importantly, unlike the manual-syncing solution, this allows us to push multiple branches on a to b without worrying about conflicts, since the a remote on b will always only be updated to reflect the present state of a and should therefore never have conflicts (and if it does, it's because you rewrote history and you should be able to force push with no particular repercussions).

Generalizing

Here's a shell function which takes 2 parameters, "here" and "there". "here" is the name of the current repository - meaning, the name of the remote in the other repository that refers to this one - and "there" is the name of the remote which refers to another repository.

function remoteremote () {
    local here="$1"; shift;
    local there="$1"; shift;

    git config "remote.$there.push" "+refs/heads/*:refs/remotes/$here/*";
}

In the above example, we could have used this shell function like so:

$ cd ../a
$ remoteremote a b

I now use this all the time when I check out a repository on multiple machines for the first time; I can then always easily push my code to whatever machine I’m going to be using next.

I really hope at least one person out there has equally bizarre usage patterns of version control systems and finds this post useful. Let me know!

Acknowledgements

Thanks very much to Tom Prince for the information that lead to this post being worth sharing, and Jenn Schiffer for teaching me that it is OK to write jokes sometimes, except about javascript which is very serious and should not be made fun of ever.