mercredi 10 avril 2013

Git's branches are Sticky-notes

Take this quote that circulated on twitter
"Git gets easier once you get the basic idea that branches are homeomorphic endofunctors mapping submanifolds of a Hilbert space"
Now that is probably true but it is a most complicated way of explaining something very simple. This post will try to explain it in more human friendly terms and offer a useful metaphor (IMHO) for thinking about Git which is that of sticky notes for branches and tags.

The problem
It can be scary to do a merge ("what if I mess up the history"), and when you do get it wrong it's not very simple to undo that the first couple of times. Rebasing can also be quite awkward and seem like magic. And finally how can synchronisation between multiple repositories work (If I mess something up, I won't know how to get it right again). In short I was very careful in my first steps with git because I didn't want to mess things up, in particular with public repos. The following insights have helped me work more in harmony with Git and harness some of it's power. I'm writing this in hope they might useful to someone else.

Commits
First some basics. Git has commits. Commits are pointers to a full snapshot of the contents of the tracked files. They are identified by a sha1. It's very much like if you made a checksum of the checksums of all files.

A commit also contains a pointer to the its parent (i.e. the preceding commit). A commit with two or more parents is a merge commit, it has pointers to the commits that it merges.

Sticky notes
Git has sticky notes that you can stick to a commit. They're just a way of giving a commit a meaningful name. They are actually just a file with the sha1 of the commit as its sole . Lets look at the master branch, as (almost) all branches it is just a file residing in .git/refs/heads/

$ cat .git/refs/heads/master

ed31dcf89563d51010bca1620fbdef53a05c47a6



That's all there is to it! Now there are two types of sticky notes, tags and branches. Just like Post-Its you can move them around at your leisure. So lets say we want to move the master Post-It from where it is now to the commit with sha1 d23521... 
$ git branch --force master d23521da12cca757fac7c5f6b4729308b329c8ff

now the content of the master Post-It has changed
$ cat .git/refs/heads/master

d23521da12cca757fac7c5f6b4729308b329c8ff


Simple eh? You can even mess with the Post-Its of the team repository, but that will probably make your team mates just about as angry as if you moved around the Post-Its of your scrum board!

As I said there are two types of Post-Its: tags and branches. Tag Post-Its live in .git/refs/tags/ and there's not much more to them than that. Branch Post-Its live in .git/refs/heads/. What's special about branch sticky notes is that if you do a checkout on them they will follow your commits, so as to always point to the last commit - an example.
$ git checkout master
$ cat .git/refs/heads/master
d23521da12cca757fac7c5f6b4729308b329c8ff
$ ... modify some files
$ git commit

$ cat .git/refs/heads/master

6013c8f4133c175ff4bfeaf6af8bf3419f6bb6ba



In essence tags and branches are just pointers to a single commit.

Undoing a merge or rebase
Then it follows that if you mess up a merge you can easily redo it by checking out the commit you were on before the merge and then move the branch Post-it back. For instance lets say we're on the master
$ git merge somefeature

Let's undo!
$ git checkout HEADˆ                # HEAD means current commit, and the ^ means "the one before", we're no longer on the branch master.
$ git branch -f master HEAD      # moves the postit
$ git checkout master                 # we need to move back to the branch master

In practice you would rather do git reset --hard HEAD^, wich does all the above in one command. Btw there's an amazing doc about git reset.

Also if you messed up a rebase, even after it completed you can easily undo it by checking out the commit you were on before the rebase and moving along the branch Post-it, then checkout the branch Post-it at its new location.

Synchronisation with remotes

We're ready for some clarity about how pull and push to and from remote repositories work. When you do
$ git push origin master

That's a short for
$ git push origin master:refs/heads/master

That is, push any local commits on my master to the master branch on the remote repo origin. Or as we now see it : push my commit named master to the repo origin, and move the remote post-it (to the right of the colon) master to that commit. Under the hood Git will make sure that the commit-graph (all reachable parents from master) on both sides are identical and upload any missing commits.

Conclusion

In Git branches are just a way to name a commit.

I hope this post has taken away some of the strangeness and scariness of Git and that it has helped to see some of the commit-centric nature of git. I use this way of thinking all the time, in particular for merging, rebasing, reverting and in working with remotes. It has certainly made things easier and relieved me from the stress of doing something wrong.

In short I use this mental model should help when:
  • you merge two branches - you're just merging two commits using their name instead of their sha1
  • you're pushing or pulling to/from a remote repository -  there is absolutely no need for the names of the sticky notes on your repo and the remote repo to be the same. 
  • you're rebasing you're creating entirely new commits, then the the Post-It is moved to the tip of the new commits.
  • you do a bad merge, you can just move the Post-It down to where it was before and try again.
I just discovered Think like a git which provides very easy to read documentation