Johan Martinsson: Simplicité

Wednesday 10 April 2013

Git's branches are sticky notes

Take this quote that circulated on Twitter:

"Git gets easier once you get the basic idea that branches are homeomorphic endofunctors mapping submanifolds of a Hilbert space"

That is probably true, but it is a most complicated way of explaining something very simple. This post tries to explain it in more human-friendly terms and to offer what is (IMHO) a useful metaphor for thinking about Git: sticky notes for branches and tags.

The problem
It can be scary to do a merge ("what if I mess up the history?"), and when you do get it wrong, it's not easy to undo the first couple of times. Rebasing can also be quite awkward and seem like magic. And finally, how does synchronisation between multiple repositories work? (If I mess something up, I won't know how to put it right again.) In short, I was very careful in my first steps with Git because I didn't want to mess things up, particularly with public repos. The following insights have helped me work in harmony with Git and harness some of its power. I'm writing them down in the hope that they might be useful to someone else.

Commits
First some basics. Git has commits. A commit points to a full snapshot of the contents of the tracked files. It is identified by a sha1. It's very much as if you took a checksum of the checksums of all the files, with the commit's metadata (such as its parents and message) included too.

A commit also contains a pointer to its parent (i.e. the preceding commit). A commit with two or more parents is a merge commit: it has pointers to the commits that it merges.
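To make the "checksum of checksums" idea concrete, here is a toy model in Python. It is not Git's real object format, so the ids will not match what git prints. The helper names checksum, commit_id and is_merge are hypothetical. A commit id is computed from the file checksums, the parent ids and the message.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Hex-encoded SHA-1 of some bytes."""
    return hashlib.sha1(data).hexdigest()


def commit_id(files: dict[str, bytes], parents: list[str], message: str) -> str:
    """Toy commit id: a checksum over the checksums of every tracked file,
    plus the parent ids and the message. Real Git hashes differently
    formatted objects, so these ids will not match the ones git prints."""
    lines = [f"{path} {checksum(content)}" for path, content in sorted(files.items())]
    lines += [f"parent {parent}" for parent in parents]
    lines.append(message)
    return checksum("\n".join(lines).encode())


def is_merge(parents: list[str]) -> bool:
    """A merge commit is simply a commit with two or more parents."""
    return len(parents) >= 2


root = commit_id({"README": b"hello\n"}, [], "initial commit")
child = commit_id({"README": b"hello world\n"}, [root], "edit README")
```

Changing any file, any parent or the message changes the id. Because the parent ids are part of the input, a commit id also pins down the whole history behind it.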

Sticky notes
Git has sticky notes that you can stick on a commit. They're just a way of giving a commit a meaningful name. Each one is really just a file whose sole content is the sha1 of the commit. Let's look at the master branch. Like (almost) all branches, it is just a file residing in .git/refs/heads/

$ cat .git/refs/heads/master

ed31dcf89563d51010bca1620fbdef53a05c47a6

That's all there is to it! There are two types of sticky notes: tags and branches. Just like Post-Its, you can move them around at your leisure. So let's say we want to move the master Post-It from where it is now to the commit with sha1 d23521...

$ git branch --force master d23521da12cca757fac7c5f6b4729308b329c8ff

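Now the content of the master Post-It has changed (git branch --force refuses to move the branch you currently have checked out, so this assumes master isn't checked out):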
$ cat .git/refs/heads/master

d23521da12cca757fac7c5f6b4729308b329c8ff

Simple, eh? You can even mess with the Post-Its of the team repository, but that will probably make your teammates just about as angry as if you moved around the Post-Its on your scrum board!

As I said, there are two types of Post-Its: tags and branches. Tag Post-Its live in .git/refs/tags/ and there's not much more to them than that. Branch Post-Its live in .git/refs/heads/. What's special about branch Post-Its is that once you check one out, it follows your commits so that it always points to the latest one. An example:

$ git checkout master
$ cat .git/refs/heads/master
d23521da12cca757fac7c5f6b4729308b329c8ff
$ ... modify some files
$ git commit

$ cat .git/refs/heads/master

6013c8f4133c175ff4bfeaf6af8bf3419f6bb6ba

In essence, tags and branches are just pointers to a single commit.
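The sticky-note behaviour can be sketched as a toy model in Python. It assumes a hypothetical ToyRepo class where commits are bare ids with parent pointers and refs are plain dictionary entries. Only the pointer behaviour is modelled: a checked-out branch moves with each new commit, while a tag stays where it was stuck.

```python
class ToyRepo:
    """Toy model of refs as sticky notes. Not how Git stores anything on disk."""

    def __init__(self) -> None:
        self.parents: dict[str, tuple[str, ...]] = {"c0": ()}  # commit id -> parent ids
        self.branches: dict[str, str] = {"master": "c0"}       # like .git/refs/heads/
        self.tags: dict[str, str] = {}                          # like .git/refs/tags/
        self.head = ("branch", "master")                        # what is checked out
        self._next = 1

    def current(self) -> str:
        """The commit HEAD resolves to."""
        kind, value = self.head
        return self.branches[value] if kind == "branch" else value

    def checkout(self, name: str) -> None:
        if name in self.branches:
            self.head = ("branch", name)
        else:
            # a tag or a raw commit id: HEAD is detached, no note will follow us
            self.head = ("commit", self.tags.get(name, name))

    def tag(self, name: str) -> None:
        self.tags[name] = self.current()

    def commit(self) -> str:
        new_id = f"c{self._next}"
        self._next += 1
        self.parents[new_id] = (self.current(),)
        kind, value = self.head
        if kind == "branch":
            self.branches[value] = new_id   # the branch note follows the new commit
        else:
            self.head = ("commit", new_id)  # detached: only HEAD moves
        return new_id


repo = ToyRepo()
repo.tag("v1")
c1 = repo.commit()  # master now points to c1; the v1 tag still points to c0
```

The design choice is deliberate: nothing about a branch is stored "in" the commits. The only difference between the two kinds of note is what commit() does to them.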

Undoing a merge or rebase
It follows that if you mess up a merge, you can easily undo it: check out the commit you were on before the merge, then move the branch Post-It back there. For instance, let's say we're on master and run
$ git merge somefeature

Let's undo!
$ git checkout HEAD^ # HEAD means the current commit and ^ means "its (first) parent"; we're no longer on the branch master
$ git branch -f master HEAD # moves the master Post-It to the current commit
$ git checkout master # switch back to the branch master

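In practice you would rather run git reset --hard HEAD^, which does all of the above in one command (note that --hard also throws away any uncommitted changes). Both versions assume the merge created a merge commit: after a fast-forward merge, HEAD^ is not necessarily the commit you were on before. By the way, there's an excellent document about git reset.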
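Undoing a merge can be sketched as a toy model in Python, with hypothetical helpers merge and undo_last_merge working on plain dictionaries: commit id to parent ids, and note name to commit id. Undoing only moves the note back to the merge commit's first parent. The merge commit itself still exists; nothing is deleted.

```python
def merge(parents: dict, branches: dict, branch: str, other: str, new_id: str) -> None:
    """Create a merge commit (never a fast-forward in this toy model) whose first
    parent is the commit `branch` points to, then move the `branch` note onto it."""
    parents[new_id] = (branches[branch], branches[other])
    branches[branch] = new_id


def undo_last_merge(parents: dict, branches: dict, branch: str) -> None:
    """Move the `branch` note back to the first parent of the commit it points to,
    which is what `git reset --hard HEAD^` does to the note."""
    branches[branch] = parents[branches[branch]][0]


parents = {"a": (), "b": ("a",), "f": ("a",)}   # commit id -> parent ids
branches = {"master": "b", "somefeature": "f"}  # note name -> commit id
merge(parents, branches, "master", "somefeature", "m")
undo_last_merge(parents, branches, "master")
```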

Likewise, if you mess up a rebase, you can easily undo it, even after it has completed: check out the commit you were on before the rebase, move the branch Post-It there, then check out the branch again at its new location. If you no longer know that commit's sha1, git reflog lists where HEAD has been.
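Why a rebase is easy to undo can be sketched as a toy model in Python. The hypothetical rebase function below replays commits as brand-new commits, marked with a trailing apostrophe. It moves the note to the new tip and leaves the old commits untouched. It returns the old tip, playing the role the reflog plays in real Git, and follows first parents only.

```python
def rebase(parents: dict, branches: dict, branch: str, onto: str, base: str) -> str:
    """Replay the commits after `base` up to the tip of `branch` on top of `onto`.
    Assumes `base` is on the first-parent chain of the branch tip.
    Returns the old tip so the rebase can be undone by moving the note back."""
    old_tip = branches[branch]
    chain = []
    commit = old_tip
    while commit != base:               # walk back from the tip to the base
        chain.append(commit)
        commit = parents[commit][0]
    new_tip = onto
    for old in reversed(chain):         # oldest first
        new_id = f"{old}'"              # a NEW commit carrying the same change
        parents[new_id] = (new_tip,)
        new_tip = new_id
    branches[branch] = new_tip          # finally, move the sticky note
    return old_tip


parents = {"a": (), "m1": ("a",), "f1": ("a",), "f2": ("f1",)}
branches = {"master": "m1", "feature": "f2"}
old_tip = rebase(parents, branches, "feature", onto="m1", base="a")
```

Undoing is then a single assignment, branches["feature"] = old_tip, because f1 and f2 were never modified.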

Synchronisation with remotes

We're now ready to see clearly how pushing to and pulling from remote repositories works. When you run
$ git push origin master

that's shorthand for
$ git push origin master:refs/heads/master

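That is, push any local commits on my master to the master branch of the remote repo origin. Or, as we now see it: push my commit named master to the repo origin, and move the remote Post-It named master (the part to the right of the colon) to that commit. Under the hood, Git uploads whatever commits reachable from my master the remote doesn't have yet, so that the history behind that commit is identical on both sides. By default, the remote only accepts moving its Post-It if this is a fast-forward, i.e. if the commit it currently points to is an ancestor of the new one; otherwise the push is rejected unless you force it.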
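A push can be sketched as a toy model in Python: upload the missing reachable commits, then move the remote's note. The helpers ancestors and push are hypothetical and use plain dictionaries. The model mimics the default fast-forward check but none of Git's actual wire protocol. It also shows that the local and remote note names need not match.

```python
def ancestors(parents: dict, commit: str) -> set:
    """All commits reachable from `commit`, itself included."""
    seen, stack = set(), [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def push(local_parents: dict, local_refs: dict, remote_parents: dict,
         remote_refs: dict, src: str, dst: str, force: bool = False) -> set:
    """Toy model of `git push origin src:dst`. Returns the uploaded commit ids."""
    new_tip = local_refs[src]
    reachable = ancestors(local_parents, new_tip)
    old_tip = remote_refs.get(dst)
    if old_tip is not None and not force and old_tip not in reachable:
        raise ValueError(f"rejected: moving {dst} would not be a fast-forward")
    missing = {c for c in reachable if c not in remote_parents}
    for commit in missing:                  # upload what the remote lacks
        remote_parents[commit] = local_parents[commit]
    remote_refs[dst] = new_tip              # move the remote sticky note
    return missing


local_parents = {"a": (), "b": ("a",), "c": ("b",)}
local_refs = {"master": "c"}
remote_parents = {"a": (), "b": ("a",)}
remote_refs = {"master": "b"}
uploaded = push(local_parents, local_refs, remote_parents, remote_refs, "master", "master")
```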

Conclusion

In Git branches are just a way to name a commit.

I hope this post has taken away some of the strangeness and scariness of Git and helped you see its commit-centric nature. I use this way of thinking all the time, in particular for merging, rebasing, reverting and working with remotes. It has certainly made things easier and relieved me of the stress of doing something wrong.

In short, this mental model should help when:

you merge two branches: you're just merging two commits, using their names instead of their sha1s.
you're pushing to or pulling from a remote repository: there is absolutely no need for the names of the sticky notes on your repo and on the remote repo to be the same.
you rebase: you're creating entirely new commits, and then the Post-It is moved to the tip of the new commits.
you do a bad merge: you can just move the Post-It back to where it was before and try again.

I just discovered Think like a git, which provides very easy-to-read documentation.

Tuesday 18 October 2011

It's not so simple to make things simple

The short version of this post: we don't distinguish enough between “building with simple bricks” and “making objects that are simple to use”. They are not the same thing at all!

The difference:

Ease of use is

Contextual
Compound
Rather closed to change

Simple bricks are

General-purpose
Unitary
Open to change

It's possible to have both.

Here is the long version, with explanations and, I promise, examples.

In this keynote, Stuart Halloway explains that the word simple means something very fundamental. He argues that simple is defined as not compound, or not assembled.

He then goes on to explain why it is essential to build only from simple things. In particular, he says

Try to make something simple out of things that aren't simple … It's impossible

In other words, to get a simple library or application, you have to assemble it from simple things. He then ends his keynote by saying:

If a bit more code were written with respect for simplicity, the world would be a little better.

Well, in the end it was just the Single Responsibility Principle served with a new sauce. Fine. But a few days later, it occurred to me that there is a paradox: if I write all my code this way, it will be a pain to use, even for me. In particular, when a function or object is often used with the same parameters, it would be nice not to repeat them all over the code. Simple code is not always simple to use. On closer inspection there is no paradox, but thinking it through made me understand something that I'll share here.

To make life easy for those who use our code (often ourselves), we tend to write code that is simple to use. I'm convinced that when we fail to make something simple, it's often because we wanted to make something simple to use! And yet this compromise isn't necessary.

If we go back to the definition of simple, we see that simple is an intrinsic, non-contextual quality. Simplicity can be judged objectively.

With simple to use, it's the opposite. By definition, it's simple for the intended use. Change the context a little, and chances are it's no longer very simple to use. Simple to use is contextual.

We often find things simple to use because they resemble something familiar, for example a design pattern we've used before. Simple to use is subjective.

It would be a real shame to trade an "absolute" quality for a contextual and subjective one!

Take, for example, a BatchCrawler object that is part of a basic crawling application.

BatchCrawler is simple to use, because we don't have to provide many arguments to instantiate it. It is not Simple, however, because it does a whole lot of things besides crawling a list of urls:

It decides where the page-load timeout is configured.
It forces this timeout to be the same across the whole JVM, and we may even have to redeploy to change it.
It forces the use of a RetryingHttpCrawler, which retries on timeout and goes through the HTTP protocol.

This severely limits the object's potential for reuse.
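As a purely hypothetical Python sketch (not the author's code), such a "simple to use but not Simple" class could look like the block below. Only the names BatchCrawler and RetryingHttpCrawler come from the text. The CRAWLER_PAGE_TIMEOUT environment variable, read once per process, is an assumed stand-in for the JVM-wide setting, and the retry logic is invented for illustration.

```python
import os
import socket
import urllib.request

# Hypothetical: read once from the process environment when the module loads,
# so changing it means restarting (or redeploying) the whole process.
PAGE_TIMEOUT = float(os.environ.get("CRAWLER_PAGE_TIMEOUT", "10"))


class RetryingHttpCrawler:
    """Fetches one page over HTTP, retrying when a socket timeout is raised."""

    def __init__(self, timeout: float, retries: int = 3) -> None:
        self.timeout = timeout
        self.retries = retries

    def crawl(self, url: str) -> bytes:
        for attempt in range(self.retries):
            try:
                with urllib.request.urlopen(url, timeout=self.timeout) as response:
                    return response.read()
            except socket.timeout:
                if attempt == self.retries - 1:
                    raise
        raise ValueError("retries must be at least 1")


class BatchCrawler:
    """Easy to use (only the urls are needed), but not Simple: it also decides
    where the timeout comes from and which page crawler is used."""

    def __init__(self, urls: list[str]) -> None:
        self.urls = urls
        self.page_crawler = RetryingHttpCrawler(timeout=PAGE_TIMEOUT)

    def crawl_all(self) -> dict[str, bytes]:
        return {url: self.page_crawler.crawl(url) for url in self.urls}
```

Each of the three constraints listed above is visible in the constructor, which is exactly what makes the class hard to reuse in another context.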

For the object to qualify as Simple, its constructor should receive its collaborators as parameters instead of creating them itself.

But then it would become a pain to use. If this object is instantiated from several places in the application, there would be duplication. So we reintroduce the convenient constructor as a Factory Method with an explanatory name (which serves as documentation), because it represents one particular way of configuring this object graph.
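A hypothetical Python sketch of the Simple version: BatchCrawler receives its page crawler instead of creating it. A factory method with an explanatory name restores the convenience for the known use case. The factory name with_retrying_http is invented, and PageCrawler is an assumed protocol. RetryingHttpCrawler here is only a minimal stand-in so the block runs on its own.

```python
import socket
import urllib.request
from typing import Protocol


class PageCrawler(Protocol):
    """Anything that can fetch one page."""

    def crawl(self, url: str) -> bytes: ...


class RetryingHttpCrawler:
    """Minimal stand-in: fetches a page over HTTP, retrying on socket timeouts."""

    def __init__(self, timeout: float, retries: int = 3) -> None:
        self.timeout = timeout
        self.retries = retries

    def crawl(self, url: str) -> bytes:
        for attempt in range(self.retries):
            try:
                with urllib.request.urlopen(url, timeout=self.timeout) as response:
                    return response.read()
            except socket.timeout:
                if attempt == self.retries - 1:
                    raise
        raise ValueError("retries must be at least 1")


class BatchCrawler:
    """Simple: it only crawls a list of urls with whatever page crawler it is given."""

    def __init__(self, urls: list[str], page_crawler: PageCrawler) -> None:
        self.urls = urls
        self.page_crawler = page_crawler

    def crawl_all(self) -> dict[str, bytes]:
        return {url: self.page_crawler.crawl(url) for url in self.urls}

    @classmethod
    def with_retrying_http(cls, urls: list[str], timeout: float) -> "BatchCrawler":
        """Factory method: one named, documented way of wiring the object graph."""
        return cls(urls, RetryingHttpCrawler(timeout))
```

Tests (or any new context) can now pass a different PageCrawler without touching BatchCrawler, while everyday callers use the factory method.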

As with any simplified example, this is never too serious: it's still small enough to be fixed after the fact. In a real application, though, it can be a nightmare, because we regularly end up initialising a whole cluster of objects (here BatchCrawler initialises a RetryingHttpCrawler, and so on). Almost every time, initialising the objects further down the graph has undesirable consequences, and that is often what makes writing tests afterwards so hard. Replacing an object in this graph isn't always easy, and the code needed to do it taints the initialisation code with complexity.

By building only with Simple objects and functions, we can wrap them in ease of use… for the currently known need. When the need changes (read: always), we're sitting pretty. We get to have our cake and eat it too.

Note: I'm not saying we should try to imagine future needs, only that we should break our current code down into the simplest possible bricks. On top of them we can then build ease of use, for example with default values.
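Ease of use layered on top of a simple brick can be as thin as default values or pre-bound parameters. Here is a hypothetical Python sketch. The function crawl_page is invented for illustration; it takes every setting explicitly and only describes the request, so the example has no side effects. The wrappers add convenience for today's need without changing the brick.

```python
from functools import partial


def crawl_page(url: str, timeout: float, retries: int) -> str:
    """Simple brick (hypothetical): every setting is an explicit parameter.
    It only describes the request instead of performing it."""
    return f"GET {url} timeout={timeout}s retries={retries}"


def crawl_page_default(url: str, timeout: float = 10.0, retries: int = 3) -> str:
    """Ease of use via default values, for the need we know today."""
    return crawl_page(url, timeout, retries)


# Ease of use via pre-bound parameters that are always the same at some call sites.
crawl_page_fast = partial(crawl_page, timeout=1.0, retries=1)
```

If the need changes, callers can still reach crawl_page directly with any settings, because the convenience lives in separate wrappers rather than inside the brick.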

Conclusion
I won't advise always doing it this way, because, to tell the truth, nothing is always true. The message I'm simply trying to get across is this:

There is an important difference:

Ease of use is

Contextual
Compound
Rather closed to change

Simple bricks are

General-purpose
Unitary
Open to change

It's possible to have both.