Johan Martinsson: January 2014

tl;dr: Golden Master is a complementary testing technique that is particularly useful with awful legacy code. It lets you quickly produce reliable non-regression tests that don't interfere with refactoring. Compared to traditional tests, the game changer is that they are ephemeral, so they don't have to be maintainable.

I was introduced to the Golden Master testing technique two years ago, when we in Grenoble, France, had the chance to have J.B. Rainsberger come here for the world's first Legacy Code Retreat. It was a very efficient way to get total line coverage on that ugly piece of code of some 200-300 lines. The variation he used was to put log statements all around the code, run it with many variations of input arguments and save the log files. Then he wrote a test that reruns the application with exactly the same arguments and compares the new logs with the saved ones. If you only refactor (no behaviour changes, i.e. no transformations), the tests must stay green.
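The log-statement variation can be sketched in a few lines of Python with the standard `logging` module. This is a minimal illustration under stated assumptions, not Rainsberger's actual code. `legacy_price` is a hypothetical stand-in for the legacy code, the log messages are arbitrary instrumentation, and the "saved" log is kept in a variable rather than in a committed file.

```python
import io
import logging

log = logging.getLogger("golden_master")
log.propagate = False  # keep the instrumentation out of any root handlers


def legacy_price(quantity, customer_type):
    # Hypothetical stand-in for the ugly legacy code, sprinkled with log statements.
    log.info("enter quantity=%s customer_type=%s", quantity, customer_type)
    price = quantity * 10
    if customer_type == "gold":
        log.info("gold branch")
        price = price * 0.9
    if quantity > 100:
        log.info("bulk branch")
        price -= 50
    log.info("exit price=%s", price)
    return price


def capture_log(inputs):
    # Run the SUT once per input tuple and return everything it logged as one string.
    buffer = io.StringIO()
    handler = logging.StreamHandler(buffer)
    handler.setFormatter(logging.Formatter("%(message)s"))
    log.addHandler(handler)
    log.setLevel(logging.INFO)
    try:
        for args in inputs:
            legacy_price(*args)
    finally:
        log.removeHandler(handler)
    return buffer.getvalue()


INPUTS = [(1, "regular"), (1, "gold"), (150, "regular"), (150, "gold")]

# First run: save the log as the golden master (in practice, a committed file).
golden = capture_log(INPUTS)

# Every later run, e.g. after each refactoring step, must reproduce it exactly.
assert capture_log(INPUTS) == golden
```

The log lines are part of the compared output, so they have to survive the refactoring unchanged.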

Summary: Golden Master is a technique where you bombard your system with many, many, many variations of input arguments and capture all the outputs to one or more files, which are committed. Every time you run the tests, they must produce the same file(s). Chris Melinn describes the technique in detail, and Sandro Mancuso has an example.
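Here is a minimal sketch of the record-then-compare loop, assuming a hypothetical `legacy_shipping` function as the SUT. `itertools.product` generates every combination of a few hand-picked values per argument. Each result goes on its own line, and the first run writes the file you would then commit. A temporary directory stands in for the repository so that the example is self-contained.

```python
import itertools
import os
import tempfile


def legacy_shipping(weight, zone, express):
    # Hypothetical stand-in for the legacy function under test.
    cost = 5 + weight * (2 if zone == "EU" else 4)
    if express:
        cost = cost * 2 if weight < 10 else cost + 20
    return cost


# Many variations: the Cartesian product of interesting values for each argument.
WEIGHTS = [0, 1, 9, 10, 50]
ZONES = ["EU", "US", "ASIA"]
EXPRESS = [False, True]


def run_all():
    # One line per input combination, so a diff points straight at what changed.
    lines = []
    for args in itertools.product(WEIGHTS, ZONES, EXPRESS):
        lines.append(f"{args!r} -> {legacy_shipping(*args)!r}")
    return "\n".join(lines) + "\n"


def check_golden_master(path):
    # First run: record the output as the golden master (the file you commit).
    # Later runs: the output must be byte-for-byte identical to that file.
    actual = run_all()
    if not os.path.exists(path):
        with open(path, "w", encoding="utf-8") as f:
            f.write(actual)
        return True
    with open(path, encoding="utf-8") as f:
        return f.read() == actual


path = os.path.join(tempfile.mkdtemp(), "golden_master.txt")
assert check_golden_master(path)  # records
assert check_golden_master(path)  # verifies
```

In practice a library such as ApprovalTests handles this file bookkeeping (received versus approved files, launching a diff tool). The sketch only shows the principle.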

I never used it afterwards, except in dojos on legacy code. The reason is that it is very easy to apply to code that takes simple arguments and returns all its results, or that at least has some easily verifiable side effects, such as writing to a file. But production legacy code has soooooo many really horrible dependencies and intricate side effects, and very often the results depend heavily on the state of the machine, the application, the database, etc. Another reason I didn't use the Golden Master technique is that those tests are unmaintainable: they are very fragile to any modification of the behaviour.

But then a few months ago I had a few insights that suddenly made it very interesting. Here's an overview of them.

To deal with behaviour that depends on state, I can turn that state into direct input arguments by writing a wrapper function around the SUT (System Under Test) that sets up the different states.
To deal with results that take the form of side effects, I can turn them into return values by reading the state after exercising the SUT. From a testing perspective, the system then behaves like a pure function.
These tests are ephemeral: I throw them away once I'm done with the refactoring. So they don't need to be explicit, clear, robust, or any of the other things that take time. The way I mock the system can also be quick and dirty.
Ordinary unit tests (one test class per production class) on awful code are often too fine-grained and give only moderate safety, since they fail to capture some intricate side effects. And while they allow refactoring below the interface they test, they hinder refactoring of that often ill-designed interface itself, because they are a second client of it.
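The first three insights can be sketched together. In this hypothetical example, `place_order` reads a global `CONFIG`, depends on a `Session` that cannot run in a test, and reports its result only by appending to `ORDERS_DB`. The throwaway wrapper `place_order_as_function` turns the state into arguments, stubs `Session` crudely with `unittest.mock.patch.object`, and returns the state it reads back afterwards. As a result, the SUT can be bombarded like a pure function. All names are illustrative assumptions.

```python
import itertools
from unittest import mock

# --- Hypothetical legacy code: hidden state in, side effects out ---
CONFIG = {"discount_enabled": False}
ORDERS_DB = []  # stands in for a real database table


class Session:
    @staticmethod
    def current_user_is_admin():
        # In the real system this would need a live session or remote service.
        raise RuntimeError("no session available in tests")


def place_order(item, amount):
    if CONFIG["discount_enabled"] and amount > 100:
        amount -= 10
    status = "approved" if Session.current_user_is_admin() or amount < 500 else "pending"
    ORDERS_DB.append((item, amount, status))


# --- Ephemeral test wrapper: makes the SUT look like a pure function ---
def place_order_as_function(discount_enabled, is_admin, item, amount):
    # State becomes explicit input arguments: set it up before each call.
    CONFIG["discount_enabled"] = discount_enabled
    ORDERS_DB.clear()
    # Quick-and-dirty mocking of the dependency we cannot run in a test.
    with mock.patch.object(Session, "current_user_is_admin", return_value=is_admin):
        place_order(item, amount)
    # Side effects become return values: read the state back afterwards.
    return list(ORDERS_DB)


def golden_master_text():
    # Bombard the wrapper with every combination of inputs, one line per combination.
    combos = itertools.product([False, True], [False, True], ["book"], [50, 150, 600])
    return "\n".join(f"{args!r} -> {place_order_as_function(*args)!r}" for args in combos)
```

The wrapper resets the state it controls on every call. The text it produces is therefore deterministic and can be recorded and compared like any other golden master.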

So if Golden Master tests only allow design improvement but don't enforce it, aren't they dangerous? Don't they leave the SUT just as bad, or even worse? Well, of course they could. One way of avoiding that is to apply the classical TDD approach to legacy code, i.e.:

Write non-regression tests
Refactor to make the system Open-Closed (with respect to the feature you want to add)
Write ordinary unit tests for the refactored code
Test-drive the new functionality in isolation from current production code
Plug the new functionality in (this is usually a one-line code or config change)

The Golden Master tests written in the first step allow the aggressive refactoring that is necessary for the last step (plugging in the new functionality) to be a one-liner. They will not protect that last step, though. So if I want to reap the maximum benefit from them, I have to structure my work to keep refactoring strictly separate from transformation (modifying behaviour). During a transformation, the whole Golden Master suite may fail without giving any information on why. Just like well-executed TDD.
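To make the last two steps concrete, here is a hypothetical state of the code after the refactoring. Tax rules are looked up in a registry, so the system is open to a new rule without modifying the existing ones. The new rule is test-driven in isolation, and plugging it in is the single line at the end. That line changes behaviour, so the Golden Master tests cannot protect it. The rules and amounts (in cents) are made up for illustration.

```python
# Hypothetical code after the refactoring step: tax rules live in a registry,
# so a new rule can be added without touching the existing ones (Open-Closed).
def standard_rate(amount_cents):
    return amount_cents * 20 // 100


def reduced_rate(amount_cents):
    return amount_cents * 55 // 1000


TAX_RULES = {
    "standard": standard_rate,
    "reduced": reduced_rate,
}


def compute_tax(category, amount_cents):
    return TAX_RULES[category](amount_cents)


# New functionality, test-driven in isolation from the production code path.
def zero_rate(amount_cents):
    return 0


assert zero_rate(12_345) == 0

# Plugging it in: the one-line change. This is a transformation (new behaviour),
# so it is done as a separate step from the refactoring.
TAX_RULES["zero"] = zero_rate
```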

To me this is a wonderful technique for working with legacy code! A few friends and I have used it in a professional context with excellent results. Rémy Sanlaville wrote about string serialization and Matthieu Cans about coverage. I'm exploring this subject in detail and will post about it as I learn more.

Tooling: ApprovalTests and a powerful mocking library like PowerMock. Code coverage is also essential, to check that the input variations actually exercise all the code under test.
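Coverage shows whether the input variations actually reach all of the code. A real project would use a proper coverage tool (coverage.py for Python, JaCoCo for Java). The sketch below only illustrates the idea with the standard `sys.settrace` hook: it records which lines of a hypothetical `legacy_grade` function a given set of inputs executes, so that you can see what a narrower input set misses.

```python
import sys


def legacy_grade(score, bonus):
    # Hypothetical SUT with a few branches.
    if bonus:
        score += 5
    if score >= 90:
        return "A"
    if score >= 50:
        return "B"
    return "C"


def lines_covered(func, inputs):
    # Record the line numbers of `func` executed while running all the inputs.
    code = func.__code__
    covered = set()

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # don't trace lines of any other function
        if event == "line":
            covered.add(frame.f_lineno)
        return tracer

    sys.settrace(tracer)
    try:
        for args in inputs:
            func(*args)
    finally:
        sys.settrace(None)
    return covered


narrow = lines_covered(legacy_grade, [(95, False)])
wide = lines_covered(legacy_grade, [(95, False), (60, True), (10, False)])
missed = sorted(wide - narrow)  # lines that only the wider input set reaches
```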

Btw, while we do throw away the tests, most of the work is still useful. The wrapper function(s) can be kept as examples, and whatever we did to make the system testable has also decoupled it.

Thursday, 2 January 2014

Golden Master and legacy - a few insights