Building a Safety Net for Continuous Delivery with Developer Tests

It is impossible to develop a software system with a certain level of complexity unless it is built on top of a smaller working system.

I wanted to credit Bjarne Stroustrup for expressing this point of view as early as 1985 in his book The C++ Programming Language, but after re-reading his Notes to the Reader, I see that the quote I remembered was about the importance of well-structured code (in which C++ excels over C), not correctness.

Still, I don’t think I am alone when I claim that we software developers find it natural to develop iteratively, thereby continuously building on top of the last iteration, the last working system.

The question is, how do we know that the system we build on is working?

The truth is that unless we have a very good verification process, we don’t know whether we are building on a working system.

In a good continuous delivery process, we will have waves of verification in the form of continuous integration builds and deployments, automatic and manual testing by testers etc.

Naturally, developers will also develop unit tests as an integrated part of developing code, thereby ensuring that each implemented responsibility behaves as expected in isolation.

But is this good enough? Will it ensure that each iteration builds on a working system?

I think it is not good enough because:

  • Testing is usually decoupled in time and space from the development process.
  • Unit testing only verifies tiny pieces of logic in isolation, but bugs typically show up when these pieces of logic are composed into higher level behaviour.

If you ask me, developers need to write what I call developer tests.

Developer Tests

A developer test is similar to a unit test, the difference being that we never mock any dependencies unless we absolutely must. For example, we mock external web services that our code calls, but we do not mock database access.

When we run a developer test, we run the exact same code as is run in the production system, which means that the behaviour of the test will closely match the behaviour of the production system. This means that the verification which is done by a developer test is very reliable.

When I develop new code, I always exercise the new behaviour through developer tests. This is typically much easier than setting up the production system with the relevant users with relevant permissions and relevant data to query and alter.

When my new developer tests turn green, I feel confident that my new behaviour works as intended, not only in isolation but also when run in context with huge parts of the existing functionality.

When I have verified that the existing developer tests are green I feel confident that I did not introduce regressions.

Then I check the code changes into the main branch and the new feature will be in the next release a short while after.

What Makes Developer Tests Work

I have developed the concept of developer tests over the last couple of years while working on TradingFloor.com. Since it is now second nature to use developer tests as an integral part of the software development process, it is difficult to remember why this seemed difficult, or impossible, to do just a couple of years ago.

A major part of the reason that developer tests work in TradingFloor.com is that the code is (largely) written with sensible principles in mind. In this context one of the SOLID principles, the Dependency Inversion Principle (DIP), is essential. Furthermore, using Dependency Injection makes it practical.

This means that when I exercise my new behaviour through the method Foo on class Bar …

public class Bar
{
    public Bar(IMyDependency1 dep1, IMyDependency2 dep2) { /*…*/ }
    public void Foo() { /*… use dep1 and dep2 */ }
}

… then I also run the code of the two dependencies (and their dependencies, and their dependencies …), including any kind of logging, interception and whatnot. This is in contrast to a unit test in which I would mock the two dependencies.
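To make the contrast concrete, here is a minimal sketch of what such a developer test could look like. It is not the TradingFloor.com code: the members of IMyDependency1 and IMyDependency2, the classes ConfiguredUser and MessageLog, and the body of Foo are all hypothetical, invented only for illustration. To stay self-contained, it uses a plain exception instead of a test framework.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical members: the original only names the two interfaces.
public interface IMyDependency1 { string CurrentUser(); }
public interface IMyDependency2 { void Write(string message); }

// Stand-ins for the production implementations (assumed for illustration).
public class ConfiguredUser : IMyDependency1
{
    private readonly string _name;
    public ConfiguredUser(string name) { _name = name; }
    public string CurrentUser() { return _name.Trim(); }
}

public class MessageLog : IMyDependency2
{
    public readonly List<string> Messages = new List<string>();
    public void Write(string message) { Messages.Add(message); }
}

public class Bar
{
    private readonly IMyDependency1 _dep1;
    private readonly IMyDependency2 _dep2;

    public Bar(IMyDependency1 dep1, IMyDependency2 dep2)
    {
        _dep1 = dep1;
        _dep2 = dep2;
    }

    // Hypothetical behaviour: greet the current user through the log.
    public void Foo() { _dep2.Write("Hello, " + _dep1.CurrentUser()); }
}

public static class BarDeveloperTest
{
    // Composes Bar with the real implementations instead of mocks.
    public static void Foo_WritesGreetingForCurrentUser()
    {
        var log = new MessageLog();
        var bar = new Bar(new ConfiguredUser(" alice "), log);

        bar.Foo();

        if (log.Messages.Count != 1 || log.Messages[0] != "Hello, alice")
            throw new Exception("Expected a single greeting for alice.");
    }
}
```

Note that the trimming done inside ConfiguredUser is part of what the test verifies. A mocked IMyDependency1 would simply return whatever the test told it to, so that behaviour would go unchecked.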

In addition to DIP, our experience is that the Command Query Separation (CQS) principle is a great help in our code structure in general, and in particular it makes writing developer tests easy. I suppose you can imagine that a code base composed of queries (we call them readers) and command handlers is very handy both when building up a test scenario and when asserting the outcome of a test.
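As a rough illustration of why this is handy, the sketch below builds a scenario with a command handler and asserts the outcome through a reader. Everything in it is an assumption made for the example: the Store class is an in-memory stand-in for a real database, and DepositCommand, DepositHandler and BalanceReader are hypothetical names, not types from our code base.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory store standing in for the real database.
public class Store
{
    public readonly Dictionary<string, decimal> Balances =
        new Dictionary<string, decimal>();
}

// A command describes a change; its handler mutates state and returns nothing.
public class DepositCommand
{
    public string Account;
    public decimal Amount;
}

public class DepositHandler
{
    private readonly Store _store;
    public DepositHandler(Store store) { _store = store; }

    public void Handle(DepositCommand command)
    {
        decimal current;
        // A missing account leaves current at 0.
        _store.Balances.TryGetValue(command.Account, out current);
        _store.Balances[command.Account] = current + command.Amount;
    }
}

// A reader (query) returns data and changes nothing.
public class BalanceReader
{
    private readonly Store _store;
    public BalanceReader(Store store) { _store = store; }

    public decimal Read(string account)
    {
        decimal balance;
        return _store.Balances.TryGetValue(account, out balance) ? balance : 0m;
    }
}

public static class DepositDeveloperTest
{
    public static void Deposit_AddsToExistingBalance()
    {
        var store = new Store();
        var handler = new DepositHandler(store);
        var reader = new BalanceReader(store);

        // Build up the scenario with the same command used in production.
        handler.Handle(new DepositCommand { Account = "alice", Amount = 100m });

        // Exercise the behaviour under test.
        handler.Handle(new DepositCommand { Account = "alice", Amount = 25m });

        // Assert the outcome through a reader.
        if (reader.Read("alice") != 125m)
            throw new Exception("Expected a balance of 125 for alice.");
    }
}
```

The point of the design is that the test needs no special back door into the data: setup goes through command handlers and verification goes through readers, which are the same entry points the production code uses.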

Why is the Entire World not Using Developer Tests

Developer tests allow for faster development, they provide fast feedback on correctness during development and they provide a safety net for the future.

Yet, I have not seen a rush for all other developers to get on board and start to use developer tests. Why?

Here are some of the counterarguments I have heard so far:

  • It cannot be done.
    That argument is a couple of years old. Today we are doing it on a daily basis.
  • It is too slow.
    No, our 850+ tests run in one minute on a typical developer PC.
  • Developer tests are very brittle.
    No, it is the other way around. Unit tests are often very brittle because you need to re-do your mocking when refactoring code. Developer tests don’t have this problem and they are surprisingly solid towards refactoring.
  • I cannot do it because my code is much more complex than your code.
    If your code is really complex, working without a safety net is not an option! You can do it.
  • I run a heavy SQL database, tests will be too slow and difficult to set up.
    Right, we run a NoSQL database, so building up an entire database per test is fast and easy. Installing the database locally and on any build or test system is also easy and fast. All of that will be a hassle with some SQL databases, but not impossible. If you have to, you can isolate SQL access and mock it out, but I would prefer not to.

Where Are We Now

I would love to share more details but I feel that I need to introduce developer tests to at least one more project before I can express myself without going into too much detail.

I will come back with more information once I have done that. In the meantime, if you would like me to elaborate on this or that, please ask.

Continuous Delivery – a Safety Net will make you Lazy

I sincerely believe that some kind of safety net is needed when coding.

In fact I believe that having a safety net is especially important when doing continuous delivery.

Before I managed to blog on my opinions regarding this, I read Scott Berkun’s book The Year Without Pants in which Scott makes quite the opposite argument based on his experience from Automattic (and contrary to his experience from Microsoft).

A Safety Net will make you Lazy

Essentially, Scott backs up Automattic’s belief in the philosophy: safeguards don’t make you safe; they make you lazy. This may to a certain extent be true in some cases, as some people drive faster when they get ABS brakes, and football players take more risks because of their padding. By the same token, if you find yourself in a high tower with no railing, you will be very cautious about every step you take, as a fall would kill you. And since you are very cautious, it is unlikely that you will be killed.

Does this philosophy work in software development? Should we skip manual and automatic testing as well as other kinds of verification before we deploy the latest changes to the Live system? Should we essentially skip the entire safety net and rely on developers being very cautious?

No!

Being cautious only takes you part of the way. And if you are too cautious, there will be much-needed changes to the code that you will never dare to make. Besides, even cautiously made changes could have unexpected effects, regressions, on other parts of the code. If you are too cautious, your code will rot and eventually become unmanageable. (There is a brilliant description of code rot and how to avoid it in Robert C Martin’s book Clean Code.)

Still, in a perfect world, coding without a safety net could actually work. In theory it’s simple, and I have already blogged about it. First of all, all changes to code must be small, additive increments: baby steps. Secondly, the code must be crafted by rigorously following the SOLID principles. With a perfect code base with low coupling and high cohesion, most baby-step changes would consist of adding new code that plugs in without changing existing code, or would be a few simple changes to a single existing class. In either case the impact on the system would be restricted and well understood.

Alas, the world is not perfect, neither is the code base that most developers work on.

Besides, sometimes you need to do a refactoring that will impact quite a bit of functionality. Sometimes you change a single or a few lines of code, but there is no simple way to fully understand its impact. In both cases the risk of regressions can be lowered with rigorous verification.

Building a Safety Net

I am quite happy that I read Scott’s book, as it made me think a bit deeper about building up a safety net. (And I can certainly recommend the book to anyone interested in the process of developing software.) Note that what I mention here is regarding the part of the safety net that developers must build and maintain.

Here is my opinion:

  1. Build and maintain automatic tests for non-trivial functionality.
  2. Do not build tests for trivial, unimportant or easily verified functionality.

The second point is based on my experience that often a huge number of tests are written, but the maintenance burden is so high that the tests are not maintained, new tests are not written and (unit) testing in general gets a bad reputation among developers. In such a case, needed refactoring is generally avoided and the code will rot. For these reasons, it is good practice to avoid tests that would only reveal bugs of low severity, many of which would be found anyway with a quick glance at the system.

So the trick is to have exactly the tests that make sense and ensure that they are maintainable.

Even then, a safety net could make a developer lazy. It is never an option to simply throw the code over the fence to the Testing Department, effectively making buggy code somebody else’s problem. Rather, developers must build up a safety net as an integral part of developing code.

Being cautious and having a safety net is the way to go.

Why do some Developers Prefer not to have a Safety Net?

As a final note, I have a possible explanation for why Automattic developers prefer working without a safety net.

Scott explains how he once went to India and climbed the stone tower of Jantar Mantar. There was no railing and a fall would kill anyone. But people were cautious because of the lack of safety measures.

I also climbed the stone tower of Jantar Mantar years ago when I was much younger. I clearly remember looking down at our hilarious guide on the ground, but I do not particularly remember the missing railing.

Could it be that focus on safety measures increases with age and experience?

Even More Asserts in a Single Unit Test Method

In my last post I stated that sometimes it is OK to have multiple asserts in a single unit test method and I devised a helper class MultiAssert for that.

Please do not get me wrong – I am a great believer in keeping unit tests readable and maintainable, and restricting a unit test to have exactly one assert is one way to achieve that.

But despite that I am now going to argue that sometimes I find it to be OK to have multiple asserts in a unit test method, even without packing all the asserts up using MultiAssert.Aggregate.

Remember that “The arguments against multiple asserts are multiple, a main one being that if one assert fails the rest will not be executed” (from my last post).

However, what if I in this case explicitly want the rest of the unit test not to execute – is it then OK to have one assert ensure that another is not executed? I think so. Read on and I will explain.
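For readers who have not seen the earlier post, the sketch below shows one way an aggregating helper of this kind could work. It is only my guess at the general shape, not the actual MultiAssert implementation, which is why the class is named MultiAssertSketch here. It relies solely on System types, so any assertion that signals failure by throwing an exception will work with it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for an aggregating assert helper.
public static class MultiAssertSketch
{
    // Runs every assertion, even if earlier ones fail, and reports
    // all failures together at the end.
    public static void Aggregate(params Action[] assertions)
    {
        var failures = new List<Exception>();

        foreach (var assertion in assertions)
        {
            try
            {
                assertion();
            }
            catch (Exception ex)
            {
                // Remember the failure and keep going.
                failures.Add(ex);
            }
        }

        if (failures.Count > 0)
        {
            throw new AggregateException(
                failures.Count + " of " + assertions.Length + " assertions failed.",
                failures);
        }
    }
}

public static class MultiAssertSketchDemo
{
    // Returns the number of failures reported for two failing
    // assertions and one passing assertion.
    public static int CountReportedFailures()
    {
        try
        {
            MultiAssertSketch.Aggregate(
                () => { throw new Exception("first failed"); },
                () => { /* passes */ },
                () => { throw new Exception("third failed"); });
            return 0;
        }
        catch (AggregateException ex)
        {
            return ex.InnerExceptions.Count;
        }
    }
}
```

A plain sequence of asserts would stop at the first failure. The aggregating form deliberately does not, and that is exactly the behaviour I want to avoid when one assert is a prerequisite for the next.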

The other day, I looked into a unit test that failed randomly. I knew that a person whose unit testing skills I do not usually question wrote it. Still, it turned out that it was surprisingly tricky for me to figure out why it failed.

The following is a simplification of the original unit test:

[TestMethod]
public void DoThis_MustCallDoThatOnFooWithExpectedParameters_WhenCalled()
{
    _target.Initialize(_foo, "first", "second");
    _target.DoThis();
    _foo.AssertWasCalled(
        f => f.DoThat(
            Arg<string>.Is.Equal("first"), Arg<string>.Is.Equal("second")));
}

In order to figure out why this unit test failed, I started making assumptions.

My first assumption was that DoThat was called with at least one of the two parameters having an unexpected value, so I added a few lines to the unit test to let Rhino tell me what the actual parameter values for the DoThat call were:

var actualArgs = _foo.GetArgumentsForCallsMadeOn(f => f.DoThat("", ""));
MultiAssert.Aggregate(
    () => Assert.AreEqual("first", actualArgs[0][0]),
    () => Assert.AreEqual("second", actualArgs[0][1]));

This did not help me, as inspecting actualArgs only caused an "index was out of range" exception to be thrown.

My second assumption was that DoThat was not called at all. To test this hypothesis I tried the following:

_foo.AssertWasCalled(f => f.DoThat(Arg<string>.Is.Anything, Arg<string>.Is.Anything));

Bingo! The unit test still failed (randomly) with this assert, which showed me that DoThat was not called.

So the single assert of the original unit test was actually multiple asserts behind the scenes: one assert to state that DoThat was called and two asserts to state that each of the two parameters was as expected. This is one case of having multiple asserts that I do not like!

With this knowledge, it was quite easy to track down the root cause of the failure. Somebody had checked in a parallel implementation of _target.Initialize, so the initialization of _target only sometimes made it to completion before _target.DoThis was called. While I realize that it is important to figure out what this new parallel code will do in production, for now I will keep the focus on the correct implementation of this unit test method.

Since we have three asserts in this unit test, some would say that the obvious thing to do is to split it into three separate unit tests. Well, I would go for one or, perhaps, two. Read on and I will explain.

If we want to split into three unit tests, that would be

  • One to check if DoThat was called at all,
  • One to check that firstParam had the expected value, and
  • One to check that secondParam had the expected value

It could be argued that this is a lot of repeated setup that should be avoided. I do not agree – common setup can be put in auxiliary methods and execution will be very fast because I have mocked out external dependencies to the code under test.

My problem with the three-unit-test approach is rather that the first unit test is a prerequisite for the other two; if the first fails, the other two will certainly also fail (barring random behaviour). I believe that if you have an assert that, if it fails, will make another assert fail with certainty, then these two asserts can, and often must, be packed together into a single unit test method.

So does that mean that the correct implementation of this unit test is to split it into two unit tests?

  • One that checks if DoThat was called, and also checks that firstParam had the expected value, and
  • One that checks if DoThat was called, and also checks that secondParam had the expected value.

I believe that the most maintainable way to implement this is to have a single unit test.

  • One that checks if DoThat was called, and also uses MultiAssert to verify that both firstParam and secondParam had the expected values.

All in all, this means that we have changed the original – hidden – multiple asserts into two explicit asserts. This is better than the original, as the first assert is a prerequisite of the second, which means that it makes no sense to execute the second if the first fails.

Here is the resulting implementation:

[TestMethod]
public void DoThis_MustCallDoThatOnFooWithExpectedParameters_WhenCalled()
{
    _target.Initialize(_foo, "first", "second");

    _target.DoThis();

    _foo.AssertWasCalled(f => f.DoThat(Arg<string>.Is.Anything, Arg<string>.Is.Anything));
    var actualArgs = _foo.GetArgumentsForCallsMadeOn(f => f.DoThat("", ""));
    MultiAssert.Aggregate(
        () => Assert.AreEqual("first", actualArgs[0][0]),
        () => Assert.AreEqual("second", actualArgs[0][1]));
}