Building a Safety Net for Continuous Delivery with Developer Tests

It is impossible to develop a software system with a certain level of complexity unless it is built on top of a smaller working system.

I wanted to credit Bjarne Stroustrup for expressing this point of view as early as 1985 in his book The C++ Programming Language, but after re-reading his Notes to the Reader, I see that the quote I remembered was on the importance of well-structured code (in which C++ excels over C), not correctness.

Still, I don’t think I am alone when I claim that we software developers find it natural to develop iteratively, thereby continuously building on top of the last iteration, the last working system.

The question is, how do we know that the system we build on is working?

The truth is that unless we have a very good verification process we don’t know if we build on a working system.

In a good continuous delivery process, we will have waves of verification in the form of continuous integration builds and deployments, automatic and manual testing by testers etc.

Naturally, developers will also develop unit tests as an integrated part of developing code, thereby ensuring that each implemented responsibility behaves as expected in isolation.

But is this good enough? Will it ensure that each iteration builds on a working system?

I think it is not good enough because,

  • Testing is usually decoupled in time and space from the development process.
  • Unit testing only verifies tiny pieces of logic in isolation, but bugs typically show up when these pieces of logic are composed into higher level behaviour.

If you ask me, developers need to write what I call developer tests.

Developer Tests

A developer test is similar to a unit test, the difference being that we never mock any dependencies unless we absolutely must. For example, we mock external web services that our code calls, but we do not mock database access.

When we run a developer test, we run the exact same code as is run in the production system, which means that the behaviour of the test will closely match the behaviour of the production system. This means that the verification which is done by a developer test is very reliable.

When I develop new code, I always exercise the new behaviour through developer tests. This is typically much easier than setting up the production system with the relevant users with relevant permissions and relevant data to query and alter.

When my new developer tests turn green, I feel confident that my new behaviour works as intended, not only in isolation but also when run in context with huge parts of the existing functionality.

When I have verified that the existing developer tests are green I feel confident that I did not introduce regressions.

Then I check the code changes into the main branch and the new feature will be in the next release a short while after.

What Makes Developer Tests Work

I have developed the concept of developer tests over the last couple of years while working on TradingFloor.com. Since it is now second nature to use developer tests as an integral part of the software development process, it is difficult to remember why this seemed difficult, or impossible, to do just a couple of years ago.

A major part of the reason that developer tests work in TradingFloor.com is that the code is (largely) written with sensible principles in mind, and in this context one of the SOLID principles, the Dependency Inversion Principle (DIP), is essential. And furthermore, using Dependency Injection is practical.

This means that when I exercise my new behaviour through the method Foo on class Bar …

public class Bar
{
    // Both dependencies are injected as interfaces (DIP).
    public Bar(IMyDependency1 dep1, IMyDependency2 dep2) { /*…*/ }
    public void Foo() { /*… use dep1 and dep2 */ }
}

… then I also run the code of the two dependencies (and their dependencies, and their dependencies …), including any kind of logging, interception and whatnot. This is in contrast to a unit test in which I would mock the two dependencies.

In addition to DIP, our experience is that the Command Query Separation (CQS) principle is a great help in general in our code structure, and in particular this principle makes writing developer tests easy. I suppose you can imagine that a code base composed of queries (we call them readers) and command handlers is very handy when building up a test scenario and when asserting the outcome of a test.
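To make this a bit more concrete, here is a minimal sketch of how a developer test might build its scenario through a command handler and assert through a reader. All names (IDocumentStore, LocalDocumentStore, PublishPostHandler, PostReader) are hypothetical and not taken from TradingFloor.com, and the dictionary-backed store is only an assumed stand-in for a locally installed database so that the sketch is self-contained.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical storage abstraction; in a real developer test this would be
// the real data access code running against a local database.
public interface IDocumentStore
{
    void Save(string id, string body);
    string Load(string id);
}

// Assumed stand-in for the local database, used only to keep the sketch runnable.
public sealed class LocalDocumentStore : IDocumentStore
{
    private readonly Dictionary<string, string> _documents =
        new Dictionary<string, string>();

    public void Save(string id, string body) { _documents[id] = body; }

    public string Load(string id)
    {
        string body;
        return _documents.TryGetValue(id, out body) ? body : null;
    }
}

// Command handler: changes state and returns nothing.
public sealed class PublishPostHandler
{
    private readonly IDocumentStore _store;

    public PublishPostHandler(IDocumentStore store) { _store = store; }

    public void Handle(string postId, string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            throw new ArgumentException("Text is required.", "text");
        }
        _store.Save(postId, text.Trim());
    }
}

// Reader (query): returns data and changes nothing.
public sealed class PostReader
{
    private readonly IDocumentStore _store;

    public PostReader(IDocumentStore store) { _store = store; }

    public string Read(string postId) { return _store.Load(postId); }
}

public static class DeveloperTestSketch
{
    // Arrange and act through the command handler, then observe through the
    // reader. Nothing between the two is mocked.
    public static string PublishThenRead(string postId, string text)
    {
        var store = new LocalDocumentStore();
        new PublishPostHandler(store).Handle(postId, text);
        return new PostReader(store).Read(postId);
    }
}
```

A test method would then assert on the value returned by the reader, for example that publishing "  hello " and reading it back gives "hello". The point of the sketch is that the handler, the reader and everything beneath them run exactly as they would in production.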

Why is the Entire World not Using Developer Tests

Developer tests allow for faster development, they provide fast feedback on correctness during development and they provide a safety net for the future.

Yet, I have not seen a rush for all other developers to get on board and start to use developer tests. Why?

Here are some of the counter arguments I have heard so far,

  • It cannot be done.
    That argument is a couple of years old. Today we are doing it on a daily basis.
  • It is too slow.
    No, our 850+ tests run in one minute on a typical developer PC.
  • Developer tests are very brittle.
    No, it is the other way around. Unit tests are often very brittle because you need to re-do your mocking when refactoring code. Developer tests don’t have this problem and they are surprisingly solid towards refactoring.
  • I cannot do it because my code is much more complex than your code.
    If your code is really complex, working without a safety net is not an option! You can do it.
  • I run a heavy SQL database, tests will be too slow and difficult to set up.
    Right, we run a NoSQL database so building up an entire database per test is fast and easy. Installing the database locally and on any build or test system is also easy and fast. All that will be a hassle with some SQL databases, but not impossible. If you have to, you can isolate SQL access and mock it out but I would prefer not to.

Where Are We Now

I would love to share more details but I feel that I need to introduce developer tests to at least one more project before I can express myself without going into too much detail.

I will come back with more information once I have done that. In the meantime, if you would like me to elaborate on this or that, please ask.

Even More Asserts in a Single Unit Test Method

In my last post I stated that sometimes it is OK to have multiple asserts in a single unit test method and I devised a helper class MultiAssert for that.

Please do not get me wrong – I am a great believer in keeping unit tests readable and maintainable, and restricting a unit test to have exactly one assert is one way to achieve that.

But despite that I am now going to argue that sometimes I find it to be OK to have multiple asserts in a unit test method, even without packing all the asserts up using MultiAssert.Aggregate.

Remember that “The arguments against multiple asserts are multiple, a main one being that if one assert fails the rest will not be executed” (from my last post).

However, what if, in this case, I explicitly want the rest of the unit test not to execute? Is it then OK to have one assert ensure that another is not executed? I think so. Read on and I will explain.

The other day, I looked into a unit test that failed randomly. I knew that a person whose unit testing skills I do not usually question wrote it. Still, it turned out that it was surprisingly tricky for me to figure out why it failed.

The following is a simplification of the original unit test,

[TestMethod]
public void DoThis_MustCallDoThatOnFooWithExpectedParameters_WhenCalled()
{
    _target.Initialize(_foo, "first", "second");
    _target.DoThis();
    _foo.AssertWasCalled(
        f => f.DoThat(
            Arg<string>.Is.Equal("first"), Arg<string>.Is.Equal("second")));
}

In order to figure out why this unit test failed, I started making assumptions.

My first assumption was that DoThat was called with at least one of the two parameters having an unexpected value, so I added a line to the unit test to let Rhino tell me what the actual parameter values for the DoThat call were,

var actualArgs = _foo.GetArgumentsForCallsMadeOn(f => f.DoThat("", ""));
MultiAssert.Aggregate(
    () => Assert.AreEqual("first", actualArgs[0][0]),
    () => Assert.AreEqual("second", actualArgs[0][1]));

This did not help me, as inspecting actualArgs only caused an index-out-of-range exception to be thrown.

My second assumption was that DoThat was not called at all. To test this hypothesis I tried the following,

_foo.AssertWasCalled(f => f.DoThat(Arg<string>.Is.Anything, Arg<string>.Is.Anything));

Bingo! The unit test still failed (randomly) with this assert, which showed me that DoThat was not called.

So the single assert of the original unit test was actually multiple asserts behind the scenes – one assert to state that DoThat was called and two asserts to state that each of the two parameters was as expected. This is one case of having multiple asserts that I do not like!

With this knowledge, it was quite easy to track down the root cause of the failure. Somebody had checked in a parallel implementation of _target.Initialize, so the initialization of _target only sometimes ran to completion before _target.DoThis was called. While I realize that it is important to figure out what this new parallel code will do in production, for now I will keep the focus on the correct implementation of this unit test method.

Since we have three asserts in this unit test, some would say that the obvious thing to do is to split it into three separate unit tests. Well, I would go for one or, perhaps, two. Read on and I will explain.

If we want to split into three unit tests, that would be

  • One to check if DoThat was called at all,
  • One to check that firstParam had the expected value, and
  • One to check that secondParam had the expected value

It could be argued that this is a lot of repeated setup that should be avoided. I do not agree – common setup can be put in auxiliary methods and execution will be very fast because I have mocked out external dependencies to the code under test.

My problem with the three-unit-test approach is rather that the first unit test is a prerequisite to the other two; if the first fails, the other two will certainly also fail (barring random behaviour). I believe that if you have an assert that, if failed, will make another assert fail with certainty, then these two asserts can, and often must, be packed together into a single unit test method.

So does that mean that the correct implementation of this unit test is to split it into two unit tests?

  • One that checks if DoThat was called, and also checks that firstParam had the expected value, and
  • One that checks if DoThat was called, and also checks that secondParam had the expected value.

I believe that the most maintainable way to implement this is to have a single unit test.

  • One that checks if DoThat was called, and also uses MultiAssert to verify that both firstParam and secondParam had the expected values.

All in all, this means that we have changed the original – hidden – multiple asserts into two explicit asserts. This is better than the original, as the first assert is a prerequisite of the second, which means that it makes no sense to execute the second if the first fails.

Here is the resulting implementation,

[TestMethod]
public void DoThis_MustCallDoThatOnFooWithExpectedParameters_WhenCalled()
{
    _target.Initialize(_foo, "first", "second");

    _target.DoThis();

    // Prerequisite: if DoThat was never called, stop here.
    _foo.AssertWasCalled(f => f.DoThat(Arg<string>.Is.Anything, Arg<string>.Is.Anything));

    // Only now is it safe to index into the recorded arguments.
    var actualArgs = _foo.GetArgumentsForCallsMadeOn(f => f.DoThat("", ""));
    MultiAssert.Aggregate(
        () => Assert.AreEqual("first", actualArgs[0][0]),
        () => Assert.AreEqual("second", actualArgs[0][1]));
}

 

Multiple Asserts in a Single Unit Test method

The title of this post is a provocation to many people who have read and love Roy Osherove’s brilliant book, The Art of Unit Testing. In this book Roy clearly states that one of the pillars of good tests is to avoid multiple asserts in a unit test.

The arguments against multiple asserts are multiple, a main one being that if one assert fails the rest will not be executed, which means that the state of the code under unit test is really unknown. Another argument is that if you find a need for having multiple asserts, it is probably because you are testing multiple things in a single unit test method. This will break the principle of single responsibility and maintainability will suffer.

I am a great believer in having maintainable and readable unit tests and I have always tried to follow the single assert advice myself. I am also a great believer in the principle of single responsibility, although I am often forced to be pragmatic when working on legacy code. When I want to test several outcomes from a single object I can choose to implement Equals, or maybe ToString in order to do direct comparisons of whole objects. Sometimes I will try to make a utility method or class that will allow me to compare several values in a way that will fit a single assert. While some people do not like adding to the code base for unit testing purposes only, most people object to having too many utilities creeping up in the unit test projects.

Recently I had discussions on unit testing with my colleagues and the reasoning behind single asserts came up – and also some arguments against it.

Let’s have a look at one of Roy’s examples,

[TestMethod]
public void CheckVariousSumResults()
{
    Assert.AreEqual(3, this.Sum(1001, 1, 2));
    Assert.AreEqual(3, this.Sum(1, 1001, 2));
    Assert.AreEqual(3, this.Sum(1, 2, 1001));
}

The problem here is that if one assertion fails, the rest will not be run and we do not know if they would fail if run.

There are a number of solutions to this.

The first solution: Create a separate test for each assert

This is easy and it only takes a few seconds to write those unit tests,

[TestMethod]
public void Sum_1001AsFirstParam_Returns3()
{
    Assert.AreEqual(3, this.Sum(1001, 1, 2));
}
[TestMethod]
public void Sum_1001AsMiddleParam_Returns3()
{
    Assert.AreEqual(3, this.Sum(1, 1001, 2));
}
[TestMethod]
public void Sum_1001AsThirdParam_Returns3()
{
    Assert.AreEqual(3, this.Sum(1, 2, 1001));
}

What is the problem with this solution?

Well, although the example may be slightly contrived it is easy to imagine that the three cases are somewhat correlated. By putting all three asserts in a single method we have signaled that these must be considered as a whole in order to be understood, while if we create separate unit tests we have lost this information. And imagine that there were more than three cases, say 42? If a fundamental bug in the Sum method creeps in so that all 42 unit tests fail, would you prefer to have 42 unit tests fail or would you prefer to have a single unit test fail?

Another problem is maintainability. It is correct that it only takes a few seconds to write these three unit tests, but someone needs to maintain them from then on, and the task can become daunting due to the sheer number of unit tests.

Both problems can to a certain extent be overcome with proper naming and with true single responsibility of units under test as well as each unit test method, but that is not always the reality – especially when you try to put legacy code under unit test.

The Second Solution: Use Parameterized Tests

In many cases I would prefer to use parameterized tests. However, currently my unit testing environment is Visual Studio 2010 and it does not support such a feature!
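For comparison, this is roughly what the example could look like in a framework that does support parameterized tests. The sketch below uses NUnit's TestCase attribute. The Sum method is a hypothetical stand-in, written under the assumption that the method under test ignores values above 1000, which is what the three asserts suggest.

```csharp
using System.Linq;
using NUnit.Framework;

[TestFixture]
public class SumTests
{
    // Hypothetical stand-in for the method under test.
    // Assumption: values above 1000 are ignored when summing.
    public static int Sum(params int[] values)
    {
        return values.Where(v => v <= 1000).Sum();
    }

    // One test method; the runner reports each TestCase as its own result,
    // so a failure in one case does not hide the others.
    [TestCase(1001, 1, 2)]
    [TestCase(1, 1001, 2)]
    [TestCase(1, 2, 1001)]
    public void Sum_IgnoresValueAbove1000_Returns3(int a, int b, int c)
    {
        Assert.That(Sum(a, b, c), Is.EqualTo(3));
    }
}
```

The three cases stay together in one place, which keeps the signal that they belong together, while each case still passes or fails on its own.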

The Third Solution: Use try-catch

Since an assertion failure means that an exception is thrown, at least in the unit test frameworks I have used so far, we can simply catch such exceptions, do some intelligent processing, and then allow the next assertion to fail or succeed. That solves our problem with having multiple asserts.

Even though Roy is my unit testing hero, I think he is a bit too hasty to simply abandon the try-catch solution with a statement like "Some people think it’s a good idea to use a try-catch block […] I think using parameterized tests is a far better way of achieving the same thing."

A Simple Solution Using try-catch

Since I cannot write parameterized unit tests with my unit testing environment, I had to come up with an alternative solution. My solution is to introduce a new MultiAssert class which will accept delegates to multiple assert statements but only fail at most once. This new class seems to be a logical addition to the existing family of assert classes along with e.g. CollectionAssert and StringAssert.

Here is the above example in a single unit test with a single assertion,

[TestMethod]
public void CheckVariousSumResults()
{
    MultiAssert.Aggregate(
        () => Assert.AreEqual(3, this.Sum(1001, 1, 2)), 
        () => Assert.AreEqual(3, this.Sum(1, 1001, 2)), 
        () => Assert.AreEqual(3, this.Sum(1, 2, 1001)));
}

MultiAssert.Aggregate can even be used in situations that do not fit parameterized unit tests easily.

Here is the implementation of MultiAssert.

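using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.VisualStudio.TestTools.UnitTesting;

public static class MultiAssert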
{
    public static void Aggregate(params Action[] actions)
    {
        var exceptions = new List<AssertFailedException>();

        foreach (var action in actions)
        {
            try
            {
                action();
            }
            catch (AssertFailedException ex)
            {
                exceptions.Add(ex);
            }
        }

        var assertionTexts = 
            exceptions.Select(assertFailedException => assertFailedException.Message);
        if (0 != assertionTexts.Count())
        {
            throw new
                AssertFailedException(
                assertionTexts.Aggregate(
                    (aggregatedMessage, next) => aggregatedMessage + Environment.NewLine + next));
        }
    }
}

Using or Abusing Multiple Asserts

MultiAssert can be abused; it is not meant as a universal excuse for cramming a lot of assertions into any unit test method. Remember that maintainability and readability of unit tests must still be a top priority and you should only use MultiAssert when this can be achieved.

One situation in which I recommend the use of MultiAssert is when it makes sense to assert both pre- and post-conditions in a unit test method. In this context, a post-condition is simply a (single) assert that states something about the state of the world after the Act part of the unit test method. However, if you assert that something has the value 42, how do you know that this was not already true right after the Arrange part of the unit test? After all, the Assert part of your unit test must assert what was supposed to happen as a consequence of the Act part of the unit test method.

So one nice usage of MultiAssert is to assert both pre- and post-conditions in unit tests.

[TestMethod]
public void Foo()
{
    // Arrange
    var underTest = new ClassUnderTest(); // placeholder: construct the object under test here
    bool preCondition = underTest.TheFoo() != 42;

    // Act
    underTest.Foo();
    int actual = underTest.TheFoo();

    // Assert
    MultiAssert.Aggregate(
        () => Assert.IsTrue(preCondition),
        () => Assert.AreEqual(42, actual));
}

WPF: How to mark an input field as not valid

I wanted to add to my input controls a visual cue that tells whether the current content of the control is valid.

The visual cue I had in mind was a red squiggly line, similar to the way MS Word underlines my typos.

Why this particular visual cue? Because this is how it is done in MS Dynamics AX.

This left me with two tasks,

1. How to trigger the actual validation.

2. How to change the default WPF visual cue (a red border around the control) to the desired red squiggly line.

How to trigger validation

I thought about using validation rules similar to what Nigel Spencer blogged about. But validation is only triggered when the binding value has changed, while I wanted the visual cue to work even at start-up.

I was not alone with this problem. Willem Meints blogged about this as well as about several other validation problems. He actually wrote a complete validation framework that I found would be too much of a good thing for me. He also wrote that he did not use IDataErrorInfo as it does not provide enough flexibility.

I like to keep things simple, so I opted to use IDataErrorInfo.

Basically, I let the business object I bind to report an error if it is not happy about what the user typed so far. It can be augmented by validation rules, e.g. a user defined NumberRangeValue.

For the simple case, meaning the check for mandatory fields having a value, I use code similar to the following,

public string this[string columnName]
{
    get
    {
        if (string.IsNullOrEmpty(columnName))
        {
            return string.Empty;
        }

        if (<this is a mandatory field and has no value>)
        {
            // This is never shown in the UI and does not
            // have to be localized; but if we ever choose
            // to show this text it must be localized!
            return "This field is mandatory.";
        }

        return string.Empty;
    }
}
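To show where that indexer lives, here is a minimal sketch of a complete business object implementing IDataErrorInfo. The Customer class and its Name and Note properties are hypothetical, chosen only for illustration; the mandatory-field check is hard-coded for Name.

```csharp
using System.ComponentModel;

// Hypothetical business object; class and property names are illustrative only.
public class Customer : IDataErrorInfo
{
    public string Name { get; set; }   // mandatory in this sketch
    public string Note { get; set; }   // optional

    // Object-level error text; not used in this sketch.
    public string Error
    {
        get { return string.Empty; }
    }

    // Called with the name of the bound property; an empty string means "valid".
    public string this[string columnName]
    {
        get
        {
            if (columnName == "Name" && string.IsNullOrEmpty(Name))
            {
                return "This field is mandatory.";
            }

            return string.Empty;
        }
    }
}
```

For WPF to consult the indexer, the binding has to opt in, for example Text="{Binding Name, ValidatesOnDataErrors=True}" on a TextBox whose DataContext is such an object.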

Now I need to figure out how to get the controls to show a squiggly red line instead of the default red border.

How to get the red squiggly line

The TextBox control can actually do the trick with the red squiggly line through its built-in spell checker. Alas, there is no API to trigger this at will.

Instead I found that the following XAML placed in a Resources section would do the trick,

<DrawingBrush x:Key="squiggleBrush" TileMode="Tile"
        Viewbox="0,0,4,4" ViewboxUnits="Absolute"
        Viewport="0,0,4,4" ViewportUnits="Absolute">
    <DrawingBrush.Drawing>
        <GeometryDrawing Geometry="M 0,2 L 1,1 3,3 4,2">
            <GeometryDrawing.Pen>
                <Pen Brush="Red" Thickness="1"
                StartLineCap="Square" EndLineCap="Square"/>
            </GeometryDrawing.Pen>
        </GeometryDrawing>
    </DrawingBrush.Drawing>
</DrawingBrush>
<ControlTemplate x:Key="SquiggleError">
    <StackPanel HorizontalAlignment="Center" VerticalAlignment="Center">
        <AdornedElementPlaceholder/>
        <Rectangle Height="4" Fill="{StaticResource squiggleBrush}"/>
    </StackPanel>
</ControlTemplate>

For e.g. a combo box I apply this template as follows,

<ComboBox>
    <Validation.ErrorTemplate>
        <DynamicResource ResourceKey="SquiggleError"/>
    </Validation.ErrorTemplate>
</ComboBox>

All this ensures that the red squiggly line is drawn when I want it, and it looks close enough to the real thing.

WPF: Making combo box items disabled – also when accessed using the keyboard

When I first embarked on WPF I ran into a number of small problems. Here is one of them.

The problem

I tried various ways to make my data bound combo box items disabled. I found that binding ComboBoxItem.IsEnabled to a property that indicates whether the item should be enabled did the trick.

But this did not work when I selected items using the keyboard. I also tried to make these not focusable but with the same result.

What worked: Click the combo down with the mouse and see that items that must be disabled can in fact not be selected.

What did not work: Use tab and arrow down to select items. They are not disabled and I can select them just fine with keyboard keys. Funnily enough, if I click the combo down with the mouse, then the items are in fact disabled and cannot be selected with the keyboard keys.

Here is the XAML (I set up the data, meaning the properties FieldDomainValues, IsSelectable and ValueAsString, in constructors in code-behind):

<ComboBox x:Name="cmbx_test" ItemsSource="{Binding Path=FieldDomainValues}">
    <ComboBox.ItemContainerStyle>
        <Style>
            <Style.Triggers>
                <DataTrigger Binding ="{Binding IsSelectable}" Value="False">
                    <Setter Property="ComboBoxItem.Focusable" Value="False"/>
                    <Setter Property="ComboBoxItem.IsEnabled" Value="False"/>
                </DataTrigger>
            </Style.Triggers>
        </Style>
    </ComboBox.ItemContainerStyle>
    <ComboBox.ItemTemplate>
        <DataTemplate>
            <StackPanel Orientation="Horizontal">
                <TextBlock Text="{Binding Path=ValueAsString }"></TextBlock>
            </StackPanel>
        </DataTemplate>
    </ComboBox.ItemTemplate>
</ComboBox>

What did I do wrong here?

The workaround

Before we dig into the core reason for this problem, let’s have a look at a simple work-around that Parag Bhand suggested,

We also faced similar issues while dealing with binding in combobox. Once we open the dropdown every thing works fine (even from keyboard) then onwards. I guess it has something to do with ItemsContainerGenerator because item containers are not generated until dropdown is opened at least once.

In fact if we open the drop down and close it again programmatically in window’s loaded event handler then also it works fine.

cmbx_test.IsDropDownOpen = true;
cmbx_test.IsDropDownOpen = false;

This is what I do now and it works.

The explanation

Dwayne Need found the explanation,

ComboBox.SelectItemHelper has logic to determine if the item and its corresponding container allow selection. When there is no container, this logic always returns True. The containers (ComboBoxItems) aren’t created until the dropdown box is measured, which doesn’t happen until the user causes the dropdown to appear, typically by clicking on its down-arrow button.

If you followed that, it’ll be clear why the workaround works. It causes a measure on the dropdown, which generates the containers. After that, SelectItemHelper’s logic picks up the state the app has declared for the containers. It should also be clear why we don’t do that for you automatically – the workaround creates UI that is never displayed.

This looks like a scenario bug to me. While the technical details make sense, the end-to-end scenario is broken.

Please file a bug.

This has now been filed as a bug (750993: Unable to disable combo-box items when selected with the keyboard).

Unfortunately the fix for this bug will not make it into .NET 4.0, so we must make do with the workaround for now.