unit testing: why and what

3 February 2018

Introduction

Over the last couple of years, I have spent a lot of time thinking about different aspects of programming. Even though my main occupation (being an iOS engineer) definitely limits my perspective, I have tried to focus on things that apply to a broader range of platforms and languages.

I wanted to start the series with the topic that has influenced me the most in recent years and led me to a deeper understanding of programming as a whole: testing. I'll mostly be talking about unit testing (which I have the most experience with) and will briefly touch on other kinds of testing (which I'm not very fond of, perhaps because of that very lack of experience).

What Are Tests?

If we want to come up with a definition for something, it had better be constructive and actionable, that is, have a real, tangible impact on what we do every day. It is also useful for it to be more or less objective: something that is likely to be understood the same way by different people, rather than ultimately boiling down to a matter of personal preference.

The definition of unit tests that I found particularly useful is this:

Unit tests are a functional specification of code that is always synchronised with the code and is checked regularly and automatically.

In other words, I see tests mostly as a means of communication between developers, and here's what a good test tells me:

The second point may be less obvious than the first one, but for me it's extremely important. Another way to phrase it: if I see a particular aspect of behaviour that doesn't have a corresponding test case (mind you, this is not the same thing as coverage), it's a strong signal for me that, for some reason, the author spent time writing code that wasn't really necessary. Ideally, every line of code committed to a project should have a justification, and unit tests give you a way of communicating that justification to others: this is why I had to write this line, this is why I decided to check this condition, and so on. The fastest code, and the code with the fewest bugs, is the code that was never written in the first place, you know.
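Here is a minimal sketch of tests doubling as justification. It is written in Python and uses a hypothetical `parse_quantity` function and a made-up `MAX_QUANTITY` rule, neither of which comes from the original text. Each guard in the code has a test case that explains why it exists, and deleting either guard makes one of those tests fail.

```python
import unittest

MAX_QUANTITY = 99  # hypothetical business rule, for illustration only


def parse_quantity(text: str) -> int:
    """Hypothetical SUT: parses the quantity a user typed into a form."""
    value = int(text)
    # Each guard below exists because a specific test case demands it.
    if value <= 0:
        raise ValueError("quantity must be positive")
    if value > MAX_QUANTITY:
        raise ValueError(f"quantity must not exceed {MAX_QUANTITY}")
    return value


class ParseQuantitySpec(unittest.TestCase):
    def test_accepts_a_positive_number(self) -> None:
        self.assertEqual(parse_quantity("3"), 3)

    def test_rejects_zero_and_negative_numbers(self) -> None:
        # Justifies the `value <= 0` guard.
        for text in ("0", "-2"):
            with self.assertRaises(ValueError):
                parse_quantity(text)

    def test_rejects_quantities_above_the_maximum(self) -> None:
        # Justifies the MAX_QUANTITY guard; the boundary value itself is allowed.
        self.assertEqual(parse_quantity("99"), 99)
        with self.assertRaises(ValueError):
            parse_quantity("100")
```

The test names read as a specification. A guard whose removal breaks no test would be a candidate for deletion. Run the tests with `python -m unittest <module>`.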

Put differently, I see unit tests as a developer's thought process, their understanding of and expectations for the system, formalised as (test) code that can be reproduced by the computer or by another developer. And, like any other form of communication, it can be more or less clear and more or less valuable.

Aside on type systems

There have been many discussions on the Internets about whether we still need a sophisticated type system in a language if we have an extensive test suite, and vice versa. I believe this is a false dichotomy: in reality, both tests and type systems serve the same purpose, which is communication.

The main difference is that tests exercise behaviour, whereas a type system provides proofs. The more advanced the type system, the more it can express, and the fewer tests we need to write. The main advantage of a sophisticated type system is that it allows us to make invalid states in the system not merely harder to reach, but unrepresentable. This means that if a program compiles, it is already free of many kinds of errors, so you don't have to write code that handles such cases, nor the corresponding tests.
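Below is a minimal sketch of making invalid states unrepresentable, using a hypothetical screen-state model whose names are made up for illustration. Python checks type hints only through an external tool such as mypy, not at compile time. Treat this as a stand-in for what a compiler with enums carrying associated values (as in Swift) or algebraic data types enforces directly.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


# Flag-based model: nothing stops is_loading=True together with an error,
# so both the code and its tests have to consider that nonsensical combination.
@dataclass
class LooseScreenState:
    is_loading: bool
    items: Optional[Tuple[str, ...]]
    error: Optional[str]


# Sum-type model: a screen is in exactly one of three states.
@dataclass(frozen=True)
class Loading:
    pass


@dataclass(frozen=True)
class Loaded:
    items: Tuple[str, ...]


@dataclass(frozen=True)
class Failed:
    message: str


ScreenState = Union[Loading, Loaded, Failed]


def title(state: ScreenState) -> str:
    # A checker such as mypy narrows `state` to Failed once the first two
    # branches are ruled out, so there is no "impossible" branch to test.
    if isinstance(state, Loading):
        return "Loading..."
    if isinstance(state, Loaded):
        return f"{len(state.items)} items"
    return f"Error: {state.message}"
```

With the loose model, a test for "loading and failed at the same time" is something you'd have to think about. With the sum type, that state cannot even be constructed.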

I'd like to specifically single out the following type system properties:

So, what's the catch with type systems? I'd say it's usually the complexity of the compiler and, as a result, longer compilation times.

When to Write Tests?

There are two main approaches to writing tests: writing them either before or after the code. I believe that writing tests after the code severely diminishes their value, for two reasons:

What (Not) to Test?

Theoretically, all code that you control can be tested, though it may first require some transformations (more on that later) to make testing easier and more reliable. Things get worse when the code you're trying to test touches other code that you don't control directly. This is usually fine for "inert" data structures like arrays, maps and strings, but most of the time this other code was not designed with testing in mind. Here are some common ways in which system or third-party frameworks make testing harder:

All this adds up to tests that are harder to write, slower to run and, perhaps most importantly, harder to maintain. And the worst thing is that you can't really fix it, since it's code you don't control. Therefore, I usually try to keep platform and third-party frameworks out of the code I'm planning to test (and, correspondingly, don't test code that depends heavily on these frameworks). 100% coverage is a non-goal, especially when it comes at the cost of maintainability.
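The following is one way to apply this, sketched under assumptions. The platform dependency (here, the system clock) sits behind a one-line seam that stays untested, and tests inject a fixed clock instead. `SessionPolicy`, `Clock` and the 30-minute lifetime are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Callable

# Hypothetical seam: the only thing the SUT needs from the platform is "now".
Clock = Callable[[], datetime]


class SessionPolicy:
    """Hypothetical SUT: a session expires a fixed time after login."""

    def __init__(self, clock: Clock, lifetime: timedelta = timedelta(minutes=30)) -> None:
        self._clock = clock
        self._lifetime = lifetime

    def is_expired(self, logged_in_at: datetime) -> bool:
        return self._clock() - logged_in_at >= self._lifetime


# Production wiring: the untested, one-line adapter to the real clock.
live_policy = SessionPolicy(clock=datetime.now)


# Tests: a fixed clock makes the result independent of when the test runs.
def test_session_is_valid_just_before_lifetime_ends() -> None:
    login = datetime(2018, 2, 3, 12, 0)
    policy = SessionPolicy(clock=lambda: login + timedelta(minutes=29))
    assert not policy.is_expired(login)


def test_session_expires_exactly_at_lifetime() -> None:
    login = datetime(2018, 2, 3, 12, 0)
    policy = SessionPolicy(clock=lambda: login + timedelta(minutes=30))
    assert policy.is_expired(login)
```

The only code touching the real clock is `clock=datetime.now`, which is too thin to be worth testing. The expiry logic is tested without waiting or depending on the time of day.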

Test Maintenance

In an ideal world, we would just write tests and they would do their job of being a communication mechanism and catching regressions. In reality, however, as your test suite grows you often need to spend a non-trivial amount of time maintaining it. This obviously includes changing tests when requirements change, but the biggest problem, I believe, is dealing with test failures. I'm not talking here about a failing test as the first step of the TDD process, but about tests that start failing after you've made your changes. This includes:

The difficulty of doing this is often exacerbated when the test suite is run asynchronously (because it is too slow to be run synchronously and frequently), as developers switch to other tasks in the meantime and lose the relevant context.

Imagine doing all that only to find out that it was "just a flaky test", and that the usual solution is simply to re-run it. How many man-hours are wasted that way? What is even the point of having a test that fails for random reasons, even when the code hasn't changed? I believe this is a very important problem that significantly reduces the value of a test suite and needs to be accounted for from the very beginning if you plan to employ testing in your project.
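Nondeterminism is a common source of such failures. This is a minimal sketch built around a hypothetical A/B-bucketing function. Instead of calling the random number generator directly, the SUT receives it as a parameter, so tests can pin its output. `RandomSource`, `assign_variant` and `FixedRandom` are illustrative names.

```python
import random
from typing import Protocol


class RandomSource(Protocol):
    """Hypothetical seam: anything with a random() -> float method.
    random.Random already satisfies it structurally."""

    def random(self) -> float: ...


def assign_variant(rng: RandomSource, share_of_b: float = 0.5) -> str:
    """Hypothetical SUT: puts a user into experiment bucket "A" or "B"."""
    return "B" if rng.random() < share_of_b else "A"


class FixedRandom:
    """Test double: always returns the same value, so every run is identical."""

    def __init__(self, value: float) -> None:
        self._value = value

    def random(self) -> float:
        return self._value


# Flaky: real randomness, so this would pass only about half the time.
#     assert assign_variant(random.Random()) == "B"

# Deterministic: each branch is exercised on purpose, on every run.
def test_low_draw_goes_to_b() -> None:
    assert assign_variant(FixedRandom(0.1)) == "B"


def test_high_draw_goes_to_a() -> None:
    assert assign_variant(FixedRandom(0.9)) == "A"


# Production wiring keeps genuine randomness.
live_variant = assign_variant(random.Random())
```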

Aside on asynchronous tests

I have always found asynchronous code to be a major source of reliability problems in tests. This stems from the very nature of asynchronicity:

What I find most troublesome is that all these problems can be avoided quite easily, while at the same time giving you more opportunities to test all aspects of the behaviour of the SUT (system under test). I will talk more about this in the next notes.
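Here is a sketch of one common way to do this, offered as an assumption rather than as the approach of the upcoming notes. The SUT receives the "where to run work" decision as a parameter. Production passes a thread pool's `submit`, while tests pass a scheduler that runs the work immediately, so no sleeping, polling or timeouts are needed. `SummaryLoader`, `Scheduler` and `run_immediately` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

Work = Callable[[], None]
# Hypothetical seam: "run this work somewhere"; any return value is ignored.
Scheduler = Callable[[Work], object]


def run_immediately(work: Work) -> None:
    """Test scheduler: runs the work synchronously on the caller's thread."""
    work()


class SummaryLoader:
    """Hypothetical SUT: computes a summary and reports it via a callback."""

    def __init__(self, schedule: Scheduler) -> None:
        self._schedule = schedule

    def load(self, numbers: List[int], completion: Callable[[str], None]) -> None:
        def work() -> None:
            completion(f"total={sum(numbers)}")

        self._schedule(work)


# Test: no sleeps, timeouts or polling; the callback has run when load() returns.
def test_reports_total() -> None:
    received: List[str] = []
    SummaryLoader(schedule=run_immediately).load([1, 2, 3], received.append)
    assert received == ["total=6"]


# Production wiring: the same SUT, backed by a real background thread.
executor = ThreadPoolExecutor(max_workers=1)
background_loader = SummaryLoader(schedule=executor.submit)
```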

Higher-level Testing

Higher-level tests, such as integration or UI tests, are very promising when you want to get maximum coverage with the least effort. However, even though I like the idea in general, I have yet to see a working process for these kinds of tests. I see the following set of (all too familiar) problems:

To summarise, I believe that higher-level tests can be useful in certain cases, but they provide too little additional value over a solid suite of unit tests to justify the expensive maintenance process.

Practicalities

I also wanted to share several practical tips on writing good unit tests. Some of them are more important than others, but in no particular order:

FAQ

It's easy to test that 2 + 2 = 4, but real projects are not that simple.

Well, you're absolutely right. However, I think the important questions here are why 2 + 2 = 4 is easy to test, and whether we can achieve the same degree of testability in real projects. I believe we can; more on that in upcoming notes.
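One common answer, sketched here under assumptions, is the "functional core, imperative shell" pattern. Decisions live in pure functions that, like 2 + 2, depend only on their arguments, and I/O is pushed into a thin outer layer. The `Cart` type, `total_cents`, `checkout` and the 10% member discount are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Cart:
    subtotal_cents: int
    is_member: bool


def total_cents(cart: Cart) -> int:
    """Pure core: like 2 + 2, the result depends only on the argument.
    Hypothetical rule: members get 10% off, the discount rounded down to whole cents."""
    discount = cart.subtotal_cents // 10 if cart.is_member else 0
    return cart.subtotal_cents - discount


def checkout(load_cart: Callable[[], Cart], charge: Callable[[int], None]) -> None:
    """Thin imperative shell: I/O at the edges (injected here), the decision in the core."""
    charge(total_cents(load_cart()))
```

The core can be covered with plain assertions such as `total_cents(Cart(1000, True)) == 900`. The shell is small enough that a single wiring test with fake `load_cart` and `charge` functions is usually sufficient.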

We need 100% coverage.

Test coverage has the nice advantage of being easy to measure, but other than that I don't find high coverage numbers particularly useful. That being said, lines or branches that are not covered are definitely signs of missing tests or of code that is not actually used.

How do I test that a private method of SUT was called or a private field has changed its value?

I'm sorry, but you don't. Everyone seems to agree that you shouldn't test implementation details of your SUT, so you don't have to change your tests when these details change. However, this question keeps popping up, mostly because:

I don't have a good solution for the first problem (though it can be argued that it is not a given that all these preconditions will always share the same expected outcomes). The second one, however, signals an inability to properly isolate the SUT during a test.
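Here is a minimal sketch of isolating the SUT instead of inspecting its private state. The example assumes a hypothetical profile lookup with a private cache. Rather than reading the cache field, the test injects the collaborator and observes what the SUT does at that boundary. `CachingProfiles` and `RecordingFetch` are illustrative names, and the fake is hand-written without any mocking library.

```python
from typing import Callable, Dict, List


class CachingProfiles:
    """Hypothetical SUT: looks up display names, fetching each one only once."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch
        self._cache: Dict[str, str] = {}  # private detail; tests never read it

    def name_for(self, user_id: str) -> str:
        if user_id not in self._cache:
            self._cache[user_id] = self._fetch(user_id)
        return self._cache[user_id]


class RecordingFetch:
    """Hand-written fake collaborator: serves canned names and records every request."""

    def __init__(self, names: Dict[str, str]) -> None:
        self._names = names
        self.requests: List[str] = []

    def __call__(self, user_id: str) -> str:
        self.requests.append(user_id)
        return self._names[user_id]


def test_second_lookup_does_not_fetch_again() -> None:
    fetch = RecordingFetch({"42": "Ada"})
    profiles = CachingProfiles(fetch)
    assert profiles.name_for("42") == "Ada"
    assert profiles.name_for("42") == "Ada"
    # Observable behaviour at the boundary, instead of peeking at _cache:
    assert fetch.requests == ["42"]
```

If the dictionary were later replaced by a different caching mechanism, this test would keep passing as long as the behaviour (one fetch per user) stays the same.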

We need a framework for tests.

Not really. Everything these frameworks do can be recreated manually, and doing so won't be the hardest or the longest part of the whole testing process.

Thank you for reading and stay tuned for Part II!