Unit testing, you’re doing it wrong

(this is a simple reposting of the Medium version)

TLDR; The existence of untested code in the wild should worry you: most of our
lives are now somehow software controlled. The good news is
that you can do something about it. Also, there is confusion about what
unit testing means.

Disclaimer

I understand that I am addressing a very sensitive topic; I will probably
offend many readers, who will say that I am an insane troll and that my views are bullshit. Offending is not my objective, but I stand by my opinions. Of
course, comments are here to help you voice yours. And yes, this piece is
biased by my past experiences, but that’s the point of it: sharing my
experiences.

“How legitimate are you?”

Fair question. I have a 35-year career in IT; I have worked at companies of
various sizes and cultures. I have often been in cross-team positions and
had the opportunity to meet and work with a lot of developers (think thousands).
While most of my positions involved code, I also touched on QA and BA
activities. I am now in a CTO-like position for 2,500 IT people and have had the great
privilege of working with well-known French experts, as well as lesser-known ones.

So my opinion is based on things and events I have experienced first-hand as a developer, things I have seen others struggle or succeed with, problems
encountered by teams I have helped, and views and issues that other
experts taught me about.
Basically, I have been through all this sh*t and made most of the mistakes
listed here.
Of course, this does not imply that I am
right, but at least grant me that I have a comprehensive view of what I am
talking about.

Fallacies about unit testing

1. TDD is all about unit tests

You keep using that word

Big NO. TDD, a.k.a. ‘Test First Development’, is about defining what the code is
expected to produce, capturing this as some test(s) and
then implementing just enough code to make it pass. Unit testing is about
testing small parts of the code in isolation, e.g. testing some class’s methods,
maybe using some stubs/mocks to strip dependencies.

unit tests

Unit tests are promoted for their speed and focus: they are
small, with limited dependencies, hence (usually) run fast. When a unit test
fails, it is easy to identify which part of the code is responsible.

Actually, TDD is about every form of test.
For example, I often write performance
tests as part of my TDD routine; end-to-end tests as well.
Furthermore, this is about requirements, not implementation: you write a
new test **when you need to fulfill a requirement. You do not write a test
when you need to code a new class or a new method**. A subtle but important
nuance.

And when Kent Beck wrote about tests being isolated, he meant isolated from one
another. For example, having one test insert a record in a table while
another reads that same table is probably a bad idea, as the results of the
tests may vary depending on the order in which the tests are run.
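
To make the point about isolation concrete, here is a minimal sketch. `UserTable` is a hypothetical in-memory stand-in for a real table, not an actual API; the idea is simply that each test builds its own fixture, so the verdicts do not depend on execution order.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for a database table (illustration only).
struct UserTable {
    std::vector<std::string> rows;
    void insert(const std::string& name) { rows.push_back(name); }
    std::size_t count() const { return rows.size(); }
};

// Each test creates its own table, so neither depends on what the other did.
bool inserting_a_user_adds_one_row() {
    UserTable table;          // fresh fixture
    table.insert("alice");
    return table.count() == 1;
}

bool a_new_table_is_empty() {
    UserTable table;          // fresh fixture, never sees "alice"
    return table.count() == 0;
}

// Running the tests in either order gives the same verdicts.
void run_both_orders() {
    assert(inserting_a_user_adds_one_row() && a_new_table_is_empty());
    assert(a_new_table_is_empty() && inserting_a_user_adds_one_row());
}
```

Had both tests shared a single `UserTable` instance, `a_new_table_is_empty` would pass or fail depending on whether it ran first.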


2. Automated testing is all about unit tests

No, automated testing describes a process: having tests automatically run as
part of your build/delivery chain. It covers every kind of test you can
perform automatically: behavior tests, stress tests, performance tests,
integration tests, system tests, UI tests…

There is an emphasis on unit tests because they are fast, localized and you
can execute them en masse. But feature tests, use case tests, system
tests, performance tests, you name it, must be part of your build chain.
You must cut manual tests as much as you can. Manual tests are expensive
and give slow feedback.

Sickness


3. 100% code coverage requires extensive unit testing

NO, NO, NO and f…g no. In a perfect TDD world, untested code does not
exist in the first place.

Writing a test is akin to writing down a contract or a specification: it fixes
and enforces many decisions.
Your tests must focus on behavior; behavior driven and use case tests are the most important ones. Code coverage must include every test,
regardless of its type.
Improving coverage by simply adding specific tests for untested methods and
classes is wrong. Added tests must be justified by some
requirement (new or existing); otherwise, it means the code has no actual
use. Coverage-only tests also lead to excessive coupling between tests and
implementation details, and your tests will break whenever refactoring occurs.

For example, if you implemented a calendar module that supports Gregorian to
Julian conversion, either you have a pertinent test for this feature, or you
just remove it.


4. You have to make private methods public to reach 100%

Exposing

Again, no: private methods will be tested through public entry points.
Once again, unit testing is not about testing methods one by one.

Wondering how to test private methods is a clear sign you’ve got TDD
wrong. If this is not clear to you, I suggest you stop **UNIT TESTING**
altogether and contemplate BDD. Once you have a grasp of BDD, you will be able to embrace TDD.
If private methods cannot be tested in full through public entry points, you
need to challenge the relevance of the uncovered part: it is **probably** useless code.
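
As an illustration, here is a small sketch where a private helper is fully covered through the public entry point only. `Greeter` and its methods are made-up names for the example, not taken from any real code base.

```cpp
#include <cassert>
#include <string>

// Hypothetical class: capitalize() is a private implementation detail.
class Greeter {
public:
    std::string greet(const std::string& name) const {
        return "Hello, " + capitalize(name) + "!";
    }

private:
    static std::string capitalize(std::string word) {
        if (!word.empty() && word[0] >= 'a' && word[0] <= 'z') {
            word[0] = static_cast<char>(word[0] - 'a' + 'A');
        }
        return word;
    }
};

// Behavior tests: every branch of capitalize() is reached through greet().
void greeter_tests() {
    Greeter greeter;
    assert(greeter.greet("alice") == "Hello, Alice!");  // lowercase initial
    assert(greeter.greet("Bob") == "Hello, Bob!");      // already capitalized
    assert(greeter.greet("") == "Hello, !");            // empty name
}
```

If some branch of `capitalize()` could not be reached from `greet()`, that would be the cue to question whether the branch is needed at all.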


5. Some code does not need to be tested

The design of the Death Star is Rebel proof, right ?!

This will never happen, right?

This one is somewhat true, but probably not to the extent you think it is:
code that works by construction does not require testing if it never changes.
That being said, please show me some code that will never change.

Plus, I am an average developer, and my long
experience has taught me that my code working on the first attempt is a
happy accident.
Even if you are the god of code, chances are somebody else will break
your code in a couple of months, weeks or even hours.
And yes, that somebody else is probably the future you. Remember, as I said
earlier, a test is a contract. And contracts exist because people
change, context changes, etc.

I often get this remark: “Testing getters or setters is simply a waste of
time.” Seems pretty obvious, doesn’t it?
What is wrong with this remark is the implicit notion of testing (trivial)
getters or setters in isolation, which would probably be not only useless
but likely harmful.
Unit testing is not about testing methods in isolation. Your getters and
setters should be tested as part of a larger, behavior related, test.
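
A minimal sketch of what this can look like, assuming a hypothetical `CartLine` class: the accessors never get a test of their own, yet they are fully exercised by a test that states a behavior.

```cpp
#include <cassert>

// Hypothetical shopping-cart line with trivial accessors (illustration only).
class CartLine {
public:
    CartLine(int quantity, int unit_price_cents)
        : quantity_(quantity), unit_price_cents_(unit_price_cents) {}
    int quantity() const { return quantity_; }
    int unit_price_cents() const { return unit_price_cents_; }
    void set_quantity(int quantity) { quantity_ = quantity; }

private:
    int quantity_;
    int unit_price_cents_;
};

// The behavior under test: what the customer pays for a line.
int line_total_cents(const CartLine& line) {
    return line.quantity() * line.unit_price_cents();
}

void changing_the_quantity_changes_the_total() {
    CartLine line(2, 150);
    assert(line_total_cents(line) == 300);
    line.set_quantity(3);                    // setter exercised by the scenario
    assert(line_total_cents(line) == 450);   // getters exercised by the total
}
```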


6. You need to use a mocking framework

Isn’t it cute and mesmerizing ?

Nifty, isn’t it?

Nope, chances are you don’t. Mocking frameworks are great pieces of
engineering, but almost every time I have seen a team using one, mocks were
pervasive within the test base with little to no added value. I have seen
tests that ultimately
test no production code whatsoever, but it took me hours of peering at the code to
come to that conclusion.

Often teams use mocks to test a class in isolation, mocking every
dependency. Remember, ‘unit’ in unit testing is to be understood as a
module or a component, not a class.

Whenever you decide to introduce a mock, you enforce a contract that makes
refactoring more difficult.

Mocks are here to help you get rid of slow or unstable
dependencies, such as remote services or some persistent storage.
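
For that use case a hand-written stub is often enough. The sketch below assumes a hypothetical `ExchangeRates` interface standing in for a slow remote service; all names are invented for the example and no mocking framework is involved.

```cpp
#include <cassert>
#include <string>

// Hypothetical interface hiding a slow remote dependency (illustration only).
class ExchangeRates {
public:
    virtual ~ExchangeRates() = default;
    virtual double rate(const std::string& from, const std::string& to) const = 0;
};

// Production code depends on the interface only.
class PriceConverter {
public:
    explicit PriceConverter(const ExchangeRates& rates) : rates_(rates) {}
    double convert(double amount, const std::string& from, const std::string& to) const {
        return amount * rates_.rate(from, to);
    }

private:
    const ExchangeRates& rates_;
};

// Hand-written stub: a fixed answer, no network access.
class FixedRates : public ExchangeRates {
public:
    explicit FixedRates(double rate) : rate_(rate) {}
    double rate(const std::string&, const std::string&) const override { return rate_; }

private:
    double rate_;
};

void converts_using_the_current_rate() {
    FixedRates rates(2.0);
    PriceConverter converter(rates);
    assert(converter.convert(10.0, "EUR", "USD") == 20.0);
}
```

The test checks the converted amount, not which calls were made on the dependency, so the implementation stays free to change.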

You should not test for collaboration/dependencies between classes. Those tests
are useful if you do bottom-up/inside-out TDD, but you must get
rid of them once the feature is complete.
Philippe Bourgau has a great set of posts on this topic
if you want to dig further.

7. Tests are expensive to write

Yes, testing is expensive in most industries: think about testing
a home appliance, a drug or a new car…

Expensive test run in real life

Actual crash test

But code is incredibly cheap, giving
the impression that tests are needlessly costly, in a relative way.

They do require extra effort, but they are an efficient complement to, or even
replacement for, specifications; they improve quality, bring fast feedback,
and secure knowledge for newcomers.

But green tests look useless both to the team and to management.


8. The ‘testing pyramid’ is the ultimate testing strategy

You have probably heard of the testing pyramid. It basically states that
you should have a lot of unit tests, fewer component tests, then fewer
integration tests, and so on, up to the top of the pyramid where you have a
few use case based/acceptance tests. It is used as the default testing
strategy for most projects.

Pyramids can be dangerous!

Testing Pyramid

Truth be told, the **testing pyramid has outlived its usefulness**.
Its original purpose was to address the fact that high level tests can have a
long execution time and that the causes of failure may be hard to
identify. It therefore pushes you to invest more in unit tests, which are both fast and local, by definition.

This is also a dangerous analogy, giving the impression that a ratio of 1000 to 1 between unit and use case based tests is a desirable thing.

You should focus on the top of the pyramid, not the bottom !

I often see teams that have only a couple of high level tests that
cover some of the core use cases and are nothing
more than crude, glorified smoke tests. And then thousands of method tests to
ensure high coverage. This is not good.

You need to have a decent set of use case based tests for your system, ideally
covering all use cases, but the major ones are a good start.
These tests must rely on your high level public APIs, just ‘below’ the
user interface.
Then have some performance tests for the performance sensitive parts of
the application. Also integrate failure-reproducing tests, such as
external dependencies that are down (thanks to mocks), to make sure your system
handles those properly.
And then, unit (as in module) tests for the dynamic part of your code base.
Then understand the trade-off:
* Having few unit tests means your design can
easily be changed, but it means that finding the root cause of a failing high
level test will take time (and probably debugging).
* Having a lot of them means you find issues as soon as they are introduced
in the code base, but any significant redesign of your solution will be riddled
with failing tests.

If at any point in time you need to have finer tests, such as class or
method tests, throw them away as soon as you no longer need them, such as
when the initial design and implementation phase is over. Otherwise they will
drag your product down slowly.


What about some truths ?

1. Unit tests are not about testing a method in isolation

Here is what Wikipedia proposes:

In computer programming, unit testing is a software testing method by which
individual units of source code, sets of one or more computer program
modules together with associated control data, usage procedures, and
operating procedures, are tested to determine whether they are fit for use.[1]

Good tests must test a behavior in isolation from other tests. Calling
them unit, system or integration has no relevance to this.

Kent Beck says it so much better than I could ever do.

From this perspective, the integration/unit test frontier is a frontier of
design, not of tools or frameworks or how long tests run or how many lines
of code we wrote get executed while running the test.

Kent Beck


2. 100% coverage does not mean your code is bug free

This is the first rebuttal I get whenever I talk about 100% coverage.
Of course, it does not. Coverage only shows which parts of the code have
been executed. It does not guarantee that the code will work in all
circumstances: it may still fail for specific parameter values, some
application state or due to concurrency issues. Also, coverage in itself does not
prove the code produces the required output; you need to have adequate
assertions to that effect.
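
The gap between "executed" and "verified" fits in a few lines. In this sketch, `discounted_price` is a made-up function with a deliberate bug: both tests give it 100% line coverage, but only the one with an assertion can catch the defect.

```cpp
#include <cassert>

// Hypothetical function with a deliberate bug:
// the discount is added instead of subtracted.
int discounted_price(int price, int discount) {
    return price + discount;  // bug
}

// Executes every line of discounted_price(), so coverage is 100%,
// yet it checks nothing and therefore can never fail.
void covers_but_verifies_nothing() {
    discounted_price(100, 20);
}

// Same coverage, but the comparison exposes the bug.
bool price_is_reduced_by_the_discount() {
    return discounted_price(100, 20) == 80;
}

void demo() {
    covers_but_verifies_nothing();                // "green", bug undetected
    assert(!price_is_reduced_by_the_discount());  // the real check fails, as it should
}
```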

This is especially true if you only perform unit testing!

Coverage metrics are not about what is covered, but about what is not
covered.

Not covered means not tested. So at least make sure that untested parts
are non-critical and that the important parts of your code are properly
tested!


3. There is a tooling problem

The truth is unit tests are in the spotlight mostly thanks to tooling!
We should all be eternally grateful to Kent Beck for creating SUnit, the
library which triggered a testing revolution, but we must not stop there.

Are you using test coverage tools (JCov, Clover, NCover, Jasmine…)?
Do you look at their report?

Have you tried continuous testing tools (Infinitest, NCrunch, Wallaby…)?
I have a bias: I am addicted to NCrunch.

Having your tests running continuously is a game changer for TDD!

Me

No, seriously, do it, now! It will change the value you perceive in tests.

Have you tried Cucumber to have a more use case driven approach? You may
also consider using Mutation Testing to assess the quality of your tests.
Property Based Testing is useful to check for invariants and higher level
abstractions.
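
To give a feel for property based testing without committing to a specific library, here is a hand-rolled sketch using only the C++ standard library. Real tools add input shrinking and smarter generators; the function names, the seed and the value ranges below are arbitrary choices for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Properties: sorting is idempotent and yields a non-decreasing
// permutation of its input.
bool sort_properties_hold(std::vector<int> input) {
    std::vector<int> once = input;
    std::sort(once.begin(), once.end());
    std::vector<int> twice = once;
    std::sort(twice.begin(), twice.end());
    return once == twice
        && std::is_sorted(once.begin(), once.end())
        && std::is_permutation(once.begin(), once.end(), input.begin(), input.end());
}

// Checks the properties on many pseudo-random inputs
// (fixed seed so that runs are repeatable).
bool check_sort_properties(int runs) {
    std::mt19937 generator(42);
    std::uniform_int_distribution<int> length(0, 20);
    std::uniform_int_distribution<int> value(-100, 100);
    for (int run = 0; run < runs; ++run) {
        std::vector<int> input(static_cast<std::size_t>(length(generator)));
        for (int& element : input) element = value(generator);
        if (!sort_properties_hold(input)) return false;
    }
    return true;
}

void property_test() {
    assert(check_sort_properties(200));
}
```

Instead of listing examples by hand, the test states what must always be true and lets the generator look for a counterexample.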

![Testing](./Engine tuning.jpg)


4. It is difficult

Yes, but this is no more difficult than designing the software up front.
You face complexity, but what is interesting in test first approaches
is that you have an opportunity to focus on essential complexity,
as test code ought to be simpler than the actual implementation.

I have facilitated many craftsmanship discovery sessions based on Lego
exercises (French deck). After the TDD exercise, attendees often say that the
difficult part was choosing the right test, and that building the solution was
straightforward.
Interestingly, even non-coder profiles (BA, managers, CxO, …) share this
feeling, sometimes even saying how comfortable it was just to follow
requirements, versus the hardship of identifying a test (in TDD mode).

Choosing the next test is an act of design.

(attributed to) Kent Beck

I attribute this difficulty to a set of factors:
1. It forces you to think problem first, while solution first is
everyone’s comfort zone.
2. It constrains your design, and nobody likes extra constraints.
3. It gives you the impression of being unproductive.

But all those factors turn into benefits:
1. Problem first is the right focus!
2. Constraints help you drive the design. And as you are problem first, this
is bound to be a good design.
3. Worst case, tests will be thrown away. But they helped you build a solution
and a deep understanding of the problem. At best, they prevent future
regression, and provide help and documentation for future developers.

Writing tests is never unproductive.


5. Tests require maintenance

Maintenance

Tests require maintenance effort like any other piece of code. They need
refactoring along with the source code, but they may also require
refactoring on their own.
They will have to be updated if new use cases are identified, or if existing
ones must be altered.

To sum it up: tests are part of your codebase and must be treated as such.
Which leads to the next truth:


6. Having too many tests is a problem

Since tests need to evolve with the production code, too many tests will
hamper your productivity: if changing a few lines of code breaks a hundred
tests or more, the cost (of change) becomes an issue.
This is a sure sign of failing to tend to your tests appropriately:
tests may be replicated with only minor variations, each one adding little
value.

I have seen projects and teams that ground to a halt due to having a
far too large test base. Then there is a strong likelihood that the test base
will simply be thrown away, or savagely cut.

Automated tests

Ultimately, tests also increase build time, and as you are doing continuous
build/delivery (you are, aren’t you?), you need to keep build time as low as
possible.

This has a clear consequence:


7. Throwing away tests is a hygienic move

It should be obvious by now that you need to maintain a manageable
number of tests.

Therefore you must have some form of optimization strategy for your test base.
Articles are pretty much nonexistent for this kind of activity, so let me make
a proposal:
– Getting rid of scaffolding tests should be part of your TDD/BDD coding
cycle.
By scaffolding tests, I mean tests that you used to write the code in the
first place, identify algorithm(s) and explore the problem space. Only keep
use case based tests.
– Review code coverage regularly, identify heavily tested lines and remove
tests you find redundant.

You can see this thread for an extensive
discussion on having too many tests.


8. Automated tests are useful

Last but not least: automated tests have a lot of value.
Yes, a green test looks useless, like any safety device: safety belt,
life vest, emergency brakes…

If you practice TDD, tests have value right now. But even if you don’t, tests
have value in the long run.

An interesting and important 2014 study analyzed 198 user reported
issues on distributed systems. Among several important findings, it
concluded that 77% of the analyzed production issues can be reproduced by a
unit test.

Another key finding was that almost all catastrophic failures were the
result of incorrect error handling.
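
Error-handling paths are exactly the kind of code a small test can pin down. The sketch below simulates an outage with a hypothetical `Storage` interface and checks that the caller degrades gracefully; the names and the fallback value are assumptions made for the example, not taken from the study.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical storage dependency that can fail (illustration only).
class Storage {
public:
    virtual ~Storage() = default;
    virtual std::string load(const std::string& key) const = 0;
};

// Stub simulating an outage.
class UnavailableStorage : public Storage {
public:
    std::string load(const std::string&) const override {
        throw std::runtime_error("storage unavailable");
    }
};

// Code under test: must degrade gracefully instead of crashing.
std::string display_name(const Storage& storage, const std::string& user_id) {
    try {
        return storage.load(user_id);
    } catch (const std::runtime_error&) {
        return "unknown user";  // explicit, tested fallback
    }
}

void storage_outage_falls_back_to_a_default_name() {
    UnavailableStorage storage;
    assert(display_name(storage, "42") == "unknown user");
}
```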

Source study: Simple testing can prevent most critical failures


Conclusion

First of all, thanks for having the patience to read this far. If you
are dubious about unit tests, I hope this article cleared up some of your
concerns and gave you some reasons to try them.
If you are already doing unit testing, I hope I offered you some guidance to help you avoid the dangerous mines that lie ahead.
And if you think you’re a master at unit testing, I hope you share my point of view and that I gave you strong arguments to convince others.

Each of the facts I listed previously is worthy of a dedicated talk or article.
Digging further is left as an exercise for the so-minded reader.

Remember:
1. Tests are useful, they can prevent catastrophic failures.
2. Test behaviors, not implementation. A.k.a. understand what unit
stands for in unit tests.
3. Maintain your test base with the delicate but strong hand of the
gardener: gently refactoring when necessary and pruning when no longer
useful.


100% code coverage is good


Quick lessons learned about 100% code coverage


TLDR; maintaining 100% coverage brings many benefits, you need to try it.

A few years ago I blogged about aiming for 100% code coverage for your tests. This post made some noise and the feedback was essentially negative. I was even called out as a troll a few times…

Being stubborn and dedicated, I understood I needed to put my money where my mouth was, and start to practice what I preached. I did and this post is about what I learned by reaching 100% code coverage for my tests.


My very first computer, and why it matters

It’s never as good as the first time.

Do you remember your first time? Do you think it is important?
Do you think it still influences you somehow?

Well, I do, I do and I do.

A couple of weeks ago, I received my long awaited Recreated ZX Spectrum,
basically a Bluetooth keyboard shaped like an 8-bit computer from ’82.
This is a piece of memorabilia from my youth, as the ZX Spectrum was my
very first computer.
It gave me an opportunity to ponder what it taught me.

But first, let’s go through its ~~impressive~~ list of features:
– A whopping 49152 bytes of RAM
– A 3.5 MHz Z80 8-bit processor, with an impressive 0.25 IPC
– 16384 bytes of ROM with an integrated BASIC interpreter
– A 256 x 192 pixel color screen, but color management is tricky
(more on that later)
– A 1500 bps in/out tape interface for persistent storage
– All of that in its signature book-sized plastic case sporting a timeless
rubber keyboard

Those specs bear little meaning today! It is actually difficult to believe that
you could accomplish anything with those.
But they had an unsurpassable quality: they were engaging and not intimidating!

I had to learn how to enter instructions, as it was required even to start a
game.
I could even compare my programs to off-the-shelf software, as those were
done by one or two guys in a couple of months!

And, I gained some interesting skills:

  1. How to learn a programming language: BASIC at that time. Since then,
    I went through various ASMs, C, C++, Pascal, Ada, Smalltalk,
    some Lisp and Prolog, Java and more recently F#.
  2. How to stay aware of how much memory my programs used.
    Allocating a 100 x 100 int array used up half the available memory.
    And we are talking 16-bit ints!
  3. How to read/disassemble others’ programs; I learned a lot
    from those. Soon the debugger was my best friend.
  4. How to use undocumented functions. Heck, nothing was documented.
Were they really beneficial? As a matter of fact,
I should probably talk about bad habits:

  1. BASIC, I mean, BASIC of all languages. This meant a lot of GOTOs
  2. I mostly learned nasty tricks to shave every single byte whenever possible
  3. I hacked other applications to remove protections or copy algorithms
  4. I created fragile code, depending on undocumented features, including
    undocumented opcodes for the processor

When I started my developer career I was still there. That’s the problem when you
code alone, for yourself: no feedback.

On the one hand, being self-taught gave me some advantages: I read a lot, so
I had a lot of book knowledge and knew C++ inside out. On the other hand, it meant
that I lacked the humility that is needed to learn from feedback, and I
committed a lot of atrocities in the name of cute code cleverness.

Walk of shame

I still have a vivid memory of what I fear is the worst design I have ever
produced.
The platform was C++ on OS/2 and the product we were working on was
getting close to release. In an attempt to improve supportability,
I designed an error class.
And I decided to be clever by providing several overloaded assignment operators
(=) to capture the error code, error message and error category. Let’s see:
MyError error;

error = 45;
error = "Invalid user id";
error = ErrorType::Security;

// which stands for
error.code = 45;
error.message = "Invalid user id";
error.type = ErrorType::Security;
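
For the curious, here is a minimal sketch of how such a class could be written. It is a reconstruction for illustration only, not the original OS/2 code; the field names follow the snippet above, and `ErrorType::None` is an assumed default.

```cpp
#include <cassert>
#include <string>

enum class ErrorType { None, Security };  // None is an assumed default value

// Sketch of the design described above: one operator, three unrelated meanings.
struct MyError {
    int code = 0;
    std::string message;
    ErrorType type = ErrorType::None;

    MyError& operator=(int new_code) { code = new_code; return *this; }
    MyError& operator=(const char* new_message) { message = new_message; return *this; }
    MyError& operator=(ErrorType new_type) { type = new_type; return *this; }
};

void each_assignment_silently_targets_a_different_field() {
    MyError error;
    error = 45;                   // sets code
    error = "Invalid user id";    // sets message
    error = ErrorType::Security;  // sets type
    assert(error.code == 45);
    assert(error.message == "Invalid user id");
    assert(error.type == ErrorType::Security);
}
```

Three consecutive assignments to the same variable, none of which overwrites the previous one: the reader has to know the overloads by heart to follow what happens.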

I mean, come on, is there anyone who could see this as a good idea?

I did at the time, but I definitely no longer do!

So what did I actually learn from my first computer?!
I am afraid it will have to wait for the next part of this post.

Stay tuned

A post about agility, architecture and scales

Newton vs Einstein: it gets physical

Newtonian physics ruled the mechanical world for a couple of centuries. It was simple, elegant, almost intuitive.

Relative pressure

Then, in the early twentieth century, an obscure Swiss patent clerk (Einstein, if that rings a bell) demonstrated that it did not work at larger scales and needed adjustments. It was a daring theory, somewhat far-fetched, but it has been proven right every time since.

Then, a couple of decades later, quantum mechanics wreaked havoc at the smaller scale, demonstrating that anything goes! It was counter-intuitive and unfathomable, and it still is. Particles are probabilistic, you have to choose between knowing their location or their speed…

Quantum mechanics rulz

But Newtonian mechanics still rules our day-to-day life, because it is accurate enough to account for our experiences. But serious physicists need either Relativity or Quantum mechanics to dig deeper at non human scale. And to this day, nobody has been able to reconcile quantum physics and relativity.

What about agility?

Agile values are really people centric. Indeed, when I was first introduced to them, I thought this was a great way to reconcile users with software developers. But beyond that, agile inspired methodologies focus on adaptability, whereas traditional project management methods focus on careful planning.

But careful planning is only as good as what you use to establish your plan. Traditional approaches require a lot of up front information to succeed, a requirement that is difficult to fulfill for software development.

Planning, really?

Instead, Agile focuses on getting the best out of the information you have, and on maximizing that information before using it, through the notion of the ‘last responsible moment’.

There is no question that Agile inspired methodologies have had great success and have helped improve software project success rates.

But none of them scale.

Smaller

They do not scale down: agile methods will not help you design a faster sort algorithm or a Sudoku solver.

I mean, I am not sure how individual interactions would help there.

There, design principles, previous work and patterns are the tools you will need.

Agile methodologies have no relevance there.

Larger

They do not scale up either: being focused on people is great for small groups, but how do you scale that to hundreds, thousands or even larger groups? How do you engage C-level stakeholders? How do you make those organizations embrace change?

That is where you need Enterprise Architecture practices. As is often the case with any tool or methodology, there are many ways you can fail with those. But a key to success is to have the right attitude: being a facilitator.

But describing Enterprise Architecture is beyond the scope of this post.

As a conclusion

Trying to change a large organization/IS using some agile methodology is akin to trying to use quantum physics to describe a car. Yes any matter is made out of sub-atomic particles, but you will simply not succeed because it is too complex.

Understand that Agility and Enterprise Architecture are related in their objectives, but operate at very different scales.

Adopt both, and use accordingly.

Repeat and succeed.

Mechanical sympathy, introduction

Mechanical Sympathy…Jackie Stewart, 1968

This term was coined by Jackie Stewart, a famous British racing driver. He used it to describe his driving philosophy: he spent countless hours with the engineers and mechanics to get a deep understanding of where the limits of the machine lay, which allowed him to get close to the edge and get the most out of it, and to know when to let it rest a bit.

This strategy helped him win races when competitors were pushing their car too hard.

Gravity

The thought process is obviously complex, but it often relies on mental models. Those are abstractions the brain uses to understand, analyze and interact with the physical world.

They are built through the brain’s learning process, refining them until they are proven accurate enough to act upon them.

For example, we all have a clear mental model of gravity: if we drop an object, it falls with acceleration. So, this mental model helps us catch something that may have slipped out of our hands. If we train hard, we get better at this, until we have a decent juggling ability.

About reality

Coming back to Jackie Stewart:

Every driver (incl. you and me) has a mental model of a car, which helps him steer the car; but he can only spend so much time refining his model, i.e. driving and exploring possible situations. And he relies on a very basic model of the car engine, brakes and stickiness of the tires. That’s why he loses most of his skills if it rains or if there is a mechanical malfunction.

By spending a significant amount of time with technicians, Jackie was able to refine his mental models of the various car components. In the process, he could better anticipate the brakes/engine/steering behavior of his car…

Old technology

In my younger years, I was the proud owner of an Atari ST: I was constantly cracking games’ protections, learning how they were written and coding technical demos of some sort. Those demos required expertise and a deep understanding of the guts of the computer, which mostly meant understanding the MC68000 processor.

Its frequency was 8 MHz and instructions took between 2 and 20 or more of those cycles, depending on how many memory accesses were required. The typical cycle count was around 6-8, leading to (roughly) 1M instructions per second (3 orders of magnitude less than now). So if you wanted a 50 fps demo, your algorithm had to fit in 160,000 clock ticks, i.e. ~20,000 instructions.
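
The budget arithmetic can be written down as a few constants. This is only a sketch of the figures quoted above; the constant and function names are invented for the example.

```cpp
#include <cassert>

constexpr long clock_hz = 8000000;      // 8 MHz, from the text
constexpr long frames_per_second = 50;  // target refresh rate

// Clock cycles available to draw one frame.
constexpr long cycles_per_frame = clock_hz / frames_per_second;

// Instructions that fit in one frame for a given average cost per instruction.
constexpr long instructions_per_frame(long cycles_per_instruction) {
    return cycles_per_frame / cycles_per_instruction;
}

void check_frame_budget() {
    assert(cycles_per_frame == 160000);
    assert(instructions_per_frame(8) == 20000);  // the ~20,000 quoted above
    assert(instructions_per_frame(6) == 26666);  // lower end of the 6-8 range
}
```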

One time, I had devised a cute scrolling algorithm (remember, no GPU), but I fell short of reaching the expected smoothness: the animation ran at 25 fps, not the 50 fps I was aiming for. Meaning I needed more than one screen refresh period to update the image…

Tenacity

Something was fishy there, and I had to understand what and why. Reality was not fitting the theory.

Toying with the parameters of the algorithm (reducing text size, if my memory serves me well), I was ultimately able to reach the holy grail of smoothness. But my algorithm was now significantly below the 160k cycles barrier, on paper at least, so I should have reached 50 fps much earlier.

Epiphany

Then it dawned on me, and I immediately wrote a quick micro benchmark to assess the hypothesis.

Bang!

Actual instructions’ cycle counts were always a multiple of four, rounded up (of course). My theory was that the shifter used half of the memory access slots for display purposes (actually, this is a bit trickier than I thought).
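
The effect of that rounding on the frame budget can be sketched as follows. The helper name is made up, and the 6-cycle figure is just the lower end of the "typical" range mentioned earlier, used here as an example.

```cpp
#include <cassert>

// Rounds a nominal cycle count up to the next multiple of four.
constexpr int effective_cycles(int nominal_cycles) {
    return (nominal_cycles + 3) / 4 * 4;
}

void check_effective_cycles() {
    assert(effective_cycles(4) == 4);
    assert(effective_cycles(6) == 8);    // a "6-cycle" instruction really costs 8
    assert(effective_cycles(10) == 12);
    // Instructions per frame (160,000 cycles) if every instruction nominally took 6 cycles:
    assert(160000 / 6 == 26666);                    // on paper
    assert(160000 / effective_cycles(6) == 20000);  // with the rounding applied
}
```

A quarter of the paper budget vanishes in this example, which is the kind of gap that turns an expected 50 fps into 25.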

I adjusted for that, which meant rewriting part of my algorithm and lowering my ambitions by reducing the amount of moved pixels. And now my scrolling was running smoothly at 50 fps.

Adaptability

My mental model had to be adjusted to match the reality of the hardware. It was an indispensable step to reach my objectives, which were definitely performance oriented. But after this failure, I was able to predictably reach 50 fps when needed. I was basically a better demo coder than I was before.

Temporary

In this post, I tried to give you a brief introduction into mechanical sympathy and mental models. I also took the opportunity to brag about my past minor successes.

In doing so, I hope you start pondering:

  • how good is my mental model of the hardware I am working on?
  • are there any signs that I am wrong?
  • can I find some?
  • and foremost, does it matter?

In the next post, I will dig into the models of various parts of a PC and relate this to actual performance impact.