The belief that a change will be easy to do correctly makes it less
likely that the change will be done correctly.[1]
An XP programmer writes a unit test to clarify his intentions before
he makes a change. We call this test-driven
design (TDD) or test-first
programming, because an API's design and implementation are
guided by its test cases. The programmer writes the test the way he
wants the API to work, and he implements the API to fulfill the
expectations set out by the test.
Test-driven design helps us invent testable and usable
interfaces. In many ways, testability and usability are one and the
same. If you can't write a test for an API, it'll probably be
difficult to use, and vice-versa. Test-driven design gives feedback on
usability before time is wasted on the implementation of an awkward API.
As a bonus, the test documents how the API works, by example.
All of the above are good things, and few would argue with them. One
obvious concern is that test-driven design might slow down
development. It does take time to write tests, but by writing the
tests first, you gain insight into the implementation, which speeds
development. Debugging the implementation is faster, too, thanks to
immediate and reproducible feedback that only an automated test can
provide.
Perhaps the greatest time savings from unit testing comes a few months
or years after you write the test, when you need to extend the API.
The unit test not only provides you with reliable documentation for
how the API works, but it also validates the assumptions that went
into the design of the API. You can be fairly sure a change didn't
break anything if the change passes all the unit tests written before
it. Changes that fiddle with fundamental API assumptions cause
the costliest defects to debug. A comprehensive unit test suite is
probably the most effective defense against such unwanted changes.
This chapter introduces test-driven design through the
implementation of an exponential moving average (EMA), a simple but
useful mathematical function. This chapter also explains how to use
the CPAN modules Test::More and
Test::Exception.
Unit Tests
A unit test validates the programmer's view of the application. This
is quite different from an acceptance test, which is written from the
customer's perspective and tests end-user functionality, usually
through the same interface that an ordinary user uses. In contrast,
a unit test exercises an API, formally known as a unit. Usually, we test an entire Perl package with a
single unit test.
Perl has a strong tradition of unit testing, and virtually every CPAN
module comes with one or more unit tests. There are also many
test frameworks available from CPAN. This and subsequent chapters
use Test::More, a popular and well-documented
test module.[2]
I also use Test::Exception to test deviance
cases that result in calls to die.[3]
Test First, By Intention
Test-driven design takes unit testing to the extreme. Before you
write the code, you write a unit test. For example, here's the first
test case for the EMA (exponential moving average) module:
use strict;
use Test::More tests => 1;
BEGIN {
    use_ok('EMA');
}
This is the minimal Test::More test. You tell
Test::More how many tests to expect, and you
import the module with use_ok as the first test
case. The BEGIN ensures the module's
prototypes and functions are available during compilation of the rest
of the unit test.
The next step is to run this test to make sure that it fails:
% perl -w EMA.t
1..1
not ok 1 - use EMA;
# Failed test (EMA.t at line 4)
# Tried to use 'EMA'.
# Error: Can't locate EMA.pm in @INC [trimmed]
# Looks like you failed 1 tests of 1.
At this stage, you might be thinking, "Duh! Of course, it
fails." Test-driven design does involve lots of duhs in the
beginning. The baby steps are important, because they help to put you
in the mindset of writing a small test followed by just enough code to
satisfy the test.
If you have maintenance programming experience, you may already be
familiar with this procedure. Maintenance programmers know they need
a test to be sure that their change fixes what they think is broken.
They write the test and run it before fixing anything to make sure
they understand a failure and that their fix works. Test-driven
design takes this practice to the extreme by clarifying your
understanding of all changes before you make them.
Now that we have clarified the need for a module called
EMA (duh!), we implement it:
package EMA;
use strict;
1;
And, duh, the test passes:
% perl -w EMA.t
1..1
ok 1 - use EMA;
Yeeha! Time to celebrate with a double cappuccino so we don't fall asleep.
That's all there is to the test-driven design loop: write a test,
see it fail, satisfy the test, and watch it pass. For brevity,
the rest of the
examples leave out the test execution steps and the concomitant duhs
and yeehas. However, it's important to remember to include these simple
steps when test-first programming. If you don't remember, your
programming partner probably will.[4]
Exponential Moving Average
Our hypothetical customer for this example would like to maintain a
running average of closing stock prices for her website. An EMA is
commonly used for this purpose, because it is an efficient way to
compute a running average. You can see why if you look at the basic
computation for an EMA:
today's price x weight + yesterday's average x (1 - weight)
This algorithm produces a weighted average that favors recent history.
The effect of a price on the average decays exponentially over time.
It's a simple function that only needs to
maintain two values: yesterday's average and the weight. Most other types
of moving averages require more data storage and more complex
computations.
The weight, commonly called alpha, is
computed in terms of uniform time periods (days, in this example):
2 / (number of days + 1)
For efficiency, alpha is usually computed once, and stored along with
the current value of the average. I chose to use an object to hold
these data and a single method to compute the average.
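As a rough illustration of the arithmetic above, the following sketch applies the EMA formula to a short price series. The helper names alpha_for and ema_series are hypothetical and exist only for this example; the chapter's own design uses an object instead. A three-day period is assumed so that alpha is 0.5 and every intermediate value is exact.

```perl
use strict;
use warnings;

# Hypothetical helper: the weight (alpha) for a period of $days days.
sub alpha_for {
    my($days) = @_;
    return 2 / ($days + 1);
}

# Hypothetical helper: running EMA for a list of prices.
# Only two values are carried between steps: alpha and the previous average.
sub ema_series {
    my($days, @prices) = @_;
    my $alpha = alpha_for($days);
    my $avg;
    my @averages;
    foreach my $price (@prices) {
        # The first price seeds the average; later prices are blended in.
        $avg = defined($avg)
            ? $price * $alpha + $avg * (1 - $alpha)
            : $price;
        push(@averages, $avg);
    }
    return @averages;
}

# With 3 days, alpha is 2 / 4 = 0.5.
print join(', ', ema_series(3, 10, 20, 20)), "\n";
# prints "10, 15, 17.5"
```

Each new price moves the average halfway towards itself, which is the exponential decay described above.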
Test Things That Might Break
Since the first cut design calls for a stateful object, we need to
instantiate it to use it. The next case tests object creation:
ok(EMA->new(3));
I sometimes forget to return the instance ($self),
so the test calls ok to check that new
returns some true value. This case tests what I think might
break. An alternative, more extensive test is:
# Not recommended: Don't test what is unlikely to break
ok(UNIVERSAL::isa(EMA->new(3), 'EMA'));
This case checks that new returns a blessed reference of class
EMA. To me, this test is unnecessarily
complex. If new returns something, it's probably
an instance. It's reasonable to rely on the simpler case on that
basis alone. Additionally, there will be other test cases that will use
the instance, and those tests will fail if new
doesn't return an instance of class EMA.
This point is subtle but important, because the size of a unit test
suite matters. The larger and slower the suite, the less useful it
will be. A slow unit test suite means programmers will hesitate
before running all the tests, and there will be more checkins which
break unit and/or acceptance tests. Remember, programmers are lazy
and impatient, and they don't like being held back by their
programming environment. When you test only what might break, your
unit test suite will remain a lightweight and effective development
tool.
Please note that if you and your partner are new to test-driven
design, it's probably better to err on the side of caution and to test
too much. With experience, you'll learn which tests are redundant and
which are especially helpful. There are no magic formulas here.
Testing is an art that takes time to master.
Satisfy The Test, Don't Trick It
Returning to our example, the implementation of
new that satisfies this case is:
sub new {
    my($proto, $length) = @_;
    return bless({}, ref($proto) || $proto);
}
This is the minimal code which satisfies the above test.
$length doesn't need to be stored, and we don't
need to compute alpha. We'll get to them when we need to.
But wait, you say, wouldn't the following code satisfy the test, too?
# Not recommended: Don't fake the code to satisfy the test
sub new {
    return 1;
}
Yes, you can trick any test. However, it's nice to treat programmers
like grown-ups (even though we don't always act that way). No one is
going to watch over your shoulder to make sure you aren't cheating
your own test. The first implementation of new
is the right amount of code, and the test is sufficient to
help guide that implementation. The design calls for an object to
hold state, and an object creation is what needed to be coded.
Test Base Cases First
What we've tested thus far are the base cases,
that is, tests that validate the basic assumptions of the API.
Testing basic assumptions first lets us
work our way towards the full complexity of the complete
implementation, and it makes the test more readable. Test-first
design works best when the implementation grows along with
the test cases.
There are two base cases for the compute
function. The first base case is that the initial value of the
average is just the number itself.
There's also the case of inputting a value equal to the average,
which should leave the average unchanged. These cases are coded as
follows:
ok(my $ema = EMA->new(3));
is($ema->compute(1), 1);
is($ema->compute(1), 1);
The is function from
Test::More lets us compare scalar values. Note
the change to the instantiation test case that allows us to use the
instance ($ema) for subsequent cases. Reusing
results of previous tests shortens the test, and makes it easier to
understand.
The implementation that satisfies these cases is:
package EMA;
use strict;
sub new {
    my($proto, $length) = @_;
    return bless({
        alpha => 2 / ($length + 1),
    }, ref($proto) || $proto);
}
sub compute {
    my($self, $value) = @_;
    return $self->{avg} = defined($self->{avg})
        ? $value * $self->{alpha} + $self->{avg} * (1 - $self->{alpha})
        : $value;
}
1;
The initialization of alpha was added to
new, because compute needs
the value.
new initializes the state of the object,
and compute implements the EMA algorithm.
$self->{avg} is initially undef
so that case can be detected.
Even though the implementation looks finished, we aren't done testing.
The above code might be defective. Both compute
test cases use the same value, and the test would pass even if, for
example, $self->{avg} and
$value were accidentally switched. We also need to
test that the average changes when given different values. The test
as it stands is too static, and it doesn't serve as a good example of
how an EMA works.
Choose Self-Evident Data
In a test-driven environment, programmers use the tests to learn how
the API works. You may hear that XPers don't like documentation.
That's not quite true. What we prefer is self-validating
documentation in the form of tests. We take care to write tests that
are readable and demonstrate how to use the API.
One way to create readable tests is to pick good test data. However,
we have a little bootstrapping problem: To pick good test data, we need valid
values from the results of an EMA computation, but we need an EMA
implementation to give us those values. One solution is to calculate
the EMA values by hand. Or, we could use another EMA implementation
to come up with the values. While either of these choices would work,
a programmer reading the test cases would have to trust them or to
recompute them to verify they are correct. Not to mention that we'd
have to get the precision exactly right for our target platform.
Use The Algorithm, Luke!
A better alternative is to work backwards through the algorithm
to figure out some self-evident test data.[5]
To accomplish this, we treat the EMA algorithm as two equations by
fixing some values.
Our goal is to have integer values for the results so
we avoid floating point precision issues. In addition, integer values make it
easier for the programmer to follow what is going on.
When we look at the equations, we see alpha is the most constrained
value:
today's average = today's price x alpha + yesterday's average x (1 - alpha)
where:
alpha = 2 / (length + 1)
Therefore it makes sense to try and figure out a value of alpha that
can produce integer results given integer prices.
Starting with length 1, the values of alpha decrease as follows: 1,
2/3, 1/2, 2/5, 1/3, 2/7, and 1/4. The values 1, 1/2, and 2/5 are good
candidates, because they have short, exact decimal representations
(1, 0.5, and 0.4). 1 is a degenerate case: the average of a single value is
always itself. 1/2 is not ideal, because alpha and 1 - alpha are
identical, which creates a symmetry in the first equation:
today's average = today's price x 0.5 + yesterday's average x 0.5
We want asymmetric weights so that defects, such as swapping today's
price and yesterday's average, will be detected. A length of 4 yields
an alpha of 2/5 (0.4), and makes the equation asymmetric:
today's average = today's price x 0.4 + yesterday's average x 0.6
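The search for a suitable alpha can be sketched in a few lines. This is only an illustration of the reasoning above: alpha_for is a hypothetical helper, and the loop simply prints each candidate with its complementary weight so the degenerate and symmetric cases stand out.

```perl
use strict;
use warnings;

# Hypothetical helper: alpha for a given length.
sub alpha_for {
    my($length) = @_;
    return 2 / ($length + 1);
}

foreach my $length (1 .. 7) {
    my $alpha = alpha_for($length);
    my $note = $alpha == 1 ? 'degenerate: average is always the latest price'
        : $alpha == 0.5 ? 'symmetric: both weights are equal'
        : '';
    printf("length %d: alpha = %.4f, 1 - alpha = %.4f  %s\n",
        $length, $alpha, 1 - $alpha, $note);
}
```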
With alpha fixed at 0.4, we can pick prices that make today's average
an integer. Specifically, multiples of 5 work nicely. I like prices
to go up, so I chose 10 for today's price and 5 for yesterday's average
(the initial price). This
makes today's average equal to 7, and our test becomes:
ok(my $ema = EMA->new(4));
is($ema->compute(5), 5);
is($ema->compute(5), 5);
is($ema->compute(10), 7);
Again, I revised the base cases to keep the test short. Any value in
the base cases will work so we might as well save testing time through
reuse.
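To see what the asymmetric data buys us, consider a sketch of a deliberately defective variant in which today's price and yesterday's average are swapped. The package name SwappedEMA is hypothetical and is not part of the chapter's code. With the earlier data (length 3, all prices 1) the defect is invisible; with length 4 and prices 5 and 10 it is not.

```perl
use strict;
use warnings;

# Hypothetical, intentionally broken variant of EMA: the two weights
# are applied to the wrong operands.
package SwappedEMA;

sub new {
    my($proto, $length) = @_;
    return bless({
        alpha => 2 / ($length + 1),
    }, ref($proto) || $proto);
}

sub compute {
    my($self, $value) = @_;
    # Defect: alpha should weight $value, not the old average.
    return $self->{avg} = defined($self->{avg})
        ? $self->{avg} * $self->{alpha} + $value * (1 - $self->{alpha})
        : $value;
}

package main;

# Symmetric data: the defect goes unnoticed.
my $static = SwappedEMA->new(3);
$static->compute(1);
print $static->compute(1), "\n";    # prints 1, same as a correct EMA

# Asymmetric data: a correct EMA yields 7 here.
my $asym = SwappedEMA->new(4);
$asym->compute(5);
print $asym->compute(10), "\n";     # prints 8, so is(..., 7) would fail
```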
Our test and implementation are essentially complete. All paths through
the code are tested, and EMA could be used in
production if it is used properly. That is,
EMA is complete if all we care about is
conformant behavior.
The implementation currently ignores what happens
when new is given an invalid value for
$length.
Fail Fast
Although EMA is a small part of the
application, it can have a great impact on quality. For example, if
new is passed a $length of -1,
Perl throws a divide-by-zero exception when alpha is computed. For
other invalid values for $length, such as -2,
new silently accepts the errant value, and
compute faithfully produces nonsensical values
(negative averages for positive prices). We can't simply ignore these
cases. We need to make a decision about what to do when
$length is invalid.
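A short sketch makes both failure modes concrete. It uses the unvalidated implementation shown earlier, copied into a hypothetical package named UncheckedEMA so the example is self-contained.

```perl
use strict;
use warnings;

# Hypothetical copy of the chapter's implementation before any
# validation of $length was added.
package UncheckedEMA;

sub new {
    my($proto, $length) = @_;
    return bless({
        alpha => 2 / ($length + 1),
    }, ref($proto) || $proto);
}

sub compute {
    my($self, $value) = @_;
    return $self->{avg} = defined($self->{avg})
        ? $value * $self->{alpha} + $self->{avg} * (1 - $self->{alpha})
        : $value;
}

package main;

# A length of -1 makes the divisor zero, so Perl dies inside new.
my $ema = eval { UncheckedEMA->new(-1) };
print "new(-1) died: $@" unless $ema;

# A length of -2 is accepted, and alpha becomes -2.
$ema = UncheckedEMA->new(-2);
$ema->compute(5);
print 'new(-2), then 5 and 10: ', $ema->compute(10), "\n";    # prints -5
```

The second result is 10 x -2 + 5 x 3 = -5: a negative average computed from two positive prices.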
One approach would be to assume garbage-in garbage-out. If a
caller supplies -2 for $length, it's the caller's
problem. Yet this isn't what Perl's divide function does, and it
isn't what happens, say, when you try to de-reference a scalar which
is not a reference. The Perl interpreter calls
die, and I've already mentioned in the
Coding Style chapter that I prefer failing fast
rather than waiting until the program can do some real damage. In
our example, the customer's web site would display an invalid moving
average, and one of her customers might make an incorrect investment
decision based on this information. That would be bad. It is better
for the web site to return a server error page than to display
misleading and incorrect information.
Nobody likes program crashes or server errors. Yet calling
die is an efficient way to communicate semantic
limits (couplings) within the application. The UI programmer, in our
example, may not know that an EMA's length must be a positive integer.
He'll find out when the application dies. He can then change the
design of his code and the EMA class to make this limit visible to
the end user. Fail fast is an important feedback mechanism. If we
encounter an unexpected die, it tells us the
application design needs to be improved.
Deviance Testing
In order to test for an API that fails fast, we
need to be able to catch calls to die and then
call ok to validate the call did indeed end in an
exception. The function dies_ok in the module
Test::Exception does this for us.
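The idea behind such a check can be sketched with eval and ok. The helpers died and my_dies_ok below are hypothetical names for this illustration. They are not Test::Exception's implementation, which handles more edge cases.

```perl
use strict;
use warnings;
use Test::More tests => 2;

# Hypothetical helper: true if the code reference calls die.
sub died {
    my($code) = @_;
    # eval returns 1 only when $code finishes without an exception.
    my $lived = eval { $code->(); 1 };
    return $lived ? 0 : 1;
}

# Hypothetical stand-in for dies_ok, built from died and ok.
sub my_dies_ok {
    my($code, $name) = @_;
    return ok(died($code), $name);
}

my_dies_ok(sub { die("bad length\n") }, 'explicit die is caught');
my_dies_ok(sub { my $zero = 0; return 1 / $zero }, 'division by zero is caught');
```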
Since this is our last group of test cases in this chapter, here's the
entire unit test, including the changes for the new
deviance cases:
use strict;
use Test::More tests => 9;
use Test::Exception;
BEGIN {
    use_ok('EMA');
}
ok(my $ema = EMA->new(4));
is($ema->compute(5), 5);
is($ema->compute(5), 5);
is($ema->compute(10), 7);
dies_ok {EMA->new(-2)};
dies_ok {EMA->new(0)};
lives_ok {EMA->new(1)};
dies_ok {EMA->new(2.5)};
There are now 9 cases in the unit test.
The first deviance case validates that
$length can't be negative. We already know -1 will
die with a divide-by-zero exception so -2 is a better choice.
The zero case checks the boundary condition. The first valid length
is 1. Lengths must be integers, and 2.5 or any other floating point
number is not allowed. $length has no explicit
upper limit. Perl automatically converts integers to floating point
numbers if they are too large. The test already checks that floating
point numbers are not allowed so no explicit upper limit check is
required.
The implementation that satisfies this test follows:
package EMA;
use strict;
sub new {
    my($proto, $length) = @_;
    die("$length: length must be a positive 32-bit integer")
        unless $length =~ /^\d+$/ && $length >= 1 && $length <= 0x7fff_ffff;
    return bless({
        alpha => 2 / ($length + 1),
    }, ref($proto) || $proto);
}
sub compute {
    my($self, $value) = @_;
    return $self->{avg} = defined($self->{avg})
        ? $value * $self->{alpha} + $self->{avg} * (1 - $self->{alpha})
        : $value;
}
1;
The only change is the addition of a call to die
with an unless clause. This simple fail fast
clause doesn't complicate the code or slow down the API, and yet
it prevents subtle errors by converting an assumption into an
assertion.
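As an illustration of how a caller might surface this limit, the sketch below wraps the constructor in eval and reports whether each length was accepted. The try_length helper is hypothetical, and the constructor is repeated from above only so the example runs on its own.

```perl
use strict;
use warnings;

package EMA;

# Constructor repeated from the chapter so this sketch is self-contained.
sub new {
    my($proto, $length) = @_;
    die("$length: length must be a positive 32-bit integer")
        unless $length =~ /^\d+$/ && $length >= 1 && $length <= 0x7fff_ffff;
    return bless({
        alpha => 2 / ($length + 1),
    }, ref($proto) || $proto);
}

package main;

# Hypothetical caller-side helper: reports whether a length is usable.
sub try_length {
    my($length) = @_;
    my $ema = eval { EMA->new($length) };
    return $ema ? 'accepted' : 'rejected';
}

foreach my $length (-2, 0, 1, 2.5, 4) {
    print "$length: ", try_length($length), "\n";
}
```

Only 1 and 4 are accepted, which matches the deviance cases in the unit test.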
Only Test The New API
One of the most difficult parts of testing is to know when to stop.
Once you have been test-infected, you may want to keep on adding cases
to be sure that the API is "perfect". For example, an
interesting test case would be to pass a NaN (Not a Number) to
compute, but that's not a test of
EMA. The floating point implementation of Perl
behaves in a particular way with respect to NaNs[6], and EMA will
conform to that behavior. Testing that NaNs are handled properly is a
job for the Perl interpreter's test suite.
Every API relies on a tremendous amount of existing code. There isn't
enough time to test all the existing APIs and your new API as well.
Just as an API should separate concerns, so must
a test. When testing a new API, your concern should be that API and
no others.
Solid Foundation
In XP, we do the simplest thing that could possibly work so we can
deliver business value as quickly as possible. Even as we write the
test and implementation, we're sure the code will change. When we
encounter a new customer requirement, we refactor the code, if need
be, to facilitate the additional function. This iterative process is
called continuous design, which is the subject of
the next chapter. It's like renovating your house whenever your needs
change.[7]
A system or house needs a solid foundation in order to support
continuous renovation. Unit tests are the foundation of an XP
project. When designing continuously, we make sure the house doesn't
fall down by running unit tests to validate all the assumptions about
an implementation. We also grow the foundation before adding new
functions. Our test suite gives us the confidence to embrace change.
Footnotes
[1] Quality Software Management: Vol. 1 Systems Thinking,
Gerald Weinberg, Dorset House, 1991, p. 236.
[2] Part of the Test-Simple distribution, available at
http://search.cpan.org/search?query=Test-Simple
I used version 0.47 for this book.
[3] Version 0.15 used here. Available at
http://search.cpan.org/search?query=Test-Exception
[4] Just a friendly
reminder to program in pairs, especially when trying something new.
[5] Thanks to Ion Yadigaroglu for teaching me this technique.
[6] In some implementations, use of NaNs will cause a run-time error. In
others, they will cause all subsequent results to be a NaN.
[7] Don't let the thought of continuous
house renovation scare you off. Programmers are much quieter and less
messy than construction workers.