Chapter 11: Test-Driven Design

The belief that a change will be easy to do correctly makes it less likely that the change will be done correctly.

-- Gerald Weinberg[1]

An XP programmer writes a unit test to clarify his intentions before he makes a change. We call this test-driven design (TDD) or test-first programming, because an API's design and implementation are guided by its test cases. The programmer writes the test the way he wants the API to work, and he implements the API to fulfill the expectations set out by the test.

Test-driven design helps us invent testable and usable interfaces. In many ways, testability and usability are one and the same. If you can't write a test for an API, it'll probably be difficult to use, and vice-versa. Test-driven design gives feedback on usability before time is wasted on the implementation of an awkward API. As a bonus, the test documents how the API works, by example.

All of the above are good things, and few would argue with them. One obvious concern is that test-driven design might slow down development. It does take time to write tests, but by writing the tests first, you gain insight into the implementation, which speeds development. Debugging the implementation is faster, too, thanks to immediate and reproducible feedback that only an automated test can provide.

Perhaps the greatest time savings from unit testing comes a few months or years after you write the test, when you need to extend the API. The unit test not only provides you with reliable documentation for how the API works, but it also validates the assumptions that went into the design of the API. You can be fairly sure a change didn't break anything if the change passes all the unit tests written before it. Changes that fiddle with fundamental API assumptions cause the costliest defects to debug. A comprehensive unit test suite is probably the most effective defense against such unwanted changes.

This chapter introduces test-driven design through the implementation of an exponential moving average (EMA), a simple but useful mathematical function. This chapter also explains how to use the CPAN modules Test::More and Test::Exception.

Unit Tests

A unit test validates the programmer's view of the application. This is quite different from an acceptance test, which is written from the customer's perspective and tests end-user functionality, usually through the same interface that an ordinary user uses. In contrast, a unit test exercises an API, formally known as a unit. Usually, we test an entire Perl package with a single unit test.

Perl has a strong tradition of unit testing, and virtually every CPAN module comes with one or more unit tests. There are also many test frameworks available from CPAN. This and subsequent chapters use Test::More, a popular and well documented test module.[2] I also use Test::Exception to test deviance cases that result in calls to die.[3]

Test First, By Intention

Test-driven design takes unit testing to the extreme. Before you write the code, you write a unit test. For example, here's the first test case for the EMA (exponential moving average) module:

use strict;
use Test::More tests => 1;
BEGIN {
    use_ok('EMA');
}

This is the minimal Test::More test. You tell Test::More how many tests to expect, and you import the module with use_ok as the first test case. The BEGIN ensures the module's prototypes and functions are available during compilation of the rest of the unit test.

The next step is to run this test to make sure that it fails:

% perl -w EMA.t
1..1
not ok 1 - use EMA;
#     Failed test (EMA.t at line 4)
#     Tried to use 'EMA'.
#     Error:  Can't locate EMA.pm in @INC [trimmed]
# Looks like you failed 1 tests of 1.

At this stage, you might be thinking, "Duh! Of course, it fails." Test-driven design does involve lots of duhs in the beginning. The baby steps are important, because they help to put you in the mindset of writing a small test followed by just enough code to satisfy the test.

If you have maintenance programming experience, you may already be familiar with this procedure. Maintenance programmers know they need a test to be sure that their change fixes what they think is broken. They write the test and run it before fixing anything to make sure they understand a failure and that their fix works. Test-driven design takes this practice to the extreme by clarifying your understanding of all changes before you make them.

Now that we have clarified the need for a module called EMA (duh!), we implement it:

package EMA;
use strict;
1;

And, duh, the test passes:

% perl -w EMA.t
1..1
ok 1 - use EMA;

Yeeha! Time to celebrate with a double cappuccino so we don't fall asleep.

That's all there is to the test-driven design loop: write a test, see it fail, satisfy the test, and watch it pass. For brevity, the rest of the examples leave out the test execution steps and the concomitant duhs and yeehas. However, it's important to remember to include these simple steps when test-first programming. If you don't remember, your programming partner probably will.[4]

Exponential Moving Average

Our hypothetical customer for this example would like to maintain a running average of closing stock prices for her website. An EMA is commonly used for this purpose, because it is an efficient way to compute a running average. You can see why if you look at the basic computation for an EMA:

today's price x weight + yesterday's average x (1 - weight)

This algorithm produces a weighted average that favors recent history. The effect of a price on the average decays exponentially over time. It's a simple function that only needs to maintain two values: yesterday's average and the weight. Most other types of moving averages require more data storage and more complex computations.
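To see just how little state is involved, here's a rough sketch, separate from the class we're about to build test-first, that expresses the recurrence as a Perl closure carrying only the weight and the previous average:

# Illustration only: the recurrence needs just two values.
sub make_running_average {
    my($weight) = @_;
    my $avg;    # yesterday's average; undefined until the first price arrives
    return sub {
        my($price) = @_;
        return $avg = defined($avg)
            ? $price * $weight + $avg * (1 - $weight)
            : $price;
    };
}

my $average = make_running_average(0.25);
print $average->($_), "\n" foreach (5, 7, 9);    # running averages of three prices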

The weight, commonly called alpha, is computed in terms of uniform time periods (days, in this example):

2 / (number of days + 1)

For efficiency, alpha is usually computed once, and stored along with the current value of the average. I chose to use an object to hold these data and a single method to compute the average.

Test Things That Might Break

Since the first cut design calls for a stateful object, we need to instantiate it to use it. The next case tests object creation:

ok(EMA->new(3));

I sometimes forget to return the instance ($self), so the test calls ok to check that new returns a true value. This case tests what I think might break. An alternative, more extensive test is:

# Not recommended: Don't test what is unlikely to break
ok(UNIVERSAL::isa(EMA->new(3), 'EMA'));

This case checks that new returns a blessed reference of class EMA. To me, this test is unnecessarily complex. If new returns something, it's probably an instance. It's reasonable to rely on the simpler case on that basis alone. Additionally, there will be other test cases that will use the instance, and those tests will fail if new doesn't return an instance of class EMA.

This point is subtle but important, because the size of a unit test suite matters. The larger and slower the suite, the less useful it will be. A slow unit test suite means programmers will hesitate before running all the tests, and there will be more checkins which break unit and/or acceptance tests. Remember, programmers are lazy and impatient, and they don't like being held back by their programming environment. When you test only what might break, your unit test suite will remain a lightweight and effective development tool.

Please note that if you and your partner are new to test-driven design, it's probably better to err on the side of caution and to test too much. With experience, you'll learn which tests are redundant and which are especially helpful. There are no magic formulas here. Testing is an art that takes time to master.
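If you do decide the class check is worth a case, Test::More's isa_ok reads better than the UNIVERSAL::isa call shown above and prints a more helpful message when it fails:

# Optional: only if you decide the class check might break
isa_ok(EMA->new(3), 'EMA');

Either way, the guideline stands: every extra case adds weight to the suite, so add it only if you think it might break.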

Satisfy The Test, Don't Trick It

Returning to our example, the implementation of new that satisfies this case is:

sub new {
    my($proto, $length) = @_;
    return bless({}, ref($proto) || $proto);
}

This is the minimal code which satisfies the above test. $length doesn't need to be stored, and we don't need to compute alpha. We'll get to them when we need to.

But wait, you say, wouldn't the following code satisfy the test, too?

# Not recommended: Don't fake the code to satisfy the test
sub new {
    return 1;
}

Yes, you can trick any test. However, it's nice to treat programmers like grown-ups (even though we don't always act that way). No one is going to watch over your shoulder to make sure you aren't cheating your own test. The first implementation of new is the right amount of code, and the test is sufficient to help guide that implementation. The design calls for an object to hold state, and object creation is what needed to be coded.

Test Base Cases First

What we've tested thus far are the base cases, that is, tests that validate the basic assumptions of the API. Testing basic assumptions first lets us work our way towards the full complexity of the complete implementation, and it also makes the test more readable. Test-driven design works best when the implementation grows along with the test cases.

There are two base cases for the compute function. The first base case is that the initial value of the average is just the number itself. There's also the case of inputting a value equal to the average, which should leave the average unchanged. These cases are coded as follows:

ok(my $ema = EMA->new(3));
is($ema->compute(1), 1);
is($ema->compute(1), 1);

The is function from Test::More lets us compare scalar values. Note the change to the instantiation test case that allows us to use the instance ($ema) for subsequent cases. Reusing results of previous tests shortens the test, and makes it easier to understand.
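is also accepts an optional third argument, a description that shows up in the test output and makes failures easier to scan. The description text below is mine, not part of the chapter's test:

is($ema->compute(1), 1, 'first price becomes the average');

The tests in this chapter omit descriptions to stay short.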

The implementation that satisfies these cases is:

package EMA;
use strict;

sub new {
    my($proto, $length) = @_;
    return bless({
        alpha => 2 / ($length + 1),
    }, ref($proto) || $proto);
}

sub compute {
    my($self, $value) = @_;
    return $self->{avg} = defined($self->{avg})
        ? $value * $self->{alpha} + $self->{avg} * (1 - $self->{alpha})
        : $value;
}

1;

The initialization of alpha was added to new, because compute needs the value. new initializes the state of the object, and compute implements the EMA algorithm. $self->{avg} is initially undef, which lets compute detect the first-value case.

Even though the implementation looks finished, we aren't done testing. The above code might be defective. Both compute test cases use the same value, and the test would pass even if, for example, $self->{avg} and $value were accidentally switched. We also need to test that the average changes when given different values. The test as it stands is too static, and it doesn't serve as a good example of how an EMA works.

Choose Self-Evident Data

In a test-driven environment, programmers use the tests to learn how the API works. You may hear that XPers don't like documentation. That's not quite true. What we prefer is self-validating documentation in the form of tests. We take care to write tests that are readable and demonstrate how to use the API.

One way to create readable tests is to pick good test data. However, we have a little bootstrapping problem: To pick good test data, we need valid values from the results of an EMA computation, but we need an EMA implementation to give us those values. One solution is to calculate the EMA values by hand. Or, we could use another EMA implementation to come up with the values. While either of these choices would work, a programmer reading the test cases would have to trust them or to recompute them to verify they are correct. Not to mention that we'd have to get the precision exactly right for our target platform.

Use The Algorithm, Luke!

A better alternative is to work backwards through the algorithm to figure out some self-evident test data.[5] To accomplish this, we treat the EMA algorithm as two equations by fixing some values. Our goal is to have integer values for the results so we avoid floating point precision issues. In addition, integer values make it easier for the programmer to follow what is going on.

When we look at the equations, we see alpha is the most constrained value:

today's average = today's price x alpha + yesterday's average x (1 - alpha)

where:

alpha = 2 / (length + 1)

Therefore it makes sense to try and figure out a value of alpha that can produce integer results given integer prices.

Starting with length 1, the values of alpha decrease as follows: 1, 2/3, 1/2, 2/5, 1/3, 2/7, and 1/4. The values 1, 1/2, and 2/5 are good candidates, because the arithmetic works out exactly in binary floating point for well-chosen integer prices. 1 is a degenerate case: the average of a single value is always itself. 1/2 is not ideal, because alpha and 1 - alpha are identical, which creates a symmetry in the first equation:

today's average = today's price x 0.5 + yesterday's average x 0.5

We want asymmetric weights so that defects, such as swapping today's price and yesterday's average, will be detected. A length of 4 yields an alpha of 2/5 (0.4), and makes the equation asymmetric:

today's average = today's price x 0.4 + yesterday's average x 0.6

With alpha fixed at 0.4, we can pick prices that make today's average an integer. Specifically, multiples of 5 work nicely. I like prices to go up, so I chose 10 for today's price and 5 for yesterday's average (the initial price). This makes today's average 10 x 0.4 + 5 x 0.6 = 7, and our test becomes:

ok(my $ema = EMA->new(4));
is($ema->compute(5), 5);
is($ema->compute(5), 5);
is($ema->compute(10), 7);

Again, I revised the base cases to keep the test short. Any value in the base cases will work so we might as well save testing time through reuse.

Our test and implementation are essentially complete. All paths through the code are tested, and EMA could be used in production if it is used properly. That is, EMA is complete if all we care about is conformant behavior. The implementation currently ignores what happens when new is given an invalid value for $length.

Fail Fast

Although EMA is a small part of the application, it can have a great impact on quality. For example, if new is passed a $length of -1, Perl throws a divide-by-zero exception when alpha is computed. For other invalid values of $length, such as -2, new silently accepts the errant value, and compute faithfully produces nonsensical values (negative averages for positive prices). We can't simply ignore these cases. We need to make a decision about what to do when $length is invalid.
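To make the problem concrete, here's a quick sketch, not part of the unit test, that traces what the current implementation does when given a length of -2:

my $bad = EMA->new(-2);          # alpha = 2 / (-2 + 1) = -2
$bad->compute(5);                # first price: the average starts at 5
print $bad->compute(10), "\n";   # 10 * -2 + 5 * 3 = -5, a negative average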

One approach would be to assume garbage-in, garbage-out. If a caller supplies -2 for $length, it's the caller's problem. Yet this isn't what Perl's division operator does, and it isn't what happens, say, when you try to dereference a scalar that is not a reference. The Perl interpreter calls die, and I've already mentioned in the Coding Style chapter that I prefer failing fast rather than waiting until the program can do some real damage. In our example, the customer's web site would display an invalid moving average, and one of her customers might make an incorrect investment decision based on this information. That would be bad. It is better for the web site to return a server error page than to display misleading and incorrect information.

Nobody likes program crashes or server errors. Yet calling die is an efficient way to communicate semantic limits (couplings) within the application. The UI programmer, in our example, may not know that an EMA's length must be a positive integer. He'll find out when the application dies. He can then change the design of his code and the EMA class to make this limit visible to the end user. Fail fast is an important feedback mechanism. If we encounter an unexpected die, it tells us the application design needs to be improved.
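For example, the calling code might trap the die and turn it into an error page rather than a bad number. Here's a minimal sketch, where $length_from_user and render_error_page stand in for hypothetical pieces of the UI layer:

# $length_from_user and render_error_page are hypothetical UI-layer pieces
my $ema = eval { EMA->new($length_from_user) };
return render_error_page('length must be a positive integer')
    unless $ema;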

Deviance Testing

To test an API that fails fast, we need to catch calls to die and then validate that the call did indeed end in an exception. The function dies_ok in the module Test::Exception does this for us.

Since this is our last group of test cases in this chapter, here's the entire unit test; the changes for the new deviance cases are the use of Test::Exception, the updated plan, and the dies_ok and lives_ok calls at the end:

use strict;
use Test::More tests => 9;
use Test::Exception;
BEGIN {
    use_ok('EMA');
}
ok(my $ema = EMA->new(4));
is($ema->compute(5), 5);
is($ema->compute(5), 5);
is($ema->compute(10), 7);
dies_ok {EMA->new(-2)};
dies_ok {EMA->new(0)};
lives_ok {EMA->new(1)};
dies_ok {EMA->new(2.5)};


There are now 9 cases in the unit test. The first deviance case validates that $length can't be negative. We already know -1 will die with a divide-by-zero exception, so -2 is a better choice. The zero case checks the boundary condition. The first valid length is 1. Lengths must be integers, and 2.5 or any other floating point number is not allowed. $length has no explicit upper limit. Perl automatically converts integers to floating point numbers if they are too large. The test already checks that floating point numbers are not allowed, so no explicit upper-limit case is required.

The implementation that satisfies this test follows:

package EMA;
use strict;

sub new {
    my($proto, $length) = @_;
    die("$length: length must be a positive 32-bit integer")
        unless $length =~ /^\d+$/ && $length >= 1 && $length <= 0x7fff_ffff;
    return bless({
        alpha => 2 / ($length + 1),
    }, ref($proto) || $proto);
}

sub compute {
    my($self, $value) = @_;
    return $self->{avg} = defined($self->{avg})
        ? $value * $self->{alpha} + $self->{avg} * (1 - $self->{alpha})
        : $value;
}
1;

The only change is the addition of a call to die with an unless clause. This simple fail fast clause doesn't complicate the code or slow down the API, and yet it prevents subtle errors by converting an assumption into an assertion.

Only Test The New API

One of the most difficult parts of testing is to know when to stop. Once you have been test-infected, you may want to keep on adding cases to be sure that the API is "perfect". For example, an interesting test case would be to pass a NaN (Not a Number) to compute, but that's not a test of EMA. Perl's floating point implementation behaves in a particular way with respect to NaNs[6], and EMA will conform to that behavior. Testing that NaNs are handled properly is a job for the Perl interpreter's test suite.

Every API relies on a tremendous amount of existing code. There isn't enough time to test all the existing APIs and your new API as well. Just as an API should separate concerns so must a test. When testing a new API, your concern should be that API and no others.

Solid Foundation

In XP, we do the simplest thing that could possibly work so we can deliver business value as quickly as possible. Even as we write the test and implementation, we're sure the code will change. When we encounter a new customer requirement, we refactor the code, if need be, to accommodate the new functionality. This iterative process is called continuous design, which is the subject of the next chapter. It's like renovating your house whenever your needs change.[7]

A system or house needs a solid foundation in order to support continuous renovation. Unit tests are the foundation of an XP project. When designing continuously, we make sure the house doesn't fall down by running unit tests to validate all the assumptions about an implementation. We also grow the foundation before adding new functions. Our test suite gives us the confidence to embrace change.

Footnotes

  1. Quality Software Management: Vol. 1 Systems Thinking, Gerald Weinberg, Dorset House, 1991, p. 236.

  2. Part of the Test-Simple distribution, available at http://search.cpan.org/search?query=Test-Simple. I used version 0.47 for this book.

  3. Version 0.15 used here. Available at http://search.cpan.org/search?query=Test-Exception

  4. Just a friendly reminder to program in pairs, especially when trying something new.

  5. Thanks to Ion Yadigaroglu for teaching me this technique.

  6. In some implementations, use of NaNs will cause a run-time error. In others, they will cause all subsequent results to be a NaN.

  7. Don't let the thought of continuous house renovation scare you off. Programmers are much quieter and less messy than construction workers.

 
Copyright © 2004 Robert Nagler
Licensed under a Creative Commons Attribution 4.0 International License.