In refactoring, there is a premium on knowing when to quit.
-- Kent Beck
Refactoring is an iterative design strategy. Each refactoring
improves an implementation in a small way without altering its
observable behavior. You refactor to make the implementation easier
to repair and to extend. The refactorings themselves add no business
value.
Programming in XP is a four part process: listening, testing, coding,
and designing. Design comes after you have a test for what the
customer wants, and you have a simple implementation already coded.
Of course, your simple implementation is based on your definition of
simple. Your first cut may even be optimally coded in which the
design is complete for the task you are working on. Your next task
may require you to copy some similar code from another module, because
there's no API to get at it. Copying code is usually the simplest way
to see if that code actually works, that is, passes your tests.
Copying code is not good design. After you get the copy working, you
need to refactor the original module and your new code to use a common
API. And, that's emergent design.
The designing emerges from the problem, completely naturally, like you
organize a kitchen. The figure out where to put things depending on
how frequently they are used. If you only use that fruitcake pan once
a year, you probably tuck it away in the back of a hard to reach
cabinet. Similarly, if a routine is used only in one place, you keep
it private within a module. The first time it is used elsewhere, you
may copy it. If you find another use for it, you refactor all three
uses so that they call a single copy of the routine. In XP, we
call this the Rule of Three, [1] and basically it says you only
know some code is reusable if you copy it two times. Refactoring
is the process by which you make the code reusable.
This chapter demonstrates several refactorings in the context of a
single example, Mail::POP3Client. We discuss
how and when to refactor and why refactoring is a good design
strategy. We show how refactoring makes it easier to fix a couple of
defects found by the unit test in Unit Testing.
Design on Demand
XP has no explicit design phase. We start coding right away. The
design evolves through expansions (adding or exposing
function) and contractions (eliminating redundant or unused code).
Refactoring occurs in both phases. During contraction, we clean up
excesses of the past, usually caused by copy-and-paste. During
expansion, we modularize function to make it accessible for the
task at hand. Expansion solves today's problems. Contraction helps
mitigate future problems.
Mail::POP3Client
The examples in this chapter use the foundation we laid in
Unit Testing.
Mail::POP3Client[2]
is a CPAN class that implements the client side of the POP3 mail
retrieval protocol. Since refactoring does not change observable
behavior, you don't need to understand how POP3 works to follow the
examples in this chapter.
I chose Mail::POP3Client, because it is
well-written, mature, and solves a real problem.
Some of the code is
complex. The meat of the example is highlighted in the changed
version, so skip ahead if code complexity interferes with
comprehension. I have included entire subroutines for completeness,
but the differences are just a few lines of code.
Remove Unused Code
We'll warm up with a very simple refactoring. Here's the
Host accessor from
Mail::POP3Client:
sub Host {
my $me = shift;
my $host = shift or return $me->{HOST};
# $me->{INTERNET_ADDR} = inet_aton( $host ) or
# $me->Message( "Could not inet_aton: $host, $!") and return;
$me->{HOST} = $host;
}
This first refactoring removes the extraneous comment. Unused code
and comments distract the reader from the salient bits. Should we
need to resurrect the dead code, it remains in a file's history
available from the source repository. For now, we don't need it, and
the refactored version is shorter and more readable:
sub Host {
my $me = shift;
my $host = shift or return $me->{HOST};
$me->{HOST} = $host;
}
After each refactoring, even as simple as this one, we run the unit
test. In this case, the test will tell us if we deleted too much.
Before we check
in our changes, we run the entire unit test suite to make sure we
didn't break anything.
How often to check in is a matter of preference. The more often you
check in, the easier it is to back out an individual change. The cost
of checking in is a function of the size of your test suite.
Some people like to
check in after every refactoring. Others will wait till the task is
complete. I wouldn't check in now unless this change was unrelated to
the work I'm doing. For example, I might have deleted the dead code
while trying to figure out how to use
Mail::POP3Client in another class.
Refactor Then Fix
As discussed in Distinguish Error Cases Uniquely, the
Message method has a defect that doesn't let us
clear its value. Its implementation is identical to
Host and six other accessors:
sub Message {
my $me = shift;
my $msg = shift or return $me->{MESG};
$me->{MESG} = $msg;
}
The method tests the wrong condition: $msg
being false. It should test to see if $msg was
passed at all: @_ greater than 1. For most of the
other accessors, this isn't a real problem. However, the
Debug method only allows you to turn on
debugging output, since it only saves true values:
sub Debug
{
my $me = shift;
my $debug = shift or return $me->{DEBUG};
$me->{DEBUG} = $debug;
}
We could fix all the accessors with the global replacement, but the
repetition is noisy and distracting. It's better to contract the code
to improve its design while fixing the defects.
We refactor first, so we can test the refactoring with the existing
unit test. Once we determine the refactoring hasn't changed the
observable behavior, we fix the fault. Usually, refactoring first
cuts down on the work, because we only need to fix
the fault in one place.
The first step is to split out the existing logic into a separate
subroutine, and update all accessors to use the new routine:
sub Debug {
return _access('DEBUG', @_);
}
sub Host {
return _access('HOST', @_);
}
sub LocalAddr {
return _access('LOCALADDR', @_);
}
sub Message {
return _access('MESG', @_);
}
sub Pass {
return _access('PASSWORD', @_);
}
sub Port {
return _access('PORT', @_);
}
sub State {
return _access('STATE', @_);
}
sub User {
return _access('USER', @_);
}
sub _access {
my $field = shift;
my $me = shift;
my $value = shift or return $me->{$field};
$me->{$field} = $value;
}
Perl allows us to pass on the actual parameters of a routine simply.
Since the $value parameter is optional, we put the
new parameter $field before the rest of the
parameters. This simple trick reduces code complexity. By the way,
it's not something I'd do if _access were public.
If _access needed to be overriddable by a
subclass, it would have been called as a method
($me->access). However, the XP rule "you
aren't going to need it" applies here. We'll expose
_access to subclasses when there's an explicit
need to.
Consistent Names Ease Refactoring
The refactored accessors are repetitive. In Perl it's common to
generate repetitive code, and we'll do so here.[3]
First, we need to rename MESG to
MESSAGE to simplify generation. There are only
two references in Mail::POP3Client (one use not
shown), so this refactoring is easy:
sub Message {
return _access('MESSAGE', @_);
}
Next, we refactor the Pass accessor. As noted in
Consistent APIs Ease Testing, the inconsistent
naming (Pass method and
PASSWORD configuration parameter) made testing a
bit more repetitive than it needed to be. We don't change the
parameter and instance field from PASSWORD to
PASS, because abbreviations are less desirable.
Rather, we refactor Pass as follows:
sub Pass {
return shift->Password(@_);
}
sub Password {
return _access('PASSWORD', @_);
}
As a part of this refactoring, we need document that
Pass
is deprecated (will eventually go away).
Mail::POP3Client is available on CPAN,
so we can't fix
all the code that uses it. Deprecation is an important refactoring
technique for very public APIs like this one.
Generate Repetitive Code
With the refactored names, we can now generate all eight accessors
with this simple loop:
foreach my $a (qw(
Debug
Host
LocalAddr
Message
Password
Port
State
User
)) {
eval(qq{
sub $a {
return _access('@{[uc($a)]}', \@_);
}
1;
}) or die($@);
}
We check eval's result to be sure the generator
works correctly. The syntax to insert the field name
is a bit funky,
but I prefer it to using a temporary variable.
The @{[any-expression]} idiom allows arbitrary Perl
expressions to be interpolated in double-quoted strings.
Fix Once and Only Once
With the refactoring complete, we can now fix all eight accessors with
one line of code. _access should not check the
value of its third argument, but should instead use the argument count:
sub _access {
my $field = shift;
my $me = shift;
@_ or return $me->{$field};
$me->{$field} = shift;
}
This is not a refactoring. It changes the observable behavior. Our
unit test confirms this: three more test cases pass.
Stylin'
Coding style is important to consider when refactoring. In this
chapter, I chose to maintain the style of the source.[4]
To contrast,
here's what _access would look like in bOP style:
sub _access {
my($field, $self, $value) = @_;
return @_ >= 3 ? $self->{$field} = $value : $self->{$field};
}
The differences between the two styles are subtle, but the bOP version
would stand out in the code and be an unnecessary
distraction. No one cares if you use $me,
$self, or $i[5]
as long as you are consistent.
Style updates are important, too. If for some reason,
Mail::POP3Client were to be incorporated into
bOP, we would consider updating the implementation to match bOP style.
Refactoring coding style is fine as long as you refactor and test
independently from other changes.
|
Tactics Versus Strategy
When we begin a task, tactics are important. We want to add business
value as quickly as possible. Doing the simplest thing that could
possibly work is a tactic to reach our goal.
Copy-and-paste is the weapon of choice when task completion is our
only objective. Generalizations, such as the refactorings shown thus
far, are not easy to come up with when you're under pressure to get
the job done.
Besides, you don't know if the code you copy solves the problem until the
implementation is finished. It's much better to copy-and-paste
to test an idea than
to invent, to implement, and to use the wrong abstraction.
As a design strategy, copy-and-paste poses problems. The code is more
difficult to comprehend, because it's difficult to see subtle
differences. Faults fixed in one place do not propagate themselves
automatically. For example, there's an alternative fix to the problem
already embedded in two other accessors, Count and
Size:
sub Count {
my $me = shift;
my $c = shift;
if (defined $c and length($c) > 0) {
$me->{COUNT} = $c;
} else {
return $me->{COUNT};
}
}
sub Size {
my $me = shift;
my $c = shift;
if (defined $c and length($c) > 0) {
$me->{SIZE} = $c;
} else {
return $me->{SIZE};
}
}
These accessors behave differently from the other eight that we
refactored and fixed above. Count and
Size need to be resettable to zero (mailboxes can
be empty), which is why the accessors have an alternate
implementation.
The thought and debugging that went into fixing
Count and Size could have
also applied
to the other accessors. Since the code wasn't refactored at the time
of the fix, it was probably easier to leave the other accessors alone.
And, under the principle of "if it ain't broke, don't fix
it" any change like this is not justified.
XP legitimatizes fixing non-broke code. It's something programmers do
anyway, so XP gives us some structure and guidelines to do it safely.
We can refactor Size and
Count to use _access without
fear.[6]
The unit test covers the empty mailbox case. If we didn't have
a test for this case, we could add one. Again, XP saves us. Since
the programmers are the testers, we're free to modify the test to suit
our needs.
Refactor With a Partner
Pair programming supports refactoring, too. Two people
are better than one at assessing the need for, the
side-effects of, and the difficulty of a change.
The tradeoffs between tactics versus strategy are hard, and discussing
them with a partner is both effective and natural. Switching partners
often brings new perspectives to old code, too.
Sometimes I look at code and don't know where to begin refactoring. The
complexity overwhelms my ability to identify commonality. For
example, here's some code which needs refactoring:
sub List {
my $me = shift;
my $num = shift || '';
my $CMD = shift || 'LIST';
$CMD=~ tr/a-z/A-Z/;
$me->Alive() or return;
my @retarray = ();
my $ret = '';
$me->_checkstate('TRANSACTION', $CMD) or return;
$me->_sockprint($CMD, $num ? " $num" : '', $me->EOL());
my $line = $me->_sockread();
unless (defined $line) {
$me->Message("Socket read failed for LIST");
return;
}
$line =~ /^\+OK/ or $me->Message("$line") and return;
if ($num) {
$line =~ s/^\+OK\s*//;
return $line;
}
while (defined($line = $me->_sockread())) {
$line =~ /^\.\s*$/ and last;
$ret .= $line;
chomp $line;
push(@retarray, $line);
}
if ($ret) {
return wantarray ? @retarray : $ret;
}
}
sub ListArray {
my $me = shift;
my $num = shift || '';
my $CMD = shift || 'LIST';
$CMD=~ tr/a-z/A-Z/;
$me->Alive() or return;
my @retarray = ();
my $ret = '';
$me->_checkstate('TRANSACTION', $CMD) or return;
$me->_sockprint($CMD, $num ? " $num" : '', $me->EOL());
my $line = $me->_sockread();
unless (defined $line) {
$me->Message("Socket read failed for LIST");
return;
}
$line =~ /^\+OK/ or $me->Message("$line") and return;
if ($num) {
$line =~ s/^\+OK\s*//;
return $line;
}
while (defined($line = $me->_sockread())) {
$line =~ /^\.\s*$/ and last;
$ret .= $line;
chomp $line;
my($num, $uidl) = split ' ', $line;
$retarray[$num] = $uidl;
}
if ($ret) {
return wantarray ? @retarray : $ret;
}
}
sub Uidl {
my $me = shift;
my $num = shift || '';
$me->Alive() or return;
my @retarray = ();
my $ret = '';
$me->_checkstate('TRANSACTION', 'UIDL') or return;
$me->_sockprint('UIDL', $num ? " $num" : '', $me->EOL());
my $line = $me->_sockread();
unless (defined $line) {
$me->Message("Socket read failed for UIDL");
return;
}
$line =~ /^\+OK/ or $me->Message($line) and return;
if ($num) {
$line =~ s/^\+OK\s*//;
return $line;
}
while (defined($line = $me->_sockread())) {
$line =~ /^\.\s*$/ and last;
$ret .= $line;
chomp $line;
my($num, $uidl) = split ' ', $line;
$retarray[$num] = $uidl;
}
if ($ret) {
return wantarray ? @retarray : $ret;
}
}
Where are the differences? What's the first step? With a fresh
perspective, the following stood out:
sub Uidl {
my $me = shift;
my $num = shift;
return $me->ListArray($num, 'UIDL');
}
A partner helps you overcome familiarity and fear of change which make
it hard to see simplifications like this one.
Sharing with Code References
It's clear that List and
ListArray are almost identical. The problem is
that they differ in the middle of the loop. Perl code references
are a great way to factor out differences especially within loops:
sub List {
return _list(
sub {
my $line = shift;
my $retarray = shift;
push(@$retarray, $line);
return;
},
@_,
);
}
sub ListArray {
return _list(
sub {
my($num, $value) = split ' ', shift;
my $retarray = shift;
$retarray->[$num] = $value;
return;
},
@_,
);
}
sub _list {
my $parse_line = shift;
my $me = shift;
my $num = shift || '';
my $CMD = shift || 'LIST';
$CMD =~ tr/a-z/A-Z/;
$me->Alive() or return;
my @retarray = ();
my $ret = '';
$me->_checkstate('TRANSACTION', $CMD) or return;
$me->_sockprint($CMD, $num ? " $num" : '', $me->EOL());
my $line = $me->_sockread();
unless (defined $line) {
$me->Message("Socket read failed for $CMD");
return;
}
$line =~ /^\+OK/ or $me->Message("$line") and return;
if ($num) {
$line =~ s/^\+OK\s*//;
return $line;
}
while (defined($line = $me->_sockread())) {
$line =~ /^\.\s*$/ and last;
$ret .= $line;
chomp $line;
$parse_line->($line, \@retarray);
}
if ($ret) {
return wantarray ? @retarray : $ret;
}
}
We pass an anonymous subroutine as _list's first
parameter,
$parse_line, for the reason described in
Refactor Then Fix.
Refactoring Illuminates Fixes
The List, ListArray, and
Uidl methods are bimodal. When called without
arguments, they return a list of values. When passed a message
number, the value for that message number alone is returned. The
message number cases failed in our unit test.
The code reference refactoring shows us where the fault lies:
$parse_line is not called when
_list is called with an argument. It also needs
to chomp the $line to match
the behavior:
if ($num) {
$line =~ s/^\+OK\s*//;
chomp $line;
return $parse_line->($line);
}
The anonymous subroutines in List and
ListArray need to be bimodal for this refactoring
to work:
sub List {
return _list(
sub {
my $line = shift;
my $retarray = shift or return $line;
push(@$retarray, $line);
return;
},
@_,
);
}
sub ListArray {
return _list(
sub {
my($num, $value) = split ' ', shift;
my $retarray = shift or return $value;
$retarray->[$num] = $value;
return;
},
@_,
);
}
By compressing the business logic, its essence and errors become
apparent. Less code is almost always better than more code.
While advanced constructs like code references may be difficult to understand
for those unfamiliar with them, dumbing down the code is not a good
option. Defaulting to the least common denominator produces dumb code
and ignorant programmers. In XP, change occurs not only the project
artifacts but also in ourselves. Learning and teaching are an
integral part of the XP methodology.
Brush and Floss Regularly
This chapter presents a glimpse of design on demand. Each
refactoring was implemented and tested separately. The trick to
refactoring successfully is taking baby steps.
I like to compare refactoring to brushing your teeth. Your best shot
at preventing tooth decay is to brush briefly after each meal and
to floss daily.
Alternatively, you could just wait for cavities to appear and have
them filled. The short term cost in pain, money, and time is much greater if
you do. In the long term, too many fillings create structural
problems and the tooth has to be replaced.
Refactoring is preventative maintenance, like tooth brushing. Quick
fixes are like fillings. Eventually, they create structural problems
and require the implementation to be replaced. Like teeth, complete
rewrites are unnecessarily painful and costly.
XP encourages you to do the simplest thing that could possibly work
(tactics) to address the immediate problem . You refactor your code
to express concepts once and only once (strategy) to prevent
structural problems. And, don't forget that brushing and flossing
support pair programming.
Footnotes
Don Roberts came up with this rule, and it is noted in
Refactoring: Improving the Design of Existing
Code, Martin Fowler, Addison Wesley, 1999, p. 58.
http://search.cpan.org/author/SDOWD/POP3Client-2.12
There are numerous CPAN classes for generating accessors. The
purpose of this refactoring is to demonstrate the technique of
generating code to reduced repetition. In the real world, you'd use
one of the CPAN classes or, better yet, eschew accessors in favor
a super-class which implements accessors via get and put methods.
There are some minor changes to whitespace, indentation, and brace
placement for consistency and brevity within this book.
Except in FORTRAN, where the name of the variable can determine
its type.
When making changes to CPAN modules, XP, nor any other methodology,
helps to validate uses in the (unknown) importers.
|