Workshop discussion at XP2005
The aim of the workshop was to discuss
different aspects of acceptance testing in a fairly high-level
and tool-independent way. The workshop was run by myself,
Brian Swan (author of Exactor) and Rick Mugridge (author of
FitLibrary and a newly published book on acceptance testing).
What unites TextTest+StoryText, Exactor and Fit+FitLibrary is the
aim to support writing tests in the language of the business
domain: something most of the myriad other tools out there are
not concentrating on. The following text is something that all
three presenters agreed was a fair representation of the
discussion.
There was a growing realisation and
consensus on the importance of acceptance tests as a vehicle for
establishing communication between business types (“customers”)
and the development team about requirements. That is, while
producing maintainable executable tests is a major point of
acceptance testing, it isn't the only one.
For business types to be involved at
this level, it becomes a major advantage for the tests to be
written in their language: the language of the business domain.
This is in preference to the alternatives: tests being written
in a programming language, or a pre-packaged language defined by
the test tool. It was perhaps not a surprising conclusion given
the common thread of the tools developed by the organisers! A
further point that emerged, though, is that tools should
constrain this language as little as possible, to avoid
strait-jacketing the communication: few constraints will be
useful in all domains. We didn't examine the specific constraints
imposed by particular tools, though.
Several participants highlighted that many in management
seem to want to regard testing and development as entirely
separate enterprises, and that some tools encourage this view by
suggesting that effective tests can be written without needing to
touch the code or “disturb” the developers at all. In fact, such
compartmentalisation was seen as dangerous: it can lead to
important conversations being skipped.
Another problem for XP teams with long
experience of unit testing is the notion of “test first”.
You write a fully executable unit test, see the bar go red,
write the code and see the bar go green. Some expect a similar
process with acceptance testing: that the customer will write a
fully executable acceptance test, “throw it over the wall”
and the developers will write the code to make it pass. The
discussion emphatically rejected this view: writing an
acceptance test is an iterative procedure between customer and
developers, where the two “sides” understand each
other more and more, with the final result being an executable
test.
Many practitioners report that getting
business types involved in writing acceptance tests is
difficult: they do not see it as their job. Unrealistic
expectations about the tools placed in their hands, and the
expectation that they will do the whole thing themselves, have
probably contributed to these problems, as discussed above.
A few points were raised about how to
mitigate this:
A significant precondition is that they are a fully
integrated part of the team and feel co-responsible for its
success. It is then easier to see the importance of knowing as
soon as possible that the correct thing is being built and that
it is being built correctly, and more tempting to invest energy
in the process.
Another key is making sure they are driving the process.
Meet them where they're at: instead of starting from “here
is a tool that works this way”, ask them to express a
test in whatever form they find convenient. Then work with them
to help them refine it until it can be added to a test suite
that will be understood by a tool.
It would be very valuable if public examples documenting
successful involvement were available – these would help to
inspire future business participation.
There seemed to be a consensus that TextTest's approach of
using logging to verify behaviour was particularly suitable for
legacy systems, because it is much less intrusive to the system
under test. Log statements can be added to a legacy codebase in
an exploratory manner without any risk of breaking it.
Introducing any tool that aims to access the system under test
via an API (i.e. nearly anything else) involves performing
(maybe substantial) refactoring before tests can be put in
place.
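To make that concrete, here is a minimal sketch of the idea in
Python. This is my own invented example, not TextTest itself: the
function and file names (legacy_discount, reference.log) are
hypothetical. Log statements are dropped into the legacy code, and
the log produced by a run is compared against a reference log saved
from a run whose behaviour we trust.

    # A minimal sketch of log-based behaviour verification. All names
    # here (legacy_discount, reference.log) are invented for illustration.
    import filecmp
    import logging
    import os

    logging.basicConfig(filename="current.log", filemode="w",
                        format="%(message)s", level=logging.INFO)

    def legacy_discount(order_total):
        # An exploratory log statement dropped into legacy code: it
        # records behaviour without changing it, so there is no risk
        # of breaking anything.
        discount = 0.1 if order_total > 100 else 0.0
        logging.info("discount for total %.2f: %.2f", order_total, discount)
        return order_total * (1 - discount)

    for total in (50, 150):
        legacy_discount(total)
    logging.shutdown()

    # The "assertion" is a straight comparison with a trusted reference log.
    if os.path.exists("reference.log"):
        if filecmp.cmp("current.log", "reference.log", shallow=False):
            print("behaviour preserved")
        else:
            print("behaviour changed - inspect the difference")

Note that the reference log records the current, accepted behaviour
rather than a specification, which fits the legacy situation well.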
It can be tempting to focus on testing the code. In fact,
large tracts of the code may be entirely unused. Far more useful
is to study the current usage patterns of the system and
concentrate on writing tests for those.
A related point is that it is essential to find out what is
now understood to be correct behaviour of the legacy system –
which is very likely to differ from its original specification.
The users may be so used to the bugs in it that they want to
preserve them! The focus at first needs to be on “behaviour
preservation” rather than “correctness assertion”.
Once such things are in place, the next step depends on the
aim of the enterprise. Is it basically to fix bugs, or is major
new functionality needed? In the first case, behaviour-preserving
tests may well be all you need. If major development is needed,
a serious refactoring effort will probably be required, and unit
tests can be added along with that effort.
When creating unit tests, much effort is (or can be) invested
in isolating units that can be tested alone. This generally
involves creating “mocks”, “fakes” or
“stubs” to replace parts of the system that are not
interesting for the test currently under consideration. Similar
considerations arise in acceptance testing, though the
“requirement” rather than “code” focus
seems to make the distinctions between “mocks”,
“fakes” and “stubs” uninteresting. The
assembled company chose to borrow the word “mock” as
it sounds best, somehow.
So, the question became: what should you mock out in
acceptance tests? What is a valid motivation for doing so? The
answers were pretty varied and a consensus was not really
reached. In general Brian and I seemed to answer that as little
should be mocked out as was practically feasible, while Rick and
some others were happy mocking a bit more.
Naturally, creating mock versions of things creates the
potential for undiscovered errors. This is exactly the reason, in
fact, why unit tests can miss important problems. The more what
you test differs from what you ship, the more potential there is
for this problem.
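For concreteness, here is a minimal sketch of what "mocking
something out" means in this discussion. The payment gateway and
place_order function are invented examples; no workshop tool is
being quoted.

    # A hypothetical system that normally talks to an external
    # payment gateway; the test replaces that gateway with a mock.
    class MockPaymentGateway:
        """Stands in for the real gateway; records what was asked of it."""
        def __init__(self):
            self.charges = []

        def charge(self, amount):
            self.charges.append(amount)
            return "OK"

    def place_order(gateway, amount):
        # The system under test just sees "a gateway"; the test decides
        # whether that is the real thing or the mock.
        return gateway.charge(amount)

    gateway = MockPaymentGateway()
    assert place_order(gateway, 42.0) == "OK"
    assert gateway.charges == [42.0]

Everything the mock fakes is something the test no longer checks
for real, which is exactly the risk described above.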
Of course, there are situations in which mocking is
unavoidable:
Hardware unavailability. One attendee spoke of an
embedded system that needed to be run at a power station!
Clearly, the power station needs to be mocked out somehow as
most people don't have a spare one for running tests on. On a
less extreme level, some applications depend very heavily on
hardware type: in this case, products like VMware provide an
effective means of “mocking the hardware”.
Simulating failure of external systems. To check that
the application behaves correctly in the presence of failure
conditions in its environment, you need to be able to simulate
those conditions somehow.
Clock dependencies. If certain things should happen at
8am every morning, you need a way to mock the system clock to
be able to test them (a minimal sketch of this appears after
this list).
Maintainability. For example, with a GUI application,
the ultimate arbiter of what happens is what appears on the
screen, so the “correct” thing to do is take screen
dumps and compare them. Naturally, this is very fragile and
sensitive to irrelevant things like the exact machine setup
where the test is running, so practically we find a way to
“mock the screen” for assertion purposes. We assume
we know what will appear there, so we log what we think we are
telling the GUI toolkit to do (TextTest), or we examine its
internal state and assert things about it (everyone else).
Naturally both of these things are subject to errors creeping
in.
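Here is the sketch promised under "clock dependencies" above. One
common way to make the clock mockable is to inject a clock object
rather than reading the system time directly; the 8am scheduler
rule and all names here are invented for illustration.

    import datetime

    class SystemClock:
        """What production code would be given."""
        def now(self):
            return datetime.datetime.now()

    class FixedClock:
        """A test double for the system clock: returns the time we set."""
        def __init__(self, now):
            self._now = now

        def now(self):
            return self._now

    def should_run_daily_job(clock):
        # Production code asks the injected clock, never the system
        # clock directly, so a test can place "now" wherever it likes.
        return clock.now().hour == 8

    assert should_run_daily_job(
        FixedClock(datetime.datetime(2005, 7, 19, 8, 0)))
    assert not should_run_daily_job(
        FixedClock(datetime.datetime(2005, 7, 19, 20, 0)))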
Then there are situations where the “liberals”
might mock things and the “fundamentalists” might
not...
The User Interface (for purposes of driving the system
under test). For some, mocking this is just practical, because
it is assumed to be too difficult to have it present for real.
For others it is a good idea anyway, because they believe it
isn't helpful to think about or get hung up on the user
interface when writing tests. For still others (and here I
include myself, naturally) the user interface is an essential
part of the system, and having the possibility to run tests
with it present (or even write tests using it, in our case)
greatly enhances the ease of writing and understanding tests.
Sending email. Naturally you can't send email for real
to the real person. But you could configure the test system to
send it to a test address and then examine the inbox there to
see if it arrived. In practice none of us are doing this yet, as
it hasn't seemed worth the effort in the systems we have worked
on.
Slow things. There are certainly situations where
acceptance tests can be very slow, and some feel this is a
reason to start mocking slow things out. Others (including me)
feel that there are better ways to address this: maybe run the
slower tests only every night or every weekend, or (better)
install a grid engine and run the tests in parallel.
A common example is databases. Many mock out the database
instinctively, but this is rarely a good idea in acceptance
tests if the database and its schema are a key part of the
system. One option is to replace a “big” database
like Oracle with a lightweight one such as MySQL – this
will generally save a fair bit of time in the tests. (This
assumes creation and teardown of a new database instance per
test; without that, your tests will be interdependent, which is
generally nasty for maintenance purposes.)
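As an illustration of that parenthetical, here is a minimal sketch
of per-test creation and teardown, using Python's unittest with an
in-memory SQLite database purely to keep the example self-contained;
the same shape applies when each test creates and drops a schema in
a lightweight MySQL instance. The orders table is invented for
illustration.

    import sqlite3
    import unittest

    class OrderStorageTest(unittest.TestCase):
        def setUp(self):
            # Every test gets a brand-new database instance...
            self.db = sqlite3.connect(":memory:")
            self.db.execute(
                "CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")

        def tearDown(self):
            # ...and tears it down afterwards, so no test can depend
            # on data left behind by another.
            self.db.close()

        def test_insert_and_read_back(self):
            self.db.execute("INSERT INTO orders (total) VALUES (?)", (99.5,))
            (total,) = self.db.execute("SELECT total FROM orders").fetchone()
            self.assertEqual(total, 99.5)

    if __name__ == "__main__":
        unittest.main()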
Written by Geoff Bache, endorsed by Rick Mugridge and
Brian Swan, 19th July 2005