Archive for the ‘Testing Tenets’ Category

Revisiting the testing tenets

Sunday, August 24th, 2008

Back in 2005 I wrote a series of seven blog posts called The Seven Tenets of Software Testing. These posts have been buried deep in this site, so I have added a new page - Tenets of software testing - that links to all the original articles, hopefully making them easier to find if you are new to my software testing blog.

Tenet: If you are going to run a test more than once, it should be automated.

Wednesday, August 2nd, 2006

This post is the second in a seven part series covering my seven tenets of software testing.

Original post (31 Jan 2005)

Automation explained
Test automation means different things to different people in different contexts. When I talk about automated testing I am referring to a method of executing a test without human intervention that would otherwise require it. In practical terms that may mean an nUnit test, a GUI test using a commercial testing tool, or a test written using an application’s internal scripting language. The technology is not the key concern; what matters is that the test can be run 100% without any human involvement.

Why automate?
The primary reason to automate tests is time. As a tester, you always need more time.

Automation can provide immense reductions in the amount of time required to execute tests. My first introduction to automated testing, in 1997, managed to condense 5 days of manual testing effort into 1 hour of automated execution, a 97.5% reduction in execution time (40 working hours down to 1, assuming 8-hour days).

Think about that for a moment: by automating our tests, we had achieved the equivalent of adding 40 additional testers to the team, for a fraction of the price. In addition, I had taken the drudgery out of my team’s work day, increasing morale. More importantly, automation lets your testers perform more ad-hoc testing, which is much more effective than performing the same manual test over and over again.

So that’s it then, should we just retrench all our testers, and use automation instead? Well, no.

Despite a recent example where Microsoft retrenched 62 Longhorn testers, citing automation as the cause, test automation is generally used to increase the amount of test coverage that can be achieved with a given schedule and resources, not to reduce testing head-count.

When implemented correctly, with enough hardware, automating your tests allows you to execute all your tests at least once a day, every day. When combined with a daily build, you have one of the most powerful testing tools on the planet, if not the most powerful.

Build Verification or smoke testing
Every night after a successful compile, the daily build should automatically be packaged and deployed into a test environment, and smoke tested with an automated build verification test (BVT). The term “smoke” test is derived from the idea of quickly plugging in some electronics to test them, checking that no smoke comes out. While ideally you would run all your tests, typically the smoke test is a carefully selected subset of your full automated tests that can be run in about 10-20 minutes or so.

Automated regression testing
regression n. “[To] relapse to a less perfect or developed state.”

Regression testing has the goal of ensuring that the quality of an application doesn’t decline as features are added. It faces a significant challenge: when an application needs it most, there is less time and fewer resources available to perform all the tests that were executed when the product first shipped.

Without automation, the typical approach is to perform localised regression testing, which is limited to directly testing the area around the changes.

With an automation suite in hand, it is simply a matter of executing all the tests that were developed previously. This allows the maintenance programmers to make frequent releases, and allows the quality of the application to improve over time.

This is particularly important in light of the trend that Fred Brooks describes in his classic work, The Mythical Man-Month, where each defect that is fixed has a 20-50% chance of introducing another.

The largest automation project that I have personally worked on was a huge effort in which my client ran their $1M+ investment of tests on a rack of 25 dedicated machines that pounded away relentlessly, shortening their regression testing cycle by 75%, and that cycle was partially automated to begin with!

Update (Tue, 8 Feb 2005)
Sara Ford, a tester in the Visual Studio team at Microsoft, has blogged about this topic as well here.

Update (Wed, 2 Aug 2006)
A recent post by Bj Rollison, For those of you dreaming the 100% automation dream…please wake up!, makes a very good point that a goal of 100% automated tests is completely unrealistic. I agree with his post, and feel that I need to add a clarification to this one.

With any form of testing, you have to focus on what is important. Blindly trying to automate everything just to reach an arbitrary automation goal is contrary to that tenet. Does that mean that this tenet is wrong then? No, I don’t think so. There is a significant difference between should be automated and must be automated. In my experience, most projects need a heck of a lot more automation than they have. If this tenet was called, “you must automate what you can”, the whole point of this tenet would have been lost. Personally, the highest I have ever achieved on a project was 95% automated, and that lasted for all of 1 day, before we added more tests.

The #1 priority for a tester on any project should always be to find and log issues, however you find them. However, investing in the right amount of automation can make that a whole lot more effective.

Assert me!

Wednesday, April 27th, 2005

Following on from my previous post referring to Joe Schmetzer’s work on unit testing anti-patterns, I felt the need to share my observations from some recent code reviews that I have performed.

Always include at least one assert in each test

If I hadn’t seen a unit test with no asserts myself, I would find it hard to believe. Each nUnit test must contain at least one assert statement. Without the assert, you will get a test pass regardless of what the code does. For example the following test:

[Test]
public void TestName()
{
    Console.WriteLine("\nThis is a test with no asserts.");
}

Will produce the following output in nunit-console.

.
This is a test with no asserts.

Tests run: 1, Failures: 0, Not run: 0, Time: 2.541 seconds.

As you can see this is not marked as a failure or not run, it is recorded as a pass. This is a common error that all automated test tool developers tend to make. The default result for any automated test should be to fail (or some other non-passing state), unless explicitly passed during test execution.

Microsoft have the right idea for their generated test stubs in Whidbey. VS 2005 places an Assert.Inconclusive statement in the body of any test it auto-generates. That is a good first step, but ideally it should be the default behaviour for any test with no asserts, and not require the Assert.Inconclusive statement to be there at all.
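One way to picture a "non-passing by default" result is the minimal C# sketch below. It is not how nUnit or Visual Studio work internally; StrictRunner, TestOutcome and the counting AreEqual helper are hypothetical names invented purely for illustration. The runner counts the asserts that a test executes, and reports any test that finishes without a single assert as inconclusive rather than passed.

```csharp
using System;

// Hypothetical outcome type for this sketch (not an nUnit type).
public enum TestOutcome { Passed, Failed, Inconclusive }

// Hypothetical mini-runner: a test only passes if it ran at least one assert.
public static class StrictRunner
{
    // Number of asserts executed by the test currently running.
    // Not thread-safe; good enough for a single-threaded illustration.
    private static int assertCount;

    // Counting assert: expected value first, then actual, then a description.
    public static void AreEqual<T>(T expected, T actual, string message)
    {
        assertCount++;
        if (!object.Equals(expected, actual))
        {
            throw new InvalidOperationException(
                message + " expected:<" + expected + "> but was:<" + actual + ">");
        }
    }

    // Runs one test body and classifies the result.
    public static TestOutcome Run(Action test)
    {
        assertCount = 0;
        try
        {
            test();
        }
        catch (Exception)
        {
            // Any exception, including a failed assert, is a failure.
            return TestOutcome.Failed;
        }

        // No exception and no asserts: we learned nothing, so do not pass.
        return assertCount > 0 ? TestOutcome.Passed : TestOutcome.Inconclusive;
    }
}
```

With this sketch, a test body that only writes to the console comes back as Inconclusive, while a body that calls StrictRunner.AreEqual comes back as Passed or Failed depending on the values compared.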

Back to the code review, to be fair to the developer involved, the test code was testing an asset object which is only one letter different to an assert statement, so it was pretty easy to miss.

Where multiple asserts are used in a test, include a descriptive string.

If your test has more than one assert statement, each statement needs to have a nice, descriptive string explaining what is being validated. The reason for this is if you have a test with a bunch of asserts that fails, you will have to debug through the test code to find out which assert failed. For example the following test:

[Test]
public void TestIntegers()
{
    int FirstInteger = 1;
    int SecondInteger = 1;
    // nUnit's argument order is Assert.AreEqual(expected, actual).
    Assert.AreEqual(1, FirstInteger);
    Assert.AreEqual(2, SecondInteger);
}

will produce the following output in nunit-console.

.F
Tests run: 1, Failures: 1, Not run: 0, Time: 2.606 seconds

Failures:
1) Teknologika.Tests.BlogPostTests.TestIntegers :
	expected:<2>
	 but was:<1>

The challenge here of course is which assert failed? In the contrived example above, it is obviously the second one. If it was an actual test, however, we would most likely have to debug through the test code to find out.
If we add a descriptive string to each assert, and change FirstInteger to 2 so that the first assert now fails, as follows:

[Test]
public void TestIntegers()
{
    int FirstInteger = 2;
    int SecondInteger = 1;
    Assert.AreEqual(1, FirstInteger, "Failed : Verifying that FirstInteger is 1." );
    Assert.AreEqual(2, SecondInteger, "Failed : Verifying that SecondInteger is 2." );
}

Then we get a much more meaningful output, which lets us focus straight on the assert that is failing.

.F
Tests run: 1, Failures: 1, Not run: 0, Time: 2.791 seconds

Failures:
1) Teknologika.Bulldozer.Tests.BlogPostTests.TestIntegers : Failed : Verifying that FirstInteger is 1.
	expected:<1>
	 but was:<2>

Refactor your tests into smaller more atomic tests

The previous example is hiding something. There is more than one failure in the TestIntegers test, but we are only seeing the first one, as nUnit aborts the test once the first failure is reached.

If this was a real test, and each of the failures was being caused by a different thing, we would probably fix the first one and re-run the tests, only to have the test fail on the second error.

Incidentally, if both asserts happened to report the same expected and actual values and we hadn’t added a description, the test results after fixing the first error would look identical to the results before it, so you might still think that the test was broken in the same way even though the first error had been fixed.

So if we refactor our test into two smaller atomic tests like this:

[Test]
public void TestFirstInteger()
{
    int FirstInteger = 2;
    Assert.AreEqual(1, FirstInteger, "Failed : Verifying that FirstInteger is 1." );
}

[Test]
public void TestingSecondInteger()
{
    int SecondInteger = 1;
    Assert.AreEqual(2, SecondInteger, "Failed : Verifying that SecondInteger is 2." );
}

then our results show the full story.

FF
Tests run: 2, Failures: 2, Not run: 0, Time: 3.209 seconds

Failures:
1) Teknologika.Bulldozer.Tests.BlogPostTests.TestFirstInteger : Failed : Verifying that FirstInteger is 1.
	expected:<1>
	 but was:<2>

2) Teknologika.Bulldozer.Tests.BlogPostTests.TestingSecondInteger : Failed : Verifying that SecondInteger is 2.
	expected:<2>
	 but was:<1>

The bottom line here is that spending a small amount of time when writing your tests can make things a whole lot easier when it comes to investigating your test failures.

Tenet: A test is successful when the software under test fails.

Saturday, April 2nd, 2005

This post is the seventh and final post in a seven part series covering my seven tenets of software testing.

As I discussed in tenet five, all software has bugs, and the goal of testing is to find them.

One of the ironies of testing is that when a test case runs without error we give it a big green tick, and say that it has passed. By doing this, we are incorrectly reinforcing that a test is successful if it doesn’t find any bugs. If we want to find defects and improve the quality of our software, we want our tests to fail.

This is a subtle but very important difference in testing approach. A good analogy is the tests performed by a doctor. When a doctor returns with test results for your sore leg, the last thing you want to hear is: “That pain in your leg, the tests didn’t find anything, so you are fine, just ignore it.”

There is a danger when we change our expectation away from “All tests must pass all the time”, to “We want tests to fail”. The danger is that our expectation will instead become: “Most of our tests should fail and it’s ok to have tests failing for weeks on end.”

Ideally, your goal should be to have a high fidelity test system, with a core set of automated regression tests, that suffers an occasional failure as developers evolve the application over time. In addition to the regression tests, you should be adding tests for things like known defects, and new tests to expand the test coverage. Ideally, you should expect any new test to fail the first few times it is run. When the issue you found is resolved, the test should execute without failure and then stay that way.

It is also important that your test suite has a high signal to noise ratio. As the number of tests increases, so will the amount of analysis that you need to perform when there are test failures.

Now if only I could charge for tests like a doctor does …

Tenet: A developer should never test their own software.

Tuesday, March 22nd, 2005

This post is the sixth in a seven part series covering my seven tenets of software testing.

Test Driven Development (TDD) is great, and it is something that every developer should do. However, like most development techniques, TDD is not a silver bullet. TDD is primarily focused on defining how a class should work, implementing that class, and then verifying the implementation performs as expected. This post isn’t about TDD; however, I feel it is important to mention it because it is the “exception to the rule”, where the “rule” is the true subject of this post.

A couple who are good friends of my wife and me recently had their first child. The child’s father is an orthopaedic surgeon who, during his years as an emergency ward doctor, has delivered several babies. Before the birth I asked him whether, as he is qualified and experienced, he could arrange to deliver the baby himself if he wanted to. He answered pretty much as I expected. He would never consider delivering the baby himself, as he had too much emotional investment in the patient, his wife, and the event itself.

What the heck does this have to do with testing, I hear you ask? Just as a surgeon won’t operate on friends or family unless it is an emergency, a developer shouldn’t test their own code. The reason for this is clear: a developer cannot test their own code, because they simply have too much emotional attachment to it.

Development and testing are two diametrically opposed disciplines. Development is all about construction, and testing is all about demolition. Effective testing requires a specific mindset and approach where you are trying to uncover developer mistakes, find holes in their assumptions, and flaws in their logic. Most people, myself included, are simply unable to place themselves and their own code under such scrutiny and still remain objective.

Let’s say that a developer has to write some code that calculates a sales commission, where the commission is normally 5%, but rises to 7% for sales over ten thousand dollars, and they implement the following code.

if  (SalesAmount < 10000.00)
{
	Commission = SalesAmount  * 0.05;
}
else
{
	Commission = SalesAmount  * 0.07;
}

The developer has made the assumption that a sale of exactly $10,000 should earn 7% commission. If they are testing this code as well they might write tests similar to the following:

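[Test]
public void VerifyLowerCommission()
{
    // The third argument is a tolerance, since floating-point results are rarely exact.
    Assert.AreEqual(499.9995, CalculateCommission(9999.99), 0.000001);
}

[Test]
public void VerifyHigherCommission()
{
    Assert.AreEqual(700.0007, CalculateCommission(10000.01), 0.000001);
}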

The problem with these tests is that even though they achieve 100% code coverage, the developer has based them on the same assumptions and thought processes they used when writing the code itself. In this contrived example, let’s assume the higher rate should only have applied to sales strictly greater than $10,000, so a sale of exactly $10,000 should have earned 5%. So, even though these test cases would pass, the calculation is actually wrong. This type of bug would probably manifest itself infrequently, as it would require a sale of exactly $10,000 to cause a problem and would otherwise remain dormant.

Having someone impartial write the tests for the code significantly increases the chance of finding that type of issue. This helps because they will have their own ideas about how things should work, and will challenge the developer’s assumptions.
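An impartial tester reading the requirement as stated (7% only for sales over ten thousand dollars) would most likely go straight to the boundary value. The sketch below is only an illustration: CommissionExample, AsImplemented and AsSpecified are hypothetical names, and decimal is used instead of the floating-point arithmetic of the original snippet so that the comparison at the boundary is exact.

```csharp
using System;

// Hypothetical class for illustration; not part of the original code base.
public static class CommissionExample
{
    // The logic as the developer wrote it: a sale of exactly 10,000 earns 7%.
    public static decimal AsImplemented(decimal salesAmount)
    {
        return salesAmount < 10000.00m
            ? salesAmount * 0.05m
            : salesAmount * 0.07m;
    }

    // The requirement as worded: 7% only for sales OVER 10,000.
    public static decimal AsSpecified(decimal salesAmount)
    {
        return salesAmount > 10000.00m
            ? salesAmount * 0.07m
            : salesAmount * 0.05m;
    }

    // The boundary check a second pair of eyes would add:
    // true when the implementation disagrees with the requirement.
    public static bool DisagreesAt(decimal salesAmount)
    {
        return AsImplemented(salesAmount) != AsSpecified(salesAmount);
    }
}
```

Calling CommissionExample.DisagreesAt(10000.00m) exposes the difference at the boundary, while the developer's own inputs of 9999.99 and 10000.01 agree under both readings, which is exactly why their tests could never catch it.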

Tenet: To build it, you have to break it

Friday, March 11th, 2005

This post is the fifth in a seven part series covering my seven tenets of software testing.

Let’s say that you are a modern, test-driven developer. You run your tests and the tests all pass. Great, your code must be bug free, let’s ship! Umm, not quite. Are you sitting down? Good, I need to tell you something. Your software has bugs.

It doesn’t matter if you are a graduate fresh out of college, Don Box or Anders Hejlsberg. If you are writing a program that does anything remotely useful, it will have bugs. In Code Complete, Steve McConnell presents some statistics of exactly how many bugs you should expect to find.

Setting the benchmark is the CMM poster child, the NASA team that writes the software for the space shuttle. The NASA team has achieved the impressive statistic of zero bugs for every 500,000 lines of released code.

For all the negative criticism about buggy software that Microsoft have received over the years, they do a pretty good job with 1 defect per 2000 lines of released code. By comparison the rest of the software industry achieve between 15 and 50 errors per 1000 lines of released code. 1
It is important to note that these statistics are for bugs in released code, i.e. after testing has been completed. Even the impressive NASA numbers don’t mean that there aren’t any bugs in their code, especially before it is released. A much higher number of bugs will have been found and resolved before the code went out the door.
So, if your code is say 10,000 lines long, you should expect, at a minimum, to have between 150 and 500 defects. So, if the bugs are there, how do I find them?

Good testers will generally (sometimes subconsciously) use a technique known as error guessing. Error guessing is all about trying to throw something at the application that the developers haven’t thought of, otherwise known as a negative test.

Negative tests are basically trying to come up with permutations of data that the application has not been designed to handle. For example, an Int32 in .NET can handle numbers from -2,147,483,648 to 2,147,483,647. What is the behaviour of an application when an integer is set to 2,147,483,647 and then 1 is added to it?
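In C# the answer depends on whether the arithmetic runs in a checked or an unchecked context, which is exactly the kind of detail a negative test flushes out. The sketch below shows both behaviours; OverflowExample and its method names are hypothetical, chosen only for this illustration.

```csharp
using System;

// Hypothetical helper class for illustration.
public static class OverflowExample
{
    // Unchecked arithmetic silently wraps around: adding 1 to
    // int.MaxValue (2,147,483,647) yields int.MinValue (-2,147,483,648).
    public static int AddOneUnchecked(int value)
    {
        return unchecked(value + 1);
    }

    // Checked arithmetic throws OverflowException instead of wrapping.
    // Returns false, with result set to 0, when the addition overflows.
    public static bool TryAddOneChecked(int value, out int result)
    {
        try
        {
            result = checked(value + 1);
            return true;
        }
        catch (OverflowException)
        {
            result = 0;
            return false;
        }
    }
}
```

A negative test would feed int.MaxValue into whichever code path the application actually uses and verify that the outcome, whether a wrapped value or an exception, is handled deliberately rather than by accident.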

Negative tests are effective at finding bugs because they do things that the developer may have never considered when they are coding the application. They also represent the types of things that real users may do to a system, sometimes bringing it to its knees. Ideally we don’t want our users to do that on a regular basis, or they won’t be users for long. We need to find the bugs that we know are there before our end users do. The best way to find the bugs is to do our damnedest to try and break the application, in parallel to construction, starting the day that the compiler produces some output.

Breaking the application as it is being built is important. It’s important because the longer a bug sits undiscovered the more it will cost to remove. You want to find those bugs as early as possible, when they are the cheapest to fix.

The best analogy to this technique is the development of a Formula One engine. Whilst the exact techniques are closely guarded secrets, the engine developers will probably push the engine and its components to the absolute limit, identify the cause of failure, resolve the problem and then repeat the process. The alternative is to destroy engines race after race as the limits of the engine are discovered.

I’m sure Mark Webber doesn’t expect to have to be an engine test guinea pig during a race. Similarly, your users shouldn’t be expected to find your bugs for you either.

References

1 Steve McConnell 1993, Code Complete, Microsoft Press, pg. 612-613.

Tenet: Base your decisions on data and metrics, not intuition and opinion

Friday, February 25th, 2005

If you have only walked in the dark, you will have never known the clarity that light brings. - me

This post is the fourth in a seven part series covering my seven tenets of software testing.

I was once giving a presentation to the CEO of the company that I worked for about the current state of play within our organisation. I was Development Manager at the time, so testing was not my primary focus. During the presentation, I couldn’t resist including a couple of testing related slides. The first slide showed an example defect trend graph, which I used to illustrate the sort of information that should be generated by the Test Manager to assist with the day to day decisions. The second slide was the same graph with the data removed so that only the two axes remained, illustrating the lack of information available when there aren’t any testers logging issues.

Steve McConnell used a brilliant analogy in Code Complete1, where he compares testing to a bathroom scale when you are trying to lose weight. Steve (or should that be Mr. McConnell) states that the scale does not help you lose weight at all. The scale is merely an indicator of your progress towards your goal.

In my way of thinking, to extend Steve’s analogy, a test team is more like a weight loss clinic. The statistics and metrics that they produce are like the weekly weigh in, and blood test results that tell the real story of how you are progressing.

Government health warning: Metrics can be addictive

I don’t smoke, but testing metrics are like a cigarette habit: once you are used to having them, it is almost impossible to give them up. You may be able to go for short, painful stints without them, but you know it is a case of when they will be back, not if.

Metrics can provide insights and answers to curly questions such as: When will the product ship? The simple answer is to average the number of bugs fixed per day, and divide the total number of active bugs by that average. That is approximately how many days until you reach zero bugs. So, if you are fixing 5 bugs per day and you have 200 active bugs, the earliest that you will ship is in 40 working days’ time. If you want to ship sooner, you will need to stop adding features and focus on fixing more bugs. The same information can be used in reverse to calculate a maximum allowable bug count. Say you only have 40 days until your desired ship date, and you are fixing 5 bugs per day as in the previous example. If your active bug count is over 200 today, you will probably miss your target. This number continuously decreases, so in 2 weeks’ time, with 30 working days to go, your bug count should be at the 150 mark if you are going to hit your ship date.
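The arithmetic above is simple enough to capture in a few lines. The following C# sketch is only an illustration of the calculation described in this post; ShipDateEstimator and its method names are hypothetical, and like the prose it makes the simplifying assumption that no new bugs arrive while the existing ones are being fixed.

```csharp
using System;

// Hypothetical helper for the back-of-the-envelope ship date arithmetic.
public static class ShipDateEstimator
{
    // Working days until the active bug count reaches zero,
    // rounded up to a whole day. Assumes no new bugs are found.
    public static int DaysToZeroBugs(int activeBugs, double bugsFixedPerDay)
    {
        if (bugsFixedPerDay <= 0)
        {
            throw new ArgumentOutOfRangeException(
                "bugsFixedPerDay", "The fix rate must be greater than zero.");
        }
        return (int)Math.Ceiling(activeBugs / bugsFixedPerDay);
    }

    // The same calculation in reverse: the largest active bug count
    // that can still be cleared in the working days remaining.
    public static int MaxAllowableBugs(int workingDaysRemaining, double bugsFixedPerDay)
    {
        return (int)Math.Floor(workingDaysRemaining * bugsFixedPerDay);
    }
}
```

Using the figures from this post, DaysToZeroBugs(200, 5) gives 40 working days, MaxAllowableBugs(40, 5) gives 200 bugs, and MaxAllowableBugs(30, 5) gives the 150 mark expected two weeks later.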

Interpreting the results sometimes makes you feel more like a statistician than a test manager. But trust me, it is well worth the effort.

References

1 Steve McConnell 1993, Code Complete, Microsoft Press.

Tenet: Test the product continuously as you build it.

Friday, February 11th, 2005

This post is the third in a seven part series covering my seven tenets of software testing.

To start off, I’ll give credit where credit is due. I first came across this tenet in Microsoft Secrets1 some years ago. Whilst the book is starting to show its age these days, it is full of some great little gems of information, and in the past I have made Chapter 5: “Developing and shipping products” required reading for members of my team.

The key idea behind the tenet is that testing starts the day development starts. This is a conscious move away from the waterfall approach where testers don’t get to start their testing until the developers have hit code complete. Starting testing so late in the process creates a situation where the true state of the product only becomes visible in the last third of the project or so. I don’t know about you, but if something is going off the rails, I want to know about it as soon as possible, so I can take some corrective actions before things get really ugly, and expensive to fix.

There are several techniques that can be utilised to start testing earlier in the process, adding significant value to the project.

Buddy Testing

In a perfect world where budgetary constraints don’t exist, (like say for the testers of the computers on Star Trek), a testing “buddy” is assigned to each and every developer. At the end of every day, the developer submits their code and hands over a private release to their testing buddy. The buddy tests the newly crafted code in its semifinished state, and provides immediate feedback to the developer, rectifying any issues before the code is integrated into the main build.

This practice is apparently in wide use throughout Microsoft, and I am led to believe that the ratio of developers to testers in times past was approximately 1:1. However, the ratio may be more or less these days depending on the product, the quality bar and the amount of automation that is being used.

Well I don’t work for Microsoft, and this ain’t Star Trek, so how can the rest of us utilise this technique?

There are a couple of ways that this approach can be adopted in the absence of unlimited testing resources. Firstly, a tester may be allocated to a number of developers, say, an entire feature team, and they test the feature as it develops, instead of only Joe’s code.

In the complete absence of testers, a developer could pair up with another developer who has sufficient objectivity and emotional detachment from the code that they are testing. (Typically this would need to be a developer working on a completely different feature). To encourage the buddy testing practice, issues found as part of a private release, won’t be entered into the defect tracking system, allowing the developer to resolve the issues as quickly as possible.

Test Driven Development (TDD) and developer unit testing
In the last couple of years, TDD and the nUnit style test harnesses have changed the unit testing landscape. nUnit formalises and automates the unit testing techniques that the better developers were doing in times past. This style of testing is a great technique to improve the quality of the code, and definitely should be utilised in one form or another.

The challenge, however, is the developer’s emotional attachment to their code, particularly when it comes to performing negative (destructive) tests. As a development / testing professional I can only say that nUnit based unit tests are a great thing, but they are no replacement for someone who has no emotional attachment to the code pounding away at it. This becomes particularly important as API only testing becomes less and less effective (finding fewer and fewer bugs) as the product matures.

Daily build and smoke tests
Discussed in the previous tenet, the daily build and smoke test is a key foundation process for a development team that is serious about producing quality code. This practice should always be implemented if at all possible.

Pre-Checkin tests
Whilst the daily build and smoke test is great at identifying when something is broken, the technique has a fundamental flaw in that the smoke test will not prevent the breakage from occurring in the first place. If a developer performs a smoke test after they do a local build, but before they submit their code, then the problem may be caught before the main branch is broken. The challenge with pre-checkin tests is that they can significantly increase the amount of time that a developer will spend submitting their code. You can expect a lot of resistance from developers for this type of process, especially if they are used to working on a small team and just checking in to VSS whenever they like. If your developers are used to following the kind of controlled check-in process that becomes necessary on larger projects, this should be easier to implement.

Performance testing and application profiling
Application performance is almost always an issue, and the judicious use of a profiler early on can help identify issues that may come home to roost later. Also, just stepping through your code in the debugger can provide some valuable insight into where time is being spent, although this becomes harder and harder the larger your code base becomes.

Code Reviews
Code reviews are another technique that can significantly improve the quality of software that is being developed. Code reviews can vary from a quick informal review to a full blown inspection. The costs and results will vary along with the formality, but at least some form of review should be scheduled during the development process.

Overall there are a number of different techniques that can be applied to an application as it is being built, and judicious use of the resources that are available can improve the quality of software, from the start of the development cycle.

References
1 Michael A. Cusumano and Richard W. Selby 1995, Microsoft Secrets, pg. 294, Harper Collins.

Seven tenets of software testing

Sunday, January 16th, 2005

In both this MSDN magazine article and this episode of the .net show, Don Box introduced four fundamental tenets for developing service based or connected systems.


  • Boundaries are explicit
  • Services are autonomous
  • Services share schema and contract, not class
  • Service compatibility is determined based on policy

That inspired me to develop my own list of guiding principles that apply to software testing. These tenets document some key learnings from my years working as a Test Manager, Senior Consultant and Development Manager for various software development shops.


  • You can’t test everything so you have to focus on what is important.
  • If you are going to run a test more than once, it should be automated.
  • Test the product continuously as you build it.
  • Base your decisions on data and metrics, not intuition and opinion.
  • To build it, you have to break it.
  • Apart from Test-Driven Development, a developer should never test their own software.
  • A test is successful when the software under test fails.
In a series of future posts I will be expanding on the tenets, explaining them in detail, providing links to reference materials; hopefully providing something helpful for you to use on your projects.