Fast feedback loops are a common attribute of high-performing software development teams. Reliable and fast automated test suites are a critical feedback loop, but common pitfalls often result in slow and flaky test suites that don’t improve software quality or team performance.

Companies often make automated testing more expensive than it needs to be and fail to realize its value. Fortunately, automated testing is now widely acknowledged as an essential counterpart to manual, exploratory testing. Nevertheless, many organizations lack an understanding of the costs and benefits associated with different types of automated tests. Armed with that knowledge, they could realize substantial returns on their investment in automated testing.

Two misconceptions set the stage for a risky and expensive approach to automating tests:

  • The idea that automated testing means automating the steps of manual QA: scripting the procedure a manual tester would follow.
  • The belief that these automated tests should be written by the QA organization, outsourced, or produced by a separate team of “test automation engineers” who are not the team writing the software under test.

These misconceptions are travel companions: one is rarely seen without the other, and together they spell trouble for software organizations trying to build their automated test suites.

Who is Responsible for Quality?

This mindset takes hold in organizations that struggle to establish cross-functional development teams. These organizations adhere to the antiquated notion that testing is a phase relegated to the final stages, just before deployment, and subscribe to the belief that “testing is QA’s responsibility.” Meanwhile, the mindset persists that developers should concentrate solely on writing production code. This disconnect makes it difficult for these organizations to improve their automated testing strategy. High-performing software organizations know that the cross-functional team owns its quality, not a separate department.

The Risks of End-to-End Focused Testing

Compared to well-written unit tests, end-to-end tests are slow and expensive. End-to-end test suites often demand substantial time and many people just to keep them running. When an end-to-end test fails, someone must investigate whether the failure reveals an actual issue in the code. And because these tests cannot cover all of the branching and business logic scenarios within a system, relying solely on such a suite can foster a false sense of security.

False test failures undermine the credibility of self-validating tests; confidence erodes quickly in a suite plagued by frequent failures. Developers must investigate each failure to determine whether it indicates a genuine code problem or is merely an “expected failure” resulting from a code change. In essence, bad tests lead to bad conclusions.

Lacking Error Localization

End-to-end and UI-driven tests cover so much scope that they lack error localization. A failing test can require a significant time investment just to determine whether it indicates a real issue. Once a genuine failure is confirmed, more time is needed to pinpoint the specific code responsible for the error and to formulate an effective fix.

The Test Pyramid

The Test Pyramid, split into three sections: unit tests, integration tests, and end-to-end tests

In these organizations, I recommend sharing Mike Cohn’s Test Pyramid. While it’s not perfect (no model is), it is an effective way to show organizations how to structure their test suites to get the most benefit from them. The pyramid makes plain that organizations that invest too heavily in complicated automation frameworks and end-to-end tests are allocating a significant portion of their time and resources to the wrong category of tests.

The test pyramid describes an approach to creating sustainable and valuable test suites. The pyramid’s foundation consists of unit tests, or what we refer to at Industrial Logic as microtests. The majority of the test suite should consist of these tests. There should be a lot of them; they should be VERY small, verify a VERY small area of code (i.e., a few lines of production code), and run VERY fast (i.e., in milliseconds). These tests should be able to run in isolation on developer computers continuously throughout the day. Furthermore, a microtest should be cheap to create and easy to discard if it no longer provides value.
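As a rough illustration, here is a minimal sketch of what a microtest might look like, using Python and pytest; the calculate_discount function and its pricing rule are hypothetical, invented for this example:

```python
# A hypothetical pricing rule and the microtests that pin down its behavior.
# Each test exercises a few lines of logic and runs in milliseconds, with no
# database, network, or UI involved.

def calculate_discount(order_total: float, is_member: bool) -> float:
    """Members get 10% off orders of 100 or more; everyone else pays full price."""
    if is_member and order_total >= 100:
        return order_total * 0.10
    return 0.0

def test_member_discount_applies_at_threshold():
    assert calculate_discount(100.0, is_member=True) == 10.0

def test_member_below_threshold_gets_no_discount():
    assert calculate_discount(99.99, is_member=True) == 0.0

def test_non_member_gets_no_discount():
    assert calculate_discount(500.0, is_member=False) == 0.0
```

Because tests like these need nothing but the code under test, a developer can run hundreds of them in seconds, continuously, without leaving their editor.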

The middle tier holds the integration and acceptance tests. Because these tests have broader system scope, such as connecting to databases or the filesystem, they are more interconnected than unit tests and should constitute a smaller proportion of the overall test suite. They are frequently crafted as business-facing tests using tools like Cucumber or SpecFlow, although such tools are not obligatory.
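Continuing the sketch, a mid-tier test might exercise real SQL against a real (if temporary) database file. The schema and query here are again hypothetical; pytest’s built-in tmp_path fixture supplies the temporary directory:

```python
import sqlite3

# An integration test: it touches a real database file, so it is slower and
# more interconnected than a microtest, but it verifies wiring that a
# microtest cannot. The orders table and its columns are hypothetical.

def test_orders_can_be_saved_and_queried(tmp_path):
    db = sqlite3.connect(str(tmp_path / "test.db"))
    try:
        db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
        db.execute("INSERT INTO orders (total) VALUES (?)", (42.5,))
        db.commit()

        (total,) = db.execute("SELECT total FROM orders WHERE id = 1").fetchone()
        assert total == 42.5
    finally:
        db.close()
```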

Finally, at the apex of the pyramid sit the end-to-end tests. There should be significantly fewer of these, given their prolonged execution times, their reliance on a running application, and their increased likelihood of producing false failures.
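To make the breadth of these tests concrete, here is a sketch of a single end-to-end test using Selenium WebDriver; the URL and element IDs are hypothetical. Notice how much of the running system the test depends on:

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

# This one test depends on a deployed application, a live database, a web
# server, and a real browser. A change or hiccup anywhere in that chain can
# fail it. The URL and element IDs below are hypothetical.

def test_user_can_check_out():
    driver = webdriver.Chrome()
    try:
        driver.get("https://staging.example.com/cart")
        driver.find_element(By.ID, "checkout-button").click()
        driver.find_element(By.ID, "confirm-order").click()
        status = driver.find_element(By.ID, "order-status").text
        assert "Order confirmed" in status
    finally:
        driver.quit()
```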

The Inverse Relationship of Scope and Detail

It is important to understand the inverse relationship between scope and detail in high-quality test suites. At the foundational level of the pyramid, unit tests ought to be testing all of the system details, necessitating a very narrow scope. In other words, they should verify only a few lines of code in the system. As the tests ascend the pyramid, there should be a corresponding increase in scope. As the scope broadens, the level of detail being tested should decrease.

The major flaw in end-to-end focused test suites is that they attempt to test minute details while covering a broad scope. This is a recipe for brittle and unreliable tests.

Organizations adopt UI-testing tools like Selenium and attempt to wrap them in complex, homegrown testing frameworks. They use these frameworks to create comprehensive suites of end-to-end tests to replace manual QA testing, without understanding the drawbacks of this approach.

A Real-World Example

Several times in my career I’ve encountered QA organizations that build custom frameworks on top of Cucumber (an excellent tool for BDD) to drive UI testing tools that exercise end-to-end tests. The aim is to give non-programmers in QA departments the ability to automate their manual QA scripts. The outcome, however, is usually a substantial investment of the organization’s time and resources in creating and maintaining both the complex framework and the brittle tests it supports.

The Test Ice Cream Cone

The result of this approach is the Test Ice Cream Cone, where end-to-end and UI-driven tests occupy the majority of the test suite. Upon encountering such scenarios, my objective is to guide the organization toward realigning its test suite with the test pyramid.

James Shore - Test Ice Cream Cone - Agile 2019 Conference - used with permission

Push Tests Down

Some organizations resist minimizing these end-to-end tests because the tests once caught an issue before it reached production, or because they have decreased the time spent on manual testing. It’s understandable that teams don’t want to abandon these tests; however, the fact remains that they are slower and less reliable than tests lower in the test pyramid. My advice to help organizations improve their test suites is to push tests down: if a bug is identified by an end-to-end or integration test, write a unit test that exposes it, and then fix it. This advice can also be found in The DevOps Handbook.

Not only are errors detected during integration testing difficult and time-consuming for developers to reproduce, even validating that it has been fixed is difficult (i.e., a developer creates a fix but then needs to wait four hours to learn whether the integration tests now pass). Therefore, whenever we find an error with an acceptance or integration test, we should create a unit test that could find the error faster, earlier, and cheaper. Gene Kim, Jez Humble, Patrick DeBois & John Willis - The DevOps Handbook

The DevOps Handbook book cover
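To illustrate pushing a test down, suppose (hypothetically) that a slow end-to-end checkout test revealed that orders of exactly 100.00 were not receiving the member discount. Rather than leaving regression protection to the end-to-end suite, the bug can be pinned down with a microtest at the level where the defect lives; this sketch reuses the hypothetical calculate_discount function from earlier:

```python
# The defective version used a strict comparison (order_total > 100), so the
# discount silently skipped the boundary case. The fix and the microtest that
# exposes it live right next to the logic, and the test runs in milliseconds
# instead of requiring a full end-to-end run.

def calculate_discount(order_total: float, is_member: bool) -> float:
    if is_member and order_total >= 100:  # fixed: was "> 100"
        return order_total * 0.10
    return 0.0

def test_discount_applies_at_exactly_the_threshold():
    # This test fails against the buggy ">" comparison and passes after the fix.
    assert calculate_discount(100.0, is_member=True) == 10.0
```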

There is an argument that unit tests can’t catch every bug. While this is true, in practice escaped defects are often the result of a gap in the unit tests or quality issues with the tests themselves, not an inherent failure of unit testing. When this argument is employed, teams miss an opportunity to improve their unit tests.

Lacking Unit Tests

The most common reason organizations adopt this end-to-end testing approach is that they lack a high-quality, trusted unit test suite. Teams frequently abandon unit testing because they struggle to wrap unit tests around untestable code. Organizations that persevere and write tests for seemingly untestable code anyway often end up with a suite that impedes progress rather than accelerating it. This is one of many reasons why Test-Driven Development (TDD) is a critical practice: because the tests are created before the production code, the code produced is always testable.

You can't write good tests for bad code. Unknown

The Importance of Team-Owned Tests

Organizations that outsource testing to another team or department to try to “free up development teams” to focus on features don’t understand the value of team-owned tests. The best tests are the ones that can probe within the boundaries of the system. When testing is outsourced to another group or a QA department (who aren’t experts in the tested code), they can only test from the system perimeter. The book Accelerate, which describes the science and research behind high-performing and successful software teams, concluded that automated tests outsourced to the QA department or another team showed no correlation with organizational performance.

Developers primarily create and maintain acceptance tests, and they can easily reproduce and fix them on their development workstations. It's interesting to note that having automated tests primarily created and maintained either by QA or an outsourced party is not correlated with IT performance. The theory behind this is that when developers are involved in creating and maintaining acceptance tests, there are two important effects. First, the code becomes more testable when developers write tests. This is one of the main reasons why test-driven development (TDD) is an important practice—it forces developers to create more testable designs. Second, when developers are responsible for the automated tests, they care more about them and will invest more effort into maintaining and fixing them. Dr. Nicole Forsgren, Jez Humble, Gene Kim - Accelerate: The Science of Lean Software and DevOps: Building and Scaling High Performing Technology Organizations

Accelerate book cover

Inverting the Test Ice Cream Cone

For software organizations to improve their test suite situation, they should start by mapping their tests to the test pyramid. They should place their largest investments in the lowest-level tests, which means a focus on unit tests. All future code modifications should come with corresponding unit tests, which requires teams to learn and improve their Test-Driven Development, refactoring, and microtesting skills.

As they learn how to build high-quality unit tests and reorient their tests around the test pyramid, they can phase out complicated end-to-end testing frameworks and outsourced test suites. Development teams should own their test suites and decide how many integration and end-to-end tests they need. While end-to-end testing is important, the need for a significant number of these tests decreases when there are high-quality tests at the lower tiers of the pyramid.

The Great Shape Debate

In recent years there have been discussions and articles claiming that the test pyramid is outdated and that we should adopt a “test honeycomb” or a “testing trophy” or some other impressively shaped testing model. I find these debates unproductive because they arise from the lack of a shared understanding of what a “unit” is in a unit test, what an integration test should test, and so on.

Because of the semantic ambiguity of “unit” and “integration,” we end up with various shapes of testing models. Rather than getting hung up on definitions and the shape of testing models, focus on creating fast tests that exercise the business and branching logic that can run on developer machines continuously. Pair these tests with fewer tests that can exercise the integrated pieces. Martin Fowler explores this in his article, On the Diverse and Fantastical Shapes of Testing.

Conclusion

Software organizations continue to over-emphasize end-to-end and UI-driven automated testing. Creating teams of “automation engineers” to write automated tests for other teams’ code creates friction in automated testing. Consequently, these organizations aren’t realizing the benefits of automated test suites.

Test suites are one of the main feedback loops available to software development teams. A focus on fast, reliable, repeatable tests accelerates software organizations. Automated test suites are essential, but be aware of the pitfalls in creating them.


Note: An earlier version of this article was originally published on Anthony’s personal website and in The Startup.