Characterization testing, also known as Golden Master testing, is a technique where you feed known inputs to a process and compare its output against a known-good result. I have found this to be a great technique for testing legacy code that does not have many tests. However, my first use of this technique was at a nuclear power plant. So how did I use characterization tests in nuclear power?
Shipping Low-Level Radioactive Waste
After college, I started working at the Clinton Power Station as a radiological engineer shipping low-level radioactive waste. Working in the nuclear power industry required having sufficient controls in place to ensure the health and safety of the workers and the public. This was especially true when transporting low-level radioactive waste. Both the Nuclear Regulatory Commission (NRC) and the Department of Transportation (DOT) have regulations that require documentation to ensure each shipment is prepared safely.
We had a step-by-step procedure on how to create the proper paperwork using a software program known as RADMAN. We could have created the paperwork manually; however, doing so was time-consuming, complicated, and subject to human error.
The software program generated the appropriate NRC and DOT paperwork so we didn’t have to do it by hand. It was an early Windows application that used Fortran to do the necessary calculations.
The software was reviewed and approved by the NRC in 1984. However, the NRC and DOT frequently changed the regulations. This caused us to get frequent updates from the vendor. As a result, we wanted to ensure that for several different types of shipments the software always produced the proper paperwork. But how did we know that the software was producing the correct paperwork? To answer this question I developed a procedure for regression testing the software using a characterization testing approach.
Tim Ottinger and Jeff Langr describe the steps to create a characterization test below. Similarly, to test the RADMAN software, I manually prepared the paperwork for several different types of shipments. I saved the results to a characterization test document. The document would be used to confirm that the software was generating the correct paperwork.
Subsequently, when we received a new version of the software, I created fake shipments for the scenarios in the characterization tests. Then I compared the software output to the characterization test document. If there were any discrepancies, we would notify the software vendor immediately. Using characterization tests gave me full confidence in using the software to generate the required shipment paperwork.
After several years, I left nuclear power for software development, and I continue to use characterization testing. I find it very useful for working on legacy code that does not have a lot of existing tests. Before making changes to the legacy code, I write characterization tests to provide a safe way to proceed. Doing so gives me rapid feedback should I unintentionally alter the existing behavior.
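To make the idea concrete, here is a minimal sketch of a characterization test written with only the Python standard library. Everything in it is hypothetical: `legacy_shipping_fee` stands in for whatever untested legacy code you are facing, and `matches_golden_master` is an illustrative helper, not part of any framework. The first run records the current behavior; later runs compare against that record.

```python
import tempfile
from pathlib import Path


def legacy_shipping_fee(weight_kg, express):
    """Hypothetical stand-in for untested legacy code."""
    fee = 5.0 + 1.25 * weight_kg
    if express:
        fee *= 2
    if weight_kg > 20:
        fee += 10
    return round(fee, 2)


def describe(func, cases):
    """Render one line per input so recorded behavior is easy to diff."""
    return "\n".join(f"{args!r} -> {func(*args)!r}" for args in cases)


def matches_golden_master(func, cases, master_file):
    """Record the output on the first run; compare against it afterwards."""
    actual = describe(func, cases)
    if not master_file.exists():
        master_file.write_text(actual)  # first run: capture current behavior
        return True
    return master_file.read_text() == actual


# Inputs chosen (as an assumption) to sit on both sides of each branch.
CASES = [(1, False), (1, True), (20, False), (21, False), (21, True)]

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        master = Path(tmp) / "shipping_fee.approved.txt"
        # First call records the golden master, second call checks against it.
        print(matches_golden_master(legacy_shipping_fee, CASES, master))
        print(matches_golden_master(legacy_shipping_fee, CASES, master))
```

In real use the recorded file would live in version control next to the tests, so that any change in behavior shows up as a failing comparison.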
For example, consider the Gilded Rose code kata. It is a fictitious legacy code base for an inventory system. The system works, but the code is not very clean. The code is all in one function as shown here in the Python version:
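The sketch below follows the starting code as it is commonly distributed with the Python version of the kata. Details may differ slightly from the copy you download, so treat it as illustrative rather than authoritative.

```python
class GildedRose(object):

    def __init__(self, items):
        self.items = items

    def update_quality(self):
        # All business rules live in this single, deeply nested method.
        for item in self.items:
            if item.name != "Aged Brie" and item.name != "Backstage passes to a TAFKAL80ETC concert":
                if item.quality > 0:
                    if item.name != "Sulfuras, Hand of Ragnaros":
                        item.quality = item.quality - 1
            else:
                if item.quality < 50:
                    item.quality = item.quality + 1
                    if item.name == "Backstage passes to a TAFKAL80ETC concert":
                        if item.sell_in < 11:
                            if item.quality < 50:
                                item.quality = item.quality + 1
                        if item.sell_in < 6:
                            if item.quality < 50:
                                item.quality = item.quality + 1
            if item.name != "Sulfuras, Hand of Ragnaros":
                item.sell_in = item.sell_in - 1
            if item.sell_in < 0:
                if item.name != "Aged Brie":
                    if item.name != "Backstage passes to a TAFKAL80ETC concert":
                        if item.quality > 0:
                            if item.name != "Sulfuras, Hand of Ragnaros":
                                item.quality = item.quality - 1
                    else:
                        item.quality = item.quality - item.quality
                else:
                    if item.quality < 50:
                        item.quality = item.quality + 1


class Item:
    def __init__(self, name, sell_in, quality):
        self.name = name
        self.sell_in = sell_in
        self.quality = quality

    def __repr__(self):
        return "%s, %s, %s" % (self.name, self.sell_in, self.quality)
```

Calling `GildedRose(items).update_quality()` mutates each `Item` in place to simulate one day passing.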
The kata exercise asks you to add a feature to this system. This will be difficult unless you refactor the code first. However, you don’t have adequate test coverage to do the refactoring safely. This is where you can use characterization tests.
Emily Bache has a great video explaining how you can do characterization testing on the Gilded Rose kata using a framework called Approval Tests. The Gilded Rose kata exists in many different programming languages. We will look at the Python version. The kata provides a failing test that can be modified to use Approval Tests:
Because the GildedRose class is already being used in a fictitious production environment, we can treat its current behavior as the desired behavior. Therefore, you can use Approval Tests to generate about 80 known test inputs and record the resulting outputs. The inputs will exercise all parts of the code, as shown here for Python:
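The combination idea can be sketched without any framework by using `itertools.product` from the standard library. This is not the Approval Tests API. The `update_item` function is a hypothetical stand-in with deliberately simplified rules (not the kata's), and the specific input values are assumptions picked so that 4 names, 4 sell-in values, and 5 qualities yield 80 combinations.

```python
from itertools import product


def update_item(name, sell_in, quality):
    """Hypothetical stand-in for the legacy code under test.

    The rules here are simplified on purpose. In the kata you would build
    an Item, call GildedRose.update_quality(), and read the item back.
    """
    if name == "Sulfuras, Hand of Ragnaros":
        return name, sell_in, quality
    step = 1 if name == "Aged Brie" else -1
    if sell_in <= 0:
        step *= 2
    return name, sell_in - 1, max(0, min(50, quality + step))


NAMES = [
    "foo",
    "Aged Brie",
    "Backstage passes to a TAFKAL80ETC concert",
    "Sulfuras, Hand of Ragnaros",
]
SELL_INS = [-1, 0, 5, 11]
QUALITIES = [0, 1, 49, 50, 80]


def characterize(update, names, sell_ins, qualities):
    """Return one 'input => output' line for every combination of inputs."""
    lines = []
    for name, sell_in, quality in product(names, sell_ins, qualities):
        result = update(name, sell_in, quality)
        lines.append(f"{(name, sell_in, quality)!r} => {result!r}")
    return "\n".join(lines)


if __name__ == "__main__":
    report = characterize(update_item, NAMES, SELL_INS, QUALITIES)
    print(len(report.splitlines()))  # one line per combination
    print(report.splitlines()[0])
```

The resulting report is what you would save as the approved output. A coverage tool can then tell you whether the chosen values reach every branch, and you add boundary values until they do.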
Having these characterization tests then allows you to safely refactor the GildedRose class. For example, here is one possible way to refactor the GildedRose class:
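As an illustration, the following sketch splits the rules into one small function per item type and dispatches on the item name. It is only one possible shape, and the helper names (`_increase`, `_decrease`, `UPDATERS`, and the `_update_*` functions) are assumptions introduced for this example. The guards are written to mirror the legacy conditionals rather than clamp values, so that out-of-range inputs such as a quality of 80 behave as before.

```python
AGED_BRIE = "Aged Brie"
BACKSTAGE_PASSES = "Backstage passes to a TAFKAL80ETC concert"
SULFURAS = "Sulfuras, Hand of Ragnaros"
MAX_QUALITY = 50


class Item:
    def __init__(self, name, sell_in, quality):
        self.name = name
        self.sell_in = sell_in
        self.quality = quality

    def __repr__(self):
        return "%s, %s, %s" % (self.name, self.sell_in, self.quality)


def _increase(item):
    """Raise quality by one unless it is already at or above the cap."""
    if item.quality < MAX_QUALITY:
        item.quality += 1


def _decrease(item):
    """Lower quality by one, never below zero."""
    if item.quality > 0:
        item.quality -= 1


def _update_aged_brie(item):
    _increase(item)
    item.sell_in -= 1
    if item.sell_in < 0:  # improves twice as fast once expired
        _increase(item)


def _update_backstage_passes(item):
    _increase(item)
    if item.sell_in < 11:
        _increase(item)
    if item.sell_in < 6:
        _increase(item)
    item.sell_in -= 1
    if item.sell_in < 0:  # worthless after the concert
        item.quality = 0


def _update_sulfuras(item):
    """Legendary item: neither sell_in nor quality ever changes."""


def _update_normal(item):
    _decrease(item)
    item.sell_in -= 1
    if item.sell_in < 0:  # degrades twice as fast once expired
        _decrease(item)


UPDATERS = {
    AGED_BRIE: _update_aged_brie,
    BACKSTAGE_PASSES: _update_backstage_passes,
    SULFURAS: _update_sulfuras,
}


class GildedRose:
    def __init__(self, items):
        self.items = items

    def update_quality(self):
        for item in self.items:
            UPDATERS.get(item.name, _update_normal)(item)
```

With this structure, a new item type becomes one more function and one more entry in `UPDATERS`, and the characterization tests confirm that existing items still behave as before.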
Refactored Unit Test
Here is one possible way to refactor the unit test as well:
As you can see, the refactored code is much easier to understand. Consequently, the code will be much easier to modify in the future. Likewise, there are many other possible ways to refactor the GildedRose class and unit test. This is merely one example.
In conclusion, one benefit of working in software development is that I can use a testing framework like Approval Tests to automate the characterization tests. I’m glad that I don’t have to do them manually like I did at the nuclear power plant.
Special thanks to early reviewers Josh Kerievsky and Tim Ottinger.