Someone found a defect! But that’s okay, because you have a capable development team. Your team diagnoses the problem, then determines how to fix it. Ship the fix, and breathe a sigh of relief. Whew! Your work is done. Move on to the next thing.
But is the work really complete? If you stop there, you’re losing an opportunity to get the most value from this defect. In particular, what’s going to keep it from happening again?
A defect isn’t just a bump on the road; it’s a resource for us to mine.
In his book *Writing Solid Code*, Steve Maguire outlines “the two critical questions” we should ask about each bug in our code:
- How could I have automatically detected this bug?
- How could I have prevented this bug?
If we do this right, we not only prevent that one bug from showing its face again. We also close the door on others like it.
Teams that routinely ask these questions build habits that keep their code moving forward. They get the most out of each defect by using it to create safety nets (detection) and guardrails (prevention). Let’s look at each activity.
## Creating Safety Nets (Detection)
How can we automatically detect when someone makes a mistake and reintroduces a defect? It may seem like a daunting challenge, especially if you spent a lot of time analyzing the problem. We can’t automate such analysis, can we? Probably not. But that doesn’t mean we’re stuck.
Defect analysis often points to a single area of code, with the fix in just a few lines. Once I know how to fix the code, I put on my TDD hat and ask, “What microtest can I write that would call this fix into existence?”
This can look tricky if the fix will live deep inside a long method, or inside a private method. But as always, if the structure of the code makes testing hard, change the structure:
- Inside a long method? Extract another method from it.
- Method is private? Make it non-private.
- Exposing too many private details? Extract these parts into another class, and write tests against the extracted class.
Then test-drive the fix. The test should fail without the fix present, and pass when the fix is in place.
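For instance, suppose the fix taught a parsing routine to reject an empty part identifier. A microtest pinning that fix down might look like this (the function and test names here are hypothetical):

```swift
import XCTest

// Hypothetical fix: previously this returned the raw string unchecked;
// now it rejects an empty identifier by returning nil.
func parsePartIdentifier(_ raw: String) -> String? {
    raw.isEmpty ? nil : raw
}

final class PartIdentifierTests: XCTestCase {
    // Fails without the fix in place, passes with it.
    func testEmptyIdentifierIsRejected() {
        XCTAssertNil(parsePartIdentifier(""))
    }

    func testValidIdentifierIsAccepted() {
        XCTAssertEqual(parsePartIdentifier("PN-1234"), "PN-1234")
    }
}
```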
Sometimes a defect occurs because component Alpha calls component Bravo with invalid data. The fix is to change Alpha so it doesn’t do that. But is there a way to change Bravo to alert us if any other component makes the same mistake?
Essentially, we want to treat the defect in Alpha (calling Bravo incorrectly) as a single instance of a whole class of bugs. We can set up an alarm inside of Bravo that detects the problem at runtime. This is an assertion.
It’s helpful to assert that the arguments are valid, especially at component boundaries. Let’s say component Bravo takes a string argument, representing a part identifier. And say the problem over in component Alpha was that it called Bravo with a null string. Sure, we fixed Alpha. But in Bravo, we can assert that the argument is non-null. We can also make other assertions—for example, that the string is non-empty.
Assertions will catch problems at runtime. Be aware that failed assertions crash—an effective way to get your attention, but not a good customer experience. Your programming language probably provides two types of assertions: those it removes from your release build, and those it leaves in.
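In Swift, for example, `assert` is stripped from release builds while `precondition` stays active. A sketch, with a hypothetical function:

```swift
func lookUpPart(identifier: String) {
    // Debug-only sanity check: compiled out of release builds.
    assert(identifier.count <= 32, "Unexpectedly long part identifier")
    // Boundary guard: remains active in release builds.
    precondition(!identifier.isEmpty, "Part identifier must be non-empty")
    // ... look up the part ...
}
```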
## Creating Guardrails (Prevention)
Microtests and assertions catch problems after they’ve been introduced into the code. But what if you could prevent those mistakes from reaching the code in the first place?
### Classes over Primitives
Remember when your high school physics teacher corrected you for submitting an answer as a number with no unit of measure? Numbers and strings are primitive types, representing values without meaning. They come with no constraints. We also risk mixing them up when we pass them around.
Instead of passing around a string by itself, we can create an object containing a string, narrowing its use. Let’s think about that part identifier. Instead of a field or property like this,
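For example (a Swift sketch; the property name is illustrative):

```swift
var partIdentifier: String
```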
we can define a new type to wrap the string value:
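A minimal wrapper might look like this (a sketch; the validation shown is illustrative):

```swift
struct PartIdentifier {
    let value: String

    init(_ value: String) {
        // The type guards its own integrity at creation time.
        precondition(!value.isEmpty, "Part identifier must be non-empty")
        self.value = value
    }
}
```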
Then we’d declare function parameters and other variables using this new type. How does this help? Say we have a function with this signature:
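Here’s a hypothetical function whose first and last parameters are both strings:

```swift
func orderPart(_ identifier: String, _ quantity: Int, _ supplierCode: String) {
    // ...
}
```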
(In Swift, function parameters have external labels in addition to internal names. We can omit external labels by adding those underscores.) Here’s an example of calling it:
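A call might look like this (the sample values are made up):

```swift
orderPart("PN-1234", 5, "ACME")
```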
The first and last parameters are both of type String. So we could make a mistake and get the order wrong:
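For example:

```swift
// Compiles and runs, but the identifier and supplier code are swapped.
orderPart("ACME", 5, "PN-1234")
```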
Nothing is preventing this. And most languages don’t have argument labels—it’s up to you to get the order right. In Swift, the situation does improve if we remove those underscores. Then each parameter has an external label:
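With the underscores removed, the same hypothetical function gains external labels at every call site:

```swift
func orderPart(identifier: String, quantity: Int, supplierCode: String) {
    // ...
}

// The labels make the swapped values easy to spot when reading:
orderPart(identifier: "ACME", quantity: 5, supplierCode: "PN-1234")
```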
Someone with domain knowledge might question this, but the compiler still won’t stop you. But what if we stop passing primitive values and instead pass custom types? Then the function signature would look like this:
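A sketch, where `SupplierCode` is a hypothetical wrapper analogous to `PartIdentifier`:

```swift
struct PartIdentifier { let value: String }
struct SupplierCode   { let value: String }

func orderPart(_ identifier: PartIdentifier, _ quantity: Int, _ supplier: SupplierCode) {
    // ...
}

// A swapped call no longer compiles:
// orderPart(SupplierCode(value: "ACME"), 5, PartIdentifier(value: "PN-1234"))  // error: type mismatch
```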
With this, there’s no mistaking what goes in where, whether your programming language has argument labels or not. The first parameter must be an instance of `PartIdentifier`, and so on. By using types more specific than primitive values, we add semantic checking to our data flow. Now not all pipes fit together. If you want a part identifier, you need to create a `PartIdentifier`. And unless you deliberately unwrap it, you can only pass it to something expecting `PartIdentifier` input.
We’ve now satisfied our physics teacher by stating the units of measure. The pipes only fit together in certain ways.
This also leads us away from the Primitive Obsession code smell. Instead of running checks or assertions about the values where they’re used, we can move that logic inside the new type. Let the type manage its own integrity.
### Reducing Impossible States
Programs consist not of single values in isolation, but of changing combinations of values. One common source of errors is an overly broad API. What if we could narrow the API, making impossible states harder to represent?
For example, Tony Hoare described the null reference as his “billion-dollar mistake.” Languages such as C#, Kotlin, and Swift disallow null values unless you explicitly say so. If a class is filled with properties that can all be null, is it really modeling the combinations you need? Using Swift’s question mark to show that a value can be null, let’s say we have a `Part` type like this:
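A sketch of such a type, with both fields nullable:

```swift
struct Part {
    var identifier: String?
    var value: String?
}
```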
This holds a part identifier together with a part value. Or, maybe it’s missing the identifier. Or, maybe it’s missing the value. Or, maybe it’s missing both the identifier and value.
That’s a lot of combinations from just two fields! The problem space explodes quickly. Most of these combinations are likely impossible. If a “part” in the domain always has an identifier and a value, then declare those fields non-nullable.
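If the domain guarantees both fields are always present, the type can say so:

```swift
struct Part {
    var identifier: String  // no question mark: can never be null
    var value: String
}
```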
Swift and TypeScript are also among the programming languages that support sum types, which are very useful for reducing impossible states. (And there is a way to get them in Kotlin using sealed classes.)
What’s a “sum type”? A quick Internet search will lead you to good examples. In Swift, it’s an enumeration where individual cases can have associated values:
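Here’s a sketch of the duck model described below (Foundation’s `TimeInterval` is a `Double`; the case names are illustrative):

```swift
import Foundation  // for TimeInterval

enum DuckAction {
    case quack(count: Int)
    case sleep(duration: TimeInterval)
}

// An exhaustive switch handles exactly the states that can exist:
func perform(_ action: DuckAction) {
    switch action {
    case .quack(let count):
        print("Quack x\(count)")
    case .sleep(let duration):
        print("Sleeping for \(duration) seconds")
    }
}
```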
Our simple model of duck actions has two cases. One action is to quack a given number of times. The other is to sleep for a given time interval. You can’t accidentally provide a `TimeInterval` for a quack action, or an `Int` for a sleep action.
Reducing impossible states means we no longer need to code “but what if” logic to handle the cases we don’t want. And that’s less code that can go wrong.
The next time you fix a defect, ask yourself:
- Can we create “safety nets” to automatically detect the problem?
- Can we add “guardrails” to prevent this problem, and a host of problems like it?
Then you can turn to your defect and say, “Thank you! You helped us improve the design of our system.”
If you enjoyed this post, please share it with your colleagues.
We had an open discussion through our Industrial Logic TwitterSpace. Here is the recording of that conversation.