A Language for Goodness
A “virtue” is defined as a commendable quality or trait, or alternatively as a beneficial quality or power of a thing.
While many people are aware of code smells to describe what is bad about bad code, I have long thought that we needed a language to describe what is good about good code.
The virtues are:
- Working
- Unique
- Simple
- Clear
- Easy
- Developed
- Brief
The number 7 was not intentional. It was just a bit of luck that 7 Virtues riffs on (or contrasts with) other famous lists of virtues, including the Bushido Code (compassion, sincerity, loyalty, honor, courage, courtesy) and the Catholic Seven Virtues (chastity, temperance, charity, diligence, patience, kindness, humility).
We didn’t base our 7 virtues on any of those lists and I don’t know that either of us was even aware of them at the time. I was, however, aware of Ben Franklin’s 13 Virtues.
This language will allow people to say what they like about one way of expressing a code idea vs other ways of saying the same thing.
Do you like it better because it’s unique? Because it’s brief? Because it’s clear (to you)?
But what do the seven virtues mean?
We provide an opposite with each virtue term for context, a graphic illustration, and a short summary.
Working
The first and most important feature of code is that it works.
The code is assembled and integrated. If you run the code, it will get the right answer. It may not do everything that one may wish it did, but it can run and can perform some meager functions.
Without this, no other quality is remotely interesting.
This means that a program must not only have the potential of running someday in the future given the right support and care and services and environment, but that it has already been observed running and behaving as intended.
Code that works today is superior in virtue to code that may someday run.
Code that has been proven to work recently is better than code that has not. This suggests that code that performs frequently under automated tests (such as in TDD) is more likely to have the virtue that it works.
Code that was tested at some time in the past but has since been changed without testing is less likely to have this virtue.
And code that is working in production is superior to code that has never operated in real life.
This is the first virtue.
It is not the only virtue.
For it to work, work recently, and be proven to work, is merely table stakes. Once it works, then we can talk about the other virtues.
Unique
We know the issues with code duplication. We’ve seen the horrors that come from having implicitly shared algorithms and constants.
Code can work, but still be burdensome and dangerous.
Every system of code virtues has come up with one way or another of describing unique code as a virtue:
- OAOO (Once and Only Once)
- DRY (Don’t Repeat Yourself)
- SPOT (Single Point of Truth)
- No Duplication
My preference is SPOT.
The idea of Single Point of Truth is that every fact and every algorithm has a single definition within a body of code, so that every fact or algorithm has a single point of maintenance within the codebase.
The code isn’t bloated by duplication and maintenance is not hindered by finding that one module implicitly counts on another module using a specific algorithm.
One example we’ve seen is prepending punctuation to the starts of names in order to cause them to sort to the top of an alphabetical list. When the lists are modified to not be alphabetical (such as most-recent-first), the names show up as unnecessarily cryptic and can confuse users. This one is benign.
We’ve seen “magic numbers” used in a system that relied on the ID generator’s implementation never using a number greater than 100,000. By creating test IDs of 100,001 and higher they were duplicating the algorithm’s upper-limit fact. This worked fine for some time (years). When the ID generator changed, there was no clear path to also change the test’s implicit dependencies; tests began to corrupt user data (and vice-versa).
Single Point of Truth makes dependencies and assumptions explicit, which makes them maintainable.
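As a rough sketch of how the ID example could have been made explicit, the upper limit can live in one named constant that both the generator and the test code read. Everything here (the names `MAX_PRODUCTION_ID`, `IdGenerator`, and `first_test_id`, and the sequential numbering scheme) is hypothetical and only illustrates the idea; it is not the system described above.

```python
# Hypothetical names and numbering scheme, used only for illustration.
MAX_PRODUCTION_ID = 100_000  # the single definition of the upper limit


class IdGenerator:
    """Hands out sequential IDs from 1 up to MAX_PRODUCTION_ID."""

    def __init__(self) -> None:
        self._next_id = 1

    def next_id(self) -> int:
        # The limit is read from the shared constant, never restated.
        if self._next_id > MAX_PRODUCTION_ID:
            raise OverflowError("production ID space exhausted")
        issued = self._next_id
        self._next_id += 1
        return issued


def first_test_id() -> int:
    """Test IDs begin just past the production range.

    The value is derived from the same constant, so changing the limit
    in one place moves the test range along with it.
    """
    return MAX_PRODUCTION_ID + 1
```

With this shape, a change to the generator's range is a one-line change, and the tests' assumption about that range is visible rather than buried in a literal like 100,001.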
Simple
Once code works, and has no duplication, the next interesting feature is how complex the code itself is.
Simple and Complicated are words that sound rather subjective, and people tend to use them subjectively. Still, in the 7 Virtues we mean this to be an objective structural idea rather than a relationship between the reader and the material.
“Simple” means that it has few operations, few operands, and very few paths through the code.
If there are dozens of variables, it is less simple (more complicated).
If there are dozens and dozens of computation steps, it is not as simple as if there were only a handful.
!isEmpty(x) is one operator more complex than isPopulated(x) because of the not operator (!). It may also be less clear, as human beings have trouble with not-logic, but that is a matter of clarity, not simplicity.
Note that the isPopulated method may merely call !isEmpty in its implementation, and this makes no difference. Simplicity is a local phenomenon. The code you are reading can be more simple even if it invokes functions of considerable complexity.
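A small sketch of that locality, using Python spellings of the same two predicates. The cart-filtering functions are hypothetical and exist only to show the two call sites side by side.

```python
def is_empty(items) -> bool:
    """True when the collection holds nothing."""
    return len(items) == 0


def is_populated(items) -> bool:
    """Positive-sense wrapper; internally it is just a negated is_empty."""
    return not is_empty(items)


def carts_to_bill_negated(carts):
    # Call site with two operations per cart: a call and a `not`.
    return [cart for cart in carts if not is_empty(cart)]


def carts_to_bill(carts):
    # Call site with one operation per cart: a single call.
    return [cart for cart in carts if is_populated(cart)]
```

Both functions return the same carts. The negation has not disappeared from the program, but it has moved out of the code being read at the call site, which is where simplicity is being measured.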
This includes logical operators, so isFull(x) is more simple than x->lastUsed < x->lastAvailable even if both expressions produce the exact same behavior. One function call is fewer operations than two member lookups and one numeric comparison. In addition, the second expression has a number of other problems (primitive obsession, shared knowledge of a data structure, etc.) which likely make it less clear as well, but simplicity is only concerned with operations, operands, and paths.
Code that avoids conditionals such as if, unless, and ternary statements will have fewer paths, and therefore be more simple than code that includes these conditionals, even if the code that contains those statements is easier to understand.
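To illustrate counting paths, here is a hedged sketch with made-up region names and costs. The first function has three paths through it; the second has one, because the choice is held in data rather than in branches. Which of the two is easier to understand is a separate question, and belongs to clarity rather than simplicity.

```python
# Hypothetical regions and prices, chosen only for illustration.

def shipping_cost_branching(region: str) -> int:
    # Three paths through this function, one per branch.
    if region == "domestic":
        return 5
    elif region == "europe":
        return 12
    else:
        return 20


SHIPPING_COST = {"domestic": 5, "europe": 12}
DEFAULT_SHIPPING_COST = 20


def shipping_cost_lookup(region: str) -> int:
    # One path through this function; the table carries the variation.
    return SHIPPING_COST.get(region, DEFAULT_SHIPPING_COST)
```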
Clear
This is the first subjective virtue in our list.
It is certainly true that readability and clarity are more about a relationship between the code and the programmer than about the code alone.
A passage of code may be clear to one programmer (perhaps its author) and unclear to others (perhaps its maintainers). This is often because of the context the reader has available to them.
Still, subjectivity doesn’t mean that it’s random and unworkable. For instance, food taste is subjective and yet people all over the world eat more potato chips than library paste. Much of subjective taste falls in a narrow band.
For instance, people may prefer two, three, or four character indents. Almost nobody prefers zero character indentation, and similarly it is rare to find someone who prefers 120 character indentation. Usually the range is 2-8 with some outliers. A quick survey of published code shows that 3 and 4 are the most popular, but which your teammates prefer is locally subjective.
A person who does not know how to program will likely find every expression of code essentially unreadable and puzzling, but that hardly seems important to us. Their opinion of the readability of Java, Python, or Clojure styling is not relevant to us.
Our interests fall to the people whose opinions are relevant: those who share the codebase on a day-to-day basis. These are likely people who know the stack fairly well and the domain fairly well.
Do the codebase denizens find a particular expression more or less puzzling than another expression? Is one more idiomatic, and therefore more clear to experienced developers? Is one ordered in a way that seems more logical?
Coding standards are about codifying a shared subjective sense of clear coding ideas. We negotiate them into existence, and amend or change them as needed.
Life is too busy for people to have to reverse-engineer code every time they visit a module. If we can make the code clear to people who will be maintaining and expanding it, we have done a virtuous deed.
Easy
Easy is not about how easily one can read the code, but how easily one can change the code.
If one can write a new feature into the code in 15 minutes, isn’t that better than if one has to wrestle with it for days?
To a certain degree, the ease of writing is related to how clear, unique, simple, and developed the code has become. Easy is still a distinct virtue.
In some special cases code can meet all of the above virtues, but be intricate and opinionated in inconvenient ways, or it can be organized in a peculiar way that makes it hard to induce a change.
It is much better if the code is organized and expressed so that people can quickly make sense of it, quickly reshape it to meet their needs, and quickly add new functionality or remove old functionality.
This is somewhat subjective, so again we rely upon the taste and judgment of those who share our codebase.
Developed
This is a special case of “Easy” and “Simple.”
Primitive obsession is a code smell where the code lacks useful and clear abstractions. All operations are primitive operations (add, multiply, compare, concatenate) against primitive data types and structures.
The basis of good organization is that things that belong together are kept together. Date operations should belong in a date class or a date utility library. Once they are gathered, usually some abstraction is formed.
Say I have dozens of methods that each take in a parameter list consisting of integers representing a year, a month, a day of the month, an hour of the day, the minutes of the hour, the seconds of the minute, and a timezone indicator. This group of 7 primitive integers will usually give way to a structure of some sort.
At that point, the functions that operate on 7 integers become functions that operate on date structures. Perhaps the methods become member functions of the date structure (creating a class).
Users of the date class have much simpler code (and a single point of truth) by calling date.nextDay() than by duplicating the logic of calculating the date one day from now.
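A minimal sketch of that progression, assuming a hypothetical `Date` class whose seven fields mirror the seven integers above. The field names, the `utc_offset_hours` representation of the timezone, and the choice to lean on the standard library's `datetime` for the calendar arithmetic are all assumptions made for the example; `next_day` is simply a Python spelling of `nextDay`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Date:
    """Seven loose integers gathered into one abstraction (hypothetical)."""
    year: int
    month: int
    day: int
    hour: int = 0
    minute: int = 0
    second: int = 0
    utc_offset_hours: int = 0

    def _to_datetime(self) -> datetime:
        # Delegate calendar rules (month lengths, leap years) to the stdlib.
        tz = timezone(timedelta(hours=self.utc_offset_hours))
        return datetime(self.year, self.month, self.day,
                        self.hour, self.minute, self.second, tzinfo=tz)

    def next_day(self) -> "Date":
        # The one place where "one day later" is defined.
        later = self._to_datetime() + timedelta(days=1)
        return Date(later.year, later.month, later.day,
                    later.hour, later.minute, later.second,
                    self.utc_offset_hours)
```

A caller writes `Date(2023, 12, 31).next_day()` and never touches month lengths or year rollover, so that knowledge has exactly one home.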
In this way, most software projects become their own kind of Domain-Specific Language for solving problems in that domain. This makes code simpler and easier, but also more clear in the system.
Similarly, in functional systems, composed higher-order functions can take some of the pain and suffering out of manipulations.
Brief
This one sometimes surprises people, but concise code is generally better than chatty code for expressing the same concept.
For example, consider how much better it is for me to say
instead of saying:
These code fragments all get the same result, but several of them have extra complication from additional variables and operations. That makes them less simple than is necessary, which also makes them less clear.
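As a hypothetical illustration of this kind of contrast (the function names and the age threshold are invented for the example), both functions below return the same answer, but the chatty one carries an extra variable, an assignment per branch, and two additional paths.

```python
def is_adult_chatty(age: int) -> bool:
    # Extra variable, extra branches, same answer.
    result = False
    if age >= 18:
        result = True
    else:
        result = False
    return result


def is_adult(age: int) -> bool:
    # The comparison already is the answer.
    return age >= 18
```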
This seems to suggest that when we have applied the other six virtues, this one will usually come along for free.
I think that’s the case, but still we may have some incidents when a language feature such as reduce can provide a better answer than using a for loop or a similar construct.
Consuming more space than necessary can reduce the signal:noise ratio in the code.
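A short sketch of that trade, using Python's `functools.reduce`. The function names are hypothetical; the two versions compute the same product, and the second states it in a single expression.

```python
from functools import reduce
from operator import mul


def product_loop(numbers):
    # Explicit accumulator and loop.
    total = 1
    for n in numbers:
        total = total * n
    return total


def product_reduce(numbers):
    # Same fold, expressed with reduce; 1 is the starting value.
    return reduce(mul, numbers, 1)
```

Whether the shorter form is also the clearer one depends on how familiar the team is with `reduce`, which is why brevity sits after clarity in the ordering.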
All Together Now…
The idea of code virtues is not to replace or augment any laws of software design, but to help recognize and describe the “goodness” we see in “good” code.
These virtues are placed in order to help us consider making tradeoffs. We should generally never consider making code brief at the cost of being clear, nor making it clear at the cost of actually working.