License to Kludge

Posted March 12, 2011 by Joshua Kerievsky

A kludge is a workaround, a quick-and-dirty solution, a clumsy or inelegant, yet effective, solution to a problem, typically using parts that are cobbled together. — Wikipedia

My colleagues and I have a license to kludge.

We earned that license through years of crafting high-quality code and carefully managing technical debt.

When do we use our license?

We program a kludge on the rare occasion when it offers the fastest temporary solution to a critical problem.

Let's consider a real-world example about a defect in a report and how we kludged a fix.

Students of our Greatest Hits have albums, many of which come in language editions like C#, Java, Python, etc.

As a student studies an album, they go through the album's tracks, which are composed of pages.

We record the number of pages that a student has studied in an album, and a report displays that progress by album for each student.

Even though an album may come in several language editions, we specify the contents in one place, like so:

<album title="CodeSmells">
  <track title="Introducing Code Smells" id="introducingCodeSmells" authors="Joshua Kerievsky">
    <page title="Welcome To Code Smells" id="welcome"/>
    <page title="Connoisseurs of Code" id="connoisseursOfCode"/>
    <page title="How Sensitive Is Your Nose?" id="sensitivity"/>
    <page title="Introduction To Code Smells" id="introduction" language="C++, C#, Java, C"/>
    <page title="What Are Code Smells?" id="whatAreCodeSmells"/>
    <page title="A Short History Of Code Smells" id="shortHistoryOfCodeSmells"/>
  </track>
</album>

The above XML shows a piece of our Code Smells album, inside of which is a track named "Introducing Code Smells," inside of which are six pages.

Notice how the fourth page (line 6, "Introduction to Code Smells") has a language attribute specifying that the page is available in C++, C#, Java and C.

If you own Code Smells in Python, you won't see that fourth page.

Thus, albums have different page counts, depending on what language edition you have.

We use the language attribute sparingly since we tend to provide language-specific content for nearly every page of every album.
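
To make the effect of the language attribute concrete, here is a minimal sketch of counting an album's pages for one language edition. It is not our production code: the class name AlbumPageCounter and its methods are hypothetical, and it assumes a page without a language attribute belongs to every edition, while a page with one belongs only to the comma-separated languages it lists.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper (an assumption for illustration, not the real code):
// counts the pages of an album that a given language edition can see.
class AlbumPageCounter {

    // A page with no language attribute is visible in every edition.
    // Otherwise the attribute is read as a comma-separated list and matched
    // exactly, so "C" does not accidentally match "C++" or "C#".
    static boolean isVisibleIn(Element page, String language) {
        String languages = page.getAttribute("language"); // "" when absent
        if (languages.isEmpty())
            return true;
        return Arrays.stream(languages.split(","))
                .map(String::trim)
                .anyMatch(language::equals);
    }

    static int countPages(String albumXml, String language) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(
                            albumXml.getBytes(StandardCharsets.UTF_8)));
            NodeList pages = doc.getElementsByTagName("page");
            int count = 0;
            for (int i = 0; i < pages.getLength(); i++) {
                if (isVisibleIn((Element) pages.item(i), language))
                    count++;
            }
            return count;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse album XML", e);
        }
    }

    public static void main(String[] args) {
        String xml =
              "<album title=\"CodeSmells\">"
            + "<track title=\"Introducing Code Smells\" id=\"introducingCodeSmells\">"
            + "<page title=\"Welcome To Code Smells\" id=\"welcome\"/>"
            + "<page title=\"Introduction To Code Smells\" id=\"introduction\""
            + " language=\"C++, C#, Java, C\"/>"
            + "<page title=\"What Are Code Smells?\" id=\"whatAreCodeSmells\"/>"
            + "</track></album>";
        System.out.println(countPages(xml, "Java"));   // prints 3
        System.out.println(countPages(xml, "Python")); // prints 2
    }
}
```

Any report that shows a page total has to ask for it per language edition, in this way or a similar one.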

On January 13, 2011, one of our newest clients wanted to see a student progress report.

Our operations manager created the report and noticed an inaccuracy in the data before sending it.

Sure enough, we found that the report was oblivious to the language attribute!

How did that happen? (After all, we are a shop that test-drives every line of code.)

Rather than explore the reason, we made it our first priority to get our new client an accurate report as soon as possible.

We quickly discussed options for fixing the report.

Every idea we had was going to take too much time.

We then explored which albums this client used and which albums had page count inaccuracies.

It turned out that only our Microtesting album was causing problems in the report for this client.

Microtesting in Java currently has 123 pages while all other language editions currently have 115 pages.

Given that knowledge, we decided to quickly fix the report by programming a kludge (see lines 6-10):

public class StudentActivityByAlbum...
    private int getTotalAlbumPages() {
        if (compilation == null)
            return 0;

        if (compilation.getId().equals("Microtesting")) {
            if (devLanguage.equals(Language.JAVA))
                return 123;
            return 115;
        }

        return compilation.getNumberOfNonTitlePages();
    }

Awful, right?

"Great programmers would have rapidly test-driven a quality solution and never needed that nasty kludge in the first place!"

By the time we had formulated, programmed and released the kludge, we would still have been test-driving and mercilessly refactoring a quality solution.

I'd be willing to bet that no programmer out there could have moved faster than we did with our kludge.

The kludge allowed us to quickly recover from our error and rapidly deliver an accurate report to our client.

"Couldn't your client have waited for you to fix the problem with a quality solution?"

We don't like to make our clients wait.

We would rather make them happy as quickly as possible and make code improvements afterwards.

Here's another way to say it:

Our decision to kludge was driven by our commitment to excellent customer service.

"In our shop, once a kludge goes into production, it never gets removed!"

We didn't remove it immediately.

Here are some reasons why:

  • We have a small programming staff
  • We were already in the middle of working on other important items
  • No other inaccuracies were discovered in the student progress report
  • We were making very few changes to the core set of albums featured in that report

A little less than 2 months after deploying the kludge, we got rid of it by test-driving a simpler design.

Removing the kludge allowed us to change where we compute the count of album pages by language.

Our StudentActivityByAlbum class now just reports the correct page count (line 5):

public class StudentActivityByAlbum...
    private int getTotalAlbumPages() {
        if (compilation == null)
            return 0;
        return albumPageCount;
    }

So what ultimately motivated us to remove the kludge?

Did we find more inaccurate data on student progress reports for other customers?

Nope.

Did we pause from working on new features to just spend time improving our code?

We certainly do that at times, but not on this occasion.

Did we remove the kludge because we happened to be programming in that area of the code?

Sort of.

We've recently been focusing on visualizing usage metrics, both for our purposes and for users.

The inaccurate data in the report was directly related to that effort so it was time to kill the kludge.

Origin of the Defect

So why did the report have inaccurate page data in the first place?

It turns out that the report predated our language attribute by a few years.

So even though we had test-driven both the report and the language attribute, we had neglected to test the two of them together.

A simple oversight.
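
The kind of combined test we had missed might look something like the sketch below. Everything in it is a hypothetical stand-in rather than our real album model or report: the names ReportLanguageCheck, Page, totalPages and progress are assumptions made for illustration. The point is only that one check exercises the report and the language attribute at the same time.

```java
import java.util.List;
import java.util.Set;

// Hypothetical stand-ins for the album model and the report; none of these
// names come from the real code base.
class ReportLanguageCheck {

    // An empty language set means the page appears in every edition.
    static final class Page {
        final Set<String> languages;

        Page(String... languages) {
            this.languages = Set.of(languages);
        }

        boolean isVisibleIn(String language) {
            return languages.isEmpty() || languages.contains(language);
        }
    }

    // Pages available to a student who owns the given language edition.
    static int totalPages(List<Page> album, String language) {
        int count = 0;
        for (Page page : album) {
            if (page.isVisibleIn(language))
                count++;
        }
        return count;
    }

    // The report's view of progress: pages studied out of the pages that
    // exist in the student's own edition.
    static String progress(List<Page> album, String language, int pagesStudied) {
        return pagesStudied + "/" + totalPages(album, language);
    }

    static void check(boolean condition) {
        if (!condition)
            throw new AssertionError("report ignored the language attribute");
    }

    public static void main(String[] args) {
        // One page exists only in the Java edition.
        List<Page> album = List.of(new Page(), new Page("Java"), new Page());

        // The combined check: the same album must report different totals
        // for different editions.
        check(progress(album, "Java", 2).equals("2/3"));
        check(progress(album, "Python", 2).equals("2/2"));
        System.out.println("report honors the language attribute");
    }
}
```

A report that counted every page regardless of edition would show 2/3 for the Python student and fail the second check.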

Although our defect counts are so low that we don't even maintain a defect-tracking system, this oversight will hopefully teach us to be even more careful in our practice of test-driven development.

Conclusions

I'd rather not ever kludge.

Yet if a kludge provides the fastest resolution to a critical problem, I will gladly use it to make a customer happy and then take time to kill the kludge.

Using a kludge and then deliberately killing it reflects our company's core values of excellent customer service and quality code.

Without those values, we would not deserve our license to kludge.