Refactoring – Prepare, Improve, and Clean

Posted January 8, 2016 by Bryan Beecham

In a recent blog post, Joshua Kerievsky wrote about Modern Agile. Test & Refactor sits at the intersection of two of its disciplines: Deliver Value Constantly and Make Safety a Prerequisite.

Refactoring is at the heart of the way we improve code. We make changes to the code for the purpose of making it easier to understand and easier to modify while maintaining the same behaviour.

Hopefully everyone reading this is familiar with the Test-Driven Development (TDD) cycle of Red, Green, Refactor, and Integrate. This article takes a closer look at what happens when we get to the Refactor step.

Inside this step, we look at our design and decide whether there is a better way for the code to be written. It is critical to note that we are not changing the behaviour of our code at this step. Martin Fowler's definition makes this explicit:

Refactoring is the process of changing a software system in such a way that it does not alter the external behavior of the code yet improves its internal structure. It is a disciplined way to clean up code that minimizes the chances of introducing bugs. In essence when you refactor you are improving the design of code after it has been written. ~ Martin Fowler 1999

While spending time refactoring code, I have noticed that there are three sub-steps, or phases, that we go through:
1. Prepare
2. Improve
3. Clean

Each step is optional. We sometimes don't think of them as separate, since the code may already be prepared for the improvement, and sometimes no cleaning is required afterwards. There is value, however, in looking at refactoring as three separate phases.

Prepare has also been referred to as nesting, making room, prefactoring or rough-in. We make small changes to the code so the improvement can take place.
Improve is when we make the improvement to the code without changing its external behaviour.
Clean is when we remove code we have made unnecessary and put the code back in a state to start something new. At this point we are done refactoring and can run the tests and, assuming they pass, integrate this refactoring.

Let's take a look at a simple example:

In the coding kata Roman Numerals, we have to convert integers to their corresponding Roman numerals. For example, given 1 the program should return I, given 25 the program should return XXV, and so forth. This kata is great for people new to katas as it shows them the importance of choosing the next test to write. If you keep things simple when you start, you might end up with Python code that looks like this:

def convert(arabic):
    numeral = ""
    for i in range(0, arabic):
        numeral += "I"
    return numeral

This would handle one, two and three, and you would be thinking about your next test. The hasty will begin tackling four as it is the next number in the sequence. It is, however, complicated: four is IV in Roman numerals, which is built from one and five. A simpler choice at this point is to move to 10 next. For brevity, let's say our pair of developers decide on the strategy of following 10 with 20 and 30. Their code may end up looking like this:

def convert(arabic):
    numeral = ""
    current_number = arabic

    for i in range(0, current_number):
        if current_number >= 10:
            numeral += "X"
            current_number -= 10

    for i in range(0, current_number):
        numeral += "I"

    return numeral
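
The tests driving this version are not shown in the post. As a rough sketch, they might look something like the following, where the test function names are hypothetical and plain asserts are used so the file runs under pytest or on its own:

```python
def convert(arabic):
    # Version from the post that handles ones and tens only.
    numeral = ""
    current_number = arabic

    for i in range(0, current_number):
        if current_number >= 10:
            numeral += "X"
            current_number -= 10

    for i in range(0, current_number):
        numeral += "I"

    return numeral


# Hypothetical test names, one per step the pair has taken so far.
def test_ones():
    assert convert(1) == "I"
    assert convert(2) == "II"
    assert convert(3) == "III"


def test_tens():
    assert convert(10) == "X"
    assert convert(20) == "XX"
    assert convert(30) == "XXX"
```

These are the tests we rerun after every small change in the steps that follow.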

After confirming their tests are passing, they would look for refactoring opportunities. The first for loop looks similar to the second but is not the same. If these two blocks were the same we could combine the code.
Step 1 - Prepare - Prepare for the refactoring by making the two blocks the same.

def convert(arabic):
    numeral = ""

    for i in range(0, arabic):
        if arabic >= 10:
            numeral += "X"
            arabic -= 10

    for i in range(0, arabic):
        if arabic >= 1:
            numeral += "I"
            arabic -= 1

    return numeral

The structures are now the same, but the content still differs. Our next change makes the blocks identical by pulling the differing values into variables.

def convert(arabic):
    numeral = ""

    arabic_value = 10
    roman_value = "X"
    for i in range(0, arabic):
        if arabic >= arabic_value:
            numeral += roman_value
            arabic -= arabic_value

    arabic_value = 1
    roman_value = "I"
    for i in range(0, arabic):
        if arabic >= arabic_value:
            numeral += roman_value
            arabic -= arabic_value

    return numeral

The two blocks are now identical. Let's continue to prepare by moving the numbers into lists.
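
That intermediate step is not shown on its own. A minimal sketch of how it might look, assuming the lists are introduced first and each block simply indexes into them:

```python
def convert(arabic):
    numeral = ""

    arabic_digits = [10, 1]
    roman_digits = ["X", "I"]

    # Tens block: same body as before, now reading its values from the lists.
    for i in range(0, arabic):
        if arabic >= arabic_digits[0]:
            numeral += roman_digits[0]
            arabic -= arabic_digits[0]

    # Ones block: identical apart from the index.
    for i in range(0, arabic):
        if arabic >= arabic_digits[1]:
            numeral += roman_digits[1]
            arabic -= arabic_digits[1]

    return numeral
```

With the only remaining difference being the index, the duplication is ready to be removed.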
Step 2 - Improve - Improve the existing code.
In this situation we remove duplication in the code by combining the two blocks. This makes the code easier to read and doesn't change its behaviour. Remember to run your tests at each step to make sure you haven't changed the expected behaviour. Here is how our code looks once combined:

def convert(arabic):
    numeral = ""

    arabic_digits = [10, 1]
    roman_digits = ["X", "I"]
    for i in range(0, len(arabic_digits)):
        while arabic >= arabic_digits[i]:
            numeral += roman_digits[i]
            arabic -= arabic_digits[i]

    # arabic_value = 1
    # roman_value = "I"
    # for i in range(0, arabic):
    #     if arabic >= arabic_value:
    #         numeral += roman_value
    #         arabic -= arabic_value

    return numeral

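Note the change of the if to a while in combining the logic blocks. The outer for loop now steps through the list of digits rather than counting up to arabic, so the inner while takes over the job of repeating each numeral. Robert C. Martin writes more about this in the Transformation Priority Premise.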
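
To see why the while is needed, here is a small sketch comparing the two-block version with the combined one, plus a combined variant that keeps the if. The function names are made up for this illustration and do not appear in the kata:

```python
def convert_two_blocks(arabic):
    # Prepared version: each block's for loop supplies the repetition.
    numeral = ""

    arabic_value = 10
    roman_value = "X"
    for i in range(0, arabic):
        if arabic >= arabic_value:
            numeral += roman_value
            arabic -= arabic_value

    arabic_value = 1
    roman_value = "I"
    for i in range(0, arabic):
        if arabic >= arabic_value:
            numeral += roman_value
            arabic -= arabic_value

    return numeral


def convert_combined(arabic):
    # Combined version: the while repeats each numeral as often as needed.
    numeral = ""
    arabic_digits = [10, 1]
    roman_digits = ["X", "I"]
    for i in range(0, len(arabic_digits)):
        while arabic >= arabic_digits[i]:
            numeral += roman_digits[i]
            arabic -= arabic_digits[i]
    return numeral


def convert_combined_keeping_if(arabic):
    # Same as above but with the if kept: each digit is used at most once.
    numeral = ""
    arabic_digits = [10, 1]
    roman_digits = ["X", "I"]
    for i in range(0, len(arabic_digits)):
        if arabic >= arabic_digits[i]:
            numeral += roman_digits[i]
            arabic -= arabic_digits[i]
    return numeral


# The refactoring preserved behaviour across everything these versions handle.
for n in range(0, 100):
    assert convert_two_blocks(n) == convert_combined(n)

print(convert_combined(3))             # III
print(convert_combined_keeping_if(3))  # I
```

The existing test for three would catch the if variant straight away, which is the safety net that running the tests at each step provides.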

And finally:
Step 3 - Clean - Remove anything unnecessary and format the file
In this case, the old code that has been commented out can be deleted.

def convert(arabic):
    numeral = ""

    arabic_digits = [10, 1]
    roman_digits = ["X", "I"]
    for i in range(0, len(arabic_digits)):
        while arabic >= arabic_digits[i]:
            numeral += roman_digits[i]
            arabic -= arabic_digits[i]

    return numeral

This refactored code is now fairly easy to read and easy to update. To support the rest of the Roman numerals, you just need to update the two lists:

    arabic_digits = [1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1]
    roman_digits = ['M', 'CM', 'D', 'CD', 'C', 'XC', 'L', 'XL', 'X', 'IX', 'V', 'IV', 'I']
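
Dropping those lists into the cleaned-up function gives the following; the sample calls at the end are only there for illustration:

```python
def convert(arabic):
    numeral = ""

    arabic_digits = [1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1]
    roman_digits = ['M', 'CM', 'D', 'CD', 'C', 'XC', 'L', 'XL', 'X', 'IX', 'V', 'IV', 'I']
    for i in range(0, len(arabic_digits)):
        # Append each numeral for as long as its value still fits.
        while arabic >= arabic_digits[i]:
            numeral += roman_digits[i]
            arabic -= arabic_digits[i]

    return numeral


print(convert(4))     # IV
print(convert(25))    # XXV
print(convert(1994))  # MCMXCIV
```

The loop itself did not change at all. Cases like four and nine, which looked awkward earlier, are handled simply by having their own entries in the lists.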

Do you see examples of these three steps in your refactoring?
1. Prepare
2. Improve
3. Clean
Please let me know by leaving a comment below.

I would like to thank the following people for their input into this post: Chris Freeman, Tim Ottinger, Curtis Cooley, Mike 'GeePaw' Hill, Bill Wake, Gerard Meszaros, Joshua Kerievsky and Alexandre Freire.