Reasons for Extracting Methods
When we are refactoring, and especially when we are teaching refactoring, our partners are surprised to see us extracting methods which are only (currently) called once.
There is often an assumption that the only reason for extracting a method is so that you can call it from multiple places in the code base.</p>
While this is a valid reason for extracting methods, reuse is only one of the many very good reasons for extracting a method.
The other reason usually touted is "readability" which is a rather vague notion. Extracting a well-named method certainly helps make the code calling it more readable, but there are ways of extracting methods that do not help with readability.
Let's explore a few more reasons for extracting:
- Simpler testing
- Name its operation
- Shorten its host function
- Improve code completion
Often, in writing tests, we discover defects or missed opportunities for improving our code.
TDD relies on a cycle of declaring intent in a test, writing code that works, scanning that code for more insights into how the code should be structured and improved, making improvements, making changes part of the code base.
We refer to the cycle as Red, Green, Refactor, Integrate.
When we follow this cycle, functions can't help but be testable. They tend to also become well-named and readable due to the repeated scanning and opportunity for revision.
Of course not everyone does TDD.
Some people write the code first, and later write tests for it.
Writing tests for existing code harder and feels less rewarding than doing TDD.
People who do test-last have diminishing goals; they start to skip tests that would require them to change the production code (the very kinds of tests TDD encourages) and those that are hard to verify.
Mostly, they get tired of the tedious work of setting up the loops and conditions that allow them to reach some deeply-nested bit of a long function.
Eventually, they "run out of time" and leave big functions partially untested. Who can blame them?
By extracting the block of code that is nested in loops and conditionals, we can test it directly, and cover its paths rather easily.
This reduces the number of tests we need because we are dealing with the small number of paths inside the extracted function instead of the cartesian product of all the paths in all the blocks of code woven together in the original longer function.
Explanatory tests on extracted functions greatly improve the understandability of the code, which in turn reduces the chance that we will make any program-breaking changes (defects) when we add functionality in the future.
Most of us have had to contend with some odd chain of boolean conditions like
if ((x and not y) and not z) or ((x and z) and (q>7))
We find that extracting those complex conditionals into a function with a reasonably descriptive name greatly reduces the difficulties of skimming and understanding the code.
Of course, this makes it easy to test ~securityNeeded~, so that the tests document all the chained conditions and the reasoning behind them (see 'testing', above).
In other cases, we find long stretches of code broken into paragraphs with blank lines.
Often there are "paragraph comments" above the section.
We've seen functions with dozens of paragraphs of code.
We usually extract the paragraph into a function, and to base the name of the function on the paragraph comment. The comment becomes redundant and is deleted.
This shortens and simplifies the original function, and makes the paragraph more testable and understandable.
Since a large component of our work is reading and understanding the existing code, this can save every team member minutes or hours every time the code must be modified in the future.
This "accumulation of small changes" accelerates development -- a difference that appeals to developers, testers, and managers alike!
Shorten the Host Function
The reason a long, deeply-nested function is hard to read is that it is long and deeply-nested.
The details of filtering/converting/copying arrays, the details of conditional logic, the exception handling, the little bits of trivia that collect in a long method can obscure its true purpose and (more frightfully) any defects and failure modes it may have.
With many concerns interwoven, functions become treacherous; any misstep can cause any number of surprising and unwanted side-effects.
The details ignore the intention of the function. It reduces the signal:noise ratio of the code.
In order to clearly see and understand the structure of such a method, we need to move details of sub-operations out of our way.
Sometimes methods contain a number of variables that exist only to support the block of code we intend to move. When we extract the method and its variables, the original code becomes smaller and more obvious.
Extracting a method gets it out of our way so that we understand the function better and manage it with a lower chance of accidentally injecting defects.
Our improved understanding of the function may lead to the "ah-ha!" moment that helps us to simplify, rename, move, test, and improve the method -- thereby making the whole system easier to modify safely.
This is the most obvious reason for creating separate functions, and probably the first reason most of us learn from our non-OO days.
It is my experience from years of refactoring that any function not extracted, named, and moved to its most reasonable/obvious class will never be reused.
Most developers type the name of an object, press the dot key (period) and then select a method from the code-completion list provided by their IDE.
If the operation they desire does not appear in the list, then they write it inline in the function they're composing. It becomes a bit of "feature envy" and "primitive obsession" but it is a small thing.
After all, anyone can write a loop and some subscript operation and maybe copy to a new object. It's really not very fancy or intellectually-challenging.
However, the next time someone uses an object of the same class, the method is still not presented in the list. It is written into other people's functions.
Consequentially, the author writes the method from scratch again, which adds more steps and distractions and operations to the current function and helps none of his or her colleagues who work in the same code base.
If we only extract for reuse, and only when it becomes obvious that an operation is needed in multiple places, then we create opportunities for duplication of time and effort across the team.
So perhaps if we look at an object and don't find the method we want, we should consider adding the method we want to that object.
After all, nobody will reuse it if we don't.
Improve Code Completion
Code documentation and code completion are closely related to code reuse.
Programmers (almost) never read the documentation. The time spent generating JavaDoc, Doxygen, and similar html-style online documents is usually wasted.
Instead, most programmers rely upon their IDE.
Whatever documentation the IDE provides on-the-fly is the best documentation for programmers, because it is likely the only documentation they read in a normal day.
Programmers don't read the source code files from top-to-bottom. While some people write code in long functions because it's easier to read and understand in detail that way (or because it's easier to debug) we find that programmers seldom consume code that way.
When a programmer reads a whole function (or a whole source file) from beginning to end, it is usually because they've exhausted all other options and have no choice.
Programmers skim. They scan the code to find relevant portions. Then they use tools in their IDEs to find where functions are called and where variables are used.
Eventually, programmers find a place to insert some code. They type the name of an object and the press the period ("dot" or ".") key.
The ide responds with a list of methods that exist and are callable for that object. The programmer picks the one that makes sense.
This sequence repeats many times per minute. In a typical day, it is done hundreds of times.
The operation becomes available for code completion once we extract it, name it, and make it public.
Too often, an operation repeats inside several functions, and programmers copy and paste it from one to another.
When the function is extracted it becomes available to programmers via the code completion mechanism that is activated by typing the dot.
Even when functions have light usage, they appear in the list. This gives programmers an understanding of what the class is for, how it is to be used, and what it means in its context.
This may be the most significant reason for extracting paragraphs of code and moving them between classes.
These are not the only reasons why we might use the "extract method" refactoring, and we welcome your additions and corrections below in the comments.
We hope that you consider these reasons as you write and refactor code in your own projects.