“Oh, no! We estimated 23 story points for the sprint, but we only turned in 20. We’ve failed the sprint!”
Many teams, especially Scrum and SAFe teams, seem to spend a lot of time on story point estimates. This is understandable, and also disappointing.
You see, you can’t estimate your way to predictability.
Sometimes, what we expect and what actually happens are not the same. We may experience that gap as disappointment, shame, or frustration.
This is especially true when we have goals and plans to meet those goals. The plans are built specifically to be followed, so anything that does not go according to plan may put our goal at risk.
A plan is a formulation of how we expect to meet some delivery goal. It includes the things we intend to do, how long we will spend on each item, and how we intend to address foreseen obstacles.
This seems like a reasonable definition for a project.
Welcome Changing Requirements
In the more static world of waterfall projects, one began with the end in mind. One determined what “the system” should be like when finished. Given this well-imagined, carefully-studied end state, groups of people would do careful functional decomposition and assign time estimates to the various pieces. The pieces were imagined like a puzzle; they would fit together to complete the whole. If the pieces were properly described, they could be built by different people all working in parallel. To keep the exquisite plans in place, projects would have “change management panels.” After all, if any piece took longer to complete than expected, the delay or error might propagate and make the whole project late. When all the parts were complete and fitted together, the final picture was expected to emerge, beautiful and whole.
While this method was made to work (or to appear to work, a topic for a different article someday), it is not suited to product development; it was a process that evolved specifically for contract programming projects.
I have a friend who holds a directorship at a company that builds software as a way of delivering its services. A couple of years ago, they refused to design to an endpoint (even while agreeing the endpoint might be excellent) and instead delivered incrementally. They learned from customers using early versions of the feature and changed their vision for it to suit the customers’ actual needs. Had they built to the endpoint, the feature might have failed to provide the desired effect. As it is, it has opened up new revenue opportunities.
In a product environment, the product concept is under constant revision. What we should deliver may vary considerably from what we originally planned to deliver. This alone disqualifies the waterfall process.
That said, let’s examine estimates and plans in a product-based world.
One of the key issues is that functional decomposition considers only a fraction of the actual cost of building a feature into an existing product.
Some teams estimate effort/duration in story points, though better ways exist. Story points are an XP invention that got transplanted into Scrum and other methods. The idea was to abstract away from hours and days because people can make relative assessments (comparing one job to another) more accurately than absolute ones.
This theory seems sound enough, and it has worked for teams in the past when there was no pressure to report higher or lower numbers.
This is why it is ill-advised to “normalize story points” to days or hours. The whole point of story points was to abstract away from actual clock-and-calendar time.
If one is to estimate in actual time, then the formality of story points does not provide any benefit.
The effort is only part of an estimate. We may anticipate that we only need to change 15 lines of code in two or three files – a trivial amount of typing. Yet such a change could take days because programming is not typing.
An astute manager once took me aside and showed me a scatter graph of estimates (in story points) to actuals (in days).
Where the estimate was small, the actuals were also small. That seems reasonable.
Where the estimate was large, one would expect the actual to be large in a linear kind of way, but that’s not what we saw.
Instead, sometimes the 8-point stories were done faster than 2-point stories. Sometimes a 5-point story “blew up” and took twice as long as some 13-point stories.
The magnitude of the story point estimate related less to the magnitude of the actual effort than to its variation!
The manager told me that she’d been tracking it for some time and realized that effort was not the only component of an estimate.
The gut-feel of the estimator included (at least) two more elements she identified as:
- Uncertainty: we’re not sure how this is done or how to do it in our environment. We will have to learn new things.
How many? We don’t know.
How hard will they be to learn? We don’t know.
How many mistakes might we make while learning? We don’t know.
- Risk: if we do the thing, we aren’t sure that doing it won’t cause other parts of a complicated system to break. Maybe we add a field to a screen, but then we have to change the database schema, and that change has to be reflected in a number of reports.
Which ones? We don’t know.
What is the damage if we miss one? We don’t know.
How hard will it be to fix? We don’t know.
These factors are significant. An 8-point story isn’t necessarily 8 times as much work as a one-point story. It may be less work but with 8 times more uncertainty and/or risk.
If uncertainty and risk don’t materialize, it might be a very quick job. If the unknown rears its ugly head, the team could be at it for quite some time.
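To see how a point value can track variation rather than size, here is a toy Monte Carlo sketch in Python. Everything in it is an assumption for illustration: the `STORY_PROFILES` table, its base efforts, the chance that the unknown shows up, and the size of the resulting blow-up are invented numbers, not the manager's data. Bigger point values are modeled as more uncertainty and risk, not more base effort.

```python
import random
import statistics

# Hypothetical parameters, invented for illustration (not measured data).
# Larger point values encode more uncertainty/risk, not more base effort.
# points: (base_effort_days, chance_unknown_appears, extra_days_if_it_does)
STORY_PROFILES = {
    2: (1.5, 0.05, 2.0),
    5: (1.0, 0.30, 6.0),
    8: (1.0, 0.50, 8.0),
    13: (2.0, 0.60, 10.0),
}


def simulate_actual(points: int, rng: random.Random) -> float:
    """One simulated actual duration, in days, for a story of this size."""
    base, p_unknown, extra = STORY_PROFILES[points]
    actual = base
    if rng.random() < p_unknown:
        # Uncertainty or risk materializes: add a variable chunk of extra work.
        actual += rng.uniform(0.5, 1.5) * extra
    return actual


def summarize(points: int, trials: int = 10_000, seed: int = 42) -> dict:
    """Min, median, and max simulated actuals for one story size."""
    rng = random.Random(seed)
    samples = [simulate_actual(points, rng) for _ in range(trials)]
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }


if __name__ == "__main__":
    for pts in sorted(STORY_PROFILES):
        s = summarize(pts)
        print(f"{pts:>2} pts: min {s['min']:.1f}  median {s['median']:.1f}  max {s['max']:.1f} days")
```

Under these made-up numbers, an 8-point story can finish in a day while a 2-point story never takes less than a day and a half, and a 5-point story that blows up can run well past twice the fastest 13-point stories. In this model the point value says more about the spread than about the typical duration.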
Estimators were taking these into account subconsciously but with reasonable accuracy.
A less-enlightened manager sometimes hears an estimate larger than hoped. To come up with a “better” (more palatable) estimate, they have the programmer break down the work, estimate the pieces, and add the mini-estimates back together. Sure enough, the sum of the estimates is smaller than the programmer’s initial estimate.
This technique removes the risk and uncertainty from the equation, leaving a much smaller estimate of effort alone.
Any joy at finding a smaller estimate usually evaporates when the actual task is undertaken and the actual time expands to an approximation of the original estimate.
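The arithmetic behind that shrinking estimate can be sketched in a few lines of Python. The pieces, their day counts, and the single risk (a schema change breaking reports nobody listed, borrowed from the risk example earlier) are hypothetical, and treating risk as probability times cost is a simplification we are assuming, not something the manager measured.

```python
# Hypothetical numbers, invented for illustration only.
# Effort-only estimates (days) for the pieces of an "add a field" story.
PIECE_EFFORTS_DAYS = {
    "add field to screen": 0.5,
    "change database schema": 1.0,
    "update the reports we know about": 1.0,
}

# The risk a gut-feel estimate quietly covers: the schema change breaks
# reports nobody listed. Assumed chance it happens and assumed cost if it does.
RISK_PROBABILITY = 0.5
RISK_COST_DAYS = 4.0


def decomposed_estimate(pieces: dict[str, float]) -> float:
    """What 'break it down and add it up' produces: effort alone."""
    return sum(pieces.values())


def expected_actual(pieces: dict[str, float], p_risk: float, cost_days: float) -> float:
    """Effort plus the expected cost of the risk that decomposition dropped."""
    return decomposed_estimate(pieces) + p_risk * cost_days


if __name__ == "__main__":
    print(f"sum of pieces:   {decomposed_estimate(PIECE_EFFORTS_DAYS):.1f} days")
    print(f"expected actual: {expected_actual(PIECE_EFFORTS_DAYS, RISK_PROBABILITY, RISK_COST_DAYS):.1f} days")
```

With these assumed numbers the decomposition yields 2.5 days while the expected duration is 4.5 days. The risk allowance the gut estimate carried does not disappear just because nobody wrote it down.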
Reality Intervenes Again
Sometimes even when we accept the “gut” answer that seems to include some allowance for risk and uncertainty, the estimate is still wrong.
This is because real work has two more elements that are also absent from estimates:
- Delays: the work may not start when we expect it to start
- Interruptions: the worker(s) may have production crises, a colleague who is stuck and needs help, overrun from under-estimated tasks, sick days, or might be pulled aside to do estimation on future stories.
Delays and interruptions cannot be predicted. A wise project planner will work a lot of “contingency time” into their schedules because projects are always at risk of a cascading schedule failure.
For a product manager, there is less of a need for perfect prediction and more of a need for building slack into the system so that teams can be responsive to interruptions without suffering intolerable delays.
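Why slack matters can be illustrated with a standard queueing result. Assuming, as a deliberate simplification the article itself does not make, that a team behaves like a single-server M/M/1 queue (work arrives at random and takes a random amount of time), the average wait before an item is started, measured in multiples of the average time to do one item, is rho / (1 - rho), where rho is utilization. The function name below is our own.

```python
# Assumption: model the team as a single-server M/M/1 queue. This is a
# textbook simplification used for illustration, not the article's model.


def queue_wait_in_service_times(utilization: float) -> float:
    """Average wait before work starts, in multiples of the average
    time to do one item, for an M/M/1 queue: rho / (1 - rho)."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1); at 100% the queue grows without bound")
    return utilization / (1 - utilization)


if __name__ == "__main__":
    for rho in (0.5, 0.8, 0.9, 0.95, 0.99):
        wait = queue_wait_in_service_times(rho)
        print(f"{rho:.0%} busy -> waits about {wait:.0f}x the work time")
```

In this model, going from 80% to 95% busy raises the wait from about 4x to about 19x the work time, and at 100% the queue never settles. Real teams are not M/M/1 queues, but the shape of the curve is the point.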
It is understandable that managers, vexed by the recurring unpredictability of software projects, may choose to focus on predicting more accurately and precisely.
The most reasonable-seeming request is that the development teams provide more reliable estimates.
Teams can provide estimates with more contingency time built in, but then the very long estimated times create an appearance that the team is “sandbagging” (taking more time than strictly necessary). Managers suspect the team may not be working hard to achieve the company’s goals, so they work hard to “bring estimates down” and end up back at the original predictability problem.
This is a “damned if you do; damned if you don’t” situation. Contradictory forces demand estimates that are reliable, yet also ambitious and aggressive.
The problem here is not with estimates, but with social and technical forces beyond the realm of estimation.
The problem isn’t that you aren’t good at estimation in general, but that the work has too much risk, uncertainty, delay, and interruption.
We can’t estimate our way to predictability.
So what can we do?
- We must characterize and reduce the chaos in our system if we are to have any hope of making valid predictions in the future.
- Realizing our plans are soft, we can de-risk by delivering complete end-to-end “walking skeleton” features, while we inspect and improve our development and delivery processes.
- We can keep jobs small so we can conduct many small-scale, safe-to-fail experiments on reducing variability.
- We can use automated testing (“checking”) and more frequent integration to help uncover risks more quickly – while we still have recovery time.
- Teams can reserve capacity so variations can be more easily absorbed. After all, if the team is 100% utilized then there is no capacity to spend examining, inspecting, and improving their work to make it predictable and efficient.
- By delivering incrementally we can adjust and improve our plans on the fly, find more frugal ways to reach our goals, and even adjust our goals as we learn from our users.
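The article does not prescribe a metric for characterizing the chaos; one simple option, which we are assuming here, is to look at how long finished items actually took and measure the spread. This Python sketch reports the mean, standard deviation, coefficient of variation (standard deviation divided by mean), and the 85th-percentile cycle time. Both sets of cycle times are hypothetical, and `variability_report` is a name we made up.

```python
import statistics


def variability_report(cycle_times_days: list[float]) -> dict[str, float]:
    """Summarize the spread of how long finished items took (needs >= 2 items)."""
    mean = statistics.mean(cycle_times_days)
    stdev = statistics.stdev(cycle_times_days)
    # quantiles(n=20) returns the 5%, 10%, ..., 95% cut points; index 16 is 85%.
    p85 = statistics.quantiles(cycle_times_days, n=20)[16]
    return {"mean": mean, "stdev": stdev, "cv": stdev / mean, "p85": p85}


# Hypothetical cycle times (days), before and after slicing work into small jobs.
LARGE_JOBS = [2, 15, 4, 21, 3, 9, 30, 5]
SMALL_JOBS = [1, 2, 1, 3, 2, 1, 2, 2]

if __name__ == "__main__":
    for label, data in (("large jobs", LARGE_JOBS), ("small jobs", SMALL_JOBS)):
        r = variability_report(data)
        print(f"{label}: mean {r['mean']:.1f}, cv {r['cv']:.2f}, 85% done within {r['p85']:.1f} days")
```

With these invented numbers, small jobs shrink both the typical duration and the relative spread, which is the kind of reduced variability that small, safe-to-fail experiments aim for.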
Many teams find that once their work is visible (small increments) and predictable, they are no longer asked to produce estimates or to track actuals against their predictions.
It takes a bit of work to get there, but any organization is rather more likely to become predictable than to become clairvoyant.