Everyone wants their software development organization to be faster and more predictable.
For most organizations, this is possible.
The changes are not easy but are fairly straightforward if you begin with an understanding of why work is slow and unpredictable.
Let’s explore a system (not hypothetical) and its problems, and then suggest how that way of working might be improved.
This pattern of problems may sound familiar; if it does then the kinds of solutions we provide may be helpful for your organization.
Please consider trying some of these solutions, and leave us a comment at the bottom to tell us how they work for you.
Preventing Software Delivery
The problem begins with a simple realization that nearly all software development policies and processes exist to prevent software from being released.
It’s not intentional. They don’t overtly want to prevent software from being released. It’s just the way the system works.
Our policies are based on the understanding that some of the work people are doing is going to be badly done. We have to stop the bad changes from reaching production!
Filtering steps are put in place so that bad software changes can be caught and removed, while good software changes can be allowed to flow (somewhat more slowly) to production.
These filters are set up as gating steps in a process: a developer does some work, the work is inspected, reviewed, or tested, and can pass forward through the process only if it meets the standard of quality at that stage.
To understand the system, think of it from the point of view of the work itself:
I am a code change. If you don’t stop me, I’m going to go into the next release!
The gating steps are a way of stopping the change from making it into the codebase for the next release.
Typically a process has several of these gates. Some typically observed gates are:
- Code review
- Pre-merge code review
- Team lead approval
- Merge (may reject if it fails to merge or build)
- Post-merge review
- Automated testing
- UX testing
- Correctness testing
- Automated security scans
- Deployment testing
- Regression testing
- Load testing
- Managerial approval
If the stages of the process are drawn in a diagram, they are usually presented as a pipeline, showing code flowing from a developer to a code reviewer to a merge to a testing department and so on.
These diagrams give the impression of an orderly assembly line where all of the work will flow seamlessly and in an orderly fashion.
The Slow Trickle of Work
If the work were to actually flow smoothly through the system, it might take 2-3 days of development followed by minutes of automated tests, a quick merge, a day or two of testing, a bit of security and load testing, and a deployment!
Each new feature should be expected to emerge from the other end of the process in less than a week. It’s predictable and estimable and pretty straightforward if we just follow the plan, right?
What we observe is not a smooth flow from start to finish. Instead, it is unclear what the status of work might be and when it might complete. Even the quality of the work is difficult to assess.
Part of the reason for this is the inherent, constantly increasing complexity of software: every feature we add brings new constraints into the system. Some of it may be due to questionable practices like scatter-gather.
Let’s focus on the two key features that aren’t shown on our tidy pipeline-flow documents: Queuing and Loops.
Queuing happens because all the people in the system have plenty of work to do; they aren’t just sitting around waiting for one specific programmer to finish one specific piece of work.
In order to hand-off work to a busy person, you have to put the work in their inbound queue. In physical offices, this used to be done with an “in box” or “in tray” but in the modern era it is usually via some electronic system (possibly even Slack or email).
The work will wait until the person becomes available and checks for the next available job.
If we interrupt a person to handle our job, they have to change jobs (in which case their work in progress goes back on the queue). People can actually work on only one item at a time.
Queuing is a natural and inevitable result of having handoffs where people are busy.
The Development Queue
Work is assigned to developers (as a group or as individuals) in most organizations, since other ways of working haven’t penetrated into many large corporations yet.
Typically a lead, manager, or product owner will pass work to a developer to be done. This person will not join the developer in doing the work but will move on to other duties. That makes this a handoff.
Wherever there is a handoff, there is a queue.
Developers are usually kept busy, so the work to be done will have to wait for developers to be available.
How big is this queue, and how long do items typically wait, on average?
Often 2 weeks of work will be assigned. Some work will begin immediately, but other work may not be started until the last day, so the average time an item sits in the queue is around 5 business days.
While work duration is variable, we find that often a programming job takes a few days. Some of the work may take minutes or hours, some may take a day or a week.
The reason for the variation is wrapped up in various issues, including the specific difficulty of each task, uncertainty around the platform and language, risk of breaking existing software features, how easy or difficult the code is to modify, the habits and skills of the developer, and whether the organization is skillful at breaking down larger features into small, deployable units of work.
Well, for our purposes, let’s assume that jobs average 2 days or so. You can adjust the numbers for your organization.
So, on average, there would be 5 days of wait time for two days of development.
The Inspection Queue
It is assumed that some changes will be bad changes so it seems prudent to have people review each change to the system.
When I need you to review my work, I will hand off my work to you for inspection.
Wherever there is a handoff, there is a queue.
How long is this queue? How fast does it empty?
On one team, the lead developer decided that he would batch up all his code reviews and do them all on Thursday in order to protect his development time – all in the interest of efficiency.
He did a lot of reviews in a short time and always gave good feedback. The time to wait was a little under 3 business days on average (note that this would be considered a prompt review in many corporations). This isn’t extreme or unrealistic, so we will use 3 days as a review wait time. Again, you can adjust for your organization.
So we have an average wait of 5 days to start the work, 2 days of work, 3 days of waiting for a review, and then an hour of inspection. We can expect any given item to take a bit over 10 working days.
Our original idea of releasing at least a feature a week is very unlikely now. But let’s look at the rest of the process.
The Merge Queue
After a developer’s code is finished and reviewed, it is merged with other changes that have also been individually reviewed and accepted.
Often reviewed code is handed off for a manager’s approval before being merged.
Wherever there is a handoff there is a queue.
So how big and how long is this one? Let’s call that a one-day wait.
Merges sometimes don’t go very well[1] and need human intervention. A person is involved, so this is at least one more handoff.
Wherever there is a handoff there is a queue.
How long is this queue? How quickly does it empty?
Luckily, most of the time the merge is done the same day as the review, though sometimes it happens a day or two later. Let’s add a day to the time for each feature (now about 12 or 13 business days on average).
The Test Queue
Merges create a unique new version of code that has never existed before. The various changes, which were correct when tested alone, may combine with unintended consequences.
It seems reasonable to add some kind of testing to help spot and correct any subtle defects caused by the merge.
Since the people who made, reviewed, approved, and merged the changes aren’t doing the work with the testers, this is a handoff.
Wherever there is a handoff, there is a queue.
How long is an item queued for testing on average? How fast does it empty?
Well, the integration test environment has to be prepared, and it is often in use by other testers. We have to queue up to access the integration environment, and then for that environment to be prepared with test data and the current version of code. Let’s call that 2-3 days.
After that, it will take a few days for the tester to have confidence that the new changes didn’t cause problems.
With this addition, 17 business days sounds pretty reasonable for our feature.
I suspect that many of our readers who are working in scrum shops are shifting uncomfortably in their seats. A typical scrum “sprint” is 10 working days, and already this process has made delivery within the sprint impossible.
But let’s move on.
Then We Deploy, Right?
After that, we have to hand off the work to security testing and performance testing. These will queue up for a few days, and each process is only two or three days long. Let’s add 4 days for each of them, so now we’re looking at something around 25 business days to complete.
There was a time that people delivered new releases every few years or every 18 months, but that time has passed. Today, even the relatively short period of 5 weeks is uncomfortable.
Feeling this discomfort, many organizations start looking at how to speed up the developers so that they can get their changes sooner.
If a 2-day change takes 25 days to make it through this system, how much faster would it have completed if the developer were twice as fast? Right. It would be 24 instead of 25 days. Nobody would notice the difference. If they were twice as slow, it would only be 27 days, which hardly matters at all.
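To make the arithmetic concrete, here is a minimal Python sketch that adds up the waits and work times from the walkthrough above. The split between waiting and working for testing, security, and performance is our assumption (the text only gives rough totals), and the names `STAGES`, `lead_time`, and `waiting_share` are purely illustrative.

```python
# (stage, days waiting in a queue, days of hands-on work), in business days.
# The wait/work splits for the three testing stages are assumed; the text
# above only gives rough totals of about 5, 4, and 4 days.
STAGES = [
    ("development", 5.0, 2.0),
    ("code review", 3.0, 0.125),  # one hour of an 8-hour day
    ("manager approval", 1.0, 0.0),
    ("merge", 1.0, 0.0),
    ("integration testing", 2.5, 2.5),
    ("security testing", 1.5, 2.5),
    ("performance testing", 1.5, 2.5),
]


def lead_time(dev_speedup=1.0):
    """Total business days for one change, assuming no rework loops.

    dev_speedup=2.0 means the developer is twice as fast;
    dev_speedup=0.5 means the developer is twice as slow.
    """
    total = 0.0
    for name, wait, work in STAGES:
        if name == "development":
            work = work / dev_speedup
        total += wait + work
    return total


def waiting_share():
    """Fraction of the lead time that the change spends sitting in queues."""
    total_wait = sum(wait for _, wait, _ in STAGES)
    return total_wait / lead_time()
```

With these numbers, `lead_time()` comes to a little over 25 days, `lead_time(dev_speedup=2.0)` to a little over 24, and `lead_time(dev_speedup=0.5)` to a little over 27. Swap in your own organization's numbers to see where the time actually goes.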
Don Reinertsen says to “watch the baton, not the runners.” The problem is not with the speed of the programmers, but with the flow of work through the system as a whole.
It isn’t just the bad code that is slowed by the process; all code must pass through the system because we can’t know in advance which changes are the bad changes that need inspection.
Moving one work item through the process has a fixed minimum time, regardless of quality. What is the maximum time? Is there a maximum time?
In our system, we are checking for bad changes at many steps in the process. What do we do when we find one?
It’s very unlikely that the team or management will say “never mind, we didn’t really want this.”
When a problem is spotted at any of our gating steps, the work is returned to development for remediation.
When work is returned, it rejoins the flow at an earlier stage of the process. It will eventually return to the failed inspection step after remediation.
Returning work forms a loop in our process.
Returned Work Delay
While the above description of work flowing through approval gates sounds intolerably slow, now we realize it is unrealistically fast[2] since we didn’t count the delay caused by rejecting a bad change.
If an initial code change is returned by the reviewer for any rework, then:
- The work will have to return to the development queue to wait for developers who have since moved on to other work.
- If the returned work moves to the front of the developers’ queue, it delays all of the other work in the queue. If it joins the back of the queue, it will be in the queue for a longer period.
- Where there is a lot of work in progress[3], often a prioritization step will be added. A manager must prioritize returned work against the development teams’ other work[4]. The work will be handed off to the manager for prioritization. Wherever there is a handoff, there is a queue. This adds one more delay to the process.
- When the returned work is remedied, it will have to re-enter the reviewer’s queue for inspection. This adds another 3 days on average, as it did the first time.
- We don’t know for sure that the work will pass through the review without rejection after rework. It may have new problems. Some bad changes are tricky, especially with complicated or messy code[5], such that a correction to one defect may cause another defect and therefore another return. So we can see that the opportunity to reject bad changes comes at a considerable cost.
How long will it take to create a change? It depends on the time items spend queued, the number of other items in the queues, their comparative priority, and the number of times they loop.
It’s hard to blame the manager or the developer for not predicting delivery in such a dynamic system.
After that, if there is a defect noticed in testing, then it will be returned to developers for rework again, after which it will have to work its way through the reviewer, code merge, and testers again.
Every loop causes the work to revisit all of the intermediate queues and approval gates. It may loop again on any of those nested loops before moving on.
Returning work delays planned work, so we can’t be sure that even perfectly-made changes can pass through the system in a mere 25 days - it depends on the state of all the other work in progress at the time.
At this point, we’ve lost any semblance of predictability. We can’t be very sure of when any of the work will finish.[6]
Remember these are still only two-day changes on average, and most remediation takes only minutes to hours to complete each time. The developers’ time is fairly predictable[7], as is the reviewer’s time, as is testing. It’s the non-work time that’s hard to imagine. The developers are fully utilized with their work, getting it into the review, merge, and testing pipelines as fast as possible.
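A rough sketch may help show how quickly a single loop erodes predictability. It models only the review gate, and it assumes that every review rejects a change with the same independent probability, that returned work rejoins the back of the 5-day development queue, and that each fix takes about a quarter of a day. All of those are our simplifying assumptions, not measurements, and the names are illustrative.

```python
import random

# Business days for the first trip: dev queue, development, review queue, review.
FIRST_PASS_DAYS = 5 + 2 + 3 + 0.125
# Business days for each return trip: dev queue, quick fix, review queue, re-review.
REWORK_PASS_DAYS = 5 + 0.25 + 3 + 0.125


def expected_days(p_reject):
    """Mean days to clear review if each review fails with probability p_reject.

    The number of rejections before the first acceptance follows a geometric
    distribution, whose mean is p / (1 - p).
    """
    expected_rejections = p_reject / (1.0 - p_reject)
    return FIRST_PASS_DAYS + expected_rejections * REWORK_PASS_DAYS


def simulate_days(p_reject, rng):
    """Days for one simulated change to clear review, loops included."""
    days = FIRST_PASS_DAYS
    while rng.random() < p_reject:
        days += REWORK_PASS_DAYS
    return days


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so repeated runs match
    samples = [simulate_days(0.3, rng) for _ in range(10_000)]
    print("expected:", round(expected_days(0.3), 1))
    print("fastest :", min(samples))
    print("slowest :", max(samples))
```

Even in this one-gate model the slowest items take several times as long as the fastest ones, although the hands-on work barely changes. Nesting more gates inside the loop widens the spread further.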
Without a better understanding of the impact of the development system, it seems absurd that a two-day change may not be deployed for months.
Here is a shocker: the above description is a simplification. Consultants in our field see slower and more complicated systems of work.
How About Better Estimates?
Everyone is busy, but few things are ever finished.
Now the organization, frustrated with the unpredictable trickle of finished work, will demand that the development group provide them with reliable estimates that can be passed on to customers and other stakeholders.
This seems very reasonable, except that the developers are no more capable of overcoming the system’s innate unpredictability than anyone else in the system.
Our best coping mechanism is to look to statistical methods.
On average, what is the age of an item when it is finally deployed to production? What is the variance? Can we tell users that a new idea that enters the process today is 85% likely to make it to production in 157 working days, plus or minus 20?
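As a minimal sketch of what such a statistical answer could look like, the Python below summarizes a list of historical lead times. The sample numbers are invented placeholders, not data from any real team, and the helper names are illustrative. The percentile uses the simple nearest-rank definition; other definitions exist and give slightly different values.

```python
import math
import statistics


def percentile_nearest_rank(values, pct):
    """Smallest observed value that at least pct percent of items do not exceed."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def forecast(lead_times_days, pct=85):
    """Summarize historical lead times (business days) for a rough forecast."""
    return {
        "mean": statistics.mean(lead_times_days),
        "stdev": statistics.stdev(lead_times_days),
        "percentile": percentile_nearest_rank(lead_times_days, pct),
    }


# Invented placeholder history: the age in business days of ten items at deployment.
SAMPLE_LEAD_TIMES = [25, 31, 48, 27, 90, 36, 157, 29, 62, 41]

if __name__ == "__main__":
    print(forecast(SAMPLE_LEAD_TIMES))
```

For the placeholder history, the 85th percentile is 90 days while the mean is under 55, which is why a percentile is often a safer thing to quote to stakeholders than an average.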
To the broader organization’s cries of anguish, we can only say “that’s just the way it is.” We’re correct in this, because it is the inevitable consequence of the system we are using for software development.
This is a system of policies and choices put in place by people over time, grown organically into an effective release-preventing system through a series of unpredictable nested delays.
Every system is perfectly (if not intentionally) designed to get exactly the results it produces. – W.E. Deming
We need a system that doesn’t exhibit the same unwanted behaviors.
Tackle The System
The system described above is complicated by three dominant factors:
Consume Queues Faster
The first and most obvious thing we can do is to reduce queuing times.
We can cut the number of items in the queues. By lowering the number of items in progress (or in possession since they’re waiting instead of progressing) we shorten the queues. This dramatically shortens our time to market. The effects of lowering Work In Progress (WIP) are well-documented and worth a web search or two.
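One standard result behind this is Little's Law from queueing theory, which we bring in here as background rather than as part of the story above. For a stable system, the long-run average lead time equals the average number of items in the system divided by the average completion rate. The sketch below assumes those averages are known; the function name and the example numbers are illustrative.

```python
def average_lead_time(wip_items, throughput_per_day):
    """Little's Law: average lead time = average WIP / average throughput.

    This holds for long-run averages in a stable system; it says nothing
    about how long any individual item will take.
    """
    if throughput_per_day <= 0:
        raise ValueError("throughput_per_day must be positive")
    return wip_items / throughput_per_day


if __name__ == "__main__":
    # Hypothetical team that finishes 2 items per day on average.
    print(average_lead_time(wip_items=40, throughput_per_day=2))  # 40 items in flight
    print(average_lead_time(wip_items=10, throughput_per_day=2))  # WIP cut to 10
```

If throughput stays the same, cutting the items in flight from 40 to 10 cuts the average lead time from 20 days to 5, without anyone working faster.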
We could also reduce busyness in order to raise responsiveness. If we always had a tester and a test environment standing by, we could keep the testing queue from filling.
That sounds inefficient, having paid people waiting for work - but what if it saves several days per work item? How valuable would it be to deliver software changes sooner and more often? Remember “watch the baton, not the runners.”
Some part of the work could be done by automated means, which could allow some of the work to begin (and possibly complete) without any human intervention. For instance:
- Repetitive tests can be automated and run automatically every time a change is submitted. Human intervention is only required when code fails its tests.
- Merges can be done automatically on approval, and only queue for human intervention if they detect a merge conflict
- Security scans and load testing are routinely automated, and will require human intervention when new vulnerabilities are spotted.
Some of these tasks have an indispensable human component: You can’t avoid having humans test your system if you care about user experience, nor can you safely ignore the human side of security or reliability.
Still some tasks can be automated to free up human beings for the distinctly human side of their work.
Note that these automated tasks require ongoing human attention; there is no “set it and forget it” here, but it can be well worth the effort.
Eliminate Rejection Loops
Loops multiply the number of times we queue the same piece of work.
To eliminate loops, we have to reduce the incidence of returned work.
Since we still don’t want bad work to go to production, we have to reduce rejections by making better work products to begin with.
While we can’t necessarily eliminate all mistakes, we can make it possible to notice and fix them much earlier in our process.
We must target making and correcting all of our errors early, during our initial editing sessions, so that work doesn’t loop back.
Many organizations are mistakenly trying to speed up the programmers rather than the process. Developers respond to the demand for raw speed by rushing hastily-written code to inspection and testing, and of course this increases the number of times code will loop back for remediation.
Since the wait time dwarfs development time, reducing waits and loops should free up considerably more time for development.
Is there some way to raise our first-time-through percentage?
If every change is as perfect as we can make it, perhaps by spending more time in careful programming, testing earlier and more often, and involving more people in the work, then we could have far fewer returns, trending toward zero returns per item.
Better tools help programmers spot errors sooner. When editors started providing color syntax highlighting, many syntax errors became visible the moment they were typed instead of when the code was compiled or reviewed.
Now we have security scanners and “lint” tools built into our development environments. By being aware of problems before submitting code for review, developers can avoid round-trips.
While these smart, augmented editors can prevent a number of problems with languages, they don’t understand the software we write; they don’t have visibility into our problem domain and the solution we are trying to create.
A technique that helps here is test-driven development (TDD). With TDD, developers write small, fast tests before making changes to production code. The tests pass when the code is correct, and the same tests are run after each change to help ensure that developers don’t make mistakes that would cause existing code to fail its tests.
This provides a basic safety net to ensure that code works within its immediate domain, augmenting smart tools with application intelligence.
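For readers who haven't seen TDD, here is a minimal sketch using Python's standard `unittest` module. The function under test, `flow_efficiency`, is a hypothetical example we made up for illustration (hands-on work time divided by total lead time). In TDD the test class would be written first and would fail until the function below it exists.

```python
import unittest


class FlowEfficiencyTest(unittest.TestCase):
    """Written first: these tests describe the behavior we want."""

    def test_two_days_of_work_in_a_25_day_lead_time(self):
        self.assertAlmostEqual(flow_efficiency(work_days=2, lead_days=25), 0.08)

    def test_no_waiting_means_full_efficiency(self):
        self.assertAlmostEqual(flow_efficiency(work_days=3, lead_days=3), 1.0)

    def test_rejects_non_positive_lead_time(self):
        with self.assertRaises(ValueError):
            flow_efficiency(work_days=1, lead_days=0)


def flow_efficiency(work_days, lead_days):
    """Written second: the simplest code that makes the tests pass."""
    if lead_days <= 0:
        raise ValueError("lead_days must be positive")
    return work_days / lead_days
```

Saved in a file such as `test_flow.py` (a hypothetical name), the tests run with `python -m unittest test_flow`. Because they take milliseconds, a developer can rerun them after every small edit and catch a mistake long before a reviewer would.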
Still the (micro)tests we use in TDD don’t protect us in the larger system context. We need broader tests in addition to TDD’s tests.
But writing code that won’t fail during automated systems testing will likely require more knowledge than a single programmer can hold in their head at one time. We may need more perspectives brought to bear.
Working in pairs has been shown to reduce defects by as much as 40%, while also helping a team to maintain higher quality and adherence to standards. Other ensemble techniques like mob programming and swarming should only compound the benefits.
If we compose our teams well, then for most mistakes we might make, someone in the team should have the skill and knowledge to recognize and correct them before the work is passed on for review or testing.
Team Up To Eliminate Queues
We can cut the number of queues by combining consecutive process steps, such as combining development and review, combining review and merge, etc.
To eliminate queues, you must gather the people who would have handed off the work between them and have them work on one work item together, reviewing and testing on-the-fly. That includes developers, testers, UX, UI, security, … all the people.
If they can start together, work together, and finish together then there are no handoffs and queuing between them.
This idea of working together in a cross-functional way may sound radical, yet when we look at the system as a flow and are aware of loops and queues, it is an obvious simplification.
Imagine the speed increase of eliminating the queues between coding and review, between review and merges, between merges and testing. The queues in many companies hold the code up much longer than the team spends working on the code, so you eliminate a major source of delay and schedule uncertainty.
Without loops, and without queues, we would only be concerned with the amount of time it takes to actually do the work. If loops and queues are responsible for 90% of lead time, this could theoretically give you up to a 90% speed improvement[8] per work item, though that is more than anyone reasonably expects or promises.
Add into the mix some automated testing in the background, and suddenly it seems reasonable that a given change could actually make it to production within a few days’ time[9].
This works without hiring better programmers, making everyone work harder, or invoking productivity management.
It’s a significant system change, and will require some change of habits, but it’s not impossible or unthinkable.
It’s always handy to improve technical skills, too, but where the problem is the system, the only way to be faster and more predictable is to deal with the waits and queues in the system.
[1] This is due to something we’re currently calling integration drift, where code in a developer’s local branch and the main code line are changing independently, slowly becoming incompatible. It is complicated, and perhaps a topic for a different blog post in the future.
[2] Yes, I know it doesn’t sound fast. But bear with us.
[3] Queued work is not truly “in progress”. I like to refer to assigned-but-waiting work as being “in possession” rather than “in progress”.
[4] We have seen managers working hard to prioritize correcting dozens of partially-finished features, thousands of production errors, and hundreds of new feature requests. Systems like this can get out of hand, and even the best and most-organized managers struggle just to keep up.
[5] Refactoring is a technique we use to keep code from becoming messy, so that we can make changes without breaking anything. Sadly, many teams don’t choose to do refactoring and some aren’t allowed to.
[6] On average, we know roughly how many features or programming assignments are done in a given period, so we can rely on statistical methods here, but completion time for an individual item is uncertain.
[7] For small values of “predictable.”
[8] Once they get the hang of working together.
[9] This is an understatement. Via ensemble programming and automated testing, many organizations are deploying code to production many times per day, sometimes hundreds of times per day. This style of work is called “Continuous Deployment.”