How to Go Faster

Ok, I’m going to tell you how to make your software development organization go faster. I’m going to tell you how to get more done without adding people while improving your time to market and increasing your quality. And I’m going to back it all up with queuing theory. [ By actually explaining the relevant concepts of queuing theory, not just by ending sentences with “…which is obvious from queuing theory”, which is usually a good bluff in a technical argument being had over beers. Generally a slam dunk in mixed technical/non-technical company. But I digress. ]

An Important Perspective

It’s worth saying that this article assumes you’ve figured out how to deliver software incrementally somehow, even if that’s just by doing Scrumfall. The point is that you are familiar with breaking your overall feature set down into discretely deliverable minimum marketable features (MMFs), user stories, epics, tasks, and the like. If you have any customers, you are probably also familiar with production incidents and bugs, which are also discrete chunks of work to do. Now, here’s the important perspective:

Your software development organization is a request processing system.

In this case, the requests come from customers or their proxies (product managers, etc.), and the organization processes the request by delivering the requested change as working software. This could end with a deployment to a live website, publishing an update to an app store, or just plain cutting a release and posting it somewhere for your customers to download and use. At any rate, the requests come into your organization, the software gets delivered, and then the request is essentially forgotten (closed out). Now, looking at your organization this way is important, because it means you can understand your capacity for delivery in terms borrowed from tuning other request processing systems (websites, for example) for performance and scale. Most important, though, is that this mysterious branch of mathematics called queuing theory applies to your organization (just as it applies to any request processing system).

A Little Light Queuing Theory

One of the basic principles in queuing theory is Little’s Law, which says:

N = XR

where N is the average number of requests currently being processed by the system, X is the transaction rate (requests processed per unit time), and R is the average response time (how long it takes to process one request). In a software development setting, R is sometimes called cycle time.

To put this in more familiar terms, suppose we have a walk-in bank with a number of tellers on staff. If customers arrive at an average rate of one person per minute (X) and it takes a teller an average of 2 minutes to serve a customer (R), then Little’s Law says that we’ll have XR = 1(2) = 2 tellers busy (N) on average at any given point in time. We can similarly flip this around: if we have 3 tellers on staff, what’s the maximum average customer arrival rate we can handle?

X = N/R = 3/2 = 1.5 customers per minute
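The two bank calculations can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function and variable names are made up for illustration, and the numbers are the ones from the teller example.

```python
def requests_in_flight(arrival_rate: float, response_time: float) -> float:
    """Little's Law: N = X * R."""
    return arrival_rate * response_time


def max_arrival_rate(servers: int, service_time: float) -> float:
    """Little's Law rearranged: X = N / R, with every server busy."""
    return servers / service_time


# One customer per minute (X), two minutes with a teller (R).
busy_tellers = requests_in_flight(arrival_rate=1.0, response_time=2.0)

# Three tellers on staff (N), two minutes per customer (R).
peak_rate = max_arrival_rate(servers=3, service_time=2.0)

print(busy_tellers, peak_rate)
```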

Ok, the last thing we need to talk about is: what happens if we suddenly get a rush of customers coming in? Anyone who has entered a Starbucks or visited Disneyland knows the answer to this: a line forms. (The time a customer spends waiting in line is known as “queuing delay” if you want to get theoretical about it.) Let’s go back to our bank. Suppose we just have 5 people suddenly walk in all at once, in addition to our regular arrival of one person per minute. What happens? Well, we get a line that is 5 people long. But if we only have 2 tellers on staff, then people come off the line at exactly the same rate that new people are entering from the back, which means: the line never goes away and always stays 5 people long.

What does this look like from the customers’ point of view? Well, we know they’ll spend 2 minutes with the teller once they get up to the front of the line, and we know that it will take 5 minutes to get to the front of the line, so the average response time is:

R = RV + RQ = 2 + 5 = 7

where RV is the “value added time” where the request (customer) is actually getting worked on/for, and RQ is the amount of time spent waiting in line (queuing delay). Now we can see that on average, we’ll have:

N = XR = X(RV + RQ) = 1(2 + 5) = 7

people in the bank: two at the tellers and five waiting in line. We all know how frustrating an experience that is from the customer’s point of view. Now, let me summarize this section (if you didn’t follow all the math, don’t worry, the important thing is that you understand these implications):

  1. If you try to put more requests into a system than it can handle, lines start forming somewhere in the system.

  2. If the request rate never falls below the system’s max capacity, the lines never go away.

  3. Time spent waiting in a line doesn’t really serve much useful purpose from the customer’s point of view.
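To see the first two points play out, here is a rough minute-by-minute sketch of the bank. It treats customers as a smooth flow rather than simulating random arrivals, so it is an approximation and not a real queuing simulation; the function name and parameters are assumptions made for this example.

```python
def line_length_over_time(tellers, service_minutes, arrivals_per_minute,
                          initial_line, minutes):
    """Track the length of the line, one step per minute (fluid approximation)."""
    capacity = tellers / service_minutes  # customers served per minute
    line = float(initial_line)
    history = []
    for _ in range(minutes):
        # New arrivals join the back; served customers leave the front.
        line = max(0.0, line + arrivals_per_minute - capacity)
        history.append(line)
    return history


# Five people walk in at once, then one more arrives every minute.
two_tellers = line_length_over_time(2, 2.0, 1.0, initial_line=5, minutes=12)
three_tellers = line_length_over_time(3, 2.0, 1.0, initial_line=5, minutes=12)

print(two_tellers)    # capacity equals arrivals, so the line never shrinks
print(three_tellers)  # spare capacity of 0.5 customers/minute drains the line
```

With two tellers the line stays at five people for as long as you care to run it. With three, it shrinks by half a customer per minute and is gone after ten minutes.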

Software development as customer request processing

If your experience is anything like mine, there is an infinite supply of things the business stakeholders would like the software to do, which means the transaction rate X can be as high as we actually have capacity for. This means one of the primary goals of the organization is figuring out how to get X as high as possible so we can ship more stuff. At the same time, we’re also concerned with getting R as low as possible, since this represents our time-to-market and can be a major competitive advantage. If we can ship a feature in a week but it takes our competitors a month to get features through their system, who’s more reactive? Every time the competition throws up a compelling feature, we can match them in a week. Every time we ship a compelling feature, it takes them a month to catch up. Who’s going to win that battle?

Now, one of the tricky things here is that software development is often far more complicated than our example bank with tellers, since we tend to staff folks with different skillsets. If I have a team of one graphic designer, three developers, a tester, and a sysadmin, it’s really hard to predict how long it will take that team to ship a feature, because they will have to collaborate. If I want to hire someone to help them, is it better to hire another tester or another designer? Probably I can’t tell a priori, because it depends on the nature of the features being worked on, and it’s really hard to measure things like “this user story was 10% design, 25% development, 50% testing, and 15% operations.” Nonetheless, we can look at this from another point of view, which is that I have a fixed number of people in the organization, and each person can only be working on one thing at a time (just as a teller can only actively serve one person at a time), and they are probably (hopefully) collaborating on them.

This means the number of things you can realistically be actively working on is at most the number of people in the organization, and fewer whenever people collaborate on the same thing.

If we have more things in flight than that, we know at least some of the time those things are going to be sitting around waiting for someone to work on them (queuing delay). Perhaps they are sitting on a product backlog. Perhaps they are simply marked “Not Started” on a sprint taskboard. Perhaps they are marked “Done” on a sprint taskboard but they have to wait for a release to be rolled at the end of the sprint to move onwards towards production or QA. As we saw above, this queuing delay doesn’t increase throughput, it just hurts our time-to-market. Why would we want that?

First optimization: get rid of queuing delay

Ok, as we saw above, we know that the total response time R consists of two parts: actual value-adding work (RV) and queuing delay (RQ). Typically, it’s really hard and time-consuming to try to measure these two pieces separately without having lots of annoying people running around with stopwatches and taking furious notes. Fortunately, we don’t have to resort to that. It is really easy to measure R overall for a feature/story: mark down when the request came in (e.g. got added to a backlog) and then mark down when it shipped. Simple.
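As a sketch of how little bookkeeping this takes, the snippet below computes R from two dates per request. The story names and dates are invented for illustration; any record of "when it came in" and "when it shipped" would do.

```python
from datetime import date
from statistics import mean

# Hypothetical records: (request, date added to backlog, date shipped).
shipped = [
    ("story-a", date(2024, 1, 2), date(2024, 1, 12)),
    ("story-b", date(2024, 1, 3), date(2024, 1, 24)),
    ("story-c", date(2024, 1, 8), date(2024, 1, 22)),
]


def response_time_days(requested: date, delivered: date) -> int:
    """R for one request: calendar days from arrival to shipping."""
    return (delivered - requested).days


cycle_times = [response_time_days(start, end) for _, start, end in shipped]
average_r = mean(cycle_times)

print(cycle_times, average_r)
```

Note that this measures RV and RQ together, which is exactly the point: the total is easy to get even when the parts are not.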

Now, let’s think back to our bank example where we had a line of people. Most software development organizations have too much in flight, and they have lines all over the place inside, many of which aren’t even readily apparent because that’s just “the way we do things around here.” Lines are bad. Now, we know the only way to drain these queues is if the incoming feature request rate is less than the rate at which we ship them. Sometimes we can try hiring more “tellers”, but in a recession that’s not always an option. Instead, for many organizations, the best option is admission control, which is to say that we don’t take on a new request until we’ve shipped one out the other side. You can think of this as having a certain number of feature delivery “slots” available, and you can’t start something new until you’ve freed up a slot. This at least prevents you from having your lines get any bigger.

In order to drain the lines out of the system, the easiest thing to do is to periodically retire a slot after it ships. In other words, don’t let something new in just that once. This will reduce the overall number of things in flight, and since presumably everyone is still working hard, what we’ve just gotten rid of must be queuing delay. Magic! So we can just keep doing this and draining queuing delay out of the system, improving our time to market all the time, without necessarily having to change anything else about the way we do things. When do we stop? We stop once we have people standing around not doing anything. At that point, all the queuing delay is out of the system (for now), and we know that we’re at a level where all of our “tellers” are busy. To summarize:

  1. We can remove queuing delay from our delivery process simply by limiting and reducing the amount of work in-flight; this improves time-to-market without having to change anything else.

  2. We can keep doing this until people run out of things to work on; at that point we’ve squeezed all the queuing delay out.
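The slot mechanism can be sketched in code. This is one possible way to model it, not a prescription for any particular tool; the `DeliveryBoard` class and its method names are hypothetical.

```python
class DeliveryBoard:
    """Admission control: a fixed number of feature delivery slots."""

    def __init__(self, slots: int):
        self.slots = slots
        self.in_flight = set()

    def try_start(self, request: str) -> bool:
        """Admit a request only if a slot is free."""
        if len(self.in_flight) >= self.slots:
            return False
        self.in_flight.add(request)
        return True

    def ship(self, request: str, retire_slot: bool = False) -> None:
        """Free the slot, or retire it to shrink work in flight."""
        self.in_flight.remove(request)
        if retire_slot and self.slots > 1:
            self.slots -= 1


board = DeliveryBoard(slots=2)
started = [board.try_start(name) for name in ("a", "b", "c")]

# Ship "a" but retire its slot: "c" still has to wait for "b".
board.ship("a", retire_slot=True)
started_after_retire = board.try_start("c")

board.ship("b")
started_after_ship = board.try_start("c")

print(started, started_after_retire, started_after_ship, board.slots)
```

The sketch never retires the last slot; in practice, per the second point above, you would stop retiring slots as soon as people start running out of things to work on.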

Second optimization: reduce failure demand

The next thing to realize is that the N things we have in flight actually come in two flavors: value demand and failure demand. In our case, value demand consists of requests that create value for the customer: i.e. new and enhanced features. Failure demand, on the other hand, consists of requests that come from not doing something right previously. These are primarily things like website outages (production incidents), bug reports from users, or even support calls from users asking if you’ve fixed the problem they previously reported. If you have someone collecting these, then these are requests that your organization as a whole has to deal with, and for each request of failure demand, someone is busy triaging or fixing it when they could be creating new value. In other words:

N = NV + NF

where NV is value demand and NF is failure demand. Or, if we look at things this way:

X = N/R = (NV + NF)/R = NV/R + NF/R

we can see that the failure demand is stealing a portion (NF/R) of our organization’s throughput! This is, incidentally, why spending extra energy on quality up front results in lower overall costs (as Toyota showed); failure demand essentially requires rework.
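A quick numeric sketch makes the split concrete. The figures here (7 value items and 3 failure items in flight, a 2-week response time) are invented purely for illustration.

```python
def throughput_split(value_in_flight: float, failure_in_flight: float,
                     response_time: float) -> tuple[float, float]:
    """Split X = NV/R + NF/R into its value and failure parts."""
    value_rate = value_in_flight / response_time
    failure_rate = failure_in_flight / response_time
    return value_rate, failure_rate


# Hypothetical team: NV = 7, NF = 3, R = 2 weeks.
value_rate, failure_rate = throughput_split(7, 3, 2.0)
share_lost = failure_rate / (value_rate + failure_rate)

print(value_rate, failure_rate, share_lost)
```

In this made-up case the team completes 5 requests per week, but only 3.5 of them create new value; 30% of its throughput goes to failure demand.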

This means that another way to improve overall throughput of the organization is to reduce failure demand, reclaiming that portion of your throughput that’s getting siphoned off. One way to do this involves figuring out how to “build quality in” on new development, but since software development is a creative process (different every time for every feature), it’s not possible to actually completely prevent bugs. That said, there are many techniques like test-driven development and user experience testing that can help improve quality. The other way to reduce failure demand involves vigorously fixing root causes of failure as we experience them. In other words, when we fix a problem for a customer, we should fix it in a way that prevents that type of problem from ever occurring again, for any customer. This keeps overall failure demand down by preventing certain classes of it, thereby reserving that precious organizational throughput for delivering new value. To summarize this section:

  1. Improve value delivery capacity by reducing failure demand (production incidents and bug reports).

  2. The cheapest way to reduce failure demand is by building in quality up-front.

  3. When serving a failure demand request, we can reduce overall failure demand by also fixing the root cause of the problem.

Final optimization: cycle time reduction

Ok, now we’ve gotten to the point where RQ = 0 (or near zero), so R = RV. Now at this point, let’s look back at Little’s Law:

N = XR

We’ve already established via draining out our queuing delay in the first phase what our target N is (number of requests in-flight). But we still want to ship more with the same number of people; we want X to go up. But recall that:

X = N/R

If our N is fixed due to the number of people we have on staff, then the only way to increase throughput is to reduce R. Now is where we start to look at process changes and automation. How do we make it so that it takes people less time to handle a request? Focusing on this improves not only time to market but also overall throughput. And furthermore, if we are measuring R over time, we have an easy way to do this: change the process in a way you think will help, and then measure if R went down or not. If it didn’t help, try something else. If it made things worse, go back to the old way. Rinse, repeat. The things to try are going to be different for every organization, and one of the best sources of ideas will be the folks actually doing the work. But this doesn’t require any kind of high-tech tracking software – post-it notes on walls with the start and end dates written on them are more than sufficient to measure R and carry these experiments out.

  1. As failure demand and queuing delay are squeezed out of the system, the only way to improve throughput is by reducing response time.

  2. Response time can only be reduced by process changes.

  3. By measuring response time, we have a convenient experimental lab to understand if process changes help or not.
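The experiment loop can be sketched as a simple comparison of average R before and after a change. The function name and the sample numbers are assumptions for illustration, and with only a handful of requests the averages will be noisy, so treat the verdict as a hint rather than proof.

```python
from statistics import mean


def evaluate_change(before_days: list[float], after_days: list[float]) -> str:
    """Compare average response time before and after a process change."""
    baseline = mean(before_days)
    trial = mean(after_days)
    if trial < baseline:
        return "keep"                # R went down: the change helped
    if trial > baseline:
        return "revert"              # R went up: go back to the old way
    return "try something else"      # no difference


# Hypothetical cycle times in days, read off post-it notes.
verdict = evaluate_change(before_days=[10, 12, 14], after_days=[8, 9, 10])
print(verdict)
```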

Say, haven’t I heard this all before?

Well, yes. You may have heard pieces of this from all sorts of places. The feature “slots” we were talking about before are a means to limit “work-in-progress” (WIP), and are often called kanban. The notion of continually adapting your process to improve it is a tenet of Scrum. Test-driven development and pair programming are methods from Extreme Programming (XP) for building in quality up front. Failure demand is sometimes called out as a form of technical debt, and the list goes on and on.

Hopefully what I’ve done here, though, without putting a name on any kind of methodology, is explain why all these things are good ideas (or are good ideas to try). Ultimately, practices won’t help unless they do one of three things:

  1. drive out queuing delay (RQ);

  2. reduce value-adding response time (RV); OR

  3. reduce failure demand (NF/R)

In general, the easiest way to do these for an organization is:

  1. reduce the number of things in-flight
  2. aggressively beat back failure demand by fixing root causes and building in quality up-front
  3. measure response (cycle) time and improve via process experimentation

Fortunately, all of those things are very, very easy to measure. If you can mark a request as either value or failure demand, if you can count the number of things in-flight, and if you can measure the time between starting something and shipping it, that’s all you need.

Update: See the next post on this topic for a more intuitive motivation of the theory presented in this article.