The Power of Visualizing Iterative Waterfall

Tags [ Agile, kanban, process improvement, Scrum, WIP ]

We’re going through a process mapping exercise at work just to try to understand how we get things done. We are running what I would describe as “scrumfall”: doing Scrum for development, but having that sit inside a traditional waterfall process. The waterfall is run iteratively and pipelined, although there is less true pipelining than most people think, because developers have to support upstream and downstream activities like backlog grooming and addressing bugs in QA. I thought I would work through the exercise to try to define the value stream that our features actually experience.

Stage        Definition
Recorded     user story appears on a backlog
Defined      user story has acceptance criteria and estimate
Prioritized  user story has been assigned a priority/rank
Committed    user story has been pulled into a sprint
Coded        user story has been marked “complete” in the sprint
Accepted     user story has been shown and accepted in a sprint review
Released     user story has been included in a versioned release
Tested       enclosing release has achieved an acceptable quality level
Approved     enclosing release has been approved for launch (go/no-go)
Deployed     enclosing release has been deployed to production

Several of Scrum’s standard meetings (plus some other common ones) show up here: backlog grooming moves stories from “recorded” to “defined”, sprint planning moves stories from “prioritized” to “committed”, daily scrum moves stories from “committed” to “coded”, and the sprint review moves stories from “coded” to “accepted”.

Having laid this out and thought about it, some observations:

  1. We ask our product owners and other stakeholders to sign off on a particular user story twice: once at the sprint review, and once at the go/no-go meeting.

  2. User stories that may well be production-ready upon reaching “coded” get batched with surrounding stories and thus become deployment-dependent on them, even though they may not be functionally dependent on them.

  3. Interestingly, even though good user stories are supposed to be independent of one another (the “I” in INVEST), we nonetheless batch them together into sprints and treat them as a unit.

  4. We don’t have a good way to understand what happens to stories that don’t get completed in a sprint, to bugs that are deemed non-launch-blockers, or to production incidents.

Another question, as we think about batch size, is whether a two-week sprint iteration is actually tied to any relevant process capability metric. Some of the interesting activities here are per-batch (per sprint or per release), taking roughly the same amount of time whether the batch contains one story or one hundred: sprint planning, the sprint review, a full regression pass, the go/no-go meeting, and the deployment itself. Other activities depend on the complexity of a particular story: grooming and estimating it, coding it, and addressing its bugs in QA.

Batching makes sense if the organization’s overall throughput bottleneck is a batch-size-independent step, in which case sizing the batch so that it runs in cadence with the cycle time of the bottleneck will maximize throughput. To make that more concrete, let’s say we only have certain deployment windows available and can only do a deployment once a week; if this is the slowest part of our process, then we should take batches of work sized so that the upstream steps produce a deployable release once a week. Or, if the slowest part is running a full regression of manual tests over three days, then again, we should take batches that can be finished in three days. Perhaps the product owner is only available once a month for sprint planning and sprint review; then we should batch at a month.
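To put toy numbers on that, here is a minimal Python sketch. Everything in it is invented for illustration (the coding rate, the weekly deployment window, and the weekly_throughput helper are assumptions, not our actual figures); it just assumes coding proceeds at a steady rate while releases can only ship during a deployment window:

```python
# Toy model with made-up numbers: stories get coded at a steady rate,
# but can only ship during a deployment window once per week.

CODING_RATE = 5     # stories completed per week (assumption)
DEPLOY_WINDOWS = 1  # deployment windows available per week (assumption)

def weekly_throughput(batch_size):
    """Stories shipped per week when each release holds `batch_size` stories."""
    weeks_to_fill_batch = batch_size / CODING_RATE
    # Releases can't go out more often than the deployment cadence allows.
    weeks_between_releases = max(weeks_to_fill_batch, 1 / DEPLOY_WINDOWS)
    return batch_size / weeks_between_releases

for size in (1, 3, 5, 10, 20):
    print(f"batch of {size:2d} stories -> {weekly_throughput(size):4.1f} shipped/week")

# batch of  1 -> 1.0, batch of  5 -> 5.0, batch of 20 -> 5.0:
# throughput tops out once the batch matches what upstream can
# produce per bottleneck cycle; bigger batches just add waiting.
```

Batches smaller than the bottleneck’s cadence waste deployment windows, while batches larger than it add cycle time without shipping anything extra.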

It might seem weird that your product manager’s calendar could be the bottleneck in your software development process, or that it makes sense to roll a release of completed work every three days, but that’s queuing theory for you. Optimizing an overall system’s throughput means organizing the work according to the current bottleneck’s constraints (even if that means non-bottleneck parts aren’t locally optimized) and/or moving the constraint elsewhere in the system (the Theory of Constraints).

Interestingly, putting the entire workflow up on a kanban board would make a lot of this very obvious, even if all we did was set WIP limits corresponding to obvious limitations (we can only deploy one release at a time, we can only test as many releases as we have QA environments, and so on). The great thing about kanban-style development is that you don’t have to change your process to start using it: you just model your current process, visualize it, and watch what happens. You probably already have most of the information needed to track the metrics that matter, although you may have to start writing down the times at which various emails pass through your system (when the release went out, when the new build got deployed to QA, and so on).
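To illustrate how little machinery that takes, here is a hypothetical Python sketch. The stage names come from the value stream table above, but the story, its timestamps, and the time_in_stage helper are all made up for illustration:

```python
from datetime import datetime

# Stages from the value stream table above, in order.
STAGES = ["recorded", "defined", "prioritized", "committed", "coded",
          "accepted", "released", "tested", "approved", "deployed"]

def time_in_stage(entered):
    """entered: dict mapping stage name -> datetime the story entered it.
    Returns how long the story sat in each stage before moving on."""
    durations = {}
    for stage, next_stage in zip(STAGES, STAGES[1:]):
        if stage in entered and next_stage in entered:
            durations[stage] = entered[next_stage] - entered[stage]
    return durations

# Hypothetical story, with timestamps jotted down from emails/board moves.
story = {
    "recorded":    datetime(2011, 5, 2),
    "defined":     datetime(2011, 5, 9),
    "prioritized": datetime(2011, 5, 13),
    "committed":   datetime(2011, 5, 16),
    "coded":       datetime(2011, 5, 25),
}

for stage, spent in time_in_stage(story).items():
    print(f"{stage:>11}: {spent.days} days")
```

Aggregate those per-stage durations across a few dozen stories and the bottleneck stage tends to announce itself.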

However, to me, the most powerful reason to start visualizing the flow is that it shows you exactly which parts of your process you should change, and when. There’s nothing like being able to show a product manager that their availability is driving overall throughput to encourage them to spend more time with the team, or being able to show a development manager that the time spent on bugfixing rework in QA is the bottleneck, which encourages practices like TDD. In other words, you can make an empirical case for Agile practices that aren’t currently in place, and then show that they worked. This is a good way to bring about an Agile evolution grounded in facts relevant to the current organization, not just in opinion or philosophy.