Problem in a nutshell: Work is often split before being developed by a team. The count of work completed is post-split; the count of work still in the backlog is pre-split. If the rate of completion (or velocity) doesn’t account for this split rate, 2x or 3x errors often occur in any forecast using this data. Put simply, forecasts that don’t account for split rate predict a finish that is too early by 2x or 3x, a major miss. (The actual error will depend on how granular your original backlog is, but we see split rate averages of 2.5.)

I see this issue in just about every tool vendor’s attempt at Monte Carlo forecasting using data. They compute the throughput rate from the count of items completed, then run a Monte Carlo forecast against the count of items remaining in the backlog. This is a serious error. The two counts are essentially different units of measure.

Definition: Split Rate – The ratio by which backlog work grows whilst being developed. Expressed as a low estimate to high estimate range. For example, 2 means that for each one item in the original backlog, there are two in the completed item list as counted throughput.
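To make the unit mismatch concrete, here is a minimal sketch of the arithmetic. The function name and the inputs (a 100 item backlog and 10 completed items per week) are illustrative assumptions, not data from a real project; the 2.5 split rate is the average mentioned above.

```python
def weeks_to_finish(backlog_items, throughput_per_week, split_rate=1.0):
    """Weeks needed when each backlog item becomes `split_rate` completed items."""
    # Convert the pre-split backlog into post-split units before dividing,
    # so the backlog and the throughput are in the same unit of measure.
    post_split_items = backlog_items * split_rate
    return post_split_items / throughput_per_week


# Hypothetical project: 100 backlog items, 10 completed items per week.
naive = weeks_to_finish(100, 10)                     # ignores splitting
adjusted = weeks_to_finish(100, 10, split_rate=2.5)  # accounts for splitting
print(f"naive: {naive} weeks, split-adjusted: {adjusted} weeks")
```

The naive figure divides pre-split items by a post-split rate, which is where the 2x to 3x early forecast comes from.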

Rough analogy: Pilots are aware that their speed over the ground is impacted by head-wind (or tail-wind). They can be flying at 100 knots indicated air speed, and with no headwind they would be traveling 100 knots over the ground. But if they were traveling into a 25 knot headwind, they would only travel 75 knots over the ground even though their instruments might say 100 knots air-speed. If you are a passenger on this plane, hope they added about a third more fuel (the same distance at 75 knots takes 100/75 of the time). Work items splitting in projects suffer the same issue. We often need to use more fuel (do more) in order to deliver all of our planned work (I’m inventing delivery speed to represent this). Our air-speed might read high because we split items into two or three before working on them (we record 2 units of air speed for every 1 unit of delivery speed), or we have defect rework (we make no delivery progress, but we count effort as air-speed). For us to be able to forecast a project, we need to know the ratio of air-speed to delivery progress. We call this the work item split rate.

Estimating the Split Rate Distribution

Splitting is anything that makes a single item in the backlog become two or more items in the completed list. It won’t be the same for every item. Some items won’t be split at all, and some will be split many times. For reliable forecasting, it’s just as important to estimate the range of split rate as it is the range of story counts. Using an average will cause the same Flaw of Averages errors for split rate as for any other random input into a model, so our tools use a probability range, most often a simple uniform distribution between a low and high estimate.
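As a rough illustration of why the range matters, here is a minimal Monte Carlo sketch that samples the split rate from a uniform distribution on each trial. It is not the logic of the spreadsheet tools; the function names, the backlog range of 40 to 60 items, and the weekly throughput samples are all made-up assumptions.

```python
import random


def simulate_weeks(backlog_low, backlog_high, split_low, split_high,
                   weekly_throughput_samples, rng):
    """One trial: weeks to finish a backlog that splits as it is worked."""
    backlog = rng.randint(backlog_low, backlog_high)  # pre-split item count
    split_rate = rng.uniform(split_low, split_high)   # uniform split rate
    remaining = backlog * split_rate                  # post-split items
    weeks = 0
    # Throughput samples are assumed to be positive, so the loop terminates.
    while remaining > 0:
        remaining -= rng.choice(weekly_throughput_samples)
        weeks += 1
    return weeks


def forecast(trials=10_000, seed=1, **inputs):
    """Run many trials and report the 50th and 85th percentile week counts."""
    rng = random.Random(seed)
    results = sorted(simulate_weeks(rng=rng, **inputs) for _ in range(trials))
    return {"p50": results[trials // 2], "p85": results[int(trials * 0.85)]}


# Hypothetical inputs: 40 to 60 backlog items, five past weekly throughput counts.
history = [6, 8, 9, 11, 12]
with_range = forecast(backlog_low=40, backlog_high=60,
                      split_low=1.0, split_high=3.0,
                      weekly_throughput_samples=history)
with_average = forecast(backlog_low=40, backlog_high=60,
                        split_low=2.0, split_high=2.0,
                        weekly_throughput_samples=history)
print("uniform 1 to 3:", with_range)
print("fixed average 2:", with_average)
```

Both runs use the same average split rate, but fixing it at the average hides the upper tail, so the 85th percentile comes out earlier than when the full range is sampled.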

Consider what factors will cause splitting in your context. Here is my list of “Things that cause splitting or scope growth” –

  1. Defects. Often a defect is reported when doing the work. If defects are counted in throughput or velocity, then they need to be compensated by increasing split rate. My suggestion is to NOT count defects in throughput or velocity records. They still occur, but our rate of progress is pure “delivered value work.”
  2. Splitting. Work when looked at by the team gets split into multiple smaller parts. This won’t happen for small items in the backlog, but it will happen with the bigger items. The distribution of higher split rates should correlate in some way to work item size estimates (mostly). I’m not saying estimate, but a quick Small, Medium or Large might help identify how many items might be split.
  3. Discovered Work. By doing work, new work is often discovered. This growth will often be higher at the beginning of a project, and taper off later. Don’t be alarmed if it’s huge to begin with. My rule of thumb is it will decrease by half at the half-way point of the project, and halve again by the last third. I measure for the first half of the project by recording what the initial backlog count was at the start of the project, and at the half-way point (half the original scope completed). The ratio will be equal to the original count / (count in the completed list – defects).

At the start of a project where no data exists, you still need an estimate to do planning. My rule of thumb without any data is a uniform range of 1 to 3. Here is my process to confirm this is close enough until real data flows –

  1. One is my go-to low-bound. I know that some items will not be split at all. If it’s a major investment, I get the team to estimate 10 items from the backlog at random. I expect 3 to 4 of them to be 1 point stories or “small.” If it’s fewer than that (zero, one or two), I set my low bound estimate split rate to 2.
  2. Three is my go-to high-bound. Again, if the investment is major, I look at the 10 random estimates I got the team to do and expect that 3 to 4 items will be 8+ or “large.” If more are large, I bump up my high split rate to 4 or 5. Often this has significant forecast impact, so I need to confirm. I pick a few of the “large” stories and get the team to split them. I take the average of the test splits.

If done well, this takes less than an hour and gives a solid grounding of the likely split rate. Start with the base rate of 1 to 3, and adjust based on your context.
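The two checks above can be sketched as a small helper. The thresholds come from the process described; the function name, the "S"/"M"/"L" labels, and the choice to use the average of the test splits directly as the high bound (falling back to 4 when no test splits exist) are my assumptions for illustration.

```python
def starting_split_range(sizes, test_splits=None):
    """Suggest a (low, high) split rate range from 10 randomly sampled items.

    sizes: quick size labels for the sampled items, "S", "M" or "L".
    test_splits: optional counts from trial-splitting a few large items.
    """
    small = sum(1 for size in sizes if size == "S")
    large = sum(1 for size in sizes if size == "L")

    # Fewer than 3 small items in the sample: raise the low bound from 1 to 2.
    low = 2 if small < 3 else 1

    # More than 4 large items: raise the high bound above the default of 3.
    high = 3
    if large > 4:
        if test_splits:
            # Assumption: the average of the test splits becomes the high bound.
            high = sum(test_splits) / len(test_splits)
        else:
            high = 4
    return low, high


# Hypothetical sample with a typical mix, then one dominated by large items.
print(starting_split_range(["S", "S", "S", "M", "M", "M", "M", "L", "L", "L"]))
print(starting_split_range(["S", "M", "M", "M", "M", "L", "L", "L", "L", "L"],
                           test_splits=[4, 5, 6]))
```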

When real data starts flowing, pay particular attention to splits. Look to make your estimate range match actuals. Beware of the early outliers though. Sometimes the early work is the most technically risky, and high splits are common. Exclude anything that happens just once in every ten items.
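One way to turn observed splits into a low and high estimate is sketched below. It reads “once in every ten items” as “a split count seen in 10% or fewer of the completed backlog items,” which is my interpretation; the function name and the sample data are hypothetical.

```python
from collections import Counter


def observed_split_range(splits_per_item, rare_fraction=0.1):
    """Return (low, high) split counts, ignoring values seen too rarely.

    splits_per_item: for each original backlog item, how many completed
    items it turned into (1 means it was not split).
    """
    counts = Counter(splits_per_item)
    total = len(splits_per_item)
    # Keep only split counts that occur in more than `rare_fraction` of items.
    kept = [value for value, n in counts.items() if n / total > rare_fraction]
    if not kept:
        # Every value is rare (very little data), so fall back to all of them.
        kept = list(counts)
    return min(kept), max(kept)


# Hypothetical first ten backlog items; the single 8-way split is an outlier.
observed = [1, 1, 2, 2, 2, 3, 3, 1, 2, 8]
print(observed_split_range(observed))
```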

Also beware of the “things” that make splitting look bigger than it is –

  1. Some work is abandoned. After the team splits the work, a story or two are left at very low priority and shipping occurs without them. I’ve seen this as high as one item for every piece of original backlog. If I see this, I drop 1 from the high estimate.
  2. Minor defects where the risk of fix is higher than leaving them in place.
  3. Carried over stories. Sometimes work is closed at the end of a sprint and a new story created. This is an important thing to consider properly, but make sure you realize it’s happening. If forecasting using cycle-time rather than throughput, this is already factored into the model, so counting these would double-count.

Split rate is a complex beast to consider. We have included it as an input to our free spreadsheet forecasting tools (Single Feature Forecaster: I want to forecast how long a single feature may take and Multiple Feature Cut Line Forecaster: I want to forecast multiple features at once) and see that many vendors in the industry haven’t considered this issue to the peril of their customers.