Traffic Light Simulator – Learn how delays compound into cycle time “shapes”

Posted in Featured, Forecasting, Reference

An early paper I wrote on the economic impact of process decisions (see http://focusedobjective.com/paper-the-economic-impact-of-software-development-process-choice-cycle-time-analysis-and-monte-carlo-simulation-results/) attempted to outline how cycle time is impacted by development process choice and how that has altered over the years from the early 1990s to the current day. I often get asked why this is useful, and decided to take a shot at providing a teaching framework for helping others understand it, and why they should care.

I just released the Traffic Light Simulator. This spreadsheet shows the eventual travel time for cars traveling along a commute to work that has five traffic signals. By varying the probability of hitting each light, and the delay time incurred if that happens, the simulator shows a histogram of how many cars achieve each of the possible travel times. It also shows a detailed map of all of the possible situations cars may encounter. The lucky ones get all green lights, the unlucky get all red.

Get the Traffic Simulator Spreadsheet here. (plain Excel spreadsheet)

Set the probability of up to 5 delays and see the impact on the cycle-time (travel time) distribution

Understand how many cars are impacted by red and green traffic signals, and how this plays out into different probabilities

Exercises to learn how different process decisions about delays impact cycle time

Many real-world processes, like car travel, follow this general pattern: a general amount of expected time if things go well, plus a number of possible delays. Possible delays are expressed as probabilities, from 0% (never) to 100% (always). Software development work is one such process. Work we undertake has a hands-on time, plus a number of possible circumstances that slow that work down. By understanding how delays cascade into eventual cycle time, we can make smarter decisions about which improvement ideas are more likely to work than others.
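To make the pattern concrete, here is a minimal Monte Carlo sketch of the same idea in Python. This is not the spreadsheet's implementation; the base time, light probabilities and delay lengths are made-up illustration values.

```python
import random
from collections import Counter

random.seed(42)

def simulate_commute(base_minutes=20, lights=((0.5, 2),) * 5, trials=10_000):
    """Monte Carlo a commute past several traffic lights.

    Each light is a (probability_of_red, delay_minutes) pair. A car's travel
    time is the base driving time plus the delay of every red light it hits.
    """
    travel_times = []
    for _ in range(trials):
        time = base_minutes
        for p_red, delay in lights:
            if random.random() < p_red:  # this car hit a red light
                time += delay
        travel_times.append(time)
    return travel_times

times = simulate_commute()
print(sorted(Counter(times).items()))  # histogram: how many cars saw each travel time
print(min(times), max(times))          # all greens vs. all reds
```

Varying the per-light probability and delay in this sketch reshapes the histogram in the same way the spreadsheet's inputs do.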

This is an active area of exploration for me in 2017. My hypothesis is that given just the evident cycle time distribution currently exhibited by teams, the process can be identified. This spreadsheet has five other hypotheses, and I’m interested in hearing reasons why they are wrong.

For now, I'm just starting to fill the spreadsheet with interesting exercises. There are two at the moment. One gets you to find the delay probabilities that cause an Exponential distribution, common to service and operations teams. The second exercise gets you to define delay probabilities that cause a moderate Weibull distribution, common to software development teams.

Exponential style distribution – common for operations and support teams

Weibull style distribution – common for software development teams

Learning why cycle time distributions match either an Exponential or a skewed Weibull gives solid evidence about which improvement factors might work. For example, if the distribution looks like a skewed Weibull, it's likely that the estimates of effort WILL NOT correlate to the cycle time. This is because the process is dominated by delays, and the amount of time spent actually hands-on the work is minor in comparison to idle time. Solving the reasons for the delays is the best avenue for improvement. If the current cycle time distribution is Exponential, then work time dominates the cycle time. Automation and more people are the better ways to decrease cycle time and increase throughput in these cases.
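If you want to check which shape your own cycle time data most resembles, a rough sketch using scipy could look like the following. The cycle times below are made-up; replace them with your own, in days.

```python
import numpy as np
from scipy import stats

# Replace with your own cycle times in days; these are illustration values only.
cycle_times = np.array([1, 2, 2, 3, 3, 3, 4, 5, 6, 8, 9, 13, 21])

# Fit both candidate distributions, anchoring the minimum at zero.
expon_params = stats.expon.fit(cycle_times, floc=0)
weibull_params = stats.weibull_min.fit(cycle_times, floc=0)

# Compare log-likelihoods: the higher value is the better-fitting shape.
ll_expon = stats.expon.logpdf(cycle_times, *expon_params).sum()
ll_weibull = stats.weibull_min.logpdf(cycle_times, *weibull_params).sum()

print(f"Exponential log-likelihood: {ll_expon:.1f}")
print(f"Weibull     log-likelihood: {ll_weibull:.1f}")
print(f"Weibull shape parameter:    {weibull_params[0]:.2f}")  # near 1 means exponential-like
```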

Get the Traffic Simulator Spreadsheet here. (plain Excel spreadsheet)

It is early days, and I'm sure there are more insights. I'm hoping you'll join me on that journey.

Troy

 


Can Estimation Be Harmful?

Posted in Featured, Forecasting

There is significant debate about whether estimates are waste. There is too little debate about whether (more correctly, when) they are misleading.

When asked the questions, "Are estimates waste? Are they harmful?", my answers are "Sometimes, and sometimes." Absolute answers of never or always are dangerous. What determines a definitive yes or no are the pre-conditions required to sway the balance one way or the other. This post is about what pre-conditions make estimates useful and beneficial, and conversely, what pre-conditions make estimates not just wasteful but misleading. This is all very new material, and likely not correct! I want the conversation to start.

NOTE: Nothing in this article says you should stop or start estimating or forecasting. This article looks at the reasons why you should trust an answer given by ANY forecasting technique or tool. If it's working, keep doing it until you find something cheaper that works just as well.

Why are size estimates used?

When Story Point estimates are used for forecasting the future delivery date or timeframe, a sum of Story Points is converted into calendar time. Most often this is done by dividing the sum of unfinished work by an average velocity (the sum of completed points over a period of time, a sprint for example).

The same transformation occurs for people using Story Counts (no size estimate of each item is attempted other than splitting, just a count of items). In this technique, the count of unfinished items is divided by the average count of items finished in some period of time (a week for example).

There really isn't a massive difference. Each technique is a pace-based model for converting an amount of remaining work into calendar time, by simple division by some measure of pace. If you have used a burn-down chart, burn-up chart or cumulative flow chart to extrapolate how much longer the work will take, then you have seen how ongoing progress is used to convert a unit of unfinished work into how long in calendar time that work will take to complete.
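In code, both techniques reduce to the same division. Here is a quick sketch with made-up numbers:

```python
# Pace-based forecast: remaining work divided by average pace per period.
# All numbers below are illustrative.

remaining_points = 120                         # sum of story points not yet finished
velocity_per_sprint = [23, 18, 25, 21]         # completed points in recent sprints
sprints_left = remaining_points / (sum(velocity_per_sprint) / len(velocity_per_sprint))

remaining_items = 48                           # count of stories not yet finished
throughput_per_week = [5, 7, 4, 6]             # completed stories in recent weeks
weeks_left = remaining_items / (sum(throughput_per_week) / len(throughput_per_week))

print(f"Story points: about {sprints_left:.1f} sprints remaining")
print(f"Story counts: about {weeks_left:.1f} weeks remaining")
```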

Given that background, this article will assume that the goal of software estimates is to convert "size" into "calendar time" – this is true whether using Story Points or Story Counts. Sure, there are other uses for estimates, but the purpose of this post is to discuss whether estimates can cause poor decisions and why.

The six requirements for estimates to be useful/reliable time forecasters

I commonly see six main reasons that cause estimates to degrade as useful proxy measures for converting size into time. The six are –

  1. Estimable items: The items under investigation are understood and can be accurately sized in effort by the team (who have the knowledge to estimate this work)
  2. Known or estimable pace: The delivery pace can be accurately estimated or measured for the duration of the work being delivered
  3. Stable Estimate and Time Relationship: There is a consistent relationship between effort estimate and time
  4. Stable size distribution: The distribution of item sizes doesn't change and is consistent over time
  5. Dependent delays are stable: Delays inherent in the work, which could possibly be known in advance, don't change
  6. Independent delays are stable: Delays not due to the item itself but to other factors, like waiting for specialist staff, don't change

It’s unlikely any software development system of complexity fully satisfies all six assumptions. Small deviations from these assumptions may not matter.

How small is small enough to not matter? This is an area where too little research has taken place. We know it occurs: some teams report managing to hit estimates, others report failing. A way to know in advance whether the odds are stacked against estimates being a reliable predictor is needed.

Note that five out of the six reasons have nothing to do with the estimated items themselves; they have to do with the delivery system and environment.

This is an important point – even if the estimates themselves are PERFECT, they still may not be good predictors of calendar time.

For some contexts common in larger Government Aerospace and Defense projects, most of these assumptions are covered through rigorous analysis, which is why estimates are seen to be of benefit. In other contexts, teams are asked to give estimates when all six assumptions are violated. These teams are right to assume estimates are waste.

I want teams to be able to say that the estimates aren't just waste but are misleading, and to have the evidence to prove that.

Toward this ambition, I'm working on simple diagnostic worksheets to determine how likely it is that your estimates are impacted by these factors. The goal is to show which system areas would give the biggest bang for the buck if you wanted to use some unit of size estimate for future calendar time forecasts. If we need to use calendar time in decision making (not saying we always need to, but sometimes we do), then let's understand how exposed we are to giving a misleading answer even given due rigor.

Please vigorously attack these ideas. Here is what I want –

  • I want to move the conversation away from waste into usefulness.
  • I want people to understand that similar poor assumptions will apply to story count forecasting techniques, and to know when.
  • I want people to go one level deeper on the Never Works / Always Works arguments into the contexts that cause this to happen.
  • I want to learn!

Troy

Disclaimer: I strongly AVOID story point estimates for forecasting in ISOLATION. I use throughput (delivered story counts over time) primarily, BUT USE story points and velocity as a double check at least once every three months. So, I work fluently in both worlds and think you should never throw away a double check on your mathematics until it's too costly for the benefit it provides. I also think that, for the part the team is responsible for, they can get better at it – estimation is a skill worth learning.


Doing Team Metrics Right

Posted in Featured, Reference, Tools

It seems to be common knowledge that measuring teams and presenting data is an Agile dysfunction. I disagree. But I can see how that happens, and I have participated in abusive metric relationships in the past. I think we need to discuss better ways of achieving an evidence-based Agile approach, without those participating feeling (or being) abused.

Here is my top five list of traits that make metric dashboards useful –

  1. Measure competing things – it's relatively easy to game a single metric, so it's important to measure the impact of moving one metric by showing the others. Help teams target moving one metric and observe any negative impacts on the others.
  2. Make informed and smart trades – trading something the team is better at than other teams in similar circumstances for something they desire to improve. Help teams identify which metric category they could trade (be less good at) to raise another metric (become better).
  3. Trends, not numbers, are important – observe unintended drifting of metric averages over time. It's about understanding that something has changed, not how good or bad it is. Help teams react earlier to the often slow-moving regression of a metric or two. The earlier a drift is detected, the less effort correction takes.
  4. Look for global or local trends – comparing trends across teams is key to spotting system-level opportunities (every team is impacted) versus single-team opportunities. Help teams target improving things they can change without fighting system-level factors they are unlikely to solve.
  5. No team will be good at everything – if a team is adversely trending on one metric, point out they are above average on another. Pick competing metrics so that no team will be great or terrible at all of them. There will always be a mix.

This list borrows heavily from the work of Larry Maccherone who correctly observed that a balanced blend of metric types gives the most information for identifying trends and improvement opportunities. His advice is to measure at least one thing from four broad areas –

  1. How much
  2. How well
  3. How responsive
  4. How repeatable or reliable

An implementation of this was recently made available in spreadsheet form. Driven from work item Start date, Completed date and Type, the spreadsheet builds a dashboard page in Excel. The choice of the four metrics was somewhat based on experience, and there are plenty of alternatives that might fit your context better. The advice stands though: pick a metric from each of the four areas.
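For illustration, here is a small pandas sketch of the kind of raw measures that can be derived from just those three fields. The column names and dates are assumptions for the example, not the spreadsheet's actual layout.

```python
import pandas as pd

# Hypothetical column names and example dates; the spreadsheet's own layout may differ.
items = pd.DataFrame({
    "start":     pd.to_datetime(["2017-01-02", "2017-01-03", "2017-01-09", "2017-01-10"]),
    "completed": pd.to_datetime(["2017-01-06", "2017-01-12", "2017-01-13", "2017-01-20"]),
    "type":      ["Story", "Defect", "Story", "Story"],
})

# Responsiveness: cycle time in days per item.
items["cycle_time_days"] = (items["completed"] - items["start"]).dt.days

# Productivity: throughput per week (count of completed items).
throughput = items.set_index("completed").resample("W").size()

print(items[["type", "cycle_time_days"]])
print(throughput)
```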

To help roll out the dashboard, we have created a single-page cheat-sheet to educate the teams on what each metric means and what to expect if that metric is overdriven. The goal is to be stable in all four, not excessively good at any one.

Download the spreadsheet by clicking here

Download the cheatsheet documentation here

As always, I love feedback and stories about how this spreadsheet has worked in the real world. We want to improve it over time.
Troy


Is Agile Anti-Science? Not yet, but trending that way.

Posted in Featured

There are many engagements where I work alongside very smart people, from leading coaches and trainers in the Agile world, to smart teams committed to delivering quality products that solve customer problems. In the trenches there is a constant feeling of improvement and a curiosity for doing better tomorrow what ailed us today, and a constant enquiring mind asking "why?"

I don't see the same vigor in Agile conferences. I see a narrowing of the ideas presented. I see risk-averse programs that cater to a simplistic mass message.

This is a similar predicament to the one technical journals in other fields have faced for years. A huge bias exists for publishing experiment results where an outcome was positive and expected, with rare publication of studies that failed to show expected results (ironically, there is often more, or at least equal, learning in failures). The pressure to cater to readers' preference for articles from well-known luminaries, versus taking a risk on the currently unknown, often polarizes the work toward the "old way." Not to mention the commercial pressure of advertising and sponsorship concerns, which shouldn't influence editorial decisions, but survival depends on making sure sponsors get value for money and continue their support.

The dumbing-down process starts silently. Commercial frameworks stifle innovation and polarize messages. Add certification, and you accelerate that stifling of new ideas freely emerging. This is a sure-fire way to extinction.

To avoid this plague, here are a few suggestions for balancing a conference program –

  1. Blind submission process – hard to do in reality, but most academic programs are built without the author's name or affiliation visible. The topic is discussed at length, not whether the submitter is too well known to knock back.
  2. Conferences should publish in advance the allocation of subjects they want covered. This is crudely done at some conferences by having tracks, but even within a track they should state the percentage of topic allocation they want, e.g. 20% ideas for managing dependencies, 20% ideas for creating safety in teams.
  3. The abstract should be brief during the submission process and then, upon acceptance, constructed into what the track chair and program desire in collaboration with the speaker (just as an editor at journals and book publishers commonly does).
  4. For abstracts that are important but come from a first-time speaker, pair the science expert with a luminary speaker and have them co-present or work together. TED talks have shown that, given coaching, ANYONE PASSIONATE about a topic can make a compelling talk.
  5. At the start of the conference, each track chair should present for 10 minutes about the program they have assembled, and help the attendees understand why they should attend each talk. Often the abstracts are too abstract for people to bother reading, and important sessions don't get attended because people don't know what they are about. A good topic title wins out over good content every time.

These are just a few ideas. I want to keep the Agile community vibrant and on a quest for learning. I think Agile conferences are a leading indicator of how new ideas might be lost, and I want to avoid that. Not every conference is bad, but some are.

Troy.

 

 


Top Ten Data and Forecasting Tips

Posted in Featured, Forecasting

Here is a list of the top 10 tips I find myself giving out. It's not in any particular order of importance, just the order they came to my head. It's a long weekend, so writing things down helps me relax. I would love to hear yours, so please add them to the comments.

1. If two measures correlate, stop measuring the one that takes more effort. E.g. if story count forecasts correlate with story point forecasts, stop estimating story points and just count.

2. Always balance measures. Use at least one measure in each of the following four domains: Quality (how well), Productivity (how much, pace), Responsiveness (how fast from committing), Predictability (how repeatable). (That one is Larry Maccherone's.)

3. Measure the work, not the worker. Favor flow of value over how busy people appear. It's also less advantageous to game, giving a more reliable result in the long run. Measuring (and embarrassing) people causes poor data.

4. Look for exceptions, don’t just explain the normal. Find ways to detect exceptions in measures earlier. Trends are more insightful than individual measures for seeing exceptions.

5. Capture, at a minimum: 1 – the date work was started, 2 – the date it was delivered, and 3 – the type of work (so we can see whether an item is normal within the same type of work).

6. Scope Risks play a big role in forecasts. Scope Risks are things that might have to be done, but we aren't sure yet. Track items that might fail and need reworking, for example server performance criteria or memory usage. Look for ways to detect these earlier and remove them. Removing them isn't the goal – knowing whether they will definitely occur adds more certainty to the forecast.

7. Don't exclude "outliers" without good reason. Have a rule, for example 10 times the most common value. Often these are really multiple other things that haven't been broken down yet, so they can't be ignored.

8. Work often gets split into smaller pieces before delivery. Don't use the completion rate as the forecast rate for the "un-split" backlog items. Adjust the backlog by this split rate. 1 to 3 splits per item is the most common rate for software backlogs (but measure your own and adjust – see the sketch after tip 10).

9. If work sits idle for long periods waiting, then don't expect effort estimates for an item to match calendar delivery time. In these cases, forecast system throughput rather than item sizes (story points).

10. Probabilistic forecasting is easier than most people expect. If averages are used to forecast (as in traditional burn-down charts), then the chance of hitting the date they give is about 50% – a coin toss. Capture historical data, or estimate in ranges, and use that instead.
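As a rough sketch of tips 8 and 10 together (every number below is illustrative): adjust the backlog by the split rate, then resample historical throughput instead of projecting an average.

```python
import random

random.seed(7)

backlog_items = 60                              # items on the "un-split" backlog
split_rate = 2.0                                # assumed average splits per item (measure your own)
weekly_throughput = [4, 7, 3, 6, 5, 8, 4, 6]    # historical samples of finished items per week

remaining = backlog_items * split_rate          # tip 8: adjust backlog by the split rate

# Tip 10: simulate many possible futures by resampling history (Monte Carlo).
def weeks_to_finish(remaining_work):
    weeks, done = 0, 0.0
    while done < remaining_work:
        done += random.choice(weekly_throughput)
        weeks += 1
    return weeks

outcomes = sorted(weeks_to_finish(remaining) for _ in range(5_000))
average_based = remaining / (sum(weekly_throughput) / len(weekly_throughput))

print(f"Average-based forecast: {average_based:.1f} weeks (about a 50/50 chance)")
print(f"85th percentile:        {outcomes[int(0.85 * len(outcomes))]} weeks")
```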


Do Story Size Estimates Matter? Do your own experiment

Posted in Featured, Forecasting

This is one of the most common questions I receive when introducing forecasting. Don’t we need to know the size of the individual items to forecast accurately?

My answer: Probably not.

It depends on your development and delivery process, but often system factors account for more of the elapsed delivery time than different story sizes.

Why might story point estimation NOT be a good forecaster?

Consider commuting to work by car each day. If the road is clear of traffic, then the distance travelled is probably the major cause of travel time. At peak commute time, it's more likely that weather and traffic congestion influence travel time more than distance alone. For software development, if one person (or a team) could get the work and be undisturbed from start to delivery of a story, then story point effort estimates would correlate with and match elapsed delivery time. If there are hand-offs to people with other specialist skills, dependencies on other teams, expedited production issues to solve or other delays, then the story size estimate will diverge from elapsed delivery time.

The ratio between hands-on time and total elapsed time is called "process efficiency." For software development this is often between 5% and 15%. Meaning that even if we nailed the effort estimates in points, we would be accurately predicting only 5-15% of the elapsed delivery time! We need to find ways to accurately forecast (or remove) the non-work time influenced by the entire system.

This is why using a forecasting technique that reflects the delivery performance of the whole system for actual delivered work is necessary for forecasting elapsed time. To some degree, traditional story point "velocity" does represent a pace that includes process efficiency, but it has very little more predictive power than story counts alone. So, if you are looking for an easy way to improve process efficiency, dropping the time staff spend on estimation might be a good first step.

Running your own experiment

You should run your own experiment. Prove in your environment whether story point estimates and velocity perform better than story counts and throughput for forecasting. The experiment is pretty simple: go back three months and see which method predicts the actual known outcome today. You can use our forecasting spreadsheets to do this.

  1. Download the forecasting spreadsheet Throughput Forecaster.xlsx
  2. Make two copies of it, calling one "Velocity Forecast.xlsx" and the other "Throughput Forecast.xlsx"
  3. Pick a prior period of time, say 3 months, and gather the following historical data –
    1. Number of completed stories per sprint or week: a set of 6 to 12 throughput samples.
    2. Sum of story points completed per sprint or week: a set of 6 to 12 velocity samples.
  4. In each spreadsheet enter the known starting date, the historical data for throughput or velocity, and the sum of all samples (a total of ALL completed work over this period) as the starting story count or story point total (in the respective spreadsheet).
  5. Confirm which method predicted closest to the known completion date.

This experiment is called backtesting. We are using a historical known outcome to confirm that our forecasting tool and technique hits something we know to have occurred.
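The same backtest can also be sketched outside the spreadsheets. All the sample data below is hypothetical; a real run would use your team's own numbers for the last 13 or so weeks.

```python
import random

random.seed(1)

# Hypothetical data gathered for the last 13 weeks (about 3 months).
weekly_story_counts = [5, 8, 4, 6, 7, 5, 9, 4, 6, 5, 7, 6, 5]
weekly_velocity     = [21, 34, 13, 27, 30, 20, 38, 18, 25, 22, 29, 26, 19]

actual_weeks = len(weekly_story_counts)      # what really happened: 13 weeks
total_items  = sum(weekly_story_counts)      # the "backlog" for the backtest
total_points = sum(weekly_velocity)

def forecast_weeks(samples, remaining, trials=5_000):
    """Median Monte Carlo forecast: resample history until the work is done."""
    results = []
    for _ in range(trials):
        weeks, done = 0, 0
        while done < remaining:
            done += random.choice(samples)
            weeks += 1
        results.append(weeks)
    return sorted(results)[len(results) // 2]

print("Actual:              ", actual_weeks, "weeks")
print("Throughput forecast: ", forecast_weeks(weekly_story_counts, total_items), "weeks")
print("Velocity forecast:   ", forecast_weeks(weekly_velocity, total_points), "weeks")
```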

If performed correctly, both spreadsheets will be accurate. Given that, is the effort of story point estimation still worth it?

Troy


Forecasting techniques – effort versus reward

Posted in Featured, Forecasting

Why should I use probabilistic forecasting? This is a common question I have to answer with new clients. I always use and recommend the simplest technique that answers the specific question being asked, and progress in complexity only when absolutely necessary. I see forecasting capability in five stages of incremental improvement, each coming at an effort cost. Here is my simple 5-level progression of forecasting techniques –

forecasting levels of capability

Level 1 – Average regression

Traditional Agile forecasting relies on using a running average and projecting that average out over time for the remaining work being forecast. This is level 1 on our capability measure. Does it work? Mostly. But it does rely on the future pace being similar to the past, and it suffers from the Flaw of Averages (read about it in the book The Flaw of Averages by Sam Savage). The flaw of averages is the terminology that covers errors in judgement made because a single value is used to describe a result when the result is actually many possible outcomes, each with a higher or lower probability. When we project the historical average pace (story point velocity or throughput), the answer we calculate has around a 50% chance – a coin toss away from being late. We often want better odds than that when committing real money and people to a project.
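A small simulation, with made-up throughput numbers, shows why the average-based answer is roughly a coin toss:

```python
import random

random.seed(3)

remaining_items = 120
weekly_pace = [3, 9, 4, 8, 5, 7, 6, 2, 10, 6]   # historical throughput samples (illustrative)

# Level 1: project the running average forward.
average_weeks = remaining_items / (sum(weekly_pace) / len(weekly_pace))   # 20 weeks

def weeks_to_finish():
    weeks, done = 0, 0
    while done < remaining_items:
        done += random.choice(weekly_pace)   # each simulated week repeats a past week's pace
        weeks += 1
    return weeks

trials = 10_000
on_time = sum(weeks_to_finish() <= average_weeks for _ in range(trials))
print(f"Average-based forecast: {average_weeks:.0f} weeks")
print(f"Chance of actually finishing by then: {on_time / trials:.0%}")   # roughly a coin toss
```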

flaw of averages 1

flaw of averages 2

Level 2, 3, and 4 – Probabilistic Forecasting

Probabilistic forecasting returns a fuller range of possibilities, allowing the likelihood of a result to be calculated. In the software forecasting world, this is normally "on or before date x." In a probabilistic forecast we look at what percentage of all the possible results we calculated were actually "on or before date x." This allows us to say things like, "We are 85% certain to deliver by 7th August."

Probabilistic forecasting relies upon the input parameters being non-exact. A simple range estimate like 1 to 5 days (or points, or whatever unit pace is measured in) for each of the remaining 100 items is enough to perform a probabilistic forecast. It's the simplest probabilistic model and gets us to level 2 in our capability. The goal is simply that the eventual actual result for an item falls between 1 and 5 days. Our spreadsheet tools use this technique when estimates are set to "Range estimate" (download it here).
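A minimal sketch of that level-2 idea (not the spreadsheet's actual formula, and it simplifies by treating the 100 items as delivered one after another):

```python
import random

random.seed(5)

items_remaining = 100
low, high = 1, 5     # range estimate in days per item

def one_possible_future():
    # Draw a duration for each remaining item from the range and sum them.
    return sum(random.uniform(low, high) for _ in range(items_remaining))

futures = sorted(one_possible_future() for _ in range(10_000))
print(f"50% likely within {futures[len(futures) // 2]:.0f} days")
print(f"85% likely within {futures[int(0.85 * len(futures))]:.0f} days")
```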

Levels 3 and 4 are more refined range-estimate forecasts. Level 3 specifies a probability distribution that lets you say whether part of the range estimate is more likely than another. Low-Most Likely-High estimates are this type of distribution. It helps firm up the probabilistic forecast by giving preference to some range-estimate values based on our knowledge of the work. Over the years different processes have demonstrated different distribution curves; for example, manufacturing often shows a bell curve (Normal distribution) and software work shows a skewed distribution where the lower values are more likely than the long tail of higher values. This allows us to take a good "guess", given what we know about which values are more likely, and encode this guess in our tools. It is more complex, and to be honest, we only use it after exhausting a straight range estimate and proving an input factor makes a material difference to the forecast. Out of ten inputs there might be two that fall into this category.
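A level-3 input can be sketched with a triangular distribution; the low/most-likely/high numbers here are purely illustrative:

```python
import random

random.seed(5)

low, most_likely, high = 1, 2, 8   # illustrative Low-Most Likely-High estimate, in days

# random.triangular draws more values near the most-likely mode, encoding the
# knowledge that some parts of the range are more probable than others.
draws = sorted(random.triangular(low, high, most_likely) for _ in range(10_000))
print(f"Median draw:     {draws[len(draws) // 2]:.1f} days")
print(f"85th percentile: {draws[int(0.85 * len(draws))]:.1f} days")
```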

Level 4 forecasts use historical data. Historical data is a mix of a range estimate (it has a natural lowest and highest value) and a probability for each value. Some values occur more often than others, and when we use the data for forecasting, those values will be given more weight. This naturally means our forecasts match the historical behavior of the system, giving reliable results. Our spreadsheet tools use this technique when estimates are set to "Historical data" (download it here).
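And a level-4 sketch, resampling historical values so that frequent values get proportionally more weight. The history below is made up, and summing per-item draws is a simplification of what the spreadsheet does.

```python
import random

random.seed(5)

# Historical cycle times in days (illustrative). Values that occurred more
# often appear more often, so resampling naturally gives them more weight.
history = [2, 2, 3, 3, 3, 4, 4, 5, 7, 12]

items_remaining = 50
futures = sorted(
    sum(random.choice(history) for _ in range(items_remaining))
    for _ in range(10_000)
)
print(f"85% likely within {futures[int(0.85 * len(futures))]} days")
```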

Level 5 – Simulation + Probabilistic Forecasting

Level 5 forecasts model the interactions of a process through simulation. This is the domain of our KanbanSim and ScrumSim tool (see Downloads to download this tool). It allows you to make as simple or as complex a model as you need that exhibits the same response as your organizational process. This not only helps you understand the system and forecast in detail, it allows you to perform "what-if" experiments to detect which factors and process setup/assumptions give a desirable result. This what-if analysis is often called Sensitivity Analysis, and we use it to answer complex process questions with reliable results. But it takes some work, and if your process is changing, or inconsistent, or unstable, then this may not be the best investment of time. We can help advise if we think you need this level of forecasting.

Which one should you use?

Avoid any regression based forecasting. With our free spreadsheets and tools there is little upside in doing it the “traditional” way and risking the Flaw of Averages causing you to make a judgment error.

Our advice is a probabilistic technique at level 2 if you have no historical data, or level 4 if you do. All of our spreadsheet tools allow you to use either range estimates or data for the forecast inputs. Given it's free, we can't break down the barrier to entry any more than we have – download it here.

Use our simulator if you have complex questions, and we are here to help you make that step when you need it.

Troy.

 
