Machine Learning to Accurately Estimate IT Project Time and Budget

Has your IT project come in on time and on budget? Ever? Accurately estimating IT projects is a colossal task with many unknowns and assumptions. But big business decisions are made based on IT project estimations, so it’s critical to get it right.

Can machine learning finally get us there?

Absolutely, and we’re doing it now!

Is IT Project Estimation a Problem?

Perhaps the reigning king of cost and time overruns is the F-35. Looking at the F-35 Wikipedia article section on overruns is daunting. It’s a year-by-year, blow-by-blow list of compounding changes. It’s overwhelming.

“Lockheed F-35’s Cost Could Rise by $1 Billion Because of Extra Testing.” Bloomberg, 2017

But the F-35 isn’t alone. Neither is IT. Other top-contending mega-projects have racked up cost overruns of between 560% and 1,900% (Statista).

The Harvard Business Review showcased an SAP system migration project that Levi Strauss estimated at $5 million, but that resulted in a $192.5 million impact. That’s 3,750% in the wrong direction.

“When we broke down the projects’ cost overruns, what we found surprised us. The average overrun was 27%—but that figure masks a far more alarming one. Graphing the projects’ budget overruns reveals a “fat tail”—a large number of gigantic overages. Fully one in six of the projects we studied was a black swan, with a cost overrun of 200%, on average, and a schedule overrun of almost 70%.” Harvard Business Review

PS – For a cool article on the IoT technology inside the F-35, check out our post Top 3 IoT Devices in Aviation.

Can Machine Learning make IT Project Estimation Sexy?

Has this ever happened to you? You are sitting in your weekly portfolio review meeting when one of your Program Managers becomes choked up as they tell you that their high-profile project is already 40% over budget and only 50% complete. If so, have you ever asked yourself how you got so far away from the original estimate?

When we ask IT Project and Program Managers what they lack when creating an estimate, they all express similar struggles.

They oversee the project through its lifecycle, but do not have the necessary information or tools to make an accurate estimate on the fly.  When they seek to gather the information by involving other people in the business, there is no process to help create accountability, so they are forced to go with their best assumption.

We have also found that most estimates are still being created in Excel, oftentimes by a single person (the Excel Ninja). Talk about a bottleneck and a risk.

So why are individuals and companies still using Excel? It’s not that more targeted tools don’t exist; it’s that they aren’t flexible enough to change as the business shifts.

Enter The IT Project “Estimate Modeler”

An Estimate Modeler solution is designed to solve the problem of creating accurate estimates on the fly while interacting with the experts to help refine numbers.  The tech foundation includes:

  • Analytical Dashboards to help avoid surprises
  • Cloud-Based solution allowing for easy collaboration
  • Centralized Project Request processes

The particular solution we’ll discuss here is one we use in actual enterprise IT environments, and it provides a great framework for exploring the many ways to improve IT project estimating.

Machine Learning Solution

At its core, we set out to design a solution that utilizes Machine Learning to help estimators make smarter decisions based on historical and actual data.

The solution centralizes the estimation process, allowing Estimators to collaborate with Subject Matter Experts (SMEs) to refine estimates.  The estimate is built up from a default template that is enhanced by user-controlled parameters.  Finally…because we realize you don’t have all of the information in the beginning…the solution uses complexity levels to refine the estimate through the planning process.

Machine Learning to the rescue!

We are using machine learning to help process data and suggest new defaults to the estimator.  We use Linear Regression to process and compare different data sets including: Current Estimate; Historical Estimate; and Actual Hours billed.  We also leverage Bayesian techniques to fill in holes and suggest new variables.  At this point the solution is built to be suggestive–the estimator will still have the final say and ability to override all inputs.
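To make the regression idea concrete, here is a minimal sketch in plain Python. It is not our production code: the function names (`fit_line`, `suggest_hours`) and the sample numbers are made up for illustration, and it assumes a single input (the estimated hours) predicting the actual hours billed. It fits a least-squares line through past estimate/actual pairs and uses that line to suggest a new default for a current estimate.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("historical estimates must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def suggest_hours(current_estimate: float, slope: float, intercept: float) -> float:
    """Suggested default: what past projects with this estimate actually took."""
    return slope * current_estimate + intercept


# Hypothetical history: estimated hours vs. actual hours billed.
historical_estimates = [100.0, 200.0, 300.0, 400.0]
actual_hours_billed = [130.0, 250.0, 370.0, 490.0]

slope, intercept = fit_line(historical_estimates, actual_hours_billed)
suggested = suggest_hours(250.0, slope, intercept)
print(f"slope={slope:.2f}, intercept={intercept:.1f}, suggested={suggested:.1f}")
```

In keeping with the suggestive design, the output is only a proposed default. The estimator would still be free to accept it or override it.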

Machine Learning IT Project Estimate

Centralization = Collaboration & Accountability

The solution supports collaboration without having to use other tools.  We have implemented internal triggers to send information requests to SMEs.  The SMEs can access the solution and provide their input on the information requested.  The solution supports accountability by giving estimators the ability to assign due dates, track progress, and send reminders about upcoming or past-due dates.
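As a rough illustration of the request-and-reminder workflow, here is a small Python sketch. The `InfoRequest` class, the `reminders` function, the two-day warning window, and the sample names and dates are all hypothetical; the real triggers may work quite differently.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class InfoRequest:
    sme: str            # who was asked
    question: str       # what information was requested
    due: date           # due date assigned by the estimator
    answered: bool = False


def reminders(requests: List[InfoRequest], today: date, warn_days: int = 2) -> List[str]:
    """Build reminder notes for open requests that are past due or due soon."""
    notes = []
    for req in requests:
        if req.answered:
            continue  # nothing to chase
        days_left = (req.due - today).days
        if days_left < 0:
            notes.append(f"{req.sme}: past due by {-days_left} day(s): {req.question}")
        elif days_left <= warn_days:
            notes.append(f"{req.sme}: due in {days_left} day(s): {req.question}")
    return notes


# Hypothetical open requests.
open_requests = [
    InfoRequest("Dana", "What is the data model", date(2024, 3, 1)),
    InfoRequest("Lee", "# of Reports", date(2024, 3, 6)),
    InfoRequest("Sam", "What is the data source", date(2024, 3, 20)),
    InfoRequest("Kim", "# of Cubes", date(2024, 3, 2), answered=True),
]

notes = reminders(open_requests, today=date(2024, 3, 5))
for note in notes:
    print(note)
```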

User Controlled Parameters¹

The solution uses a set of default parameters in order to gather information about a project.  Reviewing different estimation processes, we were able to create a list of over 100 parameters to get the model dialed in.  These parameters were interchangeable across different industries and business sizes.  The solution will allow the Estimators and SMEs to adjust these parameters like a music producer behind the boards. 

Leveling up your estimates with Complexity²

To create more accurate estimates, the estimator needs the ability to recreate an estimate as they learn more about the project.  We call these iterations complexity levels.  Complexity Levels and Parameters are directly related: as you go up in complexity level, you go deeper into the more granular parameters.  The complexity levels also create benchmarks for the self-learning model to process against.

Tying it all together – accurate IT project estimates

In order to bring the solution to life, we have utilized open-source tools and industry accepted methodologies.  We have built libraries, functions and a modern user interface on top of the solution in order to remove the need for a developer once it is stood up.


Project and Program Reporting dashboards will allow managers to quickly view project health and financial reports.  User Management and Role Based Access Control allow users to interact with only the components that apply to them.  When up and running, this solution is the framework to help Enterprise Estimators be massively more accurate–whether the project is $100K or $100M.

Terms and Workflows Defined:

1 Parameters – information about tasks, data, or other assets that affect a project’s timeline.

  • High level – A parameter that should not require additional information to be gathered from Subject Matter Experts (SMEs). These parameters are open-ended questions: Who is the client; What is the skill level of the PM(s); Is the project scope defined.
  • Medium level – A parameter that would require input from an SME, oftentimes requiring additional research by the SME. These parameters are still open-ended questions at this point: Skill Level of PM with Tools X, Y, Z; What is the data model; What is the data source.
  • Low level – A parameter that would require input from an SME. These parameters will require research and some level of analysis.  These parameters ask for quantities: # of Reports; # of Cubes; # of Data Marts.
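The three parameter levels above could be represented along these lines. This is only a sketch: the `ParamLevel` and `Parameter` names and the tiny `CATALOG` are invented for illustration (the real catalog has over 100 parameters), and the only rule encoded is the one stated above, that medium and low level parameters need SME input while high level ones should not.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class ParamLevel(Enum):
    HIGH = "high"      # open-ended, no SME input expected
    MEDIUM = "medium"  # open-ended, needs SME input
    LOW = "low"        # quantities, needs SME research and analysis


@dataclass(frozen=True)
class Parameter:
    name: str
    level: ParamLevel

    @property
    def needs_sme(self) -> bool:
        # Per the definitions above, only high level parameters skip the SME.
        return self.level is not ParamLevel.HIGH


# A handful of the example parameters named in this article.
CATALOG = [
    Parameter("Who is the client", ParamLevel.HIGH),
    Parameter("Is the project scope defined", ParamLevel.HIGH),
    Parameter("What is the data model", ParamLevel.MEDIUM),
    Parameter("What is the data source", ParamLevel.MEDIUM),
    Parameter("# of Reports", ParamLevel.LOW),
    Parameter("# of Data Marts", ParamLevel.LOW),
]


def sme_questions(catalog: List[Parameter]) -> List[str]:
    """Names of the parameters that must be routed to an SME."""
    return [p.name for p in catalog if p.needs_sme]


print(sme_questions(CATALOG))
```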

2 Complexity Levels – The higher the complexity level, the more accurate the estimate.

  • Level 0 – This will be the default estimate that is created at the time a project is requested. The estimator will have little information about project specifics so all hours are set at a department level.  The project is still in the Idea Phase.  This estimate can be +/- 100% from actuals.
  • Level 1 – The Estimator has started gathering more information about the project and it has moved to become an Approved Idea. The estimator will create a new level of the estimate answering questions about high level parameters.  This estimate can be +/- 50% from actuals.
  • Level 2 – At this point the estimator will have to reach out to SMEs to answer questions about the project complexity. The project has entered the definition stage at this point.  This estimate can be +/- 25% from actuals.
  • Level 3 – The estimator will continue to interact with SMEs in order to refine project information and deliverables. This will be the last estimate that is made before the project launches into the development phase.  This estimate can be +/- 10% from actuals.
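To see how the band tightens as a project moves through the levels, here is a short Python sketch. The percentages come straight from the levels above; the `estimate_range` function is hypothetical, and treating each percentage as a symmetric band around the estimate is a simplifying assumption.

```python
from typing import Tuple

# Tolerance per complexity level, taken from the level definitions above.
TOLERANCE_BY_LEVEL = {0: 1.00, 1: 0.50, 2: 0.25, 3: 0.10}


def estimate_range(hours: float, level: int) -> Tuple[float, float]:
    """Return the (low, high) band of hours for an estimate at a given level."""
    if level not in TOLERANCE_BY_LEVEL:
        raise ValueError(f"unknown complexity level: {level}")
    tolerance = TOLERANCE_BY_LEVEL[level]
    return hours * (1 - tolerance), hours * (1 + tolerance)


# The same 1,000-hour estimate viewed at each level.
for level in sorted(TOLERANCE_BY_LEVEL):
    low, high = estimate_range(1000.0, level)
    print(f"Level {level}: {low:.0f} to {high:.0f} hours")
```

A Level 0 estimate of 1,000 hours is really a statement that the work could take anywhere from almost nothing to double that, which is why re-estimating at each level matters.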

What do you think?

Let us know in the comments below…are you using machine learning to improve your project estimates?
