Master’s Student, School of Information Technology and Engineering, Kazakh-British Technical University, Republic of Kazakhstan, Almaty
PROBABILISTIC MODELING AND ANALYSIS OF BUDGET OVERRUN RISKS IN IT PROJECTS
UDC 004.942:005.8
ABSTRACT
Budget overrun remains one of the most critical challenges in IT project management because it affects financial stability, project delivery, and stakeholder confidence. This paper presents a probabilistic analysis of budget deviation risks in IT projects using empirical data analysis, conditional probability estimation, correlation analysis, sensitivity analysis, and Monte Carlo simulation. The study is based on a dataset of 585 IT projects containing variables related to project size, effort, complexity, requirement stability, risk score, defects, and schedule deviation. Monte Carlo simulation with 10,000 iterations was applied to estimate the distribution of possible budget outcomes and evaluate uncertainty. The results show that 69.4% of projects experienced budget overrun, while 32.8% had high overrun above 25%. The expected simulated budget deviation was 13.96%. The proposed probabilistic framework can support contingency planning, uncertainty assessment, early risk detection, and data-driven decision-making in IT project management.
ANNOTATION
Budget overrun remains one of the most pressing problems in IT project management, as it directly affects the financial stability of organizations, project timelines, and the efficiency of resource allocation. This article presents a probabilistic approach to analyzing budget deviation risks in IT projects using empirical data analysis, correlation analysis, conditional probability analysis, sensitivity analysis, and Monte Carlo simulation. The study is based on a dataset containing information on 585 IT projects, including project size, effort, complexity level, requirement stability, risk level, number of defects, schedule deviation, and other factors. To assess uncertainty and the distribution of possible budget deviations, a Monte Carlo simulation with 10,000 iterations was performed. The results showed that 69.4% of projects experienced budget overrun, and 32.8% experienced high overrun above 25%. The findings demonstrate that budget deviation is a multifactorial and probabilistic phenomenon. The proposed approach can be used for contingency reserve planning, early risk detection, uncertainty assessment, and support of managerial decision-making in IT project management.
Keywords: IT project management, budget overrun, cost deviation, probabilistic analysis, Monte Carlo simulation, risk assessment, project uncertainty.
Introduction
Budget overrun remains a persistent challenge in IT project management, as cost, schedule, and scope uncertainty are widely recognized as key constraints in project management practice [1; 2]. Organizations initiate IT projects with planned cost, effort, and schedule expectations; however, actual project execution often differs from initial estimates due to uncertainty, changing requirements, technical complexity, and resource constraints. When the final cost of a project exceeds the planned budget, the organization may face financial losses, delayed business benefits, reduced stakeholder confidence, and additional pressure on project teams.
The problem of budget overrun is especially important in IT and software-related projects because such projects are frequently knowledge-intensive and difficult to estimate accurately [2; 5]. Unlike repetitive operational processes, IT projects may involve new technologies, evolving user needs, complex integration tasks, and uncertain development effort. Even when formal project management methods are applied, budget deviations may still occur because project risks interact with one another during implementation [7].
Existing studies on project risk management often consider risk as a broad multidimensional concept that includes cost, schedule, quality, and operational outcomes [1]. Such an integrated view is important, but it may not provide enough detail for understanding one specific outcome: budget overrun. Budget deviation deserves separate investigation because it directly affects financial planning, resource allocation, and project portfolio decisions. A project may be technically successful but still problematic if its actual cost significantly exceeds the approved budget.
Traditional budget risk assessment usually relies on deterministic estimates, expert judgment, and static risk matrices [1; 6]. These approaches are useful for initial planning, but they do not fully represent the uncertainty of possible cost outcomes. A deterministic estimate gives one expected value, while project managers also need to understand the range of possible deviations and the probability of exceeding critical thresholds. Therefore, probabilistic analysis is more suitable for evaluating budget overrun risk under uncertainty, since simulation-based methods can represent a range of possible project outcomes rather than a single deterministic estimate [3; 8].
This paper focuses on probabilistic modeling and analysis of budget overrun risks in IT projects. Unlike broader studies that examine multiple project risks at once, this study narrows the analysis to budget deviation and cost-related uncertainty. The research uses descriptive statistics, correlation analysis, conditional probability analysis, Monte Carlo simulation, and sensitivity analysis to evaluate budget overrun patterns in a dataset of 585 IT projects [3; 4].
The main research questions are as follows:
- What proportion of IT projects in the dataset experienced budget overrun?
- What is the expected magnitude and uncertainty range of budget deviation?
- Which project variables are most strongly associated with budget deviation?
- How do different project scenarios affect the probability of budget overrun?
- How can probabilistic results support financial risk management in IT projects?
The contribution of this study is a focused empirical analysis of budget overrun risk using probabilistic methods, which are widely applied in uncertainty analysis and project risk assessment [4; 5]. The study provides not only descriptive information about budget deviation but also simulation-based estimates, scenario probabilities, and factor influence rankings that can be used for decision support.
Materials and methods
The study follows a quantitative research design based on empirical data analysis and probabilistic simulation, which supports project planning and decision-making under uncertainty [1; 3]. The methodological process consists of six stages:
- Preparing the dataset and identifying budget-related variables;
- Creating binary indicators for budget overrun (deviation above 0%) and high budget overrun (deviation above 25%);
- Calculating descriptive statistics for project characteristics and budget deviation;
- Analyzing associations between project variables and budget deviation;
- Estimating conditional probabilities for selected project scenarios;
- Applying Monte Carlo simulation and sensitivity analysis to evaluate uncertainty and factor influence.
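Stage 2 can be sketched in a few lines of Python. This is an illustrative sketch, not the author's code: the helper name `overrun_flags` and the sample values are hypothetical, and it assumes the thresholds used throughout the paper, namely that a project is an overrun when Budget_Deviation_Percentage is above 0 and a high overrun when it is above 25.

```python
# Hypothetical helper: derive the two binary indicators from one deviation value.
# Assumption: overrun means deviation > 0 %, high overrun means deviation > 25 %.
HIGH_OVERRUN_THRESHOLD = 25.0


def overrun_flags(budget_deviation_pct):
    """Return (overrun, high_overrun) as 0/1 indicators."""
    overrun = int(budget_deviation_pct > 0.0)
    high_overrun = int(budget_deviation_pct > HIGH_OVERRUN_THRESHOLD)
    return overrun, high_overrun


# Illustrative values only, not taken from the dataset.
deviations = [-8.5, 0.0, 12.3, 31.0]
flags = [overrun_flags(d) for d in deviations]
print(flags)  # [(0, 0), (0, 0), (1, 0), (1, 1)]
```

A project exactly on budget (0%) is counted as neither; whether the study treated the boundary the same way is not stated.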
The purpose of this design is to move from simple descriptive analysis to probability-based interpretation, which is consistent with probabilistic reasoning approaches used in risk analysis. Descriptive statistics show the general distribution of budget deviation, while conditional probability analysis and Monte Carlo simulation help estimate risk under uncertainty.
Figure 1 shows the proposed research framework for analyzing budget overrun risks in IT projects. It illustrates the transition from dataset preparation and budget overrun indicator creation to statistical analysis, conditional probability estimation, Monte Carlo simulation, and final interpretation of sensitivity and scenario results.
Figure 1. Research framework for probabilistic budget overrun analysis in IT projects
The analysis is based on a public dataset containing 585 IT projects and 12 variables. The dataset includes both categorical and numerical indicators related to project characteristics, effort, risk, and delivery outcomes. The target variable of this study is Budget_Deviation_Percentage, which represents the percentage difference between planned and actual budget performance; positive values indicate that the actual cost exceeded the plan, and negative values indicate that the project finished below budget.
The variables used in the study are presented in Table 1.
Table 1.
Variables used in the analysis
| Variable | Description | Type |
|---|---|---|
| Project_ID | Unique project identifier | Numerical |
| Project_Size | Project size: Small, Medium, Large | Categorical |
| Team_Experience_Level | Experience level of the project team: Low, Medium, High | Categorical |
| Estimated_Effort_Hours | Planned effort required for project execution | Numerical |
| Actual_Effort_Hours | Actual effort spent during project execution | Numerical |
| Risk_Score | General project risk score | Numerical |
| Complexity_Level | Project complexity: Low, Medium, High | Categorical |
| Requirement_Stability | Stability of project requirements: Stable, Moderate, Unstable | Categorical |
| Budget_Deviation_Percentage | Percentage difference between planned and actual budget | Numerical |
| Schedule_Deviation_Percentage | Percentage difference between planned and actual schedule | Numerical |
| Defects_Introduced | Number of defects introduced during execution | Numerical |
| ML_Feasibility_Score | Score reflecting suitability for machine learning analysis | Numerical |
The dataset was chosen because it contains variables that reflect both project conditions and measurable outcomes, which are commonly used in software risk prediction and project management research [7; 10]. This makes it suitable for analyzing budget overrun not only as an isolated financial result but also as an outcome related to effort, complexity, requirements, and uncertainty. The dataset was checked for missing values and consistency. No missing values were found. Categorical variables were transformed into numerical form for correlation and sensitivity analysis. The following ordinal encoding was applied:
- Project_Size: Small = 1, Medium = 2, Large = 3;
- Team_Experience_Level: Low = 1, Medium = 2, High = 3;
- Complexity_Level: Low = 1, Medium = 2, High = 3;
- Requirement_Stability: Unstable = 1, Moderate = 2, Stable = 3.
This encoding allowed categorical project characteristics to be included in statistical analysis while preserving their ordered meaning.
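The ordinal mapping above can be expressed as a lookup table. This is an illustrative sketch in plain Python; the names `ORDINAL_CODES` and `encode_project` are hypothetical, and raising `KeyError` on an unseen category is a design choice of the sketch, not something the study specifies.

```python
# Ordinal codes exactly as listed above; the dictionary layout itself is illustrative.
ORDINAL_CODES = {
    "Project_Size": {"Small": 1, "Medium": 2, "Large": 3},
    "Team_Experience_Level": {"Low": 1, "Medium": 2, "High": 3},
    "Complexity_Level": {"Low": 1, "Medium": 2, "High": 3},
    "Requirement_Stability": {"Unstable": 1, "Moderate": 2, "Stable": 3},
}


def encode_project(record):
    """Return a copy of one project record with categorical fields replaced by ordinal codes."""
    encoded = dict(record)
    for column, mapping in ORDINAL_CODES.items():
        if column in encoded:
            # A KeyError here flags a category value that is not in the mapping.
            encoded[column] = mapping[encoded[column]]
    return encoded


# Hypothetical record, not drawn from the dataset.
example = {"Project_Size": "Large", "Team_Experience_Level": "Low",
           "Complexity_Level": "High", "Requirement_Stability": "Unstable",
           "Budget_Deviation_Percentage": 18.2}
print(encode_project(example))
```

Note that the scale for Requirement_Stability runs from Unstable = 1 to Stable = 3, so a negative correlation with budget deviation would mean that less stable requirements go with larger deviations.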
Spearman rank correlation was used to evaluate associations between project variables and budget deviation. It was selected because the dataset contains ordinal variables and because the relationships between project factors and budget deviation may be monotonic but not strictly linear. The correlation analysis was used to determine whether any single variable had a strong relationship with budget deviation.
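Spearman's coefficient is the Pearson correlation computed on ranks, with tied values (very common in the ordinal codes) given the average of the ranks they span. The sketch below implements that definition in standard-library Python; the function names and sample data are hypothetical, and a statistics package would normally be used in practice.

```python
from statistics import fmean


def average_ranks(values):
    """Rank values 1..n; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Advance `end` over the run of values tied with values[order[start]].
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # mean of 1-based positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of average ranks (handles ties)."""
    return pearson(average_ranks(x), average_ranks(y))


# Illustrative data only: ordinal complexity codes and budget deviations (%).
complexity = [1, 2, 3, 3, 2, 1]
deviation = [4.0, 11.5, 30.2, 7.9, 13.1, -2.0]
print(round(spearman(complexity, deviation), 3))
```

Because only ranks enter the formula, any monotonic relabeling of the ordinal codes (for example 1, 2, 3 versus 1, 5, 10) leaves the coefficient unchanged.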
Conditional probability analysis was applied to estimate how the likelihood of budget overrun changes under different project conditions, following the logic of probabilistic reasoning used in Bayesian approaches [5]. The baseline probability was calculated across all projects. Then, selected scenarios were analyzed, including:
- projects with high technical complexity;
- projects with high complexity and unstable requirements;
- projects with high complexity, unstable requirements, and low team experience;
- projects with low complexity, stable requirements, and high team experience.
For each scenario, the probability of budget overrun was calculated as

$$P(\text{Overrun} \mid S) = \frac{n_{\text{overrun}}}{n_S} \qquad (1)$$

where $n_{\text{overrun}}$ is the number of projects with budget overrun in a selected scenario $S$, and $n_S$ is the total number of projects in the same scenario. This approach supports Bayesian-style scenario reasoning by estimating the probability of budget overrun under observed project conditions [5; 7].
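Equation (1) is a relative frequency within a filtered subset of projects. A minimal sketch follows, assuming project records that already carry the ordinal codes from the encoding above (High complexity = 3, Unstable = 1, Low experience = 1). The function and predicate names and the sample records are hypothetical.

```python
def overrun_probability(projects, in_scenario):
    """Relative frequency of overrun within a scenario, as in equation (1).

    projects    : list of dicts holding ordinal codes and Budget_Deviation_Percentage
    in_scenario : predicate returning True if a project meets the scenario conditions
    Returns (probability, scenario size); probability is None for an empty scenario.
    """
    scenario = [p for p in projects if in_scenario(p)]
    if not scenario:
        return None, 0
    n_overrun = sum(1 for p in scenario if p["Budget_Deviation_Percentage"] > 0)
    return n_overrun / len(scenario), len(scenario)


def high_complexity(p):
    return p["Complexity_Level"] == 3


def scenario_3(p):
    # High complexity + unstable requirements + low team experience.
    return (p["Complexity_Level"] == 3 and p["Requirement_Stability"] == 1
            and p["Team_Experience_Level"] == 1)


# Hypothetical records, not drawn from the dataset.
projects = [
    {"Complexity_Level": 3, "Requirement_Stability": 1, "Team_Experience_Level": 1,
     "Budget_Deviation_Percentage": 22.0},
    {"Complexity_Level": 3, "Requirement_Stability": 1, "Team_Experience_Level": 1,
     "Budget_Deviation_Percentage": -3.0},
    {"Complexity_Level": 3, "Requirement_Stability": 2, "Team_Experience_Level": 3,
     "Budget_Deviation_Percentage": 9.5},
    {"Complexity_Level": 1, "Requirement_Stability": 3, "Team_Experience_Level": 3,
     "Budget_Deviation_Percentage": 4.0},
]
print(overrun_probability(projects, high_complexity))
print(overrun_probability(projects, scenario_3))
```

Returning the scenario size alongside the probability keeps the denominator visible, which matters for small subgroups such as Scenario 3.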
Monte Carlo simulation was used to estimate the distribution of possible budget deviation outcomes, following its established use in uncertainty and risk analysis [4; 8]. The simulation was based on empirical resampling from the observed values of Budget_Deviation_Percentage. A total of 10,000 iterations were performed. During each iteration, one value was sampled at random, with replacement, from the empirical distribution.
The simulation produced the following indicators:
- expected budget deviation;
- median budget deviation;
- standard deviation;
- 95% confidence interval;
- probability of budget overrun;
- probability of high budget overrun.
This simulation approach avoids relying on a single deterministic estimate and instead provides a probability distribution of possible budget outcomes, which is one of the main advantages of simulation-based risk assessment.
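The resampling procedure and its six outputs can be sketched with Python's `random` and `statistics` modules. This is a sketch under stated assumptions, not the study's implementation. The function name and the fixed seed are hypothetical. The "95% confidence interval" is read here as the 2.5th and 97.5th percentiles of the simulated values. This reading is an assumption, but it matches the width of the reported range, which is far wider than any confidence interval for the mean would be.

```python
import random
import statistics


def simulate_budget_deviation(observed, n_iter=10_000, high_threshold=25.0, seed=42):
    """Empirical Monte Carlo: resample observed deviations (%) with replacement."""
    rng = random.Random(seed)  # fixed seed makes the run reproducible
    draws = rng.choices(observed, k=n_iter)
    # 39 cut points at 2.5 % steps; the first and last are the 2.5th and 97.5th percentiles.
    cuts = statistics.quantiles(draws, n=40, method="inclusive")
    return {
        "expected": statistics.fmean(draws),
        "median": statistics.median(draws),
        "std": statistics.stdev(draws),
        "interval_95": (cuts[0], cuts[-1]),
        "p_overrun": sum(d > 0 for d in draws) / n_iter,
        "p_high_overrun": sum(d > high_threshold for d in draws) / n_iter,
    }


# Hypothetical observed deviations; the study would pass all 585 values of
# Budget_Deviation_Percentage here.
observed = [-12.4, -3.0, 2.5, 8.1, 12.9, 14.0, 19.6, 27.3, 35.8, 51.0]
summary = simulate_budget_deviation(observed)
for metric, value in summary.items():
    print(metric, value)
```

Because every draw comes from the observed values, the simulated mean and overrun share converge to the empirical ones as the number of iterations grows. This is why the simulated and observed figures in the results are so close.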
Sensitivity analysis was conducted using a Random Forest regression model, with Budget_Deviation_Percentage as the target variable, reflecting the increasing use of data-driven methods in software and project risk prediction [7; 10]. The purpose of this analysis was not to build a final predictive model, but to estimate the relative influence of project variables on budget deviation. Feature importance scores were used to rank the variables according to their contribution to explaining budget deviation.
Results and Discussion
The Spearman correlation analysis showed that the relationships between individual variables and budget deviation are generally weak. The results are presented in Table 2.
Table 2.
Spearman correlation between project variables and budget deviation
| Variable | Spearman Correlation |
|---|---|
| Actual_Effort_Hours | 0.093 |
| Complexity_Level | 0.089 |
| Team_Experience_Level | 0.071 |
| ML_Feasibility_Score | 0.056 |
| Defects_Introduced | 0.019 |
| Schedule_Deviation_Percentage | 0.012 |
| Project_Size | 0.004 |
| Requirement_Stability | -0.015 |
| Risk_Score | -0.044 |
| Estimated_Effort_Hours | -0.060 |
As Table 2 and Figure 2 show, the strongest positive correlation with budget deviation was observed for Actual_Effort_Hours (0.093), followed by Complexity_Level (0.089) and Team_Experience_Level (0.071). However, all coefficients are below 0.1 in absolute value, which means that no single variable explains budget deviation. This result supports the need for probabilistic and scenario-based analysis, since budget overrun appears to be shaped by combinations of project conditions rather than one dominant factor [5; 9].
Figure 2. Correlation heatmap of project variables
Budget deviation by requirement stability is presented in Table 3.
Table 3.
Budget deviation by requirement stability
| Requirement Stability | Number of Projects | Mean Budget Deviation (%) | Median Budget Deviation (%) |
|---|---|---|---|
| Moderate | 225 | 12.74 | 12.95 |
| Stable | 223 | 14.17 | 12.60 |
| Unstable | 137 | 15.52 | 12.95 |
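Projects with unstable requirements show the highest mean budget deviation (15.52%), which is consistent with project management practice, where requirement changes often lead to additional analysis, redesign, development, testing, and coordination effort [1; 2]. However, the medians are almost identical across the three groups (12.60% to 12.95%), and stable-requirement projects have a higher mean than moderate ones. This suggests that the difference in means is driven by a smaller number of large overruns rather than by a shift of the whole distribution.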
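Table 3 is a group-by summary. The sketch below shows the computation in standard-library Python (the function name is hypothetical). It also runs a consistency check that uses only the published figures: the size-weighted average of the three group means should reproduce the overall mean of 13.93% up to rounding.

```python
from collections import defaultdict
from statistics import fmean, median


def deviation_by_group(projects, group_col="Requirement_Stability"):
    """Group size, mean and median budget deviation per category (the layout of Table 3)."""
    groups = defaultdict(list)
    for p in projects:
        groups[p[group_col]].append(p["Budget_Deviation_Percentage"])
    return {g: (len(v), fmean(v), median(v)) for g, v in groups.items()}


# Consistency check on the published Table 3 values (size, mean %).
table_3 = {"Moderate": (225, 12.74), "Stable": (223, 14.17), "Unstable": (137, 15.52)}
total = sum(n for n, _ in table_3.values())
weighted_mean = sum(n * m for n, m in table_3.values()) / total
print(total, round(weighted_mean, 2))  # 585 13.94
```

The small gap between 13.94 and the reported 13.93 is expected, because the group means in the table are themselves rounded to two decimals.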
The conditional probability analysis was conducted to estimate the probability of budget overrun under selected project scenarios. The results are presented in Table 4.
Table 4.
Conditional probability of budget overrun under selected scenarios
| Scenario | Conditions | Number of Projects | Probability of Budget Overrun |
|---|---|---|---|
| Baseline | All projects | 585 | 69.40% |
| Scenario 1 | High technical complexity | 156 | 75.00% |
| Scenario 2 | High technical complexity + unstable requirements | 33 | 72.73% |
| Scenario 3 | High technical complexity + unstable requirements + low team experience | 7 | 100.00% |
| Scenario 4 | Low complexity + stable requirements + high team experience | 23 | 73.91% |
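The baseline probability of budget overrun is 69.40%. Projects with high technical complexity show a higher overrun probability of 75.00%. Adding unstable requirements does not raise this probability further (72.73%, 33 projects). The scenario combining high technical complexity, unstable requirements, and low team experience shows an overrun probability of 100.00%; however, this result is based on only seven projects and should be interpreted with caution. The favorable Scenario 4 (low complexity, stable requirements, and high team experience) also exceeds the baseline (73.91%), although it too rests on a small subgroup of 23 projects.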
These findings suggest that budget overrun risk increases under certain combinations of project characteristics, although the small subgroup sizes limit firm conclusions. Even though individual correlations are weak, conditional scenarios reveal risk patterns that a single coefficient does not show. This is important for project managers because practical risk assessment is often based not on one variable but on combinations of project conditions, which corresponds to Bayesian-style reasoning under uncertainty.
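The arithmetic behind Table 4 can be checked directly. The paper does not report overrun counts. In this sketch they are back-calculated as reported probability × scenario size, rounded to whole projects, and this reproduces every percentage in the table. The sketch also shows how far one project switching outcome would move each estimate, which quantifies the caution advised for small subgroups. All names are hypothetical.

```python
# Table 4 as published: scenario size n and reported overrun probability.
TABLE_4 = {
    "Baseline": (585, 0.6940),
    "Scenario 1": (156, 0.7500),
    "Scenario 2": (33, 0.7273),
    "Scenario 3": (7, 1.0000),
    "Scenario 4": (23, 0.7391),
}


def recompute(table):
    """Back-calculate overrun counts, re-apply equation (1), compare with the baseline."""
    base_n, base_p = table["Baseline"]
    baseline = round(base_p * base_n) / base_n
    rows = {}
    for name, (n, p_reported) in table.items():
        k = round(p_reported * n)  # implied number of overrun projects
        p = k / n                  # equation (1)
        rows[name] = {
            "overruns": k,
            "probability_pct": round(100 * p, 2),
            "vs_baseline_pp": round(100 * (p - baseline), 2),
            "one_project_pp": round(100 / n, 2),  # shift if one project changed outcome
        }
    return rows


for name, row in recompute(TABLE_4).items():
    print(name, row)
```

For Scenario 3 a single project changing outcome moves the estimate by about 14.29 percentage points, compared with about 0.17 for the full dataset.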
Monte Carlo Simulation Results
Monte Carlo simulation was performed using 10,000 iterations based on empirical resampling of budget deviation values. The simulation results are summarized below in Table 5.
Table 5.
Monte Carlo simulation summary
| Output Metric | Value |
|---|---|
| Number of iterations | 10,000 |
| Expected budget deviation | 13.96% |
| Median budget deviation | 12.96% |
| Standard deviation | 20.03 percentage points |
| 95% confidence interval | [-18.46%; 47.75%] |
| Probability of budget overrun | 69.79% |
| Probability of high budget overrun | 32.54% |
As Figure 3 and Table 5 show, the simulated expected budget deviation is 13.96%, which is very close to the observed dataset mean of 13.93%. The simulated probability of budget overrun is 69.79%, also close to the observed value of 69.40%. Because the simulation resamples the observed values, this agreement is expected: it serves as a consistency check showing that the simulation reproduces the budget deviation pattern in the dataset, rather than as independent validation.
The 95% confidence interval ranges from -18.46% to 47.75%, indicating a wide range of possible budget outcomes. This uncertainty range is important for project planning because it shows that projects may finish below budget, but they may also experience significant overruns. A deterministic estimate would not capture this variability, while Monte Carlo simulation provides a distribution of possible outcomes rather than a single fixed prediction [3; 8].
Figure 3. Monte Carlo simulation histogram of budget deviation
Figure 4 presents the cumulative probability curve for budget deviation. It shows the probability that budget deviation remains below a selected threshold. The probability of exceeding a threshold can be calculated as one minus the cumulative probability value. For example, the probability of high budget overrun above 25% is 32.54%. This information can be used to determine contingency reserves and define acceptable risk thresholds, similar to other simulation-based approaches used in project risk studies [9].
Figure 4. Cumulative probability curve for budget deviation
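The cumulative curve supports two readings, sketched below in standard-library Python: the probability of exceeding a threshold (one minus the cumulative probability) and the smallest deviation whose cumulative probability reaches a chosen level. The second reading is one common way to size a contingency reserve (for example, at an 80% level). Treating it as a reserve rule is an assumption and is not the paper's stated method. Function names and the sample draws are hypothetical.

```python
import math


def exceedance_probability(draws, threshold):
    """Empirical P(deviation > threshold), i.e. 1 minus the cumulative probability at threshold."""
    return sum(d > threshold for d in draws) / len(draws)


def deviation_at_confidence(draws, confidence):
    """Smallest simulated deviation d whose empirical cumulative probability is >= confidence."""
    ordered = sorted(draws)
    # round() guards against float noise such as 0.7 * 10 == 7.000000000000001
    index = math.ceil(round(confidence * len(ordered), 9)) - 1
    return ordered[max(index, 0)]


# Hypothetical simulated deviations (%), standing in for the 10,000 Monte Carlo draws.
draws = [-15.0, -4.0, 1.5, 6.0, 9.5, 13.0, 18.0, 24.0, 29.0, 41.0]
print(exceedance_probability(draws, 25.0))  # 0.2
print(deviation_at_confidence(draws, 0.8))  # 24.0
```

Applied to the study's simulated draws, the first function at a threshold of 25 would return the high-overrun probability reported in Table 5.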
The sensitivity analysis in Figure 5 shows that effort-related variables have the greatest influence on budget deviation. Actual effort hours ranked first, followed by estimated effort hours. This result is practically important because budget overrun is often connected to effort underestimation and additional work required during project execution. Because actual effort becomes known only during execution, its high rank is most useful for monitoring, while estimated effort is already available at the planning stage.
The presence of ML_Feasibility_Score and Risk_Score among the top variables suggests that broader project characteristics and data-driven feasibility indicators may also contain useful information for budget risk assessment. Schedule deviation and defects introduced also show moderate influence, indicating that budget performance is related to delivery and quality outcomes.
Figure 5. Sensitivity analysis of factors influencing budget deviation
The results demonstrate that budget overrun is a frequent and significant issue in the analyzed IT project dataset. More than two-thirds of the projects exceeded the planned budget, and almost one-third had high budget overrun above 25%. This confirms the need for systematic financial risk assessment in IT project management.
The weak correlation values indicate that budget deviation is not driven by one dominant variable. This finding is important because it suggests that single-variable interpretation may be insufficient. Budget overrun appears to be a complex outcome influenced by effort, complexity, requirements, schedule behavior, and defects. Therefore, probabilistic and simulation-based methods are more appropriate than single-factor analysis.
The conditional probability results show that project scenarios can provide more meaningful interpretation than isolated correlations. For example, high-complexity projects had a higher probability of budget overrun than the baseline. The combination of high complexity, unstable requirements, and low team experience represented the most severe observed scenario. Although this subgroup was small, it reflects a realistic project management situation where technical difficulty, unclear requirements, and limited team capability occur together.
Monte Carlo simulation provides additional value by showing the distribution of possible budget outcomes and supporting uncertainty-based risk estimation. The expected simulated budget deviation was close to the observed mean, while the confidence interval revealed substantial uncertainty. This means that project managers should not rely only on average expected deviation. Instead, they should consider probability ranges and prepare contingency reserves for unfavorable outcomes.
From a practical perspective, the results can support several management activities. First, project managers can use budget deviation distributions to define contingency reserves. Second, high-risk scenarios can be used as warning patterns during project initiation. Third, sensitivity analysis can guide monitoring priorities, especially regarding effort estimation and actual effort tracking. Finally, probability-based results can improve communication with stakeholders by showing not only expected cost deviation but also the uncertainty range.
Conclusion
This study presents a focused probabilistic analysis of budget overrun risks in IT projects. Unlike broader project risk studies that analyze several outcomes simultaneously, this paper concentrated specifically on budget deviation and cost-related uncertainty. The analysis was based on a dataset of 585 IT projects and included descriptive statistics, correlation analysis, conditional probability estimation, Monte Carlo simulation, and sensitivity analysis.
The results showed that 69.4% of projects experienced budget overrun, and 32.8% had high budget overrun above 25%. The average budget deviation was 13.93%. Monte Carlo simulation estimated the expected budget deviation at 13.96%, with a 95% confidence interval from -18.46% to 47.75%. The simulated probability of budget overrun was 69.79%, confirming that budget overrun is a dominant outcome in the dataset.
The study also showed that individual variables have weak correlations with budget deviation, which means that budget overrun should be treated as a probabilistic and multifactorial outcome. Conditional probability analysis suggested that project complexity and combinations of unfavorable project conditions can increase overrun probability. Sensitivity analysis showed that effort-related variables have the strongest influence on budget deviation.
The main contribution of this paper is the development of a focused empirical framework for analyzing budget overrun risk in IT projects. The proposed approach can help project managers estimate uncertainty, compare project scenarios, identify high-risk conditions, and support contingency planning.
This study has several limitations. First, the dataset does not include direct planned and actual budget values; therefore, the analysis relies on budget deviation percentage. Second, some conditional scenarios include small numbers of projects and should be interpreted carefully. Third, the study identifies statistical relationships but does not prove causal effects. Future research may extend this work by using larger datasets, adding real financial indicators, applying dynamic Bayesian networks, and validating the model in real organizational project environments.
References:
1. Project Management Institute, A Guide to the Project Management Body of Knowledge (PMBOK Guide), 7th ed. // Newtown Square, PA, USA: Project Management Institute – 2021.
2. Standish Group, CHAOS Report 2020: Beyond Infinity // Standish Group International – 2020.
3. N. Metropolis and S. Ulam, “The Monte Carlo method” // Journal of the American Statistical Association, vol. 44, no. 247, pp. 335–341 – 1949.
4. D. P. Kroese, T. Taimre, and Z. I. Botev, Handbook of Monte Carlo Methods // Hoboken, NJ, USA: Wiley – 2011.
5. K. A. Guinhouya, P. Marle, and M. Lauras, “Bayesian networks in project management: A scoping review” // Expert Systems with Applications, vol. 214, Art. no. 119214 – 2023.
6. L. Chen, Q. Lu, and D. Han, “A Bayesian-driven Monte Carlo approach for managing construction schedule risks of infrastructures under uncertainty” // Expert Systems with Applications, vol. 212, Art. no. 118810 – 2023.
7. M. H. Mahmud, M. T. H. Nayan, D. M. N. A. Ashir, and M. A. Kabir, “Software risk prediction: Systematic literature review on machine learning techniques” // Applied Sciences, vol. 12, no. 22, Art. no. 11694 – 2022.
8. A. Suhobokov, “Application of Monte Carlo simulation methods in risk management” // Journal of Business Economics and Management, vol. 8, no. 3, pp. 165–168 – 2007.
9. Y. Song and M. Vanhoucke, “Schedule risk analysis for project control with risk interactions” // Annals of Operations Research – 2025.
10. D. S. Adamantiadou and L. Tsironis, “Leveraging artificial intelligence in project management: A systematic review of applications, challenges, and future directions” // Computers, vol. 14, no. 2, Art. no. 66 – 2025.