PREDICTING SATISFACTION LEVELS IN BANKING SERVICES USING MACHINE LEARNING MODELS

Rahimov M.E.

4(145)

10. Informatics, Computer Engineering and Control

Cite as:

Rahimov M.E. PREDICTING SATISFACTION LEVELS IN BANKING SERVICES USING MACHINE LEARNING MODELS // Universum: технические науки: electronic scientific journal. 2026. 4(145). URL: https://7universum.com/ru/tech/archive/item/22378 (accessed: 28.07.2026).

DOI - 10.32743/UniTech.2026.145.4.22378

Received: 14.03.2026

Accepted for publication: 14.04.2026

Published: 28.04.2026

ABSTRACT

In the banking industry, customer satisfaction is a key indicator of service quality and of the strength of client relationships. This study examines how well machine learning regression models predict customer satisfaction from demographic, financial, and service-related variables. Three regression models were used: Linear Regression, Random Forest, and XGBoost. The models were assessed with the Mean Absolute Error (MAE), the Root Mean Squared Error (RMSE), and the coefficient of determination (R²). Linear Regression achieved an MAE of 7.38, an RMSE of 9.22, and an R² of 0.70. Random Forest improved on this with an MAE of 5.53, an RMSE of 7.88, and an R² of 0.78. XGBoost produced the best results, with an MAE of 5.52, an RMSE of 7.71, and an R² of 0.79. The results indicate that machine learning models can predict customer satisfaction and provide useful insights for decision-making in banking services.

Keywords: Customer satisfaction, banking industry, machine learning, regression models, XGBoost, Random Forest, prediction.

Introduction

Customer satisfaction is central to assessing service quality in today's banking environment and to building lasting relationships between financial institutions and their customers. The rapid development of digital banking services and financial technologies has greatly expanded the volume of customer-related data available for analysis. These data include financial indicators, demographic characteristics, and service evaluation measures, and can offer important insights into customer behavior and service quality [1].

In recent years the banking industry has made extensive use of machine learning techniques for a variety of analytical tasks, including credit risk assessment, fraud detection, customer churn analysis, and service quality assessment. Machine learning algorithms are often better than traditional statistical methods at capturing complex, nonlinear relationships between variables, which can improve the accuracy of predictive models and support data-driven decision-making in financial institutions [2].

Predicting customer satisfaction with banking services is an important task because it helps banks understand the factors that shape customer experience and improve the quality of financial services. With accurate satisfaction predictions, financial institutions can identify potential problems in service delivery, refine their customer communication strategies, and increase overall service efficiency [3].

This study aims to evaluate how well machine learning regression models predict customer satisfaction with banking services. Using standard regression evaluation metrics, the paper compares several machine learning techniques and assesses their predictive performance.

Materials and methods

The study used a dataset of bank customers' demographic, financial, and service-related data. The dataset contains variables that describe customer attributes and service evaluation metrics. The target variable is the customer satisfaction score.

In the first stage, the data were preprocessed: the dataset was checked for missing values and inconsistencies and prepared for analysis.
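
The checks described above can be sketched in a few lines of Python. This is a minimal illustration and not the study's actual pipeline: the field names (`age`, `balance`, `satisfaction`), the sample records, and the 0 to 100 score range used as an inconsistency rule are all hypothetical, since the paper does not list the dataset's columns or scale.

```python
# Hypothetical records; the real dataset's columns are not listed in the paper.
records = [
    {"age": 34, "balance": 1200.0, "satisfaction": 71.0},
    {"age": None, "balance": 560.5, "satisfaction": 64.0},   # missing age
    {"age": 45, "balance": 980.0, "satisfaction": 130.0},    # outside assumed range
]

def count_missing(rows):
    """Count missing (None) values per field."""
    counts = {}
    for row in rows:
        for field, value in row.items():
            counts[field] = counts.get(field, 0) + (1 if value is None else 0)
    return counts

def find_out_of_range(rows, field, low, high):
    """Return indices of rows whose `field` is present but outside [low, high]."""
    return [i for i, row in enumerate(rows)
            if row[field] is not None and not (low <= row[field] <= high)]

missing = count_missing(records)
bad_rows = find_out_of_range(records, "satisfaction", 0, 100)  # assumed 0-100 scale
print(missing)
print(bad_rows)
```

Rows flagged this way would then be corrected, imputed, or dropped before model training; which of these the study applied is not stated.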

In the second stage, several machine learning regression models were applied to predict customer satisfaction with banking services. The models considered in the paper are Extreme Gradient Boosting (XGBoost), Random Forest Regression, and Linear Regression.

Linear regression is one of the most widely used statistical models for describing the relationship between a dependent variable and one or more independent variables, and it is regularly employed in predictive analysis tasks [4].

Random Forest is an ensemble learning technique that improves prediction accuracy by combining the outputs of many decision trees [5].

Extreme Gradient Boosting (XGBoost) is an advanced boosting method that builds decision trees sequentially, with each new tree fitted to reduce the errors of the current ensemble [3].
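
To make the sequential idea concrete, the sketch below implements plain gradient boosting for squared loss with one-split trees (stumps) on a single feature. It is a simplified illustration, not the XGBoost algorithm itself, which adds regularization, second-order gradient information, and many engineering optimizations. The toy data, the function names, and the settings `n_rounds=50` and `learning_rate=0.1` are assumptions chosen for illustration only.

```python
def fit_stump(x, residuals):
    """Find the single threshold split on x that minimizes squared error."""
    best = None
    for threshold in sorted(set(x))[:-1]:  # skip the max so both sides are non-empty
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda xi: left_mean if xi <= threshold else right_mean

def boost(x, y, n_rounds=50, learning_rate=0.1):
    """Gradient boosting for squared loss: each stump fits the current residuals."""
    base = sum(y) / len(y)              # start from the mean prediction
    stumps = []
    predictions = [base] * len(y)
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        predictions = [pi + learning_rate * stump(xi)
                       for xi, pi in zip(x, predictions)]
    return lambda xi: base + learning_rate * sum(s(xi) for s in stumps)

# Toy nonlinear data (hypothetical, not from the study's dataset).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [50, 52, 55, 70, 72, 71, 60, 58]

model = boost(x, y)
mean_y = sum(y) / len(y)
baseline_mae = sum(abs(yi - mean_y) for yi in y) / len(y)
boosted_mae = sum(abs(yi - model(xi)) for xi, yi in zip(x, y)) / len(y)
print(baseline_mae, boosted_mae)
```

Each round corrects part of the error left by the previous rounds, so the training error of the boosted model falls below that of the constant mean prediction. Random Forest, by contrast, trains its trees independently and averages them.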

Several widely used regression evaluation measures were employed to assess the predictive performance of the regression models.

The Mean Absolute Error (MAE) measures the average absolute difference between the predicted and actual values of the target variable [6]. The formula for this metric is given in (1):

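$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right| \tag{1}$$

where $y_i$ represents the actual value, $\hat{y}_i$ represents the predicted value, and $n$ is the number of observations.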
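
The MAE definition translates directly into code. The following is a minimal sketch; the function name and the sample scores are hypothetical and serve only to show the calculation.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(y - y_hat) for y, y_hat in zip(actual, predicted)) / len(actual)

# Hypothetical satisfaction scores, for illustration only.
actual = [70.0, 65.0, 80.0, 90.0]
predicted = [72.0, 60.0, 80.0, 85.0]

mae = mean_absolute_error(actual, predicted)
print(mae)  # absolute errors 2, 5, 0, 5 -> 12 / 4 = 3.0
```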

The Root Mean Squared Error (RMSE) is the square root of the average squared difference between the predicted and actual values [6]. The formula for this metric is given in (2):

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \tag{2}$$

This metric penalizes larger prediction errors more strongly compared to MAE.
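
A small numerical comparison shows this effect. In the sketch below, two hypothetical sets of predictions have the same total absolute error, but in one the error is spread evenly and in the other it is concentrated in a single observation. The function names and data are illustrative assumptions.

```python
import math

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(y - y_hat) for y, y_hat in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(
        sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)) / len(actual)
    )

actual = [50.0, 50.0, 50.0, 50.0]
even_errors = [54.0, 46.0, 54.0, 46.0]      # four errors of size 4
one_large_error = [50.0, 50.0, 50.0, 66.0]  # a single error of size 16

print(mae(actual, even_errors), rmse(actual, even_errors))          # 4.0 4.0
print(mae(actual, one_large_error), rmse(actual, one_large_error))  # 4.0 8.0
```

The MAE is identical in both cases, while the RMSE doubles when the same total error falls on one observation.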

The coefficient of determination R² indicates the proportion of the variance in the dependent variable that is explained by the model's independent variables [7].

The formula for this metric is given in (3):

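$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2} \tag{3}$$

where $\bar{y}$ represents the mean value of the observed data.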
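
As a minimal sketch of this calculation, the function below computes R² as one minus the ratio of the residual sum of squares to the total sum of squares. The function name and the sample values are hypothetical.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    ss_tot = sum((y - mean_actual) ** 2 for y in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are equal")
    return 1.0 - ss_res / ss_tot

# Hypothetical satisfaction scores, for illustration only.
actual = [60.0, 70.0, 80.0, 90.0]
predicted = [62.0, 68.0, 83.0, 87.0]

r2 = r_squared(actual, predicted)        # SS_res = 26, SS_tot = 500
r2_mean_model = r_squared(actual, [75.0] * 4)  # always predicting the mean
print(r2, r2_mean_model)
```

A model that always predicts the mean of the observed data scores 0, and a perfect model scores 1, which gives the reported values of 0.70 to 0.79 their scale.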

Results and discussions

This study assesses how well three regression models predict customer satisfaction: Linear Regression, Random Forest Regression, and XGBoost Regression. The models were evaluated with three commonly used regression metrics: mean absolute error (MAE), root mean squared error (RMSE), and the coefficient of determination (R²). Table 1 presents the results of the comparison.

Table 1.

Performance Comparison of Regression Models

Models	MAE	RMSE	R²
Linear Regression	7.38	9.22	0.70
XGBoost Regression	5.52	7.71	0.79
Random Forest Regression	5.53	7.88	0.78

The baseline model, Linear Regression, achieved an MAE of 7.38, an RMSE of 9.22, and an R² of 0.70. These results show that the linear model captures a substantial part of the relationship between the predictors and the target variable. However, the comparatively high error values suggest that the underlying relationships in the dataset may not be entirely linear, so the linear model may struggle to capture complex interactions between the variables.

To address this limitation, two tree-based ensemble models were used: Random Forest Regression and XGBoost Regression. These models can capture complex interactions and nonlinear relationships between predictors, which often leads to better predictive performance.

The Random Forest Regression model produced an MAE of 5.53, an RMSE of 7.88, and an R² of 0.78. Compared with Linear Regression, it reduced the prediction error considerably: the MAE dropped by 1.85 points (from 7.38 to 5.53) and the RMSE by 1.34 points (from 9.22 to 7.88). The R² value rose from 0.70 to 0.78, indicating that the model explains a larger share of the variance in the target variable. These improvements suggest that the data contain nonlinear patterns that tree-based ensemble methods can identify.

The XGBoost Regression model also showed strong predictive performance. Of all the models assessed, it had the lowest MAE (5.52), the lowest RMSE (7.71), and the highest R² (0.79), so it made the most accurate predictions in this study. A plausible explanation is XGBoost's gradient boosting framework, which builds trees sequentially so that each one reduces the remaining prediction error.

Although Random Forest performed marginally worse than XGBoost, the difference between the two ensemble models is small: their MAE and R² values differ by only 0.01, and their RMSE values by 0.17. For this dataset, the two ensemble methods therefore offer very similar predictive performance. Such outcomes are common in machine learning studies, where different tree-based ensembles often perform comparably on datasets of moderate size and complexity.
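
The gaps discussed in this section can be recomputed from Table 1. The short script below is an arithmetic check only; the metric values are copied from Table 1, while the dictionary layout and the helper name `gap` are illustrative choices.

```python
# Metric values copied from Table 1.
table_1 = {
    "Linear Regression": {"MAE": 7.38, "RMSE": 9.22, "R2": 0.70},
    "Random Forest": {"MAE": 5.53, "RMSE": 7.88, "R2": 0.78},
    "XGBoost": {"MAE": 5.52, "RMSE": 7.71, "R2": 0.79},
}

def gap(model_a, model_b):
    """Difference (model_a minus model_b) for each metric, rounded to two decimals."""
    return {metric: round(table_1[model_a][metric] - table_1[model_b][metric], 2)
            for metric in ("MAE", "RMSE", "R2")}

linear_vs_rf = gap("Linear Regression", "Random Forest")
rf_vs_xgb = gap("Random Forest", "XGBoost")

# Relative RMSE reduction of Random Forest over the linear baseline, in percent.
rmse_reduction_pct = round(
    100 * linear_vs_rf["RMSE"] / table_1["Linear Regression"]["RMSE"], 1
)

print(linear_vs_rf)
print(rf_vs_xgb)
print(rmse_reduction_pct)
```

Under these figures, Random Forest lowers the MAE by 1.85 and the RMSE by 1.34 relative to Linear Regression (about a 14.5% RMSE reduction), while XGBoost improves on Random Forest by only 0.01 in MAE and 0.17 in RMSE.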

Overall, the findings show that the ensemble models predict customer satisfaction scores more accurately than the conventional Linear Regression model. The improvement in R² and in the error metrics points to the importance of nonlinear relationships between variables in this prediction task. Ensemble methods such as Random Forest and XGBoost therefore appear better suited to modeling complex patterns in customer data.

XGBoost Regression had the best predictive performance of the models tested and was therefore the most successful model for this dataset. Nonetheless, the small performance gap between Random Forest and XGBoost implies that both can be regarded as reliable methods for predicting customer satisfaction.

Conclusion

This study evaluated the performance of three regression models for predicting customer satisfaction in the banking sector. The models were assessed using the MAE, RMSE, and R² metrics. The results show that ensemble-based models outperform the traditional linear regression approach.

Linear Regression achieved an MAE of 7.38, an RMSE of 9.22, and an R² of 0.70. The Random Forest model improved on this with an MAE of 5.53, an RMSE of 7.88, and an R² of 0.78. The XGBoost model produced the best results, with an MAE of 5.52, an RMSE of 7.71, and an R² of 0.79.

The findings show that machine learning models are useful for predicting customer satisfaction and can support decision-making in the banking industry.

References:

1. Sreejesh S., Paul J., Strong C., Pius J. Integrated banking channel service quality (IBCSQ): Role of relationship quality and brand equity // Journal of Retailing and Consumer Services. 2024. Vol. 76. Article 103567. DOI: https://doi.org/10.1016/j.jretconser.2023.103567.
2. Makridakis S., Spiliotis E., Assimakopoulos V. Statistical and machine learning forecasting methods: Concerns and ways forward // PLOS ONE. 2018. Vol. 13(3). DOI: https://doi.org/10.1371/journal.pone.0194889.
3. Chen T., Guestrin C. XGBoost: A scalable tree boosting system // Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016. P. 785–794. DOI: https://doi.org/10.1145/2939672.2939785.
4. Roustaei N. Application and interpretation of linear-regression analysis // Medical Hypothesis, Discovery and Innovation in Ophthalmology. 2024. Vol. 13(3). P. 151–159. DOI: https://doi.org/10.51329/mehdiophthal1506.
5. Breiman L. Random forests // Machine Learning. 2001. Vol. 45(1). P. 5–32. DOI: https://doi.org/10.1023/A:1010933404324.
6. Willmott C.J., Matsuura K. Advantages of the mean absolute error over the root mean square error in assessing average model performance // Climate Research. 2005. Vol. 30. P. 79–82. DOI: https://doi.org/10.3354/cr030079.
7. Botchkarev A. Performance metrics (error measures) in machine learning regression, forecasting and prognostics: Properties and typology // Interdisciplinary Journal of Information, Knowledge, and Management. 2019. Vol. 14. P. 45–76. DOI: https://doi.org/10.28945/4184.

Information about the authors