CASH FLOW FORECASTING USING TIME SERIES ANALYSIS, MACHINE LEARNING AND DEEP LEARNING MODELS: EVIDENCE FROM NEW YORK CITY FINANCIAL DATA


Sharmetov N.K. Kartbayev A.Zh.


Issue 4(145)

Section 10. Informatics, computer engineering and control

Cite as:

Sharmetov N.K., Kartbayev A.Zh. CASH FLOW FORECASTING USING TIME SERIES ANALYSIS, MACHINE LEARNING AND DEEP LEARNING MODELS: EVIDENCE FROM NEW YORK CITY FINANCIAL DATA // Universum: технические науки: electronic scientific journal. 2026. 4(145). URL: https://7universum.com/ru/tech/archive/item/22565 (accessed: 28.05.2026).


DOI - 10.32743/UniTech.2026.145.4.22565

Received by the editors: 09.04.2026

Accepted for publication: 14.04.2026

Published: 28.04.2026

ABSTRACT

Accurate cash flow forecasting is a cornerstone of liquidity risk management, yet identifying the best model for small, seasonal datasets remains a challenge. This study compares four modeling paradigms: Autoregressive (AR), Seasonal Autoregressive Integrated Moving Average (SARIMA), Support Vector Regression (SVR), and a hybrid LSTM-GRU neural network. Using a monthly cash flow dataset characterized by 3-month seasonality, the models were evaluated on predictive accuracy (RMSE, MAE), goodness of fit (R2), and computational efficiency (fitting time). The results indicate a clear hierarchy in model performance. While the SVR model achieved the lowest RMSE (16,096) and, together with AR, the fastest fitting time (0.0011 s), the SARIMA(2,0,0)(0,0,1)[3] model emerged as the most effective tool for practical financial application, delivering the highest R2 (0.79) and the lowest MAE (11,688). Despite the higher computational cost of the hybrid LSTM-GRU and SARIMA models, the findings suggest that for small-scale financial time series, the statistical precision and seasonal-handling capabilities of SARIMA outweigh the marginal speed advantages of machine learning alternatives.


Keywords: Time Series, Cash Flow Forecasting, ARIMA, Machine Learning, SVR, Deep Learning, LSTM


I. Introduction

Cash flow plays a critical role in the financial planning of any organization. It reflects the inflows and outflows of funds, which can be used to analyze a company's past and current condition and to forecast its future condition. Accurate forecasting of cash flow supports strategic planning and risk management, whereas incorrect estimation may lead to shortages in liquidity and financial resources.

Cash flow is dynamic data, consisting of a company's inflows and outflows of cash over specific periods. Because most receipts and payments in one month carry over into the next, we claim that cash flow data exhibit temporal dependence and seasonality. It is also worth noting that observations must not be shuffled or reordered, as the data for each month depend heavily on the previous month [1].

Park and Yang [2] compared ARIMA, SARIMA, LSTM, and SVM models for predicting the electricity consumption of different consumers and concluded that SVM outperformed the other models. ARIMA and SARIMA could not handle the nonlinearity and abrupt changes in the data, while LSTM produced unstable results and would have been better suited to large-scale datasets. Yunita et al. [3] compared nine neural network models, including hybrid ones. They evaluated the models with MAE, MAPE, and RMSE as well as a Monte Carlo based evaluation and found that LSTM-RNN and LSTM-GRU performed best, even though the vanilla RNN was the most computationally efficient.

The datasets used in these studies are mostly large-scale [4] and, like cash flow data, are not easy to capture with linear models. Cash flow data used for model training, however, are aggregates over fixed periods, which leaves a very small dataset. Iskandar et al. [5] were motivated to build cash flow forecasting models from Indonesian government data, as Mu [6] argued that developing countries tend to have rather poor cash flow management.

This paper compares models ranging from simple linear models, namely Autoregression (AR) and Seasonal Autoregressive Integrated Moving Average (SARIMA), through a machine learning model, Support Vector Regression (SVR), to a hybrid neural network, Long Short-Term Memory - Gated Recurrent Unit (LSTM-GRU). The models are evaluated using Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and R-squared (R^2) on the monthly cash flow of New York City.

II. Methodology

A. Data

The dataset used in this study was obtained from the official New York City Open Data platform (data.cityofnewyork.us). The data are publicly available and provided by the Mayor’s Office of Management and Budget (OMB). The dataset contains transaction-level financial records of government cash inflows and outflows. The raw dataset consists of approximately 24,400 individual transaction records. Each observation represents a specific financial transaction associated with government revenues or expenditures. For the purposes of time-series modeling, the transaction-level data were aggregated to a monthly frequency to construct a series of net cash flow values. Net cash flow for each month was computed as the difference between total inflows and total outflows within the corresponding period. Observations labeled as “Adjustments” in the month field were excluded from the analysis, as they do not represent standard monthly financial activity but rather accounting corrections to prior periods. The final dataset consists of monthly net cash flow observations spanning the period from 2021 to 2025. No missing values were identified after aggregation and preprocessing. The resulting time series serves as the basis for the subsequent forecasting analysis (Figure 1).
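The aggregation described above can be illustrated with a minimal pandas sketch. The column names (`year`, `month`, `type`, `amount`) and the toy records are hypothetical and do not reflect the actual schema of the NYC Open Data export; only the logic (drop "Adjustments", sign the amounts, sum per month) follows the text.

```python
import pandas as pd

# Toy transaction-level records; column names and values are hypothetical.
transactions = pd.DataFrame({
    "year":   [2021, 2021, 2021, 2021, 2021, 2021],
    "month":  ["January", "January", "February", "February", "Adjustments", "January"],
    "type":   ["inflow", "outflow", "inflow", "outflow", "inflow", "inflow"],
    "amount": [500.0, 320.0, 410.0, 450.0, 75.0, 20.0],
})


def monthly_net_cash_flow(df: pd.DataFrame) -> pd.Series:
    """Aggregate transactions to monthly net cash flow (inflows minus outflows)."""
    # Exclude accounting corrections that are not tied to a calendar month.
    df = df[df["month"] != "Adjustments"].copy()
    # Sign the amounts: inflows stay positive, outflows become negative.
    df["signed"] = df["amount"].where(df["type"] == "inflow", -df["amount"])
    # Build a month-start timestamp so the result sorts chronologically.
    df["period"] = pd.to_datetime(
        df["year"].astype(str) + " " + df["month"], format="%Y %B"
    )
    return df.groupby("period")["signed"].sum().sort_index()


net = monthly_net_cash_flow(transactions)
print(net)
```

Signing the amounts before grouping keeps the computation to a single `groupby`, which is one of several equivalent ways to obtain inflows minus outflows.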

Figure 1. Plot of Net Cash Flow of Public Sector NYC 2021-2025

B. Exploratory Data Analysis (EDA)

The goal of the EDA was to examine the fundamental properties of the data, such as trend, seasonality, stationarity, and autocorrelation structure, to inform the subsequent selection and specification of an appropriate time series model.

The initial dataset comprised monthly observations from January 2021 to December 2026. Crucially, the values for the year 2026 were excluded from the model development process. Consequently, the analysis sample was restricted to the period of actual historical data, from January 2021 to December 2025. For the purpose of model training and testing, this sample was further partitioned chronologically. The training set was defined as the period from January 2021 to December 2024 (48 observations), while the test set was reserved for the final twelve months, from January to December 2025. This split preserves the temporal order of the data and provides a full seasonal cycle for out-of-sample forecast evaluation.
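The chronological partition can be sketched as follows. The series values are synthetic placeholders; only the date boundaries come from the text.

```python
import numpy as np
import pandas as pd

# Placeholder monthly series for January 2021 to December 2026 (synthetic values).
index = pd.date_range("2021-01-01", "2026-12-01", freq="MS")
series = pd.Series(np.arange(len(index), dtype=float), index=index)

# Exclude 2026, then split by date so that temporal order is preserved.
sample = series.loc[:"2025-12-31"]
train = sample.loc[:"2024-12-31"]   # January 2021 to December 2024
test = sample.loc["2025-01-01":]    # January 2025 to December 2025

print(len(train), len(test))
```

Slicing by date rather than by a random split keeps every test observation strictly later than every training observation.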

An STL (Seasonal-Trend decomposition using Loess) decomposition was applied to the training data to visually disentangle the observable components of the series. The decomposition yielded three distinct components (Figure 2):

Figure 2. STL components of the series

Trend Component: The trend-cycle component revealed a gradual decline from the beginning of the series until approximately January 2024, after which a notable and sustained increase was observed through the end of the training period (December 2024).

Seasonal Component: The seasonal component exhibited a stable and repeating pattern over the observed years, suggesting the presence of a consistent intra-year fluctuation. The amplitude of the seasonal swings appeared relatively constant, indicating an additive seasonal effect.

Residual Component: The residual component fluctuated randomly around zero. An inspection of this component identified two observations that exceeded three standard deviations from the mean, classifying them as potential outliers. These spikes, while notable, were isolated events and did not exhibit a systematic pattern.

The stationarity of the series was formally assessed using the Augmented Dickey-Fuller (ADF) test. The null hypothesis of the test is that the series possesses a unit root (i.e., is non-stationary) [7]. Applied to the original training series (January 2021 to December 2024), the ADF test yielded a p-value below the conventional significance level of 0.05, leading to the rejection of the null hypothesis. This result provides statistical evidence that the monthly net cash flow series is stationary over the training period, so no differencing transformation is needed [8].

C. Autocorrelation (ACF) and Partial Autocorrelation (PACF) Analysis

To identify the appropriate orders for a SARIMA model, the ACF and PACF of the stationary training series were examined.

ACF: The ACF plot displayed a gradual, decaying pattern, a hallmark of an AR process. Notably, statistically significant spikes were observed at lags 3, 6, 9, and 12 (Figure 3). The recurring significance at multiples of lag 3 provides compelling visual evidence of a seasonal pattern with a period of 3 months (i.e., a quarterly cycle).

Figure 3. ACF

Partial Autocorrelation Function (PACF): The PACF plot exhibited a sharp cutoff after lag 5. Significant spikes were present at lags 1, 2, and 5, after which the coefficients fell within the confidence bands (Figure 4).

Figure 4. PACF
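To make the reading of the correlogram concrete, the sketch below computes the sample ACF with NumPy and flags lags outside the approximate 95% band of ±1.96/√n. The series is synthetic, with a built-in 3-month pattern, and the helper name `sample_acf` is hypothetical. The PACF could be obtained analogously, for instance with the `pacf` function in statsmodels.

```python
import numpy as np


def sample_acf(x, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    denom = np.sum(centered ** 2)
    return np.array([
        np.sum(centered[: len(x) - k] * centered[k:]) / denom
        for k in range(max_lag + 1)
    ])


# Synthetic series with a repeating 3-month pattern plus noise (not the NYC data).
rng = np.random.default_rng(2)
series = np.tile([10.0, -5.0, -5.0], 16) + rng.normal(0, 1, 48)

acf = sample_acf(series, max_lag=12)
band = 1.96 / np.sqrt(len(series))  # approximate 95% band under white noise
significant_lags = [k for k in range(1, 13) if abs(acf[k]) > band]
print(significant_lags)
```

For a series with a quarterly cycle, the lags that are multiples of 3 stand out in `significant_lags`, which mirrors the pattern described for Figure 3.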

D. Summary of EDA Findings

The exploratory analysis revealed the following key characteristics of the NYC monthly net cash flow data from 2021 to 2024.

The series is stationary (confirmed by ADF test), indicating that an ARMA-class model (with d=0) is appropriate.

The PACF suggests a low-order autoregressive structure, with potential models including AR(2) or AR(5)[9]. The ACF provides strong evidence of a recurring quarterly pattern (period s=3), characterized by significant spikes at lags 3, 6, 9, and 12. The gradual decay of these seasonal spikes in the ACF points towards a seasonal autoregressive component.

E. Modelling

1. SARIMA

These findings collectively suggest that the most suitable framework for modeling this series is a SARIMA model [10]. The model is denoted as SARIMA(p, d, q)(P, D, Q)[s], where (p, d, q) are the non-seasonal autoregressive, differencing, and moving-average orders, (P, D, Q) are their seasonal counterparts, and s is the seasonal period. Based on the PACF showing significant spikes at lags 1 and 2, a non-seasonal AR order of p=2 was selected. The ACF exhibited a recurring pattern every 3 lags, indicating a seasonal periodicity of s=3. Specification: The final model was specified as SARIMA(2,0,0)(1,0,1)[3]. This configuration captures both the short-term momentum (AR(2)) and the quarterly seasonal fluctuations (seasonal AR and MA) [11].

2. SVR

To perform forecasting on monthly cash flow data, the raw time series was transformed into a supervised learning framework using a sliding window approach. Given the significance of short-term dependencies identified in the PACF, a third-order lag configuration (p=3) was implemented. For any given time $t$, the input feature vector is defined as:

$$\mathbf{x}_t = [\,y_{t-1},\; y_{t-2},\; y_{t-3}\,]$$

where $y_t$ represents the historical monthly cash flow in millions of USD. To ensure chronological integrity, the dataset was split into a training set (80%) and a test set (20%) without shuffling. The inclusion of three lags resulted in the truncation of the initial three observations to maintain a consistent feature space. SVR was selected due to its effectiveness in high-dimensional spaces and its robustness against noise in financial data [12]. The model seeks to find a function $f(\mathbf{x})$ that deviates from the actual target by a value no greater than the precision threshold $\varepsilon$. The decision function is represented as:

$$f(\mathbf{x}) = \sum_{i=1}^{n} (\alpha_i - \alpha_i^{*})\, K(\mathbf{x}_i, \mathbf{x}) + b$$

where $K(\mathbf{x}_i, \mathbf{x})$ denotes the kernel function, $\alpha_i$ and $\alpha_i^{*}$ are the dual coefficients, and $b$ is the bias term. In this study, the Radial Basis Function (RBF) kernel was utilized to capture non-linear temporal dynamics. Because SVR is sensitive to the scale of input features [13], Z-score standardization was applied to both the feature matrix and the target variable. This ensures that the distance-based kernel calculations are not biased by the magnitude of the cash flow values. Hyperparameters, including the regularization parameter $C$, the kernel coefficient $\gamma$, and the tube width $\varepsilon$, were optimized using a grid search with cross-validation (GridSearchCV) [14] to minimize mean squared error. SVR was implemented with scikit-learn, and the final hyperparameters were $C$ = 100, $\gamma$ = 0.5, $\varepsilon$ = 0.1, kernel = rbf.
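The SVR pipeline described in this subsection can be sketched with scikit-learn as follows. The data are synthetic, the grid values are illustrative, and several details are assumptions not confirmed by the text: the 80/20 split is applied after building the lag matrix, the scalers are fitted on the training portion only, and `TimeSeriesSplit` is used as the cross-validation scheme. The helper `make_lagged` is hypothetical.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def make_lagged(y, n_lags=3):
    """Return X with rows [y[t-1], ..., y[t-n_lags]] and the targets y[t]."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([y[n_lags - k : len(y) - k] for k in range(1, n_lags + 1)])
    return X, y[n_lags:]


# Synthetic 60-month series with a 3-month pattern (not the NYC data).
rng = np.random.default_rng(4)
y = np.tile([10.0, -5.0, -5.0], 20) + rng.normal(0, 1, 60)

X, target = make_lagged(y, n_lags=3)   # the first three observations are dropped
split = int(len(X) * 0.8)              # chronological 80/20 split, no shuffling
X_train, X_test = X[:split], X[split:]
y_train, y_test = target[:split], target[split:]

# Z-score standardization of features and target (fitted on training data only).
x_scaler = StandardScaler().fit(X_train)
y_scaler = StandardScaler().fit(y_train.reshape(-1, 1))
X_train_s = x_scaler.transform(X_train)
y_train_s = y_scaler.transform(y_train.reshape(-1, 1)).ravel()

# Illustrative grid over C, gamma, and epsilon for the RBF kernel.
grid = {"C": [1, 10, 100], "gamma": [0.1, 0.5, 1.0], "epsilon": [0.01, 0.1, 0.5]}
search = GridSearchCV(
    SVR(kernel="rbf"),
    grid,
    scoring="neg_mean_squared_error",
    cv=TimeSeriesSplit(n_splits=3),
)
search.fit(X_train_s, y_train_s)

# Predict on the test lags and map the predictions back to the original units.
pred_s = search.predict(x_scaler.transform(X_test))
pred = y_scaler.inverse_transform(pred_s.reshape(-1, 1)).ravel()
print(search.best_params_)
```

A time-ordered cross-validation scheme is chosen here so that each validation fold lies after its training fold, in keeping with the no-shuffling requirement stated above.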

3. LSTM-GRU

To facilitate the training of both machine learning and deep learning models, a sliding window of three months ($T = 3$) was utilized to generate the feature matrix. For the deep learning models, the input data was reshaped into a three-dimensional tensor $X \in \mathbb{R}^{N \times T \times F}$, where $N$ is the number of samples, $T = 3$ is the number of time steps (lags), and $F = 1$ is the number of features. Given that Recurrent Neural Networks (RNNs) are sensitive to the magnitude of input values, Min-Max Scaling was applied to squash the cash flow values into the range [0, 1]:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
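The scaling and reshaping steps can be illustrated with NumPy alone. The series is synthetic, and taking the minimum and maximum from the first 48 months only is an assumption made here to avoid using test information; the text does not say which observations define the scaling range.

```python
import numpy as np

# Synthetic 60-month cash flow series (not the NYC data).
rng = np.random.default_rng(5)
y = rng.normal(100.0, 20.0, 60)

# Assumption: scaling parameters come from the 48 training months only, so
# later values may fall slightly outside [0, 1].
y_min, y_max = y[:48].min(), y[:48].max()
y_scaled = (y - y_min) / (y_max - y_min)

n_lags = 3
# Each window holds the three months preceding the target, in chronological order.
windows = np.stack([y_scaled[t - n_lags : t] for t in range(n_lags, len(y_scaled))])
targets = y_scaled[n_lags:]

# Recurrent layers expect input of shape (samples, time steps, features).
X = windows.reshape(-1, n_lags, 1)
print(X.shape, targets.shape)
```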

This study proposes a hybrid Recurrent Neural Network (RNN) architecture that combines the strengths of LSTM and GRU. The architecture consists of:

LSTM Layer (30 units)
Dropout Layer (0.2)
GRU Layer (30 units)
Dense Output Layer: A single-neuron layer that maps the hidden states back to a continuous numerical prediction.

The model was compiled using the Adam optimizer and the Mean Squared Error (MSE) loss function. Training was conducted with a batch size of 4 for a maximum of 100 epochs. To prevent overfitting and ensure the model generalizes to unseen data, an Early Stopping callback was implemented. This mechanism monitored the validation loss and halted training after 5 consecutive epochs of no improvement, restoring the model weights from the best-performing epoch.
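The stopping rule can be made explicit with a framework-independent sketch. The function name and the validation losses below are hypothetical; in Keras, comparable behavior is provided by the `EarlyStopping` callback with its `patience` and `restore_best_weights` arguments.

```python
def early_stopping_epoch(val_losses, patience=5):
    """Return (stop_epoch, best_epoch) for a sequence of per-epoch validation losses.

    Training stops once `patience` consecutive epochs fail to improve on the
    best loss seen so far; the weights of `best_epoch` would then be restored.
    """
    best_loss = float("inf")
    best_epoch = -1
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    # The patience budget was never exhausted: training ran to the last epoch.
    return len(val_losses) - 1, best_epoch


# Hypothetical validation losses, one per epoch (0-indexed).
losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.51, 0.53, 0.50, 0.54, 0.56, 0.60]
stop_epoch, best_epoch = early_stopping_epoch(losses, patience=5)
print(stop_epoch, best_epoch)  # 8 3
```

In this example the best loss occurs at epoch 3, and the five following epochs bring no strict improvement, so training halts at epoch 8.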

F. Evaluation

Figure 5. RMSE comparison

The predictive accuracy of the models is assessed using both absolute and relative error metrics. RMSE and MAE provide a measure of the error magnitude in the original units, with RMSE offering higher sensitivity to large outliers.

The R2 metric is calculated to quantify the proportion of variance in the monthly cash flow that is explained by the model's predictions, providing a standardized metric for cross-model comparison.
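For reference, the three accuracy metrics can be written out in a few lines of NumPy using their standard definitions. The function names and the toy values are illustrative only.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large errors more heavily."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mae(y_true, y_pred):
    """Mean absolute error: average error magnitude in the original units."""
    return float(np.mean(np.abs(y_true - y_pred)))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Toy values for illustration only.
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.0, 2.0, 4.0])
print(rmse(y_true, y_pred), mae(y_true, y_pred), r_squared(y_true, y_pred))
```

Because RMSE squares the errors before averaging, a single large miss raises it more than it raises MAE, which is why the two metrics can rank models differently.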

III. Results

Table 1 and the bar charts in Figures 5-7 compare the evaluation metrics across models. The charts were produced with the matplotlib library in a Python Jupyter notebook environment.

Table 1. Evaluation metrics and fitting time by model

Model	RMSE	MAE	R2	Fitting time (s)
AR	31,972	25,300	0.26	0.0011
SARIMA	16,926	11,688	0.79	0.0662
SVR	16,096	14,022	0.78	0.0011
LSTM-GRU	23,735	17,448	0.59	12.3526

Figure 6. MAE comparison

Figure 7. R2 comparison

A. AR & SARIMA

The SARIMA model was subjected to rigorous residual diagnostics to ensure its validity:

Ljung-Box Test: A p-value of 0.58 means that we fail to reject the null hypothesis of no autocorrelation in the residuals, suggesting that the model has captured the linear temporal structure of the series.
Jarque-Bera Test: A p-value of 0.65 means that normality of the residuals cannot be rejected, which is consistent with the classical assumptions for time series forecasting.
Heteroskedasticity Test: The Prob(H) of 0.56 means that the null hypothesis of constant residual variance (homoscedasticity) cannot be rejected.
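As an illustration of how the first of these diagnostics works, the Ljung-Box statistic can be computed directly from the residual autocorrelations. The sketch below uses NumPy and SciPy on synthetic inputs. The function name, the number of lags, and the degrees-of-freedom adjustment `model_df` are assumptions, since the text does not report the settings used for the reported p-value.

```python
import numpy as np
from scipy.stats import chi2


def ljung_box(residuals, lags=10, model_df=0):
    """Ljung-Box Q statistic and p-value for the first `lags` autocorrelations."""
    x = np.asarray(residuals, dtype=float)
    n = len(x)
    centered = x - x.mean()
    denom = np.sum(centered ** 2)
    ks = np.arange(1, lags + 1)
    rho = np.array([np.sum(centered[:-k] * centered[k:]) / denom for k in ks])
    # Q = n (n + 2) * sum_k rho_k^2 / (n - k)
    q = n * (n + 2) * np.sum(rho ** 2 / (n - ks))
    # Under the null of no autocorrelation, Q is approximately chi-squared.
    p_value = chi2.sf(q, df=lags - model_df)
    return float(q), float(p_value)


rng = np.random.default_rng(6)
white_noise = rng.normal(0.0, 1.0, 48)           # residuals with no structure
patterned = np.tile([10.0, -5.0, -5.0], 16)      # residuals with a leftover 3-month cycle

q_white, p_white = ljung_box(white_noise)
q_pattern, p_pattern = ljung_box(patterned)
print(f"white noise p = {p_white:.3f}, patterned p = {p_pattern:.3g}")
```

A large p-value, as reported for the SARIMA residuals, corresponds to the white-noise case; a leftover seasonal cycle in the residuals would drive the p-value toward zero.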

The results show that the SARIMA model outperformed the LSTM-GRU on every accuracy metric and SVR on MAE and R2, while SVR achieved a slightly lower RMSE (Table 1). The cash flow exhibits a rigid 3-month cycle. SARIMA explicitly models this seasonality through its seasonal parameters, whereas SVR and LSTM-GRU must "learn" the pattern from a relatively small sample. Statistical models like SARIMA are highly efficient with small datasets (low "n"). In contrast, deep learning models often require larger datasets to overcome their high variance. The SARIMA model provided a more robust fit with fewer parameters, avoiding the over-parameterization risks seen in the LSTM-GRU hybrid.

B. SVR

The close proximity between the RMSE and MAE values for SVR suggests that the model makes errors of consistent size, with no very large misses, given that the target variable is measured in millions of USD. The comparison between actual cash flow values and SVR predictions indicates that the model captures the underlying momentum of the series. By using the 3-month lag features, the model is able to adapt to shifts in cash flow trends. Regarding the residual analysis, SVR extracted the primary "signal" from the noisy financial data. The exceptionally low fitting time (0.0011 s, Table 1) highlights the model's suitability for iterative forecasting environments where rapid re-training is required as new monthly data becomes available.

C. LSTM-GRU

The LSTM-GRU recorded a higher MAE (17,448) and a higher RMSE (23,735) than SVR (14,022 and 16,096, Table 1), indicating that its predictions are, on average, further from the actual cash flow values. The gap between its RMSE and MAE suggests that the model occasionally produces large residuals when encountering abrupt shifts or "shocks" in the cash flow data. The SVR therefore offers the more "conservative" forecast, with fewer extreme misses. The hybrid LSTM-GRU was also the slowest model to fit, requiring 12.35 seconds (Table 1). While the deep learning approach is significantly more resource-intensive, the training time remains well within the acceptable threshold for monthly financial forecasting cycles. Early Stopping limited overfitting, as evidenced by the convergence of the training and validation loss curves.

IV. Conclusion

This research compared econometric, machine learning, and deep learning models for monthly cash flow forecasting. The comparative analysis of standard statistical models, a machine learning model, and a hybrid deep learning model yielded several insights into model selection for small-sample financial time series. The results highlight a distinct trade-off between SVR and SARIMA. The SVR model demonstrated slightly higher resilience to outliers, as evidenced by its lower RMSE of 16,096. However, the SARIMA model proved more accurate in capturing the "typical" monthly fluctuation, outperforming the SVR by over 2,300 points in MAE. Given that the objective of municipal or corporate cash flow forecasting is to minimize the average deviation for budget stability, SARIMA’s lower MAE and its R2 of 0.79 establish it as the primary model of choice. The study confirms that model complexity does not always correlate with accuracy in low-data environments. The hybrid LSTM-GRU model, despite its sophisticated gating mechanisms, did not generalize as effectively as SARIMA and SVR, resulting in an R2 of 0.59. This reinforces the "Parsimony Principle": for datasets with fewer observations and rigid seasonal cycles, traditional econometric models like SARIMA remain more robust than deep learning architectures.

In conclusion, while computational efficiency is a common metric in machine learning, this study argues that in the context of monthly financial reporting, absolute predictive precision is the paramount factor. The SARIMA model is recommended for deployment due to its ability to interpret seasonal signals effectively, providing a reliable framework for risk managers to anticipate liquidity needs with high confidence.

References:

1. Bondina, Natalia, Igor Bondin, and Irina Pavlova. "Methodological justification and analytical support for cash flow forecasting." (2021): 111-118.
2. Park, M.-J.; Yang, H.-S. Comparative Study of Time Series Analysis Algorithms Suitable for Short-Term Forecasting in Implementing Demand Response Based on AMI. Sensors 2024, 24, 7205. https://doi.org/10.3390/s24227205
3. Ariana Yunita, MHD Iqbal Pratama, Muhammad Zaki Almuzakki, Hani Ramadhan, Emelia Akashah P. Akhir, Andi Besse Firdausiah Mansur, Ahmad Hoirul Basori. Performance analysis of neural network architectures for time series forecasting: A comparative study of RNN, LSTM, GRU, and hybrid models. MethodsX, Volume 15, 2025, 103462, ISSN 2215-0161. https://doi.org/10.1016/j.mex.2025.103462
4. Szostek, K.; Mazur, D.; Drałus, G.; Kusznier, J. Analysis of the Effectiveness of ARIMA, SARIMA, and SVR Models in Time Series Forecasting: A Case Study of Wind Farm Energy Production. Energies 2024, 17, 4803. https://doi.org/10.3390/en17194803
5. Iskandar, I., Willett, R., Xu, S. (2018). "The development of a government cash forecasting model." Journal of Public Budgeting, Accounting & Financial Management, Vol. 30 No. 4, pp. 368-383.
6. Mu, Yibin. "Government cash management: good practice & capacity-building framework." Available at SSRN 918008 (2006).
7. Chang, Yoosoon, and Joon Y. Park. "On the asymptotics of ADF tests for unit roots." Econometric Reviews 21.4 (2002): 431-447.
8. Dao, Phong B., Tomasz Barszcz, and Wieslaw J. Staszewski. "Anomaly detection of wind turbines based on stationarity analysis of SCADA data." Renewable Energy 232 (2024): 121076.
9. Hassani, Hossein, et al. "White noise and its misapplications: Impacts on time series model adequacy and forecasting." Forecasting 7.1 (2025): 8.
10. Sirisha, Uppala Meena, Manjula C. Belavagi, and Girija Attigeri. "Profit prediction using ARIMA, SARIMA and LSTM models in time series forecasting: A comparison." IEEE Access 10 (2022): 124715-124727.
11. Szostek, Kamil, et al. "Analysis of the effectiveness of ARIMA, SARIMA, and SVR models in time series forecasting: A case study of wind farm energy production." Energies 17.19 (2024): 4803.
12. Ampountolas, Apostolos. "Enhancing forecasting accuracy in commodity and financial markets: Insights from GARCH and SVR models." International Journal of Financial Studies 12.3 (2024): 59.
13. Du, Ke-Lin, et al. "Exploring kernel machines and support vector machines: Principles, techniques, and future directions." Mathematics 12.24 (2024): 3935.
14. Ahmad, Ghulab Nabi, et al. "Efficient medical diagnosis of human heart diseases using machine learning techniques with and without GridSearchCV." IEEE Access 10 (2022): 80151-80