Graduate Student, School of Information Technology and Engineering, Kazakh-British Technical University, Kazakhstan, Almaty
NEURAL NETWORKS FOR COMPUTATIONALLY CHALLENGING PROBLEMS
UDC 004.032.26
ABSTRACT
Many high-value scientific problems are computationally intractable to solve from first principles at the required resolution, yet their inputs and outputs can be recorded at scale using existing high-performance simulations. Neural networks occupy a productive middle ground: once expensive simulation data exists, a comparatively small network can learn the input-to-output mapping and serve as a fast surrogate for future queries. This work demonstrates the approach on climate downscaling over Kazakhstan: mapping coarse-resolution (~1°) CMIP6 GCM output to ERA5 reanalysis at 0.25° for daily temperature and precipitation, which enables rapid multi-scenario assessment without the computational burden of dynamical Regional Climate Models. The methodology combines three convolutional architectures (CNN-LSTM, CNN-Transformer, and a CNN-only ablation) trained on three CMIP6 models (MPI-ESM1-2-LR, CESM2, CanESM5) and evaluated with six improvement techniques: quantile mapping, multi-GCM ensemble training, transfer learning, auxiliary atmospheric variables, larger model capacity, and a diffusion decoder. The best combined result (CNN-LSTM + quantile mapping + auxiliary variables) achieves T nRMSE 34.6% and P r=0.197; the best precipitation result (CNN-LSTM + transfer learning + auxiliary variables) reaches P r=0.204, a 44% improvement over the single-variable CMIP6 baseline (r=0.142). Training takes 2.5–3.5 hours per architecture and variable on a single consumer GPU. Applied to SSP2-4.5 and SSP5-8.5 projections through 2100, the framework projects a 4–5°C spring warming in the western Kazakhstan flood basin under SSP5-8.5, high-resolution future information that dynamical methods can provide only at prohibitive cost.
Keywords: neural networks, climate downscaling, CMIP6, ERA5, Kazakhstan, CNN-LSTM, Transformer, statistical downscaling, surrogate model
Introduction
A recurring pattern in computational science is that generating ground-truth data is extremely expensive — requiring large high-performance computing (HPC) clusters, weeks of wall-clock time, and specialised physical solvers — yet once a sufficient corpus of input-output pairs exists, the mapping between them may be learnable by a neural network at a fraction of the cost. This positions neural networks not as replacements for physical simulation, but as surrogates: models that amortise the upfront simulation cost into cheap, repeatable inference [1]. Training a modern large language model itself requires hundreds of thousands of GPUs [25], showing that neural networks are not inherently cheap; the argument for scientific surrogates is more specific — the data already exists from prior HPC runs, the target function is smooth and structured, and the required model capacity is modest.
We test this hypothesis on climate downscaling — a canonical spatiotemporal super-resolution problem in Earth sciences. Global Climate Models (GCMs) from the Coupled Model Intercomparison Project Phase 6 (CMIP6) ensemble [2] produce simulations at coarse resolution (1°–2°, ~100–200 km). ERA5 reanalysis [3] provides a high-resolution (0.25°) target constructed by assimilating observations into a physical model. Bridging these two grids via dynamical Regional Climate Models (e.g. Weather Research and Forecasting (WRF) model [4]) requires weeks of compute per 30-year scenario. Once CMIP6 and ERA5 datasets exist, however, a neural network can learn the coarse-to-fine mapping in a few hours.
ERA5 is a reanalysis: it assimilates real satellite, radiosonde, and surface observations into a physical model, producing the most accurate representation of past climate at 0.25° resolution. Precisely because it is grounded in observations, ERA5 cannot project beyond the present — it has no future. CMIP6, by contrast, is a free-running climate model that can simulate any emission scenario through 2100. The neural network is the bridge: trained on historical CMIP6–ERA5 pairs, it learns fine-scale spatial structure and transfers it to future projections.
The application is daily temperature and precipitation at 0.25° (~28 km) resolution over Kazakhstan (2.7 million km², complex terrain), validated against observations from 338 Global Historical Climatology Network-Daily (GHCN-Daily) stations and extended to Shared Socioeconomic Pathway (SSP) future scenarios [45]. It carries practical importance for a region where water security depends on mountain snowmelt and where the Spring 2024 flood [5] underlined the need for high-resolution climate projections.
Deep neural networks have been applied as surrogates for fluid dynamics [26], molecular dynamics [27], and weather forecasting [28, 29]. Reichstein et al. surveyed this paradigm across Earth system science [30]. Within climate modelling, Rasp et al. demonstrated that a neural network can replace the entire subgrid physical parameterisation of a GCM at orders-of-magnitude lower cost [31]; Brenowitz and Bretherton extended this to stable prognostic neural parameterisations [32]; Scher showed that large-scale atmospheric circulation of a simplified GCM can be emulated by a deep network [33].
Downscaling methods fall into two broad families: dynamical and statistical [34]. Quantile mapping (QM) and its variants are widely used in operational climate services but treat each grid point independently and cannot generate spatial coherence beyond the input resolution [21]. Thrasher et al. showed QM reliably corrects temperature extremes but can introduce distributional artefacts at the tails [35]. Bias Corrected Spatial Disaggregation (BCSD) relies on fixed climatological analogues that may be invalid under strong future warming [22]. Maraun and Widmann provide the most comprehensive treatment of statistical downscaling limitations, identifying spatial coherence as the primary weakness that deep learning is well-positioned to address [36].
Convolutional Neural Network (CNN)-based downscaling was pioneered by DeepSD [6], which adapted the Super-Resolution CNN (SRCNN) architecture [37] to continental-scale precipitation fields. Sha et al. extended this to temperature downscaling over complex terrain [7]; Pan et al. applied CNNs to precipitation estimation at hourly timescales [8]; Bano-Medina et al. conducted a systematic intercomparison across Coordinated Regional Climate Downscaling Experiment (CORDEX) domains [9]. Leinonen et al. introduced stochastic super-resolution via Generative Adversarial Networks (GANs), showing that GAN-based downscaling recovers precipitation fine-structure statistics that deterministic CNNs smooth over [38].
Shi et al. introduced ConvLSTM [10], which preserves spatial structure during temporal integration and has become a standard backbone for spatiotemporal prediction tasks. Serifi et al. systematically evaluated temporal window length for ERA5 downscaling, finding marginal gains beyond 14-day windows in same-source settings [39] — consistent with our CNN-only ablation. Vaswani et al. introduced the Transformer with scaled multi-head self-attention [11]; Dosovitskiy et al. extended this to visual inputs with Vision Transformers (ViT) [40]. ClimaX pre-trained a Vision Transformer on heterogeneous CMIP6 outputs for downscaling tasks [12]. Quesada-Chacón et al. showed spatial attention U-Net architectures correctly concentrate learned weights on topographically complex areas [41].
Harder et al. introduced physics-constrained downscaling that enforces areal conservation of precipitation mass, reducing unphysical negative values [42]. Harris et al. demonstrated that probabilistic downscaling with a GAN-based generative model improves the reliability of extreme precipitation quantiles, which is particularly important for flood-risk assessment [43].
Regional climate projections for Central Asia rely on the modelling infrastructure provided by CMIP5 [44] and CMIP6 [2], as well as the CORDEX framework [46] for coordinated regional downscaling experiments. Mannig et al. performed the first systematic dynamical downscaling over Central Asia, finding pronounced warming over the Kazakh highlands [13]. Ozturk et al. projected substantial warming under RCP scenarios, but the regional model resolution (50 km) cannot resolve fine-scale orographic features critical for water resource assessment [14]. The ERA-Interim reanalysis [47] was the predecessor to ERA5 and served as the downscaling target in many earlier regional studies; ERA5's substantially improved resolution and archive length make it the current standard. Most recently, Fallah et al. compared dynamical (COSMO-CLM) and CNN-based downscaling over Central Asia, finding that while the dynamical model improves mountain precipitation, the CNN emulator fails to generalise across different driving GCMs [51] — a cross-GCM transferability limitation that our multi-GCM evaluation (Table 6) directly addresses. To our knowledge, no published work has applied CNN-LSTM-based statistical downscaling to CMIP6 output over the Kazakhstani domain with 0.25° daily projections through 2100.
This paper addresses two neural network research questions: (1) When does temporal sequence modelling add value over a purely spatial network? (2) How well does a learned mapping generalise across input distributions? We show the answers are task-dependent and tied to distribution shift between input and target.
Materials and Methods
Data
Three CMIP6 GCMs [2] are used as low-resolution (LR) inputs: MPI-ESM1-2-LR [15] (primary model, T63 spectral grid, ~1.9°), CESM2 [16] (~1° finite-volume, noleap calendar), and CanESM5 [17] (T63 spectral, different convection parameterisation). All are accessed from the Pangeo Google Cloud Storage zarr archive. Two variables are extracted: tas (near-surface air temperature, K → °C) and pr (precipitation flux, kg m⁻² s⁻¹, converted to mm/day by multiplying by 86,400 s day⁻¹, using the standard identity 1 kg m⁻² = 1 mm of liquid water). GCMs are known to systematically overestimate precipitation over continental interiors — a phenomenon documented across arid Central Asia in CMIP6 evaluations [52] — and our dataset confirms this: raw CMIP6 bilinear precipitation nRMSE reaches 1707% relative to ERA5 (Table 4).
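The two unit conversions described above can be sketched as follows. This is a minimal illustration, not the authors' preprocessing code; the function names are hypothetical.

```python
SECONDS_PER_DAY = 86_400  # s day^-1, as stated in the text


def kelvin_to_celsius(tas_k: float) -> float:
    """Convert near-surface air temperature (tas) from K to degrees C."""
    return tas_k - 273.15


def flux_to_mm_per_day(pr_flux: float) -> float:
    """Convert precipitation flux (kg m^-2 s^-1) to mm/day.

    Uses the identity 1 kg m^-2 of liquid water = 1 mm depth.
    """
    return pr_flux * SECONDS_PER_DAY


# Illustrative values only (not taken from the paper's data)
tas_c = kelvin_to_celsius(293.15)
pr_mm_day = flux_to_mm_per_day(1.0e-5)
print(tas_c, pr_mm_day)
```

A flux of 1×10⁻⁵ kg m⁻² s⁻¹ therefore corresponds to slightly under 1 mm/day, which gives a feel for the magnitudes involved.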
ERA5 [3] reanalysis at native 0.25° resolution covers the Kazakhstan domain (39.5°–56.5°N, 45.5°–88.5°E), yielding a 64×168 grid. Its predecessor, ERA-Interim [47], covered only from 1979 at ~80 km resolution; ERA5's higher resolution, improved assimilation system, and extended archive (1940–present) make it the current standard for downscaling target data. The same two variables are used: t2m (2-m temperature) and tp (total daily precipitation, mm/day). ERA5 surface geopotential is converted to elevation by h = z/g where g = 9.81 m s⁻²; the 1° orography serves as an optional second input channel.
GHCN-Daily station observations [18] provide independent ground truth. 338 temperature stations and 143 precipitation stations with ≥30 valid days during the 2023–2025 test period are retained. Each station is matched to the nearest 0.25° model grid cell. Station observations represent point measurements while model grid cells represent 28 km area averages — this representativeness error is expected and accounted for in evaluation.
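Nearest-cell matching on a regular grid reduces to rounding the station offset from the grid origin. The sketch below assumes a grid whose first cell centre sits at the south-west corner of the stated domain with latitude increasing with index; the paper does not state the grid origin or orientation, so `lat0`, `lon0`, and the function name are assumptions.

```python
def nearest_cell(lat, lon, lat0=39.5, lon0=45.5, step=0.25,
                 n_lat=64, n_lon=168):
    """Return (row, col) of the grid cell whose centre is nearest to a station.

    lat0/lon0 are ASSUMED cell-centre coordinates of index (0, 0);
    the 64 x 168 shape and 0.25 degree step come from the text.
    """
    i = round((lat - lat0) / step)
    j = round((lon - lon0) / step)
    # Clamp stations that fall marginally outside the grid
    i = min(max(i, 0), n_lat - 1)
    j = min(max(j, 0), n_lon - 1)
    return i, j


# Hypothetical station location
row, col = nearest_cell(43.24, 76.95)
print(row, col)
```

Because a single cell stands in for a point measurement, any sub-grid variability (valley versus ridge, for example) shows up as representativeness error in the station comparison.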
The 1991–2025 record is split chronologically to prevent data leakage (Table 1). A chronological split is used rather than random sampling because climate variables are temporally autocorrelated.
Table 1.
Dataset splits. Noleap GCMs use 10,950/1,095 days for train/test.
| Split | Period | Days | Purpose |
|---|---|---|---|
| Train | 1991–2020 | 10,958 | Model training |
| Validation | 2021–2022 | 730 | Early stopping |
| Test | 2023–2025 | 1,096 | Final evaluation |
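The day counts in Table 1 follow directly from the calendar. A short check, using only the split periods given in the table (the helper name is illustrative):

```python
from datetime import date


def n_days(first_year: int, last_year: int) -> int:
    """Number of calendar days from 1 Jan first_year to 31 Dec last_year."""
    return (date(last_year + 1, 1, 1) - date(first_year, 1, 1)).days


splits = {"train": (1991, 2020), "validation": (2021, 2022), "test": (2023, 2025)}

# Standard (Gregorian) calendar, as used by ERA5
gregorian = {name: n_days(y0, y1) for name, (y0, y1) in splits.items()}

# Noleap calendar (e.g. CESM2): every year has exactly 365 days
noleap = {name: 365 * (y1 - y0 + 1) for name, (y0, y1) in splits.items()}

print(gregorian)
print(noleap)
```

The eight leap days in 1991–2020 and the single leap day in 2024 account for the difference between the Gregorian and noleap counts.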
Model Architectures
All three architectures share an encoder–temporal–decoder pipeline (Fig. 1). A shared three-block convolutional encoder maps each frame of the 30-day input sequence to a 128-channel feature map: Conv2D(3×3) → ReLU, with channel progression C_in → 64 → 128 → 128. The temporal module (Long Short-Term Memory (LSTM), Transformer, or identity for CNN-only) integrates context across time. A PixelShuffle(4) decoder [19] upsamples 4× from 1° to 0.25°, with last-frame spatial features added before decoding.
Figure 1. Overview of the CNN-LSTM and CNN-Transformer downscaling architectures. A shared three-block spatial encoder extracts features from the low-resolution input sequence; the temporal module integrates 30-day context; a PixelShuffle decoder upsamples 4× to 0.25°.
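The 4× PixelShuffle step can be illustrated without a deep-learning framework. The NumPy sketch below follows the standard sub-pixel rearrangement, in which r² channels are folded into an r×r spatial block; the 16×42 low-resolution grid is inferred from the 64×168 target and the 4× factor, and the function is an illustration rather than the authors' decoder.

```python
import numpy as np


def pixel_shuffle(x: np.ndarray, r: int) -> np.ndarray:
    """Rearrange (C*r*r, H, W) into (C, H*r, W*r).

    out[c, h*r + i, w*r + j] = x[c*r*r + i*r + j, h, w]
    """
    c_r2, h, w = x.shape
    if c_r2 % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c = c_r2 // (r * r)
    x = x.reshape(c, r, r, h, w)       # axes: (c, i, j, h, w)
    x = x.transpose(0, 3, 1, 4, 2)     # axes: (c, h, i, w, j)
    return x.reshape(c, h * r, w * r)


# 16 feature channels on an assumed 16 x 42 coarse grid -> one 64 x 168 field
lr = np.arange(16 * 16 * 42, dtype=np.float32).reshape(16, 16, 42)
hr = pixel_shuffle(lr, r=4)
print(hr.shape)
```

The operation has no parameters of its own: all learning happens in the convolution that produces the r² channels, which is why it is a cheap way to upsample.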
CNN-LSTM: Globally average-pooled spatial features (B,T,128) are processed by a 2-layer LSTM (hidden=256). The final hidden state is broadcast to spatial dimensions and added to last-frame spatial features via a 1×1 projection convolution. Total parameters: 3.61M.
CNN-Transformer: Replaces the LSTM with a 2-layer Transformer encoder [11] (8 heads, feedforward=512, pre-norm). Sinusoidal positional encodings are added over the 30-day time axis. The final-day token is combined with last-frame spatial features identically to CNN-LSTM. Total parameters: 3.07M.
CNN-only ablation: Processes only the last day of the input sequence — no temporal modelling. Isolates whether temporal context provides measurable benefit. Total parameters: 2.66M.
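For the CNN-Transformer variant, the sinusoidal positional encoding over the 30-day axis can be sketched as below. The formula is the standard one from [11]; the token width of 128 is an assumption (it matches the encoder's output channels, but the paper does not state the Transformer model dimension).

```python
import math


def sinusoidal_encoding(n_positions: int, d_model: int) -> list:
    """Standard sinusoidal positional encoding.

    PE[pos][2i]   = sin(pos / 10000^(2i / d_model))
    PE[pos][2i+1] = cos(pos / 10000^(2i / d_model))
    """
    pe = [[0.0] * d_model for _ in range(n_positions)]
    for pos in range(n_positions):
        for k in range(0, d_model, 2):  # k = 2i
            angle = pos / (10000 ** (k / d_model))
            pe[pos][k] = math.sin(angle)
            if k + 1 < d_model:
                pe[pos][k + 1] = math.cos(angle)
    return pe


# 30 daily tokens; d_model = 128 is an ASSUMED width
pe = sinusoidal_encoding(n_positions=30, d_model=128)
print(len(pe), len(pe[0]))
```

Each day in the window receives a distinct, fixed vector, so the attention layers can tell the most recent day from earlier ones without any learned position parameters.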
Training Objective
All models minimise the Mean Squared Error (MSE) between predicted and ground-truth high-resolution fields in normalised space:
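$$\mathcal{L}_{\mathrm{MSE}} = \frac{1}{HW}\sum_{h=1}^{H}\sum_{w=1}^{W}\left(\hat{y}_{hw} - y_{hw}\right)^{2} \tag{1}$$
where H=64, W=168 are the ERA5 spatial dimensions, ŷ_hw is the predicted value at grid cell (h,w), and y_hw is the ERA5 target in normalised space. Temperature is standardised to zero mean and unit variance:
$$\tilde{T}_{hw} = \frac{T_{hw} - \mu_T}{\sigma_T} \tag{2}$$
where μ_T and σ_T are the temperature mean and standard deviation used for standardisation. Raw precipitation is strongly right-skewed, so a log(1+p) transform is applied before normalisation to prevent the MSE loss from being dominated by extreme events:
$$z_{hw} = \log\left(1 + p_{hw}\right), \qquad p_{hw} = 1000\,\mathrm{tp}_{hw} \tag{3}$$
where p_hw is in mm/day, tp_hw is the ERA5 total precipitation, and the factor of 1000 converts from the ERA5 native unit (m/day). At inference the transform is inverted:
$$\hat{p}_{hw} = \max\left(0,\ \exp\left(\hat{z}_{hw}\right) - 1\right) \tag{4}$$
where ẑ_hw is the predicted log-space value after the normalisation has been undone.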
The max(0,·) clamps physically impossible negative precipitation values.
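A minimal sketch of the precipitation transform and its inverse is given below. The normalisation statistics `mu` and `sigma` are hypothetical placeholders (the paper does not report them), and the function names are illustrative.

```python
import math


def forward_precip(p_mm_day: float, mu: float, sigma: float) -> float:
    """log(1 + p) transform followed by standardisation."""
    return (math.log1p(p_mm_day) - mu) / sigma


def inverse_precip(z_norm: float, mu: float, sigma: float) -> float:
    """Undo standardisation and log1p; clamp negatives to zero."""
    return max(0.0, math.expm1(z_norm * sigma + mu))


# HYPOTHETICAL statistics of log(1 + p), for illustration only
mu, sigma = 0.1, 0.5

z = forward_precip(12.5, mu, sigma)
p_back = inverse_precip(z, mu, sigma)
p_clamped = inverse_precip(-10.0, mu, sigma)  # very negative network output
print(z, p_back, p_clamped)
```

Using `log1p` and `expm1` rather than `log(1 + p)` and `exp(z) - 1` keeps the round trip accurate for the many near-zero precipitation values in an arid domain.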
Table 2.
Training hyperparameters (all models)
| Hyperparameter | Value |
|---|---|
| Optimizer | Adam (β₁=0.9, β₂=0.999) [20] |
| Learning rate | 1×10⁻³ |
| Weight decay | 1×10⁻⁴ |
| Batch size | 16 |
| Sequence length | 30 days |
| Max epochs | 100 |
| Early stopping | patience = 15 epochs |
| LR scheduler | ReduceLROnPlateau (factor=0.5) |
| Gradient clipping | max norm = 1.0 |
| Hardware | NVIDIA RTX 4070 (8 GB VRAM) |
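The early-stopping rule in Table 2 (patience of 15 epochs, at most 100 epochs) can be sketched in a framework-independent way. The class below is an illustration of the general technique, and the validation-loss sequence is synthetic.

```python
class EarlyStopping:
    """Stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience: int = 15):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# SYNTHETIC losses: improve for 20 epochs, then plateau slightly higher
losses = [1.0 / (e + 1) for e in range(20)] + [0.06] * 80

stopper = EarlyStopping(patience=15)
stopped_at = None
for epoch, loss in enumerate(losses[:100], start=1):  # max epochs = 100
    if stopper.step(loss):
        stopped_at = epoch
        break
print(stopped_at, stopper.best)
```

In practice the weights from the best epoch would be restored after stopping; that bookkeeping is omitted here for brevity.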
Improvement Techniques
Six post-baseline techniques were implemented and evaluated. (i) Quantile mapping (QM): a standard bias-correction technique [21, 22] that aligns the cumulative distribution function (CDF) of CMIP6 input to ERA5 statistics before the neural network, reducing the systematic bias the encoder must correct internally. (ii) Multi-GCM ensemble training: concatenating training sequences from MPI-ESM1-2-LR, CESM2, and CanESM5 (32,771 sequences vs. 10,957 for single-GCM), improving cross-source robustness. (iii) Transfer learning (TL): pre-training the convolutional encoder on the ERA5 self-coarsened task and fine-tuning on CMIP6, with the encoder frozen for the first 15 epochs. (iv) Auxiliary atmospheric variables (Aux): adding sea-level pressure (psl) and near-surface specific humidity (huss) as extra input channels, extending C_in from 1 to 3. (v) Large model variants: encoder base channels 64→96, LSTM hidden 256→384, Transformer 2→3 layers (6.0–8.1M parameters). (vi) Diffusion decoder: a Denoising Diffusion Probabilistic Models (DDPM)-based UNet decoder [23] with T=1000 noise steps and Denoising Diffusion Implicit Models (DDIM) inference [24] with 50 deterministic steps.
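Technique (i) can be illustrated with a basic empirical quantile mapping. This sketch maps model values through matched quantiles of a historical model sample and a reference sample; it is a generic version of the method, and details such as per-cell or per-month fitting are not specified in the text and are not assumed here. The data are synthetic.

```python
import numpy as np


def fit_quantile_map(model_hist, obs_hist, n_quantiles=101):
    """Return matched quantiles of the model and reference distributions."""
    qs = np.linspace(0.0, 1.0, n_quantiles)
    return np.quantile(model_hist, qs), np.quantile(obs_hist, qs)


def apply_quantile_map(x, model_q, obs_q):
    """Map model values onto the reference distribution.

    np.interp clamps values outside the fitted range to the end quantiles.
    """
    return np.interp(x, model_q, obs_q)


# SYNTHETIC example: the model carries a constant +2 bias
obs_hist = np.linspace(0.0, 10.0, 1001)
model_hist = obs_hist + 2.0

model_q, obs_q = fit_quantile_map(model_hist, obs_hist)
corrected = apply_quantile_map(np.array([7.0]), model_q, obs_q)
print(corrected)
```

Because the mapping acts on each value independently, it aligns distributions but cannot add spatial detail, which is why it is combined with the network rather than used in place of it.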
Results and Discussion
ERA5 Self-Coarsened Experiment (Upper-Bound Baseline)
Table 3 reports the upper-bound performance when input and target share the same ERA5 data source. All neural models achieve a 3× reduction in temperature root mean square error (RMSE) and 22% reduction in precipitation RMSE over bilinear interpolation. Crucially, CNN-LSTM, CNN-Transformer, and CNN-only perform nearly identically (pairwise RMSE gap <0.006°C), confirming that temporal context adds negligible value when spatial patterns fully determine the target.
Table 3.
ERA5 self-coarsened test results (2023–2025). nRMSE = RMSE/σ_obs×100%. Bold = best.
| Method | T RMSE (°C) | T nRMSE (%) | T r | P RMSE (mm/d) | P nRMSE (%) | P r |
|---|---|---|---|---|---|---|
| CNN-LSTM | 0.448 | 3.4% | 0.9969 | 0.039 | 29.6% | 0.930 |
| CNN-Transformer | 0.449 | 3.4% | 0.9969 | 0.039 | 29.6% | 0.930 |
| CNN-only | 0.454 | 3.5% | 0.9968 | 0.039 | 29.6% | 0.930 |
| Bicubic | 1.208 | 9.2% | 0.9770 | 0.046 | 35.2% | 0.900 |
| Bilinear | 1.357 | 10.4% | 0.9716 | 0.050 | 38.1% | 0.889 |
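The three metrics reported in the results tables can be computed as in the sketch below. The nRMSE definition follows the table caption (RMSE/σ_obs × 100%); using the population standard deviation for σ_obs is an assumption, and the sample values are synthetic.

```python
import math


def rmse(pred, obs):
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def nrmse_percent(pred, obs):
    """RMSE normalised by the standard deviation of the observations.

    ASSUMPTION: sigma_obs is the population (divide-by-N) standard deviation.
    """
    mean_obs = sum(obs) / len(obs)
    sigma_obs = math.sqrt(sum((o - mean_obs) ** 2 for o in obs) / len(obs))
    return 100.0 * rmse(pred, obs) / sigma_obs


def pearson_r(pred, obs):
    mp, mo = sum(pred) / len(pred), sum(obs) / len(obs)
    cov = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    var_p = sum((p - mp) ** 2 for p in pred)
    var_o = sum((o - mo) ** 2 for o in obs)
    return cov / math.sqrt(var_p * var_o)


# SYNTHETIC example: a prediction with a constant +0.5 offset
obs = [1.0, 2.0, 3.0, 4.0, 5.0]
pred = [1.5, 2.5, 3.5, 4.5, 5.5]
print(rmse(pred, obs), nrmse_percent(pred, obs), pearson_r(pred, obs))
```

The example shows why the tables report both error and correlation: a constant offset leaves r at 1 while still producing a non-zero RMSE and nRMSE.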
Figure 2. Spatial resolution gain from neural downscaling — temperature (°C), Kazakhstan, July 2023. From left: raw CMIP6 at 1° (coarse pixels); bilinear interpolation to 0.25°; CNN-LSTM downscaled to 0.25° (orographic contrasts recovered); ERA5 reference at 0.25°
Figure 3. Sample day temperature prediction from the ERA5 self-coarsened experiment. From left: bilinear interpolation; CNN-LSTM; CNN-Transformer; ERA5 ground truth. Neural models recover fine-scale orographic gradients that bilinear interpolation smooths over.
CMIP6 Cross-Model Experiment (MPI-ESM1-2-LR)
Table 4 reports the main experiment. Bilinear interpolation of raw GCM output yields 6.94°C temperature RMSE and 2.24 mm/day precipitation RMSE. All neural models reduce these substantially: CNN-Transformer achieves the lowest temperature RMSE (4.60°C, a 34% reduction), CNN-LSTM achieves the highest temperature correlation (r=0.804), and all neural models reduce precipitation RMSE roughly 19-fold (to 0.116–0.117 mm/day). Relative to CNN-only, the temporal models lower temperature RMSE by 9–11% (CNN-LSTM: 9.1%, CNN-Transformer: 11.0%), indicating that temporal context matters for cross-model bias correction.
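The percentages quoted above can be reproduced from the Table 4 values. In this short check the 9.1% and 11.0% figures are computed with the CNN-only RMSE as the reference; variable names are illustrative.

```python
# Temperature RMSE (deg C) and precipitation RMSE (mm/day) from Table 4
bilinear_t, lstm_t, trans_t, cnn_only_t = 6.941, 4.701, 4.601, 5.172
bilinear_p, neural_p = 2.235, 0.116


def pct_reduction(reference: float, value: float) -> float:
    """Percentage by which `value` is lower than `reference`."""
    return 100.0 * (reference - value) / reference


trans_vs_bilinear = pct_reduction(bilinear_t, trans_t)   # about 34%
precip_fold = bilinear_p / neural_p                       # about 19-fold
lstm_vs_cnn_only = pct_reduction(cnn_only_t, lstm_t)      # about 9.1%
trans_vs_cnn_only = pct_reduction(cnn_only_t, trans_t)    # about 11.0%

print(round(trans_vs_bilinear, 1), round(precip_fold, 1),
      round(lstm_vs_cnn_only, 1), round(trans_vs_cnn_only, 1))
```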
The low precipitation Pearson r (~0.14) for all models is physically expected: a free-running GCM generates its own weather sequences with different timing than observations, so day-to-day storm events do not align between CMIP6 and ERA5. This sets a fundamental ceiling on correlation that no statistical method can overcome without atmospheric nudging [36, 51].
Table 4.
CMIP6 MPI-ESM1-2-LR test results (2023–2025). nRMSE = RMSE/σ_obs×100%. Bold = best.
| Method | T RMSE (°C) | T nRMSE (%) | T r | P RMSE (mm/d) | P nRMSE (%) | P r |
|---|---|---|---|---|---|---|
| CNN-LSTM | 4.701 | 35.9% | 0.804 | 0.117 | 89.0% | 0.142 |
| CNN-Transformer | 4.601 | 35.2% | 0.795 | 0.116 | 89.0% | 0.135 |
| CNN-only | 5.172 | 39.5% | 0.754 | 0.116 | 88.9% | 0.141 |
| Bicubic | 6.954 | 53.1% | 0.577 | 2.333 | 1783% | 0.065 |
| Bilinear | 6.941 | 53.0% | 0.576 | 2.235 | 1707% | 0.066 |
Figure 4. Training and validation MSE curves for the three architectures on the CMIP6 MPI-ESM1-2-LR task (temperature left, precipitation right). All models converge smoothly; early stopping triggers between epochs 27–63 for temperature and 41–97 for precipitation
Figure 5. Sample day temperature prediction from the CMIP6 MPI-ESM1-2-LR experiment. Both neural models recover orographic structure absent from the coarse input, though day-to-day alignment with ERA5 is limited by the free-running GCM's independent internal weather
Figure 6. Spatial mean absolute error (MAE) for CMIP6 MPI-ESM1-2-LR temperature downscaling. Bilinear interpolation (left) shows the highest errors. CNN-LSTM and CNN-Transformer (centre and right) substantially reduce errors, with residual error concentrated in the Tian Shan and Altai mountain ranges
Elevation-Aware Models
Table 5 compares elevation-aware models (ERA5 orography as second input channel, C_in=2) against base models. The results are counterintuitive: CNN-only improves (0.454→0.435°C), while temporal models degrade (CNN-LSTM: 0.448→0.599, CNN-Transformer: 0.449→0.534°C). The likely cause is that temporal modules already capture dominant spatial structure through 30-day context, so the added elevation channel competes for encoder capacity rather than complementing it; CNN-only benefits most because static elevation is its primary spatial prior.
Table 5.
Elevation-aware vs. base models (ERA5 self-coarsened, 2023–2025).
| Model | Elev. | T RMSE (°C) | T r | P RMSE (mm/d) | P r |
|---|---|---|---|---|---|
| CNN-LSTM | No | 0.448 | 0.9969 | 0.039 | 0.930 |
| CNN-LSTM | Yes | 0.599 | 0.9953 | 0.038 | 0.933 |
| CNN-Trans. | No | 0.449 | 0.9969 | 0.039 | 0.930 |
| CNN-Trans. | Yes | 0.534 | 0.9959 | 0.038 | 0.935 |
| CNN-only | No | 0.454 | 0.9968 | 0.039 | 0.930 |
| CNN-only | Yes | 0.435 | 0.9971 | 0.038 | 0.935 |
Multi-GCM Generalisation: CESM2 and CanESM5
Table 6 compares model performance across three GCMs. CNN-only consistently trails both temporal models across all three GCMs, confirming robust generalisation of the temporal modelling benefit. The relative ranking between CNN-LSTM and CNN-Transformer varies by GCM: CNN-Transformer leads on MPI (4.601 vs. 4.701°C), while CNN-LSTM leads on CESM2 (4.544 vs. 4.602°C) and CanESM5 (4.623 vs. 4.697°C).
Table 6.
Multi-GCM test results (2023–2025). MPI = MPI-ESM1-2-LR, CAN = CanESM5. P nRMSE ≈89% for all neural models.
| GCM | Method | T RMSE (°C) | T nRMSE (%) | T r | P RMSE (mm/d) | P r |
|---|---|---|---|---|---|---|
| MPI | CNN-LSTM | 4.701 | 35.9% | 0.804 | 0.117 | 0.142 |
| MPI | CNN-Trans. | 4.601 | 35.2% | 0.795 | 0.116 | 0.135 |
| MPI | CNN-only | 5.172 | 39.5% | 0.754 | 0.116 | 0.141 |
| CESM2 | CNN-LSTM | 4.544 | 34.7% | 0.798 | 0.116 | 0.155 |
| CESM2 | CNN-Trans. | 4.602 | 35.2% | 0.796 | 0.117 | 0.153 |
| CESM2 | CNN-only | 4.807 | 36.7% | 0.774 | 0.116 | 0.155 |
| CAN | CNN-LSTM | 4.623 | 35.3% | 0.791 | 0.116 | 0.142 |
| CAN | CNN-Trans. | 4.697 | 35.9% | 0.783 | 0.117 | 0.134 |
| CAN | CNN-only | 5.150 | 39.4% | 0.754 | 0.117 | 0.136 |
| - | Bilinear | 6.941 | 53.0% | 0.576 | 2.235 | 0.066 |
Improvement Techniques
Seven post-baseline variants across six techniques were evaluated on MPI-ESM1-2-LR test data (Table 7). Quantile mapping reduces temperature error for all three architectures. Its most striking effect is on precipitation: bilinear nRMSE drops from 1707% to 117.6% after QM, confirming that the bulk of precipitation error originates from distributional mismatch rather than spatial resolution. Auxiliary variables (psl + huss) produce the largest single-technique gains in precipitation correlation: Pearson r increases from 0.135–0.142 to 0.176–0.190. The best combined result is CNN-LSTM+QM+Aux (T nRMSE 34.6%, P r=0.197), although CNN-Transformer+QM and CNN-Transformer+Large reach a slightly lower temperature error on their own (T nRMSE 34.3%). The best precipitation results are CNN-LSTM+TL+Aux and CNN-only+TL+Aux (both P r=0.204, a 44% improvement over the single-variable baseline of r=0.142). Among the large variants, only CNN-Transformer matches QM performance for temperature (4.488 vs. 4.487°C), and none improves precipitation correlation, which suggests the bottleneck is input information rather than model capacity.
Table 7.
Improvement techniques. CMIP6 MPI-ESM1-2-LR, 2023–2025. Bold = best per column.
| Model | Variant | T RMSE (°C) | T nRMSE (%) | P r |
|---|---|---|---|---|
| CNN-LSTM | Baseline | 4.701 | 35.9% | 0.142 |
| CNN-LSTM | +QM | 4.582 | 35.0% | 0.162 |
| CNN-LSTM | +Ensemble | 4.615 | 35.3% | 0.136 |
| CNN-LSTM | +TL | 4.765 | 36.4% | 0.153 |
| CNN-LSTM | +Aux | 4.664 | 35.6% | 0.186 |
| CNN-LSTM | +Large | 4.642 | 35.5% | 0.136 |
| CNN-LSTM | +QM+Aux | 4.529 | 34.6% | 0.197 |
| CNN-LSTM | +TL+Aux | 4.776 | 36.5% | 0.204 |
| CNN-Trans. | Baseline | 4.601 | 35.2% | 0.135 |
| CNN-Trans. | +QM | 4.487 | 34.3% | 0.122 |
| CNN-Trans. | +Ensemble | 4.663 | 35.6% | 0.109 |
| CNN-Trans. | +TL | 4.907 | 37.5% | 0.157 |
| CNN-Trans. | +Aux | 4.895 | 37.4% | 0.176 |
| CNN-Trans. | +Large | 4.488 | 34.3% | 0.128 |
| CNN-Trans. | +QM+Aux | 4.548 | 34.8% | 0.183 |
| CNN-Trans. | +TL+Aux | 4.853 | 37.1% | 0.192 |
| CNN-only | Baseline | 5.172 | 39.5% | 0.141 |
| CNN-only | +QM | 5.137 | 39.3% | 0.158 |
| CNN-only | +Ensemble | 5.330 | 40.7% | 0.139 |
| CNN-only | +TL | 5.347 | 40.9% | 0.151 |
| CNN-only | +Aux | 4.940 | 37.8% | 0.190 |
| CNN-only | +Large | 5.303 | 40.5% | 0.137 |
| CNN-only | +QM+Aux | 4.932 | 37.7% | 0.194 |
| CNN-only | +TL+Aux | 5.107 | 39.0% | 0.204 |
| Bilinear | Baseline | 6.941 | 53.0% | 0.066 |
| Bilinear | +QM | 6.704 | 51.2% | 0.073 |
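Two of the headline numbers can be checked directly against Table 7: the relative gain in precipitation correlation and the temperature RMSE change from quantile mapping for each architecture. The snippet only re-uses values from the table; the variable names are illustrative.

```python
# Precipitation Pearson r: CNN-LSTM baseline vs. best (+TL+Aux), from Table 7
baseline_r, best_r = 0.142, 0.204
gain_pct = 100.0 * (best_r / baseline_r - 1.0)

# Temperature RMSE (deg C): (baseline, +QM) per architecture, from Table 7
qm_effect = {
    "CNN-LSTM": (4.701, 4.582),
    "CNN-Trans.": (4.601, 4.487),
    "CNN-only": (5.172, 5.137),
}
deltas = {name: round(qm - base, 3) for name, (base, qm) in qm_effect.items()}

print(round(gain_pct, 1))
print(deltas)
```

The negative deltas for all three architectures are what the text means by quantile mapping reducing temperature error consistently, with the smallest effect on the CNN-only model.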
Figure 7. Summary of improvement techniques. Left: temperature nRMSE (%); right: precipitation Pearson r. Dashed red line = bilinear baseline. QM+Aux achieves the best temperature; TL+Aux achieves the best precipitation correlation
Figure 8. CNN-LSTM sample day predictions for a summer test day (temperature, °C). Top row: bilinear interpolation, CNN-LSTM baseline, +QM. Bottom row: +Aux (psl+huss), +QM+Aux (best combined), ERA5 target. The QM+Aux model recovers fine-scale orographic gradients closest to the target
Figure 9. Seasonal RMSE breakdown for baseline CMIP6 models. DJF = December–February (winter), MAM = March–May (spring), JJA = June–August (summer), SON = September–November (autumn). Temperature error peaks in winter (DJF); precipitation error is highest in spring (MAM) and summer (JJA)
GHCN Station Validation
The models are trained to produce gridded 0.25° fields, not to predict at point weather stations. GHCN-Daily observations serve as an independent sanity check: each station is compared to the nearest 0.25° grid cell, which introduces unavoidable representativeness error. Table 8 shows that the models agree well with station temperature: an RMSE of 1.247°C vs. 1.273°C for bilinear is a small improvement. For precipitation, the models are close to bilinear but do not improve on it (r = 0.43 vs. 0.47; RMSE 3.87 vs. 3.84 mm/day); station-scale precipitation is dominated by mesoscale convective organisation that cannot be reconstructed from a 1° input.
Table 8.
GHCN station validation, 2023–2025
| Variable | Method | n stations | RMSE | MAE | r |
|---|---|---|---|---|---|
| T₂ₘ (°C) | CNN-LSTM | 338 | 1.247 | 0.956 | 0.996 |
| T₂ₘ (°C) | CNN-Trans. | 338 | 1.247 | 0.956 | 0.996 |
| T₂ₘ (°C) | Bilinear | 338 | 1.273 | 0.977 | 0.996 |
| Precip. (mm/d) | CNN-LSTM | 143 | 3.873 | 1.409 | 0.435 |
| Precip. (mm/d) | CNN-Trans. | 143 | 3.873 | 1.410 | 0.433 |
| Precip. (mm/d) | Bilinear | 143 | 3.844 | 1.408 | 0.467 |
Figure 10. GHCN station validation map for temperature. Station locations coloured by per-station RMSE for CNN-LSTM, CNN-Transformer, and Bilinear baselines, test period 2023–2025. RMSE is lowest in flat steppe regions and highest over the Tian Shan mountain range
Case Study: Spring 2024 Flood and Future Projections
The Spring 2024 flood in western Kazakhstan was the largest since 1956, affecting over 100,000 people. Fig. 11 shows precipitation at 0.25° on 26 March 2024 for both ERA5-trained and CMIP6-trained models. ERA5-trained models recover sub-grid flood gradients because the ERA5 coarsened input already carries the real flood signal. CMIP6-trained models show different spatial patterns because CMIP6 is a free-running GCM whose internal weather on this calendar date is unrelated to the actual 2024 event; yet the neural models correctly downscale whatever precipitation their CMIP6 input provides.
Figure 11. Spring 2024 western Kazakhstan flood (26 March 2024) at 0.25°. Top row — ERA5 self-coarsened: Bilinear | CNN-LSTM | CNN-Transformer. Bottom row — CMIP6→ERA5 main task: same layout. Red box: Ural/Atyrau flood basin. Colour scales independent per row
Fig. 12 directly demonstrates what neural downscaling contributes to future CMIP6 projections — something ERA5 cannot provide because it ends at the present. The figure shows July–August 2095 mean temperature under SSP5-8.5 at three levels of spatial detail. The raw CMIP6 grid assigns a single temperature to each 1° cell, so the Tian Shan mountains, lowland steppe, and river valleys within a cell all receive the same value. Bilinear interpolation smooths the boundaries but cannot generate spatial gradients that are absent in the coarse input. The CNN-LSTM model, trained on historical CMIP6–ERA5 pairs, recovers cool Tian Shan peaks, warm continental steppe, and valley temperature inversions at 0.25° resolution even for 2095 data that no observation dataset will ever cover.
Figure 12. Future temperature (°C) at three spatial resolutions — July–August 2095 mean, MPI-ESM1-2-LR SSP5-8.5. Left: raw CMIP6 at 1°. Centre: bilinear at 0.25°. Right: CNN-LSTM downscaled at 0.25°.
Fig. 13 compares March–April mean precipitation between the CMIP6-trained CNN-LSTM output and the ERA5 reference over the 2023–2025 test period. The raw CMIP6 bilinear output is omitted because CMIP6 overestimates precipitation by roughly 20× relative to ERA5 (GCM wet bias [52]), which would saturate the colour scale. The CNN-LSTM largely removes this wet bias: its seasonal mean output is of the same order of magnitude as ERA5 and reproduces the broad spatial structure of the precipitation field. However, the downscaled seasonal mean is noticeably lower in amplitude than the ERA5 reference, particularly in the orographic belts (Tian Shan, Altai). Because CMIP6 is free-running, its day-to-day weather events do not coincide with the real events in ERA5, so the model predicts climatologically smooth, conservative values that minimise RMSE averaged over many days.
Figure 13. March–April mean precipitation (2023–2025 test period). Left: CNN-LSTM downscaled at 0.25° (CMIP6-trained, main task). Right: ERA5 high-resolution (HR) reference at 0.25°.
Fig. 14 shows projected spring (March–May) precipitation and temperature trends in the western Kazakhstan flood basin (50–68°E, 47–56.5°N) from MPI-ESM1-2-LR CMIP6 SSP scenarios through 2100. Under SSP5-8.5, spring precipitation increases modestly through mid-century before stabilising, while temperature rises consistently by 4–5°C. Higher spring temperatures accelerate snowmelt — the primary flood driver in this basin — so earlier, more intense snowmelt pulses onto still-frozen ground, increasing peak runoff regardless of whether total precipitation increases [54]. The SSP2-4.5 pathway shows a substantially lower warming trajectory. This analysis is only possible because the CMIP6-trained downscaling model can be applied to future scenarios — ERA5 provides no data beyond 2025.
Figure 14. Western Kazakhstan flood basin projected spring (March–May) precipitation and temperature, 2026–2100. Thin lines: annual values. Thick lines: 10-year rolling mean. Blue: SSP2-4.5. Red: SSP5-8.5.
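The basin averaging and smoothing behind Fig. 14 can be sketched as follows. Only the box bounds (50–68°E, 47–56.5°N) and the 10-year window come from the text; the cos(latitude) area weighting, the trailing (rather than centred) window, and all function names are assumptions made for illustration.

```python
import math

# Box bounds quoted in the text for the western Kazakhstan flood basin.
BASIN = {"lon_min": 50.0, "lon_max": 68.0, "lat_min": 47.0, "lat_max": 56.5}


def in_basin(lon, lat, box=BASIN):
    """True if a grid-cell centre lies inside the basin box."""
    return (box["lon_min"] <= lon <= box["lon_max"]
            and box["lat_min"] <= lat <= box["lat_max"])


def basin_mean(cells, box=BASIN):
    """Area-weighted mean over cells given as (lon, lat, value) tuples.

    The cos(lat) weight is an assumption; it approximates relative cell
    area on a regular latitude-longitude grid.
    """
    num = 0.0
    den = 0.0
    for lon, lat, value in cells:
        if in_basin(lon, lat, box):
            weight = math.cos(math.radians(lat))
            num += weight * value
            den += weight
    return num / den


def rolling_mean(values, window=10):
    """Trailing rolling mean; only full windows are returned."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]


# Toy usage: two cells inside the box, one outside (ignored).
print(basin_mean([(55.0, 50.0, 2.0), (60.0, 50.0, 4.0), (80.0, 50.0, 100.0)]))
print(rolling_mean(list(range(12)), window=10))
```

With annual spring values for 2026–2100 as input, `rolling_mean` would return one smoothed value per year from 2035 onward under the trailing-window assumption.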
Discussion
Dynamical downscaling via WRF [4] is the physically rigorous alternative, but it requires approximately 1 week of wall-clock time on an HPC cluster per 30-year scenario [51]. Classical statistical methods (quantile mapping, Bias Corrected Spatial Disaggregation (BCSD) [21, 22]) are fast but operate grid point by grid point and cannot generate spatial structure absent in the coarse input. Neural networks occupy the productive middle ground: the convolutional encoder learns spatially coherent features at a cost that scales linearly with the number of grid cells, O(n); the temporal module corrects GCM biases that carry multi-week structure; and the PixelShuffle decoder recovers sub-grid spatial detail. The result is a method that is 3–4 orders of magnitude faster than dynamical models and spatially aware in a way that classical statistical methods are not.
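To make the PixelShuffle step concrete, the sketch below implements the sub-pixel rearrangement in plain Python. It is not the paper's decoder: the 4× factor is inferred from the 1° to 0.25° mapping, the channel ordering follows the common convention (the one used by `torch.nn.PixelShuffle`), and the function name and toy input are hypothetical. In a real network the r² input channels would be produced by a learned convolution.

```python
def pixel_shuffle(x, r):
    """Rearrange a [C*r*r][H][W] nested list into [C][H*r][W*r].

    Each group of r*r coarse channels supplies the r x r block of fine
    pixels that sits inside one coarse grid cell.
    """
    channels_in, height, width = len(x), len(x[0]), len(x[0][0])
    if channels_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    channels_out = channels_in // (r * r)
    out = [[[0.0] * (width * r) for _ in range(height * r)]
           for _ in range(channels_out)]
    for c in range(channels_out):
        for i in range(r):          # row offset inside the fine block
            for j in range(r):      # column offset inside the fine block
                src = x[c * r * r + i * r + j]
                for h in range(height):
                    for w in range(width):
                        out[c][h * r + i][w * r + j] = src[h][w]
    return out


# Toy input: 16 channels on a 2 x 3 coarse grid; channel k is filled with k.
coarse = [[[k] * 3 for _ in range(2)] for k in range(16)]
fine = pixel_shuffle(coarse, r=4)

print(len(fine), len(fine[0]), len(fine[0][0]))  # 1 8 12
print(fine[0][0][:4])                            # [0, 1, 2, 3]
```

Because the rearrangement only moves values, all sub-grid detail must be encoded in the channels by the preceding layers, which keeps the expensive convolutions on the coarse grid.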
Temporal context is task-dependent: all three architectures are statistically indistinguishable in the ERA5 experiment (<0.006°C gap), but temporal models provide clear benefits in the CMIP6 experiment (9–11% relative RMSE reduction) because GCM biases carry temporal structure — persistent seasonal warm/cold anomalies and multi-week circulation regime errors — that temporal context can exploit.
Computational Costs
All experiments were conducted on a single consumer-grade GPU workstation (NVIDIA RTX 4070, 8 GB VRAM; AMD Ryzen 7 CPU; 32 GB RAM). Each architecture was trained separately for temperature and precipitation. Training converged in 27–97 epochs with early stopping and took 2.5–3.5 hours per architecture per variable (~18 hours in total for all three architectures and both variables). Inference on the full 2023–2025 test period (1,096 daily maps) completes in under 30 seconds per model. Applying the trained model to a complete 2026–2100 SSP scenario (27,375 daily maps) takes approximately 12 minutes on the same hardware. By comparison, equivalent dynamical downscaling with WRF over Central Asia requires approximately 1 week of wall-clock time on an HPC cluster per 30-year simulation [51], which makes multi-scenario ensemble analysis computationally infeasible with dynamical methods alone [13, 14].
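The speed-up implied by these figures can be checked with a few lines of arithmetic. The sketch uses only the numbers quoted above and compares wall-clock time per simulated year; it deliberately ignores the hardware difference (HPC cluster versus one GPU) and the one-off training cost, so it is a rough comparison rather than a benchmark.

```python
import math

MINUTES_PER_WEEK = 7 * 24 * 60

# Figures quoted in the text.
nn_maps = 27_375                # daily maps in one 2026-2100 scenario
nn_minutes = 12.0               # approximate inference time for that scenario
nn_years = 75                   # 2026 through 2100 inclusive
wrf_minutes = MINUTES_PER_WEEK  # about 1 week per simulation [51]
wrf_years = 30                  # length of that simulation

nn_min_per_year = nn_minutes / nn_years
wrf_min_per_year = wrf_minutes / wrf_years

speedup = wrf_min_per_year / nn_min_per_year
orders_of_magnitude = math.log10(speedup)
maps_per_second = nn_maps / (nn_minutes * 60)

print(f"speed-up per simulated year: {speedup:.0f}x")
print(f"orders of magnitude:         {orders_of_magnitude:.1f}")
print(f"inference throughput:        {maps_per_second:.0f} maps/s")
```

The ratio comes out at about 2100×, or roughly 3.3 orders of magnitude, which is consistent with the 3–4 orders of magnitude stated in the Discussion. The map count also equals 75 × 365, which suggests the scenario data use a 365-day model calendar.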
Required Accuracy for Downstream Applications
The accuracy required from a downscaling model depends on the downstream application. For hydrological snowmelt modelling, which targets the primary driver of spring flooding in Kazakhstan, systematic temperature biases are more damaging than day-to-day random errors: Cho et al. showed that systematic cold biases of −2.8°C shift peak snow water equivalent timing by 36 days and cause substantial runoff underestimation [49]. For reference, state-of-the-art temperature downscaling over the adjacent Chinese Tian Shan, which has similar high-altitude terrain at comparable elevation, achieves 2.85°C RMSE after bias correction of ERA-Interim reanalysis, compared to 3.75°C without correction [50]. Our best model achieves a domain-wide T RMSE of 4.53°C averaged over all of Kazakhstan (flat steppe and mountains combined) on the CMIP6 cross-model task. This is comparable to uncorrected ERA-Interim over similar mountain terrain, and the remaining gap is explained by the harder task of correcting across GCM physics rather than across spatial resolution alone. Critically, our ERA5 self-coarsened experiment achieves 0.45°C RMSE with the same architecture, confirming that the dominant error source is the free-running GCM's day-to-day weather mismatch, not spatial resolution. The QM step addresses the systematic distributional component of the GCM error: it reduces the precipitation nRMSE of bilinearly interpolated GCM output from 1707% to 118% and recovers the correct seasonal distribution. This matters because, as noted above, systematic biases rather than random day-to-day errors are the primary driver of snowmelt model uncertainty [49]. For precipitation, station-scale validation shows correlations of r ≈ 0.43–0.47, which is sufficient for spatial climatological planning but insufficient for event-scale flood forecasting; the latter would require atmospheric nudging or a data-assimilating reanalysis as input.
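The QM step referred to above can be illustrated with a generic empirical quantile mapping. This is a minimal sketch and not the paper's implementation, which is not shown in this section: the function names, the 101-quantile resolution, the clamping at the tails, and the toy 20× wet-biased data are all assumptions. Whether the correction is fitted per grid cell or per season is likewise not specified here.

```python
from bisect import bisect_right


def empirical_quantiles(sample, n_quantiles=101):
    """Evenly spaced empirical quantiles with linear interpolation."""
    s = sorted(sample)
    last = len(s) - 1
    out = []
    for k in range(n_quantiles):
        pos = last * k / (n_quantiles - 1)
        lo = int(pos)
        hi = min(lo + 1, last)
        out.append(s[lo] + (s[hi] - s[lo]) * (pos - lo))
    return out


def fit_quantile_map(model_hist, obs_hist, n_quantiles=101):
    """Pair model and reference quantiles over a common historical period."""
    return (empirical_quantiles(model_hist, n_quantiles),
            empirical_quantiles(obs_hist, n_quantiles))


def apply_quantile_map(value, model_q, obs_q):
    """Map a model value to the reference value at the same quantile."""
    if value <= model_q[0]:
        return obs_q[0]       # clamp below the fitted range
    if value >= model_q[-1]:
        return obs_q[-1]      # clamp above the fitted range
    i = bisect_right(model_q, value) - 1   # model_q[i] <= value < model_q[i + 1]
    frac = (value - model_q[i]) / (model_q[i + 1] - model_q[i])
    return obs_q[i] + frac * (obs_q[i + 1] - obs_q[i])


# Toy data: the "model" is the reference scaled by 20, echoing the wet bias.
obs_hist = [i / 10 for i in range(101)]     # 0.0 to 10.0 mm/day
model_hist = [20 * v for v in obs_hist]     # 0.0 to 200.0 mm/day
model_q, obs_q = fit_quantile_map(model_hist, obs_hist)

corrected = apply_quantile_map(100.0, model_q, obs_q)
print(corrected)
```

In the toy case a model value of 100 mm/day sits at the median of the biased distribution and is mapped back to about 5 mm/day, the median of the reference. The mapping corrects the distribution only; it cannot fix the timing of individual events.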
Conclusion
We presented CNN-LSTM and CNN-Transformer surrogate networks for statistical downscaling of CMIP6 GCM output over Kazakhstan, mapping daily 1° fields to 0.25° for temperature and precipitation. The experiments show that temporal sequence modelling is task-dependent: CNN-LSTM and CNN-Transformer are indistinguishable from a CNN-only baseline in the same-source ERA5 experiment (<0.006°C gap), but provide 9–11% relative RMSE reduction in the cross-model CMIP6 experiment (CNN-LSTM: 9.1%, CNN-Transformer: 11.0%), where GCM biases carry exploitable temporal structure.
Six post-baseline improvements were evaluated. Quantile mapping reduces the best temperature nRMSE from 35.2% to 34.3% and eliminates the catastrophic precipitation bias in bilinear interpolation (nRMSE 1707% → 118%). Adding sea-level pressure and specific humidity raises precipitation Pearson r from 0.14 to 0.19. Combining QM and auxiliary variables (CNN-LSTM+QM+Aux) achieves T nRMSE 34.6% and P r=0.197; transfer learning with auxiliary variables further improves precipitation to r=0.204 — a 44% improvement over the single-variable baseline. Large models match QM for temperature but provide no precipitation benefit, confirming the bottleneck is input information rather than model capacity.
To our knowledge, this is the first CMIP6 downscaling framework for Kazakhstan. It achieves a 34% reduction in temperature RMSE and a 19× reduction in precipitation RMSE relative to bilinear interpolation, and it is independently validated against 338 GHCN-Daily stations. Applied to SSP2-4.5 and SSP5-8.5 scenarios, the model produces 0.25° daily projections through 2100 and projects a 4–5°C spring warming in the western Kazakhstan flood basin under the high-end scenario, increasing snowmelt-driven flood risk. Kazakhstan's mean annual temperature has already risen by ~1.5°C since the 1960s [53]; against this background, the framework delivers sub-25 km daily fields for the full 21st century at the cost of a single graphics processing unit (GPU) workstation.
Future work should evaluate zero-shot cross-GCM transfer, extend the diffusion decoder with a larger training budget, incorporate wind components as additional forcing variables, and validate against the denser KazHydroMet station network.
References:
1. Queipo N.V. et al. Surrogate-based analysis and optimization. // Progress in Aerospace Sciences. – 2005. – Vol. 41. – P. 1–28.
2. Eyring V. et al. Overview of the Coupled Model Intercomparison Project Phase 6 (CMIP6). // Geoscientific Model Development. – 2016. – Vol. 9. – P. 1937–1958.
3. Hersbach H. et al. The ERA5 global reanalysis. // Quarterly Journal of the Royal Meteorological Society. – 2020. – Vol. 146. – P. 1999–2049.
4. Skamarock W.C. et al. A Description of the Advanced Research WRF Model Version 4.1. // NCAR Technical Note NCAR/TN-556+STR. – 2019.
5. OCHA. Kazakhstan: Floods in Several Regions, Situation Report. // ReliefWeb, UN Office for the Coordination of Humanitarian Affairs. – 29 March 2024. URL: https://reliefweb.int/report/kazakhstan/kazakhstan-flood-03-2024-floods-several-regions-2024-03-29
6. Vandal T. et al. DeepSD: Generating high resolution climate change projections through single image super-resolution. // Proceedings of KDD. – 2017.
7. Sha Y. et al. Deep-learning-based gridded downscaling of surface meteorological variables in complex terrain. // Journal of Applied Meteorology and Climatology. – 2020. – Vol. 59. – P. 2075–2090.
8. Pan B. et al. Improving precipitation estimation using convolutional neural network. // Water Resources Research. – 2019. – Vol. 55. – P. 2301–2321.
9. Bano-Medina J. et al. Configuration and intercomparison of deep learning neural models for statistical downscaling. // Geoscientific Model Development. – 2020. – Vol. 13. – P. 2109–2124.
10. Shi X. et al. Convolutional LSTM network: A machine learning approach for precipitation nowcasting. // Advances in Neural Information Processing Systems. – 2015. – Vol. 28.
11. Vaswani A. et al. Attention is all you need. // Advances in Neural Information Processing Systems. – 2017. – Vol. 30.
12. Nguyen T. et al. ClimaX: A foundation model for weather and climate. // Proceedings of ICML. – 2023.
13. Mannig B. et al. Dynamical downscaling of climate change in Central Asia. // Global and Planetary Change. – 2013. – Vol. 110. – P. 26–39.
14. Ozturk T. et al. Projected changes in temperature and precipitation climatology of Central Asia CORDEX Region 8. // Climate Dynamics. – 2017. – Vol. 49. – P. 3187–3205.
15. Mauritsen T. et al. Developments in the MPI-M Earth System Model version 1.2 (MPI-ESM1.2). // Journal of Advances in Modeling Earth Systems. – 2019. – Vol. 11. – P. 998–1038.
16. Danabasoglu G. et al. The Community Earth System Model Version 2 (CESM2). // Journal of Advances in Modeling Earth Systems. – 2020. – Vol. 12. – e2019MS001916.
17. Swart N.C. et al. The Canadian Earth System Model version 5 (CanESM5.0.3). // Geoscientific Model Development. – 2019. – Vol. 12. – P. 4823–4873.
18. Menne M.J. et al. An overview of the Global Historical Climatology Network-Daily database. // Journal of Atmospheric and Oceanic Technology. – 2012. – Vol. 29. – P. 897–910.
19. Shi W. et al. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. // Proceedings of CVPR. – 2016.
20. Kingma D.P., Ba J. Adam: A method for stochastic optimization. // International Conference on Learning Representations. – 2015.
21. Cannon A.J. et al. Bias correction of GCM precipitation by quantile mapping. // Journal of Climate. – 2015. – Vol. 28. – P. 6938–6959.
22. Wood A.W. et al. Hydrologic implications of dynamical and statistical approaches to downscaling climate model outputs. // Climatic Change. – 2004. – Vol. 62. – P. 189–216.
23. Ho J., Jain A., Abbeel P. Denoising Diffusion Probabilistic Models. // Advances in Neural Information Processing Systems. – 2020. – Vol. 33. – P. 6840–6851.
24. Song J., Meng C., Ermon S. Denoising Diffusion Implicit Models. // International Conference on Learning Representations. – 2021.
25. Brown T.B. et al. Language models are few-shot learners. // Advances in Neural Information Processing Systems. – 2020. – Vol. 33. – P. 1877–1901.
26. Brunton S.L., Noack B.R., Koumoutsakos P. Machine learning for fluid mechanics. // Annual Review of Fluid Mechanics. – 2020. – Vol. 52. – P. 477–508.
27. Noe F. et al. Boltzmann generators: Sampling equilibrium states of many-body systems with deep learning. // Science. – 2019. – Vol. 365. – eaaw1147.
28. Bi K. et al. Accurate medium-range global weather forecasting with 3D neural networks. // Nature. – 2023. – Vol. 619. – P. 533–538.
29. Lam R. et al. Learning skillful medium-range global weather forecasting. // Science. – 2023. – Vol. 382. – P. 1416–1421.
30. Reichstein M. et al. Deep learning and process understanding for data-driven Earth system science. // Nature. – 2019. – Vol. 566. – P. 195–204.
31. Rasp S., Pritchard M.S., Gentine P. Deep learning to represent subgrid processes in climate models. // Proceedings of the National Academy of Sciences. – 2018. – Vol. 115. – P. 9684–9689.
32. Brenowitz N.D., Bretherton C.S. Prognostic validation of a neural network unified physics parameterization. // Geophysical Research Letters. – 2018. – Vol. 45. – P. 6289–6298.
33. Scher S. Toward data-driven weather and climate forecasting: Approximating a simple general circulation model with deep learning. // Geophysical Research Letters. – 2018. – Vol. 45. – P. 12616–12622.
34. Fowler H.J., Blenkinsop S., Tebaldi C. Linking climate change modelling to impacts studies: recent advances in downscaling techniques for hydrological modelling. // International Journal of Climatology. – 2007. – Vol. 27. – P. 1547–1578.
35. Thrasher B. et al. Bias correcting climate model simulated daily temperature extremes with quantile mapping. // Hydrology and Earth System Sciences. – 2012. – Vol. 16. – P. 3309–3314.
36. Maraun D., Widmann M. Statistical Downscaling and Bias Correction for Climate Research. // Cambridge University Press. – 2018.
37. Dong C. et al. Learning a deep convolutional network for image super-resolution. // European Conference on Computer Vision (ECCV). – 2014. – P. 184–199.
38. Leinonen J. et al. Stochastic super-resolution for downscaling time-evolving atmospheric fields with a generative adversarial network. // IEEE Transactions on Geoscience and Remote Sensing. – 2020. – Vol. 59. – P. 7211–7223.
39. Serifi A. et al. Spatio-temporal downscaling of climate data using convolutional and error-predicting neural networks. // Frontiers in Climate. – 2021. – Vol. 3. – P. 656479.
40. Dosovitskiy A. et al. An image is worth 16x16 words: Transformers for image recognition at scale. // International Conference on Learning Representations. – 2021.
41. Quesada-Chacon D. et al. Repeatable high-resolution statistical downscaling through deep learning. // Geoscientific Model Development. – 2022. – Vol. 15. – P. 7353–7370.
42. Harder P. et al. Physics-constrained deep learning for downscaling. // NeurIPS Workshop on Tackling Climate Change with Machine Learning. – 2022.
43. Harris L. et al. Generative deep learning for probabilistic precipitation downscaling with accurate uncertainty quantification. // Geoscientific Model Development. – 2022. – Vol. 15. – P. 4177–4194.
44. Taylor K.E., Stouffer R.J., Meehl G.A. An overview of CMIP5 and the experiment design. // Bulletin of the American Meteorological Society. – 2012. – Vol. 93. – P. 485–498.
45. O'Neill B.C. et al. The Scenario Model Intercomparison Project (ScenarioMIP) for CMIP6. // Geoscientific Model Development. – 2016. – Vol. 9. – P. 3461–3482.
46. Giorgi F., Gutowski W.J. Coordinated experiments for projections of regional climate change. // Current Climate Change Reports. – 2015. – Vol. 1. – No. 4. – P. 256–264.
47. Dee D.P. et al. The ERA-Interim reanalysis: configuration and performance of the data assimilation system. // Quarterly Journal of the Royal Meteorological Society. – 2011. – Vol. 137. – P. 553–597.
48. Sillmann J. et al. Climate extremes indices in the CMIP5 multimodel ensemble. // Journal of Geophysical Research: Atmospheres. – 2013. – Vol. 118. – P. 1716–1733.
49. Cho E. et al. Precipitation biases and snow physics limitations drive the uncertainties in macroscale modeled snow water equivalent. // Hydrology and Earth System Sciences. – 2022. – Vol. 26. – P. 5721–5735.
50. Gao L. et al. A high-resolution air temperature data set for the Chinese Tian Shan in 1979–2016. // Earth System Science Data. – 2018. – Vol. 10. – P. 2097–2114.
51. Fallah B. et al. Climate model downscaling in central Asia: a dynamical and a neural network approach. // Geoscientific Model Development. – 2025. – Vol. 18. – P. 161–180.
52. Lei X. et al. Evaluation of CMIP6 models and multi-model ensemble for extreme precipitation over arid Central Asia. // Remote Sensing. – 2023. – Vol. 15. – No. 9. – P. 2376.
53. Bayer-Altın T., Sadykova D., Türkeş M. Evolution of long-term trends and variability in air temperatures of Kazakhstan for the period 1963–2020. // Theoretical and Applied Climatology. – 2024. – Vol. 155. – P. 2601–2625.
54. Zhang W.X. et al. Central Asian compound flooding in 2024 contributed by climate warming and interannual variability. // Advances in Atmospheric Sciences. – 2025. – Vol. 42. – P. 2195–2202.