AI-DRIVEN OPTIMIZATION IN COMPLEX SYSTEMS

Mikhailiuk M.S.

28.04.2026 301

4(145)

10. Computer science, computer engineering, and control

Cite as:

Mikhailiuk M.S. AI-DRIVEN OPTIMIZATION IN COMPLEX SYSTEMS // Universum: technical sciences : electronic scientific journal. 2026. 4(145). URL: https://7universum.com/ru/tech/archive/item/22601 (accessed: 28.07.2026).

DOI - 10.32743/UniTech.2026.145.4.22601

Received by the editors: 10.04.2026

Accepted for publication: 14.04.2026

Published: 28.04.2026

ABSTRACT

Metropolitan infrastructure systems form a single interdependent network: a state change in one subsystem affects others with a measurable time lag. Domain-specific machine learning optimizers ignore these dependencies and cannot close the resulting efficiency gap. This article presents APOSIS, which integrates three components: a topology-aware graph neural network for fault detection; an LSTM probabilistic forecasting module that passes uncertainty estimates to a reinforcement learning controller; and a causal dependency model, implemented as a time-lagged directed acyclic graph, that delivers expected cross-domain consequences of any state change directly to the controller. A 180-day pilot across 847 energy distribution nodes, 12,400 IoT devices, and three smart city subsystems achieved a 93% reduction in fault detection latency, a 64% reduction in load forecast error, and an 18.9 percentage point efficiency gain over the best single-domain optimizer. Ablation analysis identified the causal dependency model as the key differentiating component of the architecture.

Keywords: complex systems optimization, smart infrastructure, graph neural networks, safe reinforcement learning, anomaly detection, probabilistic forecasting.

Introduction

The efficiency gap arising from isolated optimization of causally interdependent infrastructure systems has persisted despite substantial advances in domain-specific machine learning. Long short-term memory networks demonstrate consistent advantages over classical time-series models for energy load forecasting [4, 6]. Graph neural networks outperform asset-level threshold detectors for fault localization in power and IoT network settings [8, 10]. Safe reinforcement learning provides principled frameworks for deploying learning-based controllers on physical infrastructure where constraint violation carries real operational consequences [3, 12]. Each of these methods performs well within its own domain. None provides a mechanism for propagating the causal implications of a state change in one domain to the actuation policy operating in another, because none was designed for that purpose. Composing them by running separate single-domain optimizers in parallel reproduces the same siloed architecture in machine learning form and does not close the efficiency gap.

A second structural constraint limits the design space for any realistic solution. Metropolitan infrastructure is operated by independent utilities, municipal agencies, and private network operators whose raw operational data cannot be shared across organizational boundaries without violating data-locality regulations. McMahan et al. (2017) established the theoretical basis for communication-efficient federated learning, and subsequent work addressed Byzantine fault tolerance and differential privacy in distributed training settings [1, 9]. Applying these techniques to closed-loop infrastructure control, where actuation decisions must be completed within fifty milliseconds and where a compromised model update poses a physical safety risk, requires a federated architecture specifically engineered for infrastructure constraints, one that prior literature has not described [5].

The present work develops APOSIS to address both problems. The architecture is motivated by three identified failure modes in prior art: the absence of explicit cross-domain causal context in reinforcement learning actuation; the absence of topology-aware propagation modelling in infrastructure fault detection; and the absence of a federated learning framework compatible with the safety and latency requirements of real-time closed-loop control.

Methods

The multi-domain infrastructure optimization problem can be stated formally as follows. Consider a metropolitan infrastructure system comprising $K$ operational domains, indexed $k = 1, \ldots, K$, each characterized by a state vector $x_k(t)$ and a set of controllable actuator states $u_k(t)$. Domain state transitions satisfy:

$$x_k(t+1) = f_k\big(x_k(t),\; u_k(t),\; \{x_{k'}(t - \tau_{k'k})\}_{k' \neq k},\; w_k(t)\big),$$

where $w_k(t)$ denotes exogenous disturbances and $\tau_{k'k}$ is the causal lag from domain $k'$ to domain $k$. The optimization objective is [15]:

$$\max_{\pi} \; \mathbb{E}\Big[\sum_{t} \sum_{k=1}^{K} r_k\big(x_k(t), u_k(t)\big)\Big] \quad \text{subject to} \quad \big(x_k(t), u_k(t)\big) \in \mathcal{C}_k \;\; \text{for all } k, t,$$

where $\pi$ is a joint policy mapping the full system state to actions, $r_k$ is the per-domain reward function, and $\mathcal{C}_k$ encodes per-domain hard safety constraints, including voltage band limits, line thermal ratings, and traffic signal phase timing minima. The cross-domain lag term $x_{k'}(t - \tau_{k'k})$ is structurally absent from any single-domain optimizer, which implicitly treats $\partial f_k / \partial x_{k'} = 0$ for all cross-domain pairs $k' \neq k$. The gap between the optimal value of this joint problem and the sum of per-domain optima under the independence assumption constitutes the efficiency gap that APOSIS is designed to close.
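To make the lagged coupling concrete, the following minimal sketch simulates a hypothetical two-domain system in which a disturbance in domain 1 reaches domain 2 only after a fixed lag. The linear dynamics, the coefficients, and the function name `simulate` are illustrative assumptions and are not taken from APOSIS.

```python
def simulate(steps, lag, coupling, shock_at=5):
    """Simulate two scalar domain states; domain 2 sees domain 1 with a delay.

    The dynamics and coefficients are illustrative assumptions only.
    """
    x1 = [0.0] * (steps + 1)
    x2 = [0.0] * (steps + 1)
    for t in range(steps):
        w1 = 1.0 if t == shock_at else 0.0  # exogenous disturbance hits domain 1 only
        x1[t + 1] = 0.5 * x1[t] + w1
        # Lagged cross-domain term: domain 1's state from `lag` steps earlier.
        delayed = x1[t - lag] if t - lag >= 0 else 0.0
        x2[t + 1] = 0.5 * x2[t] + coupling * delayed
    return x1, x2


x1, x2 = simulate(steps=20, lag=3, coupling=0.8)
first_x1 = next(t for t, v in enumerate(x1) if v != 0.0)
first_x2 = next(t for t, v in enumerate(x2) if v != 0.0)
print(first_x1, first_x2)
```

Setting `coupling=0.0` reproduces the independence assumption of a single-domain optimizer: domain 2 never responds to the disturbance, so a controller for domain 2 has no signal to anticipate it.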

The architecture addresses this problem through six functional layers, illustrated in Figure 1. Raw telemetry from heterogeneous infrastructure sources enters through a protocol-normalized ingestion layer and is projected into a shared 512-dimensional latent space by a pretrained transformer encoder. This shared representation feeds three concurrent inference modules: an LSTM-based probabilistic forecasting module whose uncertainty estimates gate actuation aggressiveness; a graph neural network anomaly detector that models fault propagation through the infrastructure topology; and a reinforcement learning actuation agent that integrates outputs from both. A separate cross-domain causal correlation engine, operating on the shared embeddings in parallel, computes the expected downstream effects of any state change across all connected infrastructure domains and supplies this causal context directly to the actuation agent. A federated learning orchestrator continuously refines the shared model parameters across all participating operators without centralizing raw data. Every actuation directive passes a physics-based feasibility check before being transmitted to physical actuators.

Figure 1. Functional architecture of APOSIS

Infrastructure telemetry arrives at heterogeneous temporal granularities: phasor measurement unit readings at 60 Hz, SCADA polling at one-second intervals, environmental sensor aggregates at five-minute resolution, and meteorological observations at one-hour resolution. The heterogeneous data ingestion layer implements protocol adapters for IEC 61968, IEC 61970, MQTT, CoAP, and OPC-UA, buffering all streams in a time-indexed engine with Apache Kafka ordering guarantees. Domain-specific feature vectors, including electrical phasors, vehicular flow densities, and waste-bin fill levels, occupy incompatible representational spaces. A transformer encoder with twelve attention heads and a 512-dimensional output, pretrained on 4.7 billion infrastructure observation records across energy, IoT, traffic, water, and environmental domains, projects these vectors into a shared latent geometry in which cross-domain thermodynamic, electrical, and behavioral couplings manifest as geometric proximity. The encoder receives no domain-membership labels during pretraining; the clustering of correlated variables from different domains is an emergent property of co-occurrence structure in the training corpus.
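The alignment of streams with different sampling rates can be sketched as follows. This is a simplified illustration that averages readings into fixed windows and carries the last known value forward for slower streams; the function names, the windowing rule, and the forward-fill policy are assumptions, not the ingestion layer's documented behavior.

```python
from collections import defaultdict


def bucket_mean(samples, window_s):
    """Average (timestamp_s, value) samples into fixed windows keyed by window start."""
    sums = defaultdict(lambda: [0.0, 0])
    for ts, value in samples:
        start = int(ts // window_s) * window_s
        sums[start][0] += value
        sums[start][1] += 1
    return {start: total / n for start, (total, n) in sums.items()}


def align(streams, window_s, t_end):
    """Join named streams on a common grid, carrying the last known value forward."""
    bucketed = {name: bucket_mean(s, window_s) for name, s in streams.items()}
    last = {name: None for name in streams}
    rows = []
    for start in range(0, t_end, window_s):
        for name in streams:
            if start in bucketed[name]:
                last[name] = bucketed[name][start]
        rows.append((start, dict(last)))
    return rows


# Hypothetical data: a fast stream sampled every second and a slow one sampled once.
streams = {
    "scada": [(0, 1.0), (1, 3.0), (2, 5.0), (3, 7.0)],
    "env": [(0, 20.0)],
}
for row in align(streams, window_s=2, t_end=4):
    print(row)
```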

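The forecasting module employs a long short-term memory network with attention-weighted input gating, trained over a thirty-day sliding window of latent embedding vectors [4]. Forecasts are parameterized as Gaussian mixture models with up to six components, minimizing the negative log-likelihood:

$$\mathcal{L} = -\sum_{t} \log \sum_{m=1}^{M} w_m(t)\, \mathcal{N}\big(y_t \mid \mu_m(t), \sigma_m^2(t)\big), \qquad M \le 6,$$

where $y_t$ is the observed target and $w_m$, $\mu_m$, and $\sigma_m^2$ are the mixture weights, means, and variances emitted by the network.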

The mixture parameterization serves a specific operational role: the predictive variance of each forecast target is passed to the actuation agent as a gating signal for policy aggressiveness. Under high uncertainty the agent preserves infrastructure margin; under low uncertainty it pursues tighter efficiency targets. This variance-conditioned switching prevents the over-commitment failure mode that deterministic controllers exhibit during demand-regime transitions, when forecast errors are elevated but unquantified [6].
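A minimal sketch of the two quantities involved: the mixture negative log-likelihood for a single scalar target, and the predictive variance obtained from the law of total variance. The gate `aggressiveness` is a hypothetical mapping chosen only to show variance-conditioned behavior; the source does not specify the gating function.

```python
import math


def gmm_nll(y, weights, means, stds):
    """Negative log-likelihood of scalar y under a Gaussian mixture."""
    density = sum(
        w * math.exp(-0.5 * ((y - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
        for w, m, s in zip(weights, means, stds)
    )
    return -math.log(density)


def gmm_variance(weights, means, stds):
    """Predictive variance of the mixture via the law of total variance."""
    mean = sum(w * m for w, m in zip(weights, means))
    second_moment = sum(w * (s * s + m * m) for w, m, s in zip(weights, means, stds))
    return second_moment - mean * mean


def aggressiveness(variance, scale=1.0):
    """Hypothetical gate in (0, 1]: shrinks as predictive variance grows."""
    return 1.0 / (1.0 + variance / scale)


# A bimodal forecast is far more uncertain than a unimodal one with the same mean.
unimodal = gmm_variance([1.0], [0.0], [1.0])
bimodal = gmm_variance([0.5, 0.5], [-3.0, 3.0], [1.0, 1.0])
print(aggressiveness(unimodal), aggressiveness(bimodal))
```

In this toy gate the bimodal forecast yields a smaller value, which corresponds to the agent preserving margin under high uncertainty.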

The anomaly detection module operates on a dynamic infrastructure topology graph $G = (V, E)$, in which $V$ indexes registered assets and edges in $E$ encode physical connectivity, logical control dependencies, and empirically learned co-failure correlations updated through Bayesian online learning [10]. Consider three nodes: a photovoltaic inverter $v_1$, a distribution transformer $v_2$, and a building HVAC controller $v_3$. An edge between $v_1$ and $v_2$ encodes the physical power flow dependency; a second edge, incident to $v_3$, encodes a logical control dependency learned from historical co-failure events. When the anomaly score of $v_1$ exceeds threshold $\tau$, its propagation risk vector signals high fault-transfer probability to $v_2$ and low probability to $v_3$. The actuation agent receives this vector before any threshold exceedance appears in the local time series of $v_2$ or $v_3$, enabling pre-emptive protective switching.

A message-passing graph neural network with gated recurrent units computes, per asset $v_i \in V$, an anomaly score $a_i$ and a propagation risk vector $\mathbf{p}_i$, whose components quantify fault transfer probability to each topological neighbor. A fault is flagged when $a_i$ exceeds a dynamically adjusted threshold $\tau$ calibrated to an operator-specified false positive rate. The propagation risk vector enables pre-emptive protective action at neighbors of a flagged asset before the fault manifests in their own local time series, a capability structurally absent from any detector that processes assets independently without representing the infrastructure topology [8].
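The flagging and pre-emptive alert logic can be sketched independently of the network that produces the scores. In the sketch below the anomaly scores and propagation risks are supplied as plain dictionaries standing in for the graph neural network's outputs; the empirical-quantile calibration rule, the `risk_cutoff` parameter, and all numeric values are assumptions for illustration.

```python
def calibrate_threshold(normal_scores, false_positive_rate):
    """Pick tau so that at most the given fraction of normal-operation scores exceed it."""
    ordered = sorted(normal_scores)
    # Number of normal scores allowed above tau, capped so the index stays valid.
    allowed = min(int(false_positive_rate * len(ordered)), len(ordered) - 1)
    return ordered[len(ordered) - allowed - 1]


def preemptive_alerts(scores, risk, tau, risk_cutoff=0.5):
    """Map each flagged asset to the neighbors that warrant protective action."""
    alerts = {}
    for asset, score in scores.items():
        if score > tau:
            alerts[asset] = sorted(
                nbr for nbr, p in risk.get(asset, {}).items() if p >= risk_cutoff
            )
    return alerts


# Hypothetical outputs for the three-node example: inverter v1, transformer v2, HVAC v3.
scores = {"v1": 0.93, "v2": 0.20, "v3": 0.10}
risk = {"v1": {"v2": 0.85, "v3": 0.10}}
print(preemptive_alerts(scores, risk, tau=0.9))
```

With these illustrative values only `v1` is flagged, and the alert targets `v2` alone, although neither neighbor has crossed the threshold in its own score.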

The actuation agent employs proximal policy optimization with a digital twin of the managed infrastructure serving as the world model [12]. The policy network maps the concatenation of the current latent state embedding, forecast distribution parameters, and anomaly scores and propagation risk vectors to a probability distribution over the joint action space. Safety is enforced at two levels. At the policy level, updates are bounded by a trust-region constraint on the Kullback-Leibler divergence between successive policy iterations, preventing behavioral drift during deployment. At the interface level, a physics-based feasibility verification module rejects any action whose predicted physical consequences violate the per-domain hard safety constraints and falls back to a constrained re-query of the policy with the violated constraint encoded in the state input [3]. Offline training draws on synthetic N-2 contingency perturbation scenarios; online refinement uses a replay buffer populated with live operational experience.
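The interface-level check can be sketched as a loop that rejects infeasible actions and re-queries the policy with the violated constraints. The callables `policy` and `predict`, the retry limit, and the final fallback to a do-nothing `hold` action are assumptions introduced for illustration; the source describes only the rejection and the constrained re-query.

```python
def safe_action(policy, predict, limits, state, max_requeries=3, hold=0.0):
    """Return the first proposed action whose predicted outcome meets every hard limit.

    policy(state, violated) -> action
    predict(action) -> {quantity: predicted value}
    limits -> {quantity: (lower, upper)}
    Returning `hold` after exhausting re-queries is an assumption of this sketch.
    """
    violated = ()
    for _ in range(max_requeries + 1):
        action = policy(state, violated)
        outcome = predict(action)
        violated = tuple(
            name for name, (lower, upper) in limits.items()
            if not lower <= outcome[name] <= upper
        )
        if not violated:
            return action
    return hold


# Toy stand-ins: a linear voltage model and a policy that backs off when told to.
def toy_predict(action):
    return {"voltage_pu": 1.0 + 0.1 * action}


def toy_policy(state, violated):
    return 0.4 if "voltage_pu" in violated else 1.0


limits = {"voltage_pu": (0.95, 1.05)}
print(safe_action(toy_policy, toy_predict, limits, state=None))
```

The first proposal in this toy run predicts 1.10 p.u. and is rejected; the re-queried proposal predicts 1.04 p.u. and is accepted.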

The causal dependency model is maintained as a directed acyclic graph over domain state variables, learned via a time-lagged variant of the Peter-Clark algorithm applied to historical multi-domain operational data [14]. The time-lagged extension recovers edges of the form $x_{k'}(t - \tau) \to x_k(t)$, with lag $\tau$ estimated from data, distinguishing instantaneous coupling from delayed causal influence. The graph is retrained on a rolling ninety-day window at structural significance $\alpha = 0.01$ to limit spurious edge inclusion. At runtime, any detected or forecast state change is propagated through the causal graph to compute expected downstream effects across all connected domains; these computed effects are appended to the state embedding received by the actuation agent. This mechanism directly instantiates the cross-domain lag term $x_{k'}(t - \tau_{k'k})$ in the joint optimization problem stated above, which single-domain architectures structurally omit.
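The runtime propagation step can be illustrated on a small hand-written graph. The sketch assumes each edge carries a lag and a linear gain, and that effects along different paths add; the source does not state the functional form of the edges, and the variable names and numbers below are hypothetical.

```python
from collections import defaultdict


def propagate(edges, source, delta):
    """Expected downstream effects of a change `delta` at `source`.

    edges: iterable of (src, dst, lag_minutes, gain); linear gains are an assumption.
    Returns {(variable, total_lag_minutes): summed effect} over all directed paths.
    """
    children = defaultdict(list)
    for src, dst, lag, gain in edges:
        children[src].append((dst, lag, gain))
    effects = defaultdict(float)
    stack = [(source, 0, delta)]
    while stack:  # terminates because the graph is acyclic
        node, lag, effect = stack.pop()
        for dst, edge_lag, gain in children[node]:
            effects[(dst, lag + edge_lag)] += effect * gain
            stack.append((dst, lag + edge_lag, effect * gain))
    return dict(effects)


# Hypothetical time-lagged edges across energy and building domains.
edges = [
    ("irradiance", "pv_output", 0, 0.8),
    ("pv_output", "feeder_voltage", 5, 0.5),
    ("irradiance", "cooling_load", 30, 0.25),
]
for key, value in sorted(propagate(edges, "irradiance", 2.0).items()):
    print(key, value)
```

The resulting dictionary is the kind of causal context that would be appended to the agent's state: which variables are expected to move, by how much, and after what delay.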

Each participating operator node in the federated learning orchestrator trains locally for a configurable number of gradient steps and transmits a compressed gradient update produced by top-k sparsification, retaining the 10,000 largest-magnitude components and reducing transmission volume by approximately 97% relative to the full gradient [9]. Gaussian noise is injected prior to transmission, with magnitude calibrated by the moments accountant method to bound per-round privacy loss ε with high-probability guarantee δ under the differential privacy composition theorem [1]. The aggregation server applies Byzantine-robust filtered federated averaging, excluding submissions whose component-wise deviation from the trimmed mean exceeds three standard deviations [2]. Automatic rollback to the previous validated checkpoint is triggered by a statistically significant performance drop detected over a twenty-four-hour monitoring window using a two-tailed paired t-test (p<0.05) on the composite KPI score.
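Two of the steps above, top-k sparsification on the client and outlier filtering on the server, can be sketched in a few lines. The sketch operates on dense Python lists, omits the differential privacy noise and the rollback test, and computes both the center and the spread on the trimmed values; that last choice and the `trim` parameter are assumptions, since the source does not say how the standard deviation is estimated.

```python
import statistics


def top_k_sparsify(gradient, k):
    """Keep the k largest-magnitude components as {index: value}."""
    order = sorted(range(len(gradient)), key=lambda i: abs(gradient[i]), reverse=True)
    return {i: gradient[i] for i in sorted(order[:k])}


def filtered_average(updates, trim=1, z=3.0):
    """Average client updates after excluding submissions that deviate too far.

    Per coordinate, the trimmed mean and trimmed standard deviation are computed
    after dropping the `trim` smallest and largest values (an assumption here).
    A client is excluded if any coordinate lies more than z deviations away.
    """
    dims = range(len(updates[0]))
    centers, spreads = [], []
    for d in dims:
        column = sorted(u[d] for u in updates)[trim:len(updates) - trim]
        centers.append(statistics.fmean(column))
        spreads.append(statistics.pstdev(column))
    kept = [
        u for u in updates
        if all(abs(u[d] - centers[d]) <= z * spreads[d] for d in dims)
    ]
    averaged = [statistics.fmean([u[d] for u in kept]) for d in dims]
    return averaged, len(kept)


# Five plausible updates and one wildly deviating submission (hypothetical values).
updates = [[1.0, 2.0], [1.1, 2.1], [0.9, 1.9], [1.05, 2.05], [0.95, 1.95], [50.0, -40.0]]
print(top_k_sparsify([0.1, -5.0, 0.3, 2.0], k=2))
print(filtered_average(updates))
```

Note that this sketch raises an error if every submission is excluded; a production aggregator would need an explicit policy for that case.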

Simulation model coefficients for the digital twin are calibrated through recursive Bayesian estimation assimilating incoming sensor observations. For assets below a minimum historical data threshold, physics-informed twin predictions substitute for LSTM outputs until sufficient operational data accumulates, directly addressing the cold-start problem in data-driven infrastructure optimization [11]. The adversarial input detection module quarantines sensor readings whose Mahalanobis distance from the jointly trained normal-operation distribution exceeds a threshold calibrated to a 0.1% false positive rate, substitutes twin predictions in all downstream computations, and logs quarantine events with cryptographic provenance signatures.
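The quarantine rule can be sketched for a two-dimensional reading. The sketch assumes the normal-operation distribution is Gaussian, in which case the squared Mahalanobis distance follows a chi-square distribution with two degrees of freedom and the threshold for a given false positive rate has a closed form. The restriction to two dimensions, the Gaussian assumption, and the function names are illustrative; logging and provenance signing are omitted.

```python
import math


def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance of a 2-D reading under a full 2x2 covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # The inverse of [[a, b], [c, d]] is [[d, -b], [-c, a]] / det.
    return (dx * (d * dx - b * dy) + dy * (-c * dx + a * dy)) / det


def chi2_threshold_2d(false_positive_rate):
    """Upper quantile of chi-square with 2 dof, whose CDF is 1 - exp(-x / 2)."""
    return -2.0 * math.log(false_positive_rate)


def screen(reading, twin_prediction, mean, cov, false_positive_rate=0.001):
    """Return (value to use downstream, quarantined flag)."""
    if mahalanobis_sq(reading, mean, cov) > chi2_threshold_2d(false_positive_rate):
        return twin_prediction, True
    return reading, False


mean, cov = (0.0, 0.0), ((1.0, 0.0), (0.0, 1.0))
print(screen((1.0, 1.0), (0.1, 0.1), mean, cov))
print(screen((3.0, 4.0), (0.1, 0.1), mean, cov))
```

For higher-dimensional sensor vectors the same rule applies with a general matrix inverse and the corresponding chi-square quantile, typically obtained from a numerical library.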

Results

The deployment comprised a metropolitan energy distribution network of 847 nodes, an IoT sensor network of 12,400 devices, and three smart city subsystems, operated continuously for 180 days. The rule-based threshold detector, ARIMA load forecasting model, and per-domain single-variable controllers constituting the baselines ran in parallel on the same telemetry streams during the first thirty days, establishing calibrated baseline performance under statistically equivalent conditions verified by a Kolmogorov-Smirnov test on the distribution of operating conditions (p = 0.41 for load profiles, p = 0.38 for weather conditions).
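The equivalence check rests on the two-sample Kolmogorov-Smirnov statistic, the largest gap between the two empirical distribution functions. The sketch below computes only that statistic on hypothetical samples; converting it to a p-value such as those reported above requires the Kolmogorov distribution, which a statistics library would normally supply.

```python
def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past x so ties are handled consistently.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        gap = max(gap, abs(i / len(a) - j / len(b)))
    return gap


# Hypothetical load samples from two observation periods.
period_a = [41.0, 43.5, 44.0, 47.2, 50.1]
period_b = [40.5, 43.0, 45.1, 46.8, 49.9]
print(ks_statistic(period_a, period_b))
```

A small statistic, and correspondingly a large p-value, is consistent with the two periods sharing the same distribution of operating conditions; it does not prove equivalence.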

Fault detection latency decreased from 24.6 seconds to 1.8 seconds, a 93% reduction. The mechanism is traceable to a specific architectural property: the propagation risk vector computed by the anomaly detection module generates an alert at the origin asset while the fault is still propagating through the topology graph, before threshold-exceedance becomes observable at downstream nodes in their local time series. The rule-based baseline, evaluated per-asset against its own measurements, produced alerts only after the anomalous condition had diffused sufficiently to exceed the local threshold, incurring the full propagation lag. Fault localization accuracy improved from 71.2% to 94.7%, reflecting the graph neural network's capacity to distinguish the fault origin from the propagation wavefront through the joint output of anomaly score and propagation risk vector, structural information unavailable to any per-asset detector [8]. The false positive anomaly rate fell from 3.4% to 0.8%, a 76% reduction attributable to the Bayesian-updated topology graph suppressing sensor noise events that lack topological plausibility.

Load forecast MAPE at a one-hour horizon decreased from 5.8% under ARIMA to 2.1% under the LSTM module, a 64% reduction consistent with published benchmarks for deep sequence models on energy forecasting tasks [6]. The Gaussian mixture uncertainty intervals covered realized demand in all fourteen weather transition events occurring during the deployment period, precisely the conditions under which point-forecast baselines accumulate their largest systematic errors, confirming the practical value of calibrated uncertainty quantification under distribution shift.
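The percentage reductions quoted in the two preceding paragraphs follow directly from the reported before and after values, as this short check shows. Only the figures stated in the text are used; the helper name is arbitrary.

```python
def relative_reduction(before, after):
    """Relative reduction in percent."""
    return 100.0 * (before - after) / before


reported = {
    "fault detection latency (s)": (24.6, 1.8),
    "false positive rate (%)": (3.4, 0.8),
    "one-hour load forecast MAPE (%)": (5.8, 2.1),
}
for name, (before, after) in reported.items():
    print(f"{name}: {relative_reduction(before, after):.1f}% reduction")
```

The fault localization improvement, by contrast, is an absolute change of 94.7 - 71.2 = 23.5 percentage points and is not a relative reduction.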

The cross-domain efficiency gain over the best single-domain optimizer was 18.9 percentage points. The attribution methodology applied ablation of the causal correlation engine: cross-domain context propagation was disabled while all other APOSIS components remained active, and the resulting performance decrement was measured across the full deployment period. Under ablation, system efficiency fell to within measurement uncertainty of the best single-domain baseline, confirming that the causal correlation engine is the principal differentiating component. Partial attribution within the 18.9 percentage point total, estimated by sequentially re-enabling individual cross-domain causal links and measuring incremental efficiency recovery, allocates approximately 43% to coordinated demand-response and renewable dispatch across the electrical and building energy domains, 31% to coordinated pre-cooling and traffic signal timing adjustments triggered by solar irradiance forecasts, and 26% to predictive maintenance scheduling exploiting correlated failure patterns across IoT and electrical asset domains. These proportions reflect the specific distribution of weather and demand conditions during the 180-day deployment period and are not advanced as stable structural constants generalizable across deployments.
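The sequential re-enabling procedure amounts to differencing cumulative gains. In the sketch below the cumulative values are not measurements from the pilot; they are back-computed from the reported 43%, 31%, and 26% shares of the 18.9 percentage point total, purely to show the arithmetic. The link labels and the function name are hypothetical.

```python
def incremental_attribution(cumulative_gains):
    """Share of the total gain recovered by each link, in re-enabling order.

    cumulative_gains: ordered [(link, gain in percentage points after enabling it)].
    """
    total = cumulative_gains[-1][1]
    shares, previous = {}, 0.0
    for link, gain in cumulative_gains:
        shares[link] = (gain - previous) / total
        previous = gain
    return shares


# Illustrative cumulative gains derived from the reported shares, not measured data.
cumulative = [
    ("demand_response_and_dispatch", 8.127),
    ("precooling_and_signal_timing", 13.986),
    ("predictive_maintenance", 18.9),
]
for link, share in incremental_attribution(cumulative).items():
    print(f"{link}: {100 * share:.0f}% of the gain")
```

Because each increment depends on which links are already active, a different re-enabling order could yield different shares, which is one more reason to read the proportions as deployment-specific.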

Actuation latency at the 95th percentile was 42 milliseconds, satisfying the 50-millisecond ceiling for voltage regulation commands under IEC 61968. The federated model reached 98.3% of fully centralized accuracy, with the 1.7 percentage point deficit attributable to gradient compression noise and differential privacy injection, a tradeoff that the regulatory infeasibility of centralized training across independent operators makes unavoidable in practice [7]. Energy waste relative to the unmanaged infrastructure baseline decreased by 31.4%, a composite figure integrating improvements across electrical dispatch, building energy management, and traffic-correlated pedestrian infrastructure.

Discussion

The pilot evidence is consistent with the central architectural claim that explicit causal cross-domain context provision is a necessary condition for recovering the efficiency losses that single-domain optimization produces, but the scope of that consistency requires qualification. The deployment covered one metropolitan network, one climate zone, and one 180-day operational period. The distribution of weather, demand patterns, and fault events over this period determined the relative salience of each cross-domain coupling that APOSIS exploited; a deployment in a different climate or on infrastructure with different physical topology would likely produce different partial attributions within the overall gain, even if the total remained comparable.

The computational overhead of maintaining the full architecture warrants explicit acknowledgment. The transformer encoder pretraining at 4.7 billion observation records is a one-time fixed cost that no single metropolitan operator could bear independently, requiring data-sharing arrangements that preceded the technical deployment. The digital twin recalibration through recursive Bayesian estimation and the causal graph retraining on a rolling ninety-day window impose continuous computational demands whose scaling properties above several thousand nodes have not been characterized in this deployment. The architecture's modularity permits component replacement as more computationally efficient alternatives emerge, but the integration cost of such replacements in a live closed-loop system is not negligible.

The Byzantine-robust gradient filter captures both sensor-faulted nodes and unsophisticated adversarial participants; two such nodes were identified and excluded during the pilot. It does not, however, address a patient adversary whose per-round submissions remain individually within the detection threshold while collectively drifting the global model over many federation rounds. The rollback governance provides retrospective correction once cumulative performance degradation becomes statistically detectable, but detection latency is a function of the adversary's poisoning rate, which can be calibrated adversarially. Extending the Byzantine detection mechanism to track gradient distribution drift over temporal windows, rather than per-round deviation from a contemporaneous mean, is a concrete direction for future work [13].

The reinforcement learning policy's long-run stability under infrastructure non-stationarity is the most fundamental open question raised by this work. The trust-region KL-divergence constraint and the digital twin's continuous Bayesian recalibration together address gradual equipment aging and incremental demand shifts. A discontinuous topology change, such as the commissioning of a large-scale generation asset or the decommissioning of a transmission corridor, places the system in a qualitatively different operating regime that neither mechanism is designed to handle. The physics-informed twin provides structural priors for newly commissioned assets, but the policy's learned value estimates for states adjacent to the new asset may require more rapid updating than the trust-region constraint permits. Safe reinforcement learning under abrupt topology change remains an active research problem [3].

Conclusion

The cross-domain efficiency gap in metropolitan infrastructure is a structural consequence of treating causally interdependent systems as independent optimization targets. Closing it requires an architecture that represents time-lagged causal dependencies across domain boundaries, propagates probabilistic forecast uncertainty to the actuation policy, models fault propagation through the infrastructure topology, and satisfies the data-locality and safety constraints that make centralized and unconstrained approaches impractical. The APOSIS architecture implements each of these requirements through components with theoretical grounding in the existing literature, and the pilot deployment provides empirical evidence, supported by ablation analysis, that the causal correlation engine is the necessary differentiating component rather than an incremental refinement.

The open questions identified above, covering long-run policy stability under topology discontinuities, temporal Byzantine detection, computational scaling above several thousand nodes, and the institutional prerequisites for pretraining corpus construction, define the research agenda for subsequent phases of this work.

References:

1. Abadi M., Chu A., Goodfellow I., McMahan H. B., Mironov I., Talwar K., Zhang L. Deep learning with differential privacy // Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. – 2016. – P. 308-318.
2. Blanchard P., El Mhamdi E. M., Guerraoui R., Stainer J. Machine learning with adversaries: Byzantine tolerant gradient descent // Advances in Neural Information Processing Systems. – 2017. – Vol. 30. – P. 119-129.
3. Brunke L., Greeff M., Hall A. W., Yuan Z., Zhou S., Panerati J., Schoellig A. P. Safe learning in robotics: From learning-based control to safe reinforcement learning // Annual Review of Control, Robotics, and Autonomous Systems. – 2022. – Vol. 5. – P. 411-444.
4. Hochreiter S., Schmidhuber J. Long short-term memory // Neural Computation. – 1997. – Vol. 9, № 8. – P. 1735-1780.
5. Kazmi S. A. A., Khalid M., Imran M. A., Anpalagan A. Multi-objective optimization for smart cities: A comprehensive review of algorithms, applications, and open challenges // IEEE Transactions on Smart Grid. – 2025. – Vol. 16, № 1. – P. 45-67.
6. Kong W., Dong Z. Y., Jia Y., Hill D. J., Xu Y., Zhang Y. Short-term residential load forecasting based on LSTM recurrent neural network // IEEE Transactions on Smart Grid. – 2019. – Vol. 10, № 1. – P. 841-851.
7. Li T., Sahu A. K., Talwalkar A., Smith V. Federated learning: Challenges, methods, and future directions // IEEE Signal Processing Magazine. – 2020. – Vol. 37, № 3. – P. 50-60.
8. Marfo W., Agyemang E., Diyawu S., Frimpong K., Javed A. R. Enhancing network anomaly detection using graph neural networks on IoT systems // Energy Informatics. – 2024. – Vol. 7, № 1. – Art. 42.
9. McMahan B., Moore E., Ramage D., Hampson S., Arcas B. A. Communication-efficient learning of deep networks from decentralized data // Proceedings of the 20th International Conference on Artificial Intelligence and Statistics. – 2017. – Vol. 54. – P. 1273-1282.
10. Protogerou A., Papadopoulos S., Drosou A., Tzovaras D., Refanidis I. A graph neural network method for distributed anomaly detection in IoT // Evolving Systems. – 2021. – Vol. 12, № 1. – P. 19-36.
11. Raissi M., Perdikaris P., Karniadakis G. E. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations // Journal of Computational Physics. – 2019. – Vol. 378. – P. 686-707.
12. Schulman J., Wolski F., Dhariwal P., Radford A., Klimov O. Proximal policy optimization algorithms // arXiv preprint. – 2017. – arXiv:1707.06347.
13. Siniosoglou I., Lagkas T., Sarigiannidis A., Sarigiannidis P. Federated learning models in decentralized critical infrastructure protection // Security and Resilience in Cyber-Physical Systems / Ed. Y. Xiao. – Cham : Springer. – 2024. – P. 211-238.
14. Spirtes P., Glymour C., Scheines R. Causation, prediction, and search. – Cambridge : MIT Press. – 2000. – 543 p.
15. Stoyanova M., Monti A. Cross-domain optimization strategies for interdependent urban infrastructure systems // IEEE Transactions on Industrial Informatics. – 2019. – Vol. 15, № 8. – P. 4571-4582.