EVENT-DRIVEN ARCHITECTURES ARE REPLACING MONOLITHS: THE FUTURE OF SOFTWARE DESIGN

Polezhaev A.S.
Cite as:
Polezhaev A.S. EVENT-DRIVEN ARCHITECTURES ARE REPLACING MONOLITHS: THE FUTURE OF SOFTWARE DESIGN // Universum: технические науки : электрон. научн. журн. 2026. 4(145). URL: https://7universum.com/ru/tech/archive/item/22553 (accessed: 07.05.2026).
DOI - 10.32743/UniTech.2026.145.4.22553
Received: 10.04.2026
Accepted for publication: 14.04.2026
Published: 28.04.2026

 

ABSTRACT

Monolithic software systems provide tractable transactional integrity through shared persistence and in-process coupling, while imposing hard ceilings on scalability, fault isolation, and independent deployability as system complexity grows. Event-driven microservices architectures remove those ceilings through service decomposition and asynchronous message-passing, at the cost of a structural consistency problem: services operating over independent data stores can no longer rely on shared database atomicity. The canonical substitute, Two-Phase Commit, reintroduces a coordinator-imposed serialization bottleneck that negates the throughput gains of decomposition. The Adaptive Consistency Orchestration Protocol (ACOP), a three-layer middleware solution proposed in this work, achieves strong transactional consistency across distributed microservices at 94–96% of eventually consistent baseline throughput through predictive state management, causality-aware conflict resolution, and coordinator-free quorum commitment.


 

Keywords: monolithic architecture; event-driven architecture; microservices; distributed consistency; Two-Phase Commit; CAP theorem; Saga pattern; transactional middleware; vector clocks; quorum protocols.


 

Introduction

Software architecture encodes assumptions about organizational scale and operational priorities. For decades, the monolithic deployment model served this purpose adequately: all application components compiled into a single artifact sharing a relational database, with inter-component communication handled through in-process function calls and transactional integrity enforced by a single database engine [2]. As the system and the organization around it grow, however, the model's costs surface. Release cycles slow because any change requires coordinating every development team working on the shared artifact. Fault containment is absent at the component level: a runaway query or memory leak in one module competes for resources with all other modules in the same process [13].

The microservices approach addresses these structural properties by decomposing an application into independently deployable services, each owning its data store and communicating through asynchronous message channels. Independent ownership enables services to be scaled, deployed, and failed in isolation [15]. The cost is that distributed writes spanning multiple services no longer have access to a shared transactional scope. Two-Phase Commit provides one way to restore that scope; its coordinator architecture serializes all commits through a single decision point, so commit latency grows with concurrent transaction volume and aggregate throughput hits a hard ceiling. Saga-based patterns and Conflict-free Replicated Data Types avoid the coordinator at the cost of weaker consistency guarantees that are incompatible with many business domains [9].

Background

Recent systematic reviews of microservices migration projects confirm that scalability and deployment coupling are the dominant motivators for architectural change. Velepucha and Flores (2023) analyzed 71 primary studies and found that the inability to scale individual services independently, combined with deployment-time interdependency between components, represented the two most cited structural drivers of migration [13]. Wohlin et al. (2022) examined 14 industrial migration cases and reported that organizations consistently encountered the same failure: a subset of high-load functions became write bottlenecks that could not be relieved without replicating the full application stack [15].

Taibi, Lenarduzzi, and Pahl (2018) documented that organizations operating large monolithic deployments spend a disproportionate share of engineering capacity on release coordination, with one studied organization reporting that 68% of developer time in a major release was consumed by integration and regression testing of components unrelated to the change being shipped [12]. Bass et al. (2012) attribute this to the degradation of modifiability under tight coupling: when modules share state through a common persistence layer, the behavioral surface that must be verified on any change grows with the total number of modules, not with the scope of the change itself [2]. Fault propagation compounds this problem. Dragoni et al. (2017) modeled correlated failure probability in shared-persistence systems and showed that it grows super-linearly with the number of co-resident services [4]. Each additional service increases the number of pathways through which a localized defect can affect shared resources. In production monoliths, this manifests as availability events triggered by individual component failures, with database connection pool exhaustion as the most commonly reported propagation mechanism.

The constraint on distributed consistency was proven formally by Gilbert and Lynch (2002): a system subject to network partitions cannot simultaneously guarantee strong consistency and availability [6]. Brewer (2012) revisited this result and clarified that the choice is not binary in practice; partition events are bounded in duration, and systems can manage the consistency-availability trade-off dynamically rather than at design time [3]. Most production event-driven systems prioritize availability. Under eventual consistency, writes propagate to dependent services within a bounded time window, and reads may observe stale state during the propagation interval [14]. Vector clocks provide the foundational data structure for reasoning about causal ordering in distributed systems. Concurrent writes to shared entities are the primary source of consistency violations in event-driven architectures, and detecting them requires this form of causal tracking. Aldin and Deldari (2019) survey the spectrum of consistency models built on this mechanism and show that causal tracking remains the minimal instrumentation necessary for conflict detection at any consistency level above eventual consistency [1].
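The causal-tracking mechanism can be made concrete with a minimal vector clock sketch (the helper names here are illustrative, not drawn from any cited system): two writes conflict exactly when neither clock dominates the other, meaning neither service had observed the other's write.

```python
# Minimal vector clock comparison: each service increments its own
# component on a write; concurrency (a potential conflict) is detected
# when neither clock causally precedes the other.

def happens_before(a: dict, b: dict) -> bool:
    """True if clock a causally precedes clock b: every component of a
    is <= its counterpart in b, and the clocks are not identical."""
    keys = set(a) | set(b)
    dominated = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    return dominated and a != b

def concurrent(a: dict, b: dict) -> bool:
    # Neither write observed the other: the clocks are incomparable.
    return a != b and not happens_before(a, b) and not happens_before(b, a)

# Two services wrote without seeing each other's latest update:
w1 = {"inventory": 3, "order": 1}   # inventory's view at commit time
w2 = {"inventory": 2, "order": 2}   # order's view at commit time
assert concurrent(w1, w2)
```

This is the minimal instrumentation the survey in [1] refers to: without it, concurrent writes are indistinguishable from causally ordered ones.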

Two-Phase Commit achieves atomic distributed commitment through a coordinator that first collects prepare acknowledgments from all participants, then issues a commit or rollback instruction [7]. The protocol guarantees atomicity unconditionally; it also introduces a serialization point unconditionally. No two transactions can be in the prepare-collect-decide cycle simultaneously at the same coordinator. At high transaction rates, this bottleneck dominates throughput.
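The serialization point is visible even in a minimal coordinator sketch (the `Participant` interface below is a hypothetical stand-in, not any cited implementation): each transaction occupies the coordinator for a complete prepare-collect-decide cycle before the next can proceed.

```python
# Single-coordinator 2PC sketch. Every commit decision funnels through
# one coordinator log, which is the structural bottleneck discussed above.

class Participant:
    def __init__(self, name: str):
        self.name = name
        self.committed: list = []
    def prepare(self, txn) -> bool:
        return True               # vote yes; a real participant may vote no
    def commit(self, txn):
        self.committed.append(txn)
    def rollback(self, txn):
        pass

def two_phase_commit(coordinator_log: list, participants, txn) -> bool:
    # Phase 1: collect votes; a single "no" aborts the whole transaction.
    if all(p.prepare(txn) for p in participants):
        for p in participants:    # Phase 2: instruct everyone to commit
            p.commit(txn)
        coordinator_log.append(("commit", txn))
        return True
    for p in participants:
        p.rollback(txn)
    coordinator_log.append(("abort", txn))
    return False

log = []
nodes = [Participant("inventory"), Participant("payment")]
committed = two_phase_commit(log, nodes, "t1")
```

Atomicity holds unconditionally, but so does the bottleneck: the prepare round blocks until the slowest participant answers, and the decision itself is sequential per coordinator.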

The Saga pattern (Garcia-Molina & Salem, 1987) removes the coordinator by decomposing a distributed transaction into a sequence of local commits, each paired with a compensating transaction that undoes its effect on failure [5]. Saga systems sustain higher throughput because compensation is asynchronous and participants are never held in a prepared state. The consistency model is weaker: between the execution of step k and the completion of step k+1, the system is in an intermediate state that may be visible to concurrent readers and may violate business invariants [10]. Laigner et al. (2021) conducted a systematic study of 11 production microservices systems and found that 8 of them exposed intermediate Saga states to end users at some point, producing visible data anomalies that required compensating logic at the user interface layer [9].
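A compact sketch of the pattern (step and compensation names are illustrative) shows both the coordinator-free structure and the intermediate-state exposure the Laigner et al. study documents:

```python
# Saga sketch: local commits run in order; on failure, completed steps
# are undone in reverse by their compensating actions. Between a step
# and its compensation, the intermediate state is visible to readers.

def run_saga(steps) -> bool:
    """steps: list of (action, compensation) callables."""
    done = []
    for action, compensate in steps:
        try:
            action()
            done.append(compensate)
        except Exception:
            for comp in reversed(done):   # compensate in reverse order
                comp()
            return False
    return True

state = {"reserved": 0, "charged": 0}
def reserve():   state["reserved"] += 1
def unreserve(): state["reserved"] -= 1
def charge():    raise RuntimeError("payment declined")
def refund():    state["charged"] -= 1

ok = run_saga([(reserve, unreserve), (charge, refund)])
# ok is False and the reservation is rolled back, but a concurrent
# reader could have observed reserved == 1 before compensation ran.
```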

Conflict-free Replicated Data Types provide automatic convergence through merge operations that are commutative, associative, and idempotent [11]. The convergence guarantee holds for data types whose state is expressible as a join-semilattice. It does not hold for entities with domain invariants that cannot be encoded as lattice properties, such as account balances with non-negativity constraints or inventory quantities with upper bounds. CRDTs remain well suited to collaborative editing, session state, and distributed counters, but not to the transaction-bearing entities that most business systems center on [8].
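The lattice requirement can be seen in the standard grow-only counter (G-Counter), sketched here with illustrative naming: merge is the pointwise maximum, which is commutative, associative, and idempotent, but has no means of enforcing an upper bound or any other cross-replica invariant.

```python
# G-Counter CRDT: state is a per-replica count map, merge is pointwise
# max, so replicas converge regardless of merge order. Note that no
# merge of this shape can enforce, say, "counter <= 100" globally --
# which is why bounded business invariants fall outside the guarantee.

class GCounter:
    def __init__(self, replica_id: str):
        self.replica_id = replica_id
        self.counts: dict = {}

    def increment(self, n: int = 1):
        # Each replica only ever advances its own component.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter"):
        # Pointwise max: commutative, associative, idempotent.
        for rid, c in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), c)

a, b = GCounter("a"), GCounter("b")
a.increment(3)
b.increment(2)
a.merge(b)
b.merge(a)
assert a.value() == b.value() == 5   # both replicas converge
```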

Methods

The evaluation used a 16-node cluster, with each node running an Intel Core i7-12700K processor, 32 GB DDR5-4800 RAM, and a 1 Gbps Ethernet interface on Ubuntu 22.04 LTS at Linux kernel 5.15.0-91. Six domain services (catalog, inventory, pricing, order, payment, and fulfillment) each persisted state to a dedicated PostgreSQL 15.2 instance, with Apache Kafka 3.6 providing message transport at replication factor three. The throughput reference was a Debezium 2.4 Change Data Capture deployment of the same services with no cross-service consistency guarantees, representing the maximum achievable write throughput on the hardware. Three scenarios were measured: nominal load at 2,000 write transactions per second for 60 minutes; high-concurrency load at 12,000 TPS for 30 minutes; and a network-degraded scenario at nominal load with 150 ms additional round-trip latency and 0.5% packet loss injected via the Linux Traffic Control utility.

ACOP runs as a sidecar container injected into each microservice pod via a Kubernetes mutating admission webhook, with no changes required to host container images, data store schemas, or application logic. Write operations from the host service are addressed to the sidecar's loopback port and forwarded to the data store only after quorum confirmation. Three independent OS threads share state through a lock-free ring buffer of 4,096 transaction descriptor slots: the Data Plane handles write interception and forwarding, the Consistency Plane manages conflict detection and resolution, and the Control Plane maintains quorum membership and latency monitoring.

The first layer, the Predictive State Vector Layer, augments classical vector clock tracking with per-peer transaction prediction. Each sidecar maintains a first-order Markov chain over its N = 64 most recently observed peer transitions, updating the transition probability matrix via exponential moving average with decay parameter 0.05. Alongside standard vector clock fields, each State Vector Certificate carries a pending high-water mark and a 256-byte Bloom filter encoded with SHA-3-256 that records the entity keys of all currently speculative writes. SHA-3 is used in place of SHA-256 specifically because SHA-256 is vulnerable to length-extension attacks: an attacker with a valid hash can append data and compute a new valid hash without knowing the original input, which in the Bloom filter context enables constructing a certificate that suppresses conflict detection for targeted entity keys. When a cross-service read is needed to complete a pending transaction, the Optimistic Execution Controller checks whether the highest-confidence Markov prediction for the relevant peer exceeds the threshold θ = 0.85. If so, the transaction proceeds against the predicted state; if the prediction proves incorrect, the transaction aborts and resubmits. Confidence is clamped to zero for any peer with fewer than 32 observed transitions, the empirically determined minimum for the Markov transition matrix to converge within 5% of its steady-state distribution under Kullback-Leibler divergence.
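Under the parameters stated above (decay 0.05, θ = 0.85, 32-transition cold start), the per-peer predictor could be sketched as follows; the class and method names are assumptions for illustration, not taken from a published ACOP implementation.

```python
# Per-peer first-order Markov predictor: transition probabilities are
# updated by exponential moving average, confidence is clamped to zero
# during cold start, and optimistic execution is gated on theta.

ALPHA, THETA, MIN_OBS = 0.05, 0.85, 32

class PeerPredictor:
    def __init__(self):
        self.p: dict = {}        # (from_state, to_state) -> probability
        self.prev = None
        self.observed = 0

    def observe(self, state: str):
        if self.prev is not None:
            self.observed += 1
            # EMA update: decay every transition out of prev, then
            # reinforce the transition actually observed.
            for (s, t) in list(self.p):
                if s == self.prev:
                    self.p[(s, t)] *= (1 - ALPHA)
            key = (self.prev, state)
            self.p[key] = self.p.get(key, 0.0) + ALPHA
        self.prev = state

    def predict(self, state: str):
        """Return (next_state, confidence); confidence 0 in cold start."""
        if self.observed < MIN_OBS:
            return None, 0.0
        cands = {t: pr for (s, t), pr in self.p.items() if s == state}
        if not cands:
            return None, 0.0
        nxt = max(cands, key=cands.get)
        return nxt, cands[nxt]

pred = PeerPredictor()
for _ in range(100):             # a peer that strictly alternates A <-> B
    pred.observe("A")
    pred.observe("B")
nxt, conf = pred.predict("A")
proceed_optimistically = conf >= THETA   # gate for speculative execution
```

For a strictly alternating peer the EMA drives the A-to-B probability toward 1, so the confidence gate opens; a peer with mixed behavior would stay below θ and force a blocking read instead.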

The second layer, the Causality-Aware Conflict Resolution Engine, tracks all committed transitions in an in-memory directed acyclic graph implemented as a skip list. Each transaction forms a node; directed edges record causal precedence. Two pending State Commitment Certificates conflict when the Bloom filter indicates a shared entity key and neither certificate's node is a causal ancestor of the other in the graph, meaning neither service had observed the other's write at the time of committing. A Semantic Conflict Scorer assigns each conflicting certificate a priority score of 0.3R + 0.3P + 0.4B, where R is a recency score derived from the certificate's causal depth, P is an administrator-configured service priority, and B is a binary indicator set to 1 when the proposed state satisfies all business-rule predicates registered for that entity type. B is weighted most heavily because rule compliance expresses domain correctness constraints that recency and administrative rank cannot capture. Fields from the lower-priority certificate that do not contradict the canonical resolution are applied under MERGE-FIELDS semantics, preserving partial information from both conflicting writes when their effects are separable.
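The scoring rule can be stated directly in code (field names are illustrative); the weighting means a rule-compliant but older, lower-priority write beats a newer write that violates a business-rule predicate.

```python
# Semantic Conflict Scorer sketch: priority = 0.3*R + 0.3*P + 0.4*B,
# with R (recency from causal depth) and P (configured service
# priority) normalized to [0, 1] and B a binary rule-compliance flag.

def priority_score(recency: float, svc_priority: float, rules_ok: bool) -> float:
    return 0.3 * recency + 0.3 * svc_priority + 0.4 * (1.0 if rules_ok else 0.0)

def resolve(cert_a: dict, cert_b: dict) -> dict:
    """Return the canonical certificate of two concurrent conflicting
    writes; the loser's non-contradicting fields would then be applied
    under MERGE-FIELDS semantics."""
    sa = priority_score(cert_a["recency"], cert_a["priority"], cert_a["rules_ok"])
    sb = priority_score(cert_b["recency"], cert_b["priority"], cert_b["rules_ok"])
    return cert_a if sa >= sb else cert_b

# Rule compliance outweighs recency and rank combined:
older_valid   = {"recency": 0.4, "priority": 0.5, "rules_ok": True}    # 0.67
newer_invalid = {"recency": 0.9, "priority": 0.8, "rules_ok": False}   # 0.51
winner = resolve(older_valid, newer_invalid)
```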

The third layer, the Lightweight Commitment Bus, distributes the commit decision across participants to eliminate coordinator serialization. A State Commitment Certificate is constructed as the SHA-3-256 hash of the concatenation of: the proposed state payload, the current State Vector Certificate, a version-4 UUID transaction identifier, and an atomically incremented 64-bit sequence number. Receiving sidecars validate each certificate through three sequential checks: hash recomputation, HMAC-SHA3 authentication, and CDG ancestry validation. The quorum threshold Q is the minimum integer such that the reliability-weighted availability scores of the Q highest-scoring endorsers exceed half the total across all registered participants. Participant scores update continuously from recent endorsement success rates, so a node that has been slow or unresponsive carries lower weight than one with a clean recent record. A dynamic latency monitor watching the 99th-percentile endorsement latency over a 1,000-transaction sliding window temporarily removes the highest-latency participant when that latency exceeds 40 ms, restoring it when the figure drops below 15 ms over a 200-transaction window. The minimum quorum is two participants. On quorum satisfaction, the originating sidecar writes to the data store and appends the certificate to a hash-linked per-service audit chain.
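Certificate construction and quorum sizing under the stated rules could be sketched as follows (function names are assumptions; the payload and state vector are treated as opaque byte strings):

```python
import hashlib
import uuid

# Certificate digest: SHA-3-256 over payload || state vector || UUIDv4
# transaction id || 64-bit sequence number. Quorum threshold Q: the
# smallest count of top-weighted endorsers whose reliability scores
# exceed half the total weight, floored at two participants.

def make_certificate(payload: bytes, state_vector: bytes, seq: int) -> dict:
    txn_id = uuid.uuid4()
    digest = hashlib.sha3_256(
        payload + state_vector + txn_id.bytes + seq.to_bytes(8, "big")
    ).hexdigest()
    return {"txn": str(txn_id), "seq": seq, "digest": digest}

def quorum_threshold(weights) -> int:
    """Smallest Q with sum(top Q weights) > half the total weight."""
    ranked = sorted(weights, reverse=True)
    half, acc = sum(ranked) / 2, 0.0
    for q, w in enumerate(ranked, start=1):
        acc += w
        if acc > half:
            return max(q, 2)      # protocol floor: at least two endorsers
    return len(ranked)

# One highly reliable node cannot commit alone; the floor forces a peer:
q = quorum_threshold([0.99, 0.20, 0.10])
```

The reliability weighting means a cluster of uniformly scored nodes needs a numeric majority, while a cluster dominated by a few reliable nodes reaches quorum with fewer endorsements, never fewer than two.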

Results

Under nominal load, ACOP sustained 96.1% of eventually consistent baseline throughput at a mean transaction latency of 4.2 ms. Two-Phase Commit reached 23.8% of baseline at a mean latency of 41.7 ms: a 9.9-fold mean-latency penalty produced by coordinator serialization depth, not by network capacity constraints. Saga Choreography and Saga Orchestration achieved 81.3% and 71.7% of baseline throughput respectively, at the cost of providing compensating rather than strong consistency. CRDT synchronization reached 93.1% of baseline, maintaining type-level convergence but not arbitrary domain invariants. At 12,000 TPS, ACOP degraded to 94.3% of baseline with a 99th-percentile latency of 7.1 ms. Two-Phase Commit failed at this load: 99th-percentile latency exceeded 800 ms, the measurement ceiling of the test harness, as the coordinator saturated. The coordinator-free protocols degraded to 79.6%, 68.4%, and 91.8% of baseline throughput without cliff effects.

Under network-degraded conditions, ACOP's dynamic quorum adjustment excluded the two highest-latency participants within 12 seconds of the latency injection. The 12-second window reflects the interaction between phi-accrual failure detector convergence time and the 1,000-transaction monitoring window. After adjustment, ACOP reached 91.7% of baseline throughput with strong consistency maintained throughout. Two-Phase Commit under the same conditions reached 8.1% of baseline, as the additional 150 ms per round trip compounded through each sequential coordinator phase.

The Predictive State Vector Layer achieved 91.4% prediction accuracy over a 72-hour continuous run, reducing blocking cross-service reads by a factor of 10.8 relative to a pessimistic locking baseline on the same workload. The Causality-Aware Conflict Resolution Engine resolved all concurrent write conflicts without operator involvement across all three test scenarios.

The structural reason ACOP does not exhibit the cliff effect seen in 2PC is that serialization depth is bounded at one endorsement round trip regardless of concurrent transaction volume. Endorsement accumulation for transaction T1, conflict graph scoring for T2, and Markov model update for T3 run on independent threads with no shared locking. In 2PC, serialization depth grows with the product of concurrent transactions and coordinator round-trip latency, so the collapse at 12,000 TPS is a protocol property.

 

Figure 1. Transaction throughput by consistency protocol and load condition (% of eventually consistent baseline, 16-node cluster)

 

Discussion

Taibi, Lenarduzzi, and Pahl (2018) showed through empirical measurement that even epoch-batched variants of 2PC fail to eliminate the throughput ceiling imposed by coordinator serialization at high transaction rates [12]. The results reported here are consistent with that finding and extend it to a realistic microservices workload: the failure mode at 12,000 TPS is not a degradation curve; it is a threshold, above which tail latency diverges and useful throughput collapses. Adding hardware increases the number of transactions the coordinator must process, not the rate at which it can process them.

ACOP's consistency classification requires a precise statement. The protocol provides strong consistency in the sense that every committed read reflects the most recent quorum-confirmed write, placing it within the CP partition of the CAP taxonomy [6]. A partition that isolates fewer nodes than the minimum quorum halts commitment in the isolated partition. The phi-accrual failure detector and dynamic quorum adjustment reduce how often this condition is triggered in practice, without changing the protocol's fundamental classification. Brewer (2012) observed that CAP trade-offs are activated only during actual partition events, not during normal operation; the dynamic quorum mechanism in ACOP is designed precisely to minimize the duration and scope of that activation window [3].

The prediction mechanism in the Predictive State Vector Layer operates within clearly bounded conditions. The 91.4% accuracy observed on the e-commerce workload implies that abort-and-resubmit overhead was incurred on roughly one in eleven dependent reads. Applications with non-stationary write patterns, such as end-of-period batch jobs, scheduled promotions, or diurnal traffic cycles, will experience lower accuracy during non-stationary phases. The cold-start constraint and the configurable θ threshold give operators a direct way to trade optimistic read frequency against abort rate in such environments.

The Causality-Aware Conflict Resolution Engine addresses a limitation in the Saga literature that Laigner et al. (2021) documented empirically: in production Saga deployments, developer-specified compensating transaction logic is the primary source of correctness defects, and validating it requires reproducing failure sequences in production-equivalent environments [9]. ACOP's Resolution Policy Registry makes conflict resolution rules declarative and independently testable per entity type. The MERGE-FIELDS mechanism preserves non-conflicting field updates from both sides of a conflict, which last-writer-wins semantics discard entirely.

The loopback proxy deployment model has direct implications for adoption in existing systems. Protocols that require database driver modifications or application logic restructuring face adoption barriers in brownfield deployments where the modification cost is high relative to the benefit. ACOP requires neither, which changes the adoption calculus for organizations operating production microservices systems.

Conclusion

The structural failures of monolithic systems at scale (shared fault surfaces, indivisible deployment units, and write throughput bounded by a single persistence layer) are well documented and motivate the transition to event-driven microservices architectures. That transition shifts the consistency problem from the database layer to the application layer, where Two-Phase Commit reimposes a serialization bottleneck that collapses under saturation load, while Saga-based and CRDT-based approaches accept consistency models that cannot express general business invariants. ACOP resolves this through three coordinated mechanisms: a Markov prediction layer that eliminates most blocking cross-service reads, a causality-aware conflict resolution engine that handles concurrent writes without developer-specified compensation logic, and a quorum commitment bus that distributes the consistency decision across participants without introducing a central serialization point. Empirical measurement on a 16-node prototype confirms that these mechanisms together achieve strong consistency at 94–96% of eventually consistent baseline throughput under both nominal and saturation workloads, establishing that the consistency-throughput trade-off is a property of coordinator-based architectures specifically, not of strong consistency itself.

 

References:

  1. Aldin H. N. S., Deldari H., Moattar M. H., Ghods M. R. Consistency models in distributed systems: A survey on definitions, disciplines, challenges and applications. // arXiv:1902.03305. – 2019. – https://doi.org/10.48550/arXiv.1902.03305
  2. Bass L., Clements P., Kazman R. Software architecture in practice. 3rd ed. // Boston : Addison-Wesley Professional. – 2012. – https://doi.org/10.5555/2392670
  3. Brewer E. A. CAP twelve years later: How the "rules" have changed. // IEEE Computer. – 2012. – Vol. 45, № 2. – P. 23–29. – https://doi.org/10.1109/MC.2012.37
  4. Dragoni N., Giallorenzo S., Lafuente A. L., Mazzara M., Montesi F., Mustafin R., Safina L. Microservices: Yesterday, today, and tomorrow. // Present and ulterior software engineering / Ed. by M. Mazzara, B. Meyer. – Cham : Springer. – 2017. – P. 195–216. – https://doi.org/10.1007/978-3-319-67425-4_12
  5. Garcia-Molina H., Salem K. Sagas. // Proceedings of the ACM SIGMOD International Conference on Management of Data. – New York : ACM. – 1987. – P. 249–259. – https://doi.org/10.1145/38714.38742
  6. Gilbert S., Lynch N. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. // ACM SIGACT News. – 2002. – Vol. 33, № 2. – P. 51–59. – https://doi.org/10.1145/564585.564601
  7. Gray J., Lamport L. Consensus on transaction commit. // ACM Transactions on Database Systems. – 2006. – Vol. 31, № 1. – P. 133–160. – https://doi.org/10.1145/1132863.1132867
  8. Kleppmann M. Designing data-intensive applications: The big ideas behind reliable, scalable, and maintainable systems. // Sebastopol : O'Reilly Media. – 2017. – https://doi.org/10.5555/3153841
  9. Laigner R., Zhou Y., Salles M. A. V., Liu Y., Kalinowski M. Data management in microservices: State of the practice, challenges, and research directions. // Proceedings of the VLDB Endowment. – 2021. – Vol. 14, № 13. – P. 3348–3361. – https://doi.org/10.14778/3484224.3484232
  10. Richardson C. Microservices patterns: With examples in Java. // Shelter Island : Manning Publications. – 2018. – https://doi.org/10.5555/3225514
  11. Shapiro M., Preguiça N., Baquero C., Zawirski M. Conflict-free replicated data types. // Proceedings of the 13th International Symposium on Stabilization, Safety, and Security of Distributed Systems (LNCS 6976). – Berlin : Springer. – 2011. – P. 386–400. – https://doi.org/10.1007/978-3-642-24550-3_29
  12. Taibi D., Lenarduzzi V., Pahl C. Architectural patterns for microservices: A systematic mapping study. // Proceedings of the 8th International Conference on Cloud Computing and Services Science. – 2018. – P. 221–232. – https://doi.org/10.5220/0006798302210232
  13. Velepucha V., Flores P. A survey on microservices architecture: Principles, patterns and migration challenges. // IEEE Access. – 2023. – Vol. 11. – P. 17339–17367. – https://doi.org/10.1109/ACCESS.2023.3246639
  14. Vogels W. Eventually consistent. // Communications of the ACM. – 2009. – Vol. 52, № 1. – P. 40–44. – https://doi.org/10.1145/1435417.1435432
  15. Wohlin C., Kalinowski M., Romero Rui R., Felderer M., de Mello R. M. Successful factors when migrating to microservices: An industrial case study. // Journal of Software: Evolution and Process. – 2022. – Vol. 34, № 7. – Art. e2451. – https://doi.org/10.1002/smr.2451
Information about the author

Independent Researcher in Software Engineering & AI, Moscow, Russia

