Master's Student, School of Information Technology and Engineering, Kazakh-British Technical University (KBTU), Kazakhstan, Almaty
COMPARATIVE ANALYSIS OF MARKOV CHAIN AND MACHINE LEARNING APPROACHES FOR CYBERATTACK DETECTION: A STUDY ON SQL INJECTION AND DDOS ATTACKS
ABSTRACT
Automatic detection of cyberattacks needs to be accurate, fast, and interpretable. In this paper, we investigate whether Markov chain models, probabilistic models built from sequential patterns, can be competitive with and more interpretable than traditional machine learning models for detecting cyberattacks. We compare first-, second-, and third-order Markov chain models trained on sequential data with six classical classifiers (Naive Bayes, Logistic Regression, Decision Tree, Random Forest, XGBoost, CatBoost) trained on aggregated data, across two fundamentally different attack types: SQL injection, which is characterized by sequences within query syntax, and DDoS, which is characterized by statistical anomalies in aggregated traffic. Empirical results on public benchmark datasets support our hypothesis: Markov chains are well suited to SQL injection detection but less so to DDoS detection, where the aggregated input retains little temporal dependency. We propose hybrid architectures and an explainability method that identifies suspicious token transitions in queries classified as attacks.
Keywords: Markov chains, SQL injection detection, DDoS detection, intrusion detection, machine learning, cybersecurity, explainability.
Introduction
In contemporary cybersecurity, the number of attacks is growing, and the associated costs and disruptions are high, with threats posed to critical infrastructure, intellectual property, and personal privacy [1]. SQL injection (SQLi), one of the top web application security vulnerabilities, allows an attacker to inject arbitrary SQL code into a database query via improperly validated input [2; 3] in order to bypass authentication, exfiltrate sensitive data, and potentially compromise a system. In contrast to Distributed Denial-of-Service (DDoS) attacks, SQL injection attacks rely on arranging characters in a specific syntactic order that yields semantically valid SQL statements [2; 4]. DDoS detection, by comparison, separates volumetric attack traffic from benign traffic by examining flow statistics such as packet rates, byte counts, flow durations, and protocol distributions, rather than the character sequences examined in SQL injection detection [5].
Unlike signature-based intrusion detection, machine learning can detect general malicious patterns. Machine learning approaches in cybersecurity have reported detection rates above 95% using ensemble classifiers such as Random Forest, XGBoost, and CatBoost, depending on the features extracted from the data [4; 6; 7].
One challenge in developing machine learning models to detect cyberattacks is that most models act as "black boxes" [8]. They predict a label, but they cannot explain the reason behind the prediction. The usual remedy is to add explainability, and there are several ways to incorporate it into intrusion detection systems. SHAP (SHapley Additive exPlanations) evaluates the contribution of each feature using game theory [9]. LIME (Local Interpretable Model-Agnostic Explanations) generates human-interpretable explanations by fitting a simple model around each individual prediction [9]. Attention mechanisms expose which input tokens were most influential for the model's output by assigning each token a weight [10].
However, these methods share a common limitation: they act as add-ons. The explanations are produced after training by a procedure external to the model's internal logic, so they do not make the model itself more transparent. We turn to Markov chains to address this limitation, because the transition probability matrix of a Markov chain provides intrinsic explainability [8]. Each state transition is an event that can be measured and interpreted. This property makes it possible to trace a classification decision back to specific token sequences or traffic state progressions without any additional instrumentation.
But interpretability is not the only advantage of Markov chains. They are also computationally simple: there is no iterative training loop, no hyperparameter search, and no risk of non-convergence [11]. Training is a single pass of counting and normalizing. This makes Markov chains deployable in environments where LSTMs and Transformers are impractical, for example on edge devices, IoT sensors, and real-time network monitors [12]. Another advantage is theoretical transparency. Markov chains make an explicit assumption: the future depends only on the present state, or on the last n states for an order-n chain [13]. This allows us to reason about whether the assumption holds for our data. Neural sequence models are not transparent in this regard [14].
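The "single pass of counting and normalizing" can be illustrated with a minimal sketch. The function name `fit_first_order` and the toy token sequences are hypothetical and are not taken from the study's code or datasets.

```python
from collections import Counter, defaultdict

def fit_first_order(sequences):
    """Estimate P(next | current) by counting transitions and normalizing each row."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    matrix = {}
    for current, row in counts.items():
        total = sum(row.values())
        matrix[current] = {nxt: c / total for nxt, c in row.items()}
    return matrix

# Hypothetical toy sequences, for illustration only.
toy = [
    ["SELECT", "id", "FROM", "users"],
    ["SELECT", "name", "FROM", "users"],
    ["SELECT", "id", "FROM", "orders"],
]
P = fit_first_order(toy)
print(P["SELECT"])  # transition probabilities out of "SELECT"
print(P["FROM"])    # transition probabilities out of "FROM"
```

Each row of the resulting matrix sums to one by construction, and no optimizer or convergence check is involved.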
The aim of this study is to evaluate whether Markov chain models can serve as a competitive and interpretable alternative to classical ML classifiers for cyberattack detection, using SQL injection and DDoS attacks as two structurally distinct test cases.
Materials and methods
The figures in this section illustrate Markov chain models of increasing order applied to two types of cyberattack, namely SQLi and DDoS. Each attack type is modeled as a discrete-time Markov chain on a finite state space $S$, where each state denotes a behavioral phase of the attack lifecycle.
Figure 1. Markov Chain Models for SQLi & DDoS Detection, Order 1
For SQL Injection, the state space is defined as:
(1)
For DDoS:
(2)
Both state spaces capture the whole attack progression, starting from normal network traffic, through the establishment of malicious behavior, to successful exfiltration or exhaustion, or to defense. Each state corresponds to a single, observable behavioral phase that can be identified based on the features of the network traffic.
Order 1 – Classical (First-Order) Markov Chain
The top row shows the standard first-order Markov chain, where the probability of the next state depends only on the current state. The Markov property is formally stated as:
$P(X_{t+1} = j \mid X_t = i, X_{t-1}, \ldots, X_0) = P(X_{t+1} = j \mid X_t = i) = p_{ij}$ (3)
The transition probability matrix $P = (p_{ij})$ is row-stochastic, that is, it satisfies $\sum_{j \in S} p_{ij} = 1$ for every $i \in S$.
Directed edges represent non-zero transitions (above the display threshold) and are drawn from node $i$ to node $j$ with weights proportional to $p_{ij}$. Dashed self-loops are self-transitions, denoting the probability $p_{ii}$ that the chain remains in the same state. The self-loop probabilities of terminal attack states such as Exfil (SQLi) and Flood (DDoS) are higher than those of other states, reflecting their persistence.
The major difference between the two Order-1 graphs lies in the most probable transitions. For SQLi they form a rich directed chain with branching (Normal → Recon → Inject → Auth Bypass → Exfil) and thus reflect a query-level attack grammar. The DDoS graph, in contrast, is dominated by feedback loops between Flood and Exhaust: a volumetric attack has essentially one phase of high intensity rather than a series of distinct phases.
Order 2 – Second-Order Markov Chain
The second row extends the model so that the transition probability is conditioned on the two most recent states:
$P(X_{t+1} = k \mid X_t = j, X_{t-1} = i, \ldots, X_0) = P(X_{t+1} = k \mid X_t = j, X_{t-1} = i)$ (4)
In the diagram, this is represented as a sequence of bigram state nodes: each colored box encodes a pair $(X_{t-1}, X_t)$, and the arrows between boxes represent the transition to the next observed state.
Figure 2. Markov Chain Models for SQLi & DDoS Detection, Order 2
An Order-2 chain over $S$ is equivalent to an Order-1 chain over the expanded state space $S \times S$, which has $|S|^2$ possible states. For $|S| = 6$, this yields up to 36 composite states.
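The equivalence between a second-order chain and a first-order chain over pairs can be made concrete with a short sketch. The state labels `s0` to `s5` are placeholders rather than the paper's state names, and `to_pair_states` is a hypothetical helper.

```python
from itertools import product

def to_pair_states(seq):
    """Rewrite a state sequence as overlapping (previous, current) pairs."""
    return list(zip(seq, seq[1:]))

# Placeholder labels for a six-state space; the names are illustrative.
S = ["s0", "s1", "s2", "s3", "s4", "s5"]
composite = list(product(S, repeat=2))
print(len(composite))  # 36

path = ["s0", "s1", "s2", "s2", "s4"]
print(to_pair_states(path))
# [('s0', 's1'), ('s1', 's2'), ('s2', 's2'), ('s2', 's4')]
```

In the pair-state chain, a transition from (a, b) can only lead to a pair of the form (b, c), so many of the 36 x 36 entries of the expanded matrix are structurally zero.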
Order 3 – Third-Order Markov Chain
The third row conditions on the three most recent states:
$P(X_{t+1} = l \mid X_t = k, X_{t-1} = j, X_{t-2} = i, \ldots, X_0) = P(X_{t+1} = l \mid X_t = k, X_{t-1} = j, X_{t-2} = i)$ (5)
Each node now encodes a trigram $(X_{t-2}, X_{t-1}, X_t)$. The theoretical state space size (in composite states) is $|S|^3 = 6^3 = 216$, but the diagram shows just a representative attack trajectory. Transition probabilities along such a trajectory would be even larger, since a longer context window further disambiguates the next state.
Figure 3. Markov Chain Models for SQLi & DDoS Detection, Order 3
Higher-order models are best suited to SQLi datasets, where the attacks follow a formal grammar and trigram histories provide useful information. DDoS, by contrast, unfolds over longer time intervals, so the additional temporal context of Orders 2 and 3 is less useful. Instead, the discriminative signal for DDoS attacks is better captured at the granularity of feature statistics than of trigrams. This is why ML classifiers performed well on DDoS datasets and Markov chains performed well on SQLi datasets.
Results and discussions
We compare the Markov chain models with six classical classifiers and two hybrid classifiers on the SQL injection and DDoS datasets. All classifiers are evaluated on a held-out test set comprising 30% of each dataset, which is large enough to stabilize the metric values, especially for the DDoS sequence dataset. The key numbers and their implications are discussed below, and the complete results are given in Tables 1 and 2.
SQL Injection Detection
The SQL injection dataset has 30,919 different query strings, of which 21,643 form the training set and 9,276 form the test set. The conventional classifiers were trained on TF-IDF feature vectors, which are constructed from the tokens found in a query and the frequency with which they occur. The Markov chain models were instead built from the order of the tokens, modeling the transitions from one token to the next in attack traffic and in normal traffic.
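One way to turn per-class transition models into a classifier is a log-likelihood ratio between an attack chain and a benign chain. The sketch below is an assumption about how such a scorer could look, not the study's implementation: the `MarkovChain` class, whitespace tokenization, additive smoothing with `alpha`, and the toy queries are all hypothetical.

```python
import math
from collections import Counter, defaultdict

class MarkovChain:
    """First-order token chain with additive smoothing (smoothing is an assumption)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, sequences):
        for seq in sequences:
            self.vocab.update(seq)
            for a, b in zip(seq, seq[1:]):
                self.counts[a][b] += 1
        return self

    def log_prob(self, a, b, vocab_size):
        row = self.counts.get(a, {})
        total = sum(row.values())
        return math.log((row.get(b, 0) + self.alpha) / (total + self.alpha * vocab_size))

def score(tokens, attack, benign):
    """Log-likelihood ratio; positive values favour the attack chain."""
    v = len(attack.vocab | benign.vocab) + 1  # +1 reserves mass for unseen tokens
    return sum(attack.log_prob(a, b, v) - benign.log_prob(a, b, v)
               for a, b in zip(tokens, tokens[1:]))

# Hypothetical toy queries, tokenized on whitespace for simplicity.
attack_chain = MarkovChain().fit(q.split() for q in [
    "' OR 1 = 1 --",
    "' OR 'a' = 'a'",
    "1 UNION SELECT password FROM users",
])
benign_chain = MarkovChain().fit(q.split() for q in [
    "SELECT name FROM users WHERE id = 5",
    "SELECT id FROM orders WHERE total = 10",
])

print(score("' OR 1 = 1".split(), attack_chain, benign_chain))
print(score("SELECT id FROM users WHERE id = 5".split(), attack_chain, benign_chain))
```

Unlike a TF-IDF vector, this score changes when the same tokens appear in a different order, which is the property the comparison in this section relies on.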
Table 1.
Performance comparison: SQL injection detection
| Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| Naive Bayes | 0.7094 | 0.5590 | 0.9968 | 0.7163 |
| Logistic Regression | 0.9916 | 0.9947 | 0.9824 | 0.9885 |
| Decision Tree | 0.9918 | 0.9958 | 0.9818 | 0.9888 |
| Random Forest | 0.9950 | 0.9997 | 0.9868 | 0.9932 |
| XGBoost | 0.9936 | 0.9985 | 0.9842 | 0.9913 |
| CatBoost | 0.9942 | 0.9985 | 0.9857 | 0.9920 |
| Markov Chain (order=1) | 0.9906 | 0.9903 | 0.9842 | 0.9872 |
| Markov Chain (order=2) | 0.9815 | 0.9833 | 0.9660 | 0.9746 |
| Markov Chain (order=3) | 0.9587 | 0.9862 | 0.9004 | 0.9414 |
| Hybrid (Markov + RF) | 0.9950 | 0.9994 | 0.9871 | 0.9932 |
Most of the models performed well on this task, because SQL injection has a set of syntactic patterns that can be easily recognized by both frequency-based and sequence-based models.
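As a quick consistency check on Table 1, the F1-score can be recomputed from the reported precision and recall as their harmonic mean. The sketch below does this for three rows; the helper name `f1_score` is chosen for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F1) copied from Table 1.
rows = {
    "Naive Bayes": (0.5590, 0.9968, 0.7163),
    "Random Forest": (0.9997, 0.9868, 0.9932),
    "Markov Chain (order=1)": (0.9903, 0.9842, 0.9872),
}

for name, (p, r, reported) in rows.items():
    print(f"{name}: computed {f1_score(p, r):.4f}, reported {reported:.4f}")
```

The Naive Bayes row shows how a very high recall (0.9968) combined with a low precision (0.5590) still gives a modest F1-score.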
The first-order Markov chain achieved an F1-score of 0.9872, only 0.60 percentage points below Random Forest, even though the Random Forest was trained on a 1,000-feature TF-IDF representation, a much higher-dimensional space. The Markov chain used the token sequences directly and was trained on only part of the training set, because 15% of the training data was held out to determine the decision threshold. Because the two approaches produce similar results, we can conclude that most of the discriminative power lies in the sequential structure of SQL queries, and that a model capturing this structure works well even under these less favorable conditions.
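The text states that a held-out portion of the training data is used to determine the threshold but does not give the selection criterion. A minimal sketch, assuming the threshold is chosen to maximize F1 on the held-out scores, could look as follows; `pick_threshold` and the validation values are hypothetical.

```python
def f1(tp, fp, fn):
    """F1 from confusion counts; 0.0 when there are no positives at all."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def pick_threshold(scores, labels):
    """Return the candidate threshold with the best F1 on held-out data."""
    best_t, best_f1 = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        current = f1(tp, fp, fn)
        if current > best_f1:
            best_t, best_f1 = t, current
    return best_t, best_f1

# Hypothetical held-out scores (e.g. log-likelihood ratios) and labels (1 = attack).
val_scores = [-3.2, -1.5, -0.4, 0.3, 0.9, 2.1, 4.0]
val_labels = [0, 0, 0, 1, 0, 1, 1]

threshold, best = pick_threshold(val_scores, val_labels)
print(threshold, round(best, 4))
```

A query would then be flagged as an attack when its score is at or above the selected threshold.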
Performance degrades when moving from order=1 to order=2 and from order=2 to order=3 because the number of contexts a Markov model must track grows exponentially with the order: for a vocabulary of V tokens, there are up to V^n contexts at order n. By order=3, there are tens of thousands of distinct three-token combinations that have been seen only a handful of times, which is insufficient to estimate their probabilities reliably. The model must then make confident guesses based on very little evidence, and performance degrades. Order=1 avoids this problem because its state space remains small.
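The sparsity effect can be observed by counting, for each order, how many distinct contexts occur in a corpus and how many of them occur only once. The sketch below uses a hypothetical toy corpus, and the helper names are illustrative.

```python
from collections import Counter

def context_counts(sequences, order):
    """Count how often each length-`order` context is followed by another token."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - order):
            counts[tuple(seq[i:i + order])] += 1
    return counts

def max_contexts(vocab_size, order):
    """Upper bound on the number of distinct contexts: vocab_size ** order."""
    return vocab_size ** order

# Hypothetical toy corpus, not the study's data.
corpus = [q.split() for q in [
    "SELECT id FROM users WHERE id = 1",
    "SELECT name FROM users WHERE id = 2",
    "SELECT id FROM orders WHERE total = 3",
]]
vocab = {tok for seq in corpus for tok in seq}

for order in (1, 2, 3):
    counts = context_counts(corpus, order)
    rare = sum(1 for c in counts.values() if c == 1)
    print(f"order={order}: {len(counts)} observed contexts, "
          f"{rare} seen once, upper bound {max_contexts(len(vocab), order)}")
```

On a real query corpus the same count makes it visible how quickly the share of rarely seen contexts grows with the order.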
DDoS Traffic Classification
In the case of the DDoS dataset, preprocessing did not consist of generating a sequence of queries. Instead, we grouped the raw network flow records into 5-second intervals and treated each one as a sequence of states. A state is a discretized vector of 13 network flow features (packet/byte rate, packet count, SYN, ACK, RST, PSH, FIN). After subsampling, this resulted in 2,690 balanced sequences with an average of 32 states per sequence.
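The windowing and discretization step can be sketched as follows. This is one possible reading of the preprocessing, in which each 5-second window is summarized into a single discrete state; it uses only three of the features, and the record layout, bin edges, and function names are all hypothetical.

```python
from collections import defaultdict

WINDOW_SECONDS = 5

def window_flows(flows):
    """Group (timestamp, packets, bytes, syn_flag) records into 5-second windows."""
    windows = defaultdict(list)
    for ts, packets, nbytes, syn in flows:
        windows[int(ts // WINDOW_SECONDS)].append((packets, nbytes, syn))
    return [windows[k] for k in sorted(windows)]

def bucket(value, edges):
    """Index of the first edge that `value` does not exceed (len(edges) if none)."""
    for i, edge in enumerate(edges):
        if value <= edge:
            return i
    return len(edges)

def to_state(window):
    """Summarize one window as a tuple of discretized feature levels."""
    packets = sum(p for p, _, _ in window)
    nbytes = sum(b for _, b, _ in window)
    syn = sum(s for _, _, s in window)
    # Bin edges are hypothetical; a real pipeline would derive them from data.
    return (bucket(packets / WINDOW_SECONDS, [10, 100, 1000]),
            bucket(nbytes / WINDOW_SECONDS, [1_000, 100_000]),
            bucket(syn, [0, 5]))

# Hypothetical flow records: (timestamp in seconds, packets, bytes, SYN flag).
flows = [(0.5, 20, 1500, 0), (3.0, 30, 2500, 1),
         (6.2, 4000, 200000, 1), (7.9, 6000, 400000, 1),
         (11.0, 5, 300, 0)]

states = [to_state(w) for w in window_flows(flows)]
print(states)  # [(0, 0, 1), (3, 2, 1), (0, 0, 0)]
```

The resulting tuples play the role that tokens play in the SQL injection task, so the same transition-counting procedure applies to them.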
The main difference from the SQL injection task described above is that chunking the traffic into time windows compresses and hides the shape of the traffic over time: instead of seeing the packets ordered by timestamp, the model observes a statistical summary of the traffic in a 5-second window. That fine-grained temporal ordering is precisely the information that Markov chains exploit.
Table 2.
Performance comparison: DDoS traffic classification
| Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| Naive Bayes | 0.8067 | 0.7250 | 0.9876 | 0.8361 |
| Logistic Regression | 0.9616 | 0.9306 | 0.9975 | 0.9629 |
| Decision Tree | 0.9703 | 0.9501 | 0.9926 | 0.9709 |
| Random Forest | 0.9765 | 0.9638 | 0.9901 | 0.9767 |
| XGBoost | 0.9765 | 0.9593 | 0.9950 | 0.9769 |
| CatBoost | 0.9802 | 0.9685 | 0.9926 | 0.9804 |
| Markov Chain (order=1) | 0.9777 | 0.9730 | 0.9826 | 0.9778 |
| Markov Chain (order=2) | 0.9715 | 0.9680 | 0.9752 | 0.9716 |
| Markov Chain (order=3) | 0.9009 | 0.8358 | 0.9975 | 0.9095 |
| Hybrid (Markov + RF) | 0.9690 | 0.9725 | 0.9653 | 0.9689 |
| Hybrid (Markov + XGBoost) | 0.9591 | 0.9512 | 0.9677 | 0.9594 |
The first-order Markov chain did surprisingly well here, outperforming five of the six classical classifiers and coming second only to CatBoost among all models. This is somewhat unexpected given the argument about the data structure above, but it suggests that even after aggregation into 5-second windows enough sequential signal remains to be learned by an order=1 model. The key qualifier, again, is order=1: on this task the degradation from order=1 to order=3 is sharper than on the SQL injection task, as expected given the weaker sequential signal. The more complex the model, the more parameters have to be estimated, and when the underlying sequential structure is weak this has a negative effect.
In the DDoS task, both hybrid systems performed worse than the Markov order=1 model. This is the opposite of what would be expected from a hybrid model, so the reasons for the performance gap may be more instructive than the result itself.
In fact, this underperformance can be understood as a combination of two effects. First, Markov-derived features are an intrinsically lossy compression of the model's probabilistic knowledge into scalar summaries, even when they are more informative than a single log-likelihood score: the tree-based classifier does not have access to the structure of the transition probability matrix. Second, the two systems were not jointly trained. The Markov models were frozen before the ML training stage, so the ML classifier was not trained with respect to the hybrid objective. As a result, the architecture was not able to exploit the complementarity of the sequential and the statistical knowledge representations.
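The lossy compression described here can be illustrated with a sketch of how Markov-derived features might be computed before being handed to a tree-based classifier. The specific feature set, the `floor` probability for unseen transitions, and the function name `markov_features` are assumptions; the paper does not list the features it used.

```python
import math

def markov_features(tokens, p_attack, p_benign, floor=1e-6):
    """Collapse per-transition log-probabilities into a fixed-length feature vector."""
    pairs = list(zip(tokens, tokens[1:]))
    if not pairs:
        return [0.0, 0.0, 0.0, 0.0, 0.0]
    la = [math.log(p_attack.get(a, {}).get(b, floor)) for a, b in pairs]
    lb = [math.log(p_benign.get(a, {}).get(b, floor)) for a, b in pairs]
    ll_attack, ll_benign = sum(la), sum(lb)
    return [
        ll_attack,               # log-likelihood under the attack chain
        ll_benign,               # log-likelihood under the benign chain
        ll_attack - ll_benign,   # log-likelihood ratio
        min(la),                 # least likely transition under the attack chain
        float(len(pairs)),       # number of transitions
    ]

# Hypothetical frozen transition probabilities, for illustration only.
p_attack = {"a": {"b": 0.5}, "b": {"c": 0.4}}
p_benign = {"a": {"b": 0.25}, "b": {"c": 0.8}}

features = markov_features(["a", "b", "c"], p_attack, p_benign)
print(features)
```

Whatever the downstream classifier is, it sees only these five numbers per sequence, not the transition matrices that produced them, and the matrices themselves are not updated during its training.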
These results suggest that effective hybrid architectures for sequential detection tasks require joint optimization of the sequential encoder and the classification head, as is done in recurrent neural networks. Feature-level stacking of individually trained systems is not sufficient in general when the sequential encoder and the classification head represent knowledge in radically different ways.
Conclusion
Comparing performance on the two tasks yields a sharper conclusion: a Markov chain's performance relative to conventional classifiers benefits when the sequential structure of the data has not been flattened into bags of features, and suffers when it has. While this is not surprising in hindsight, it is useful guidance on when to reach for a sequential approach.
In the case of SQL injection, where queries are preserved as raw token sequences, Markov order=1 and the best conventional classifier differ by less than one percentage point of F1. In the case of DDoS, where traffic dynamics are summarized in 5-second windows, the higher-order chains degrade more sharply (F1 of 0.9095 at order=3) and the conventional models are competitive across the board. The lesson for system design is clear. If the data pipeline aggregates or summarizes the data before modeling, a Markov chain is unlikely to add much. When the sequence is preserved, however, a Markov chain is a serious and interpretable competitor.
The point about interpretability has to be separated from the point about accuracy, because accuracy is not the only thing that matters when deploying a model. In cybersecurity, detection is only the first step. Analysts need explanations both to justify a detection to concerned parties and to adapt to evolving attack patterns, so they will prefer models that label traffic as malicious and provide an understandable, auditable explanation over black-box models that achieve slightly better accuracy. In particular, the transition-level explanations of Markov chains are not an afterthought: they are part of the model and require no extra computation to produce.
Thus, these results suggest that the choice between Markov chains and conventional machine learning classifiers for cyberattack detection should not be made by comparing accuracy statistics alone, but by comparing how well the model assumptions fit the data structure and how well the model output fits operational needs.
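A transition-level explanation of this kind can be sketched by ranking the transitions of a flagged query by their log-likelihood ratio between the attack and benign chains. The function name, the `floor` probability for unseen transitions, and the transition probabilities below are hypothetical and serve only to illustrate the idea.

```python
import math

def transition_contributions(tokens, p_attack, p_benign, floor=1e-6):
    """Return per-transition log-likelihood ratios, most attack-like first."""
    scored = []
    for a, b in zip(tokens, tokens[1:]):
        pa = p_attack.get(a, {}).get(b, floor)
        pb = p_benign.get(a, {}).get(b, floor)
        scored.append(((a, b), math.log(pa / pb)))
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical transition probabilities, for illustration only.
p_attack = {"'": {"OR": 0.6}, "OR": {"1": 0.7}, "1": {"=": 0.8}, "=": {"1": 0.8}}
p_benign = {"'": {"OR": 0.01}, "OR": {"1": 0.05}, "1": {"=": 0.2}, "=": {"1": 0.1}}

query = ["'", "OR", "1", "=", "1"]
ranked = transition_contributions(query, p_attack, p_benign)
for (a, b), llr in ranked:
    print(f"{a!r} -> {b!r}: {llr:+.2f}")
```

The ranking reuses the probabilities the classifier already holds, so the explanation is a by-product of scoring rather than a separate post-hoc model.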
References:
1. Riggs H., Tufail S. Impact, vulnerabilities, and mitigation strategies for cyber-secure critical infrastructure // Sensors. – 2023. – Vol. 23, № 8.
2. Al-olaqi M., Al-gailani A. Comprehensive study of SQL injection attacks mitigation methods and future directions // Journal of Cyber Security and Risk Auditing. – 2025. – Vol. 2025, № 4. – P. 347–365.
3. Kim J., Lee H. Analysis of SQL injection attacks in the cloud and in web applications // Security and Privacy. – 2025. – Vol. 8, № 1. – Article e370.
4. Alqahtani H., Maple C. Prevention techniques against distributed denial of service attacks in heterogeneous networks: A systematic review // Security and Communication Networks. – 2022.
5. Mondal P., Kabir M. A. et al. A review of DDoS attack detection and defense technologies in Software-Defined Networking // ACM Computing Surveys. – 2024. – Vol. 56, № 1. – P. 1–42.
6. Ferrag M. A., Maglaras L., Moschoyiannis S., Janicke H. Deep learning for cyber security intrusion detection: Approaches, datasets, and comparative study // Journal of Information Security and Applications. – 2020. – Vol. 50.
7. Putra P. P. et al. Enhancing the decision tree algorithm to improve intrusion detection system performance // Intensif: Jurnal Ilmiah Penelitian dan Penerapan Teknologi Sistem Informasi. – 2024. – Vol. 8, № 2. – P. 77–86.
8. Abdallah M., Guntur T., Arreche O. et al. E-XAI: Evaluating black-box explainable AI frameworks for network intrusion detection // IEEE Access. – 2024. – Vol. 12. – P. 151234–151252.
9. Gaspar D., Silva P., Silva C. Explainable AI for intrusion detection systems: LIME and SHAP applicability on multi-layer perceptron // IEEE Access. – 2024. – Vol. 12. – P. 30164–30175.
10. Kuang B. et al. Interpretable intrusion detection for IoT environments using a self-attention-based deep neural network with learnable feature gating // Scientific Reports. – 2025. – Vol. 15.
11. Li Y., Zhang X., Zhou H. A regression-based procedure for Markov transition probability estimation in land change modeling // Land. – 2020. – Vol. 9, № 11.
12. Liao X., Zhang Y., Wang J. et al. Federated learning with LSTM for intrusion detection in IoT-based wireless sensor networks: A multi-dataset analysis // PeerJ Computer Science. – 2025. – Vol. 11.
13. Karpov A. V., Kotenko I. V. A Markov model of non-mutually exclusive cyber threats and its applications for selecting an optimal set of information security remedies // Modeling and Analysis of Information Systems. – 2020. – Vol. 27, № 1. – P. 108–123.
14. Barnard M., Khan R. et al. A systematic review on the integration of explainable artificial intelligence in intrusion detection systems to enhancing transparency and interpretability in cybersecurity // Frontiers in Artificial Intelligence. – 2025. – Vol. 8.