EFFICIENCY ANALYSIS OF MACHINE LEARNING MODELS FOR IoT INTRUSION DETECTION USING FEATURE SELECTION TECHNIQUES

APPLICATION OF MACHINE LEARNING ALGORITHMS FOR DETECTING AND PREVENTING ATTACKS ON IoT: EFFICIENCY ANALYSIS

Kubantov D.

28.05.2026

5(146)

10. Informatics, Computer Engineering and Control

Cite as:

Kubantov D. EFFICIENCY ANALYSIS OF MACHINE LEARNING MODELS FOR IoT INTRUSION DETECTION USING FEATURE SELECTION TECHNIQUES // Universum: Technical Sciences: electronic scientific journal. 2026. 5(146). URL: https://7universum.com/ru/tech/archive/item/22700 (accessed: 28.07.2026).

Received: 21.04.2026

Accepted for publication: 14.05.2026

Published: 28.05.2026

UDC 004.8

ABSTRACT

The rapid proliferation of Internet of Things (IoT) devices across critical domains has enlarged the network attack surface, and conventional intrusion detection is poorly suited to resource-constrained IoT systems. This paper presents a systematic efficiency analysis of five supervised machine learning classifiers on the NF-ToN-IoT benchmark: Logistic Regression, Random Forest, Support Vector Machine (SVM, RBF kernel), K-Nearest Neighbors (KNN), and XGBoost. All models share a unified pipeline of StandardScaler normalization, SMOTE class-imbalance correction, and SelectKBest (ANOVA F-statistic) feature selection that reduces 41 features to 20. The experiments use a stratified sample of 100,000 records. Random Forest achieves the highest weighted F1-score, 0.9899 (accuracy 98.99%), ahead of XGBoost (F1 = 0.9889) and KNN (F1 = 0.9754). After feature selection, SVM is the only model that improves (ΔF1 = +0.0020), while XGBoost declines the most (ΔF1 = −0.0074). XGBoost trains in 1.84 s, whereas SVM requires 389.95 s, about 19× the training time of Random Forest (20.45 s). These results indicate that ensemble tree methods come within roughly 1 percentage point of the accuracy reported for state-of-the-art deep learning systems at a computational cost that is orders of magnitude lower, with clear implications for lightweight IoT IDS deployment.

Keywords: IoT; NIDS; Machine Learning; Feature Selection; Random Forest; XGBoost; NF-ToN-IoT; SMOTE; Anomaly Detection.

1. Introduction

IoT devices, whose number is projected to exceed 29 billion by 2030 [1], are characterized by limited processing resources, heterogeneous protocols, and missing security mechanisms. These conditions invalidate conventional enterprise-grade intrusion detection systems and demand purpose-built, data-driven alternatives. Network Intrusion Detection Systems (NIDS) for IoT face three compounding challenges: multi-class attack taxonomies with complex, non-linearly separable traffic distributions; severe class imbalance, in which benign flows dominate the observations; and deployment constraints that preclude GPU-intensive deep learning on edge infrastructure. Existing surveys confirm that no established security standard or framework adequately addresses these IoT-specific constraints [2, 9], reinforcing the need for empirically validated ML approaches.

Although ensemble and deep learning methods consistently achieve high detection rates on IoT benchmarks [4, 6, 14–16], few studies perform strictly controlled multi-classifier comparisons under a shared preprocessing pipeline while simultaneously quantifying training-time efficiency — a critical dimension for resource-aware deployments. Without such controlled comparisons, reported accuracy advantages cannot be reliably separated from preprocessing artefacts, and practical deployment implications remain unclear.

This paper addresses that gap through a unified comparative evaluation of five classifiers on the NF-ToN-IoT dataset, applying identical normalization, SMOTE resampling, and SelectKBest feature selection to every model. Results are assessed by weighted F1-score, accuracy, and wall-clock training time. The principal contributions are: (1) a reproducible multi-metric comparison of five classifiers under a unified preprocessing framework; (2) quantification of each classifier's sensitivity to SelectKBest dimensionality reduction; (3) a training-time efficiency analysis that supports performance-cost tradeoff assessment for IoT deployment; (4) contextualization of the results against deep learning benchmarks, showing competitive accuracy at substantially lower computational cost.

2. Related work

A. IoT Security Surveys and Challenges

Liao et al. [11] documented that IoT devices average 25% exploitable vulnerabilities, which they attribute to resource limitations and the absence of security standards. Adam et al. [2] analyzed security across a three-layer IoT architecture (device, network, application), identifying cross-layer threats that require adaptive countermeasures. Karie et al. [9] reviewed 80 ISO/IEC standards, 32 ETSI standards, and 37 assessment frameworks and concluded that none was designed for IoT-specific operational constraints, a finding that directly motivates data-driven ML detection approaches. Abuserrieh and Alalfi [1] surveyed program analysis tools for IoT safety verification and identified significant gaps in handling heterogeneous, event-driven IoT interactions.

B. Machine Learning-Based NIDS

Fraihat et al. [14] proposed an IoT NetFlow NIDS that uses a modified Arithmetic Optimization Algorithm for feature selection, reducing 43 features to 7 before training Random Forest and Extra Trees classifiers; it achieved 99% binary and 98% multi-class accuracy while cutting prediction time by 84%, which directly motivates the feature selection dimension of the present study. Lawal et al. [10] evaluated classification algorithms for network anomaly mitigation on UNSW-NB15, identifying high false-positive rates and scalability as persistent IoT challenges. The survey by Abdulkareem et al. [15], covering 2018–2024, confirmed Random Forest and SVM as dominant performers across IoT NIDS studies while noting that dataset heterogeneity complicates direct cross-study comparison. Lu et al. [13] coupled multi-model ML detection (PCA reduction, six classifiers, UNSW-NB15 and CICIoT2023) with ANN-based secure key distribution, jointly optimizing detection and communication security.

C. Deep Learning Approaches

Gaber et al. [6] achieved 99.8% accuracy and FNR < 0.2 on BoT-IoT/ToN-IoT by combining a CNN with kernel PCA (KPCA) in their Metaverse-IDS. Hizal et al. [7] evaluated DNN, CNN, LSTM, and RNN architectures on CICIoT2023 and found two-stage hierarchical models superior for IoT DDoS detection. Cherfi et al. [4] reached up to 99.97% accuracy on ToN-IoT using a CNN with K-means-based class balancing. Yasarathna et al. [18] surveyed deep learning-based autonomous anomaly detection (AAD) in SDN-IoT, reporting methods that exceed 99% accuracy but identifying scalability and resource constraints as the primary deployment barriers; this is the principal motivation for comparing classical ML efficiency in the present study.

D. Ensemble, Hierarchical, and Federated Approaches

Chang et al. [3] proposed HADIoT, a three-tier framework that uses a GRU at edge servers for local anomaly detection and a CRF in the cloud for global correlation, outperforming three baselines on ISCX 2012. Toony et al. [16] introduced MULTI-BLOCK for SDN-enabled IoT, achieving 99.75% accuracy, 99.53% F1, and 2.11 ms detection time on X-IIoTID, TON_IoT, and Edge-IIoTset, the highest published results on these datasets. Wakili and Bakkali [17] combined Balanced Random Forest, LightGBM, CatBoost, Extra Trees, and XGBoost with hybrid feature selection on CICIoT2023, obtaining 99.16% accuracy and MCC = 0.9908; SHAP analysis identified inter-arrival time and TCP features as the most discriminative. Fusco et al. [5] addressed federated tinyML IDS via Siamese Neural Networks with epsilon-greedy gradient sparsification on CSE-CIC-IDS2018, enabling few-shot training under bandwidth constraints. Zachos et al. [19] demonstrated lightweight IoMT detection using OCSVM, LOF, and IsoForest on Raspberry Pi hardware at less than 1% CPU utilization.

Across this literature four gaps persist: (i) absence of controlled multi-classifier comparisons under identical preprocessing; (ii) inconsistent class-imbalance handling; (iii) near-universal omission of training-time measurement; (iv) insufficient analysis of per-model feature selection sensitivity. The present study directly addresses all four.

3. Methodology

A. Dataset: NF-ToN-IoT

NF-ToN-IoT is a NetFlow-based representation of the ToN-IoT dataset (UNSW Canberra), providing 41 flow-level features and ground-truth labels for ten traffic classes: normal traffic and nine attack types (Backdoor, DDoS, DoS, Injection, MITM, Password, Ransomware, Scanning, XSS). The native dataset exhibits severe class imbalance, with several attack categories constituting less than 1% of all observations. A stratified random sample of 100 000 records was drawn and partitioned 80:20 (training/test) by stratified splitting. This sampling scale is justified on three grounds: (i) statistical power: with n = 20 000 test instances, the 95% confidence interval for a score near 0.99 has a half-width of about ±0.14 pp under a normal approximation, so differences of a few tenths of a percentage point between models are resolvable; (ii) faithful preservation of the native class-imbalance structure through stratification; and (iii) computational tractability for SVM, whose kernel-based training time scales at least quadratically with the number of samples and would make training on the full multi-million-record corpus infeasible without GPU acceleration.
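The ±0.14 pp figure can be reproduced with the normal (Wald) approximation to a binomial proportion. The sketch below is illustrative: `ci_half_width` is a hypothetical helper, and treating the score as a simple proportion over the 20 000 test instances is exact for accuracy but only an approximation for weighted F1.

```python
import math


def ci_half_width(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of the normal-approximation (Wald) CI for a proportion p over n trials."""
    return z * math.sqrt(p * (1.0 - p) / n)


# A score near 0.99 evaluated on the 20 000-instance held-out test set.
half_width = ci_half_width(0.99, 20_000)
print(f"95% CI half-width: ±{half_width * 100:.2f} pp")
```

Differences between models that are smaller than this half-width, such as the 0.10 pp gap between Random Forest and XGBoost in Table 1, are not resolved by this approximation alone.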

B. Preprocessing

1) Feature Normalization

StandardScaler transforms each feature xⱼ using parameters computed exclusively on the training partition:

$$ z_j = \frac{x_j - \mu_j}{\sigma_j} \tag{1} $$

preventing test-set leakage and ensuring scale invariance — essential for SVM and KNN, whose decision boundaries are governed by Euclidean distances in the feature space. Normalization is applied uniformly across all five classifiers to prevent scale-sensitivity from confounding inter-model comparisons.
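A minimal sketch of Eq. (1) with the fit/transform split that prevents leakage: μⱼ and σⱼ come from the training column only and are then reused unchanged on the test column. The helper names and the toy byte-count values are hypothetical. `statistics.pstdev` is used because scikit-learn's StandardScaler also uses the population standard deviation and substitutes a scale of 1 for constant features.

```python
import statistics


def fit_scaler(train_column):
    """Estimate mu_j and sigma_j (Eq. 1) from the training partition only."""
    mu = statistics.fmean(train_column)
    sigma = statistics.pstdev(train_column)  # population std. dev. (ddof = 0)
    return mu, (sigma if sigma > 0 else 1.0)  # guard against constant features


def transform(column, mu, sigma):
    """Apply z = (x - mu) / sigma with previously fitted parameters."""
    return [(x - mu) / sigma for x in column]


# Hypothetical byte-count feature: the test values never influence mu or sigma.
train_bytes = [100.0, 200.0, 300.0, 400.0]
test_bytes = [250.0, 1000.0]

mu, sigma = fit_scaler(train_bytes)
z_train = transform(train_bytes, mu, sigma)
z_test = transform(test_bytes, mu, sigma)
```

The test value 1000 maps to roughly z = 6.7, an out-of-range score that is expected when unseen traffic is standardized with training statistics; refitting the scaler on the test set would hide that shift and leak test information into preprocessing.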

2) Class Imbalance Correction via SMOTE

SMOTE [21] is applied to the training partition only. For each minority-class instance xᵢ selected for oversampling, a synthetic sample is generated by interpolating toward x̂, one of its k nearest minority-class neighbors chosen at random:

$$ x_{\text{syn}} = x_i + \lambda\,(\hat{x} - x_i), \qquad \lambda \sim U[0, 1] \tag{2} $$

Unlike naive oversampling, SMOTE densifies minority-class decision boundary regions with novel instances, enabling classifiers to learn more generalizable minority-class rules. This is critical given that some NF-ToN-IoT attack categories represent less than 1% of observations, where classifiers trained without resampling would optimize for majority-class performance while achieving near-random minority detection rates.
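Eq. (2) can be illustrated with a from-scratch sketch. It is not the imbalanced-learn implementation, and both the helper name and the choice k = 5 (the common default, e.g. imbalanced-learn's `k_neighbors=5`; the study does not report its value) are assumptions. Each synthetic point lies on the segment between a randomly chosen minority instance and one of its k nearest minority-class neighbors.

```python
import math
import random


def smote_samples(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority points via Eq. (2).

    For a randomly chosen base instance x_i, x_hat is drawn from its k nearest
    minority-class neighbors (Euclidean), and x_syn = x_i + lam * (x_hat - x_i).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x_i = minority[i]
        # Rank every other minority instance by distance to x_i and keep the k closest.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda p: math.dist(p, x_i))[:k]
        x_hat = rng.choice(neighbors)
        lam = rng.random()  # lam drawn uniformly from [0, 1)
        synthetic.append([a + lam * (b - a) for a, b in zip(x_i, x_hat)])
    return synthetic


# Toy minority class (e.g., a rare attack category) in a 2-feature space.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_samples(minority, n_new=10, k=2)
```

Because every synthetic point lies on a segment between two real minority instances, it never leaves the convex hull of the minority set.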

3) Supervised Feature Selection via SelectKBest

SelectKBest retains the K = 20 features with the highest ANOVA F-statistics, which measure, for each feature, the ratio of between-class to within-class variance:

$$ F_j = \frac{SS_{B,j} / (C - 1)}{SS_{W,j} / (n - C)} \tag{3} $$

where SS_B,j and SS_W,j are the between- and within-class sums of squares of feature j, C = 10 is the number of classes, and n is the sample size. This reduces dimensionality by 51.2% (41 → 20 features), lowering inference latency and overfitting risk for models without intrinsic regularization while improving interpretability. SelectKBest was preferred over wrapper methods because it is model-agnostic and computed once, so all five classifiers receive the same feature subset at negligible cost.
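The F-statistic of Eq. (3) and the top-K selection can be sketched in plain Python. scikit-learn's `f_classif`, the default `score_func` of `SelectKBest`, computes this same one-way ANOVA statistic; the helpers below are hypothetical stand-ins for illustration and assume a non-zero within-class sum of squares for every feature.

```python
from statistics import fmean


def anova_f(values, labels):
    """One-way ANOVA F-statistic of a single feature against class labels (Eq. 3)."""
    classes = sorted(set(labels))
    n, c = len(values), len(classes)
    grand_mean = fmean(values)
    ss_b = ss_w = 0.0
    for cls in classes:
        group = [v for v, y in zip(values, labels) if y == cls]
        group_mean = fmean(group)
        ss_b += len(group) * (group_mean - grand_mean) ** 2  # between-class SS
        ss_w += sum((v - group_mean) ** 2 for v in group)    # within-class SS
    return (ss_b / (c - 1)) / (ss_w / (n - c))


def select_k_best(columns, labels, k):
    """Indices of the k feature columns with the highest F-statistics."""
    scores = [anova_f(col, labels) for col in columns]
    return sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)[:k]


# Toy data: 0 = benign, 1 = attack; feature 1 separates the classes, feature 0 does not.
labels = [0, 0, 0, 1, 1, 1]
noise_feature = [1.0, 5.0, 3.0, 1.0, 5.0, 3.0]
informative_feature = [1.0, 1.1, 0.9, 5.0, 5.1, 4.9]
selected = select_k_best([noise_feature, informative_feature], labels, k=1)
```

The noise feature has identical class means, so its F-statistic is exactly 0, while the informative feature scores about 2400 and is the one retained. Each feature is scored in isolation, which is the univariate limitation discussed for XGBoost in Section 5.C.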

C. Performance Metrics

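Primary metrics are overall accuracy and weighted F1-score, computed on the held-out 20% test set. Weighted F1 averages the per-class F1-scores, weighting each class by its share of the test set (its support):

$$ F1_w = \sum_{c} w_c \cdot F1_c, \qquad w_c = \frac{n_c}{n} \tag{4} $$

$$ F1_c = \frac{2 \cdot \mathrm{Precision}_c \cdot \mathrm{Recall}_c}{\mathrm{Precision}_c + \mathrm{Recall}_c} \tag{5} $$

where Precision_c = TP_c / (TP_c + FP_c), Recall_c = TP_c / (TP_c + FN_c), and n_c is the number of test instances of class c. Wall-clock training time is additionally recorded for the baseline (41-feature) configuration to support the computational efficiency analysis.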
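A minimal sketch of Eqs. (4) and (5), assuming string class labels. `weighted_f1` is a hypothetical helper intended to match scikit-learn's `f1_score(..., average="weighted")`, including its default of treating an undefined per-class precision, recall, or F1 as 0.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted F1 over the classes present in y_true (Eqs. 4-5)."""
    n = len(y_true)
    total = 0.0
    for c, n_c in Counter(y_true).items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = n_c - tp  # instances of class c predicted as something else
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        total += (n_c / n) * f1  # w_c = n_c / n
    return total


# Toy test set: 8 benign flows and 2 DDoS flows, with one error in each direction.
y_true = ["normal"] * 8 + ["ddos"] * 2
y_pred = ["normal"] * 7 + ["ddos"] + ["ddos", "normal"]
score = weighted_f1(y_true, y_pred)
```

Here the minority class (ddos) has F1 = 0.5 but carries only 20% of the weight, so the weighted score is 0.8. The majority class therefore dominates the average in proportion to its support.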

D. Classifier Configurations

Five classifiers were evaluated: (1) Logistic Regression (LR): L2 regularization, max_iter = 1000 — linear reference baseline; (2) Random Forest (RF): 100 bootstrap-aggregated decision trees (n_estimators = 100); (3) SVM with RBF kernel K(x,x′)=exp(−γ‖x−x′‖²), nonlinear maximum-margin classification; (4) KNN: k = 5, Euclidean distance, uniform weighting; (5) XGBoost: regularized gradient boosting, default hyperparameters. All implementations use scikit-learn v1.x and the xgboost library with fixed random seeds for reproducibility.
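The KNN configuration in item (4), and its near-zero training time in Table 3, can be illustrated with a brute-force sketch. `SimpleKNN` is hypothetical. scikit-learn's `KNeighborsClassifier` defaults (`n_neighbors=5`, `weights="uniform"`, Minkowski distance with `p=2`, i.e. Euclidean) match the stated setup, but with `algorithm="auto"` it may build a tree index at fit time instead of only storing the points.

```python
import heapq
import math
from collections import Counter


class SimpleKNN:
    """k-NN classifier: Euclidean distance, uniform majority vote (brute-force search)."""

    def __init__(self, k=5):
        self.k = k

    def fit(self, X, y):
        # A lazy learner: "training" only stores the data, so fit time is near zero.
        self.X, self.y = list(X), list(y)
        return self

    def predict_one(self, x):
        # Distances to all n stored points over d features: O(n * d) work per query.
        nearest = heapq.nsmallest(
            self.k, range(len(self.X)), key=lambda i: math.dist(self.X[i], x)
        )
        return Counter(self.y[i] for i in nearest).most_common(1)[0][0]


X_train = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]]
y_train = ["normal"] * 3 + ["dos"] * 3
model = SimpleKNN(k=3).fit(X_train, y_train)
```

The cost is simply moved from `fit` to `predict_one`: every query scans the full stored training set.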

4. Experimental results

A. Baseline Performance (41 Features)

Table 1 reports accuracy and weighted F1-score for all five classifiers under the full-feature baseline. Figure 1 visualizes the F1-score distribution.

Table 1.

Baseline Classifier Performance on NF-ToN-IoT (41 features, SMOTE, n = 100 000).

Classifier	Accuracy	Weighted F1	Rank	FS Robustness
Logistic Regression	0.7094	0.7246	5th	Low
Random Forest	0.9899	0.9899	1st ★	High
SVM (RBF)	0.9663	0.9664	4th	Improves
KNN (k=5)	0.9753	0.9754	3rd	Moderate
XGBoost	0.9889	0.9889	2nd	Moderate

★ Best performer. FS Robustness: direction and relative size of the F1 change after feature selection (Table 2).

Figure 1. Comparison of machine learning models based on baseline weighted F1-score (NF-ToN-IoT, 41 features, SMOTE-balanced).

Random Forest achieves the highest performance (F1 = 0.9899), followed by XGBoost (0.9889), KNN (0.9754), and SVM (0.9664). Logistic Regression trails substantially at F1 = 0.7246, a 26.5 pp deficit that quantitatively confirms the inadequacy of linear decision boundaries for non-linearly separable IoT multi-class traffic distributions (see Section 5.B).

B. Post-Feature-Selection Performance (20 Features)

Table 2 presents F1-scores after dimensionality reduction, and Fig. 2 provides a paired visual comparison of baseline and post-selection performance.

Table 2.

Weighted F1-Score Before and After SelectKBest (K = 20, ANOVA F-statistic)

Classifier	Baseline F1	Post-FS F1	Δ F1	% Δ	Interpretation
Logistic Regression	0.7246	0.7188	−0.0058	−0.80%	Marginal linear signal lost
Random Forest	0.9899	0.9866	−0.0033	−0.33%	Robust; intrinsic FS
SVM (RBF)	0.9664	0.9684	+0.0020	+0.21%	Kernel noise reduced
KNN (k=5)	0.9754	0.9689	−0.0065	−0.67%	Neighbor set shifted
XGBoost	0.9889	0.9815	−0.0074	−0.75%	Multivariate signal lost

Δ F1 = Post-FS F1 − Baseline F1; % Δ = Δ F1 / Baseline F1 × 100. Positive values indicate improvement.

Figure 2. Performance comparison before and after SelectKBest feature selection (K = 20). Blue: baseline; orange: post-selection.

SVM is the sole model to benefit from feature reduction (ΔF1 = +0.0020). Random Forest sustains minimal degradation (−0.0033), retaining 99.7% of its baseline F1 at 51.2% lower feature dimensionality. XGBoost incurs the largest decline (−0.0074) despite its intrinsic feature importance mechanism, indicating that univariate F-score ranking fails to capture multivariate boosting-stage dependencies. All five classifiers retain more than 99% of their baseline F1 after feature reduction, suggesting that the 20 retained features preserve nearly all of the discriminative signal.
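The Δ F1, % Δ, and retention figures can be recomputed directly from the values in Table 2. The snippet is a simple arithmetic check, not part of the experimental pipeline.

```python
# (baseline F1, post-selection F1) pairs copied from Table 2.
table2 = {
    "Logistic Regression": (0.7246, 0.7188),
    "Random Forest": (0.9899, 0.9866),
    "SVM (RBF)": (0.9664, 0.9684),
    "KNN (k=5)": (0.9754, 0.9689),
    "XGBoost": (0.9889, 0.9815),
}

for model, (base, post) in table2.items():
    delta = post - base                 # Δ F1
    pct_change = 100 * delta / base     # % Δ
    retained = 100 * post / base        # share of baseline F1 kept after selection
    print(f"{model:20s} dF1={delta:+.4f}  %d={pct_change:+.2f}%  retained={retained:.1f}%")
```

The Random Forest row gives 99.7% retention, and the lowest retention across all five models is about 99.2% (Logistic Regression).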

C. Training Time and Computational Efficiency

Table 3 and Fig. 3 present wall-clock training times and efficiency metrics across all classifiers.

Table 3.

Training Time and Efficiency Classification (41 features, 80 000 training instances, CPU).

Classifier	Train Time (s)	vs. RF	F1	F1/log₁₀(T)	Deployment Tier
Logistic Regression	6.58	0.32×	0.7246	0.89	Impractical (low F1)
Random Forest	20.45	1.00×	0.9899	0.76	Optimal (F1 + speed)
SVM (RBF)	389.95	19.07×	0.9664	0.37	Costly; edge-limited
KNN (k=5)	0.01	<0.01×	0.9754	N/A	Slow inference at scale
XGBoost	1.84	0.09×	0.9889	3.73	Best efficiency ratio

F1/log₁₀(T): efficiency proxy with T in seconds (higher = better); it is defined only for T > 1 s and grows sharply as T approaches 1 s, hence N/A for KNN. KNN training = index construction only; inference scales O(n·d) per query. vs. RF: training time normalized to Random Forest.

Figure 3. Wall-clock training time comparison (seconds, CPU, 80 000 training instances). Note the orders-of-magnitude gap between SVM and the remaining classifiers.

SVM’s 389.95 s training cost, a 19× overhead over Random Forest and roughly 212× over XGBoost, makes it operationally impractical for IoT deployments that require periodic model retraining. XGBoost achieves the highest efficiency ratio (F1/log₁₀T = 3.73), completing training in 1.84 s at F1 = 0.9889. KNN’s apparent training-time advantage is deceptive: as a lazy learner it defers all computation to prediction time, where each query costs O(n·d) distance operations over the stored training set, which becomes prohibitive at IoT monitoring scale.
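The "vs. RF" and F1/log₁₀(T) columns of Table 3 can be recomputed from the reported times and F1-scores. The hypothetical helper below returns `None` for T ≤ 1 s, where log₁₀(T) is zero or negative and the ratio has no meaningful interpretation. The proxy also rises steeply as T approaches 1 s from above, so XGBoost's ratio is sensitive to small changes in its absolute training time.

```python
import math

# (training time in seconds, weighted F1) copied from Table 3.
table3 = {
    "Logistic Regression": (6.58, 0.7246),
    "Random Forest": (20.45, 0.9899),
    "SVM (RBF)": (389.95, 0.9664),
    "KNN (k=5)": (0.01, 0.9754),
    "XGBoost": (1.84, 0.9889),
}


def efficiency_proxy(f1, seconds):
    """F1 / log10(T); None when T <= 1 s, where log10(T) <= 0 and the ratio is meaningless."""
    if seconds <= 1.0:
        return None
    return f1 / math.log10(seconds)


rf_time = table3["Random Forest"][0]
for model, (t, f1) in table3.items():
    ratio = efficiency_proxy(f1, t)
    shown = "N/A" if ratio is None else f"{ratio:.2f}"
    print(f"{model:20s} vs RF={t / rf_time:.2f}x  F1/log10(T)={shown}")
```

Recomputed to two decimals, the proxy is 0.89 (LR), 0.76 (RF), 0.37 (SVM), and 3.73 (XGBoost), and SVM's training time is about 212× that of XGBoost.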

5. Discussion

A. Random Forest: Structural Basis for Superior Performance

Random Forest’s consistent top performance (F1 = 0.9899 baseline; 0.9866 post-selection) reflects properties intrinsically aligned with IoT multi-class traffic classification. Bootstrap aggregation across 100 trees [22] reduces variance relative to any single tree, while per-node random feature subsampling of size √d decorrelates the estimators. The aggregated majority vote produces stable predictions even for geometrically irregular, non-convex attack-class regions in the 41-dimensional NetFlow feature space that a single decision boundary cannot adequately separate. Critically, Random Forest performs implicit feature selection by splitting more often on features that yield large Gini impurity reductions, which makes it robust to noisy features and explains the minimal post-selection decline (−0.0033). This result corroborates Fraihat et al. [14] (RF as top performer on IoT NetFlow benchmarks) and Abdulkareem et al. [15] (RF dominant in the 2018–2024 IoT NIDS survey). With 20.45 s of CPU-only training and native feature importance output that supports SHAP-based explainability [17], Random Forest is the preferred production choice for edge IoT NIDS.

B. Logistic Regression: Linear Hypothesis Class Mismatch

Logistic Regression’s F1 of 0.7246, a 26.5 pp deficit relative to Random Forest, is a structural consequence rather than a tuning failure. The model computes class scores through the linear transformation s_c = w_cᵀx + b_c, producing hyperplane decision boundaries that cannot separate the non-convex, interleaved attack-class distributions of IoT NetFlow traffic. DDoS, MITM, and ransomware flows occupy distinct but interleaved regions of feature space that no set of linear boundaries can disentangle simultaneously. The post-selection decline (−0.0058) further indicates that some discarded features carried marginal linear discriminative signal. SMOTE and L2 regularization improve minority-class gradient signal and overfitting control, respectively, but cannot remedy the fundamental geometric mismatch. Logistic Regression is appropriate as a performance lower bound but should be excluded from production IoT IDS consideration.
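Why a linear score s = wᵀx + b cannot separate interleaved classes can be shown with the classic XOR configuration, used here as a stand-in rather than as data from NF-ToN-IoT. Suppose class B is predicted when s > 0 and class A when s < 0:

```latex
\begin{aligned}
&\text{Class A at } (0,0),\,(1,1): && b < 0, \quad w_1 + w_2 + b < 0 \\
&\text{Class B at } (1,0),\,(0,1): && w_1 + b > 0, \quad w_2 + b > 0 \\
&\text{Adding the A constraints:} && w_1 + w_2 + 2b < 0 \\
&\text{Adding the B constraints:} && w_1 + w_2 + 2b > 0 \quad (\text{contradiction})
\end{aligned}
```

No (w, b) satisfies all four inequalities, so no hyperplane separates the two classes, whereas a depth-2 decision tree (split on x₁, then on x₂) separates them exactly.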

C. Feature Selection: Model-Specific Sensitivity Analysis

The divergent responses in Table 2 and Fig. 2 expose model-architecture-specific interactions with the SelectKBest criterion. SVM’s improvement (ΔF1 = +0.0020) reflects kernel sensitivity: the RBF kernel similarity K(x,x′) = exp(−γ‖x−x′‖²) is diluted when irrelevant feature dimensions inflate ‖x−x′‖², so removing the 21 lowest-F-score features sharpens the kernel geometry. KNN’s decline (−0.0065) arises because nearest-neighbor composition shifts when features that help disambiguate borderline instances are removed, a direct consequence of the sensitivity of Euclidean distance to the feature subset. XGBoost’s largest decline (−0.0074) has a distinct mechanism: sequential tree construction creates higher-order inter-feature dependencies, so features with low univariate F-scores may still be critical for correcting residual errors at later boosting stages, a multivariate structure invisible to univariate SelectKBest. For gradient boosting models, wrapper-based or model-embedded feature selection (e.g., XGBoost importance-guided RFE) would likely better preserve these dependencies and is a direction for future work.
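The kernel-dilution effect can be demonstrated numerically. The sketch below is a toy example with made-up feature values, not NF-ToN-IoT data; the kernel width follows scikit-learn's `gamma="scale"` rule, 1 / (n_features · X.var()), under the assumption that standardized features have unit variance.

```python
import math


def rbf_kernel(x, y, gamma):
    """RBF kernel K(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


# Two same-class flows that agree closely on 20 informative standardized features.
x_inf, y_inf = [0.0] * 20, [0.1] * 20
# Hypothetical: 21 uninformative features on which the flows differ by one std. dev.
x_noise, y_noise = [0.0] * 21, [1.0] * 21

# gamma follows scikit-learn's gamma="scale" rule, 1 / (n_features * X.var()), with X.var() = 1 assumed.
k_20 = rbf_kernel(x_inf, y_inf, gamma=1 / 20)
k_41 = rbf_kernel(x_inf + x_noise, y_inf + y_noise, gamma=1 / 41)
print(f"K with 20 features: {k_20:.3f}; K with 41 features: {k_41:.3f}")
```

In this toy case the 21 noise dimensions lower the similarity of two nearly identical flows from about 0.99 to about 0.60, which blurs the neighborhoods the SVM relies on.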

D. Efficiency–Accuracy Tradeoff and Comparison with Deep Learning

The multi-dimensional comparison across Tables 1–3 reveals a nuanced tradeoff landscape. XGBoost provides the best efficiency ratio (F1 = 0.9889 at 1.84 s), making it optimal for time-critical retraining scenarios. Random Forest offers the best absolute accuracy at moderate training cost. SVM’s 19× overhead over Random Forest and moderate F1 make it unsuitable for periodic-retraining deployments despite its feature-reduction benefit.

Comparison with the deep learning literature requires methodological caution: Gaber et al. [6] (CNN, 99.8%), Toony et al. [16] (MULTI-BLOCK, 99.75%), and Cherfi et al. [4] (CNN+K-means, 99.97%) all used different dataset versions, preprocessing pipelines, and class-imbalance handling. Within those constraints, the accuracy gap between Random Forest in the present study (98.99%) and the best published CNN system is approximately 1 percentage point. However, these deep learning systems typically rely on GPU-accelerated training (commonly tens of minutes to hours, versus 20.45 s for RF on CPU), specialized frameworks, and, in MULTI-BLOCK’s case, dedicated SDN infrastructure. Yasarathna et al. [18] explicitly identify such resource requirements as the primary barrier to deep learning deployment on IoT edge hardware. Ensemble tree methods therefore offer a pragmatic solution for production IoT NIDS deployments: near-competitive accuracy, CPU-only operation, sub-minute retraining, and interpretable feature importance.

6. Limitations

Four limitations qualify the scope of this study. (1) Single-dataset evaluation: all results derive from NF-ToN-IoT; generalizability to other IoT environments (Bot-IoT, UNSW-NB15, CICIoT2023, N-BaIoT) with different traffic distributions and attack compositions is empirically unverified. (2) Default hyperparameter configuration: no systematic optimization (Bayesian search, grid search) was performed, so the reported results likely understate the performance each model could achieve. (3) Univariate feature selection criterion: SelectKBest’s ANOVA F-statistic evaluates features independently and cannot capture multivariate inter-feature dependencies, which are particularly consequential for XGBoost’s sequential boosting; model-aware or wrapper-based selection may yield better results. (4) No within-framework deep learning comparison: deep learning architectures (CNN, LSTM, autoencoder) were not evaluated under the same experimental pipeline, which precludes a controlled accuracy-cost comparison; this is a natural extension for future work. Additionally, adversarial robustness and concept drift were not modeled: classifiers were trained and evaluated on a static dataset snapshot, which may overestimate performance relative to online deployment scenarios.

7. Conclusion

This study presented a systematic, reproducible efficiency analysis of five supervised machine learning classifiers (Logistic Regression, Random Forest, SVM (RBF), KNN, and XGBoost) for multi-class IoT network intrusion detection on NF-ToN-IoT. A uniform pipeline (StandardScaler, SMOTE, SelectKBest with K = 20) applied to 100,000 stratified records enabled a fair inter-model comparison across weighted F1-score, accuracy, and wall-clock training time.

Random Forest offers the best performance-efficiency balance, with F1 = 0.9899 at baseline, F1 = 0.9866 after selection (retaining 99.7% of peak performance at 51.2% lower dimensionality), and 20.45 s of CPU training. XGBoost has the highest efficiency ratio (F1 = 0.9889 at 1.84 s), which suits it to retraining-critical deployments. SVM improves with feature reduction, but its 390 s training overhead prevents rapid retraining. Logistic Regression is limited to linear decision boundaries, which yields F1 = 0.7246 and makes it a theoretical lower bound rather than a viable production classifier. The accuracy gap between Random Forest and cutting-edge deep learning systems (CNN, MULTI-BLOCK) is less than 1 percentage point, yet ensemble methods eliminate the need for GPUs, reduce training time by orders of magnitude, and provide native feature importance for explainable alert generation.

For IoT security practitioners, these findings support four deployment recommendations: (i) use Random Forest with SMOTE and SelectKBest (K = 20) as the default edge IoT NIDS on CPU-based infrastructure; (ii) use XGBoost where retraining must complete within seconds; (iii) use weighted F1 as the primary metric for imbalanced IoT traffic benchmarks; (iv) exclude Logistic Regression from multi-class IoT detection tasks. Future work will focus on cross-dataset generalization, Bayesian hyperparameter optimization, model-aware feature selection (XGBoost-guided RFE, mutual information), within-framework deep learning comparison, and federated learning adaptation [5] for privacy-preserving edge IoT deployments.

References:

[1] L. Abuserrieh and M. H. Alalfi, “A Survey on Verification of Security and Safety in IoT Systems,” IEEE Access, vol. 12, pp. 138627–, 2024.
[2] M. Adam, M. Hammoudeh, R. Alrawashdeh, and B. Alsulaimy, “A Survey on Security, Privacy, Trust, and Architectural Challenges in IoT Systems,” IEEE Access, vol. 12, pp. 57128–, 2024.
[3] H. Chang, J. Feng, and C. Duan, “HADIoT: A Hierarchical Anomaly Detection Framework for IoT,” IEEE Access, vol. 8, pp. 154530–154543, 2020.
[4] S. Cherfi, A. Boulaiche, A. Lemouari, and A. Abouaissa, “Enhancing intrusion detection in IoT: CNN integration with K-means for efficient and balanced classification,” Expert Systems with Applications, vol. 299, p. 130122, 2026.
[5] P. Fusco, F. Palmieri, and M. Ficco, “Combining epsilon-greedy reinforcement learning based gradient sparsification and siamese neural networks for few-shot federated tinyML intrusion detection in IoT,” Internet of Things, vol. 34, p. 101820, 2025.
[6] T. Gaber et al., “Metaverse-IDS: Deep learning-based intrusion detection system for Metaverse-IoT networks,” Internet of Things, vol. 24, p. 100977, 2023.
[7] S. Hizal, U. Cavusoglu, and D. Akgun, “A novel deep learning-based intrusion detection system for IoT DDoS security,” Internet of Things, vol. 28, p. 101336, 2024.
[8] A. H. Janabi, T. Kanakis, and M. Johnson, “Survey: Intrusion Detection System in Software-Defined Networking,” IEEE Access, vol. 12, pp. 164097–, 2024.
[9] N. M. Karie, N. M. Sahri, W. Yang, C. Valli, and V. R. Kebande, “A Review of Security Standards and Frameworks for IoT-Based Smart Environments,” IEEE Access, vol. 9, pp. 121975–, 2021.
[10] M. A. Lawal, R. A. Shaikh, and S. R. Hassan, “Security Analysis of Network Anomalies Mitigation Schemes in IoT Networks,” IEEE Access, vol. 8, pp. 43355–, 2020.
[11] B. Liao, Y. Ali, S. Nazir, L. He, and H. U. Khan, “Security Analysis of IoT Devices by Using Mobile Computing: A Systematic Literature Review,” IEEE Access, vol. 8, pp. 120331–, 2020.
[12] I. Lorenzo-Fonseca, F. Maciá-Pérez, A. Maciá-Fiteni, and L. Arnau-Muñoz, “AI-Driven Multiagent IoT System for Energy Consumption Anomaly Detection,” IEEE Internet of Things Journal, vol. 13, no. 2, pp. 2196–, 2026.
[13] J. Lu, A. Bhar, A. Sarkar, A. Noorwali, and K. M. Othman, “Enhancing real-time intrusion detection and secure key distribution using multi-model machine learning approach,” Internet of Things, vol. 28, p. 101377, 2024.
[14] S. Fraihat, S. Makhadmeh, M. Awad, M. A. Al-Betar, and A. Al-Redhaei, “Intrusion detection system for large-scale IoT NetFlow networks using machine learning with modified Arithmetic Optimization Algorithm,” Internet of Things, vol. 22, p. 100819, 2023.
[15] S. A. Abdulkareem, C. H. Foh, M. Shojafar, F. Carrez, and K. Moessner, “Network Intrusion Detection: An IoT and Non IoT-Related Survey,” IEEE Access, vol. 12, pp. 147167–, 2024.
[16] A. A. Toony, F. Alqahtani, Y. Alginahi, and W. Said, “MULTI-BLOCK: A novel ML-based intrusion detection framework for SDN-enabled IoT networks,” Internet of Things, vol. 26, p. 101231, 2024.
[17] A. Wakili and S. Bakkali, “A resilient IoT intrusion detection system using hybrid feature selection and explainable ensemble learning,” Results in Engineering, vol. 28, p. 107392, 2025.
[18] T. L. Yasarathna, M. Liyanage, and N.-A. Le-Khac, “Deep Learning-Based Autonomous Anomaly Detection for Security in SDN-IoT Networks,” IEEE Open Journal of the Communications Society, vol. 6, pp. 8007–, 2025.
[19] G. Zachos et al., “Anomaly-Based Intrusion Detection for IoMT Networks: Design, Implementation, Dataset Generation, and ML Algorithms Evaluation,” IEEE Access, vol. 13, pp. 41994–, 2025.
[20] S. Bin Hulayyil, S. Li, and N. Saxena, “Explainable AI-based intrusion detection in IoT systems,” Internet of Things, vol. 31, p. 101589, 2025.
[21] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: Synthetic Minority Over-sampling Technique,” Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002.
[22] L. Breiman, “Random Forests,” Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.