Master's Student, School of IT and Engineering, Kazakh-British Technical University, Kazakhstan, Almaty
DEVELOPMENT OF AN AUTOMATED SYSTEM FOR AUDITING AND MONITORING THE SECURITY OF SERVER CONFIGURATIONS
UDC: 004.056
ABSTRACT
Server misconfiguration remains the leading cause of cybersecurity incidents globally, and the average cost of a data breach reached USD 4.45 million (IBM, 2023). Existing open-source tools, Lynis and OpenSCAP, operate as batch processes requiring manual execution, with no real-time monitoring, automated alerting, or centralized multi-server management. The objective of this study is to design and develop a research prototype of an automated, agentless system for continuous auditing and real-time monitoring of Linux server security configurations, intended for production-grade Linux server fleets. The work follows the Design Science Research (DSR) methodology. The rule engine implements 40 Center for Internet Security (CIS) Benchmark v8 checks across seven security categories via Secure Shell (SSH)-based remote scanning. Each check is scored using the Common Vulnerability Scoring System (CVSS) v4.0 base metric calculator, applied to the vulnerability characteristics of the corresponding control. Real-time change detection is implemented via the Linux inotify kernel subsystem, and a centralized multi-server web dashboard delivers Telegram, email, and webhook notifications. Experimental evaluation of the prototype in a Docker-based environment with two Ubuntu 22.04 servers confirms correct detection of all 40 pre-configured CIS violations, accurate bidirectional compliance state tracking, and a measured Mean Time to Detect (MTTD) of 8 seconds from file modification to alert delivery, i.e. under 10 seconds. Production deployment on real server fleets is identified as future work.
ANNOTATION
Server misconfiguration is the leading cause of information security incidents; the average cost of a data breach is USD 4.45 million (IBM, 2023). Existing open-source tools, Lynis and OpenSCAP, operate in batch mode without support for continuous monitoring, automated alerts, or centralized management of multiple servers. The aim of this work is to develop a research prototype of an automated, agentless system for continuous auditing and monitoring of the security of Linux server configurations, intended for production-grade server fleets. The work follows the Design Science Research (DSR) methodology and presents the design and prototype implementation of the system; the rules are based on the CIS Benchmark v8 standard for Ubuntu Linux 22.04, with remote scanning over the Secure Shell (SSH) protocol. Each check is scored with the base metrics of the Common Vulnerability Scoring System (CVSS) v4.0, applied to the vulnerability characteristics of the corresponding control. Real-time change detection is implemented through the inotify subsystem of the Linux kernel. The prototype implements 40 checks in seven security categories and achieves a Mean Time to Detect (MTTD) of under 10 seconds, providing centralized monitoring with notifications via Telegram, email, and webhook. Experimental evaluation of the prototype in a Docker environment with two Ubuntu 22.04 servers confirms correct detection of all 40 configuration violations and accurate bidirectional tracking of the compliance state; the MTTD was 8 seconds from file modification to alert delivery. Deployment on real server fleets is identified as a direction for future work.
Keywords: server security, configuration management, CIS Benchmark, automated auditing, MTTD, CVSS v4.0, agentless scanning.
Introduction
Server security configuration management is a critical component of organizational cybersecurity. According to the Verizon Data Breach Investigations Report 2023 [1], misconfiguration accounts for the largest share of security incidents globally. The IBM Cost of a Data Breach Report 2023 [2] reports an average breach cost of USD 4.45 million — the highest figure ever recorded. Linux powers approximately 96% of the world’s top web servers according to industry surveys [14], making it the most relevant platform for configuration security research.
The core challenge is the Mean Time to Detect (MTTD): the interval between the moment a misconfiguration is introduced and the moment it is detected. Manual audits are typically conducted quarterly or annually, leaving servers exposed for months. Existing tools such as Lynis [3] and OpenSCAP [4] are batch processes: they run once, produce a report, and terminate. They cannot monitor configuration changes continuously, cannot send automated notifications, and do not support centralized multi-server management. Open-source agent-based platforms address some of these limitations but require software installation on each monitored host, which increases deployment complexity and the attack surface; commercial agent-based and cloud-based platforms add licensing cost and reduce the transparency of the scanning logic (the alternatives are surveyed in the Related Work section). The objective of this study is to design and develop a research prototype of an automated, agentless system for continuous auditing and real-time monitoring of Linux server security configurations via Secure Shell (SSH). The system targets production-grade Linux server fleets, is intended to detect violations within seconds, and provides centralized multi-server management without any software installation on target hosts. The contribution of this work is a validated architecture and a reference implementation; production deployment on real server fleets is explicitly framed as future work.
Related Work
Three classes of solutions address server configuration security in production environments: commercial vulnerability and policy management platforms, open-source agent-based security platforms, and open-source agentless audit tools. This section positions the proposed system against representative examples of each class. The broader academic discourse on automated security configuration analysis is summarised in recent surveys [8, 9], and the operational guidance for security-focused configuration management is consolidated in NIST SP 800-128 [10]; the work presented here is positioned within that landscape as an applied prototype rather than a theoretical contribution.
Qualys Vulnerability Management, Detection and Response (VMDR) and Qualys Policy Compliance [12] are commercial cloud-based platforms providing CIS Benchmark scanning, continuous vulnerability assessment, and centralized multi-server management with real-time alerting through cloud agent or scanner appliance deployment. They are closed-source proprietary subscription products with significant licensing costs; the scanning logic, rule definitions, and severity scoring methodology are not openly auditable; deployment requires either installing the Qualys Cloud Agent on each monitored host or routing scan traffic through Qualys-controlled cloud infrastructure.
Rapid7 InsightVM [13] is a commercial agent-based vulnerability management platform with broader scope than configuration auditing, covering vulnerability scanning, patch verification, and risk prioritization. The Insight Agent must be installed on each monitored host, and the platform shares the same licensing-cost and closed-source characteristics as Qualys. Configuration policy coverage is delivered through proprietary policy templates rather than openly modifiable rule files.
Wazuh [11] is an open-source security platform combining host-based intrusion detection, log analysis, file integrity monitoring, and policy compliance against CIS Benchmarks. It supports real-time event delivery, centralized multi-server management, and free-of-charge deployment. However, Wazuh follows an agent-based architecture: a Wazuh agent must be installed and maintained on every monitored host, increasing the operational footprint, the trusted code base, and the attack surface of the audited servers. Wazuh’s CIS coverage is delivered through a SCAP-derived ruleset rather than the lightweight YAML schema proposed here.
Table 1 summarises the capability comparison across the proposed system and the five existing solutions discussed above.
Table 1.
Capability comparison: proposed prototype vs. existing tools
| Capability | Proposed | Lynis | OpenSCAP | Wazuh | Qualys | Rapid7 |
|---|---|---|---|---|---|---|
| Open-source | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ |
| Free of charge | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ |
| Agentless deployment | ✓ | local only | local only | ✗ | ✗ | ✗ |
| Multi-server centralized scan | ✓ | ✗ | partial | ✓ | ✓ | ✓ |
| Real-time change detection | ✓ | ✗ | ✗ | ✓ | ✓ | partial |
| Automated notifications | ✓ | ✗ | ✗ | ✓ | ✓ | ✓ |
| CIS Benchmark alignment | ✓ | partial | ✓ | ✓ | ✓ | partial |
| CVSS v4.0 severity scoring | ✓ | ✗ | partial | partial | ✓ | ✓ |
| Extensible rule engine | YAML | Bash | SCAP XML | SCAP-based | proprietary | proprietary |
| MTTD | < 10 s | hours | hours | seconds | minutes | minutes |
The proposed system occupies a niche not fully covered by any of the above: open-source and freely modifiable (versus Qualys and Rapid7), agentless with no installation on target hosts (versus the agents required by Wazuh, Rapid7, and Qualys), with event-driven real-time detection through Linux inotify (versus all three, which depend on agent-side schedulers or external scan intervals), and with a human-readable 8-line YAML rule schema lowering the entry barrier for custom checks. The trade-off is narrower scope: the proposed system focuses on configuration auditing only, whereas Wazuh, InsightVM, and Qualys VMDR provide broader security platforms covering vulnerability scanning, log management, and incident response.
Materials and Methods
The research follows the Design Science Research (DSR) methodology [5]. The rule base uses CIS Benchmarks v8 [6] for Ubuntu Linux 22.04, selected over the Defense Information Systems Agency Security Technical Implementation Guides (DISA STIGs) for their public availability and widespread adoption. Forty checks were implemented across seven security categories using a YAML rule schema with six required fields: CIS identifier, title, severity, CVSS v4.0 score [7], check specification, and remediation instruction. The YAML format reduces the effort of adding a new check from 60+ lines of Security Content Automation Protocol (SCAP) XML to 8 lines, making the rule base accessible without SCAP expertise.
Core implementation stack: Python 3.9+, paramiko (SSH scanning), watchdog (inotify monitoring), Flask (web dashboard), sqlite3 (findings storage). Experimental evaluation used two Ubuntu 22.04 Docker containers configured with deliberate CIS violations; the capability gap motivating this work is summarised in Table 1 (Related Work).
Reproducibility
The prototype is implemented on a minimal Python stack: Python 3.9 or later as the runtime; paramiko (version 3.4 or later) for Secure Shell (SSH) client functionality; watchdog (version 4.0 or later), a cross-platform file-system event library that uses inotify on Linux; Flask (version 3.0 or later) as the web framework for the centralized dashboard; and the sqlite3 module from the Python standard library for findings storage. The scanner host runs Ubuntu 22.04 LTS or Debian 12; monitored hosts require only an OpenSSH server and no installed agent.
A complete rule definition occupies eight lines of YAML. Listing 1 shows the rule enforcing CIS Benchmark control 5.2.7 (disallow SSH root login).
id: CIS-5.2.7
title: Disable SSH root login
severity: CRITICAL
cvss_v4: 8.6
check:
  file: /etc/ssh/sshd_config
  parameter: PermitRootLogin
  expected: "no"
remediation: Set PermitRootLogin to no and reload sshd.
Listing 1. YAML rule definition for CIS Benchmark control 5.2.7
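To illustrate how a rule of this shape could be evaluated, the following minimal sketch applies it to the text of an sshd_config file. It is an illustration under stated assumptions, not the prototype's code: the rule is written as a Python dict (the form a YAML loader would typically produce), the helper names `read_parameter` and `evaluate` are hypothetical, an absent parameter is treated as a failed check, and `Match` blocks, `Include` directives, and `keyword=value` syntax are not handled.

```python
from typing import Optional

# Hypothetical in-memory form of the rule in Listing 1.
RULE = {
    "id": "CIS-5.2.7",
    "title": "Disable SSH root login",
    "severity": "CRITICAL",
    "cvss_v4": 8.6,
    "check": {
        "file": "/etc/ssh/sshd_config",
        "parameter": "PermitRootLogin",
        "expected": "no",
    },
    "remediation": "Set PermitRootLogin to no and reload sshd.",
}


def read_parameter(config_text: str, name: str) -> Optional[str]:
    """Return the first value set for `name`, or None if it is absent.

    sshd_config keywords are case-insensitive and the first occurrence
    of a keyword takes effect, so later duplicates are ignored.
    """
    for raw_line in config_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and blanks
        parts = line.split(None, 1)               # keyword, rest of line
        if len(parts) == 2 and parts[0].lower() == name.lower():
            return parts[1].strip()
    return None


def evaluate(rule: dict, config_text: str) -> dict:
    """Compare the configured value with the expected one; build a finding."""
    check = rule["check"]
    actual = read_parameter(config_text, check["parameter"])
    # Assumption: a missing parameter counts as a violation.
    passed = actual == check["expected"]
    return {
        "id": rule["id"],
        "passed": passed,
        "actual": actual,
        "severity": rule["severity"],
        "cvss_v4": rule["cvss_v4"],
    }
```

Calling `evaluate(RULE, text)` on the content of /etc/ssh/sshd_config returns a finding whose `passed` field is true only when the first `PermitRootLogin` entry is set to `no`.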
The hybrid monitoring loop combines event-driven detection through inotify with a scheduled-scan safety net. Listing 2 presents the procedure executed by the scanner host for each monitored target.
on inotify event (path P modified on target T):
    old_content := cached_config[T][P]
    new_content := ssh_read(T, P)
    diff := parameter_diff(old_content, new_content)
    if diff is empty: return    // noise event
    findings := run_rules(rules_for(P), new_content)
    update_database(T, P, findings, diff)
    if max_severity(findings) == CRITICAL:
        send_alert_immediate(T, diff, findings)
    else:
        enqueue_daily_digest(T, diff, findings)
    cached_config[T][P] := new_content

every scheduled_interval (default 24 hours):
    for each target T in servers.yaml:
        for each rule R in full_rule_set:
            update_database(T, R, evaluate(R, T))
Listing 2. Hybrid monitoring loop pseudocode (inotify-driven event handler and scheduled scan)
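The event-handling half of Listing 2 can be sketched in Python as follows. This is a simplified illustration rather than the prototype's code: every collaborator (`ssh_read`, `run_rules`, `send_alert`, `enqueue_digest`) is a hypothetical callable injected by the caller, the noise check compares whole file contents instead of a parameter-level diff, `run_rules` is assumed to return only violations, the database update is omitted, and how the change event reaches the scanner is outside the sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Finding = dict  # e.g. {"id": "CIS-5.2.7", "severity": "CRITICAL"}


@dataclass
class ChangeHandler:
    # Hypothetical stand-ins for the helpers named in Listing 2.
    ssh_read: Callable[[str, str], str]
    run_rules: Callable[[str, str], List[Finding]]
    send_alert: Callable[[str, str, List[Finding]], None]
    enqueue_digest: Callable[[str, str, List[Finding]], None]
    # Last known content per (target, path).
    cache: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def on_change(self, target: str, path: str) -> bool:
        """Handle one change event; return False for noise events."""
        old = self.cache.get((target, path))
        new = self.ssh_read(target, path)
        if old == new:
            return False  # content unchanged: nothing to evaluate
        violations = self.run_rules(path, new)
        if any(v["severity"] == "CRITICAL" for v in violations):
            self.send_alert(target, path, violations)
        else:
            self.enqueue_digest(target, path, violations)
        self.cache[(target, path)] = new
        return True
```

Injecting the collaborators keeps the routing logic testable without an SSH connection or a notification channel; in a test, plain lambdas that append to lists are sufficient.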
The experimental topology is defined in a single docker-compose.yml file with three services: the scanner host (image: python:3.9-slim with the prototype mounted as a volume), Server 1 — Web (image: ubuntu:22.04 with deliberate SSH and firewall misconfigurations), and Server 2 — DB (image: ubuntu:22.04 with deliberate file permission and user account misconfigurations). All three services share an isolated Docker bridge network; the scanner host has SSH key-based access to both targets through dedicated audit user accounts with restricted sudoers entries. The complete YAML rule set covering all 40 implemented checks with their CIS identifiers, the docker-compose topology, and the prototype source code are available from the corresponding author upon reasonable request.
Results and Discussion
Table 2 shows the distribution of 40 implemented checks. CVSS v4.0 base scores for each check were derived by applying the CVSS v4.0 base metric calculator [7] to the specific vulnerability characteristics of each CIS control: Attack Vector (Network/Local), Attack Complexity, Privileges Required, User Interaction, and Confidentiality/Integrity/Availability impact. The average CVSS v4.0 score across all rules is 7.3. User Account Security (avg. 8.9) and Unnecessary Services (avg. 8.3) contain the highest-severity checks, reflecting their direct exposure to privilege escalation and lateral movement threats.
Table 2.
Distribution of CIS Benchmark checks across security categories
| Security Category | Rules | CVSS Avg. | Key Checks |
|---|---|---|---|
| SSH Configuration | 8 | 7.8 | PermitRootLogin, MaxAuthTries, X11Forwarding |
| File Permissions | 5 | 8.2 | /etc/shadow (640), /etc/passwd (644) |
| User Account Security | 6 | 8.9 | Empty passwords, UID 0, NOPASSWD sudo |
| Firewall & Network | 4 | 6.4 | ufw active, iptables DROP policy |
| Kernel Parameters | 8 | 5.3 | IP forwarding, ICMP redirects |
| Logging & Audit | 5 | 6.2 | auditd enabled, rsyslog installed |
| Unnecessary Services | 4 | 8.3 | telnet, rsh, xinetd, FTP absent |
| TOTAL | 40 | 7.3 | |
Abbreviations used in Table 2: UID — User Identifier; IP — Internet Protocol; ICMP — Internet Control Message Protocol; FTP — File Transfer Protocol.
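The totals in Table 2 can be cross-checked from the per-category rows with a few lines of Python. The sketch below uses only the rounded category averages printed in the table, so it is an approximation: the rule-weighted mean computed this way is about 7.2, while the unweighted mean of the seven category averages is 7.3. Which of the two the reported 7.3 corresponds to, or whether it was computed from unrounded per-rule scores, is not stated in the text and is left open here.

```python
# (category, number of rules, category average CVSS v4.0 score) from Table 2
CATEGORIES = [
    ("SSH Configuration", 8, 7.8),
    ("File Permissions", 5, 8.2),
    ("User Account Security", 6, 8.9),
    ("Firewall & Network", 4, 6.4),
    ("Kernel Parameters", 8, 5.3),
    ("Logging & Audit", 5, 6.2),
    ("Unnecessary Services", 4, 8.3),
]

total_rules = sum(n for _, n, _ in CATEGORIES)

# Each category average weighted by its number of rules.
weighted_mean = sum(n * avg for _, n, avg in CATEGORIES) / total_rules

# Plain mean of the seven category averages.
unweighted_mean = sum(avg for _, _, avg in CATEGORIES) / len(CATEGORIES)

print(total_rules)
print(f"{weighted_mean:.3f}")
print(f"{unweighted_mean:.1f}")
```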
Figure 1 illustrates the three-layer architecture. The Data Layer collects configuration data via SSH (paramiko) and stores findings in SQLite. The Engine Layer evaluates all 40 rules, assigns CVSS severity, and computes parameter-level diffs. The Presentation Layer delivers results via Flask dashboard and notification dispatcher. Adding a new server requires one entry in servers.yaml — no software installation on the target host.
Figure 1. Three-Layer Architecture of the Proposed System
The hybrid monitoring model combines Linux inotify (primary) and scheduled scans (secondary). When inotify detects a file change, the system reads the updated file via SSH, computes a parameter-level diff (old value → new value), runs a full scan, and routes a notification by severity. Figure 2 shows the MTTD comparison. The MTTD values for Lynis and OpenSCAP in Figure 2 represent execution as a scheduled job at the minimum practical interval (one hour), consistent with official documentation [3, 4] and common deployment practice; the proposed system's value of 8 seconds was measured experimentally, as described in the experimental evaluation that follows Figure 2. The sub-10-second result is not merely a quantitative improvement over a slower scheduled scan but a qualitative difference: Lynis and OpenSCAP are batch processes that cannot provide event-driven detection regardless of how frequently they are scheduled.
Figure 2. Mean Time to Detect (MTTD) Comparison Across Security Auditing Approaches
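The parameter-level diff mentioned in the description of the hybrid monitoring model (old value → new value) can be sketched as below. This is a minimal illustration that assumes whitespace-separated `Keyword value` lines in the style of sshd_config; the function names `parse_parameters` and `parameter_diff` are chosen to match the pseudocode and are not taken from the prototype's source.

```python
from typing import Dict, Optional, Tuple


def parse_parameters(config_text: str) -> Dict[str, str]:
    """Parse 'Keyword value' lines into a dict, keeping the first occurrence."""
    params: Dict[str, str] = {}
    for raw_line in config_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # ignore comments
        parts = line.split(None, 1)
        if len(parts) == 2:
            params.setdefault(parts[0], parts[1].strip())
    return params


def parameter_diff(
    old_text: str, new_text: str
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Map each changed parameter to (old value, new value).

    None stands for a parameter that is absent on that side, so added and
    removed parameters are reported alongside modified ones.
    """
    old, new = parse_parameters(old_text), parse_parameters(new_text)
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(old.keys() | new.keys())
        if old.get(key) != new.get(key)
    }
```

An edit that only touches comments or whitespace yields an empty dict, which corresponds to the noise-event case in which no scan or notification is needed.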
Experimental evaluation: Server 1 (Web) scored 47.5% compliance (19/40 passed); Server 2 (DB) scored 50.0% (20/40). After progressive remediation on Server 2, the system correctly recorded each fixed check as passing — demonstrating bidirectional change tracking. The real-time test recorded MTTD of 8 seconds: file modification to Telegram alert, including SSH read, diff computation, full scan, and notification dispatch. This represents two orders of magnitude shorter detection latency than the typical scheduled-scan interval of batch tools (hours). CRITICAL findings were delivered immediately; MEDIUM/LOW were batched into daily digests.
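The compliance percentages and the severity-based routing reported above can be reproduced with a short sketch. The function names are hypothetical, and the routing rule is an assumption drawn from the text: only CRITICAL findings are sent immediately, and every other severity is placed in the daily digest.

```python
from typing import Dict, List, Tuple

# Assumption: only CRITICAL findings trigger an immediate alert.
IMMEDIATE = {"CRITICAL"}


def compliance_percent(passed: int, total: int) -> float:
    """Share of passed checks in percent, rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be a positive number of checks")
    return round(100.0 * passed / total, 1)


def route(findings: List[Dict[str, str]]) -> Tuple[List[dict], List[dict]]:
    """Split failed checks into (immediate alerts, daily digest)."""
    immediate = [f for f in findings if f["severity"] in IMMEDIATE]
    digest = [f for f in findings if f["severity"] not in IMMEDIATE]
    return immediate, digest


print(compliance_percent(19, 40))  # Server 1 (Web): 47.5
print(compliance_percent(20, 40))  # Server 2 (DB): 50.0
```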
Performance Evaluation
A formal quantitative benchmark of the prototype under production load was beyond the scope of this study; only the MTTD metric was measured experimentally. The architectural performance bounds, however, can be characterized analytically and inform future evaluation. The duration of a single full scan of one monitored target is bounded by one SSH session establishment, N file read operations over SSH (where N is the number of configuration files referenced by active rules; N ≤ 12 in the present rule set), and the local rule evaluation time, which is dominated by string and integer comparisons against in-memory YAML rules. Memory consumption is bounded by the Python interpreter footprint plus the loaded rule set and a per-target configuration cache; paramiko sessions are short-lived and released after each read. CPU usage is dominated by SSH cryptographic operations during file reads and is negligible during idle inotify wait. Network traffic per scan is bounded by the cumulative size of the configuration files read, which remains below 100 KB per target per full scan for the prototype rule set. Dashboard rendering latency is bounded by a single SQLite read on the findings table plus Flask template rendering. Quantitative production-load measurements — including 95th-percentile scan duration, peak memory under N ≥ 100 monitored targets, and dashboard concurrency limits — are deferred to future work in a real-world deployment.
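The analytical bounds described above can be written compactly. The symbols are introduced here for illustration only and do not appear in the prototype: $T_{\text{ssh}}$ is the worst-case time to establish one SSH session, $T_{\text{read}}(f_i)$ the worst-case time to read configuration file $f_i$ over that session, $T_{\text{eval}}$ the local rule evaluation time, and $|f_i|$ the size of file $f_i$.

```latex
T_{\text{scan}} \;\le\; T_{\text{ssh}} + \sum_{i=1}^{N} T_{\text{read}}(f_i) + T_{\text{eval}},
\qquad N \le 12,
\qquad
B_{\text{scan}} \;\le\; \sum_{i=1}^{N} |f_i| \;<\; 100\ \text{KB}
```

The traffic bound $B_{\text{scan}}$ follows the text in counting configuration file payload only; SSH protocol overhead would add to it and is not modelled in this sketch.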
False Positive Analysis
The CIS Benchmark rules implemented in the prototype are deterministic boolean checks on configuration file parameters: each rule reads a single named parameter from a single named file and compares its value against an expected literal or numeric range. This determinism eliminates the class of probabilistic false positives that affect heuristic and signature-based scanners. In the controlled experimental evaluation, all 40 pre-configured CIS violations were detected (true positives = 40, false negatives = 0), and no compliant configuration was incorrectly flagged as a violation (false positives = 0). The 40/40 detection rate is a direct property of rule construction rather than a measure of generalization. In production environments, legitimate organizational policies may deviate from CIS defaults — for example, a designated service account with NOPASSWD sudo for automation purposes is a CIS violation by definition but may be a deliberate operational decision. The proposed system supports this case through per-rule and per-host exception entries in servers.yaml, suppressing the finding while logging the exception for audit traceability. The long-term ratio of true violations to legitimately suppressed findings in real production deployments is an open question for future evaluation.
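The exception mechanism described above can be sketched as a filter that separates reported findings from suppressed ones while retaining the latter for the audit log. The data layout is an assumption, since the servers.yaml exception schema is not shown in the paper: a `"*"` key stands for a per-rule exception that applies to every host, a host name stands for a per-host exception, and the rule identifier and host name used here are placeholders.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical in-memory form of the exception entries from servers.yaml.
EXCEPTIONS: Dict[str, Set[str]] = {
    "*": set(),                   # per-rule exceptions for all hosts
    "db-01": {"SUDO-NOPASSWD"},  # placeholder rule id, not a real CIS number
}


def apply_exceptions(
    host: str,
    findings: List[dict],
    exceptions: Dict[str, Set[str]],
) -> Tuple[List[dict], List[dict]]:
    """Return (reported, suppressed) findings for one host.

    Suppressed findings are returned rather than dropped so that the
    caller can log them for audit traceability.
    """
    allowed = exceptions.get("*", set()) | exceptions.get(host, set())
    reported = [f for f in findings if f["id"] not in allowed]
    suppressed = [f for f in findings if f["id"] in allowed]
    return reported, suppressed
```

With this layout, the same NOPASSWD finding is suppressed on the host that declares the exception and still reported on every other host.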
Security Considerations
A centralized audit system introduces a new component that, if compromised, would grant an attacker visibility into the security posture of every monitored host. The prototype incorporates the following mitigations. Authentication to monitored hosts uses SSH public-key cryptography exclusively; password-based authentication is disabled. Each monitored host carries a dedicated audit user whose sudoers entry permits only the specific read commands required by the active rule set (for example, cat /etc/ssh/sshd_config, iptables -L, auditctl -l), denying both write operations and unrelated read operations. All SSH operations performed by the scanner are read-only with respect to the target host’s file system and service state. Credentials and SSH private keys on the scanner host are stored in an encrypted vault file accessible only to the scanner process at runtime. The Flask dashboard requires authenticated session access, and the scanner host is expected to be deployed inside an internal management network without direct external exposure. For production deployments, integration with HashiCorp Vault is planned to replace static SSH keys with short-lived ephemeral certificates, reducing the impact of scanner-host compromise.
Limitations
This study has several limitations that should be addressed in subsequent work. First, the experimental evaluation was conducted in a Docker-based environment with two Ubuntu 22.04 containers; this validates the correctness of the rule engine, the monitoring loop, and notification dispatch, but does not demonstrate behavior under production workloads, network latency variability, or fleets larger than two hosts. Second, the rule base covers 40 of the more than 200 controls defined in the full CIS Benchmark v8 for Ubuntu Linux; coverage gaps remain in advanced auditing, AppArmor and SELinux mandatory access control configuration, and disk-encryption-related controls. Third, quantitative measurements of scan duration, memory footprint, network bandwidth, and dashboard concurrency under realistic load were not collected and are required before production deployment recommendations can be made. Fourth, the prototype was tested on a single Linux distribution; portability to Red Hat Enterprise Linux (RHEL) and CentOS-family systems requires distribution-specific rule variants. Finally, long-running stability under sustained inotify event streams and the behavior of the system under partial network partitions between scanner and targets have not been evaluated.
Conclusion
This paper presented the design and prototype implementation of an automated, agentless system for auditing and monitoring Linux server security configurations on production-grade Linux server fleets. The prototype demonstrates a combination of properties not concurrently supported by the surveyed open-source tools: (1) agentless SSH multi-server monitoring with inotify-based real-time detection (MTTD < 10 sec), parameter-level diff reporting, CVSS v4.0 severity scoring, and integrated notifications; (2) a YAML-based rule engine reducing new-check authoring from 60+ lines of SCAP XML to 8 lines of YAML; (3) a horizontally scalable agentless architecture deployable through the docker-compose tool. Production deployment on real server fleets requires the work outlined in the Limitations section: validation under production workloads, expansion of rule coverage to the full CIS Benchmark (200+ checks), distribution-specific rule variants for Red Hat Enterprise Linux (RHEL) and CentOS-family systems, quantitative performance measurements, and integration with HashiCorp Vault for ephemeral SSH certificate management.
References:
1. Verizon. Data Breach Investigations Report 2023. Verizon Communications, 2023. 87 p.
2. IBM Security. Cost of a Data Breach Report 2023. IBM Corporation, 2023. 64 p.
3. CISOfy. Lynis: Open Source Security Auditing Tool. URL: https://cisofy.com/lynis/ (accessed: 01.05.2026).
4. Red Hat. OpenSCAP Security Compliance. URL: https://www.open-scap.org/ (accessed: 01.05.2026).
5. Hevner A.R., March S.T., Park J., Ram S. Design Science in IS Research. MIS Quarterly, 2004, Vol. 28, No. 1, P. 75-105.
6. Center for Internet Security. CIS Benchmarks for Ubuntu Linux 22.04 LTS, v1.0.0. CIS, 2022. URL: https://www.cisecurity.org/benchmark/ubuntu_linux (accessed: 01.05.2026).
7. FIRST. Common Vulnerability Scoring System v4.0: Specification Document. Forum of Incident Response and Security Teams, 2023. URL: https://www.first.org/cvss/v4-0/specification-document (accessed: 01.05.2026).
8. Bringhenti D. et al. Automated Security Configuration Analysis. ACM Computing Surveys, 2023, Vol. 55, No. 14.
9. Wunder M. et al. Automated Configuration Analysis. IEEE Security & Privacy, 2024, Vol. 22, No. 2, P. 45-57.
10. NIST SP 800-128: Guide for Security-Focused Configuration Management. NIST, 2019. 57 p.
11. Wazuh. Wazuh: The Open Source Security Platform. URL: https://wazuh.com/ (accessed: 01.05.2026).
12. Qualys. Qualys VMDR & Policy Compliance. URL: https://www.qualys.com/ (accessed: 01.05.2026).
13. Rapid7. InsightVM: Vulnerability Management Software. URL: https://www.rapid7.com/products/insightvm/ (accessed: 01.05.2026).
14. W3Techs. Usage Statistics of Operating Systems for Websites. URL: https://w3techs.com/technologies/overview/operating_system (accessed: 01.05.2026).