METHODS FOR REDUCING NETWORK LATENCY IN LINUX SYSTEMS

Cite as:
Otkidach I. METHODS FOR REDUCING NETWORK LATENCY IN LINUX SYSTEMS // Universum: технические науки: electronic scientific journal. 2026. 5(146). URL: https://7universum.com/en/tech/archive/item/22760 (accessed: 29.05.2026).
Received: 01.05.2026
Accepted for publication: 13.05.2026
Published: 28.05.2026

 

UDC 004.7

ABSTRACT

This article examines engineering approaches to reducing network latency in Linux systems. The architecture of the Linux operating system network stack is considered, and the main stages of packet processing are analyzed, including the interaction between network interfaces, device drivers, queues, interrupt mechanisms, and kernel protocol layers. Special attention is paid to the sources of latency that arise during data transfer between kernel space and user space, as well as to the effect of buffering, data copying, and context switching on the performance of network applications. Modern mechanisms for optimizing the Linux networking subsystem are considered, with a focus on shortening the packet processing path and reducing system overhead. Programmable packet-handling technologies at the kernel level are explored, including tools such as extended Berkeley Packet Filter and eXpress Data Path, which enable early processing of network traffic. Architectures for kernel-bypass packet processing built on the Data Plane Development Kit and designed for high-performance network traffic processing in user space are also examined. The approaches are compared in terms of their architectural features, areas of application, and impact on network latency.


 

Keywords: Linux network stack, network latency reduction, extended Berkeley Packet Filter, eXpress Data Path, Data Plane Development Kit, kernel bypass networking, packet processing.


 

Introduction

Contemporary computing systems increasingly depend on the efficiency of network interaction. The growth of cloud infrastructures, distributed computing environments, microservice architectures, and big data processing systems places growing demands on network throughput and responsiveness. One of the main indicators of the efficiency of such systems is network latency, which determines the time required to transfer data between components of a distributed system.

The Linux operating system is widely used as a base platform for server and network infrastructures due to its flexibility, high scalability, and well-developed networking subsystem. The Linux network stack implements a universal architecture for processing network packets and supports a wide range of protocols and network devices. However, the traditional packet processing model in the network stack assumes that data passes through several layers of software processing, including network interface drivers, protocol layers, and mechanisms for interaction with user space. The presence of many processing stages can lead to higher system overhead and increased latency when processing network traffic.

The goal of this research is to analyze methods for reducing network latency in Linux systems through optimization of the operating system network stack. To achieve this goal, the study focuses on describing the Linux network stack, identifying the main sources of latency, examining kernel-level optimization mechanisms such as extended Berkeley Packet Filter and eXpress Data Path, and considering kernel-bypass packet processing based on the Data Plane Development Kit.

Materials and methods

The methodological basis of the study consists of a review and comparative analysis of recent scientific publications, technical reports, and documentation related to Linux networking, packet processing, and low-latency network technologies. The research focuses on the structure of the Linux network stack, the path of packet processing, and the mechanisms that may increase or reduce network latency.

To compare different approaches to latency reduction, the study applies a comparative assessment of kernel-level packet processing technologies and user-space networking frameworks. The considered methods include extended Berkeley Packet Filter, eXpress Data Path, and Data Plane Development Kit. These technologies are examined in terms of their architectural features, packet processing stage, interaction with the standard Linux network stack, and applicability to high-performance network systems.

Results

Architecture and performance characteristics of the Linux network stack

Such areas as distributed computing, high-frequency trading, telecommunication systems, cloud infrastructures, and real-time systems require minimal response time, high throughput, and predictable network operations. Because of this, special attention is given to the architecture of the operating system network stack and the mechanisms used to optimize it. In most server infrastructures, Linux is used as the base operating system. However, the universality and flexibility of its architecture are inevitably accompanied by additional overhead during packet processing.

The Linux network stack is a multi-layer system for packet processing implemented inside the operating system kernel. Its architecture is based on the classical Transmission Control Protocol/Internet Protocol (TCP/IP) model and includes several sequential stages of data processing, from interaction with the network interface to delivery of data to the user application. At the transport layer, communication between applications is provided by TCP and the User Datagram Protocol (UDP) (fig. 1).

 

Figure 1. Simplified overview of the Linux networking stack [1]

 

When a network packet arrives at the interface, it is first processed by the network interface card (NIC) driver. At this stage, the hardware interacts with the operating system through interrupt handling, distribution of packets across queues, and transfer of data to the kernel networking subsystem. The packet then passes through the kernel network interface layer, where initial processing and preparation of the data for further routing take place [2].

Subsequently, the packet passes through the protocol layers of the network stack, where network and transport protocol processing is carried out. This involves verifying packet headers and processing protocols such as IP, TCP, and UDP, as well as connection management, error handling, and buffering. The data is then transferred from kernel space to user space, where it is received by the network application. This packet processing path includes several layers of software logic and involves operations such as copying data between buffers, context switching between user and kernel modes, and synchronization between threads and Central Processing Unit (CPU) cores. Each of these operations contributes to the overall latency of network packet processing.
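The cost of crossing the boundary between user space and kernel space can be observed with a small measurement. The following is a minimal sketch, assuming a Linux host with a working loopback interface; the function name `loopback_udp_latency_ns` and its parameters are illustrative and do not come from the cited sources. It times one send and one receive of a UDP datagram that never leaves the host, so it covers system calls, data copies, and protocol processing, but no NIC driver or hardware interrupt.

```python
import socket
import time


def loopback_udp_latency_ns(samples=200, payload=b"x" * 64):
    """Time send + receive of a UDP datagram that never leaves the host."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(2.0)                 # fail instead of hanging forever
    sock.bind(("127.0.0.1", 0))          # the kernel picks a free port
    addr = sock.getsockname()
    results = []
    try:
        for _ in range(samples):
            start = time.perf_counter_ns()
            sock.sendto(payload, addr)   # user space -> kernel: syscall + copy
            sock.recvfrom(2048)          # kernel -> user space: syscall + copy
            results.append(time.perf_counter_ns() - start)
    finally:
        sock.close()
    return results


if __name__ == "__main__":
    lat = sorted(loopback_udp_latency_ns())
    print("median ns:", lat[len(lat) // 2], "max ns:", lat[-1])
```

The gap between the median and the maximum gives a rough impression of latency variability on an otherwise idle path; absolute values depend on the machine and are not comparable across systems.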

One of the key factors that affects the performance of the Linux network stack is the interrupt handling mechanism. Traditionally, network cards use hardware interrupts to notify the operating system about the arrival of new packets. However, at high data transmission speeds, a large number of interrupts can create a significant load on the processor. To reduce this effect, Linux uses several optimization mechanisms, including interrupt coalescing, in which the network card delays an interrupt until several packets have arrived or a short timer has expired, and NAPI (New API), in which the driver disables further receive interrupts after the first one and polls the device queue in batches until it is empty [3].
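The trade-off behind interrupt coalescing can be illustrated with a simplified model. The sketch below is an assumption-laden simulation, not driver code: the function `coalesced_interrupts` is hypothetical, and its parameters `rx_usecs` and `rx_frames` are only named after the coalescing settings exposed by `ethtool`. An interrupt fires when either the timer started by the first pending packet expires or the frame limit is reached.

```python
def coalesced_interrupts(arrivals_us, rx_usecs, rx_frames):
    """Return (interrupt_count, worst_added_delay_us) for sorted arrival times."""
    interrupts = 0
    worst_delay = 0
    i = 0
    n = len(arrivals_us)
    while i < n:
        first = arrivals_us[i]
        deadline = first + rx_usecs
        j = i
        # Batch packets that arrive before the timer fires, up to the frame limit.
        while j < n and arrivals_us[j] <= deadline and (j - i) < rx_frames:
            j += 1
        batch = j - i
        # The interrupt fires on the last frame if the limit was hit,
        # otherwise when the timer expires.
        fire_time = arrivals_us[j - 1] if batch == rx_frames else deadline
        interrupts += 1
        worst_delay = max(worst_delay, fire_time - first)
        i = j
    return interrupts, worst_delay


if __name__ == "__main__":
    arrivals = list(range(0, 1000, 10))          # 100 packets, one every 10 us
    print(coalesced_interrupts(arrivals, 0, 1))   # (100, 0): one interrupt per packet
    print(coalesced_interrupts(arrivals, 50, 32)) # (17, 50): fewer interrupts, more delay
```

In this model, coalescing cuts the interrupt count from 100 to 17 but makes the first packet of each batch wait up to 50 microseconds, which is why coalescing settings are a compromise between CPU load and latency.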

Another salient aspect of the Linux network stack architecture is the buffering and packet queue system. As packets move through the network stack, they may wait in driver-level, kernel, and application socket buffers. Each of these waiting points can become a source of additional latency, notably under conditions of heavy network load.
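Of the queues mentioned above, the socket receive buffer is the one an application can inspect directly. The following minimal sketch uses the standard `SO_RCVBUF` socket option; the helper name `receive_buffer_bytes` is illustrative. On Linux, the kernel doubles the requested value to account for bookkeeping overhead and limits it according to the `net.core.rmem_max` setting, so the value read back usually differs from the value requested.

```python
import socket


def receive_buffer_bytes(requested=None):
    """Optionally request a receive buffer size, then report the effective one."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        if requested is not None:
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    finally:
        sock.close()


if __name__ == "__main__":
    print("default:", receive_buffer_bytes())
    print("after requesting 65536:", receive_buffer_bytes(65536))
```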

At the same time, buffering plays two contradictory roles. On the one hand, it smooths out short-term variations in network traffic and prevents packet loss. On the other hand, it increases the time that packets spend waiting in queues. Hence, buffer sizes and queueing algorithms play a major part in determining the performance of the networking subsystem.
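The waiting time added by a full buffer follows from simple arithmetic: a packet at the tail of a queue must wait until every byte ahead of it has been transmitted. The sketch below works through this under the assumption that the queue drains at the full link rate; the function name and the example figures are illustrative.

```python
def queueing_delay_us(queued_bytes, link_gbps):
    """Time in microseconds needed to drain queued_bytes at link_gbps."""
    bits = queued_bytes * 8
    bits_per_microsecond = link_gbps * 1000   # 1 Gbit/s = 1000 bits per microsecond
    return bits / bits_per_microsecond


if __name__ == "__main__":
    backlog = 1000 * 1500                      # 1000 queued frames of 1500 bytes
    print(queueing_delay_us(backlog, 10))      # 1200.0 microseconds at 10 Gbit/s
    print(queueing_delay_us(backlog, 100))     # 120.0 microseconds at 100 Gbit/s
```

A backlog that protects against loss during a burst therefore adds more than a millisecond of delay on a 10 Gbit/s link, which is large compared with the microsecond-scale processing cost of a single packet.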

Thus, the architecture of the Linux network stack plays a key role in determining the performance characteristics of network applications. The multi-layer structure of packet processing provides flexibility and universality of the system, but at the same time it creates additional overhead that can lead to increased latency under high load conditions. For this reason, modern research focuses on developing methods for optimizing the network stack, including programmable packet processing mechanisms inside the kernel, early network traffic filtering, and the use of kernel bypass technologies.

Methods for reducing network latency at the Linux kernel level

As the performance requirements for network applications increase and the bandwidth of network interfaces continues to grow, it becomes clear that the traditional architecture of the Linux network stack cannot always provide minimal packet processing latency. To reduce the overhead of the traditional processing path, new mechanisms for programmable network traffic processing have been developed in recent years.

These mechanisms make it possible to perform operations on packets at earlier stages of their processing. Among the most significant technologies are eBPF (extended Berkeley Packet Filter) and XDP (eXpress Data Path), which allow flexible and high-performance packet processing directly inside the Linux kernel [4].

The eBPF technology is a mechanism for the safe execution of user-supplied programs inside the operating system kernel. Unlike traditional kernel modules, which must be compiled for and loaded into the kernel with full privileges, eBPF provides a virtual machine that executes specialized programs only after they pass a safety check. Before a program is executed, the in-kernel verifier examines it to check the correctness of the operations performed, rule out potentially dangerous actions, and enforce memory access restrictions. As a result, eBPF makes it possible to extend the functionality of the kernel without modifying its source code while significantly reducing the risk to system stability (fig. 2).

 

Figure 2. Architecture of the classic BPF virtual machine and the extended eBPF virtual machine [5]

 

In the Linux networking subsystem, eBPF can be used for different tasks, including packet filtering, load balancing, monitoring network activity, and traffic analysis. eBPF programs can be attached to different points of the network stack, such as socket handlers, network interface entry points, or traffic control mechanisms.

One of the most important advantages of eBPF is the ability to programmatically control packet processing at early stages of the network stack [6]. This makes it possible to implement different optimization mechanisms, including early traffic filtering and packet redirection before the packet goes through the full network stack.

Despite the wide capabilities of eBPF, the greatest reduction in packet processing latency is achieved when using the XDP technology. XDP is a high-performance packet processing mechanism that allows eBPF programs to run directly at the level of the network interface driver. Unlike traditional packet processing in the network stack, XDP can operate at the earliest stages of processing incoming packets, in some cases immediately after the data is received from the network card. This means that a packet can be processed or dropped before kernel structures used in the standard network stack are allocated, which significantly reduces processing overhead.

The fundamental idea behind XDP is to minimize the number of operations performed on a network packet before deciding what to do with it. An XDP program can drop a packet, transmit it back out of the receiving interface, redirect it to another interface, CPU, or user-space socket, or pass it on to the conventional Linux networking stack (table 1).

Table 1

Main XDP actions in packet processing [7]

| XDP Action | Description | Main purpose |
|---|---|---|
| XDP_DROP | The packet is immediately discarded by the XDP program. | Used to filter unwanted or malicious traffic at the earliest stage. |
| XDP_PASS | The packet is passed to the standard Linux network stack. | Allows normal processing by the kernel networking subsystem. |
| XDP_TX | The packet is transmitted back through the same network interface. | Used for simple forwarding or reflection mechanisms. |
| XDP_REDIRECT | The packet is redirected to another network interface, CPU, or socket. | Commonly used for load balancing or traffic distribution. |

 

Because of this approach, XDP can be used to build high-performance systems for traffic filtering, load balancing, and protection against network attacks such as distributed denial-of-service (DDoS) attacks. From an architectural point of view, the eBPF program in native mode runs directly in the context of the network interface driver.

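This makes it possible to avoid many costly operations that are typical for the traditional network stack, including allocation of sk_buff structures and processing of protocol layers. XDP supports three operating modes: native (driver) mode, in which the program runs in the receive path of the driver; generic mode, in which the kernel runs the program later, after sk_buff allocation, as a fallback for drivers without native support; and hardware offload mode, in which the program executes on the network card itself.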
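The decision logic of a simple XDP filter can be sketched as follows. This is a Python model for illustration only: a real XDP program is written in restricted C and compiled to eBPF bytecode. The function `xdp_decide`, the helper `build_udp_frame`, and the blocklist and port sets are hypothetical; the numeric action codes follow `enum xdp_action` in the kernel header `linux/bpf.h`. The explicit length checks mirror the bounds checks that the verifier requires before any packet byte is read.

```python
import struct
from enum import IntEnum


class XdpAction(IntEnum):
    # Numeric values follow enum xdp_action in linux/bpf.h.
    ABORTED = 0
    DROP = 1
    PASS = 2
    TX = 3
    REDIRECT = 4


ETH_HLEN = 14          # Ethernet header length
ETH_P_IP = 0x0800      # EtherType for IPv4
IPPROTO_UDP = 17


def xdp_decide(frame, blocked_sources, redirect_udp_ports):
    """Pick an action for one Ethernet frame using only its headers."""
    # Bounds check first: never read past the end of the packet.
    if len(frame) < ETH_HLEN + 20:
        return XdpAction.PASS
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETH_P_IP:
        return XdpAction.PASS                 # not IPv4: leave it to the stack
    ihl = (frame[ETH_HLEN] & 0x0F) * 4        # IPv4 header length in bytes
    if ihl < 20:
        return XdpAction.DROP                 # malformed header
    src = frame[ETH_HLEN + 12:ETH_HLEN + 16]
    if src in blocked_sources:
        return XdpAction.DROP                 # early filtering
    l4 = ETH_HLEN + ihl
    if frame[ETH_HLEN + 9] == IPPROTO_UDP and len(frame) >= l4 + 4:
        (dport,) = struct.unpack_from("!H", frame, l4 + 2)
        if dport in redirect_udp_ports:
            return XdpAction.REDIRECT         # hand off to another consumer
    return XdpAction.PASS


def build_udp_frame(src_ip, dst_port):
    """Build a minimal Ethernet + IPv4 + UDP frame for experiments."""
    eth = b"\x00" * 12 + struct.pack("!H", ETH_P_IP)
    ip = bytearray(20)
    ip[0] = 0x45                              # version 4, header length 5 words
    ip[9] = IPPROTO_UDP
    ip[12:16] = src_ip
    udp = struct.pack("!HHHH", 12345, dst_port, 8, 0)
    return eth + bytes(ip) + udp
```

For example, with `blocked_sources = {bytes([10, 0, 0, 1])}` a frame from 10.0.0.1 yields `XdpAction.DROP` without any further processing, which corresponds to the early-drop case described above.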

Programmable packet processing mechanisms implemented in eBPF and XDP represent an important direction in the development of the Linux networking subsystem. These technologies make it possible to move packet processing to earlier stages of the network path, reducing the number of operations performed in the traditional layers of the network stack. As a result, packet processing time is reduced, system throughput increases, and latency variability decreases.

User-space networking frameworks for ultra-low latency: architecture and application of DPDK

One of the most radical approaches to reducing network latency in Linux systems is the use of packet processing architectures that bypass the operating system kernel (kernel bypass). In the traditional network interaction model, a network packet passes through several layers of the kernel network stack, including the network interface driver, packet management subsystem, network and transport protocol layers, as well as buffering mechanisms and interaction with user space.

Each of these stages adds certain overhead related to interrupt handling, memory structure allocation, data copying, and context switching between kernel mode and user mode [8]. In high-speed networks, such overhead becomes a significant factor that limits the performance of network applications and increases data transmission latency. Because of this, software architectures that process packets directly in user space, bypassing the standard operating system network stack, have been actively developed in recent years.

One of the most widely used solutions in this area is the Data Plane Development Kit (DPDK), which is a set of libraries and drivers for high-performance packet processing in user space. Originally developed by Intel, it is a software platform designed for building network applications with extremely high throughput and very low latency requirements. The main principle of DPDK is that the application gets direct access to the network interface and performs packet processing itself using specialized user-space drivers [9].

A defining aspect of the DPDK architecture is the use of a polling model for network interfaces instead of the conventional interrupt mechanism. With the standard Linux approach, packet arrival is signaled by a hardware interrupt from the network card: the CPU suspends the running task, handles the event, and then resumes execution of the user program. At high traffic rates, these switches happen so often that they create extra CPU load and may increase latency. With polling, a dedicated CPU core continuously checks the receive queues of the network card, which removes interrupt handling from the packet path at the cost of keeping that core fully busy.
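The structure of a polling receive loop can be sketched with a simulated queue. This is a Python model, not DPDK code: `SimulatedRxQueue` stands in for a hardware receive queue, and its `rx_burst` method only imitates the role that `rte_eth_rx_burst` plays in a DPDK application. The `max_idle_polls` parameter is an addition that lets the example terminate; a real data plane loop typically runs until the application shuts down.

```python
from collections import deque


class SimulatedRxQueue:
    """Stand-in for a NIC receive queue filled by the hardware."""

    def __init__(self):
        self._ring = deque()

    def inject(self, packets):
        self._ring.extend(packets)

    def rx_burst(self, max_pkts):
        """Return up to max_pkts packets without blocking; may return none."""
        burst = []
        while self._ring and len(burst) < max_pkts:
            burst.append(self._ring.popleft())
        return burst


def poll_loop(queue, handler, burst_size=32, max_idle_polls=1000):
    """Busy-poll the queue and process packets in bursts; return the count."""
    processed = 0
    idle = 0
    while idle < max_idle_polls:
        burst = queue.rx_burst(burst_size)
        if not burst:
            idle += 1          # nothing arrived: spin again, never sleep
            continue
        idle = 0
        for pkt in burst:
            handler(pkt)
            processed += 1
    return processed


if __name__ == "__main__":
    q = SimulatedRxQueue()
    q.inject(range(100))
    seen = []
    print(poll_loop(q, seen.append, max_idle_polls=3))  # 100
```

Processing packets in bursts amortizes the cost of each queue access over many packets, while the absence of any blocking call is what keeps the polling core at full utilization even when no traffic arrives.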

An important element of the DPDK architecture is the use of the Direct Memory Access (DMA) mechanism, which allows the network interface to write packets directly into memory buffers owned by the user application, without an intermediate copy through kernel buffers [10]. To improve performance, DPDK also backs these buffers with large memory pages (hugepages), which reduce the number of translation lookaside buffer (TLB) misses and the overhead of address translation.
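The benefit of hugepages can be made concrete by counting how many pages, and therefore how many address translations, are needed to cover a packet buffer pool. The sketch below is simple arithmetic; the 1 GiB pool size is an illustrative assumption, and the page sizes are the ones commonly available on x86-64 Linux systems.

```python
KIB, MIB, GIB = 1024, 1024 ** 2, 1024 ** 3


def pages_needed(pool_bytes, page_bytes):
    """Number of pages required to map pool_bytes (ceiling division)."""
    return -(-pool_bytes // page_bytes)


if __name__ == "__main__":
    pool = 1 * GIB
    print(pages_needed(pool, 4 * KIB))   # 262144 standard pages
    print(pages_needed(pool, 2 * MIB))   # 512 hugepages
    print(pages_needed(pool, 1 * GIB))   # 1 hugepage
```

Because a TLB holds only a limited number of entries, mapping the same pool with 512 pages instead of 262144 makes it far more likely that the translation for a given packet buffer is already cached.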

Another important aspect of the DPDK architecture is the efficient use of multi-core processors. In contemporary server systems, network traffic processing can be distributed across several CPU cores, which substantially increases overall throughput. DPDK provides mechanisms for binding packet processing threads to specific CPU cores and managing the distribution of network traffic between them. The main architectural components of the DPDK framework and their role in reducing network latency are presented in table 2.

Table 2

Architectural components of the DPDK framework and their role in latency reduction

| Component | Purpose | Role in latency reduction |
|---|---|---|
| Poll Mode Drivers (PMD) | User-space drivers for direct interaction with network interface cards. | Eliminate interrupt overhead and reduce packet processing latency. |
| Memory Manager (Hugepages) | Manages large contiguous memory regions for packet buffers. | Reduces memory management overhead and improves CPU cache efficiency. |
| Ring buffers | Queues used for packet transfer between processing threads. | Enable high-speed packet exchange between cores without locks. |
| Mempool | Pool of preallocated packet buffers. | Avoids dynamic memory allocation during packet processing. |
| Environment Abstraction Layer (EAL) | Initializes hardware resources such as CPU cores, memory and NICs. | Ensures efficient hardware utilization and low-level optimization. |
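Two of the components in table 2, the ring buffer and the mempool, can be illustrated with a short model. The classes `SpscRing` and `Mempool` below are hypothetical teaching sketches and not the DPDK `rte_ring` or `rte_mempool` implementations. The ring assumes exactly one producer and one consumer: each side modifies only its own index, which is the property that lets a native implementation avoid locks. The pool allocates all buffers up front so that the packet path never calls the allocator.

```python
class SpscRing:
    """Fixed-size single-producer, single-consumer ring."""

    def __init__(self, capacity):
        if capacity < 2 or capacity & (capacity - 1):
            raise ValueError("capacity must be a power of two")
        self._slots = [None] * capacity
        self._mask = capacity - 1      # cheap index wrap-around
        self._head = 0                 # written only by the producer
        self._tail = 0                 # written only by the consumer

    def enqueue(self, item):
        if self._head - self._tail == len(self._slots):
            return False               # ring is full: caller decides what to do
        self._slots[self._head & self._mask] = item
        self._head += 1
        return True

    def dequeue(self):
        if self._head == self._tail:
            return None                # ring is empty
        item = self._slots[self._tail & self._mask]
        self._tail += 1
        return item


class Mempool:
    """Pool of packet buffers allocated once at start-up."""

    def __init__(self, count, buf_size):
        self._free = [bytearray(buf_size) for _ in range(count)]

    def get(self):
        return self._free.pop() if self._free else None

    def put(self, buf):
        self._free.append(buf)

    def available(self):
        return len(self._free)
```

In a typical arrangement, a receive thread takes a buffer from the pool, fills it, and enqueues it on a ring; a worker thread dequeues it, processes the packet, and returns the buffer to the pool, so that buffers are recycled rather than allocated and freed per packet.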

 

Practical studies show that DPDK can significantly increase the throughput of network systems and reduce packet processing latency compared to the traditional Linux network stack. Experimental work demonstrates the ability to process tens of millions of packets per second on a single server with optimized DPDK-based network applications. Such performance makes this technology especially attractive for building software routers, network function virtualization (NFV) systems, load balancing systems, and high-performance proxy servers [11, 12].

Thus, packet processing architectures that bypass the kernel represent an important direction in the development of high-performance networking systems. The use of the DPDK framework makes it possible to significantly shorten the packet processing path by eliminating many operations of the traditional Linux network stack. Due to direct access to the network interface, the use of the polling model, and efficient memory management, such systems can provide extremely high throughput and minimal network packet processing latency.

Discussion

The main architectural factors that influence network latency in Linux systems were identified. The traditional Linux network stack provides flexibility and compatibility with a wide range of protocols and applications, but its multi-layer structure creates additional overhead during packet processing. The analysis shows that latency can be reduced by shortening the packet processing path and moving part of the processing to earlier stages of the network stack.

Kernel-level technologies such as extended Berkeley Packet Filter and eXpress Data Path provide programmable mechanisms for early packet processing and can reduce the load on the standard networking subsystem. Meanwhile, user-space packet processing frameworks such as DPDK bypass the kernel network stack and achieve higher throughput in systems with strict latency requirements. The comparison of these approaches shows that the choice of optimization method depends on the required balance between performance, compatibility with existing applications, and implementation complexity.

Conclusion

The architecture of the Linux network stack plays a key role in shaping the performance characteristics of network applications. The multi-layer packet processing structure provides flexibility and universality of the networking subsystem, but at the same time it introduces additional overhead related to interrupt handling, packet buffering, context switching, and memory management. In high-speed networks, these factors become a significant source of latency and response time variability, which requires the use of specialized optimization methods at the operating system level.

Modern approaches to improving the efficiency of network systems include the use of programmable packet processing mechanisms inside the kernel and architectures that reduce the path of network data processing. Technologies such as eBPF and XDP make it possible to perform operations on packets at early stages of the network stack, reducing the load on traditional layers of network processing. Kernel bypass architectures implemented through the DPDK framework allow packet processing directly in user space, which significantly reduces processing latency. The development of such technologies contributes to improving the performance of network applications and forms the basis for further advancement of high-load network infrastructures.

 

References:

  1. Engine: Flexible research infrastructure for reliable and scalable time sensitive networks / F. Rezabek, M. Bosk, T. Paul [et al.] // Journal of Network and Systems Management. – 2022. – Vol. 30(4), № 74. – DOI: 10.1007/s10922-022-09686-0. – EDN: CBHCHG.
  2. Kapoor R., Anastasiu D.C., Choi S. ML-NIC: accelerating machine learning inference using smart network interface cards // Frontiers in Computer Science. – 2025. – Vol. 6. – Art. 1493399.
  3. Moving down the stack: performance evaluation of packet processing technologies for stateful firewalls / K. Dietz, N. Gray, M. Wolz [et al.] // 2023 IEEE/IFIP Network Operations and Management Symposium (NOMS 2023). – 2023. – P. 1-7.
  4. Efficient network monitoring applications in the kernel with eBPF and XDP / M. Abranches, O. Michel, E. Keller, S. Schmid // 2021 IEEE Conference on Network Function Virtualization and Software Defined Networks (NFV-SDN). – 2021. – P. 28-34.
  5. Fast packet processing with eBPF and XDP: concepts, code, challenges, and applications / M.A. Vieira, M.S. Castanho, R.D. Pacífico [et al.] // ACM Computing Surveys. – 2020. – Vol. 53, № 1. – P. 1-36.
  6. Holik F., Cook M.M., Pezaros D. Resilient Network Architecture with eBPF-based Programmability and Centralised Orchestration // IEEE INFOCOM 2025 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS). – 2025. – P. 1-6.
  7. A high-performance IPv6 fragment evasion threat detection method based on eBPF and XDP / B. Lin, L. Zhang, Y. Guo [et al.] // 2024 IEEE International Conference on High Performance Computing and Communications (HPCC). – 2024. – P. 1484-1491.
  8. Belkhiri A., Pepin M., Bly M., Dagenais M. Performance analysis of DPDK-based applications through tracing // Journal of Parallel and Distributed Computing. – 2023. – Vol. 173. – P. 1-9. – DOI: 10.1016/j.jpdc.2022.10.012. – EDN: KKLQZD.
  9. User-level network programmability: a scalability study for data center infrastructure / P.L. Izolan, I.M. Júnior, E.S. Oribes [et al.] // 2024 International Symposium on Computer Architecture and High Performance Computing Workshops (SBAC-PADW). – 2024. – P. 42-49.
  10. Li D., Zhang W., Dong M., Ota K. DMA-assisted I/O for persistent memory // IEEE Transactions on Parallel and Distributed Systems. – 2024. – Vol. 35, № 5. – P. 829-843. – DOI: 10.1109/tpds.2024.3373003. – EDN: RZYEVR.
  11. Performance analysis of DPDK-based applications through tracing / A. Belkhiri, M. Pepin, M. Bly, M. Dagenais // Journal of Parallel and Distributed Computing. – 2023. – Vol. 173. – P. 1-9. – DOI: 10.1016/j.jpdc.2022.10.012. – EDN: KKLQZD.
  12. Hierarchical planning for dynamic resource allocation in smart and connected communities / G. Pettet, A. Mukhopadhyay, M.J. Kochenderfer, A. Dubey // ACM Transactions on Cyber-Physical Systems. – 2022. – Vol. 6, № 4. – P. 1-26.
Information about the author

DevOps Engineer, LLC «ВБ ТЕХ», Russia, Koledino

ISSN 2311-5122. Article metadata is hosted on the eLIBRARY.RU platform.
Publisher — LLC «MCNO»
Editor-in-Chief - Marina Yu. Zvezdina.