RISK-BASED RELIABILITY MANAGEMENT OF ENGINEERING SYSTEMS IN OIL AND GAS PRODUCTION BASED ON THE DISPATCHING OF ALARM SIGNALS AND EVENTS IN BMS SYSTEMS

Kirgizbayev D.
Cite as:
Kirgizbayev D. RISK-BASED RELIABILITY MANAGEMENT OF ENGINEERING SYSTEMS IN OIL AND GAS PRODUCTION BASED ON THE DISPATCHING OF ALARM SIGNALS AND EVENTS IN BMS SYSTEMS // Universum: технические науки: electronic scientific journal. 2026. 3(144). URL: https://7universum.com/ru/tech/archive/item/22217 (accessed: 28.03.2026).
DOI: 10.32743/UniTech.2026.144.3.22217

 

ABSTRACT

The article examines the methodological aspects of risk-based reliability management of engineering systems in oil and gas production based on the dispatching of alarm signals and events within Building Management Systems (BMS). The relevance of shifting from traditional alarm prioritization to the dynamic formation of an event risk profile, taking into account the probability of failure and the severity of consequences, is substantiated. A model of risk-based dispatching is developed, highlighting the role of the operator and digital decision-support tools within the system. A methodology for assessing the impact of dispatching on the reliability and controllability of engineering systems is proposed. It is shown that the implementation of a risk-based approach reduces information overload, improves the validity of decision-making, and ensures more stable operation of engineering systems under conditions of digital transformation.


Keywords: risk-based management, reliability of engineering systems, BMS, alarm management, event dispatching, risk management, digital tools.

 

Introduction. A prevailing trend in the development of modern engineering systems in oil and gas production (for example, energy supply, ventilation and air conditioning, fire protection, technological monitoring, and infrastructure subsystems) is increasing interconnectedness and overall system complexity. This trend is driven by the growing number of sensors, automation tools, and interconnections between subsystems, as well as by the expanding share of software- and algorithm-based solutions used in system control. Consequently, reliability becomes an essential technical and operational characteristic of equipment. In this context, reliability manifests itself as a property of controllability, that is, the ability of the operator and the control system to detect deviations in a timely manner, identify significant events, and implement proactive control actions.

Operational practice of digital monitoring systems demonstrates that the primary limitation is an excess of signals, accompanied by information overload, false alarms, and the gradual degradation of operator trust in alarm systems. This issue is noted in studies on data-driven diagnostics, where the authors, using the HVAC (Heating, Ventilation, and Air Conditioning) domain as an example, emphasize that the systemic effects of faults may significantly influence the functioning of engineering systems. At the same time, the diagnostic cycle must treat data collection, cleaning, preprocessing, and interpretation as a unified process; otherwise, the resulting solutions cannot be effectively scaled to real-world operation of engineering systems [3]. A separate body of research devoted to the analysis of false alarms in SCADA (Supervisory Control and Data Acquisition) systems shows that false signals lead to increased downtime and higher maintenance costs; therefore, alarm filtering and contextual recognition of alarms are considered essential elements in ensuring the reliability of engineering systems [10]. In Building Management Systems (BMS), which are integrated sets of software and hardware tools for monitoring, automating, and optimizing engineering systems, an additional challenge arises from the gap between static object models and real-time dynamics: without reliable integration of telemetry streams into the operational control environment, system management inevitably remains reactive and responses to alarms are delayed [12].

Consequently, the relevance of risk-based dispatching of alarm signals within BMS is determined by a range of factors:

1) the increasing complexity and interdependence of engineering systems;

2) the limitations of traditional reliability approaches focused on periodicity and “average” operating conditions;

3) the phenomenon of alarm overload and the resulting loss of controllability in emergency situations.

The objective of the study is to develop methodological aspects of risk-based reliability management based on the dispatching of alarm signals and events within BMS.

Research methodology. The research material consisted of publications addressing risk-based reliability management, alarm management, diagnostics and/or alarm filtering, as well as digital tools used in industrial operations, on the basis of which it was possible to identify:

  • typical alarm-related problems (for example, alarm flood, nuisance alarms, and false alarms) that affect the reliability of control and the safety of engineering systems;
  • conceptual relationships discussed in the literature within the “risk–reliability–failure–event” framework, as well as approaches to risk-based maintenance;
  • methodological solutions for dispatching and event analysis, including those associated with the dynamic updating of the risk profile;
  • the capabilities of digital tools (analytics, diagnostic systems, and digital twins) in supporting operator decision-making.

In preparing the study, theoretical and general scientific methods were employed, including the analysis of scientific literature and the systematization of sources, clarification of concepts, functional and structural modeling, as well as comparative analysis. The application of these methods resulted in the proposed conceptual model of risk-based dispatching of BMS alarm signals and a methodology for assessing its impact on the reliability and controllability of engineering systems, presented in the form of formalized stages and indicators.

Results and discussion. In accordance with the fundamental principles of the risk-based approach, the reliability of an engineering system can be interpreted as its ability to maintain functioning within acceptable limits in the presence of disturbances, including latent defects, sensor errors, and rare combinations of operating conditions [4; 8; 13]. Within the terminology of risk-based management of engineering systems, it is advisable to distinguish several categories (see Table 1):

Table 1.

Elements of risk-based management of engineering systems, compiled by the author

| No. | Category | Definition |
|---|---|---|
| 1 | Failure | The realization of a defect and/or malfunction of a component or subsystem that leads to a disruption of its function |
| 2 | Emergency event | A recorded deviation that has operational significance (including pre-emergency conditions) and requires a response |
| 3 | Risk | A function of the probability of failure (PoF) and the consequence of failure (CoF) |
| 4 | Reliability | The resulting capability of the engineering system and its control system to minimize the probability and consequences of failures throughout the lifecycle |
 

It should be noted that under conditions of digitalization, the “source of risk” often shifts into the informational domain, since an event may represent either a genuine indicator of system degradation or a false artifact (noise, an outlier, or an incorrect setpoint). Therefore, risk-based reliability management should include not only equipment maintenance but also the management of signal and event quality, ensuring that the decision-support subsystem is not compromised by information overload.

In their traditional form, the detection and elimination of reliability issues in engineering systems are becoming impractical, since the total number of signals often exceeds the operator’s ability to respond to them and disrupts prioritization. Under such conditions, a dispatching system is needed whose primary function is to transform the stream of alarms into a manageable set of events with clear meaning and priority.

The overall operation of the dispatching system can be represented as follows (see Fig. 1):

 

Figure 1. Operation of the dispatching system, compiled by the author

 

Referring to Fig. 1, it should be noted that contemporary scientific literature treats the analysis of alarms by their temporal characteristics and the relationships between activations as a basis for identifying critical subsystems and downtime scenarios [1]. For oil and gas production, this principle is further reinforced by its connection with functional safety: diagnostic solutions within the Safety Instrumented System (SIS) loop, which records threshold shutdowns and includes built-in diagnostics as well as diagnostics of the logic solver and related components, form part of the framework for demonstrable reliability and contribute to reducing response time under hazardous conditions [2].

Considering BMS as a system that serves as a source of risk-related information, alarm signals, and events, it is important to emphasize that, as noted earlier, the problem of redundancy and the occurrence of false alarms persists [11]. This circumstance arises from the specific characteristics of such systems, as BMS generate signals in response to: (1) setpoint exceedance; (2) loss of communication; (3) sensor failure; (4) deviation beyond permissible operating limits; (5) power supply disturbances; (6) fire protection automation events; (7) hardware faults of controllers; and others. In practice, a single primary defect may generate a cascade of interrelated alarms, while a change in operating mode (startup or shutdown) may additionally produce a surge of irrelevant alarm activations.

In the oil refining sector, alarm flood and nuisance alarms, according to the scientific literature, often lead to operator fatigue, reduced attention to critical alarms, increased operational risks, and decreased reliability of control over the entire engineering system. In this regard, various tools for dynamic alarm management are proposed, including shelving (the temporary postponement or hiding of an active alarm signal), suppression mechanisms, and analytical tools for the proactive identification of emerging problems [9]. From the perspective of BMS operation at oil and gas facilities, risk-related information is almost always present in system logs; however, without the purposeful narrowing of the signal flow, it does not transform into actionable knowledge.
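The shelving mechanism mentioned above can be illustrated with a minimal Python sketch. It assumes that each alarm is identified by a string tag and that time is given in seconds; the class name `ShelfRegistry`, its methods, and the example tags are hypothetical and are not taken from any BMS product.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class ShelfRegistry:
    """Keeps track of temporarily shelved alarm tags (hypothetical helper)."""

    # tag -> time (in seconds) at which the shelving expires
    _until: Dict[str, float] = field(default_factory=dict)

    def shelve(self, tag: str, now: float, duration_s: float) -> None:
        """Hide an alarm tag from the operator for a limited time."""
        self._until[tag] = now + duration_s

    def is_shelved(self, tag: str, now: float) -> bool:
        """Return True while the shelving period of the tag is still running."""
        expiry = self._until.get(tag)
        if expiry is None:
            return False
        if now >= expiry:
            # Shelving has expired: the alarm becomes visible again.
            del self._until[tag]
            return False
        return True

    def filter_active(self, tags: Iterable[str], now: float) -> List[str]:
        """Return only the alarm tags that are not currently shelved."""
        return [t for t in tags if not self.is_shelved(t, now)]
```

The time limit is the essential design choice in this sketch: a shelved alarm returns to the operator's view automatically, so shelving reduces noise without permanently masking a signal.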

Technically, the potential of logs as a source of risk-related information is revealed through the relationships between the “frequency and/or pattern of events and the probability of failure” and the “event context and the severity of consequences.” In this regard, the experience of risk-based maintenance is illustrative, where indicators such as PoF and CoF (probability of failure and consequence of failure, respectively) are evaluated on the basis of failure models and scenario analysis, and maintenance planning is adjusted to focus on the most vulnerable components. Similar organizational principles can be applied to BMS events as signatures of system degradation [5].

Moreover, the issue of minimizing false alarms remains highly relevant. For example, according to research on SCADA systems, false activations increase operational costs and downtime, while their detection can be achieved through modeling of “normal behavior” and verification of deviations within the context of external factors [10]. In the BMS environment, an analogous role is played by normal operating profiles of the facility (time windows, temperature trajectories, switching scenarios) and algorithms for detecting outliers and/or anomalies. Based on the above considerations, it appears feasible to develop a model of risk-based dispatching of BMS alarm signals (see Fig. 2). The model is presented as a sequence of steps.

 

Figure 2. Model of risk-based dispatching of BMS alarm signals, compiled by the author

 

Step 1. Signal (Alarm/Notification). At this stage, the primary activation is recorded, including the value, setpoint, default priority, source, and channel status. The quality of data transmission and the stability of real-time data integration are critical here. Technological solutions that provide highly reliable data delivery and rapid response to critical events are widely used for this purpose. Accordingly, the reliability of dispatching begins with the reliability of data transmission and the linkage of telemetry to the object model [12].

Step 2. Event. Signals are aggregated into events based on temporal correlation, a common source, and causal relationships; that is, event packages are formed (for example, loss of power in a section, loss of communication with controllers, or multiple sensor alarms).
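As an illustration of Step 2, the following Python sketch groups signals into event packages using only two of the criteria named above: a common source and temporal correlation. Causal relationships are not modeled. The `Signal` fields, the 30-second window, and the function name `aggregate` are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Signal:
    timestamp: float  # seconds since the start of the log
    source: str       # hypothetical identifier of a section or subsystem
    tag: str          # alarm tag


def aggregate(signals: List[Signal], window_s: float = 30.0) -> List[List[Signal]]:
    """Group signals into events.

    A signal joins the open event of its source if it arrives within
    `window_s` seconds of the previous signal in that event; otherwise
    it starts a new event for that source.
    """
    events: List[List[Signal]] = []
    open_event: Dict[str, int] = {}  # source -> index of its latest event
    for s in sorted(signals, key=lambda x: x.timestamp):
        idx = open_event.get(s.source)
        if idx is not None and s.timestamp - events[idx][-1].timestamp <= window_s:
            events[idx].append(s)
        else:
            events.append([s])
            open_event[s.source] = len(events) - 1
    return events
```

With this grouping, a power loss in one section followed within seconds by communication and sensor alarms from the same section is presented to the operator as a single event rather than as separate alarms.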

Step 3. Risk profile. For each event, a risk profile is determined: (1) PoF, involving an assessment of the probability of a true failure and/or hazardous deviation (taking into account the frequency of repetitions, persistence, confirmation by other channels, and contextual conditions); and (2) CoF, involving an assessment of consequences (human safety, environmental impact, technological losses, and downtime). A static risk matrix is effective for initial assessment; however, in a BMS environment, the key factor becomes the dynamic updating of risk over time. In this regard, dynamic risk-based inspection is applicable, within which risk is updated according to monitored process parameters, while integration with integrity windows enables early warnings and thereby reduces the probability of sudden failures in engineering systems [7]. Similarly, the risk profile of BMS events should change during transitions between operating modes and when confirming and/or contradicting signals appear.
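The dynamic updating described in Step 3 can be sketched as follows. This is an illustration under stated assumptions, not the author's procedure: risk is taken as the product of PoF and CoF, CoF is scored on a hypothetical 1 to 5 scale, and confirming or contradicting signals are expressed as a likelihood ratio applied through a Bayesian odds update.

```python
from dataclasses import dataclass

_EPS = 1e-6  # keeps the odds finite when PoF is exactly 0 or 1


@dataclass
class RiskProfile:
    pof: float  # probability that the event reflects a true failure, 0..1
    cof: float  # consequence score on an assumed 1..5 scale

    @property
    def risk(self) -> float:
        """Risk as the product PoF * CoF (an assumed functional form)."""
        return self.pof * self.cof

    def update(self, likelihood_ratio: float) -> None:
        """Update PoF with new evidence via an odds update.

        likelihood_ratio > 1: confirming evidence (for example, a second channel).
        likelihood_ratio < 1: contradicting evidence (for example, a known
        startup transient that explains the activation).
        """
        if likelihood_ratio <= 0:
            raise ValueError("likelihood_ratio must be positive")
        p = min(max(self.pof, _EPS), 1.0 - _EPS)
        odds = p / (1.0 - p) * likelihood_ratio
        self.pof = odds / (1.0 + odds)
```

For example, an event with PoF 0.2 and CoF 4 has a risk of 0.8; a confirming signal with a likelihood ratio of 4 raises PoF to 0.5 and the risk to 2.0. The likelihood ratios themselves would have to be estimated from facility data or expert judgment.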

Step 4. Management response (Response/Action). The result of dispatching is a recommended action, which can be divided into: (1) automatic actions (switching, blocking, transition to a safe state); (2) operator actions (inspection, confirmation, switching, escalation, service request); and (3) analytical actions (transmission of the event to a predictive maintenance system and the generation of a diagnostic task). To minimize false alarms and improve the effectiveness of decision-making, a combined processing approach is advisable, using a model of expected behavior and cooperation between independent detectors and/or agents; according to available data, the coordination of signals reduces the probability of a false activation within a single channel [11].
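A minimal sketch of Step 4 is given below. It assumes that independent detectors report Boolean flags, that an event counts as confirmed when at least two detectors agree, and that the action class is chosen by comparing the risk score with two thresholds. The function names, the detector names in the usage note, and the threshold values are hypothetical.

```python
from typing import Dict


def confirmed(detector_flags: Dict[str, bool], min_votes: int = 2) -> bool:
    """Return True if at least `min_votes` independent detectors flag the event."""
    return sum(1 for flag in detector_flags.values() if flag) >= min_votes


def recommend(
    risk: float,
    is_confirmed: bool,
    auto_threshold: float = 3.0,      # assumed value, to be tuned per facility
    operator_threshold: float = 1.0,  # assumed value, to be tuned per facility
) -> str:
    """Map a risk score to one of the three action classes of Step 4."""
    if is_confirmed and risk >= auto_threshold:
        return "automatic"   # switching, blocking, transition to a safe state
    if risk >= operator_threshold:
        return "operator"    # inspection, confirmation, escalation
    return "analytical"      # hand over to predictive maintenance
```

In this sketch an unconfirmed high-risk event is routed to the operator rather than triggering an automatic action, which reflects the idea that agreement between channels lowers the chance of acting on a single false activation. A call such as `confirmed({"threshold": True, "trend_model": True, "neighbor_sensor": False})` evaluates to `True` under the default of two votes.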

It should be noted that the role of the operator in the model is fundamental, since dispatching should reduce workload without eliminating responsibility. Therefore, the model incorporates mandatory transparency mechanisms, ensuring that it is clear why an event has received a high risk priority, which factors influenced PoF and CoF, which actions are recommended, and what consequences may arise from delayed response.

Equally promising is the use of second-level digital tools. For example, a digital twin can be considered as an environment in which risk-based dispatching becomes reproducible. A virtual model synchronized with the physical facility may be used to test scenarios, assess consequences, and support decision-making. However, from a practical standpoint, it is important to note that in the oil and gas sector such solutions remain largely local and experimental; their implementation and scaling are not feasible without the development of appropriate standards [6].

Based on the above considerations, an additional tool is proposed in the form of a methodology for assessing the impact of risk-based dispatching on the reliability and controllability of an engineering system. To substantiate the effectiveness of its implementation, it is proposed to introduce an indicator that reflects: (1) the reduction of total risk within the operational loop; (2) the reduction of alarm overload; and (3) the improvement in response timeliness. An integral indicator of reliability and controllability ($I_{RC}$) associated with the residual risk after dispatching is proposed.

Let the analyzed period contain a set of events $E = \{e_1, \dots, e_n\}$. For each event $e_i$, $PoF_i$ and $CoF_i$ are evaluated; thus, the risk is defined as $R_i = PoF_i \cdot CoF_i$. Dispatching assigns a response priority (weight) $w_i$ to each event (some events are suppressed as insignificant). Then:

$$I_{RC} = 1 - \frac{\sum_{i=1}^{n} w_i R_i}{R_0},$$

where $R_0$ represents the baseline aggregate risk assessment under the “traditional” operating mode (for example, without contextual filtering, with fixed setpoints, and without event aggregation).

The main idea of the proposed indicator is that the more effectively dispatching reduces the residual risk (through the correct identification of critical events and timely responses), the closer the value of $I_{RC}$ is to 1.
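The calculation of the indicator can be illustrated with a short Python sketch. It assumes one reading that is consistent with the description above: the indicator equals one minus the ratio of the weighted residual risk to the baseline aggregate risk, the risk of each event is the product of PoF and CoF, and weights lie between 0 and 1 with 0 for suppressed events. The function name and the numerical values are hypothetical.

```python
from typing import Iterable, Tuple


def reliability_indicator(
    events: Iterable[Tuple[float, float, float]],
    baseline_risk: float,
) -> float:
    """Compute 1 - (weighted residual risk) / (baseline aggregate risk).

    `events` contains (pof, cof, weight) triples; the product form of risk
    and the meaning of the weights are assumptions of this sketch.
    """
    if baseline_risk <= 0:
        raise ValueError("baseline_risk must be positive")
    residual = sum(weight * pof * cof for pof, cof, weight in events)
    return 1.0 - residual / baseline_risk


# Hypothetical period with three events; the second one is suppressed (weight 0).
example_events = [(0.5, 4.0, 0.5), (0.1, 2.0, 0.0), (0.2, 5.0, 1.0)]
example_value = reliability_indicator(example_events, baseline_risk=8.0)
```

In this example the weighted residual risk is 2.0 against a baseline of 8.0, so the indicator equals 0.75. A value close to 1 corresponds to a small residual risk relative to the baseline operating mode.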

The assessment of PoF may rely on probabilistic models and failure statistics; in the absence of sufficient statistical data, expert evaluation is applied with mechanisms aimed at reducing subjectivity. In particular, the interpretation of the criticality of failure modes and the ranking of events may include the aggregation of expert evidence and interval estimates (as a development of classical FMECA), which is particularly applicable in situations involving ambiguous signals and varying quality of data channels [13].

For CoF, it is advisable to define consequence scales aligned with safety requirements, taking into account technological losses and downtime.

The indicator is complemented by operational metrics, such as the share of false alarms, the average number of alarms per minute during peak operating conditions, the response time to high-criticality events, and the share of events requiring manual verification. Taken together, and considering all indicators, the integrated assessment makes it possible to evaluate reliability as controllability, rather than solely as the fault tolerance of equipment. The further development and refinement of this indicator appear to be promising directions for future research.
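Three of the operational metrics listed above can be computed from an alarm log as in the following sketch. The record fields and function names are hypothetical, and the peak load is approximated by the alarm count in the busiest one-minute bucket, which is a simple proxy for the average alarm rate during peak conditions.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AlarmRecord:
    raised_at: float                      # seconds since the start of the period
    false_alarm: bool                     # classified as false after review
    high_criticality: bool                # belongs to a high-criticality event
    responded_at: Optional[float] = None  # None if no response was logged


def false_alarm_share(records: List[AlarmRecord]) -> float:
    """Share of alarms classified as false (0.0 for an empty log)."""
    if not records:
        return 0.0
    return sum(1 for r in records if r.false_alarm) / len(records)


def peak_alarms_per_minute(records: List[AlarmRecord]) -> int:
    """Number of alarms in the busiest one-minute bucket."""
    counts = Counter(int(r.raised_at // 60) for r in records)
    return max(counts.values(), default=0)


def mean_response_time_high(records: List[AlarmRecord]) -> Optional[float]:
    """Mean response time in seconds for high-criticality alarms with a logged response."""
    times = [
        r.responded_at - r.raised_at
        for r in records
        if r.high_criticality and r.responded_at is not None
    ]
    return sum(times) / len(times) if times else None
```

Comparing these values before and after the introduction of risk-based dispatching gives an operational complement to the integral indicator.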

Conclusion. Thus, the conducted study makes it possible to draw a number of conclusions:

1. It is shown that in engineering systems of oil and gas production, one of the main limitations to ensuring reliability lies in information management: alarm overload, false activations, and cascades of signals lead to reduced controllability of emergency situations. Therefore, the transition to risk-based management of events becomes a fundamentally important issue.

2. Dispatching and event analysis are identified as an independent functional layer aimed at transforming streams of BMS signals into manageable events with transparent criticality. The integration of diagnostic strategies of functional safety increases the reliability of response and reduces reaction time under hazardous conditions, while also aligning with the objectives of risk-based reliability management of engineering systems. In this context, BMS alarm signals and events represent an important source of risk-related information; however, their value is realized only when false alarms are filtered and their context is properly interpreted.

3. A model of risk-based dispatching is proposed, in which prioritization is based on a dynamic risk profile and updated in real time according to process parameters, which is consistent with the principles of a dynamic risk-based approach. Along with the model, an integral reliability indicator is presented, aimed at assessing the effect of dispatching and reflecting the reduction of residual risk relative to the baseline operating mode.

 

References:

  1. Castillo-Navarro, J., Kristjanpoller, F., Mena, R., Godoy, D. R., Viveros, P. A Methodological Framework for Managing the Alarms in Wind Turbine Control and Data Acquisition Systems for Failure Analysis / J. Castillo-Navarro, F. Kristjanpoller, R. Mena, D. R. Godoy, P. Viveros // Machines. – 2024. – Vol. 12, № 9. – Art. 597. – DOI: 10.3390/machines12090597.
  2. Catelani, M., Ciani, L., Patrizi, G. Logic Solver Diagnostics in Safety Instrumented Systems for Oil and Gas Applications / M. Catelani, L. Ciani, G. Patrizi // Safety. – 2022. – Vol. 8, № 1. – Art. 15. – DOI: 10.3390/safety8010015.
  3. Chen, Z., O’Neill, Z., Wen, J., Pradhan, O., Yang, T., Lu, X., Lin, G., Miyata, S., Lee, S., Shen, C., Chiosa, R., Piscitelli, M. S., Capozzoli, A., Hengel, F., Kührer, A., Pritoni, M., Liu, W., Clauß, J., Chen, Y., Herr, T. A review of data-driven fault detection and diagnostics for building HVAC systems / Z. Chen, Z. O’Neill, J. Wen [et al.] // Applied Energy. – 2023. – Vol. 339. – Art. 121030. – DOI: 10.1016/j.apenergy.2023.121030.
  4. El-Thalji, I. Emerging Practices in Risk-Based Maintenance Management Driven by Industrial Transitions: Multi-Case Studies and Reflections / I. El-Thalji // Applied Sciences. – 2025. – Vol. 15, № 3. – Art. 1159. – DOI: 10.3390/app15031159.
  5. Elwerfalli, A., Alsadaie, S., Mujtaba, I. M. Development of Maintenance Plan for Power-Generating Unit at Gas Plant of Sirte Oil Company Using Risk-Based Maintenance (RBM) Approach / A. Elwerfalli, S. Alsadaie, I. M. Mujtaba // Processes. – 2025. – Vol. 13, № 8. – Art. 2533. – DOI: 10.3390/pr13082533.
  6. Hamidishad, N., Barbosa, R. S., Allahyarzadeh-Bidgoli, A., Yanagihara, J. I. Digital Twin Frameworks for Oil and Gas Processing Plants: A Comprehensive Literature Review / N. Hamidishad, R. S. Barbosa, A. Allahyarzadeh-Bidgoli, J. I. Yanagihara // Processes. – 2025. – Vol. 13, № 11. – Art. 3488. – DOI: 10.3390/pr13113488.
  7. Han, Z., Liu, J., Li, J., Kang, H., Xie, G. Integrated Application of Dynamic Risk-Based Inspection and Integrity Operating Windows in Petrochemical Plants / Z. Han, J. Liu, J. Li, H. Kang, G. Xie // Processes. – 2024. – Vol. 12, № 7. – Art. 1509. – DOI: 10.3390/pr12071509.
  8. Liu, C., Li, G., Xiao, W., Liu, J., Tan, L., Li, C., Wang, T., Yang, F., Xue, C. Reliability analysis of subsea control system using FMEA and FFTA / C. Liu, G. Li, W. Xiao [et al.] // Scientific Reports. – 2024. – Vol. 14. – Art. 21353. – DOI: 10.1038/s41598-023-42030-3.
  9. Onyeke, F. O., Odujobi, O., Adikwu, F. E., Elete, T. Y. Revolutionizing process alarm management in refinery operations: Strategies for reducing operational risks and improving system reliability / F. O. Onyeke, O. Odujobi, F. E. Adikwu, T. Y. Elete // Magna Scientia Advanced Research and Reviews. – 2023. – Vol. 9, № 2. – P. 187–194. – DOI: 10.30574/msarr.2023.9.2.0156.
  10. Peco Chacón, A. M., Segovia Ramírez, I., García Márquez, F. P. False Alarms Analysis of Wind Turbine Bearing System / A. M. Peco Chacón, I. Segovia Ramírez, F. P. García Márquez // Sustainability. – 2020. – Vol. 12, № 19. – Art. 7867. – DOI: 10.3390/su12197867.
  11. Teixeira, W. C. E., Sanz-Bobi, M. Á., Oliveira, R. C. L. de. Applying Intelligent Multi-Agents to Reduce False Alarms in Wind Turbine Monitoring Systems / W. C. E. Teixeira, M. Á. Sanz-Bobi, R. C. L. de Oliveira // Energies. – 2022. – Vol. 15, № 19. – Art. 7317. – DOI: 10.3390/en15197317.
  12. Wang, Z., Xiao, H., Guan, C., Zhou, L., Fu, D. Research on the Development of a Building Model Management System Integrating MQTT Sensing / Z. Wang, H. Xiao, C. Guan, L. Zhou, D. Fu // Sensors. – 2025. – Vol. 25, № 19. – Art. 6069. – DOI: 10.3390/s25196069.
  13. Zhang, X., Wei, R., Wu, Z., Dong, L., Liu, H. Risk Assessment and Reliability Analysis of Oil Pump Unit Based on D-S Evidence Theory / X. Zhang, R. Wei, Z. Wu, L. Dong, H. Liu // Energies. – 2023. – Vol. 16, № 13. – Art. 4887. – DOI: 10.3390/en16134887.
Information about the author

Head of Instrumentation and Control (I&C) Group, Maintenance Department, Tengizchevroil, Kazakhstan, Almaty


The journal is registered by the Federal Service for Supervision of Communications, Information Technology and Mass Media (Roskomnadzor), registration number ЭЛ №ФС77-54434 dated 17.06.2013
Journal founder: ООО «МЦНО»
Editor-in-Chief: Zvezdina Marina Yuryevna.