INTERPRETABLE MACHINE LEARNING FOR ADMISSION-TIME IN-UNIT MORTALITY PREDICTION IN INTENSIVE CARE UNITS

Imashev A.
Cite as:
Imashev A. INTERPRETABLE MACHINE LEARNING FOR ADMISSION-TIME IN-UNIT MORTALITY PREDICTION IN INTENSIVE CARE UNITS // Universum: технические науки : электрон. научн. журн. 2026. 4(145). URL: https://7universum.com/ru/tech/archive/item/22522 (accessed: 07.05.2026).
DOI: 10.32743/UniTech.2026.145.4.22522
Received: 01.04.2026
Accepted: 14.04.2026
Published: 28.04.2026

 

ABSTRACT

Timely risk stratification in intensive care units is essential for improving patient outcomes and supporting resource allocation. This study proposes an interpretable machine learning pipeline for admission-time prediction of in-unit mortality using the eICU Collaborative Research Database. Only features available at admission were used in order to avoid leakage from post-admission severity scores. The analysis included 200,859 ICU stays and compared Logistic Regression, Random Forest, XGBoost, and Random Forest with SMOTE oversampling. XGBoost achieved the best performance with ROC-AUC = 0.7716 and PR-AUC = 0.1913. SHAP-based explainability showed that respiratory, cardiovascular, and sepsis-related diagnoses were the strongest predictors. The results demonstrate that admission-time features provide a meaningful but limited signal for early mortality prediction and form an interpretable baseline for future ICU risk assessment models.


 

Keywords: intensive care unit, in-unit mortality, machine learning, risk prediction, eICU, explainable AI, SHAP.


 

Introduction

Timely and accurate risk stratification in intensive care units (ICUs) is important for supporting clinical decision-making and improving patient outcomes. Patients admitted to ICUs often present with unstable conditions and may deteriorate rapidly, which makes early identification of individuals at increased risk of in-unit mortality especially valuable. In such settings, predictive models can assist clinicians in prioritising care, allocating resources more effectively, and monitoring high-risk patients more closely.

The growing availability of electronic health records and large-scale critical care databases has created new opportunities for data-driven risk modelling. Machine learning methods have been increasingly applied in healthcare because they can capture complex relationships among demographic, diagnostic, and clinical variables that may not be fully represented by conventional approaches [16, 21, 23]. The eICU Collaborative Research Database aggregates records from over 200,000 ICU stays across more than 200 hospitals and therefore provides a suitable basis for robust model development and evaluation [20].

At the same time, the development of reliable machine learning models for ICU mortality prediction remains challenging. Clinical data are heterogeneous, incomplete, and strongly imbalanced, which can reduce model robustness and limit generalisability [8, 26]. In addition, predictive performance alone is not sufficient for practical adoption, since models used in medical settings are expected to be transparent and clinically interpretable [6, 12, 18]. A major methodological problem in published ICU mortality studies is data leakage caused by the use of post-admission severity indicators such as APACHE-derived scores or precomputed mortality predictions. These indicators are partially based on information collected after admission and can inflate performance estimates [11, 13]. The aim of the present study was to develop an interpretable admission-time model for in-unit mortality prediction that uses only variables available at the true decision point and to compare several common classification approaches under this restriction.

Materials and Methods

The study used the eICU Collaborative Research Database, a multicentre critical care database with de-identified information on ICU stays in the United States [20]. Only admission-time variables were used in order to preserve the temporal validity of the prediction setting. The final analytical dataset contained 200,859 ICU stays, including 189,952 survivors and 10,907 in-unit deaths, which corresponds to an event rate of approximately 5.4%.
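The cohort counts and event rate quoted above are mutually consistent, which a trivial check confirms:

```python
# Sanity check of the cohort counts reported above.
survivors, deaths = 189_952, 10_907
total = survivors + deaths
print(total)                           # 200859 ICU stays
print(round(100 * deaths / total, 1))  # 5.4 (% event rate)
```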

 

Figure 1. Distribution of the binary target variable for in-unit mortality in the analytical cohort

 

The target variable was constructed from the unitdischargestatus field in patient.csv. Patients with status Expired were assigned class 1, while all remaining cases were assigned class 0. The final feature set included age, gender, ethnicity, admission weight, ICU admission source, ICU unit type, hospital admission source, and ten binary diagnosis indicators derived from admission diagnoses. Diagnosis labels from admissionDx.csv were grouped into the following broad clinical categories: cardiovascular, respiratory, neurologic, sepsis, gastrointestinal, metabolic, trauma, surgical, oncologic, and hematologic.
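As a minimal sketch, the labelling step described above can be expressed in pandas. The toy rows below are illustrative, not real eICU data; the column names follow the eICU schema (unitdischargestatus in patient.csv), and the mapping of eICU's "> 89" age code to 90 is a common convention assumed here, not necessarily the study's exact choice.

```python
import pandas as pd

# Toy rows mimicking patient.csv; values are illustrative, not real eICU data.
patients = pd.DataFrame({
    "patientunitstayid": [1, 2, 3],
    "unitdischargestatus": ["Alive", "Expired", "Alive"],
    "age": ["71", "> 89", "45"],
})

# Binary target: class 1 for in-unit death ("Expired"), class 0 otherwise.
patients["mortality"] = (patients["unitdischargestatus"] == "Expired").astype(int)

# eICU stores ages above 89 as the string "> 89"; mapping them to 90 is one
# common convention (an assumption here, not necessarily the study's choice).
patients["age_num"] = patients["age"].replace("> 89", "90").astype(float)

print(patients[["patientunitstayid", "mortality", "age_num"]])
```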

Table 1.

Feature groups used in the final modelling pipeline

Feature group             Variables
Demographic variables     age, gender, ethnicity, admission weight
Administrative variables  unit admission source, ICU unit type, hospital admission source
Diagnosis indicators      cardiovascular, respiratory, neurologic, sepsis, gastrointestinal, metabolic, trauma, surgical, oncologic, hematologic
 

Severity-related variables derived from APACHE scoring and other post-admission measurements were deliberately excluded in order to avoid shortcut learning from precomputed risk information [11, 13]. Missing numerical values were imputed using statistics estimated on the training partition, and continuous variables were standardised. Missing categorical values were replaced with the most frequent category and then encoded by one-hot encoding. All preprocessing steps were fitted only on the training data and then applied to the test data.
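A leakage-safe version of this preprocessing can be sketched with scikit-learn, where fitting the transformer on the training partition alone guarantees that test data never influence the imputation statistics or scaling. The toy rows and the choice of median imputation are illustrative assumptions; the paper states only that training-partition statistics were used.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy admission-time rows; column names follow the eICU schema, values are made up.
X_train = pd.DataFrame({
    "age": [70.0, np.nan, 55.0, 80.0],
    "admissionweight": [82.0, 60.0, np.nan, 74.0],
    "gender": ["Male", "Female", np.nan, "Male"],
    "unittype": ["MICU", "SICU", "MICU", "CCU"],
})

preprocess = ColumnTransformer([
    # Numeric: imputation from training-partition statistics, then standardisation.
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), ["age", "admissionweight"]),
    # Categorical: most-frequent imputation, then one-hot encoding.
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), ["gender", "unittype"]),
])

# Fit on training data only; the fitted transformer is later applied to test data.
Xt = preprocess.fit_transform(X_train)
print(Xt.shape)  # (4, 7): 2 scaled numeric columns + 5 one-hot columns
```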

Four classification models were evaluated: Logistic Regression, Random Forest, XGBoost, and Random Forest trained on SMOTE-resampled training data [4]. The dataset was divided into training and test subsets using an 80/20 stratified split. In addition to held-out evaluation, five-fold stratified cross-validation was performed. The main metrics were ROC-AUC, PR-AUC, accuracy, precision, recall, F1-score, and the Brier score [8]. SHAP-based explainability was then applied to the Random Forest model in order to obtain stable global and local feature-attribution estimates [15].
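The evaluation loop can be illustrated as follows on synthetic data at roughly the cohort's 5.4% event rate. This is a sketch, not the study's code: XGBoost and the SMOTE resampler (which live in separate libraries) are omitted to keep the example dependency-light, so only two of the four compared models appear.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, brier_score_loss, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data standing in for the eICU cohort (~5.4% positives).
X, y = make_classification(n_samples=5000, n_features=20,
                           weights=[0.946], random_state=0)

# 80/20 stratified split, as in the study.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

scores = {}
for name, model in models.items():
    proba = model.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
    scores[name] = {
        "roc_auc": roc_auc_score(y_te, proba),           # discrimination
        "pr_auc": average_precision_score(y_te, proba),  # imbalance-sensitive
        "brier": brier_score_loss(y_te, proba),          # probabilistic accuracy
    }

for name, s in scores.items():
    print(name, {k: round(v, 4) for k, v in s.items()})
```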

 

Figure 2. Missing-value rates for selected admission-time predictors

 

Results and Discussion

XGBoost achieved the strongest overall discrimination among the evaluated models. On the held-out test set it reached ROC-AUC = 0.7716 and PR-AUC = 0.1913, outperforming Logistic Regression (ROC-AUC = 0.7388), Random Forest (ROC-AUC = 0.7391), and RF + SMOTE (ROC-AUC = 0.7254). All models remained in the range 0.73–0.77 in terms of ROC-AUC, which reflects the intrinsic difficulty of mortality prediction from admission-time variables alone.

Table 2.

Model performance on the held-out test set

Model                ROC-AUC   PR-AUC   Acc.     Prec.    Recall   F1       Brier
Logistic Regression  0.7388    0.1321   0.6721   0.1059   0.6772   0.1832   0.2101
Random Forest        0.7391    0.1472   0.6997   0.1111   0.6470   0.1896   0.2146
XGBoost              0.7716    0.1913   0.7454   0.1301   0.6488   0.2167   0.1864
RF + SMOTE           0.7254    0.1394   0.7429   0.1176   0.5745   0.1952   0.1987
 

The low PR-AUC values highlight the difficulty of the task under severe class imbalance, since deceased patients account for only 5.4% of ICU stays. SMOTE oversampling did not improve discrimination and produced the weakest ROC-AUC, which suggests that synthetic minority-class examples were not sufficient to compensate for the limited predictive signal available in the admission-time feature space [4]. Five-fold cross-validation confirmed the stability of the model ranking: the mean ROC-AUC values were 0.7326 for Logistic Regression, 0.7328 for Random Forest, 0.7630 for XGBoost, and 0.7194 for RF + SMOTE.

 

Figure 3. ROC curves for all evaluated models on the held-out test set

 

Calibration analysis showed that all models tended to overestimate absolute mortality risk, although XGBoost also achieved the lowest Brier score and thus provided the best overall probabilistic behaviour among the tested classifiers. In practical terms, the models identified a considerable fraction of in-unit deaths, but they also generated many false-positive alerts.

 

Figure 4. Calibration plots for all evaluated models
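The kind of calibration assessment described above can be reproduced with scikit-learn's reliability-curve utilities. The example below uses synthetic outcomes at the cohort's approximate 5.4% event rate paired with deliberately inflated risk predictions, so it only mimics the overestimation pattern reported in the paper; it does not use the study's actual predictions.

```python
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.metrics import brier_score_loss

rng = np.random.default_rng(0)

# Synthetic outcomes at ~5.4% prevalence, with deliberately inflated
# risk predictions to mimic a model that overestimates absolute risk.
y_true = rng.binomial(1, 0.054, size=2000)
p_pred = np.clip(y_true * 0.5 + rng.uniform(0.0, 0.4, size=2000), 0.0, 1.0)

# Reliability-diagram data: observed event fraction vs mean predicted risk per bin.
frac_pos, mean_pred = calibration_curve(y_true, p_pred, n_bins=5)
brier = brier_score_loss(y_true, p_pred)

print("mean predicted risk:", round(p_pred.mean(), 3),
      "| observed event rate:", round(y_true.mean(), 3))
print("Brier score:", round(brier, 4))
```

A mean predicted risk well above the observed event rate is the signature of the overestimation the calibration plots show.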

 

SHAP analysis demonstrated that diagnosis-related features dominated the model’s risk assessment. In particular, respiratory, cardiovascular, and sepsis diagnosis indicators made the largest contributions to the predicted probability of death. These findings are clinically plausible and are consistent with the higher early risk burden associated with acute medical and infectious presentations.

 

Figure 5. SHAP summary plot for the Random Forest model

 

Figure 6. Mean absolute SHAP values for the Random Forest model

 

Figure 7. SHAP waterfall plot for a single high-risk patient

 

The results also support the methodological value of excluding post-admission severity indicators. Studies that incorporate APACHE-derived variables usually report higher discrimination [1, 19], but such gains partly reflect access to information unavailable at the actual admission-time decision point. In the present study, the ROC-AUC of 0.77 should therefore be interpreted as a more conservative but more valid estimate of what can be achieved before extended physiological monitoring has taken place. At the same time, the moderate difference between Logistic Regression, Random Forest, and XGBoost suggests that the main limitation is the informativeness of admission-time features rather than model complexity. Future work should focus on adding temporally valid vital signs and laboratory measurements, on probability calibration, and on external validation on independent cohorts [24, 25].

Conclusion

The study developed an interpretable machine learning pipeline for admission-time prediction of in-unit mortality in ICU patients using the eICU Collaborative Research Database. By restricting the model to variables available at admission and explicitly excluding post-admission severity indicators, the proposed approach avoids a common source of data leakage. XGBoost demonstrated the strongest discrimination, while SHAP analysis confirmed that the model relied on clinically plausible diagnosis categories and admission-source variables. The obtained results show that admission-time features alone provide a meaningful but limited basis for early risk stratification and can serve as a baseline for future ICU mortality prediction research.

 

References:

  1. Awad A., Bader-El-Den M., McNicholas J., Briggs J. Early hospital mortality prediction of intensive care unit patients using an ensemble learning approach // International Journal of Medical Informatics. 2017. Vol. 108. P. 185–195.
  2. Brossard C. et al. Predicting emergency department admissions using a machine-learning algorithm: a proof of concept with retrospective study // BMC Emergency Medicine. 2025. Vol. 25. No. 1. Art. 3.
  3. Cabitza F., Rasoini R., Gensini G.F. Unintended consequences of machine learning in medicine // JAMA. 2017. Vol. 318. No. 6. P. 517–518.
  4. Chawla N.V., Bowyer K.W., Hall L.O., Kegelmeyer W.P. SMOTE: Synthetic minority over-sampling technique // Journal of Artificial Intelligence Research. 2002. Vol. 16. P. 321–357.
  5. Doshi-Velez F., Kim B. Towards a rigorous science of interpretable machine learning // arXiv preprint arXiv:1702.08608. 2017.
  6. Guidotti R., Monreale A., Ruggieri S., Turini F., Giannotti F., Pedreschi D. A survey of methods for explaining black box models // ACM Computing Surveys. 2018. Vol. 51. No. 5. Art. 93.
  7. Harutyunyan H., Khachatrian H., Kale D.C., Ver Steeg G., Galstyan A. Multitask learning and benchmarking with clinical time series data // Scientific Data. 2019. Vol. 6. Art. 96.
  8. He H., Garcia E.A. Learning from imbalanced data // IEEE Transactions on Knowledge and Data Engineering. 2009. Vol. 21. No. 9. P. 1263–1284.
  9. Holzinger A. From machine learning to explainable AI // Proceedings of the 2018 IEEE World Symposium on Digital Intelligence for Systems and Machines (DISA). IEEE, 2018. P. 55–66.
  10. Johnson A.E.W. et al. MIMIC-III, a freely accessible critical care database // Scientific Data. 2016. Vol. 3. Art. 160035.
  11. Kapoor S., Narayanan A. Leakage and the reproducibility crisis in machine-learning-based science // Patterns. 2023. Vol. 4. No. 9. Art. 100804.
  12. Kawamoto K., Houlihan C.A., Balas E.A., Lobach D.F. Improving clinical practice using clinical decision support systems: a systematic review of trials to identify features critical to success // BMJ. 2005. Vol. 330. No. 7494. P. 765.
  13. Knaus W.A., Draper E.A., Wagner D.P., Zimmerman J.E. APACHE II: A severity of disease classification system // Critical Care Medicine. 1985. Vol. 13. No. 10. P. 818–829.
  14. Lee Y.C. et al. Machine learning models for predicting unscheduled return visits to an emergency department: a scoping review // BMC Emergency Medicine. 2024. Vol. 24. Art. 20.
  15. Lundberg S.M. et al. From local explanations to global understanding with explainable AI for trees // Nature Machine Intelligence. 2020. Vol. 2. No. 1. P. 56–67.
  16. Miotto R., Wang F., Wang S., Jiang X., Dudley J.T. Deep learning for healthcare: review, opportunities and challenges // Briefings in Bioinformatics. 2018. Vol. 19. No. 6. P. 1236–1246.
  17. Mutegeki H., Nahabwe A., Nakatumba-Nabende J., Ggaliwango M. Interpretable machine learning-based triage for decision support in emergency care // Proceedings of the 2023 7th International Conference on Trends in Electronics and Informatics (ICOEI). IEEE, 2023. P. 983–990.
  18. Musen M.A., Middleton B., Greenes R.A. Clinical decision-support systems // Biomedical Informatics / ed. by E.H. Shortliffe, J.J. Cimino. London: Springer, 2014. P. 643–674.
  19. Pirracchio R. et al. Mortality prediction in intensive care units with the Super ICU Learner Algorithm (SICULA): a population-based study // The Lancet Respiratory Medicine. 2015. Vol. 3. No. 1. P. 42–52.
  20. Pollard T.J. et al. The eICU Collaborative Research Database, a freely available multi-center database for critical care research // Scientific Data. 2018. Vol. 5. Art. 180178.
  21. Rajkomar A., Dean J., Kohane I. Machine learning in medicine // New England Journal of Medicine. 2019. Vol. 380. No. 14. P. 1347–1358.
  22. Shahul M., Pushpalatha K.P. Machine learning based patient classification in emergency department // Proceedings of the 2023 International Conference on Advances in Intelligent Computing and Applications (AICAPS). IEEE, 2023. P. 1–5.
  23. Shickel B., Tighe P.J., Bihorac A., Rashidi P. Deep EHR: A survey of recent advances in deep learning techniques for electronic health record analysis // IEEE Journal of Biomedical and Health Informatics. 2018. Vol. 22. No. 5. P. 1589–1604.
  24. Stenwig E., Salvi G., Rossi P.S., Skjærvold N.K. Comparative analysis of explainable machine learning prediction models for hospital mortality // BMC Medical Research Methodology. 2022. Vol. 22. No. 1. Art. 53.
  25. Tsoni R., Kaldis V., Kapogianni I., Sakagianni A., Feretzakis G., Verykios V.S. A machine learning pipeline using KNIME to predict hospital admission in the MIMIC-IV database // Proceedings of the 2023 14th International Conference on Information, Intelligence, Systems and Applications (IISA). IEEE, 2023. P. 1–6.
  26. Weiskopf N.G., Weng C. Methods and dimensions of electronic health record data quality assessment: enabling reuse for clinical research // Journal of the American Medical Informatics Association. 2013. Vol. 20. No. 1. P. 144–151.
Information about the author

Imashev A., Master's student, School of Information Technology and Engineering, Kazakh-British Technical University, Kazakhstan, Almaty


The journal is registered by the Federal Service for Supervision of Communications, Information Technology and Mass Media (Roskomnadzor), registration number ЭЛ №ФС77-54434 of 17.06.2013.
Founder of the journal: OOO «МЦНО».
Editor-in-chief: Marina Yurievna Zvezdina.