Postgraduate student, Higher School of Artificial Intelligence Technologies, Peter the Great St. Petersburg Polytechnic University, Russia, Saint Petersburg
ORCID 0009-0009-0638-052X
International
ABSTRACT
With the development of precision medicine, disease risk prediction based on clinical data has become a key task. This study proposes a deep learning model with multi-feature fusion that integrates demographic, physiological, and clinical data. A feature-level attention mechanism identifies the most significant risk factors, while a fully connected neural network captures complex nonlinear relationships. Experiments show improvements in Accuracy and AUC over traditional methods (AUC 0.933 versus 0.901 for the strongest baseline, a DNN without attention). The model provides more accurate and stable risk assessment and can be applied in clinical decision support systems.
Keywords: deep learning, multi-feature fusion, disease risk prediction, attention mechanism.
Introduction. The widespread adoption of Electronic Health Records (EHR) and the rapid development of healthcare information systems have produced clinical data that are massive in scale, multimodal, and highly heterogeneous. Mining latent associations from such data to enable early disease risk prediction has become an important research direction in intelligent healthcare. Traditional statistical and machine learning methods, such as Logistic Regression (LR) and Support Vector Machines (SVM), perform reliably on structured data but depend heavily on manual feature engineering and cannot effectively capture high-order nonlinear relationships or cross-modal interactions. In recent years, deep learning has achieved significant breakthroughs in medical prediction tasks owing to its capacity for automatic representation learning. However, most existing models still rely on simple concatenation or equal-weight fusion, ignoring the fact that different features contribute unequally to risk prediction. As a result, redundant and noisy dimensions in the high-dimensional feature space can interfere with the model's decisions [3]. A mechanism that assigns feature weights dynamically is therefore needed [4; 5; 6].
The attention mechanism, which first proved its strength in sequence modeling, learns the relative importance of input elements so that the model can focus selectively on the most informative ones. Its effectiveness has been widely validated in multimodal fusion and feature selection tasks. In this study, a feature-level attention mechanism is integrated into a medical risk prediction framework to strengthen the representation of key risk factors through adaptive weighting while keeping the model end-to-end trainable [1; 2].
Materials and methods
1. Overall Framework
The proposed model consists of three main modules: (1) a multi-source feature construction module; (2) an attention-based weighted fusion module (the core innovation); and (3) a deep nonlinear mapping and prediction module. The overall workflow is as follows: multimodal medical data input → standardized encoding → attention-based dynamic weighting → deep feature extraction via a fully connected network → probability output using a Sigmoid function.
Figure 1. Architecture of the Proposed Multi-feature Fusion Medical Risk Prediction Model
2. Multi-feature Data Modeling
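Let a patient sample be represented as a triplet $s = (\mathbf{x}^{\mathrm{dem}}, \mathbf{x}^{\mathrm{phy}}, \mathbf{x}^{\mathrm{his}})$, where $\mathbf{x}^{\mathrm{dem}}$ denotes demographic features (e.g., age, gender), $\mathbf{x}^{\mathrm{phy}}$ represents physiological indicators (e.g., blood pressure, blood glucose, BMI), and $\mathbf{x}^{\mathrm{his}}$ corresponds to historical medical records (e.g., past medical history, medication information).
After mean imputation of missing values, Z-score normalization of continuous features, and one-hot encoding of categorical features, the features from all modalities are concatenated into a unified high-dimensional feature vector:
$$\mathbf{x} = \left[\mathbf{x}^{\mathrm{dem}};\, \mathbf{x}^{\mathrm{phy}};\, \mathbf{x}^{\mathrm{his}}\right] \in \mathbb{R}^{d},$$
where $d$ denotes the total feature dimension (typically 30 to 120).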
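The preprocessing and concatenation step can be sketched in plain Python as follows. This is a minimal illustration assuming toy, column-oriented inputs; the helper names (`mean_impute`, `zscore`, `one_hot`, `build_feature_matrix`) are hypothetical and not part of the original work. In a real pipeline the imputation means and normalization statistics should be computed on the training split only and then reused for the validation and test data, to avoid information leakage.

```python
import math


def mean_impute(column):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]


def zscore(values):
    """Z-score normalization (population std); a zero-variance column is only centered."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values)) or 1.0
    return [(v - mean) / std for v in values]


def one_hot(column):
    """Map each categorical value to a one-hot list; categories are sorted for a stable order."""
    categories = sorted(set(column))
    return [[1.0 if v == c else 0.0 for c in categories] for v in column], categories


def build_feature_matrix(numeric_cols, categorical_cols):
    """Impute and standardize numeric columns, one-hot encode categoricals, concatenate per row."""
    numeric = [zscore(mean_impute(col)) for col in numeric_cols]
    encoded = [one_hot(col)[0] for col in categorical_cols]
    n_rows = len(numeric[0]) if numeric else len(encoded[0])
    rows = []
    for r in range(n_rows):
        row = [col[r] for col in numeric]
        for enc in encoded:
            row.extend(enc[r])
        rows.append(row)
    return rows
```

For example, `build_feature_matrix([[50.0, None, 70.0], [120.0, 140.0, 160.0]], [["F", "M", "F"]])` yields three rows of length 4 (two standardized numeric features plus a two-level one-hot encoding of gender), so $d = 4$ in this toy case.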
3. Attention Mechanism Modeling (Core Innovation)
To accurately capture the varying contributions of features to the prediction task, a feature-level attention mechanism is designed. Its mathematical formulation is as follows:
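Given an input feature vector $\mathbf{x} = (x_1, \dots, x_d)$, each component $x_i$ is first projected nonlinearly to extract an importance signal:
$$e_i = \mathbf{v}^{\top} \tanh\left(\mathbf{w}_i x_i + \mathbf{b}\right), \quad i = 1, \dots, d,$$
where $\mathbf{w}_i, \mathbf{b} \in \mathbb{R}^{k}$ and $\mathbf{v} \in \mathbb{R}^{k}$ are learnable parameters, and $k$ is the projection dimension (typically set to 64).
The attention weights are then obtained via Softmax normalization:
$$\alpha_i = \frac{\exp(e_i)}{\sum_{j=1}^{d} \exp(e_j)}.$$
The final weighted feature vector is computed as:
$$\tilde{\mathbf{x}} = \boldsymbol{\alpha} \odot \mathbf{x} = \left(\alpha_1 x_1, \dots, \alpha_d x_d\right).$$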
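A minimal sketch of the feature-level attention forward pass, assuming the scoring function $e_i = \mathbf{v}^{\top}\tanh(\mathbf{w}_i x_i + \mathbf{b})$ with a per-feature projection vector $\mathbf{w}_i$ and shared $\mathbf{b}$ and $\mathbf{v}$. This parameterization, the class name `FeatureAttention`, and the random initialization are assumptions for illustration; training by backpropagation is omitted.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class FeatureAttention:
    """Hypothetical feature-level attention: e_i = v^T tanh(w_i * x_i + b), alpha = softmax(e).

    Assumption: each feature i has its own projection vector w_i in R^k; b and v are shared.
    """

    def __init__(self, num_features, k=64, seed=0):
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(k)
        self.w = [[rng.gauss(0.0, scale) for _ in range(k)] for _ in range(num_features)]
        self.b = [0.0] * k
        self.v = [rng.gauss(0.0, scale) for _ in range(k)]

    def scores(self, x):
        """Importance signal e_i for each scalar feature x_i."""
        return [
            sum(v_j * math.tanh(w_ij * x_i + b_j) for v_j, w_ij, b_j in zip(self.v, w_i, self.b))
            for x_i, w_i in zip(x, self.w)
        ]

    def __call__(self, x):
        """Return the reweighted vector x_tilde = alpha * x (elementwise) and the weights alpha."""
        alpha = softmax(self.scores(x))
        x_tilde = [a * x_i for a, x_i in zip(alpha, x)]
        return x_tilde, alpha
```

For example, `FeatureAttention(num_features=4, k=8)([0.5, -1.2, 2.0, 0.0])` returns the reweighted vector together with weights that are all positive and sum to 1. Note that a feature whose normalized value is 0 stays 0 after weighting, whatever its weight.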
Advantages and Differentiability:
1) The weight $\alpha_i$ is learned from data and reflects the contribution of the $i$-th feature, so no manual specification of feature importance is required.
2) The entire computation is fully differentiable (derivatives of tanh and Softmax exist), supporting end-to-end backpropagation.
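3) Unlike L1-regularization-based feature selection, which learns a single fixed sparsity pattern for all patients, the attention weights are recomputed for each sample. The mechanism therefore adapts to individual patients, produces weights that can be inspected directly, and can dynamically suppress redundant or noisy dimensions.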
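To make point 2) concrete, the relevant partial derivatives can be written out explicitly. The derivation assumes the scoring function $e_i = \mathbf{v}^{\top}\tanh(\mathbf{w}_i x_i + \mathbf{b})$ with per-feature $\mathbf{w}_i \in \mathbb{R}^{k}$ and shared $\mathbf{b}, \mathbf{v} \in \mathbb{R}^{k}$ (an assumed parameterization), $\alpha_i = \exp(e_i) / \sum_j \exp(e_j)$, and $\tilde{x}_i = \alpha_i x_i$; $\delta_{ij}$ is the Kronecker delta. Since $e_j$ depends only on $x_j$, the chain rule gives:

```latex
\begin{aligned}
\frac{\partial \alpha_i}{\partial e_j} &= \alpha_i \left(\delta_{ij} - \alpha_j\right), \\
\frac{\partial e_j}{\partial x_j} &= \sum_{m=1}^{k} v_m \, w_{j,m} \left(1 - \tanh^2\!\left(w_{j,m} x_j + b_m\right)\right), \\
\frac{\partial \tilde{x}_i}{\partial x_j} &= \alpha_i \, \delta_{ij} + x_i \, \alpha_i \left(\delta_{ij} - \alpha_j\right) \frac{\partial e_j}{\partial x_j}.
\end{aligned}
```

Every factor is finite for all inputs, so gradients with respect to both the inputs and the attention parameters are well defined, and the same chain rule continues through the subsequent ReLU layers (differentiable everywhere except at 0, where frameworks use a subgradient).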
4. Deep Feature Extraction (FCN)
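The weighted feature vector $\tilde{\mathbf{x}}$ is fed into a fully connected neural network with 3 to 5 layers for deep nonlinear mapping:
$$\mathbf{h}^{(l)} = \mathrm{ReLU}\left(\mathbf{W}^{(l)} \mathbf{h}^{(l-1)} + \mathbf{b}^{(l)}\right),$$
where $\mathbf{h}^{(0)} = \tilde{\mathbf{x}}$ (with $l = 1, \dots, L$), and the hidden layers contain 128, 64, 32, and 16 neurons, respectively. ReLU is used because, unlike saturating activations such as sigmoid and tanh, it does not shrink gradients for positive inputs, which mitigates the vanishing gradient problem.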
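The deep mapping can be sketched as a plain-Python forward pass through ReLU layers of width 128, 64, 32, and 16. This is an illustrative sketch only: the He-style initialization, the function names, and the default `seed` are assumptions, and a practical implementation would use a tensor library with automatic differentiation.

```python
import math
import random


def relu(z):
    return [max(0.0, v) for v in z]


def init_layer(n_in, n_out, rng):
    """He-style Gaussian initialization (an assumption; no initializer is stated in the text)."""
    std = math.sqrt(2.0 / n_in)
    weights = [[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(h, layer):
    """Affine map W h + b for one layer stored as (weights, biases)."""
    weights, biases = layer
    return [sum(w * x for w, x in zip(row, h)) + b for row, b in zip(weights, biases)]


def build_fcn(input_dim, hidden_sizes=(128, 64, 32, 16), seed=0):
    rng = random.Random(seed)
    sizes = [input_dim, *hidden_sizes]
    return [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes[:-1], sizes[1:])]


def fcn_forward(x_tilde, layers):
    """h^(0) = x_tilde; h^(l) = ReLU(W^(l) h^(l-1) + b^(l)) for l = 1..L."""
    h = x_tilde
    for layer in layers:
        h = relu(dense(h, layer))
    return h
```

For instance, `fcn_forward([0.1] * 40, build_fcn(input_dim=40))` returns a 16-dimensional, non-negative vector $\mathbf{h}^{(L)}$, which is then passed to the Sigmoid output layer.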
5. Risk Prediction Output
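The output layer uses a Sigmoid activation function:
$$\hat{y} = \sigma\left(\mathbf{w}_o^{\top} \mathbf{h}^{(L)} + b_o\right) = \frac{1}{1 + \exp\left(-\left(\mathbf{w}_o^{\top} \mathbf{h}^{(L)} + b_o\right)\right)},$$
where $\hat{y} \in (0, 1)$ represents the probability of an individual having the disease. The loss function is binary cross-entropy:
$$\mathcal{L} = -\frac{1}{N} \sum_{n=1}^{N} \left[ y_n \log \hat{y}_n + \left(1 - y_n\right) \log\left(1 - \hat{y}_n\right) \right],$$
where $N$ is the number of samples and $y_n \in \{0, 1\}$ is the true label of sample $n$.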
Training uses the Adam optimizer with a learning rate of 0.001 and a batch size of 64 for at most 100 epochs; early stopping is applied to prevent overfitting.
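A minimal sketch of the Sigmoid output and the binary cross-entropy loss. The clipping constant `eps` and the function names are assumptions for illustration. In practice, deep learning frameworks usually fuse the two steps for numerical stability; for example, PyTorch's `torch.nn.BCEWithLogitsLoss` takes raw logits rather than probabilities.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function (avoids overflow for large |z|)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(h_last, w_out, b_out):
    """y_hat = sigmoid(w_out^T h^(L) + b_out)."""
    return sigmoid(sum(w * h for w, h in zip(w_out, h_last)) + b_out)


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean BCE over samples; probabilities are clipped to [eps, 1 - eps] to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)
```

For labels `[1, 0]` and predicted probabilities `[0.8, 0.2]`, `binary_cross_entropy` returns $-\ln 0.8 \approx 0.223$, since both samples are assigned probability 0.8 for their correct class.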
6. Experimental Design
The experiments were conducted on publicly available data, such as a subset of the MIMIC-III dataset, containing approximately 8,000–45,000 samples with 40–110 feature dimensions and covering both structured and semi-structured data. Preprocessing included mean/median imputation of missing values, Z-score normalization, one-hot encoding of categorical features, and a 7:1:2 split into training, validation, and test sets. The baseline models were Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), and a deep neural network (DNN) without attention; all used the same data splits and the same hyperparameter tuning procedure. The evaluation metrics were Accuracy, Precision, Recall, and the Area Under the ROC Curve (AUC). Each experiment was repeated five times, and results are reported as mean ± standard deviation.
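The evaluation protocol can be sketched as follows: a shuffled 7:1:2 split, threshold-based Accuracy, Precision, and Recall, AUC computed as the probability that a randomly chosen positive sample is scored above a randomly chosen negative one, and mean ± standard deviation over repeated runs. The 0.5 decision threshold, the use of the sample (n − 1) standard deviation, and all function names are assumptions; the pairwise AUC loop is quadratic in the number of samples and is meant only for illustration.

```python
import random
import statistics


def split_indices(n, ratios=(0.7, 0.1, 0.2), seed=0):
    """Shuffle indices 0..n-1 and cut them into train/validation/test parts (7:1:2 by default)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(ratios[0] * n)
    n_val = round(ratios[1] * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def classification_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, Precision, and Recall at a fixed decision threshold."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    tn = len(y_true) - tp - fp - fn
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


def roc_auc(y_true, y_prob):
    """AUC as P(score of random positive > score of random negative); ties count as 1/2."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def mean_std(values):
    """Mean and sample standard deviation over repeated runs."""
    return statistics.mean(values), statistics.stdev(values)
```

For instance, `roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1])` returns 1.0 because every positive sample is ranked above every negative one.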
Results and discussion
1. Overall Performance Comparison
Table 1 presents the performance of each model on the test set (values are means over five repeated experiments).
Table 1. Performance Comparison of Different Models

| Model | Accuracy | Precision | Recall | AUC |
|---|---|---|---|---|
| LR | 0.762 | 0.731 | 0.785 | 0.823 |
| SVM | 0.791 | 0.768 | 0.812 | 0.851 |
| RF | 0.815 | 0.792 | 0.834 | 0.872 |
| DNN | 0.837 | 0.819 | 0.855 | 0.901 |
| Proposed model | 0.865 | 0.847 | 0.882 | 0.933 |
The proposed model outperforms all baselines on every metric. Relative to the strongest baseline (the DNN without attention), it improves AUC by 3.2 percentage points (from 0.901 to 0.933) and Accuracy by 2.8 percentage points (from 0.837 to 0.865), supporting the effectiveness of multi-feature fusion combined with the attention mechanism.
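The gains in Table 1 can be expressed either as absolute differences (percentage points) or as relative improvements over the baseline. The sketch below computes both from the reported means; the dictionary simply restates Table 1, and the function name `improvements` is hypothetical.

```python
# Test-set means from Table 1, in the order (Accuracy, Precision, Recall, AUC).
TABLE_1 = {
    "LR": (0.762, 0.731, 0.785, 0.823),
    "SVM": (0.791, 0.768, 0.812, 0.851),
    "RF": (0.815, 0.792, 0.834, 0.872),
    "DNN": (0.837, 0.819, 0.855, 0.901),
    "Proposed": (0.865, 0.847, 0.882, 0.933),
}
METRICS = ("Accuracy", "Precision", "Recall", "AUC")


def improvements(reference, baseline, table=TABLE_1):
    """Absolute gain (percentage points) and relative gain (%) of `reference` over `baseline`."""
    out = {}
    for name, ref, base in zip(METRICS, table[reference], table[baseline]):
        absolute_pp = round(100 * (ref - base), 1)
        relative_pct = round(100 * (ref - base) / base, 2)
        out[name] = (absolute_pp, relative_pct)
    return out
```

`improvements("Proposed", "DNN")["AUC"]` gives `(3.2, 3.55)`: a 3.2-percentage-point absolute gain, or about 3.6% relative to the DNN's AUC of 0.901.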
Figure 2 shows the ROC curves of LR, SVM, RF, DNN, and the proposed model. The curve of the proposed model lies closest to the top-left corner and has the highest AUC, indicating that it achieves a better trade-off between true positive rate and false positive rate across decision thresholds.
Figure 2. Comparison of ROC curves
2. Analysis of the Attention Mechanism
Attention-weight heatmaps show that physiological indicators (e.g., systolic blood pressure, blood glucose) generally receive weights above 0.15, whereas some redundant demographic features are suppressed to below 0.05, reflecting the model's ability to focus dynamically on informative features.
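One way to turn per-sample attention weights into the kind of group-level summary described above is to average each $\alpha_i$ over the test samples and then sum the averages within each feature group. This aggregation scheme, the function name, and the group labels are assumptions; how the heatmaps were aggregated is not stated.

```python
from collections import defaultdict


def mean_attention_by_group(alpha_rows, feature_groups):
    """Average attention weights over samples, then sum the averages within each feature group.

    alpha_rows: list of per-sample weight vectors (each typically sums to 1).
    feature_groups: a group label for each feature position, e.g. "physiological".
    """
    n = len(alpha_rows)
    per_feature = [sum(row[i] for row in alpha_rows) / n for i in range(len(feature_groups))]
    totals = defaultdict(float)
    for group, weight in zip(feature_groups, per_feature):
        totals[group] += weight
    return per_feature, dict(totals)
```

With two features per group and weights `[[0.05, 0.05, 0.5, 0.4], [0.03, 0.07, 0.6, 0.3]]` (illustrative values, not results from this study), the per-feature means are approximately `[0.04, 0.06, 0.55, 0.35]`, and the group totals are 0.10 for the demographic and 0.90 for the physiological features.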
2.1. Ablation Study Design
To quantify the contribution of the attention mechanism, two comparative experiments were designed:
1) Full model: the complete framework proposed in this study.
2) w/o Attention variant: multi-source features are directly concatenated and fed into the FCN, with all other hyperparameters, dataset splits, and training strategies kept identical.
Each variant was trained five times, using the same set of random seeds for both variants to control for initialization effects. Statistical significance was evaluated using a t-test ($p < 0.05$ considered significant).
Table 2. Ablation Study Results

| Model variant | Accuracy | AUC | Epochs to convergence (mean) |
|---|---|---|---|
| Full model | 0.865 | 0.933 | 48 |
| w/o Attention | 0.842 | 0.915 | 67 |
After the attention mechanism was removed, AUC decreased by 1.8 percentage points (from 0.933 to 0.915), Accuracy decreased by 2.3 percentage points (from 0.865 to 0.842), and the mean number of epochs to convergence rose from 48 to 67, about 40% more. This suggests that the attention mechanism reduces interference from irrelevant information and improves the efficiency of gradient propagation.
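The changes reported in Table 2 work out as follows. Accuracy and AUC differences are absolute (in percentage points when multiplied by 100), while the relative change in epochs depends on which variant is taken as the base:

```latex
\begin{aligned}
\Delta \mathrm{AUC} &= 0.933 - 0.915 = 0.018, \qquad
\Delta \mathrm{Accuracy} = 0.865 - 0.842 = 0.023, \\
\frac{67 - 48}{48} &= \frac{19}{48} \approx 0.396, \qquad
\frac{67 - 48}{67} = \frac{19}{67} \approx 0.284.
\end{aligned}
```

That is, the variant without attention needs about 40% more epochs, or equivalently, the full model needs about 28% fewer.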
3. Convergence Analysis
The training loss of the proposed model stabilizes around the 40th epoch, converging approximately 20% earlier than that of the DNN and fluctuating less. This is attributed mainly to the attention-based weighting, which down-weights noisy features before they reach the fully connected network and thus makes gradient updates more efficient.
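How "epochs to convergence" is measured is not specified, and different criteria can yield different epoch counts for the same loss curve. A minimal sketch of one possible criterion, assuming convergence means the loss varies by less than a tolerance `tol` over a sliding window of `window` consecutive epochs; the function name and both default values are illustrative assumptions.

```python
def convergence_epoch(losses, window=5, tol=1e-3):
    """First epoch (1-based) from which the loss stays within `tol` for `window` further epochs.

    Hypothetical criterion: the range (max - min) of window + 1 consecutive losses is below tol.
    Returns None if the curve never stabilizes under this criterion.
    """
    for t in range(len(losses) - window):
        segment = losses[t : t + window + 1]
        if max(segment) - min(segment) < tol:
            return t + 1
    return None
```

For a curve that falls to 0.2 by epoch 4 and then stays flat, `convergence_epoch([1.0, 0.5, 0.3, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2])` returns 4.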
Figure 3. Comparison of Training Loss Curves
Conclusion
This study proposes a deep learning-based multi-feature fusion model for medical risk prediction. A feature-level attention mechanism adaptively weights high-dimensional, heterogeneous medical data and emphasizes key risk factors, improving both prediction accuracy and training efficiency. Experiments show gains in Accuracy and AUC over traditional and deep learning baselines, and the ablation study confirms that the attention module is a key contributor to the improvement. Future work may incorporate Transformer encoders to strengthen long-range dependency modeling, integrate temporal EHR data (e.g., via LSTMs), or use SHAP values to enhance clinical interpretability, providing more reliable decision support for precision medicine.
References: