BILATERAL ATTENTION-BASED CLASSIFICATION OF SPINAL DEGENERATIVE DISEASE PATTERNS FROM LUMBAR MRI SLICES

Akjan Y., Kabdrakhova S.S.

Issue 5(146)

Section 10. Informatics, Computer Engineering and Control

Cite as:

Akjan Y., Kabdrakhova S.S. BILATERAL ATTENTION-BASED CLASSIFICATION OF SPINAL DEGENERATIVE DISEASE PATTERNS FROM LUMBAR MRI SLICES // Universum: технические науки: electronic scientific journal. 2026. 5(146). URL: https://7universum.com/ru/tech/archive/item/22788 (accessed: 28.07.2026).

DOI: 10.32743/UniTech.2026.146.5.22788

Received: 04.05.2026

Accepted for publication: 16.05.2026

Published: 28.05.2026

UDC 621.7

ABSTRACT

Lumbar spine degenerative diseases are a leading cause of chronic low back pain and functional disability, and timely diagnosis and treatment planning depend on precise MRI-based assessment. Automated classification remains difficult because abnormalities are subtle and localized, findings depend strongly on the MRI view, and information must be integrated across adjacent slices and bilateral anatomical structures. This study presents a bilateral attention-based deep learning framework for localized classification of lumbar degenerative disease patterns from MRI. The approach reformulates the RSNA 2024 Lumbar Spine Degenerative Classification benchmark as an ROI-centered binary pathology detection task using expert-provided lesion coordinates. Each sample is a stack of five adjacent MRI slices centered on the target pathology. A shared ResNet-18 encoder extracts slice-wise visual features, which are aggregated by attention-based multi-slice pooling and then fused by a Transformer encoder with structured anatomical metadata: lumbar level, pathology type, MRI view, and side-aware bilateral context. On the validation data the model achieves an AUROC of 0.952, with the strongest result for spinal canal stenosis (0.979). It outperforms a plain ResNet-18 baseline by 0.068 AUROC (0.884 vs. 0.952) and the variant without bilateral attention by 0.031 AUROC (0.921 vs. 0.952). These results support the effectiveness of pathology-centered ROI learning, slice-level attention, and bilateral contextual fusion for reliable computer-aided lumbar MRI diagnosis.

Keywords: Lumbar spine MRI, degenerative disease classification, bilateral attention, multi-slice learning, ROI classification, Transformer fusion.

Introduction

Lumbar spine degenerative diseases impose a significant diagnostic and clinical burden worldwide, frequently presenting as chronic low back pain, radiculopathy, reduced mobility, and long-term disability [1, p. 16]–[3, p. 16]. Common abnormalities, including spinal canal stenosis, neural foraminal narrowing, and subarticular stenosis, often involve subtle structural changes that compress neural elements and progressively worsen patient quality of life [4, p. 16]–[6, p. 16]. These degenerative patterns are especially prevalent at the lower lumbar levels, particularly L4/L5 and L5/S1, where biomechanical loading is highest.

From an automated analysis perspective, lumbar MRI is difficult because diagnostically relevant findings are often confined to small, localized anatomical regions and may be visible in only a few adjacent slices. Although magnetic resonance imaging (MRI) provides excellent soft-tissue contrast and multi-planar visualization, routine interpretation still requires radiologists to jointly analyze Axial T2, Sagittal T1, and Sagittal T2/STIR sequences across multiple lumbar levels [7, p. 17]–[9, p. 17]. This multi-view, multi-slice dependency contributes to substantial inter-reader variability and complicates reliable automated classification.

Recent advances in deep learning have produced promising results in musculoskeletal MRI analysis [10, p. 17]–[12, p. 17], yet lumbar degenerative disease classification remains an open problem owing to the strong dependency on imaging view, subtle pathology localization, and the need to jointly reason across multiple lumbar levels and disease categories.

Progress in this direction was enabled by the release of the RSNA 2024 Lumbar Spine Degenerative Classification (LumbarDISC) dataset [13, p. 17], currently the largest publicly available lumbar spine MRI benchmark, containing 2,697 patient studies and 8,593 MRI series from 8 institutions across 6 countries. The dataset provides expert radiologist annotations for five degenerative conditions across five lumbar disc levels and three clinically important MRI views, making it a suitable benchmark for developing robust AI-based diagnostic systems.

This study uses the RSNA lumbar MRI benchmark to develop a novel ROI-centered deep learning framework for spinal degenerative disease classification. The proposed method is specifically designed to exploit pathology-centered local crops and multi-view anatomical context to enhance discriminative representation learning for lumbar abnormalities.

To address the limitations of coarse image-level lumbar MRI classification, a bilateral attention-based multi-slice framework is proposed that jointly models pathology-centered ROI appearance, slice-wise contextual importance, and side-aware anatomical metadata through transformer-based fusion.

The main contributions of this work are summarized as follows:

1. Lumbar degenerative disease assessment is reformulated as a localized, ROI-level binary classification task by leveraging expert-provided pathology coordinates to extract lesion-centered crops from multi-view MRI sequences, replacing conventional image-level classification with spatially targeted learning.

2. A bilateral attention-based multi-slice deep learning framework is proposed that jointly integrates pathology-centered ROI encoding via a shared ResNet-18 backbone, gated attention pooling across adjacent slices for through-plane context aggregation, side-aware bilateral gating for lateralized anatomical reasoning, and transformer-based fusion of visual features with structured clinical metadata including condition type, lumbar level, and MRI view.

3. Ablation experiments are conducted on the large-scale RSNA LumbarDISC benchmark, comparing the proposed framework against a standard ResNet-18 baseline and an architecture variant without bilateral context modeling, demonstrating that each proposed component contributes measurably to classification performance.

Literature Review

Artificial intelligence for spinal imaging has evolved from early artificial neural networks and traditional machine learning systems to advanced deep learning pipelines for detection, segmentation, grading, and outcome prediction. Comprehensive reviews of spinal AI indicate that these methods have been applied to diagnosis, prognosis, imaging assessment, and surgical decision support, with spinal MRI identified as a key application domain [14, p. 17]. A recent systematic review focusing on lumbar degenerative disc disease found that AI-assisted lumbar MRI diagnosis addresses a broad spectrum of manifestations, including disc degeneration, herniation, bulging, and stenosis, and that both machine learning and deep learning approaches have demonstrated strong diagnostic performance [15, p. 17]. Collectively, these outcomes position lumbar MRI analysis as an active and clinically significant research area.

Classical and hybrid lumbar imaging pipelines. Earlier computer-aided systems for spine imaging typically relied on handcrafted features, segmentation, and shallow classifiers. Segmentation-based methods isolated structures such as intervertebral discs prior to downstream grading or classification [16, p. 18]. Hybrid systems that integrated preprocessing, region-of-interest extraction, feature engineering, and conventional classifiers have also been introduced for lumbar disease detection [17, p. 18]. Although these approaches demonstrated the utility of localization and structural analysis, they generally relied on manually designed representations and were often limited to small, private datasets, thereby constraining scalability and robustness.

Single-view CNN-based lumbar MRI classification. With the advancement of deep learning, convolutional neural networks have become the predominant approach for analyzing lumbar MRIs. Standard architectures such as VGG [18, p. 18], ResNet [19, p. 18], and EfficientNet [20, p. 18] are frequently adapted for spinal image classification. In a lumbar MRI study, Mbarki et al. [21, p. 18] used axial MRI with deep CNNs for lumbar disc classification, demonstrating the effectiveness of axial-plane analysis for localized disc pathology. Al-kubaisi and Khamiss [22, p. 18] investigated transfer learning for lumbar disc-state classification, showing that focusing on relevant regions of interest and transferring from related source domains can improve performance. Abuhayi et al. [23, p. 18] introduced an involution-based VGG variant for multiclass lumbar disease classification on sagittal T2 MRI. Although these methods achieved promising results, most relied on a single imaging plane, thereby limiting the assimilation of complementary anatomical information from both axial and sagittal views.

Localization-aware and interpretable lumbar MRI systems. A significant trend in clinically oriented spinal AI systems is the incorporation of localization prior to classification. Bharadwaj et al. [24, p. 18] developed a lumbar MRI pipeline that first localized relevant anatomical structures, followed by classification of central canal stenosis, neural foraminal stenosis, and facet arthropathy. Notably, their work also addressed interpretable formulations for central canal stenosis, stressing the importance of explainability in clinical applications. The evidence shows that anatomically grounded systems are often preferable to black-box image-level classifiers, particularly in lumbar MRI, where faint, localized degenerative changes can greatly affect diagnosis.

Attention-enhanced MRI classification. Recent research in lumbar MRI has increasingly investigated attention methods to improve feature selection and highlight disease-relevant structures. Lin et al. [25, p. 19] introduced a CNN with multiple attention components for lumbar spinal stenosis classification, demonstrating that attention enables the network to focus on subtle and spatially significant degenerative findings. This evidence indicates that spinal MRI classification benefits from feature refinement beyond common convolutional approaches. Nevertheless, most present methods continue to emphasize coarse image-level classification rather than explicitly localized, pathology-centered learning.

Multi-stage and multi-view lumbar diagnosis. Multi-stage and multi-view modeling represents an additional important direction. Chen et al. [26, p. 19] introduced a two-step framework in which Mask R-CNN [27, p. 19] first identifies lumbar vertebrae and discs, followed by in-depth classification using a synthesized multi-angle disc view derived from sagittal and axial images. Their data show that combining multiple views can substantially enhance performance compared with using sagittal or axial MRI alone. This supports the perspective that lumbar pathology is inherently multi-view and that single-plane approaches may not fully utilize available information.

Multimodal and language-assisted systems. Beyond image-only models, recent studies have investigated multimodal diagnostic approaches. Dong et al. [28, p. 19] developed a BERT-based spinal large language model that integrates MRI-derived features, spinal measurements, and textual information for lumbar disorder classification. This work illustrates a wider trend toward combining imaging, structured measurements, and language models to achieve more precise diagnosis. While these systems are promising, they generally require supplementary metadata or clinical reports and thus differ from imaging-focused pathology classification pipelines.

Related spinal MRI work beyond lumbar degeneration. Several studies addressing other spinal MRI challenges further point out the value of region-of-interest-aware and staged modeling. Hallinan et al. [29, p. 19] developed a deep learning model for classifying metastatic epidural spinal cord compression on MRI using expert-labeled axial T2 images. Similarly, Zhang et al. [30, p. 19] introduced a Faster R-CNN-based framework for region-of-interest detection and cascade classification of cervical central canal and neural foraminal stenosis. Although these investigations focus on different pathologies or spinal regions, they provide additional evidence that spinal MRI analysis benefits from anatomically localized and structured pipelines rather than global classification approaches.

Research gap. Despite these advances, several limitations persist in the current literature. First, many previous lumbar MRI studies depend on small private datasets, limited pathology definitions, or single-view classification. Second, although some methods incorporate region-of-interest extraction or multi-stage processing, few have reformulated lumbar MRI diagnosis as a localized, ROI-level pathology classification task using large standardized benchmarks. Third, while attention-based methods have been explored, attention models have rarely been integrated with benchmark-scale, pathology-centered, multi-view-aware lumbar classification frameworks.

Materials and methods

Dataset Characteristics.

We evaluate our method on the RSNA 2024 Lumbar Spine Degenerative Classification dataset [31, p. 19], a large-scale public benchmark for lumbar spine MRI analysis. The dataset contains 2,697 patient studies and 8,593 MRI series, collected from multiple institutions and countries, providing substantial diversity in scanner settings, imaging protocols, and patient populations. All images are provided in DICOM format, preserving slice-level metadata and allowing anatomically precise localization.

The dataset focuses on degenerative abnormalities across the five lumbar disc levels, namely L1/L2, L2/L3, L3/L4, L4/L5, and L5/S1. For each level, annotations are provided for five clinically significant pathological conditions: spinal canal stenosis, left neural foraminal narrowing, right neural foraminal narrowing, left subarticular stenosis, and right subarticular stenosis. Each condition is originally graded into three severity levels: Normal/Mild, Moderate, and Severe. In this work, we formulate the task as binary abnormality detection, grouping Moderate and Severe cases as positive samples and Normal/Mild cases as negative samples.

A key advantage of this benchmark is its multi-view MRI composition, which includes Axial T2, Sagittal T1, and Sagittal T2/STIR sequences. These views provide complementary anatomical and pathological information highly relevant to lumbar disease assessment. In particular, spinal canal stenosis is best characterized in sagittal T2/STIR views, neural foraminal narrowing is primarily assessed in sagittal T1 sequences, and subarticular stenosis is most clearly visible in axial T2 slices. This clinically meaningful correspondence between pathology type and imaging plane makes the dataset especially suitable for view-aware representation learning. Figure 1 visualizes samples from different view types along with their positive/negative and coordinate labels.

Figure 1. Representative positive ROI samples from the three MRI view types used in this study: Axial T2, Sagittal T1, and Sagittal T2/STIR. Red markers indicate expert-provided pathology localization coordinates used for ROI extraction

Figure 2. Per-condition distribution of unique MRI slices

Figure 2 summarizes two important characteristics of the dataset. The level-wise positive ratio shown in Fig. 2.2 reveals a clear increase in degenerative prevalence toward the lower lumbar spine, with L4/L5 and L5/S1 showing the highest proportions of positive samples, consistent with known clinical patterns of lumbar degeneration. In addition, Fig. 2.1 illustrates the condition-wise distribution of unique MRI slices, depicting the view-dependent sampling characteristics of the dataset across the five disease categories.

Dataset Preprocessing.

To convert the original RSNA lumbar MRI benchmark into a localized pathology classification task, we perform a multi-stage preprocessing pipeline that transforms study-level severity annotations into ROI-level pathology-centered samples. The original dataset provides severity labels for five degenerative conditions across five lumbar disc levels, together with expert localization coordinates indicating the approximate pathological center on the corresponding MRI slice. Since our objective is binary abnormality detection, the original three-class severity labels are binarized by treating Normal/Mild as the negative class and combining Moderate and Severe as the positive class.
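The binarization step can be sketched in a few lines. This is a minimal illustration, assuming the severity grades are stored as the three strings named in the text; the function name and the exact string spellings are hypothetical.

```python
# Grade spellings are assumed to match the three levels named in the text.
POSITIVE_GRADES = {"moderate", "severe"}
NEGATIVE_GRADES = {"normal/mild"}


def binarize_severity(grade: str) -> int:
    """Map a three-class severity grade to a binary abnormality label."""
    key = grade.strip().lower()
    if key in POSITIVE_GRADES:
        return 1
    if key in NEGATIVE_GRADES:
        return 0
    raise ValueError(f"unknown severity grade: {grade!r}")


labels = [binarize_severity(g) for g in ["Normal/Mild", "Moderate", "Severe"]]
```

Raising on unknown grades, rather than defaulting to the negative class, keeps missing or misspelled annotations from silently inflating the negative count.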

Next, MRI series descriptions are standardized into three canonical view categories: Axial T2, Sagittal T1, and Sagittal T2/STIR. This normalization step ensures consistent handling of scanner-specific naming conventions while preserving the clinically relevant imaging plane and contrast information. The expert localization annotations are then matched with the corresponding study-level labels and MRI view metadata based on study, anatomical level, pathological condition, and slice index. The result is a unified ROI-level metadata table in which each sample is associated with a specific slice, pathology type, lumbar level, view category, and binary target label.
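The matching step amounts to a keyed join. The sketch below is illustrative only: the field names (`study_id`, `series_id`, `slice_index`, `condition`, `level`) and the dictionary layout are assumptions for the example, not the dataset's actual schema.

```python
def build_roi_records(coord_rows, study_labels, series_views):
    """Join coordinate rows with binary labels and canonical view names.

    coord_rows   : list of dicts, one per annotated location (hypothetical fields)
    study_labels : {(study_id, condition, level): 0 or 1}
    series_views : {(study_id, series_id): canonical view name}
    """
    records = []
    for row in coord_rows:
        label_key = (row["study_id"], row["condition"], row["level"])
        view_key = (row["study_id"], row["series_id"])
        if label_key not in study_labels or view_key not in series_views:
            continue  # drop rows that cannot be matched to a label or a view
        records.append({
            **row,
            "view": series_views[view_key],
            "label": study_labels[label_key],
        })
    return records


coords = [
    {"study_id": 1, "series_id": 10, "slice_index": 7,
     "condition": "spinal_canal_stenosis", "level": "L4/L5", "x": 250.0, "y": 310.0},
    {"study_id": 2, "series_id": 20, "slice_index": 3,
     "condition": "left_subarticular_stenosis", "level": "L5/S1", "x": 120.0, "y": 140.0},
]
study_labels = {(1, "spinal_canal_stenosis", "L4/L5"): 1}
series_views = {(1, 10): "Sagittal T2/STIR", (2, 20): "Axial T2"}
roi_records = build_roi_records(coords, study_labels, series_views)
```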

Using the expert-provided pathology coordinates, we extract fixed-size ROI crops centered around the annotated lesion location from the corresponding DICOM slice. ROI-based preprocessing has been widely shown to improve performance in medical image classification tasks by reducing irrelevant anatomical background and forcing the network to focus on pathology-specific structures such as the spinal canal, neural foramina, and subarticular recesses [22, p. 18]. This design is particularly well-suited for lumbar degenerative disease analysis, where abnormalities are typically localized to small anatomical regions rather than distributed across the entire slice. By restricting the receptive field to the lesion-centered neighborhood, ROI cropping improves spatial supervision and reduces the confounding effect of unrelated vertebral and soft-tissue structures [17, p. 18].
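As a rough illustration of lesion-centered cropping, the following sketch computes the bounds of a fixed-size window around an annotated coordinate. The 160-pixel size comes from the architecture description; shifting the window to keep it inside the image (instead of padding) is an assumption, since the text does not say how borders are handled.

```python
def roi_bounds(cx, cy, width, height, size=160):
    """Return (left, top, right, bottom) of a size x size crop centred on (cx, cy).

    The window is shifted, not padded, when it would cross the image border
    (an assumed border policy).
    """
    if width < size or height < size:
        raise ValueError("image is smaller than the requested crop")
    half = size // 2
    left = int(round(cx)) - half
    top = int(round(cy)) - half
    left = max(0, min(left, width - size))
    top = max(0, min(top, height - size))
    return left, top, left + size, top + size


central = roi_bounds(100, 100, 512, 512)
near_edge = roi_bounds(10, 500, 512, 512)
```

The returned box can be passed to any image library's crop routine before resizing to the network input resolution.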

The resulting dataset contains 48,657 ROI-level labeled samples, of which 37,626 are negative and 11,031 are positive, corresponding to a positive ratio of 22.7%. Table 1 summarizes the ROI-level dataset and the study-wise training/validation split. For each annotated target location, a multi-slice sample was constructed by selecting the central slice and its neighboring slices, yielding a stack of 5 adjacent slices per sample.
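The five-slice neighborhood can be expressed as a list of slice indices around the annotated slice. In this sketch, repeating the first or last slice when the annotated slice lies near the end of a series is an assumption; the text does not describe the boundary handling.

```python
def slice_stack_indices(center, num_slices, k=5):
    """Indices of k adjacent slices centred on `center`.

    Out-of-range neighbours are clamped to the first/last slice
    (an assumed edge policy).
    """
    half = k // 2
    return [min(max(center + offset, 0), num_slices - 1)
            for offset in range(-half, half + 1)]


interior = slice_stack_indices(7, 20)
first = slice_stack_indices(0, 20)
last = slice_stack_indices(19, 20)
```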

Data augmentation was applied only to the training set. The augmentation strategy consisted of mild affine and intensity modifications: random rotation, small translations, slight scaling, and brightness, contrast, and gamma adjustments. To preserve anatomical consistency across the multi-slice input, the same augmentation parameters were applied to all slices within a sample. Validation samples were processed without augmentation.
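The key point, drawing the augmentation parameters once per sample and reusing them for every slice, can be sketched as follows. All parameter ranges are illustrative placeholders, not the values used in the study, and only the intensity part is applied here; the geometric parameters would be passed to an image-warping routine in the same once-per-sample fashion.

```python
import random


def sample_augmentation(rng):
    """Draw one set of augmentation parameters (ranges are placeholders)."""
    return {
        "angle_deg": rng.uniform(-10.0, 10.0),
        "shift_px": (rng.uniform(-8.0, 8.0), rng.uniform(-8.0, 8.0)),
        "scale": rng.uniform(0.9, 1.1),
        "brightness": rng.uniform(-0.1, 0.1),
        "contrast": rng.uniform(0.9, 1.1),
        "gamma": rng.uniform(0.9, 1.1),
    }


def adjust_intensity(pixels, params):
    """Apply gamma, contrast and brightness to pixel values in [0, 1]."""
    out = []
    for p in pixels:
        v = p ** params["gamma"]
        v = (v - 0.5) * params["contrast"] + 0.5 + params["brightness"]
        out.append(min(1.0, max(0.0, v)))
    return out


def augment_stack(stack, rng):
    """Augment every slice of one sample with the SAME parameters."""
    params = sample_augmentation(rng)  # drawn once per sample, not per slice
    return [adjust_intensity(s, params) for s in stack], params


rng = random.Random(0)
stack = [[0.2, 0.5, 0.8] for _ in range(5)]  # five toy "slices"
augmented, params = augment_stack(stack, rng)
```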

Table 1.

Summary of the ROI-level dataset and study-wise split used in our experiments

Split	Number of Studies	Number of Slices	Number of ROI Crops	Positive Ratio
Training	1,580	19,626	38,938	22.5%
Validation	394	4,902	9,719	23.3%
Total	1,974	24,528	48,657	22.7%
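A study-wise split keeps every ROI crop from one study on the same side of the train/validation boundary. The sketch below shows one way to do this; the 20% validation fraction is inferred from the study counts in Table 1 (394 of 1,974), and the seed and function name are hypothetical.

```python
import random


def study_wise_split(study_ids, val_fraction=0.2, seed=0):
    """Split unique study IDs so that no study appears in both subsets."""
    unique = sorted(set(study_ids))
    rng = random.Random(seed)
    rng.shuffle(unique)
    n_val = round(len(unique) * val_fraction)
    val = set(unique[:n_val])
    train = set(unique[n_val:])
    return train, val


# Ten toy studies with three ROI crops each.
crop_study_ids = [study for study in range(10) for _ in range(3)]
train_studies, val_studies = study_wise_split(crop_study_ids)
train_crops = [s for s in crop_study_ids if s in train_studies]
val_crops = [s for s in crop_study_ids if s in val_studies]
```

Splitting at the study level rather than the crop level avoids leakage between adjacent slices or bilateral crops of the same patient study.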

Model Architecture.

The proposed network follows a hybrid convolutional–attention design for localized lumbar MRI pathology classification. Instead of relying on a single image slice, the model uses a neighborhood of K=5 adjacent slices centered around the target region of interest (ROI), allowing it to capture both local appearance and limited through-plane anatomical context. The input to the model is a tensor of shape B×5×3×224×224, where each ROI crop is extracted at 160×160 pixels around the expert-provided pathology coordinate and resized to 224×224.

Each slice is processed independently via a shared ResNet-18 encoder [32, p. 19], initialized from ImageNet-pretrained weights [33, p. 19]. All layers up to the final global average pooling are retained, producing a 512-dimensional feature vector per slice. Weight sharing across the K slices ensures a consistent feature space while keeping the number of parameters manageable. The B×5 slices are reshaped into a single batch of B⋅5 images for efficient forward computation, then reshaped back to B×5×512.

Slice-Position Encoding and Gated Attention Pooling: To preserve spatial ordering along the through-plane axis, a learned slice-position embedding of dimension 512 is added to each slice feature based on its relative offset from the center (-2, -1, 0, +1, +2). The position-encoded slice features $h_1, \dots, h_K$ are then aggregated into a single sample-level representation via the gated attention pooling mechanism formulated by Ilse et al. [34, p. 20]. The gated attention weights are computed as:

$$a_k = \frac{\exp\left\{ w^\top \left( \tanh(V h_k) \odot \sigma(U h_k) \right) \right\}}{\sum_{j=1}^{K} \exp\left\{ w^\top \left( \tanh(V h_j) \odot \sigma(U h_j) \right) \right\}}$$

where $V$ and $U$ are learned projections, $w$ is a learned weight vector, $\sigma$ is the logistic sigmoid, and $\odot$ denotes element-wise multiplication. The aggregated representation is the weighted sum $z = \sum_{k=1}^{K} a_k h_k$. This mechanism allows the network to assign higher importance to slices containing more discriminative pathological evidence. The pooled feature is then projected from 512 to 256 dimensions via a linear layer.
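For readers who prefer code to notation, gated attention pooling in the style of Ilse et al. can be written with plain lists. In this sketch `H` holds the slice features, `V` and `U` are the two projection matrices, and `w` is the scoring vector; the tiny dimensions and weight values are made up for illustration, and a real model would use learned tensors in a deep learning framework.

```python
import math


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_attention_pool(H, V, U, w):
    """Return (pooled feature, attention weights) for slice features H."""
    scores = []
    for h in H:
        t = [math.tanh(a) for a in matvec(V, h)]   # tanh branch
        g = [sigmoid(a) for a in matvec(U, h)]     # sigmoid gate branch
        scores.append(sum(wi * ti * gi for wi, ti, gi in zip(w, t, g)))
    peak = max(scores)                              # subtract max for stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(H[0])
    pooled = [sum(weights[k] * H[k][d] for k in range(len(H))) for d in range(dim)]
    return pooled, weights


V = [[0.5, -0.2], [0.1, 0.3]]
U = [[0.4, 0.1], [-0.3, 0.2]]
w = [1.0, -1.0]
H_same = [[1.0, -0.5]] * 3
pooled_same, weights_same = gated_attention_pool(H_same, V, U, w)
H_mixed = [[1.0, -0.5], [0.0, 2.0], [-1.5, 0.5]]
pooled_mixed, weights_mixed = gated_attention_pool(H_mixed, V, U, w)
```

When all slices carry the same feature the weights are uniform and the pooled vector equals that feature, which is a quick sanity check for any implementation.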

Bilateral Context Module: To incorporate side-aware anatomical reasoning, the projected image feature is modulated by a bilateral context module. Three learned embeddings encode the pathology pair type (central, left–right foraminal, or left–right subarticular), the anatomical side (left, right, or central), and the lumbar level, each of dimension 256. These are concatenated and passed through a two-layer MLP (768→256→256 with ReLU activation) to produce a context vector. A learned gate then controls the influence of this context vector on the image feature, and a binary mask disables bilateral modulation for spinal canal stenosis, which is a midline condition without laterality.
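The exact gating equation is not recoverable from the text, so the following is only one plausible form, stated as an assumption: a sigmoid gate computed from the concatenated image feature `f` and context vector `c`, applied as a masked residual update `f + mask * (gate * c)`. The weights `W_g`, bias `b_g`, and the residual form are all hypothetical.

```python
import math


def bilateral_gate(f, c, W_g, b_g, mask):
    """Assumed gating form: f' = f + mask * sigmoid(W_g [f; c] + b_g) * c.

    mask is 1 for lateralized conditions and 0 for spinal canal stenosis,
    so midline samples pass through unchanged.
    """
    x = f + c  # list concatenation [f; c]
    gate = [
        1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(row, x)) + b)))
        for row, b in zip(W_g, b_g)
    ]
    return [fi + mask * gi * ci for fi, gi, ci in zip(f, gate, c)]


f = [0.5, -1.0]
c = [0.2, 0.4]
W_g = [[0.1, 0.0, 0.3, -0.2], [0.0, 0.5, -0.1, 0.2]]
b_g = [0.0, 0.1]
midline = bilateral_gate(f, c, W_g, b_g, mask=0)      # canal stenosis
lateral = bilateral_gate(f, c, W_g, b_g, mask=1)      # foraminal / subarticular
```

Whatever the true equation is, the mask behavior described in the text implies that a midline sample's image feature is left untouched by this module, which the `mask=0` case reproduces.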

Transformer-Based Multimodal Fusion: The bilateral-modulated image feature is combined with three metadata embeddings, specifically condition type (5 categories), lumbar level (5 levels), and MRI view (3 views), each embedded into 256 dimensions. Together with a learnable [CLS] token, these form a sequence of 5 tokens that is processed by a Transformer encoder with 2 layers, 8 attention heads, a feedforward dimension of 1024, and a dropout rate of 0.1 [35, p. 20]. Self-attention in this stage enables the network to model interactions between appearance-based evidence and structured anatomical descriptors. The output corresponding to the [CLS] token serves as the final fused representation.
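A quick back-of-the-envelope check of the fusion stage's size: the sketch below counts parameters for an encoder layer with the stated width (256) and feedforward dimension (1024), assuming the common layout of one multi-head attention block, a two-layer feedforward network, and two layer norms, all with biases. That layout is an assumption; the article does not specify the implementation.

```python
def encoder_layer_params(d_model, d_ff):
    """Parameter count of one Transformer encoder layer (assumed standard layout)."""
    attention = 3 * (d_model * d_model + d_model)   # Q, K, V projections
    attention += d_model * d_model + d_model        # output projection
    feedforward = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    layer_norms = 2 * (2 * d_model)                 # scale and shift, twice
    return attention + feedforward + layer_norms


# Token sequence described in the text: [CLS] + image + three metadata tokens.
tokens = ["[CLS]", "image", "condition", "level", "view"]
per_layer = encoder_layer_params(d_model=256, d_ff=1024)
fusion_params = 2 * per_layer  # two encoder layers
```

Under these assumptions the two-layer encoder holds roughly 1.6M parameters, a small share of the reported total of about 13.1M, most of which sits in the ResNet-18 backbone. The number of attention heads does not change the count.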

Classification Head: The fused 256-dimensional [CLS] representation is passed through a two-layer classification head (256→256 with ReLU and 0.2 dropout, followed by 256→1) to produce a single logit for binary abnormality prediction. The total model contains approximately 13.1M trainable parameters.

Results and discussions

Training Configurations.

The training was conducted on a single NVIDIA Tesla T4 GPU with 16 GB of memory and 320 Tensor Cores in the Kaggle environment. The model was trained using multi-slice ROI-centered samples, where each training example consisted of 5 adjacent slices centered around the target pathological location. With a batch size of 16, this corresponds to an effective input of 80 slices per optimization step. The ROI crop size was 160×160 pixels, and each crop was resized to 224×224 before being fed into the network.

The model was optimized using AdamW with a fixed initial learning rate and weight decay. Training ran for up to 18 epochs with a batch size of 16. To compensate for class imbalance, the positive class weight was computed on the training set as the ratio of negative to positive samples. A cosine annealing learning rate schedule was used throughout training, and mixed-precision training was employed to improve memory efficiency and computational speed. In addition, early stopping on the validation AUROC was applied to reduce overfitting, with training terminated when no improvement was observed for several consecutive epochs.
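The class weighting and learning-rate schedule can be illustrated with a few small functions. The learning rate value is not available in the text, so `base_lr` below is a placeholder; the weight is computed here from the whole-dataset counts (37,626 negative, 11,031 positive) purely for illustration, whereas the study computes it on the training subset. Setting the minimum learning rate to zero is also an assumption.

```python
import math


def positive_class_weight(n_negative, n_positive):
    """Weight for the positive class: ratio of negatives to positives."""
    return n_negative / n_positive


def cosine_annealing_lr(epoch, total_epochs, base_lr, min_lr=0.0):
    """Learning rate at `epoch` under a cosine annealing schedule."""
    cosine = 1.0 + math.cos(math.pi * epoch / total_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * cosine


def weighted_bce_with_logit(logit, target, pos_weight):
    """Binary cross-entropy on a raw logit with a positive-class weight."""
    # Numerically stable log(sigmoid(logit)).
    if logit >= 0:
        log_sig = -math.log1p(math.exp(-logit))
    else:
        log_sig = logit - math.log1p(math.exp(logit))
    log_one_minus_sig = log_sig - logit  # log(1 - sigmoid(logit))
    return -(pos_weight * target * log_sig + (1 - target) * log_one_minus_sig)


pos_weight = positive_class_weight(37626, 11031)
base_lr = 1e-4  # placeholder value, not taken from the article
schedule = [cosine_annealing_lr(e, 18, base_lr) for e in range(19)]
loss_pos = weighted_bce_with_logit(0.0, 1, pos_weight)
loss_neg = weighted_bce_with_logit(0.0, 0, pos_weight)
```

With a weight of about 3.4, a missed positive costs roughly as much as three to four misclassified negatives, which offsets the 22.7% positive ratio.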

Evaluation Methods.

The main evaluation metric was the area under the receiver operating characteristic curve (AUROC). The AUROC was first computed over all validation samples to provide an overall estimate of binary discrimination performance. In addition, condition-wise AUROC values were calculated separately for each pathological category whenever both positive and negative samples were present in the corresponding validation subset. This allowed the evaluation to reflect not only the model's global performance but also its behavior across different lumbar degenerative conditions. For model selection, the checkpoint with the highest validation AUROC was retained.
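AUROC equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below uses that pairwise definition directly and skips conditions that lack one of the two classes, as the text describes. It is quadratic in the number of samples, so a rank-based implementation would be preferable at scale; the condition names in the toy data are made up.

```python
def auroc(labels, scores):
    """Pairwise AUROC; returns None if only one class is present."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        return None
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def condition_wise_auroc(conditions, labels, scores):
    """AUROC per condition, omitting conditions with a single class."""
    result = {}
    for cond in sorted(set(conditions)):
        idx = [i for i, c in enumerate(conditions) if c == cond]
        value = auroc([labels[i] for i in idx], [scores[i] for i in idx])
        if value is not None:
            result[cond] = value
    return result


overall = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
per_condition = condition_wise_auroc(
    ["canal", "canal", "canal", "foraminal", "foraminal"],
    [0, 1, 1, 1, 1],
    [0.2, 0.9, 0.6, 0.7, 0.3],
)
```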

Results.

The proposed bilateral attention-based multi-slice framework demonstrated strong discriminative performance on the RSNA LumbarDISC benchmark, achieving an AUROC of 0.952, indicating excellent binary abnormality discrimination.

Condition-wise analysis additionally revealed consistently high performance across all five degenerative pathology categories. The highest AUROC was observed for Spinal Canal Stenosis with 0.979, followed by Left Subarticular Stenosis with 0.960, Left and Right Neural Foraminal Narrowing with 0.943 and 0.940, and Right Subarticular Stenosis with 0.936. These results indicate consistent performance across the different lumbar degenerative conditions on the validation split and suggest that the proposed architecture captures both pathology-centered ROI features and bilateral anatomical context.

Table 2 compares the performance of the proposed framework against two baseline configurations. The first baseline is a vanilla ResNet-18 trained on full MRI slices without ROI cropping, multi-slice context, metadata fusion, or bilateral modeling, serving as a minimal reference point. The second baseline retains the full proposed architecture, including multi-slice attention pooling, slice-position encoding, transformer-based metadata fusion, and the classification head, but removes the Bilateral Context Module, isolating the contribution of side-aware anatomical reasoning.

As shown in Table 2, the ResNet-18 baseline achieves an overall AUROC of 0.884, indicating that image-level classification without localized ROI extraction and contextual modeling is substantially less discriminative. Adding ROI-centered cropping, multi-slice attention pooling, and transformer-based metadata fusion (Baseline 2) yields an overall AUROC of 0.921, a gain of 0.037 (3.7 percentage points) over Baseline 1. This gain suggests that pathology-centered spatial targeting and structured clinical context both contribute to accurate classification of lumbar degenerative disease.

At the condition level, Baseline 1 is weakest on neural foraminal narrowing, with AUROCs of 0.798 and 0.794 for the left and right sides respectively, suggesting that image-level classification without ROI targeting struggles most with subtle foraminal pathology. Baseline 2 closes most of this gap, improving left and right foraminal narrowing to 0.921 and 0.909 and reaching 0.965 on spinal canal stenosis. The improvement is not uniform, however: on subarticular stenosis Baseline 2 scores slightly below Baseline 1 (0.918 vs. 0.929 on the left and 0.913 vs. 0.932 on the right).

The full proposed model, which further incorporates the Bilateral Context Module, achieves the highest overall AUROC of 0.952. The improvement over Baseline 2 is most pronounced for the lateralized conditions: left neural foraminal narrowing improves from 0.921 to 0.943, right neural foraminal narrowing from 0.909 to 0.940, left subarticular stenosis from 0.918 to 0.960, and right subarticular stenosis from 0.913 to 0.936. In contrast, spinal canal stenosis, a midline condition for which the bilateral mask disables side-aware modulation, shows the smallest change between Baseline 2 and the proposed model (0.965 vs. 0.979, a gain of 0.014). This pattern is consistent with the Bilateral Context Module primarily enhancing classification for conditions with inherent anatomical laterality, in line with its design as a side-aware gating mechanism rather than a generic feature transformation.

Table 2.

Ablation study comparing baselines and the proposed model on the validation set. All values are AUROC. Best results per condition are shown in bold.

Condition	Baseline 1: ResNet-18	Baseline 2: w/o Bilateral	Proposed
Spinal Canal Stenosis	0.899	0.965	0.979
Left Neural Foraminal Narrowing	0.798	0.921	0.943
Right Neural Foraminal Narrowing	0.794	0.909	0.940
Left Subarticular Stenosis	0.929	0.918	0.960
Right Subarticular Stenosis	0.932	0.913	0.936
Overall	0.884	0.921	0.952
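The differences discussed in the text follow directly from Table 2. The short script below recomputes them from the tabulated AUROC values; the dictionary keys are shorthand labels introduced here for convenience.

```python
# AUROC values copied from Table 2: (Baseline 1, Baseline 2, Proposed).
table2 = {
    "Spinal Canal Stenosis": (0.899, 0.965, 0.979),
    "Left Neural Foraminal Narrowing": (0.798, 0.921, 0.943),
    "Right Neural Foraminal Narrowing": (0.794, 0.909, 0.940),
    "Left Subarticular Stenosis": (0.929, 0.918, 0.960),
    "Right Subarticular Stenosis": (0.932, 0.913, 0.936),
    "Overall": (0.884, 0.921, 0.952),
}

# Gain of the proposed model over each baseline, and of Baseline 2 over Baseline 1.
gains = {
    name: {
        "proposed_vs_b1": round(proposed - b1, 3),
        "proposed_vs_b2": round(proposed - b2, 3),
        "b2_vs_b1": round(b2 - b1, 3),
    }
    for name, (b1, b2, proposed) in table2.items()
}

for name, g in gains.items():
    print(f"{name}: {g}")
```

Expressed this way, the overall gains are 0.068 over Baseline 1 and 0.031 over Baseline 2, and the only negative entries are the Baseline 2 versus Baseline 1 differences for the two subarticular stenosis conditions.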

Discussion

The strong AUROC performance across all pathology categories suggests that localized ROI-centered learning, combined with multi-slice contextual aggregation, is effective for lumbar MRI analysis. The particularly strong performance on spinal canal stenosis may be attributed to the clearer structural deformation patterns visible in sagittal T2/STIR views, while the lower AUROCs for right subarticular stenosis (0.936) and for neural foraminal narrowing (0.943 and 0.940) likely reflect the greater difficulty of subtle lateralized abnormalities. Overall, these outcomes support the hypothesis that side-aware contextual fusion and slice-level attention improve robustness for anatomically localized spinal disease classification.

Despite these promising results, several limitations should be acknowledged. First, the proposed framework was evaluated on a single public benchmark, and although the RSNA LumbarDISC dataset is large and diverse, additional external validation on independent multi-center datasets would be necessary to further assess generalizability. Second, the current study reformulates the task as binary abnormality detection by merging moderate and severe cases into a single positive class. While this simplification improves training stability and supports reliable abnormality screening, it does not fully reflect the fine-grained severity assessment required in real clinical practice. Third, the framework relies on expert-provided localization coordinates for ROI extraction, which may limit direct applicability in fully automated end-to-end deployment settings unless combined with a preceding localization module. Finally, although the multi-slice design captures limited through-plane context, broader volumetric modeling across the entire MRI series may further improve performance for complex degenerative patterns.
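
Two of the design choices named above, merging moderate and severe cases into one positive class and taking a short stack of adjacent slices, can be sketched as follows. The severity strings follow the RSNA 2024 label names; the helper names and the edge-clamping rule for slices near the start or end of a series are assumptions made here, not details confirmed by the study.

```python
# Binary reformulation: moderate and severe are merged into the positive class.
SEVERITY_TO_BINARY = {"Normal/Mild": 0, "Moderate": 1, "Severe": 1}


def binarize(severity):
    """Map a three-level severity label to the binary abnormality target."""
    return SEVERITY_TO_BINARY[severity]


def slice_window(center, num_slices, width=5):
    """Indices of `width` adjacent slices centered on `center`.

    Assumption: indices falling outside the series are clamped to the
    first or last slice, so edge slices are repeated.
    """
    half = width // 2
    return [
        min(max(center + offset, 0), num_slices - 1)
        for offset in range(-half, half + 1)
    ]


print(binarize("Moderate"))
print(slice_window(7, 15))
print(slice_window(0, 15))
```

A multi-class extension would only need to replace the mapping with one integer per severity level, which is the direction of the severity grading task in the original benchmark.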

Future work will therefore focus on extending the framework in several directions. A natural next step is to move beyond binary classification toward multi-class severity grading, enabling more clinically informative prediction of normal/mild, moderate, and severe degeneration. In addition, integrating an automatic ROI localization stage would enable the pipeline to operate in a fully end-to-end manner without reliance on manually annotated coordinates. The current architecture uses ResNet-18 as the shared backbone due to computational constraints; exploring stronger encoders such as ResNet-50, ConvNeXt, or EfficientNet-B3 may yield improved feature representations. These extensions would help bridge the gap between benchmark-level performance and real-world computer-aided lumbar MRI diagnosis.

Conclusion

This study presented a novel bilateral attention-based deep learning framework for localized lumbar degenerative disease classification from MRI. By combining pathology-centered ROI extraction, multi-slice contextual learning, slice-attention pooling, side-aware bilateral context modeling for lateralized pathologies, and transformer-based fusion of anatomical metadata, the proposed method captures subtle degenerative abnormalities across multiple lumbar conditions. Ablation experiments indicated that the Bilateral Context Module provides its largest improvements for lateralized conditions such as neural foraminal narrowing and subarticular stenosis, while changing midline spinal canal stenosis only marginally (AUROC 0.965 to 0.979), which supports its role as a structurally motivated design choice. Experimental validation on the RSNA LumbarDISC benchmark demonstrated strong performance, achieving an AUROC of 0.952 in a grouped patient-level evaluation. These findings highlight the potential of anatomically grounded AI systems for reliable computer-aided lumbar MRI diagnosis and future clinical decision support.
