DETECTING AI-GENERATED IMAGES USING DEEP LEARNING

Cite as:
Adilzhan B., Serek A., Meraliyev M. DETECTING AI-GENERATED IMAGES USING DEEP LEARNING // Universum: технические науки: electronic scientific journal. 2026. 4(145). URL: https://7universum.com/ru/tech/archive/item/22618 (accessed: 15.05.2026).
DOI: 10.32743/UniTech.2026.145.4.22618
Received: 13.04.2026
Accepted: 14.04.2026
Published: 28.04.2026

 

ABSTRACT

Recent advances in generative artificial intelligence have enabled the creation of highly realistic synthetic images, raising concerns regarding misinformation and digital authenticity. Consequently, detecting AI-generated images has emerged as a significant research challenge. This study evaluates deep learning approaches for detecting AI-generated images using the CIFAKE dataset. Two convolutional neural network architectures, ResNet18 and EfficientNet-B0, are analyzed in a binary classification context. The impact of preprocessing choices on model performance, including input resolution and interpolation methods, is also investigated. Experiments compare native 32 × 32 images with resized 224 × 224 inputs and assess different interpolation strategies, such as nearest neighbor, bilinear, and bicubic. Results demonstrate that both models achieve high detection performance, with EfficientNet-B0 slightly outperforming ResNet18. However, performance decreases when native low-resolution images are used, underscoring the importance of input scaling. Additionally, bilinear interpolation yields the best performance, suggesting that preprocessing artifacts may enhance detection. These findings indicate that preprocessing decisions influence deep learning-based detection of AI-generated images and should be carefully considered in practical applications.


 

Keywords: AI-generated images; deep learning; CIFAKE; ResNet; EfficientNet


 

Introduction

Recent advances in generative artificial intelligence have substantially improved the quality of synthetic images produced by models such as generative adversarial networks (GANs) [1] and diffusion-based systems [2], [3]. Although these technologies enable numerous applications in creative industries, they also introduce serious challenges related to misinformation, deepfakes, and violations of digital authenticity. As a result, detecting AI-generated images has become a critical issue in computer vision. Deep learning methods, particularly convolutional neural networks (CNNs), have demonstrated strong performance in distinguishing between real and synthetic images [4], [5]. Architectures such as ResNet and EfficientNet are widely used due to their ability to learn hierarchical visual features [6], [7].

Despite advancements in model design, the role of input preprocessing remains an often overlooked aspect. Specifically, image resolution and interpolation methods can substantially affect model performance. Many research datasets, including CIFAKE [8], contain low-resolution images (32 × 32), whereas standard CNN architectures are typically designed for higher-resolution inputs (e.g., 224 × 224). This discrepancy raises the question of how resizing and preprocessing strategies impact detection performance.

This study investigates the performance of deep learning models for detecting AI-generated images and analyzes the effects of preprocessing choices on classification outcomes. Experiments are conducted with the CIFAKE dataset, evaluating two widely used architectures: ResNet18 and EfficientNet-B0. The study focuses on three main aspects: model robustness, resolution sensitivity, and interpolation effects.

The main contributions of this work are as follows:

  • Evaluation of ResNet18 and EfficientNet-B0 performance for AI-generated image detection using the CIFAKE dataset.
  • Investigation of input resolution impact by comparing native 32 × 32 images with resized 224 × 224 inputs.
  • Analysis of three interpolation methods (nearest, bilinear, bicubic) and their influence on model performance.
  • Provision of qualitative analysis using Grad-CAM to visualize model attention and interpret classification decisions.

The results demonstrate that preprocessing decisions play a crucial role in detection performance, emphasizing the importance of carefully designing input pipelines for AI-generated image detection systems.

Related Work

As noted above, the detection of AI-generated images has become a pressing problem in recent years due to the rapid development of generative models. Early methods aimed to detect artifacts introduced by GANs, such as inconsistencies in textures, color distributions, and frequency patterns [4], [5], [9]. More recent approaches have explored diffusion-specific detection methods and watermarking-based techniques [10], [11].

Several studies have shown that deep convolutional neural networks can effectively distinguish between real and synthetic images by learning hierarchical feature representations [12], [13]. Transformer-based methods have also been applied to this problem and demonstrated competitive performance [14], [15].

The CIFAKE dataset [8], derived from the CIFAR-10 dataset, serves as a benchmark for detecting AI-generated images. Its balanced collection of real and synthetic images enables the evaluation of classification models under controlled conditions.

Other research has explored explainability techniques such as Grad-CAM [16], which visualizes the regions of an image that contribute most to a classification decision, providing insight into model behavior and helping to explain how models detect synthetic content.

Despite these advances, most existing works focus primarily on model architectures, while the impact of preprocessing choices, such as image resolution and interpolation methods, remains underexplored. In particular, the effect of resizing low-resolution images for use with standard CNN architectures has not been thoroughly analyzed.

This work addresses this gap by experimentally evaluating how resolution and interpolation strategies influence the performance of deep learning models for detecting AI-generated images.

Methodology

 

Figure 1. Research pipeline diagram of the proposed study

 

This section describes the dataset, model architectures, training configuration, and experimental design employed in this study.

  1. Dataset

The experiments were conducted using the CIFAKE dataset [8], which is derived from the CIFAR-10 dataset. It consists of a balanced set of real and AI-generated images. All images have a native resolution of 32 × 32 pixels. The dataset is split into training, validation, and test sets.

  2. Models

Two convolutional neural network architectures were evaluated: ResNet18 [6] and EfficientNet-B0 [7]. These models were selected due to their strong performance in image classification tasks and their ability to capture hierarchical visual features. Both models were adapted for binary classification (real vs. AI-generated images) by modifying the final fully connected layer.

  3. Training Setup

All models were implemented using PyTorch. During training, images were resized to 224 × 224 pixels to match the expected input size of standard CNN architectures. Data augmentation techniques such as random horizontal flipping and normalization were applied. The models were trained using the Adam optimizer with a learning rate of 3 × 10⁻⁴. Training was performed for 20 epochs with a batch size of 64.

  4. Experimental Design

Several experiments were conducted to evaluate model performance under different conditions:

  • Multi-seed evaluation: Training was repeated with multiple random seeds to examine the stability and robustness of the results.
  • Resolution analysis: Model performance was compared between native 32 × 32 images and resized 224 × 224 images.
  • Interpolation analysis: Interpolation methods - nearest neighbor, bilinear and bicubic - were evaluated to analyze their effect on classification performance.
  • Explainability analysis: Grad-CAM [16] was used to visualize model attention and identify regions that contribute to classification decisions.

Figure 1 illustrates the research pipeline of this study, covering dataset preparation, preprocessing, model training, classification, and subsequent analysis.

Results

Table 1. Model performance comparison on the CIFAKE dataset (224 × 224 resolution)

 

This section presents the experimental results obtained from model evaluation under different conditions.

  1. Model Comparison

Table 1 shows the performance of ResNet18 and EfficientNet-B0 on the CIFAKE dataset using resized 224 × 224 images. Both models achieved high classification performance, with EfficientNet-B0 slightly outperforming ResNet18 on all metrics. In particular, EfficientNet-B0 achieved the highest F1-score and ROC-AUC, indicating better overall detection capability.

  2. Resolution Analysis

To evaluate the impact of input resolution, experiments were conducted using native 32 × 32 images and resized 224 × 224 images. The results show a visible performance drop when native low-resolution inputs are used: ResNet18 achieves an F1-score of 0.958 at 224 × 224 resolution, compared to 0.939 at 32 × 32, and a similar trend is observed for EfficientNet-B0. This indicates that resizing plays an important role in improving classification performance.
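For reference, the F1-score quoted above is the harmonic mean of precision and recall; it can be computed directly from confusion-matrix counts (the counts below are hypothetical, chosen only to illustrate the formula, not taken from the study):

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical confusion counts for illustration (not the study's data):
print(round(f1_score(tp=958, fp=42, fn=42), 3))  # 0.958
```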

  3. Interpolation Analysis

 

Figure 2. Interpolation experiment results for resized 224 × 224 inputs

 

Several interpolation methods were evaluated when resizing images to 224 × 224 resolution; the corresponding results are presented in Figure 2. Among the tested methods, bilinear interpolation consistently produced the best performance, followed by bicubic interpolation, while nearest neighbor resulted in slightly lower accuracy. These results suggest that interpolation artifacts may contribute to feature enhancement useful for classification.
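The three resampling filters can be reproduced with Pillow alone; this sketch resizes a synthetic 32 × 32 image to 224 × 224 with each method (illustrative only, as the study's exact resizing code is not given):

```python
from PIL import Image

# A synthetic 32x32 image stands in for a CIFAKE sample.
src = Image.new("RGB", (32, 32), color=(120, 60, 200))

filters = {
    "nearest": Image.NEAREST,    # block-replicates pixels
    "bilinear": Image.BILINEAR,  # best-performing method in the study
    "bicubic": Image.BICUBIC,    # smoother, can overshoot at edges
}
resized = {name: src.resize((224, 224), f) for name, f in filters.items()}
for name, im in resized.items():
    print(name, im.size)
```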

  4. Multi-seed Evaluation

To assess the stability of the models, training was repeated using multiple random seeds. The results showed minimal variation across runs, indicating that both architectures produce stable and reliable performance.
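A multi-seed check of this kind reduces to aggregating a metric across seeded runs; a minimal sketch with placeholder accuracy values (not the study's numbers):

```python
import statistics

# seed -> validation accuracy; the values are placeholders only.
runs = {0: 0.956, 1: 0.958, 2: 0.957}
mean = statistics.mean(runs.values())
std = statistics.stdev(runs.values())
print(f"accuracy over seeds: {mean:.3f} ± {std:.3f}")
```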

  5. Explainability Analysis

 

Figure 3. Grad-CAM visualizations for ResNet18 (TP, TN, FP, FN)

 

Figure 4. Grad-CAM visualizations for EfficientNet-B0 (TP, TN, FP, FN).

 

Grad-CAM visualizations were used to analyze model attention. Representative examples for ResNet18 are shown in Figure 3, while the corresponding visualizations for EfficientNet-B0 are presented in Figure 4. The results indicate that both models focus on relevant regions of the images when making predictions. However, in some cases the attention maps highlight background or non-semantic regions, suggesting that the models may rely on subtle artifacts rather than high-level semantic features.
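Grad-CAM itself is straightforward to sketch: weight each feature map by the spatial mean of its gradient with respect to a class score, sum the weighted maps, and apply ReLU. The toy CNN below stands in for ResNet18/EfficientNet-B0 purely to keep the example self-contained:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Tiny stand-in network: one conv block plus a linear head over
# globally averaged features (not the architectures used in the paper).
conv = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())
head = nn.Linear(8, 2)

x = torch.randn(1, 3, 32, 32)
feats = conv(x)                            # (1, 8, 32, 32) feature maps
feats.retain_grad()                        # keep gradients of this non-leaf
logits = head(feats.mean(dim=(2, 3)))      # global average pool + linear
logits[0, 1].backward()                    # gradient of the "fake" score

# Grad-CAM: weight each map by its mean gradient, sum over maps, ReLU.
weights = feats.grad.mean(dim=(2, 3), keepdim=True)  # (1, 8, 1, 1)
cam = F.relu((weights * feats).sum(dim=1))           # (1, 32, 32) heatmap
print(cam.shape, bool((cam >= 0).all()))
```

In practice the heatmap is upsampled to the input resolution and overlaid on the image, which is how the TP/TN/FP/FN panels in Figures 3 and 4 are typically produced.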

Discussion

The experimental results yield several important insights into the detection of AI-generated images using deep learning models.

Firstly, both ResNet18 and EfficientNet-B0 achieved high classification performance, proving that convolutional neural networks are effective in distinguishing between real and synthetic images. The relatively small difference between the models suggests that the architecture choice, while important, may not be the primary factor influencing performance in this task.

Secondly, the resolution analysis demonstrates that input preprocessing plays a crucial role in model performance. Although the dataset contains low-resolution images (32 × 32), resizing inputs to 224 × 224 significantly improves classification. This finding indicates that resizing may introduce useful patterns or enhance existing features in ways that facilitate detection.

Thirdly, the interpolation experiments show that different resizing strategies can impact model performance. Bilinear interpolation consistently outperformed the other methods, indicating that smoother transformations may preserve or enhance features relevant for classification. This observation emphasizes that preprocessing decisions should not be treated as trivial steps in the pipeline.

The Grad-CAM analysis reveals that models do not always rely on semantically meaningful regions of the image. In some cases, attention maps indicate that models focus on background patterns or subtle artifacts, which may be indicative of differences between real and generated images. This corresponds to prior observations that generative artifacts can be detectable in controlled settings [4], [5]. At the same time, this raises questions about the generalization of such models to more complex and high-resolution datasets.

Overall, the results suggest that while deep learning models are able to detect AI-generated images with high accuracy, their performance is strongly dependent on preprocessing choices. This emphasizes the need for careful experimental design and highlights potential limitations when applying such models in real-world scenarios. Future evaluation should consider large-scale datasets such as GenImage [17] and real-world deepfake datasets such as FaceForensics++ and Celeb-DF [18], [19].

Conclusion

This study investigated deep learning approaches for detecting AI-generated images using the CIFAKE dataset. Two convolutional neural network architectures, ResNet18 and EfficientNet-B0, were evaluated in a binary classification setting.

The results indicate that both models achieve high detection performance, with EfficientNet-B0 slightly outperforming ResNet18. However, the experiments also revealed that pre-processing choices significantly influence model performance. Specifically, resizing low-resolution images to higher resolutions improves classification results, while different interpolation methods yield measurable performance variations.

These findings highlight that input preprocessing is an essential component in AI-generated image detection pipelines and should be carefully considered alongside model architecture.

Future work may include evaluating these approaches on higher-resolution and more diverse datasets, as well as investigating model robustness against different types of generative models and real-world image manipulations.

 

References:

  1. I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," in Advances in Neural Information Processing Systems (NeurIPS), 2014, pp. 2672–2680.
  2. J. Ho, A. Jain, and P. Abbeel, "Denoising diffusion probabilistic models," in Advances in Neural Information Processing Systems (NeurIPS), 2020.
  3. R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, "High-resolution image synthesis with latent diffusion models," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
  4. S.-Y. Wang, O. Wang, R. Zhang, A. Owens, and A. A. Efros, "CNN-generated images are surprisingly easy to spot... for now," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 8692–8701.
  5. R. Durall, M. Keuper, and J. Keuper, "Watch your up-convolution: CNN based generative deep neural networks are failing to reproduce spectral distributions," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
  6. K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," 2015. [Online]. Available: https://arxiv.org/abs/1512.03385
  7. M. Tan and Q. V. Le, "EfficientNet: Rethinking model scaling for convolutional neural networks," 2020. [Online]. Available: https://arxiv.org/abs/1905.11946
  8. J. J. Bird and A. Lotfi, "CIFAKE: Image classification and explainable identification of AI-generated synthetic images," IEEE Access, vol. 12, pp. 15642–15650, 2024.
  9. L. Nataraj, T. M. Mohammed, S. Chandrasekaran, A. Flenner, J. H. Bappy, A. K. Roy-Chowdhury, and B. S. Manjunath, "Detecting GAN generated fake images using co-occurrence matrices," 2019.
  10. Z. Wang, J. Bao, W. Zhou, W. Wang, H. Hu, H. Chen, and H. Li, "DIRE for diffusion-generated image detection," in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 22388–22398.
  11. P. Fernandez, G. Couairon, H. Jégou, M. Douze, and T. Furon, "The stable signature: Rooting watermarks in latent diffusion models," in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023.
  12. D. Park, H. Na, and D. Choi, "Performance comparison and visualization of AI-generated-image detection methods," IEEE Access, vol. 12, pp. 62609–62627, 2024.
  13. A. R. Gunukula, H. Das Gupta, and V. S. Sheng, "Detecting AI-generated images using a hybrid ResNet-SE attention model," Applied Sciences, vol. 15, no. 13, p. 7421, 2025.
  14. D. Lamichhane, "Advanced detection of AI-generated images through vision transformers," IEEE Access, vol. 13, pp. 3644–3652, 2025.
  15. A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, "An image is worth 16x16 words: Transformers for image recognition at scale," in International Conference on Learning Representations (ICLR), 2021.
  16. R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, "Grad-CAM: Visual explanations from deep networks via gradient-based localization," in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017, pp. 618–626.
  17. M. Zhu, H. Chen, Q. Yan, X. Huang, G. Li, B. Zheng, S. Cui et al., "GenImage: A million-scale benchmark for detecting AI-generated images," in Advances in Neural Information Processing Systems (NeurIPS), 2023.
  18. A. Rössler, D. Cozzolino, L. Verdoliva, C. Riess, J. Thies, and M. Nießner, "FaceForensics++: Learning to detect manipulated facial images," in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2019.
  19. Y. Li, X. Yang, P. Sun, H. Qi, and S. Lyu, "Celeb-DF: A large-scale challenging dataset for deepfake forensics," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
Information about the authors

Master’s student, Kazakh-British Technical University, Kazakhstan, Almaty


PhD, Associate Professor, Astana IT University, Kazakhstan, Astana


PhD, Assistant Professor, Suleyman Demirel University, Kazakhstan, Almaty

