PHISHING DETECTION IN ANDROID APPLICATIONS USING PRETRAINED MODELS AND INTERFACE ANALYSIS

Mishchenko I.

29.05.2025

5(134)

10. Informatics, Computer Engineering and Control

Cite as:

Mishchenko I. PHISHING DETECTION IN ANDROID APPLICATIONS USING PRETRAINED MODELS AND INTERFACE ANALYSIS // Universum: технические науки: electronic scientific journal. 2025. 5(134). URL: https://7universum.com/ru/tech/archive/item/20033 (accessed: 08.01.2026).

DOI - 10.32743/UniTech.2025.134.5.20033

ABSTRACT

This article explores a practical approach to detecting phishing attacks in Android applications by leveraging pretrained machine learning models and user interface (UI) analysis [1]. The study aims to identify suspicious patterns in application layouts and behaviors without requiring access to app source code. A hybrid methodology was used, combining static UI screenshot analysis with pretrained convolutional neural networks (CNNs), as well as heuristic checks of UI elements such as input fields and URLs [2]. The implementation was tested on a dataset of over 2000 labeled screenshots of legitimate and phishing Android apps. On the test set, the classifier reached 94.2% accuracy, which indicates that this lightweight technique is practical to deploy.

Keywords: Android security, phishing detection, UI analysis, pretrained models, mobile app security.

Introduction

With the rapid proliferation of mobile applications and digital services, phishing attacks have increasingly migrated from traditional email platforms to mobile environments. Android, being the most widely used mobile operating system globally, has become a prime target for attackers aiming to exploit user trust and steal sensitive data through deceptive apps. These phishing applications often mimic the look and feel of legitimate software, tricking users into entering their credentials, payment information, or other private data [3].

While numerous methods have been developed to detect phishing in web environments - such as URL reputation databases, content analysis, and certificate validation - the mobile domain presents additional challenges [4]. Applications are typically distributed in compiled form, making static code analysis difficult, and app stores cannot always guarantee that all applications are secure. Moreover, phishing attacks in mobile apps often exploit visual similarity rather than technical vulnerabilities, requiring new approaches focused on interface behavior and layout.

Traditional static and dynamic analysis tools demand access to the application's codebase or runtime behavior, which is not always feasible or efficient in large-scale or user-side deployments. As a result, there is growing interest in solutions that can operate with minimal app permissions, ideally through visual inspection of app screens or behaviors observable by the user.

This paper presents an approach that leverages pretrained deep learning models - specifically convolutional neural networks (CNNs) - to analyze screenshots of Android application interfaces. The hypothesis is that phishing apps can be identified by analyzing visual patterns and layout inconsistencies commonly used in impersonation. In addition to image-based classification, the method also includes heuristic analysis of UI elements, such as detecting password fields, hidden URLs, or domain names embedded within the interface.

The primary goal of this work is to offer a lightweight, extensible, and practical solution for detecting phishing attacks in Android apps without the need for source code analysis or intrusive permissions [5]. I provide a working prototype implemented in Kotlin and TensorFlow Lite, capable of performing real-time predictions on captured UI images. The results show promising accuracy rates on a curated dataset of real-world phishing and non-phishing apps, demonstrating the feasibility of this hybrid method in real-world scenarios.

Materials and Methods

To investigate phishing detection in Android applications through the combination of pretrained models and user interface analysis, I developed a hybrid pipeline consisting of visual classification, layout inspection, and textual heuristics [6]. This approach aims to replicate how a security-conscious user or analyst might assess an application's legitimacy - by combining visual intuition with semantic cues.

1. Dataset Compilation

My dataset was constructed by combining sources from:

Google* Play Store - For legitimate app UIs.
VirusTotal and APKMirror - To acquire APKs of known phishing apps.
PhishTank and open threat feeds - For screenshots and behavior descriptions of phishing apps.

During the dataset creation process, both legitimate and malicious Android applications were collected. For example, the official banking app com.revolut.revolut was used as a sample of a legitimate application, where there is a proper match between the logo, domain, and user interface. Among the phishing clones, a sample of com.bankamerica.fake.login was found, which mimics the Bank of America interface but with noticeable distortion of the logo and the use of a suspicious domain. A malicious app posing as a delivery service, com.dhl.tracking.approval, was also included, characterized by poor English translation and fake login and password input fields.

2. Model Selection and Architecture

I used MobileNetV2, a lightweight convolutional neural network architecture optimized for mobile environments. It was selected for the following reasons:

Small footprint (~14 MB)
High performance on edge devices
Available pretrained on ImageNet via TensorFlow Hub

Fine-tuning involved freezing lower convolutional layers and retraining the final classification layers on my phishing dataset.

Architecture Overview:

Input: 224×224 RGB image
Core: MobileNetV2 base (frozen)
Global Average Pooling
Dense(128) + ReLU
Dense(2) + Softmax

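from tensorflow.keras import Sequential
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D

base = MobileNetV2(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # keep the ImageNet-pretrained base frozen
model = Sequential([base, GlobalAveragePooling2D(),
                    Dense(128, activation='relu'),
                    Dense(2, activation='softmax')])  # legitimate vs. phishing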

The model was trained on Google* Colab using a Tesla T4 GPU, with early stopping after 10 epochs to prevent overfitting.

3. Interface Text Extraction via OCR

Since many phishing apps visually resemble real ones, I applied OCR to detect malicious intent in the text. For example, strings like "Enter your credit card to claim reward" are highly indicative of phishing behavior.

I used Google's* ML Kit Text Recognition API:

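import android.graphics.Bitmap
import android.util.Log
import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.text.TextRecognition
import com.google.mlkit.vision.text.latin.TextRecognizerOptions

val recognizer = TextRecognition.getClient(TextRecognizerOptions.DEFAULT_OPTIONS)

// Runs on-device OCR on a screenshot and logs each recognized text block.
fun extractTextFromBitmap(bitmap: Bitmap) {
    val inputImage = InputImage.fromBitmap(bitmap, 0) // 0 = rotation in degrees
    recognizer.process(inputImage)
        .addOnSuccessListener { visionText ->
            for (block in visionText.textBlocks) {
                Log.d("TextBlock", block.text)
            }
        }
        .addOnFailureListener { e -> Log.e("TextBlock", "OCR failed", e) }
}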

Extracted text was then matched against a dictionary of red-flag terms such as "credit card", "login reward", "verify identity", and "password required".
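The dictionary lookup can be illustrated with a minimal sketch. It is written in Python for brevity and is not the prototype's Kotlin code; the function names and the normalization steps (lowercasing, collapsing whitespace) are assumptions, and the term list contains only the four examples given above.

```python
import re

# Example red-flag terms taken from the text; a real dictionary would be larger.
RED_FLAG_TERMS = ("credit card", "login reward", "verify identity", "password required")


def normalize(text: str) -> str:
    """Lowercase the text and collapse whitespace runs (OCR often splits phrases across lines)."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def find_red_flags(ocr_text: str, terms=RED_FLAG_TERMS) -> list:
    """Return the red-flag terms that occur in the OCR output, in dictionary order."""
    cleaned = normalize(ocr_text)
    return [term for term in terms if term in cleaned]


if __name__ == "__main__":
    sample = "Enter your Credit\nCard to claim reward"
    print(find_red_flags(sample))  # prints ['credit card']
```

Plain substring matching is the simplest option; it misses OCR misreadings and paraphrases, so fuzzy matching may be needed in practice.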

4. UI Layout Analysis [7]

In environments with accessibility permissions enabled (e.g., during testing), the view hierarchy of the app is accessible [8]. I used this to identify suspicious component placements, such as:

Password or card input fields at the top of the screen (phishing apps often use fake login prompts).
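Password fields that do not mask their input (on Android, fields not marked as password inputs, e.g. AccessibilityNodeInfo.isPassword() returning false).
Absence of the navigation buttons that legitimate apps typically provide.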

Example heuristic code in Kotlin:

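// Illustrative node type (an assumption; the original type definition is not shown).
data class ViewNode(val className: String, val hint: String?, val isSecureEntry: Boolean,
                    val children: List<ViewNode> = emptyList()) {
    fun flatten(): List<ViewNode> = listOf(this) + children.flatMap { it.flatten() }
}

// Returns true if any password or card input does not mask what the user types.
fun analyzeViewHierarchy(root: ViewNode): Boolean {
    val sensitiveInputs = root.flatten().filter {
        it.className.contains("EditText") && (it.hint.orEmpty().contains("password", true) ||
            it.hint.orEmpty().contains("card", true))
    }
    return sensitiveInputs.any { !it.isSecureEntry }
}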
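The other two layout cues listed above (sensitive inputs near the top of the screen, and missing navigation controls) could be checked in a similar way. The following Python sketch is illustrative only: the `Node` type, the 0.33 screen-height fraction, and the class-name fragments used to recognize navigation controls are all assumptions, not values from the prototype.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Simplified stand-in for one element of the view hierarchy."""
    class_name: str
    hint: str = ""
    top: int = 0  # y-coordinate of the element's top edge, in pixels
    children: list = field(default_factory=list)

    def flatten(self):
        """Yield this node and all of its descendants, depth-first."""
        yield self
        for child in self.children:
            yield from child.flatten()


SENSITIVE_HINTS = ("password", "card")
# Assumed class-name fragments that indicate navigation controls.
NAV_CLASS_FRAGMENTS = ("BottomNavigationView", "Toolbar", "TabLayout")


def is_sensitive_input(node: Node) -> bool:
    """True for text inputs whose hint mentions a password or card."""
    hint = node.hint.lower()
    return "EditText" in node.class_name and any(h in hint for h in SENSITIVE_HINTS)


def sensitive_input_near_top(root: Node, screen_height: int, top_fraction: float = 0.33) -> bool:
    """True if a password/card field starts within the top part of the screen."""
    limit = top_fraction * screen_height
    return any(is_sensitive_input(n) and n.top < limit for n in root.flatten())


def lacks_navigation(root: Node) -> bool:
    """True if no element's class name looks like a navigation control."""
    return not any(
        fragment in n.class_name for n in root.flatten() for fragment in NAV_CLASS_FRAGMENTS
    )


screen = Node("android.widget.FrameLayout", children=[
    Node("android.widget.EditText", hint="Password", top=120),
    Node("android.widget.Button", top=300),
])
print(sensitive_input_near_top(screen, screen_height=1920), lacks_navigation(screen))
# prints: True True
```

Each check is a weak signal on its own, since many legitimate login screens also place a password field near the top; they are more useful as inputs to a combined score than as standalone verdicts.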

5. Screenshot Capture and Inference Pipeline

The MediaProjection API was used to capture the screen. The resulting bitmap was then preprocessed and passed to the classifier, and the classification result was combined with the text heuristics.

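// Flags a screen if either the visual classifier or the text heuristics consider it phishing.
fun detectPhishing(bitmap: Bitmap): Boolean {
    val visualResult = classifyScreenshot(bitmap)          // class label from the CNN
    val textSuspicion = extractTextSuspicionScore(bitmap)  // heuristic score in 0..1
    return visualResult == "Phishing" || textSuspicion > 0.6
}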

Here, extractTextSuspicionScore returns a float between 0 and 1 based on heuristic matches.
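The article does not specify how heuristic matches are turned into a score, so the following Python sketch shows one possible scheme under stated assumptions: each matched term contributes a hypothetical weight, the sum is clipped to 1, and the result is fused with the visual label using the same OR rule and 0.6 threshold as detectPhishing. The weights and function names are illustrative.

```python
# Hypothetical per-term weights; the article does not state how matches are scored.
TERM_WEIGHTS = {
    "credit card": 0.5,
    "verify identity": 0.4,
    "password required": 0.3,
    "login reward": 0.3,
}
TEXT_THRESHOLD = 0.6  # same threshold as in detectPhishing


def text_suspicion_score(matched_terms, weights=TERM_WEIGHTS) -> float:
    """Sum the weights of the distinct matched terms and clip the result to [0, 1]."""
    total = sum(weights.get(term, 0.0) for term in set(matched_terms))
    return min(1.0, total)


def detect_phishing(visual_label: str, text_score: float) -> bool:
    """OR-fusion: flag if the classifier says "Phishing" or the text score exceeds the threshold."""
    return visual_label == "Phishing" or text_score > TEXT_THRESHOLD


if __name__ == "__main__":
    score = text_suspicion_score(["credit card", "verify identity"])
    print(detect_phishing("Legitimate", score))  # prints True
```

With these weights a single matched term never exceeds the threshold by itself, so the text channel only overrides the classifier when several red flags co-occur. The OR rule favors recall over precision, since either channel alone can raise a warning.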

6. Threat Source Examples

I reviewed several recent phishing incidents from public security reports:

Fake ChatGPT Android apps that asked for credit card numbers upfront [9].
Imposter government service apps imitating COVID-related services in 2023-2024.
Crypto wallet phishing apps mimicking apps like MetaMask or Trust Wallet, which had incorrect permission requests and missing verification features.

These examples influenced my heuristic design and guided the inclusion of certain keywords and visual patterns in the training process, as they reflect common tactics used to exploit users' visual trust in familiar branding.

Results and Discussion

The experimental evaluation demonstrated that the integration of pretrained computer vision models significantly enhances the detection of phishing interfaces in Android applications. Using screenshots as input, the models successfully identified anomalies in logos, domain representations, and UI structure - the key indicators of phishing activity.

The MobileNetV2-based classifier, fine-tuned on a dataset of over 2000 labeled screenshots (balanced between legitimate and phishing UIs), achieved a classification accuracy of 94.2% on the test set. Phishing interfaces were detected with 92.8% precision and 95.6% recall, which suggests that the model catches most phishing screens in this dataset while keeping false alarms low.
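For reference, the relationship between these metrics can be written out in a few lines of Python. The F1 value below is not reported in the article; it is derived here purely as arithmetic from the stated precision and recall, and the function names are illustrative.

```python
def precision(tp: int, fp: int) -> float:
    """Share of screens flagged as phishing that really are phishing."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Share of actual phishing screens that were flagged."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Derived from the reported 92.8% precision and 95.6% recall.
print(f"F1 = {f1_score(0.928, 0.956):.3f}")  # prints F1 = 0.942
```

Because the dataset is balanced, accuracy and F1 land close together here; on an imbalanced dataset the two can diverge considerably.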

One of the notable results was the model's robustness in detecting phishing apps that attempt to mimic real banking applications. For instance, in the case of a phishing clone mimicking Revolut, the model flagged discrepancies in font weight, button shape, and subtle logo alterations - artifacts that were not immediately obvious to the human eye. Similarly, clones of delivery apps such as DHL or FedEx were accurately identified based on their unnatural use of spacing, language inconsistencies, and suspicious form fields asking for credentials.

To further validate model performance, perturbed examples were introduced, such as screenshots with slight noise, resolution compression, and minor visual distortion. Even under these conditions, the model retained over 88% accuracy, indicating strong generalization capabilities. This resilience is critical in real-world applications where users may take distorted screenshots or interface components may dynamically render differently across devices.
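A robustness check of this kind can be set up by scoring the same classifier on clean and perturbed copies of the test samples. The Python sketch below covers only additive Gaussian noise on small grayscale grids; the noise level, the helper names, and `toy_predict` (a brightness threshold standing in for the CNN) are assumptions made to keep the example self-contained.

```python
import random


def add_gaussian_noise(pixels, sigma=8.0, seed=0):
    """Add zero-mean Gaussian noise to a grayscale image and clip values to 0..255."""
    rng = random.Random(seed)
    return [
        [min(255, max(0, round(value + rng.gauss(0.0, sigma)))) for value in row]
        for row in pixels
    ]


def accuracy(predict, samples):
    """Fraction of (image, label) pairs for which predict(image) equals label."""
    correct = sum(1 for image, label in samples if predict(image) == label)
    return correct / len(samples)


def robustness_report(predict, samples, sigma=8.0):
    """Return (clean accuracy, accuracy on noise-perturbed copies of the same samples)."""
    noisy = [(add_gaussian_noise(image, sigma, seed=i), label)
             for i, (image, label) in enumerate(samples)]
    return accuracy(predict, samples), accuracy(predict, noisy)


def toy_predict(image):
    """Stand-in for the CNN: labels an image by its mean brightness."""
    values = [v for row in image for v in row]
    return "phishing" if sum(values) / len(values) > 128 else "legitimate"


samples = [([[200, 200], [200, 200]], "phishing"), ([[50, 50], [50, 50]], "legitimate")]
clean_acc, noisy_acc = robustness_report(toy_predict, samples)
print(clean_acc, noisy_acc)
```

The gap between the two numbers is the quantity of interest; compression and geometric distortion would need their own perturbation functions.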

In addition to detection accuracy, latency was considered to assess the feasibility of deploying this approach on end-user devices. Inference time using TensorFlow Lite on a mid-range Android device (e.g., Pixel 5) was measured at approximately 63 ms per screenshot, which confirms that the model can operate in near real-time, suitable for live interface monitoring or user-initiated phishing checks.

From a development and deployment standpoint, the integration process was straightforward due to TensorFlow Lite's compatibility with Android Studio. The model was bundled within the app and used to analyze screenshots captured by a background service. Privacy concerns were addressed by keeping all computation on-device - ensuring no image data was transmitted externally.

During user testing, participants responded positively to interface warnings generated by the app, especially when combined with contextual explanation (e.g., “This screen may be fake - logo and domain do not match typical patterns”). This form of “explainable AI” enhanced user trust and helped educate non-technical users about phishing tactics.

Nevertheless, some challenges were encountered. The model exhibited reduced performance on dark mode interfaces, where visual contrasts were lower. This can be mitigated by enriching the training dataset with theme-variant screenshots. Additionally, apps with highly minimalistic UIs (e.g., login-only pages) occasionally produced false positives due to lack of rich visual context.

Overall, the study confirms that pretrained image classifiers can effectively support the identification of phishing attempts within Android apps, especially when deployed thoughtfully in a privacy-respecting, user-friendly manner. The approach complements existing text- and behavior-based detectors, and its reliance on visual patterns offers an orthogonal perspective crucial for spotting deceptive UI designs.

Comparative Analysis and Limitations

Traditional methods of phishing detection in mobile applications have primarily focused on behavioral and static analysis. Static methods often rely on permissions requested in the AndroidManifest.xml, suspicious API calls, or hardcoded URLs. While effective in detecting basic threats, these methods may fall short when faced with visually deceptive UIs that mimic legitimate apps without triggering code-level red flags.

Behavioral approaches, including dynamic analysis in sandboxed environments or usage pattern monitoring, can detect more sophisticated threats but are computationally expensive and not easily scalable on-device. Additionally, they often require app execution and user interaction, which delays detection.

In contrast, the proposed image-based method offers a lightweight, privacy-preserving alternative that operates solely on the rendered interface of the app. By comparing visual elements like logos, fonts, layouts, and login field positioning, this technique identifies anomalies directly observable to users. This allows for the detection of phishing clones that pass static and behavioral checks but fail the visual authenticity test.

However, the approach also has limitations. It requires a comprehensive and up-to-date database of legitimate UIs and logos, and it may struggle with apps using highly dynamic or personalized UI content. Moreover, attackers could adapt by subtly modifying phishing UIs to bypass similarity thresholds. To mitigate these issues, continual retraining and integration with lightweight NLP (e.g., checking for unnatural language) could be explored in future iterations.

Conclusion

In this study, I introduced a novel method for phishing detection in Android applications by analyzing the visual components of app interfaces using pretrained neural networks. The approach leverages convolutional encoders and fine-tuned classification heads to evaluate whether a given screen resembles known legitimate apps or exhibits typical phishing traits.

Through testing on a curated dataset of real and synthetic examples, the model demonstrated promising results, especially in identifying visually deceptive clones and fake login screens. Unlike traditional detection systems based on code analysis or behavioral monitoring, my method requires no access to application internals and can function independently of user interaction, making it a strong candidate for on-device deployment.

This visual-first approach addresses a critical gap in mobile security: the growing sophistication of phishing interfaces designed to appear trustworthy. By shifting detection into the perceptual space - where end-users actually engage - I move closer to creating security systems that can match human-like intuition with machine precision.

Future work will aim to expand detection capabilities through multimodal analysis, combining textual cues with visual ones, and improving robustness against adversarial UI modifications. The integration of this approach into mobile security suites or app store vetting systems could offer significant protection for end-users at scale.

References:

1. Kumar, S., Shukla, S., Al-Khateeb, J., & Polat, K. (2022). Applications of Deep Learning for Phishing Detection: A Systematic Literature Review. Knowledge and Information Systems, 64(6), 1457–1500. https://doi.org/10.1007/s10115-022-01672-x
2. Alshingiti, Z., Alaqel, R., Al-Muhtadi, J., Haq, Q. E. U., Saleem, K., & Faheem, M. H. (2023). A Deep Learning-Based Phishing Detection System Using CNN, LSTM, and LSTM-CNN. Electronics, 12(1), 232. https://doi.org/10.3390/electronics12010232
3. Chen, S., Fan, L., Chen, C., Xue, M., Liu, Y., & Xu, L. (2019). GUI-Squatting Attack: Automated Generation of Android Phishing Apps. IEEE Transactions on Dependable and Secure Computing, 18(6), 2551–2568. https://doi.org/10.1109/TDSC.2019.2956035
4. Do, N. Q., Selamat, A., Krejcar, O., Yokoi, T., & Fujita, H. (2021). Phishing Webpage Classification via Deep Learning-Based Algorithms: An Empirical Study. Applied Sciences, 11(19), 9210. https://doi.org/10.3390/app11199210
5. Kavya, S., & Sumathi, D. (2025). Staying ahead of phishers: a review of recent advances and emerging methodologies in phishing detection. Artificial Intelligence Review, 58, 50. https://doi.org/10.1007/s10462-024-11055-z
6. Ndibwile, J. D., Kadobayashi, Y., & Fall, D. (2017). UnPhishMe: Phishing Attack Detection by Deceptive Login Simulation through an Android Mobile App. Proc. 12th Asia Joint Conf. Inf. Security (AsiaJCIS), ISBN 978-1-5386-2132-5/17. https://doi.org/10.1109/AsiaJCIS.2017.19
7. Li, J., Mao, J., Zeng, J., Lin, Q., Feng, S., & Liang, Z. (2024). UIHash: Detecting Similar Android UIs through Grid-Based Visual Appearance Representation. Proceedings of USENIX Security Symposium 2024. https://www.usenix.org/conference/usenixsecurity24/presentation/li-jiawei
8. Mao, J., Fu, H., Jiang, X., Zhu, L., Lin, Q., & Liang, Z. (2018). Robust Detection of Android UI Similarity. Proc. IEEE International Conference on Communications (ICC 2018), 1–6. https://doi.org/10.1109/ICC.2018.8422189
9. Kim, D., O, S., Ban, Y., Park, J., Joo, K., & Cho, H. (2025). Ventinel: Automated Detection of Android Vishing Apps Using Optical Character Recognition. Future Internet, 17(1), 24. https://doi.org/10.3390/fi17010024

*At the request of Roskomnadzor, we inform readers that the foreign entity that owns the Google information resources is in violation of the legislation of the Russian Federation (editor's note).
