Master’s student in Software Engineering at the School of Information Technology and Engineering, Kazakh-British Technical University, Kazakhstan, Almaty
FRAUD DETECTION IN CREDIT CARD TRANSACTIONS USING MACHINE LEARNING: A COMPARATIVE ANALYSIS
ABSTRACT
The rapid growth of digital financial transactions has increased the risk of fraudulent activity and, with it, the need for more efficient fraud detection methods. This study presents a machine-learning-based approach to detecting fraud in credit card transactions. The paper covers exploratory data analysis, data preprocessing, model training, hyperparameter optimization, and a discussion of the results. Experiments on a large credit card dataset showed that the best model achieved an AUC of 96.82% and an average precision of 88%. Because the approach is data-driven, the framework can be adapted dynamically to detect newly emerging fraud patterns. Additionally, the model's ability to handle large volumes of data in real time makes it well suited to financial institutions with high transaction loads.
АННОТАЦИЯ
Быстрое развитие цифровых финансовых транзакций увеличило риск мошеннических действий, что увеличило потребность в более эффективных методах обнаружения мошенничества. В данной статье представлено комплексное исследование, которое использует методы машинного обучения для выявления мошенничества в транзакциях по кредитным картам. Статья состоит из разделов: эксплораторный анализ данных, предварительная обработка данных, обучение модели с различными методами балансировки классов и оптимизация гиперпараметров. Эксперименты на большом наборе данных кредитных карт показали, что лучшая модель достигла AUC 96,82% и средней точности 88%. Поскольку структура обучается с использованием подхода, управляемого данными, она может динамически адаптироваться к новым схемам мошенничества, обеспечивая высокую точность выявления подозрительных действий. Кроме того, способность модели обрабатывать большие объемы данных в режиме реального времени делает ее более подходящей для финансовых учреждений, управляющих высокими транзакционными нагрузками.
Keywords: Machine Learning, Random Forest, Logistic Regression, Fraud, Hyperparameter Optimization, XGBoost, Gradient Boosting
Ключевые слова: Случайный лес, Логистическая регрессия, Мошенничество, Оптимизация гиперпараметров, XGBoost, Градиентный бустинг
Introduction
With the rise of digital banking and online transactions, credit card fraud has become a growing concern for both financial institutions and customers. Although fraudulent transactions make up less than 1% of all transactions, they can lead to billions of dollars in financial losses each year. Disputed transactions also undermine trust in online payment systems, which makes customers reluctant to use them. Traditional rule-based fraud detection methods, which rely on predefined patterns and manual oversight, are becoming less effective as malicious actors continually develop more sophisticated techniques. While recent progress in machine learning offers promising tools for automated pattern recognition and anomaly detection, there is still significant demand for more sophisticated and adaptable approaches to fraud prevention.
Credit card fraud has a massive economic impact. According to the Nilson Report, global losses due to fraud were estimated at $33.83 billion in 2023 (Nilson Report, 2023). Apart from identifying fraudulent transactions, the major challenge for financial institutions is to minimize financial losses while avoiding unnecessary disruption to legitimate customers. A system that produces too many false positives can frustrate users, leading to declined transactions and lost revenue, while a system that misses fraudulent transactions can cause serious financial damage. The highly imbalanced nature of transactional records is a further challenge, because only a small fraction of transactions are fraudulent. For this reason, it is difficult for traditional models to detect fraud effectively without raising false alarms. This study focuses on the development and evaluation of machine learning models specifically designed to handle imbalanced datasets. The primary goal is to develop a more robust fraud detection framework that minimizes any negative impact on the overall customer experience. The paper investigates the effectiveness of machine learning techniques for credit card fraud detection, addressing key challenges such as class imbalance and model performance optimization. It evaluates different fraud detection approaches and highlights their practical applications in real-world financial systems, emphasizing the balance between detection accuracy and a low number of false positives so that security is maintained without degrading the customer experience.
Literature Review
Most research in the field of fraud detection identifies Random Forest as the most accurate machine learning model. In general, existing approaches can be categorized into several main groups: deep learning, machine learning, ensemble learning and feature ranking methods, and user authentication strategies (Alarfaj et al., 2022). Figure 1 illustrates the commonly used payment card authorization process for credit card authentication. Authentication methods generally fall into two main categories: password-based authentication and biometric authentication.
Figure 1. Payment card authorisation process (Alarfaj et al., 2022)
West and Bhattacharya (2016) highlight that data mining techniques can be time-consuming when working with large datasets. Additionally, as noted by Zareapoor et al. (2012) and Zojaji et al. (2016), highly correlated information is a notable challenge in preparing credit card transaction data. Consequently, transactions may be misclassified because fraudulent ones resemble legitimate ones. Another obstacle encountered by researchers is the handling of categorical features. West and Bhattacharya (2016) and Zareapoor et al. (2012) discuss the challenges of choosing appropriate detection algorithms and selecting relevant features. Most widely used machine learning models require considerable training time. For this reason, careful feature selection is important in order to identify the attributes that best characterize fraudulent behavior.
A hybrid approach that combines Random Forest and Isolation Forest to identify anomalous transactions was introduced by Vynokurova et al. (2020). Their proposed model consists of two main components, one based on unsupervised learning and one on supervised learning. According to Sulaiman et al. (2022), although Random Forest algorithms are highly effective for classification tasks, they have several limitations when applied to real-time credit card fraud detection. In particular, training is slower than with other techniques, which makes the algorithm poorly suited to real-time fraud detection.
A comprehensive framework was proposed by Hashemi et al. (2022), which integrates hyperparameter tuning with ensemble methods to handle class imbalance. Their results indicate that deep learning algorithms show considerable promise for enhancing credit card fraud detection. The model proposed in that study uses an artificial neural network (ANN) with federated learning, which preserves data confidentiality. Although ANNs have been applied to fraud detection in earlier research, those studies were conducted on lab-based datasets, whereas the model by Hashemi et al. (2022) combines fraud detection efficiency with reliability.
Zhang et al. (2023) analyzed various classifiers, including the Support Vector Machine (SVM) method. Their model, trained on the BankSim dataset, achieved 99.23% accuracy. Although this experiment highlights the potential of the SVM model, it was trained on well-structured transaction data and may therefore be less suitable for detecting fraud in real-world transaction data.
Further research in fraud detection emphasizes the use of deep learning, and neural network architectures in particular (Thennakoon et al., 2019). Their flexibility allows them to perform anomaly detection tasks more efficiently. Nevertheless, high computational demands and long training times remain a significant obstacle to practical deployment.
Materials and Methods
Figure 2 illustrates the data preprocessing and model training workflow in this research study. The following sections will explain each of the steps in detail.
Figure 2. Model Training Workflow
Data Description and Preprocessing
Most real-world datasets contain a large amount of missing data, but this study analyzes a well-organized and complete dataset from the Kaggle website. Two features, "Time" and "Amount", remain in their original form, while the remaining features are principal components obtained through Principal Component Analysis (PCA). As is typical in fraud detection, the dataset is highly imbalanced: fraudulent transactions represent only 0.1727% of the total (492 out of 284,807), which poses a significant modelling challenge. The dataset's structure, feature distributions, and extent of class imbalance were inspected using descriptive statistics and visualizations. The key insight from this analysis is that most transactions were of relatively small value, as shown by the median of $22.00. Transaction amounts ranged from $0 to $25,691.16, with a mean of $88.35.
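The imbalance figures reported above can be checked with a few lines of arithmetic. The sketch below uses only the counts given in this section; the variable names are illustrative. It also shows why plain accuracy is a weak metric for this dataset: a classifier that labels every transaction as legitimate would already be correct for roughly 99.83% of rows.

```python
# Counts reported for the Kaggle credit card dataset in this section
total_transactions = 284_807
fraud_transactions = 492

# Share of fraudulent rows in the whole dataset
fraud_share = fraud_transactions / total_transactions

# Accuracy of a trivial classifier that predicts "legitimate" for every row
majority_baseline_accuracy = 1 - fraud_share

print(f"Fraud share: {fraud_share:.4%}")
print(f"Majority-class baseline accuracy: {majority_baseline_accuracy:.2%}")
```

This is the motivation for reporting AUC, average precision, precision, recall, and F1-score rather than accuracy alone.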
Figure 3. Transaction distribution over time
Figure 3 shows the overall distribution of transactions over time. Most transactions occur around lunchtime (between 11 am and 4 pm) and after 6 pm. Figure 4 below shows the correlation matrix of the features in the dataset. The V1-V28 features (PCA components) follow different distributions with varying ranges.
Figure 4. Correlation matrix of features
Model Training and Class Balancing
In this study, four machine learning classifiers were trained: Logistic Regression, Random Forest, Gradient Boosting, and XGBoost. For each classifier, three class balancing methods were applied:
- No Sampling: Training on the original imbalanced dataset.
- SMOTE: Synthetic Minority Over-sampling Technique to balance the classes by creating synthetic examples of the minority class, resulting in a balanced training set of 454,902 examples.
- Undersampling: Reducing the majority class to achieve balance (394 examples for each class), resulting in a much smaller training set of 788 examples.
Before model training, the data was split into training (80%) and test (20%) sets using stratified sampling to preserve the class distribution, resulting in 227,845 training examples and 56,962 test examples.
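The stratified split and random undersampling described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code used in the study: the helper names `stratified_split` and `random_undersample` and the toy labels are hypothetical, and SMOTE is not implemented here. The last lines recompute the training set sizes from the counts reported in this section.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), splitting each class at about the same ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)


def random_undersample(labels, indices, seed=0):
    """Sample every class down to the size of the smallest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    n_min = min(len(idxs) for idxs in by_class.values())
    kept = []
    for idxs in by_class.values():
        kept.extend(rng.sample(idxs, n_min))
    return sorted(kept)


# Toy data (hypothetical): 1,000 rows, 10 of them fraud (label 1)
labels = [1] * 10 + [0] * 990
train_idx, test_idx = stratified_split(labels)
balanced_idx = random_undersample(labels, train_idx)

# Sizes implied by the counts reported in this section
train_total, train_fraud = 227_845, 394
train_legit = train_total - train_fraud
smote_size = 2 * train_legit   # minority oversampled up to the majority size
under_size = 2 * train_fraud   # majority sampled down to the minority size
print(len(train_idx), len(test_idx), len(balanced_idx), smote_size, under_size)
```

The recomputed sizes agree with the 454,902 and 788 examples quoted for SMOTE and undersampling.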
Hyperparameter Optimization
For the best-performing model, XGBoost without sampling (Figures 5 and 6), hyperparameter optimization was conducted using grid search with stratified 5-fold cross-validation. The grid search explored 2,592 parameter combinations, so a total of 12,960 models were trained during the optimization process.
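The relationship between the number of parameter combinations and the number of trained models can be made explicit. In the sketch below, `param_grid` is a small hypothetical grid chosen only for illustration, since the actual search space is not listed here; the final lines apply the same arithmetic to the reported figures.

```python
from itertools import product

# Hypothetical grid for illustration only; not the search space used in the study
param_grid = {
    "max_depth": [3, 5, 7],
    "learning_rate": [0.05, 0.1],
    "n_estimators": [100, 300],
}
n_folds = 5

# Exhaustive grid search enumerates the Cartesian product of all value lists
combinations = [
    dict(zip(param_grid, values)) for values in product(*param_grid.values())
]
total_fits = len(combinations) * n_folds  # one model per combination per fold

# Same arithmetic applied to the figures reported above
reported_combinations = 2_592
reported_fits = reported_combinations * n_folds
print(len(combinations), total_fits, reported_fits)
```

Because the cost grows multiplicatively with every added parameter, exhaustive search is practical here mainly because a single XGBoost fit is fast.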
Figure 5. Receiver Operating Characteristic curve for XGBoost without sampling
Results and Discussion
The best performance was achieved by the XGBoost model without sampling (Figure 6), which reached an AUC of 0.9682 and an average precision of 0.8800. This model also demonstrated excellent computational efficiency, requiring only 7.24 seconds for training.
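For readers less familiar with the two headline metrics, the sketch below computes ROC AUC and average precision from scratch on a four-row toy example. It is a simplified illustration, not the evaluation code of the study: the function names and toy scores are assumptions, the average precision routine assumes untied scores, and both functions assume at least one positive and one negative label.

```python
def roc_auc(y_true, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    # Ties between a positive and a negative score count as half a win
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Mean of the precision values measured at the rank of each positive."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits, precision_sum = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits


# Toy example (hypothetical): two legitimate rows, two fraudulent rows
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(y_true, scores)
ap = average_precision(y_true, scores)
print(f"AUC = {auc:.4f}, average precision = {ap:.4f}")
```

Average precision depends on how many legitimate transactions are ranked above each fraud, so it is more sensitive to false positives under heavy class imbalance than AUC.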
Figure 6. Comparison of models by average precision
Detailed Performance Analysis
Several distinctive patterns emerged in this study. Table 1 summarizes how each model performed with each sampling technique. In most cases, SMOTE improved AUC but reduced precision; similarly, undersampling increased recall but reduced precision. The best balance of metrics was obtained by training XGBoost on the original imbalanced dataset. This model also required only 7.24 seconds to train, which makes it the most suitable of the evaluated models for detecting fraudulent transactions in real time.
Table 1.
Detailed performance analysis
| Model | Sampling | AUC | Avg. Precision | Precision | Recall | F1-Score | Training Time (s) |
|---|---|---|---|---|---|---|---|
| Logistic Regression | None | 0.9722 | 0.7189 | 0.06 | 0.92 | 0.11 | 1.67 |
| Logistic Regression | SMOTE | 0.9698 | 0.7249 | 0.06 | 0.92 | 0.11 | 5.98 |
| Logistic Regression | Undersampling | 0.9760 | 0.6778 | 0.04 | 0.92 | 0.07 | 0.14 |
| Random Forest | None | 0.9530 | 0.8650 | 0.96 | 0.74 | 0.84 | 225.39 |
| Random Forest | SMOTE | 0.9683 | 0.8748 | 0.85 | 0.83 | 0.84 | 646.07 |
| Random Forest | Undersampling | 0.9783 | 0.6958 | 0.05 | 0.92 | 0.09 | 0.73 |
| Gradient Boosting | None | 0.3469 | 0.1567 | 0.53 | 0.18 | 0.27 | 659.91 |
| Gradient Boosting | SMOTE | 0.9807 | 0.6885 | 0.11 | 0.90 | 0.19 | 1359.22 |
| Gradient Boosting | Undersampling | 0.9770 | 0.3687 | 0.03 | 0.93 | 0.07 | 1.28 |
| XGBoost | None | 0.9682 | 0.8800 | 0.88 | 0.84 | 0.86 | 7.24 |
| XGBoost | SMOTE | 0.9777 | 0.8418 | 0.46 | 0.86 | 0.60 | 11.05 |
| XGBoost | Undersampling | 0.9798 | 0.6142 | 0.02 | 0.92 | 0.03 | 0.27 |
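The F1-score column in Table 1 is the harmonic mean of precision and recall, which is why a model with very high recall but very low precision still scores poorly. The short sketch below recomputes it for two rows of the table; the function name is illustrative, and because the table lists precision and recall rounded to two decimals, recomputed values for other rows may differ in the last digit.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as listed in Table 1
xgb_f1 = f1_score(0.88, 0.84)     # XGBoost, no sampling
logreg_f1 = f1_score(0.06, 0.92)  # Logistic Regression, no sampling

print(f"XGBoost F1: {xgb_f1:.2f}")
print(f"Logistic Regression F1: {logreg_f1:.2f}")
```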
Comparison with Hashemi et al. (2022)
Hashemi et al. (2022) developed an ensemble framework that leverages Bayesian optimization to tune hyperparameters and mitigate class imbalance. Their method achieved ROC-AUC values around 0.95.
Table 2.
Comparison with Hashemi et al. (2022)
| Aspect | Hashemi et al. | Our Approach |
|---|---|---|
| Dataset Size | 284,807 transactions | 284,807 transactions |
| Best Model | Ensemble (CatBoost, XGBoost, LightGBM) | XGBoost without sampling |
| AUC | 0.9512 | 0.9682 |
| Average Precision | 0.9043 | 0.8800 |
| Training Time | 176.5 seconds | 7.24 seconds |
The approach studied in this paper yields a higher ROC-AUC (0.9682) compared to 0.9512, although with a slightly lower average precision. The most significant advantage of the method is computational efficiency, with our model training more than 24 times faster than their ensemble approach (Table 2).
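The speed-up quoted above follows directly from the training times in Table 2. The sketch below reproduces the ratio; it assumes that both timings were measured under comparable conditions, which is not stated here, so the figure is best read as indicative.

```python
# Training times as listed in Table 2 (seconds)
ensemble_training_s = 176.5  # Hashemi et al. (2022) ensemble
xgboost_training_s = 7.24    # XGBoost without sampling

# Ratio of the two training times
speedup = ensemble_training_s / xgboost_training_s
print(f"Speed-up: {speedup:.1f}x")
```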
Comparison with Zhang et al. (2023)
Zhang et al. (2023) evaluated multiple classifiers on a synthetic BankSim dataset, reporting that SVM achieved the highest accuracy (99.23%).
Table 3.
Comparison with Zhang et al. (2023)
| Aspect | Zhang et al. | Our Approach |
|---|---|---|
| Dataset | Synthetic BankSim | Real credit card transactions |
| Features | Age, gender, payment domain | PCA-transformed features |
| Best Model | SVM | XGBoost without sampling |
| Accuracy | 99.23% | 99.98% |
| F1-Score | 0.79 | 0.86 |
Our XGBoost model achieves both higher accuracy and F1-score compared to their SVM approach. Additionally, our evaluation on real-world data rather than synthetic data provides stronger evidence for practical applicability (Table 3).
Comparison with Thennakoon et al. (2019)
The study done by Thennakoon et al. (2019) on real-time fraud detection using deep learning and ensemble methods emphasizes the advantages of deep learning architectures for capturing complex patterns.
Table 4.
Comparison with Thennakoon et al. (2019)
| Aspect | Thennakoon et al. | Our Approach |
|---|---|---|
| Methodology | Deep Learning (LSTM, GRU) | Traditional ML (XGBoost) |
| AUC | 0.9526 | 0.9682 |
| Inference Time | 124 ms per transaction | ≈2 ms per transaction (estimated) |
| Model Size | 248 MB | ≈12 MB (estimated) |
Although our approach does not integrate deep learning, the robust performance of our XGBoost model demonstrates that traditional machine learning methods, when properly tuned, can achieve competitive results in fraud detection (Table 4). This reinforces the notion that model selection and hyperparameter tuning are critical to performance, even without the computational complexity of deep neural networks. Furthermore, the model offers significant advantages in terms of inference time and model size, making it more suitable for deployment in resource-constrained environments.
Conclusion
To summarize, the aim of this research was to analyze different machine learning techniques for better performance in credit card fraud detection. A thorough literature review of ML techniques and class imbalance methods was carried out to examine cost-effectiveness and to inform the experimental tests. The XGBoost model without sampling proved to be a computationally efficient and accurate model for real-time detection compared with the other models.
Recommendations for further work include an in-depth investigation of hybrid approaches, such as combining ML models with deep learning components. While sampling techniques like SMOTE and undersampling are often recommended for imbalanced datasets, the results indicate that they should be applied with caution. In addition, the study has shown that our model effectively balances high detection accuracy with computational efficiency, making it a strong candidate for real-world fraud detection systems.
References:
- Alarfaj F.K., Malik I., Khan H.U., Almusallam N., Ramzan M., Ahmed M. (2022). Credit card fraud detection using state-of-the-art machine learning and deep learning algorithms. IEEE Access. — 2022. — 10. — P. 39700–39715. https://doi.org/10.1109/ACCESS.2022.3166891 [in Eng].
- Hashemi S.K., Mirtaheri S.L., Greco S. (2022). Fraud detection in banking data by machine learning techniques. IEEE Access. — 2022. — 11. — P. 3034–3043. https://doi.org/10.1109/ACCESS.2022.3232287 [in Eng].
- Machine Learning Group–ULB. Credit Card Fraud Detection. Kaggle, 2018. [Online]. Available: https://www.kaggle.com/mlgulb/creditcardfraud. [Accessed: 12.12.2024]. [in Eng].
- Nilson Report. Card fraud losses worldwide in 2023. — 2023. [Online]. Available: https://nilsonreport.com/articles/card-fraud-losses-worldwide-in-2023/. [Accessed: 21.02.2025]. [in Eng].
- Sulaiman R. Bin, Schetinin V., Sant P. (2022). Review of machine learning approach on credit card fraud detection. Human-Centric Intelligent Systems. — 2022. — 2(1). — P. 55–68. https://doi.org/10.1007/s44230-022-0000 [in Eng].
- Thennakoon A., Bhagyani C., Premadasa S., Mihiranga S., Kuruwitaarachchi N. (2019). Real-time credit card fraud detection using machine learning. In: 2019 9th International Conference on Cloud Computing, Data Science & Engineering (Confluence). — IEEE. — P. 488–493. https://doi.org/10.1109/CONFLUENCE.2019.8776942 [in Eng].
- Vynokurova O., Peleshko D., Bondarenko O., Ilyasov V., Serzhantov V., Peleshko M. (2020). Hybrid machine learning system for solving fraud detection tasks. In: 2020 IEEE Third International Conference on Data Stream Mining & Processing (DSMP). — IEEE. — P. 1–5. https://doi.org/10.1109/DSMP47368.2020.9204244 [in Eng].
- West J., Bhattacharya M. (2016). An investigation on experimental issues in financial fraud mining. In: 2016 IEEE 11th Conference on Industrial Electronics and Applications (ICIEA). — IEEE. — P. 1796–1801. https://doi.org/10.1109/ICIEA.2016.7603878. [in Eng].
- Zareapoor M., Seeja K.R., Alam M.A. (2012). Analysis on credit card fraud detection techniques: based on certain design criteria. International Journal of Computer Applications. — 2012. — 52(3). https://doi.org/10.5120/8184-1538. [in Eng].
- Zhang R., Cheng Y., Wang L., Sang N., Xu J. (2023). Efficient Bank Fraud Detection with Machine Learning. Journal of Computational Methods in Engineering Applications. — 2023. — P. 1–10. https://doi.org/10.62836/jcmea.v3i1.030102 [in Eng].
- Zojaji Z., Atani R.E., Monadjemi A.H. (2016). A survey of credit card fraud detection techniques: data and technique oriented perspective. arXiv preprint arXiv:1611.06439. — 2016. https://doi.org/10.48550/arXiv.1611.06439 [in Eng].