Optimizing credit card fraud detection with random forests and SMOTE.
Journal: Scientific Reports
Published Date: May 22, 2025
Abstract
Credit card fraud is a growing concern in the banking sector, necessitating efficient detection methods to minimize financial losses. Credit card usage is increasing steadily, and with it the default rate that banks face. Although much research has investigated the efficacy of conventional Machine Learning (ML) models, Deep Learning (DL) techniques have received relatively less attention. This article presents a machine learning-based system to detect fraudulent transactions using a publicly available dataset of credit card transactions. The dataset is highly imbalanced, with fraudulent transactions representing less than 0.2% of the total, so it was processed using techniques such as the Synthetic Minority Over-sampling Technique (SMOTE) to handle class imbalance. To predict credit card default, this study evaluates the efficacy of a DL model and compares it with other ML models, such as Decision Tree (DT) and AdaBoost. The objective of this research is to identify the specific DL parameters that contribute to the observed improvements in the accuracy of credit card default prediction. The credit card defaulted-customer dataset is obtained from the UCI ML repository. Various techniques are then used to pre-process the raw data, and the outcomes are presented visually through exploratory data analysis (EDA). Furthermore, the algorithms' hyperparameters are tuned to evaluate the improvement in prediction. All models were evaluated using standard evaluation metrics. The evaluation indicates that AdaBoost and DT achieve the highest accuracy, 82%, in predicting credit card default, surpassing the 78% accuracy of the artificial neural network (ANN) model. Several classification algorithms, including Logistic Regression, Random Forest, and Neural Networks, were evaluated to determine their effectiveness in identifying fraudulent activities.
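The oversampling step mentioned above can be illustrated with a minimal sketch of the core SMOTE idea: each synthetic minority sample is an interpolation between a real minority sample and one of its nearest minority-class neighbours. This is not the authors' implementation. The function name `smote`, its parameters, and the toy `fraud` points are assumptions made for illustration, and in practice a maintained library implementation (for example, the `SMOTE` class in imbalanced-learn) would normally be used and applied to the training split only.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their k nearest minority-class neighbours (illustrative sketch)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    # For every minority sample, find the indices of its k nearest
    # minority neighbours by Euclidean distance.
    neighbours = []
    for i, x in enumerate(minority):
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(x, minority[j]))
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))   # pick a real minority sample
        j = rng.choice(neighbours[i])      # pick one of its neighbours
        gap = rng.random()                 # interpolation factor in [0, 1)
        x, nb = minority[i], minority[j]
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


# Hypothetical two-feature fraud samples, for illustration only.
fraud = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4], [0.9, 2.1]]
new_points = smote(fraud, n_synthetic=6, k=2)
print(len(new_points), "synthetic samples generated")
```

Because every synthetic point lies on a line segment between two real minority samples, the new points stay inside the region already occupied by the minority class instead of duplicating existing rows.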
The Random Forest model emerged as the best-performing algorithm, with an accuracy of 99.5% and a high recall score, indicating its robustness in detecting fraudulent transactions. This system can be deployed in real-time financial systems to strengthen fraud prevention mechanisms and ensure secure financial transactions.
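To see why recall is reported alongside accuracy on data this imbalanced, the following sketch computes both metrics on a hypothetical test set. All numbers are assumptions chosen for illustration, not results from the article: 10,000 transactions with a fraud rate rounded to 0.2%, a trivial baseline that labels everything legitimate, and a made-up detector that catches 16 of 20 frauds with 30 false alarms.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 means fraud."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def recall(y_true, y_pred):
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


def precision(y_true, y_pred):
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if (tp + fp) else 0.0


# Hypothetical test set: 20 frauds followed by 9,980 legitimate transactions.
y_true = [1] * 20 + [0] * 9980

# Baseline that never flags fraud.
always_legit = [0] * 10000

# Made-up detector: 16 frauds caught, 4 missed, 30 false alarms.
detector = [1] * 16 + [0] * 4 + [1] * 30 + [0] * 9950

for name, pred in [("always legitimate", always_legit), ("detector", detector)]:
    print(name, accuracy(y_true, pred), recall(y_true, pred), precision(y_true, pred))
```

Under these assumed numbers the never-flag baseline reaches 99.8% accuracy with zero recall, while the detector has slightly lower accuracy but 80% recall. Accuracy alone therefore says little about fraud detection quality at this class ratio.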