Financial Fraud Detector
A compilation of Machine Learning models that detect fraudulent transactions.
Introduction
This project applies machine learning to financial data and explores its potential for fraud detection. Using Python and a large dataset of financial transactions, we train several machine learning models and a neural network and compare them to determine the most effective approach. More broadly, the project explores how such data can be used to improve financial predictions and analysis.
The code for this project can be found here.
Materials and Methods
The project was implemented in Python. First, libraries such as Pandas and NumPy were used for data wrangling and cleaning. During data cleaning, a strong linear correlation (0.99) was observed between an account's new balance and the sum of its old balance and the transaction amount. Despite this near-redundancy, these columns were retained in the analysis, because the transactions that deviate from the relationship were found to be more likely to be fraudulent.
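The balance check described above can be sketched roughly as follows. This is a minimal illustration on synthetic stand-in data, not the project's actual code: the column names (`old_balance`, `amount`, `new_balance`) and the helper column `balance_mismatch` are hypothetical, and the number of inconsistent rows is chosen arbitrarily.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data. Column names are hypothetical, not the dataset's.
rng = np.random.default_rng(0)
n = 1_000
df = pd.DataFrame({
    "old_balance": rng.uniform(0, 10_000, n),
    "amount": rng.uniform(1, 500, n),
})

# The relationship from the text: new balance ~ old balance + amount.
df["expected_new_balance"] = df["old_balance"] + df["amount"]
df["new_balance"] = df["expected_new_balance"].copy()

# Make a handful of rows break the relationship (illustrative only).
odd_rows = rng.choice(n, size=10, replace=False)
df.loc[odd_rows, "new_balance"] = 0.0

# Pearson correlation between the observed and the expected new balance.
corr = df["new_balance"].corr(df["expected_new_balance"])

# Keep the columns, and flag the exceptions as a candidate fraud signal.
df["balance_mismatch"] = ~np.isclose(df["new_balance"], df["expected_new_balance"])

print(f"correlation: {corr:.3f}")
print(f"rows breaking the relationship: {df['balance_mismatch'].sum()}")
```

Keeping the columns and adding an explicit mismatch flag is one way to preserve the signal carried by the exceptions; whether the project used such a flag is not stated here.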
The data was divided into training and testing datasets, and several machine learning models were built with the Scikit-learn and TensorFlow libraries. These models were trained and evaluated for performance in a common pipeline, and the best-performing model was then selected for further analysis.
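As a rough sketch of this train, evaluate, and compare loop with Scikit-learn, assuming a synthetic imbalanced dataset in place of the real transactions: the model settings, the split ratio, and the use of `LogisticRegression` for the model labeled "Regression" are assumptions for illustration, and the neural network is omitted to keep the example short.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the transaction data (about 3% positives).
X, y = make_classification(
    n_samples=5_000, n_features=8, weights=[0.97, 0.03], random_state=0
)

# Stratify so both splits keep the same share of the rare class.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical model settings; scaling is added for the distance- and
# gradient-based models only.
models = {
    "Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "recall": recall_score(y_test, pred, zero_division=0),
        "precision": precision_score(y_test, pred, zero_division=0),
        "f1": f1_score(y_test, pred, zero_division=0),
    }

for name, scores in results.items():
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
```

Evaluating every model on the same held-out split with the same metrics is what makes the comparison meaningful; on data this imbalanced, accuracy alone says little, which is why recall, precision, and F1 are reported alongside it.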
The methods used in this study include the following:
- Data cleaning
- Data wrangling
- Exploratory data analysis
- Machine learning
- Deep learning
- Hyperparameter optimization
- Data visualization
Results
This section summarizes how each technique performed at detecting fraudulent transactions, measured by accuracy, recall, precision, and F1 score, in order to identify the best-performing model for detecting financial fraud. The results are as follows:
| Model | Accuracy | Recall | Precision | F1 Score |
|---|---|---|---|---|
| Regression | 94.4% | 84% | 4% | 0.08 |
| KNN | 99.7% | 76% | 58% | 0.65 |
| Decision Tree | 99.9% | 85% | 89% | 0.87 |
| Random Forest | 99.9% | 81% | 95% | 0.88 |
| Deep learning | 99.3% | 98% | 30% | 0.46 |
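For reference, the F1 score is the harmonic mean of precision and recall. The short sketch below recomputes it from the rounded percentages in the table; the function name `f1_from` is just an illustrative helper.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as reported in the table, as fractions.
reported = {
    "Regression": (0.04, 0.84),
    "KNN": (0.58, 0.76),
    "Decision Tree": (0.89, 0.85),
    "Random Forest": (0.95, 0.81),
    "Deep learning": (0.30, 0.98),
}

for name, (precision, recall) in reported.items():
    print(f"{name}: F1 = {f1_from(precision, recall):.2f}")
```

Because the inputs here are already rounded, a recomputed value can differ from the table by about 0.01; the table's scores were presumably computed from the unrounded predictions.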
For a soft safety measure, the deep learning model would be the most effective option, since it detects 98% of fraudulent transactions. Although about 70% of the alerts it generates are false positives (its precision is 30%), these false alerts amount to only a small share (0.6%) of all transactions and so would not have a significant impact.
On the other hand, if a more aggressive approach is desired, the Random Forest model may be a better choice: it detects 81% of fraudulent transactions with 95% precision, so false alerts make up only a small percentage (0.01%) of transactions. This allows a high proportion of fraud to be detected without causing negative effects for legitimate customers.
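The trade-off between the two models can be made concrete by relating recall and precision to alert volume. The sketch below is illustrative arithmetic only: the fraud rate of 0.25% and the volume of one million transactions are assumptions, not figures from this project, and `alert_breakdown` is a hypothetical helper.

```python
def alert_breakdown(n_transactions: int, fraud_rate: float,
                    recall: float, precision: float) -> dict:
    """Estimate alert counts from recall, precision, and an assumed fraud rate."""
    n_fraud = n_transactions * fraud_rate
    true_alerts = recall * n_fraud            # frauds that get flagged
    total_alerts = true_alerts / precision    # precision = true alerts / all alerts
    false_alerts = total_alerts - true_alerts
    return {
        "true_alerts": true_alerts,
        "false_alerts": false_alerts,
        "missed_fraud": n_fraud - true_alerts,
        "false_alert_share": false_alerts / n_transactions,
    }


# Assumed values for illustration; the project does not state them here.
N_TRANSACTIONS = 1_000_000
FRAUD_RATE = 0.0025

deep_learning = alert_breakdown(N_TRANSACTIONS, FRAUD_RATE, recall=0.98, precision=0.30)
random_forest = alert_breakdown(N_TRANSACTIONS, FRAUD_RATE, recall=0.81, precision=0.95)

for name, stats in [("Deep learning", deep_learning), ("Random Forest", random_forest)]:
    print(
        f"{name}: {stats['false_alerts']:.0f} false alerts "
        f"({stats['false_alert_share']:.4%} of transactions), "
        f"{stats['missed_fraud']:.0f} frauds missed"
    )
```

Under the assumed fraud rate, the false-alert shares come out close to the 0.6% and 0.01% quoted above; a different fraud rate would scale both shares proportionally. The same calculation also shows the cost on the other side: the higher-precision model misses more fraud.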
References
The synthetic data used in this project was sourced from Kaggle and was generated from a sample of one month of financial transaction records from a mobile money service. The data was designed to reflect real-world scenarios accurately.
License
This project is licensed under the MIT License, which permits the use, distribution, and modification of the code and materials, provided that the copyright notice and license text are included in copies or substantial portions of the work. Modifications do not have to be shared under the same license.