Financial Fraud Detector | Fernando Borrero Granell

Table of Contents:

Introduction.
Materials and Methods.
Results.
References.
License.

Introduction

This project applies machine learning to financial data and explores its potential and capabilities. Using Python and a large dataset of financial transactions, we train several machine learning models and a neural network, then compare them to determine the most effective approach. Along the way, we explore how data can be harnessed to improve financial predictions and analysis.

The code for this project can be found here.

Materials and Methods

The project was implemented in Python. First, the Pandas and NumPy libraries were used for data wrangling and cleaning. During data cleaning, a strong linear correlation (0.99) was observed between an account's new balance and the sum of its old balance and the transaction amount. Despite this high correlation, these columns were retained in the analysis, because transactions that break the relationship were found to be more likely to be fraudulent.
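The balance check described above can be sketched as follows. This is a minimal illustration on toy data, not the project's actual code: the column names (`old_balance`, `amount`, `new_balance`, `balance_mismatch`) and the generated values are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Toy data. The column names are hypothetical stand-ins for the dataset's fields.
rng = np.random.default_rng(0)
n = 1_000
df = pd.DataFrame({
    "old_balance": rng.uniform(0, 10_000, n),
    "amount": rng.uniform(0, 1_000, n),
})

# Most rows follow new = old + amount, as described in the text.
df["new_balance"] = df["old_balance"] + df["amount"]

# A handful of rows break the relationship (the "exceptions").
broken = rng.choice(n, size=10, replace=False)
df.loc[broken, "new_balance"] = 0.0

# Correlation between the recorded new balance and the expected one.
expected = df["old_balance"] + df["amount"]
corr = df["new_balance"].corr(expected)

# Keep the columns, but expose the exceptions as an explicit feature.
df["balance_mismatch"] = ~np.isclose(df["new_balance"], expected)

print(f"correlation: {corr:.3f}")
print(f"rows breaking the relationship: {df['balance_mismatch'].sum()}")
```

Turning the exceptions into a boolean column is one possible way to let a model use them directly; the project itself only states that the original columns were kept.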

The data was divided into training and testing datasets, and several machine learning models were created using the Scikit-learn and TensorFlow libraries. These models were trained and evaluated in a pipeline, and the best-performing model was then selected for further analysis.
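A split-train-evaluate loop of the kind described above might look like the sketch below. It runs on synthetic data from `make_classification` rather than the project's dataset, and several choices are assumptions made for illustration: the candidate models and their settings, the use of logistic regression for the regression model, the `StandardScaler` step, and selecting the best model by F1 score.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the transaction data (about 5% positives).
X, y = make_classification(
    n_samples=4_000, n_features=8, n_informative=5,
    weights=[0.95, 0.05], random_state=0,
)

# Stratify so that both splits keep the same share of positive cases.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical candidate models; the settings are illustrative only.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1_000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

results = {}
for name, model in candidates.items():
    # Each candidate gets the same preprocessing step in front of it.
    pipeline = make_pipeline(StandardScaler(), model)
    pipeline.fit(X_train, y_train)
    y_pred = pipeline.predict(X_test)
    results[name] = {
        "recall": recall_score(y_test, y_pred, zero_division=0),
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "f1": f1_score(y_test, y_pred, zero_division=0),
    }

# Assumption: pick the model with the highest F1 score on the test split.
best = max(results, key=lambda name: results[name]["f1"])
for name, scores in results.items():
    print(name, {metric: round(value, 2) for metric, value in scores.items()})
print("best by F1:", best)
```

With heavily imbalanced classes, accuracy alone says little, which is why the sketch reports recall, precision, and F1 instead.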

The methods used in this study include the following:

Data cleaning
Data wrangling
Exploratory data analysis
Machine learning
Deep learning
Hyperparameter optimization
Data visualization

Results

This section summarizes how each technique performed at detecting fraudulent transactions, measured by accuracy, precision, recall, and F1 score, in order to identify the best-performing model for detecting financial fraud. The results are as follows:

Model	Accuracy	Recall	Precision	F1 Score
Regression	94.4%	84%	4%	0.08
KNN	99.7%	76%	58%	0.65
Decision Tree	99.9%	85%	89%	0.87
Random Forest	99.9%	81%	95%	0.88
Deep learning	99.3%	98%	30%	0.46
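The F1 score in the table is the harmonic mean of precision and recall. The short sketch below recomputes it from the rounded percentages shown above; the function name `f1_from_rates` is introduced here for illustration. Because the inputs are already rounded, a recomputed value can differ from the table by about 0.01.

```python
def f1_from_rates(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs taken from the rounded values in the table.
reported = {
    "Regression": (0.04, 0.84),
    "KNN": (0.58, 0.76),
    "Decision Tree": (0.89, 0.85),
    "Random Forest": (0.95, 0.81),
    "Deep learning": (0.30, 0.98),
}

for model, (precision, recall) in reported.items():
    print(f"{model}: F1 = {f1_from_rates(precision, recall):.2f}")
```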

As a soft safety measure, the deep learning model would be the most effective option, since it detects 98% of fraudulent transactions. Although 70% of the alerts it generates may be false flags, these represent only a small portion (0.6%) of total transactions and would therefore not have a significant impact.

On the other hand, if a more aggressive approach is desired, the Random Forest model may be a better choice: it detects 81% of fraudulent transactions with a precision of 95%, producing only a small percentage (0.01%) of false alerts. This allows a high proportion of fraud to be detected without causing negative effects for legitimate customers.
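The trade-off between the two models can be made concrete by relating recall and precision to the share of all transactions that end up as false alerts. The sketch below does that arithmetic. The fraud rate used here (`ASSUMED_FRAUD_RATE`) is a hypothetical value chosen for illustration, not a figure reported by the project, and `false_alert_share` is a name introduced for this example.

```python
def false_alert_share(prevalence: float, recall: float, precision: float) -> float:
    """Fraction of ALL transactions that are flagged although legitimate.

    prevalence: fraction of transactions that are actually fraudulent
    recall:     fraction of fraudulent transactions that are flagged
    precision:  fraction of flagged transactions that are actually fraudulent
    """
    true_alerts = prevalence * recall       # fraud that gets flagged
    total_alerts = true_alerts / precision  # everything that gets flagged
    return total_alerts - true_alerts       # flagged but legitimate


# Hypothetical fraud rate, assumed only for this illustration.
ASSUMED_FRAUD_RATE = 0.0025

deep_learning = false_alert_share(ASSUMED_FRAUD_RATE, recall=0.98, precision=0.30)
random_forest = false_alert_share(ASSUMED_FRAUD_RATE, recall=0.81, precision=0.95)

print(f"Deep learning false alerts: {deep_learning:.4%} of all transactions")
print(f"Random Forest false alerts: {random_forest:.4%} of all transactions")
```

Because fraud is rare, even a model whose alerts are mostly false flags touches only a small share of legitimate transactions, while a high-precision model touches far fewer still.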

References

The synthetic data used in this project was sourced from Kaggle and was generated from a sample of one month of financial transaction records from a mobile money service. This data was carefully extracted to accurately reflect real-world scenarios.

License

This project is licensed under the MIT License, which permits the use, distribution, and modification of the code and materials, provided that the copyright notice and license text are included in all copies or substantial portions of the work.