Credit-Card-Fraud-Detection

This is a Python script that analyses the credit card fraud dataset by performing pre-processing, feature selection, and classification using several models. The script first imports the necessary libraries and reads in the dataset using pandas. It then prints the shape and summary statistics of the dataset and checks for any missing values. If any missing values are found, the script prints the name of the column, the number of missing values, the length of the column, and the percentage of missing values.
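The missing-value report described above could look roughly like the following. This is a minimal sketch that assumes the data is held in a pandas DataFrame; the function name `report_missing` and the toy columns are hypothetical and not taken from the script.

```python
import pandas as pd


def report_missing(df: pd.DataFrame) -> list:
    """Print and return (column, n_missing, length, percent) for columns with gaps."""
    rows = []
    for col in df.columns:
        n_missing = int(df[col].isna().sum())
        if n_missing > 0:
            length = len(df[col])
            pct = 100.0 * n_missing / length
            print(f"{col}: {n_missing} missing of {length} ({pct:.2f}%)")
            rows.append((col, n_missing, length, pct))
    return rows


# Toy frame standing in for the real dataset (hypothetical values).
toy = pd.DataFrame({"Amount": [1.0, None, 3.0, 4.0], "Class": [0, 0, 1, 0]})
print(toy.shape)
print(toy.describe())
missing = report_missing(toy)
```

Columns without missing values produce no output, which matches the behaviour described above.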

Next, the script generates kernel density estimation (KDE) plots, a heatmap of the correlation matrix, box plots, individual box plots for each feature against the class variable, and a pie chart to show the distribution of classes in the dataset.

The dataset is then split into training and testing sets, and the ratio of non-fraudulent to fraudulent transactions is printed for both sets. The script uses a decision tree classifier for feature selection, and the resulting transformed training and testing sets are stored in trainXSel and testXSel.
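The split and feature-selection step can be sketched as below. This is an illustration under assumptions, not the script's actual code: it uses synthetic data, a stratified split, and scikit-learn's `SelectFromModel` wrapped around a `DecisionTreeClassifier`. Only the names `trainXSel` and `testXSel` come from the description above; the other names and all parameter values are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced data standing in for the real dataset (assumption).
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=4,
    weights=[0.95, 0.05], random_state=0,
)

# Stratifying keeps roughly the same class ratio in both splits (assumption:
# the original script may or may not stratify).
trainX, testX, trainY, testY = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

for name, labels in (("train", trainY), ("test", testY)):
    n_fraud = int(labels.sum())
    n_legit = len(labels) - n_fraud
    print(f"{name}: {n_legit / n_fraud:.1f} non-fraudulent per fraudulent")

# Keep the features whose tree importance is at least the mean importance
# (the default threshold SelectFromModel applies to tree-based estimators).
selector = SelectFromModel(DecisionTreeClassifier(random_state=0))
selector.fit(trainX, trainY)
trainXSel = selector.transform(trainX)
testXSel = selector.transform(testX)
print(trainXSel.shape, testXSel.shape)
```

The selector is fitted on the training set only and then applied to both sets, so the test data does not influence which features are kept.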

Finally, the script instantiates four classification models (Logistic Regression, Random Forest, AdaBoost, and Gradient Boosting) and passes them through a cross-validation pipeline, with oversampling used to address the class imbalance. The cross-validation F1-scores are printed for each model, and a data frame of model names, mean F1-scores, and standard deviations of F1-scores is returned.
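A cross-validation loop of this kind might be sketched as follows. The description does not say which oversampling method the script uses, so this example assumes plain random oversampling of the minority class, applied to the training folds only. The helper names, column names, synthetic data, and model settings are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold


def oversample_minority(X, y, seed=0):
    """Randomly duplicate class-1 rows until both classes have equal counts."""
    rng = np.random.default_rng(seed)
    minority = np.flatnonzero(y == 1)
    majority = np.flatnonzero(y == 0)
    extra = rng.choice(minority, size=len(majority) - len(minority), replace=True)
    idx = np.concatenate([majority, minority, extra])
    return X[idx], y[idx]


def cross_validate_f1(models, X, y, n_splits=5):
    """Return a data frame of model name, mean F1, and standard deviation of F1."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    rows = []
    for name, model in models.items():
        scores = []
        for train_idx, val_idx in cv.split(X, y):
            # Oversample the training fold only; the validation fold is untouched.
            X_tr, y_tr = oversample_minority(X[train_idx], y[train_idx])
            fitted = clone(model).fit(X_tr, y_tr)
            scores.append(f1_score(y[val_idx], fitted.predict(X[val_idx])))
        print(name, np.round(scores, 3))
        rows.append({"model": name, "meanF1": np.mean(scores), "stdF1": np.std(scores)})
    return pd.DataFrame(rows)


models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=25, random_state=0),
    "AdaBoost": AdaBoostClassifier(n_estimators=25, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(n_estimators=25, random_state=0),
}
X, y = make_classification(n_samples=600, n_features=8, weights=[0.9, 0.1], random_state=0)
results = cross_validate_f1(models, X, y, n_splits=3)
print(results)
```

Oversampling inside each fold, rather than before splitting, avoids duplicated fraud rows appearing in both the training and validation parts of a fold.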

Based on the results, the Random Forest classifier achieved the highest F1-score, 0.844082, ahead of the other three models. This indicates that the Random Forest model strikes a better balance between precision and recall when predicting the positive class (i.e., credit card fraud) in the imbalanced dataset after oversampling. The AdaBoost classifier achieved a much lower F1-score of 0.217858, below both the Random Forest and Gradient Boosting classifiers. This suggests that the AdaBoost model may be less effective at handling the class imbalance and at correctly identifying the positive class in the dataset.
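To make the "balance between precision and recall" concrete, the sketch below computes the F1-score from confusion-matrix counts. The counts are hypothetical and are not results from the script; the second case shows in general terms how high recall combined with many false positives still gives a low F1-score, and is not a diagnosis of the AdaBoost result.

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 is the harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for the positive (fraud) class.
balanced = f1_from_counts(tp=80, fp=20, fn=20)   # precision 0.8, recall 0.8
lopsided = f1_from_counts(tp=90, fp=600, fn=10)  # recall 0.9, precision about 0.13

print(f"balanced: {balanced:.3f}")
print(f"lopsided: {lopsided:.3f}")
```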

The script uses the Random Forest Classifier (RFC) algorithm to build a predictive model for this binary classification problem. It uses GridSearchCV from scikit-learn to tune the RFC hyperparameters class_weight, criterion, oob_score, ccp_alpha, max_samples, and n_estimators. After hyperparameter tuning, the script fits the best model on the training set and evaluates its performance on the test set using several metrics, including accuracy, balanced accuracy, precision, recall, F1 score, ROC AUC, and the confusion matrix. The output shows the values of these metrics for the best model.
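The tuning and evaluation step might look roughly like this. The hyperparameter names are the ones listed above, but the candidate values, the `scoring="f1"` choice, the number of folds, and the synthetic data are assumptions made only to keep the sketch small and fast.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    accuracy_score,
    balanced_accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic imbalanced data standing in for the selected features (assumption).
X, y = make_classification(n_samples=600, n_features=8, weights=[0.9, 0.1], random_state=0)
trainX, testX, trainY, testY = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical candidate values; the real grid is not given in this document.
param_grid = {
    "class_weight": [None, "balanced"],
    "criterion": ["gini", "entropy"],
    "oob_score": [True],
    "ccp_alpha": [0.0, 0.01],
    "max_samples": [None, 0.8],
    "n_estimators": [30],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0), param_grid, scoring="f1", cv=3
)
search.fit(trainX, trainY)  # refits the best combination on the full training set

best = search.best_estimator_
pred = best.predict(testX)
proba = best.predict_proba(testX)[:, 1]  # probability of the positive class

metrics = {
    "accuracy": accuracy_score(testY, pred),
    "balanced_accuracy": balanced_accuracy_score(testY, pred),
    "precision": precision_score(testY, pred, zero_division=0),
    "recall": recall_score(testY, pred),
    "f1": f1_score(testY, pred),
    "roc_auc": roc_auc_score(testY, proba),
}
cm = confusion_matrix(testY, pred)

print(search.best_params_)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
print(cm)
```

ROC AUC is computed from predicted probabilities rather than hard labels, while the other metrics use the predicted classes.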

Fraud detection pipeline scripts (fraudDetectorPipelineCLI.py and fraudDetectorPipelineGUI.py):

Introduction:

These scripts are designed to detect fraudulent transactions using a pre-trained machine learning model. The fraudDetectorPipelineCLI.py takes command line arguments to specify the input data file, feature selector model file, classification model file, and the output file. The fraudDetectorPipelineGUI.py opens a graphical user interface (GUI) window to select these files. Both scripts output the predicted classes for each transaction in the input data file, as well as the number of fraudulent transactions detected.

Instructions for fraudDetectorPipelineCLI.py:

Open a command prompt or terminal window and navigate to the directory where the script is saved.

Run the script with the following command:

python fraudDetectorPipelineCLI.py --dataPath [path to input data file] --featureSelectorPath [path to feature selector model file] --clfPath [path to classification model file] --outputPath [path to output file]

For example:

python fraudDetectorPipelineCLI.py --dataPath data.csv --featureSelectorPath featureSelector.pkl --clfPath clfModel.pkl --outputPath output.csv

The predicted classes for each transaction in the input data file will be saved to the output file, and the number of fraudulent transactions detected will be printed to the console.
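For orientation, the command-line interface above could be declared with `argparse` as sketched here. Only the four flag names come from the commands shown; the help texts, the `build_parser` name, and the decision to make every flag required are assumptions, and the loading and prediction steps are left out.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the four path arguments used by the CLI example above."""
    parser = argparse.ArgumentParser(
        description="Predict the class of each transaction in an input data file."
    )
    parser.add_argument("--dataPath", required=True, help="path to input data file")
    parser.add_argument(
        "--featureSelectorPath", required=True,
        help="path to feature selector model file",
    )
    parser.add_argument("--clfPath", required=True, help="path to classification model file")
    parser.add_argument("--outputPath", required=True, help="path to output file")
    return parser


# Parse the example command from above without touching the file system.
args = build_parser().parse_args([
    "--dataPath", "data.csv",
    "--featureSelectorPath", "featureSelector.pkl",
    "--clfPath", "clfModel.pkl",
    "--outputPath", "output.csv",
])
print(args.dataPath, args.featureSelectorPath, args.clfPath, args.outputPath)
```

Passing an explicit argument list to `parse_args` makes the sketch easy to try interactively; a real script would normally call `parse_args()` with no arguments so that the values come from the command line.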

Instructions for fraudDetectorPipelineGUI.py:

Double-click on the fraudDetectorPipelineGUI.py file. Alternatively, open a command prompt or terminal window, navigate to the directory where the script is saved, and run the script with the following command:

python fraudDetectorPipelineGUI.py

A GUI window will open to select the input data file, feature selector model file, and classification model file.

Select the files using the GUI window and click "Save" to select the output file.

The predicted classes for each transaction in the input data file will be saved to the output file, and the number of fraudulent transactions detected will be printed to the console.
