Credit-Card-Fraud-Detection

This is a Python script that analyses the credit card fraud dataset by performing pre-processing, feature selection, and classification using several models. The script first imports the necessary libraries and reads in the dataset using pandas. It then prints the shape and summary statistics of the dataset and checks for any missing values. If any missing values are found, the script prints the name of the column, the number of missing values, the length of the column, and the percentage of missing values.
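The missing-value check described above could look roughly like the sketch below. It is a minimal illustration, not the script's own code. The function name report_missing and the toy column names are hypothetical, and only pandas is assumed.

```python
import pandas as pd


def report_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Summarise columns that contain missing values (hypothetical helper).

    One row per affected column: column name, number of missing values,
    column length, and percentage missing.
    """
    rows = []
    for col in df.columns:
        n_missing = int(df[col].isna().sum())
        if n_missing > 0:
            rows.append({
                "column": col,
                "missing": n_missing,
                "length": len(df[col]),
                "percent": 100.0 * n_missing / len(df[col]),
            })
    return pd.DataFrame(rows, columns=["column", "missing", "length", "percent"])


if __name__ == "__main__":
    # Tiny hypothetical frame standing in for the real dataset.
    demo = pd.DataFrame({
        "V1": [0.1, None, 0.3, 0.4],
        "Amount": [10.0, 20.0, None, None],
        "Class": [0, 0, 1, 0],
    })
    print(demo.shape)        # shape of the data
    print(demo.describe())   # summary statistics
    print(report_missing(demo))
```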

Next, the script generates kernel density estimation (KDE) plots, a heatmap of the correlation matrix, box plots, individual box plots for each feature against the class variable, and a pie chart to show the distribution of classes in the dataset.
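A rough sketch of two of these plots, the class pie chart and the correlation heatmap, follows. It uses plain matplotlib with the off-screen Agg backend rather than whatever plotting library the original script relies on. The function names are hypothetical, and the class column is assumed to hold 0 for legitimate and 1 for fraudulent transactions.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend, so no display is required
import matplotlib.pyplot as plt
import pandas as pd


def plot_class_pie(labels: pd.Series):
    """Pie chart of class frequencies; returns the figure and the counts."""
    counts = labels.value_counts().sort_index()
    fig, ax = plt.subplots()
    ax.pie(counts.values, labels=[str(c) for c in counts.index], autopct="%1.2f%%")
    ax.set_title("Class distribution")
    return fig, counts


def plot_corr_heatmap(df: pd.DataFrame):
    """Heatmap of the Pearson correlation matrix; returns the figure and the matrix."""
    corr = df.corr()
    fig, ax = plt.subplots(figsize=(8, 8))
    im = ax.imshow(corr.values, cmap="coolwarm", vmin=-1, vmax=1)
    ticks = list(range(len(corr.columns)))
    ax.set_xticks(ticks)
    ax.set_xticklabels(corr.columns, rotation=90)
    ax.set_yticks(ticks)
    ax.set_yticklabels(corr.columns)
    fig.colorbar(im, ax=ax)
    return fig, corr
```

On a fraud dataset the pie chart makes the class imbalance obvious at a glance, which motivates the oversampling used later.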

The dataset is then split into training and testing sets, and the ratio of non-fraudulent to fraudulent transactions is printed for both sets. The script uses a decision tree classifier for feature selection, and the resulting transformed training and testing sets are stored in trainXSel and testXSel.
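The split and the decision-tree feature selection might be sketched as follows. This assumes scikit-learn's SelectFromModel wrapped around a DecisionTreeClassifier, a stratified 80/20 split, and labels coded 0 (non-fraud) and 1 (fraud). None of these choices is confirmed by the original script, and split_and_select is a hypothetical helper; only trainXSel and testXSel are names taken from the description.

```python
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def split_and_select(X, y, test_size=0.2, random_state=42):
    """Hypothetical helper: stratified split, ratio report, tree-based selection."""
    trainX, testX, trainY, testY = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=random_state)

    # Ratio of non-fraudulent (0) to fraudulent (1) rows in each set.
    for name, labels in (("train", trainY), ("test", testY)):
        labels = np.asarray(labels)
        n_fraud = int((labels == 1).sum())
        n_legit = int((labels == 0).sum())
        print(f"{name}: non-fraud/fraud ratio = {n_legit / n_fraud:.2f}")

    # Keep features whose tree importance is at least the mean importance
    # (SelectFromModel's default threshold for tree-based estimators).
    selector = SelectFromModel(DecisionTreeClassifier(random_state=random_state))
    selector.fit(trainX, trainY)
    trainXSel = selector.transform(trainX)
    testXSel = selector.transform(testX)
    return selector, trainXSel, testXSel, trainY, testY
```

The selector is fitted on the training set only and then applied unchanged to the test set, so no test information influences which features are kept.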

Finally, the script instantiates four classification models and passes them through a cross-validation pipeline. The cross-validation F1-scores are printed for each model, and a data frame of model names, mean F1-scores, and standard deviations of F1-scores is returned. The results show the mean F1-score and standard deviation of the F1-score for each of the four classification models (Logistic Regression, Random Forest, AdaBoost, and Gradient Boosting) after performing oversampling to address the class imbalance issue.
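The cross-validation loop with oversampling could be sketched like this. The description does not say which oversampling technique is used, so this version assumes plain random oversampling of the minority class with sklearn.utils.resample, applied inside each training fold only. The helpers oversample and cross_validate_models, the stratified k-fold setup, and the default model settings are all hypothetical, and the labels are assumed to be binary with fraud coded as 1.

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold
from sklearn.utils import resample


def oversample(X, y, random_state=0):
    """Randomly duplicate minority-class rows until both classes are equal in size."""
    X, y = np.asarray(X), np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    X_min, y_min = X[y == minority], y[y == minority]
    X_maj, y_maj = X[y != minority], y[y != minority]
    X_up, y_up = resample(X_min, y_min, replace=True,
                          n_samples=len(y_maj), random_state=random_state)
    return np.vstack([X_maj, X_up]), np.concatenate([y_maj, y_up])


def cross_validate_models(models, X, y, n_splits=5, random_state=0):
    """Per-fold F1 for each model; returns name, mean F1 and std of F1."""
    X, y = np.asarray(X), np.asarray(y)
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    rows = []
    for name, model in models.items():
        scores = []
        for train_idx, val_idx in skf.split(X, y):
            # Oversample the training fold only; the validation fold keeps its real balance.
            X_tr, y_tr = oversample(X[train_idx], y[train_idx], random_state)
            fitted = clone(model).fit(X_tr, y_tr)
            scores.append(f1_score(y[val_idx], fitted.predict(X[val_idx])))
        print(f"{name} F1 per fold: {np.round(scores, 4)}")
        rows.append({"model": name, "meanF1": np.mean(scores), "stdF1": np.std(scores)})
    return pd.DataFrame(rows)


models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "RandomForest": RandomForestClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "GradientBoosting": GradientBoostingClassifier(random_state=0),
}
```

Oversampling inside the loop, rather than before splitting, keeps duplicated fraud rows out of the validation folds, which would otherwise inflate the F1-scores.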

Based on the results, the Random Forest classifier achieved the highest mean F1-score, 0.844082, well above the other models. This suggests that, after oversampling, Random Forest strikes a better balance between precision and recall when predicting the positive class (credit card fraud) in this imbalanced dataset. The AdaBoost classifier scored much lower, with an F1-score of 0.217858, below both Random Forest and Gradient Boosting. This suggests that AdaBoost may be less effective at coping with the class imbalance and at correctly identifying the positive class in this dataset.
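As a reminder of what these numbers measure, the F1-score is the harmonic mean of precision and recall for the positive (fraud) class: F1 = 2PR / (P + R) = 2TP / (2TP + FP + FN). The toy labels below are hypothetical and simply show the formula agreeing with scikit-learn's f1_score.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Hypothetical labels: 1 = fraud, 0 = legitimate.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]  # TP = 2, FN = 2, FP = 1

p = precision_score(y_true, y_pred)  # TP / (TP + FP) = 2 / 3
r = recall_score(y_true, y_pred)     # TP / (TP + FN) = 2 / 4
f1_manual = 2 * p * r / (p + r)      # harmonic mean = 4 / 7
print(f"precision={p:.4f} recall={r:.4f} F1={f1_manual:.4f} "
      f"sklearn F1={f1_score(y_true, y_pred):.4f}")
# precision=0.6667 recall=0.5000 F1=0.5714 sklearn F1=0.5714
```

Because the harmonic mean is pulled towards the smaller of the two values, a model cannot score well on F1 by maximising only precision or only recall.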

The script then uses the Random Forest Classifier (RFC) to build a predictive model for this binary classification problem. It performs hyperparameter tuning with scikit-learn's GridSearchCV over class_weight, criterion, oob_score, ccp_alpha, max_samples, and n_estimators. After tuning, the script fits the best model on the training set and evaluates it on the test set using accuracy, balanced accuracy, precision, recall, F1-score, ROC AUC, and the confusion matrix. The output shows the values of these metrics for the best model.
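A sketch of the tuning and evaluation step is below. The grid values, scoring="f1", and the helper name tune_and_evaluate are assumptions for illustration; only the hyperparameter names and the list of metrics come from the description. Note that oob_score and max_samples only take effect when bootstrap=True, which is RandomForestClassifier's default. The full grid has 64 combinations, so fitting it on the real dataset can take a while.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, balanced_accuracy_score, confusion_matrix,
                             f1_score, precision_score, recall_score, roc_auc_score)
from sklearn.model_selection import GridSearchCV

# Hypothetical grid over the hyperparameters named above; values are illustrative.
PARAM_GRID = {
    "class_weight": [None, "balanced"],
    "criterion": ["gini", "entropy"],
    "oob_score": [False, True],
    "ccp_alpha": [0.0, 0.001],
    "max_samples": [None, 0.8],
    "n_estimators": [50, 100],
}


def tune_and_evaluate(trainX, trainY, testX, testY, param_grid=PARAM_GRID, cv=5):
    """Grid-search an RFC on the training set, then score the best model on the test set."""
    search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid,
                          scoring="f1", cv=cv)
    search.fit(trainX, trainY)
    best = search.best_estimator_  # refit on the whole training set (refit=True default)

    pred = best.predict(testX)
    proba = best.predict_proba(testX)[:, 1]  # probability of class 1 (fraud)
    metrics = {
        "accuracy": accuracy_score(testY, pred),
        "balanced_accuracy": balanced_accuracy_score(testY, pred),
        "precision": precision_score(testY, pred),
        "recall": recall_score(testY, pred),
        "f1": f1_score(testY, pred),
        "roc_auc": roc_auc_score(testY, proba),  # uses scores, not hard labels
    }
    return search.best_params_, metrics, confusion_matrix(testY, pred)
```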

For the FraudDetectionpipeline scripts:

Introduction:

These scripts detect fraudulent transactions using pre-trained machine learning models: a feature selector and a classifier. fraudDetectorPipelineCLI.py takes command-line arguments specifying the input data file, the feature selector model file, the classification model file, and the output file, while fraudDetectorPipelineGUI.py opens a graphical user interface (GUI) window for selecting these files. Both scripts output the predicted class of each transaction in the input data file, as well as the number of fraudulent transactions detected.
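As described, the core of both scripts is: load the data, apply the feature selector, predict, save the predictions, and count the frauds. A hypothetical sketch follows. It assumes the two model files are pickled scikit-learn objects, that the input CSV contains only the feature columns in training order, and that fraud is labelled 1. The function run_pipeline and the output column name predictedClass are illustrative, not taken from the scripts.

```python
import pickle

import pandas as pd


def run_pipeline(data_path, selector_path, clf_path, output_path):
    """Hypothetical sketch: select features, predict, save predictions, count frauds."""
    data = pd.read_csv(data_path)

    # Only unpickle files from a trusted source: loading a pickle can execute code.
    with open(selector_path, "rb") as f:
        selector = pickle.load(f)
    with open(clf_path, "rb") as f:
        clf = pickle.load(f)

    X_sel = selector.transform(data)  # same feature subset as at training time
    predictions = clf.predict(X_sel)

    pd.DataFrame({"predictedClass": predictions}).to_csv(output_path, index=False)
    n_fraud = int((predictions == 1).sum())
    print(f"Fraudulent transactions detected: {n_fraud}")
    return n_fraud
```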

Instructions for fraudDetectorPipelineCLI.py:

Open a command prompt or terminal window and navigate to the directory where the script is saved.

Run the script with the following command:

python fraudDetectorPipelineCLI.py --dataPath [path to input data file] --featureSelectorPath [path to feature selector model file] --clfPath [path to classification model file] --outputPath [path to output file]

For example:

python fraudDetectorPipelineCLI.py --dataPath data.csv --featureSelectorPath featureSelector.pkl --clfPath clfModel.pkl --outputPath output.csv

The predicted classes for each transaction in the input data file will be saved to the output file, and the number of fraudulent transactions detected will be printed to the console.
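The four documented flags map naturally onto argparse. The sketch below is a guess at how such a parser could be declared, not the script's actual code; build_parser and the help strings are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for the four flags documented above."""
    parser = argparse.ArgumentParser(
        description="Detect fraudulent transactions with a saved feature selector and classifier.")
    parser.add_argument("--dataPath", required=True, help="path to the input data file")
    parser.add_argument("--featureSelectorPath", required=True, help="path to the feature selector model file")
    parser.add_argument("--clfPath", required=True, help="path to the classification model file")
    parser.add_argument("--outputPath", required=True, help="path to the output file")
    return parser
```

argparse keeps the camelCase flag names as attribute names, so parsing the example command line gives args.dataPath == "data.csv", args.clfPath == "clfModel.pkl", and so on. Marking every flag required=True makes argparse print a usage message and exit if one is missing.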

Instructions for fraudDetectorPipelineGUI.py:

Double-click the fraudDetectorPipelineGUI.py file.

Or

Open a command prompt or terminal window and navigate to the directory where the script is saved.

Run the script with the following command:

python fraudDetectorPipelineGUI.py

A GUI window will open to select the input data file, feature selector model file, and classification model file.

Select the files in the GUI window, then click "Save" to choose the output file.

The predicted classes for each transaction in the input data file will be saved to the output file, and the number of fraudulent transactions detected will be printed to the console.

Contributors: imyaash