Reproducible Machine Learning for Credit Card Fraud Detection - Practical Handbook

Home Page: https://fraud-detection-handbook.github.io/fraud-detection-handbook/Foreword.html

License: Other

Jupyter Notebook 98.71% Python 0.79% CSS 0.01% JavaScript 0.01% Shell 0.01% TeX 0.49%
fraud-detection machine-learning data-science data-mining credit-card credit-card-fraud open-data

fraud-detection-handbook's Introduction

Early access

Preliminary version available at https://fraud-detection-handbook.github.io/fraud-detection-handbook/Foreword.html.

Motivations

Machine learning for credit card fraud detection (ML for CCFD) has become an active research field, as illustrated by the remarkable number of publications on the topic in the last decade.

There is no doubt that integrating machine learning techniques into payment card fraud detection systems has greatly improved their ability to detect fraud efficiently. At the same time, a major issue in this new research field is the lack of reproducibility. There are no recognized benchmarks or methodologies to compare and assess the proposed techniques.

This book aims to take a first step in this direction. All the techniques and results provided in this book are reproducible. Sections that include code are Jupyter notebooks, which can be executed either locally or in the cloud using Google Colab or Binder.

The intended audience is students and professionals interested in credit card fraud detection from a practical point of view. More generally, we think the book is also of interest to data practitioners and data scientists dealing with machine learning problems that involve sequential data and/or imbalanced classification.

Provisional table of contents:

  • Chapter 1: Book overview
  • Chapter 2: Background
  • Chapter 3: Getting started
  • Chapter 4: Performance metrics
  • Chapter 5: Model selection
  • Chapter 6: Imbalanced learning
  • Chapter 7: Deep learning
  • Chapter 8: Interpretability*

(*): Not yet published.

Current draft

The writing of the book is ongoing. Through this GitHub repository, we provide early access to the book. As of January 2022, the first seven chapters are available.

The online version of the current draft of this book is available here.

Any comment or suggestion is welcome. We recommend using GitHub issues to start a discussion on a topic, and pull requests to fix typos.

Compiling the book

In order to read and/or execute this book on your computer, you will need to clone this repository and compile the book.

This book is a Jupyter book. You will therefore first need to install Jupyter Book.

The compilation was tested with the following package versions:

sphinxcontrib-bibtex==2.2.1
Sphinx==4.2.0
jupyter-book==0.11.2

Once done, this is a two-step process:

  1. Clone this repository:
git clone https://github.com/Fraud-Detection-Handbook/fraud-detection-handbook
  2. Compile the book:
jupyter-book build fraud-detection-handbook

The book will be available locally at fraud-detection-handbook/_build/html/index.html.
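
Before compiling, it can help to confirm that the installed packages match the tested versions listed above. The following is a minimal sketch, not part of the repository: the helper name `find_mismatches` is hypothetical, and it assumes the packages are registered under the same distribution names as in the list.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed above as tested for compilation.
TESTED_VERSIONS = {
    "sphinxcontrib-bibtex": "2.2.1",
    "Sphinx": "4.2.0",
    "jupyter-book": "0.11.2",
}


def find_mismatches(tested, lookup=version):
    """Return {package: (tested, installed)} for packages that differ.

    `installed` is None when the package is not installed at all.
    `lookup` is injectable so the check can be exercised without
    touching the real environment (hypothetical helper, for illustration).
    """
    mismatches = {}
    for name, expected in tested.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in find_mismatches(TESTED_VERSIONS).items():
        found = installed or "not installed"
        print(f"{name}: tested with {expected}, found {found}")
```

A mismatch does not necessarily mean the build will fail; it only signals that the environment differs from the one the compilation was tested with.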

License

The code in the notebooks is released under a GNU GPL v3.0 license. The prose and pictures are released under a CC BY-SA 4.0 license.

If you wish to cite this book, you may use the following:

@book{leborgne2022fraud,
title={Reproducible Machine Learning for Credit Card Fraud Detection - Practical Handbook},
author={Le Borgne, Yann-A{\"e}l and Siblini, Wissam and Lebichot, Bertrand and Bontempi, Gianluca},
url={https://github.com/Fraud-Detection-Handbook/fraud-detection-handbook},
year={2022},
publisher={Universit{\'e} Libre de Bruxelles}
}

Authors

Acknowledgments

This book is the result of ten years of collaboration between the Machine Learning Group, Université Libre de Bruxelles, Belgium and Worldline.

  • ULB-MLG, Principal investigator: Gianluca Bontempi
  • Worldline, R&D Manager: Frédéric Oblé

We wish to thank all the colleagues who worked on this topic during this collaboration: Olivier Caelen (ULB-MLG/Worldline), Fabrizio Carcillo (ULB-MLG), Guillaume Coter (Worldline), Andrea Dal Pozzolo (ULB-MLG), Jacopo De Stefani (ULB-MLG), Rémy Fabry (Worldline), Liyun He-Guelton (Worldline), Gian Marco Paldino (ULB-MLG), Théo Verhelst (ULB-MLG).

The collaboration was made possible thanks to Innoviris, the Brussels Region Institute for Research and Innovation, through a series of grants which started in 2012 and ended in 2021.

  • 2018 to 2021. DefeatFraud: Assessment and validation of deep feature engineering and learning solutions for fraud detection. Innoviris Team Up Programme.
  • 2015 to 2018. BruFence: Scalable machine learning for automating defense system. Innoviris Bridge Programme.
  • 2012 to 2015. Adaptive real-time machine learning for credit card fraud detection. Innoviris Doctiris Programme.

The collaboration is continuing in the context of the Data Engineering for Data Science (DEDS) project - under the Horizon 2020 - Marie Skłodowska-Curie Innovative Training Networks (H2020-MSCA-ITN-2020) framework.

fraud-detection-handbook's People

Contributors

patrickxchong, yannael

fraud-detection-handbook's Issues

ENH: Add Anaconda dependencies for Python 3.9

Proposed new feature or change:

As many data scientists use Anaconda to install their project dependencies, I suggest adding an environment.yml file to the repository.

Moreover, Python 3.9 seems to work well with the current state of the code in the repository, so I would also suggest migrating to this version.

Abstract / Management Summary / Foreword

Small note: the link to the Foreword is incorrect. Maybe .html should become .md?

I think the book could use an Abstract or Management Summary containing the academic or business results.

It could be something like (sub-optimal example):
With methods described in this book, we could identify 0.2% anomalous transactions from unlabeled, actual transactions.

This would encourage readers to continue reading and contributing if those are the results they want,
or save them a lot of time if they aren't.

A template for this GitHub repository could be:

Abstract or Management Summary

Objective

There are parts in the Motivation section, but they could be more inspiring, something like:

This book aims to explore machine learning based technologies for fraud detection in financial transactions.

Methodology

...

Main Findings

...

Implications

...

Keywords

Artificial Intelligence, Machine Learning, Fraud Detection, Finance

Flag TX_DURING_NIGHT improvement

Congratulations! This is the best work I have found on fraud detection, and I hope you keep it up.
I have noticed that the is_night function does not work as expected because of the timestamp format.
image
In the last rows, you can clearly see transactions with a timestamp around 23:00 that are not flagged with 1.

This is a possible solution to resolve the issue.

def is_night(tx_datetime):
    # Get the hour of the transaction
    tx_hour = tx_datetime.hour
    # Binary value: opposite of day (day is an hour between 6 and 18, inclusive)
    night = not (6 <= tx_hour <= 18)
    return int(night)
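
To check the proposed rule on a few boundary hours, here is a minimal, self-contained sketch. It makes two assumptions that are not confirmed by the issue: that the "timestamp format" problem means timestamps may arrive as strings, and that those strings are ISO formatted. The sample timestamps are hypothetical and not taken from the handbook's dataset.

```python
from datetime import datetime


def is_night(tx_datetime):
    """Return 1 if the transaction hour falls outside 6..18 inclusive, else 0."""
    # Assumption: string timestamps are ISO formatted, e.g. "2018-04-01 23:15:00".
    if isinstance(tx_datetime, str):
        tx_datetime = datetime.fromisoformat(tx_datetime)
    tx_hour = tx_datetime.hour
    return int(not (6 <= tx_hour <= 18))


# Hypothetical sample timestamps covering both boundaries of the day window.
samples = [
    "2018-04-01 23:15:00",
    "2018-04-01 05:59:59",
    "2018-04-01 06:00:00",
    "2018-04-01 18:59:59",
    "2018-04-01 19:00:00",
]
flags = [is_night(ts) for ts in samples]
print(flags)
```

With this rule, any time from 06:00:00 through 18:59:59 counts as day, so a transaction at 23:15 is flagged as night.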

[Doubts] Regarding data simulation

Hello,
I have a few doubts regarding data simulation:

  1. Why are the customer as well as the terminal coordinates uniform? Shouldn't they be distributed on the basis of actual population density and be actual coordinates instead of being between 0 and 100? I think Europe can be fit in a rectangle and population densities can be used to get clustered data.

  2. Is there a sweet spot for n_customers to n_terminals ratio?

  3. The radius r is set to 5, which corresponds to around 100 available terminals for each customer.

Is there a specific reason to use 100 terminals/customer?

  4. Scenario 3: Every day, a list of 3 customers is drawn at random. In the next 14 days, 1/3 of their transactions have their amounts multiplied by 5 and marked as fraudulent.

Is there a specific reason to use 14 days? Could this number be based on real-world data instead? I doubt it takes 14 days for a customer to notice unwanted transactions on their card.
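
For question 3, the link between the radius and the number of reachable terminals can be estimated with simple geometry. This is a rough sketch, assuming terminals are placed uniformly on the 0 to 100 grid mentioned in question 1 and ignoring edge effects. The function names and the value n_terminals=10000 are illustrative assumptions, not taken from the issue.

```python
import math


def expected_terminals_in_radius(n_terminals, r, grid_size=100.0):
    """Expected number of uniformly placed terminals within distance r of a customer.

    Ignores edge effects: customers near the border of the grid see fewer terminals.
    """
    disc_area = math.pi * r ** 2
    return n_terminals * disc_area / grid_size ** 2


def terminals_needed(target, r, grid_size=100.0):
    """Number of terminals so that a customer sees `target` terminals on average."""
    return target * grid_size ** 2 / (math.pi * r ** 2)


# n_terminals=10000 is an illustrative value, not stated in the issue.
print(round(expected_terminals_in_radius(10000, 5), 1))
print(round(terminals_needed(100, 5)))
```

Under these assumptions, the count scales with r squared, so the number of terminals per customer follows from the choice of r and the terminal density rather than being an independent setting.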
