
fraud_detector's Introduction

This project identifies fraudsters using a hybrid of statistical, machine learning, and graph techniques. The statistical analysis takes a time window and analyzes the raw log to derive cumulative statistics, from which anomalies (items that differ significantly from the norm) are identified. In parallel, a machine learning classifier takes the same raw data within the time window and classifies which items are anomalous.
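The statistical step can be sketched as follows. This is a minimal illustration only, not the project's actual code: the `(caller, callee, timestamp)` record layout, the `flag_anomalies` name, the use of a z-score on per-caller call counts, and the threshold of 2.0 are all assumptions made for the example.

```python
from collections import Counter
from statistics import mean, pstdev


def flag_anomalies(call_log, window_start, window_end, z_threshold=2.0):
    """Return callers whose call count inside the window is unusually high.

    call_log is an iterable of (caller, callee, timestamp) tuples
    (a hypothetical layout chosen for this sketch).
    """
    # Cumulative statistic: number of calls made per caller in the window.
    counts = Counter(
        caller
        for caller, _callee, ts in call_log
        if window_start <= ts < window_end
    )
    if len(counts) < 2:
        return []
    mu = mean(counts.values())
    sigma = pstdev(counts.values())
    if sigma == 0:
        return []  # every caller behaves identically, nothing stands out
    # Flag callers whose count lies more than z_threshold deviations above the mean.
    return sorted(c for c, n in counts.items() if (n - mu) / sigma > z_threshold)


# Toy log: "X" makes 20 calls, everyone else makes one or two.
call_log = [("X", f"N{i}", i) for i in range(20)]
call_log += [
    ("A", "B", 3), ("A", "C", 9), ("B", "A", 12), ("C", "A", 20),
    ("C", "B", 31), ("D", "A", 40), ("E", "D", 50),
    ("F", "G", 100),  # falls outside the window used below
]

suspects = flag_anomalies(call_log, window_start=0, window_end=60)
print(suspects)
```

A per-caller call count is only one possible cumulative statistic; call duration or the number of distinct callees could be substituted without changing the structure.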

Having identified these anomalous items, we then apply graph analysis to them. Using a longer time window (since a longer history may provide further evidence as to whether these items are indeed fraudsters), we model the items as nodes and the communications between them as directed edges. On this graph, we perform a classification using graph metrics: we derive metrics such as triangle count and identify the nodes that have the weakest communities.
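As a rough sketch of the triangle count metric, the example below uses only the Python standard library. The function names and the toy call list are hypothetical. Directed call edges are collapsed into undirected neighbor sets here, which is an assumption of this sketch: triangle counting is conventionally defined on undirected graphs, and NetworkX's `triangles` function likewise expects an undirected graph.

```python
from collections import defaultdict
from itertools import combinations


def build_neighbors(calls):
    """Build undirected neighbor sets from directed (caller, callee) pairs."""
    neighbors = defaultdict(set)
    for caller, callee in calls:
        if caller != callee:  # ignore self-calls
            neighbors[caller].add(callee)
            neighbors[callee].add(caller)
    return neighbors


def triangle_count(neighbors, node):
    """Count the triangles that include `node`.

    A triangle exists for every pair of the node's neighbors
    that are also connected to each other.
    """
    return sum(
        1
        for u, v in combinations(neighbors[node], 2)
        if v in neighbors[u]
    )


# Toy data: A, B and C all call each other, while X calls four
# numbers that never call one another.
calls = [
    ("A", "B"), ("B", "C"), ("C", "A"),
    ("X", "P"), ("X", "Q"), ("X", "R"), ("X", "S"),
]
neighbors = build_neighbors(calls)

for number in ("A", "X"):
    print(number, triangle_count(neighbors, number))
```

In this toy graph, a number embedded in a group of mutual contacts takes part in a triangle, while a number that only calls unrelated recipients takes part in none.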

The nodes identified by the graph analysis are much more likely to be fraudsters. These nodes are then passed to a human fraud analyst for confirmation. Once confirmed, the labeled data is fed back as training data for the machine learning classifier mentioned above, which is a random forest.

### Results

The project visualization can be seen at www.fraud-detector.net. The proof-of-concept implementation, run on public test data, indicated that the approach successfully identified nodes that appeared suspicious. However, further work is needed to confirm that the graph metrics used do in fact characterize how a fraudster behaves.

### Motivation

$2 billion a year is lost to a specific type of telecommunication fraud.

The challenge is to identify the fraudsters while strictly minimizing the number of false positives.

Legacy Approach

The legacy approach employs a statistical treatment in which phone numbers that receive or make an excessive number of calls are examined further by a human analyst.

Limitations of Current Solution

Smart fraudsters may be able to fool statistical detection by adapting their fraudulent call rates so that the rates stay within the moving average windows. The other limitation is that this approach is prone to false positives, which are very costly and undermine trust in the fraud detection system in general.
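The evasion problem can be illustrated with a toy detector. The rule used here (alert when a period's call count exceeds a fixed multiple of the mean of the preceding periods) is an assumption for illustration, as are the function name, the window of 3, the factor of 2.0, and the call counts; the source does not specify how the legacy system computes its thresholds.

```python
def moving_average_alerts(counts, window=3, factor=2.0):
    """Return the indices where a count exceeds `factor` times the
    mean of the previous `window` counts (a hypothetical alert rule)."""
    alerts = []
    for i in range(window, len(counts)):
        baseline = sum(counts[i - window:i]) / window
        if counts[i] > factor * baseline:
            alerts.append(i)
    return alerts


# Calls per period for two numbers that both end up at 60 calls per period.
burst = [10, 10, 10, 60, 60, 60]        # sudden jump in volume
ramp = [10, 13, 17, 22, 29, 38, 49, 60]  # gradual increase

print(moving_average_alerts(burst))
print(moving_average_alerts(ramp))
```

The sudden jump triggers alerts, while the gradual ramp reaches the same volume without ever exceeding the moving baseline, because the baseline rises along with the traffic.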

Solution

This project presents a hybrid approach, comprising statistical, machine learning, and graph analysis, to minimize false positives. The statistical and machine learning classifiers identify suspicious phone numbers, and graph analysis metrics are then used to narrow this list down to the phone numbers whose calling patterns resemble those of suspected fraudsters. For example, fraudsters tend to call random individuals who most likely don't know each other. In graph theory, one metric that could quantify this is 'triangle count'. Using such graph metrics, the number of false positives can be reduced.

Flow Diagram

Code Structure

The code mirrors the data flow diagram: each box corresponds to a code file. The main file is Fraud_Detection.py, while the other files cover the rest of the flow. Fraud Detection.ipynb is an IPython notebook that captures the various pipeline segments.

Data Source

This project was done in the context of a hackathon, where the sponsors provided data from their call traffic (phone numbers were de-identified).

Technologies

PostgreSQL, NetworkX, GraphLab (prototyped only), Spark (prototyped only), Python, Flask, D3.js (JavaScript)
