
fraud-detection-enron's Introduction

Identifying Fraud from Enron Email

About This Project

The goal of this project is to identify whether a person is guilty of involvement in the notorious Enron fraud, using publicly available information such as financial records and emails.

From oil trading and bandwidth trading to weather trading with mark-to-market accounting, Enron extended its reach to a wide variety of commodities, built connections with politicians such as George W. Bush, and caused great losses to the public, including the California electricity shortage. All of this information could be useful if text learning were applied to the emails, and certain patterns might emerge; for example, a pattern indicating that a person in a decision-making role is more likely to be a person of interest. However, text learning is not applied in this analysis, since it is a more advanced topic.

This analysis uses a financial dataset containing people's salaries, stock information, and so on. During the Enron fraud, people such as Jeffrey Skilling, Ken Lay, and Fastow all dumped large amounts of stock options, and they are all guilty. This information can be very helpful when checking other persons of interest, and it is readily reflected in the dataset. This is also where machine learning comes into play. By building models that capture the relationship between the person-of-interest label and the available quantitative data, machine learning tries to find and learn a pattern that helps us identify a guilty person in the future.

Overview of Main Report

To view the reports online,

The full report is in the documentations directory, named "training_main.html". For a more compact, summarized report, see "documentation.html" in the same directory.

This report walks through a series of investigations performed to build a robust final estimator that predicts a person of interest (poi). These include:

  • an overview of the dataset.
  • outlier cleaning.
  • a performance comparison among different feature scaling methods, including MinMaxScaler, StandardScaler, and Normalizer.
  • creating three features, "stock_salary_ratio", "poi_from_ratio", "poi_to_ratio", and evaluating them.
  • a performance comparison between two different feature selection methods, SelectKBest and ExtraTreesClassifier.
  • a performance comparison between including PCA and excluding PCA.
  • a performance comparison between different classifiers, LinearSVC and KNeighborsClassifier.
  • tuning algorithms using F1 score as evaluation metric.
  • cross-validation on the final estimator.
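The steps listed above can be sketched as a single scikit-learn pipeline tuned on F1 score. This is only a minimal illustration, not the code from the report: the data is a synthetic stand-in generated with make_classification, and the step names, the parameter grid, and the choice of MinMaxScaler with KNeighborsClassifier are assumptions picked from the options the list mentions.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the real data (assumption): small and imbalanced,
# with far fewer positive (poi) samples than negative ones.
X, y = make_classification(
    n_samples=140, n_features=20, n_informative=5,
    weights=[0.87, 0.13], random_state=42,
)

# Scale -> select features -> reduce dimensions -> classify.
pipeline = Pipeline([
    ("scale", MinMaxScaler()),
    ("select", SelectKBest(score_func=f_classif)),
    ("pca", PCA()),
    ("clf", KNeighborsClassifier()),
])

# Hypothetical grid; the values are illustrative only.
param_grid = {
    "select__k": [5, 10],
    "pca__n_components": [2, 4],
    "clf__n_neighbors": [3, 5],
}

# Stratified shuffling keeps the class ratio in every split, which matters
# when the positive class is this rare.
cv = StratifiedShuffleSplit(n_splits=10, test_size=0.3, random_state=42)

# F1 is used instead of accuracy because always predicting "not a poi"
# would already score high accuracy on imbalanced data.
search = GridSearchCV(pipeline, param_grid, scoring="f1", cv=cv)
search.fit(X, y)

best_params = search.best_params_
best_f1 = search.best_score_
print(best_params, round(best_f1, 3))
```

Putting the scaler, selector, and PCA inside the pipeline means each is refit on the training part of every split, so the cross-validated score is not inflated by information from the held-out part. A different classifier such as LinearSVC could be compared by adding a "clf" entry to the grid.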

Several helper functions were built for this project in poi_helper.py, which can be found in tools/. For more details, the notebook poi_id.ipynb records the reasoning and steps behind these functions.
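As a rough illustration of what a helper for the three engineered features might look like, here is a small sketch. It is not taken from poi_helper.py. The function names safe_ratio and add_ratio_features are hypothetical, and both the input field names (such as "total_stock_value" and "from_poi_to_this_person") and the ratio definitions are assumptions about how the features could be computed, with missing values assumed to be stored as the string "NaN".

```python
import math


def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when a value is missing or the denominator is zero."""
    for value in (numerator, denominator):
        if value is None or value == "NaN":
            return 0.0
        if isinstance(value, float) and math.isnan(value):
            return 0.0
    if denominator == 0:
        return 0.0
    return numerator / denominator


def add_ratio_features(person):
    """Return a copy of one person's record with the three ratio features added."""
    enriched = dict(person)
    # Assumed definition: total stock value relative to salary.
    enriched["stock_salary_ratio"] = safe_ratio(
        person.get("total_stock_value"), person.get("salary"))
    # Assumed definition: share of received emails that came from a poi.
    enriched["poi_from_ratio"] = safe_ratio(
        person.get("from_poi_to_this_person"), person.get("to_messages"))
    # Assumed definition: share of sent emails that went to a poi.
    enriched["poi_to_ratio"] = safe_ratio(
        person.get("from_this_person_to_poi"), person.get("from_messages"))
    return enriched


# Made-up record for demonstration only; not real data.
record = {
    "salary": 200000, "total_stock_value": 1000000,
    "from_poi_to_this_person": 30, "to_messages": 600,
    "from_this_person_to_poi": "NaN", "from_messages": 100,
}
features = add_ratio_features(record)
print(features["stock_salary_ratio"], features["poi_from_ratio"], features["poi_to_ratio"])
```

Ratios of this kind normalize raw counts, so a person who sends many emails overall is not flagged merely for volume. Falling back to 0.0 for missing values is one possible convention; it should be checked against how the chosen classifier treats zeros.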

fraud-detection-enron's People

Contributors

yyforyongyu
