
Applying-Machine-Learning-Enron

In 2000, Enron was one of the largest companies in the United States: an American energy, commodities, and services company based in Houston, TX. It later became known for its acts of conspiracy, insider trading, and securities fraud.

Before going deep into this project, it is important to note how technology has changed over time. Enron likely had its own commercial intelligence group that drove its reputation as "The Smartest Guys in the Room". When it comes to trading, everyone needs an edge, and Enron's was to take unique market information and analyze it in full depth. At the time, data analytics and a data-driven mindset were far less widespread than they are today. This is worth noting because it illustrates how rapidly the energy industry, and business across the board, has evolved.

With that in mind: by 2002, Enron had collapsed into bankruptcy due to widespread corporate fraud. The company hid the financial losses of its trading business and other operations using mark-to-market accounting, a technique that measures the value of a security based on its current market value instead of its book value.

In the resulting federal investigation, a significant amount of typically confidential information entered the public record, including tens of thousands of emails and detailed financial data for top executives. This data has been combined with a hand-generated list of persons of interest (POIs) in the fraud case: individuals who were indicted, reached a settlement or plea deal with the government, or testified in exchange for prosecution immunity. The result is a dataset of 21 features for 146 Enron employees.
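A dataset like this is typically distributed as a dictionary keyed by employee name. As a minimal sketch (the records below are a tiny synthetic stand-in, not the real 146-by-21 data), it can be turned into a tabular form for analysis:

```python
import pandas as pd

# Synthetic stand-in for the Enron dict: {employee: {feature: value}}.
# The real dataset has 146 employees and 21 features; these values are illustrative.
data_dict = {
    "SKILLING JEFFREY K": {"salary": 1_111_258, "bonus": 5_600_000, "poi": True},
    "DERRICK JR. JAMES V": {"salary": 492_375, "bonus": 800_000, "poi": False},
    "HANNON KEVIN P": {"salary": 243_293, "bonus": 1_500_000, "poi": True},
}

# One row per employee, one column per feature.
df = pd.DataFrame.from_dict(data_dict, orient="index")
```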

For this project, a correlation matrix was created to visualize directly how each of the 21 features relates to the others. The correlation matrix was useful for predicting one attribute from another and for imputing missing values; it also serves as a basic input to many modelling techniques.
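Such a matrix can be computed in one call with pandas. A minimal sketch, using synthetic columns in place of the real financial features (the names `salary`, `bonus`, and `total_payments` are illustrative):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Toy numeric features standing in for the 21 Enron features.
df = pd.DataFrame({
    "salary": rng.normal(250_000, 50_000, 100),
    "bonus": rng.normal(500_000, 100_000, 100),
})
# A derived total, so some correlation structure exists by construction.
df["total_payments"] = df["salary"] + df["bonus"] + rng.normal(0, 10_000, 100)

# Pairwise Pearson correlations between all numeric columns.
corr = df.corr()
```

Each diagonal entry is 1 (a feature correlates perfectly with itself), and off-diagonal entries close to ±1 flag feature pairs where one attribute can be predicted from the other.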

As part of an exploratory analysis, I used the K-Means algorithm to cluster the data by POI; however, since POI is a boolean feature, I found that forcing the data into two clusters did not apply well here because the dataset has too many dimensions. For this reason, I moved on from K-Means early in the process, but it is included here to document the full scope of the analysis.
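For completeness, a minimal sketch of the two-cluster attempt with scikit-learn, on synthetic two-dimensional financial data (the real run used more features, which is exactly where the dimensionality problem appeared):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
# Toy (salary, bonus) points: a large ordinary group and a small extreme group.
X = np.vstack([
    rng.normal([200_000, 300_000], [40_000, 80_000], (50, 2)),
    rng.normal([800_000, 3_000_000], [100_000, 500_000], (10, 2)),
])

# Scaling matters: K-Means uses Euclidean distance, so an unscaled
# feature with the largest magnitude would dominate the clustering.
X_scaled = StandardScaler().fit_transform(X)
labels = KMeans(n_clusters=2, n_init=10, random_state=1).fit_predict(X_scaled)
```

The catch is that the two clusters K-Means finds need not line up with the POI/non-POI split at all; in high dimensions the distances that drive the clustering become less informative.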

In addition to K-Means, I performed a hierarchical analysis on the POI feature. Looking at the resulting hierarchy made it easier to visualize the separation between employees who are considered persons of interest (POIs) and those who are not.
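One common way to do this is agglomerative clustering with SciPy: build a merge tree (which can be drawn as a dendrogram), then cut it into groups. A minimal sketch on synthetic points, assuming Ward linkage (the real feature values are not reproduced here):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(2)
# Two loose groups of employees in a 2-D feature space (synthetic).
X = np.vstack([
    rng.normal([1.0, 1.0], 0.2, (20, 2)),
    rng.normal([5.0, 5.0], 0.2, (5, 2)),
])

Z = linkage(X, method="ward")                     # agglomerative merge tree
groups = fcluster(Z, t=2, criterion="maxclust")   # cut the tree into two groups
```

Plotting `Z` with `scipy.cluster.hierarchy.dendrogram` gives the visual spread described above: well-separated groups merge only at the very top of the tree.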

Following the analysis of the target feature, POI, I moved on to a predictive algorithm: Logistic Regression. This algorithm is based on the concept of probability, and it is beneficial here because it provides an unbiased baseline for testing. In this project, the statistical method is used to predict binary classes.
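A minimal sketch of binary classification with scikit-learn's `LogisticRegression`, on synthetic one-dimensional data (the bonus-to-salary-style feature and the POI threshold here are invented for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
# Synthetic employees: a minority drawn with a much higher feature value,
# and the POI label defined by a threshold on that feature.
n = 200
ratio = np.where(rng.random(n) < 0.2, rng.normal(8, 1, n), rng.normal(2, 1, n))
y = (ratio > 5).astype(int)
X = ratio.reshape(-1, 1)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=3, stratify=y)
clf = LogisticRegression().fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)  # per-class probabilities, not just 0/1 labels
```

The probabilistic output is the point: each row of `proba` sums to 1, so the model expresses how confident it is that an employee is a POI rather than only emitting a hard label.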

The Naive Bayes classifier was a logical next approach, since it makes a strong independence assumption across the features of a multi-feature dataset. This classifier is highly scalable, and because the Enron dataset has many parameters, it proved to be the most appropriate model for this project.
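A minimal sketch with scikit-learn's `GaussianNB`, evaluated with cross-validation on synthetic data whose class imbalance loosely mirrors the small POI fraction (the feature values and counts are illustrative):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(4)
# Two synthetic features per employee; the POI class is shifted upward in both.
n_poi, n_non = 30, 120
X = np.vstack([
    rng.normal([3.0, 3.0], 1.0, (n_poi, 2)),
    rng.normal([0.0, 0.0], 1.0, (n_non, 2)),
])
y = np.array([1] * n_poi + [0] * n_non)

clf = GaussianNB()  # models each feature as conditionally independent per class
scores = cross_val_score(clf, X, y, cv=5)
```

Because Naive Bayes only needs per-feature means and variances per class, it trains quickly and scales easily as features are added, which is what makes it attractive for a 21-feature dataset.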

Using Naive Bayes led to the following comparisons between who was predicted to be a POI and who actually was one.

1.) Jeffrey Skilling, CEO, served a 12-year sentence as a POI. The Naive Bayes model correctly predicted that he would be a POI.

2.) Kevin Hannon, COO, served a two-year prison sentence. The same model correctly predicted him as a POI.

3.) Interestingly enough, Scott Yeager, an employee, was labeled a POI in the original dataset, while this predictive model predicted he was not. He was acquitted of all charges after a long legal battle that went to the Supreme Court.

4.) Chris Calgar, another employee, was listed as a POI in the dataset, while this model predicted he was not a POI. Sure enough, his charges were dismissed.

5.) John Baxter, who was found dead in his car before he was to testify before Congress, was listed as a non-POI, and the model's prediction agreed.

6.) James Derrick served as Enron's general counsel. While he only testified before Congress, the model predicted that he would be found to be a POI as well.

Given the sample results above, the Naive Bayes model proved most effective at revealing what this dataset has to offer. Overall, it was also found that a majority of the POIs have no deferred payments. Deferred salary payments made employees unsecured creditors of the company, exposing them to losing most or all of the money in their accounts. Additionally, most POIs had a higher bonus-to-salary ratio. Finally, some POIs sent a large number of emails in the run-up to the bankruptcy... by no coincidence, the CEO received the majority of them.
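The bonus-to-salary ratio mentioned above is an engineered feature that is easy to derive, with one practical wrinkle: some employees have a zero or missing salary, so a naive division would produce infinities. A minimal sketch, on illustrative values rather than the real filings:

```python
import numpy as np
import pandas as pd

# Toy records; the numbers are illustrative, not the real Enron figures.
df = pd.DataFrame({
    "salary": [1_000_000, 250_000, 0],
    "bonus": [5_000_000, 300_000, 400_000],
})

# Mask out non-positive salaries (-> NaN) before dividing, so a zero
# salary yields a missing ratio instead of infinity.
df["bonus_to_salary"] = df["bonus"] / df["salary"].where(df["salary"] > 0)
```

A high value of this ratio flags employees whose compensation was dominated by discretionary bonuses, which is the pattern the write-up associates with POIs.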
