Udacity Data Scientist Nanodegree Project - Employ supervised algorithms to accurately model individuals' income

Home Page: https://archive.ics.uci.edu/ml/datasets/Census+Income


Udacity-Finding_Donors_for_Charity_ML

For this Udacity Data Scientist Nanodegree project, my job was to implement the additional functionality necessary to complete the project.

For this project, I had to choose the best candidate algorithm from preliminary results and then optimize it further to best model the data. My goal was to construct a model that accurately predicts whether an individual makes more than $50,000. This sort of task can arise in a non-profit setting, where organizations survive on donations. Understanding an individual's income can help a non-profit decide how large a donation to request, or whether to reach out at all. While it can be difficult to determine an individual's general income bracket directly from public sources, this value can be inferred from other publicly available features.

Dataset for this project: https://archive.ics.uci.edu/ml/datasets/Census+Income

In this project, I employed several supervised algorithms of my choice to accurately model individuals' income using data collected from the 1994 U.S. Census. I considered the following supervised learning models:

  • Gaussian Naive Bayes (GaussianNB)
  • Decision Trees
  • Ensemble Methods (Bagging, AdaBoost, Random Forest, Gradient Boosting)
  • K-Nearest Neighbors (KNeighbors)
  • Stochastic Gradient Descent Classifier (SGDC)
  • Support Vector Machines (SVM)
  • Logistic Regression

From these, I selected the following three supervised learning models as appropriate for this problem and tested them on the census data:

  • Decision Trees
  • Support Vector Machines (SVM)
  • Ensemble Methods (AdaBoost)
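
A comparison of the three candidates could be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it assumes scikit-learn, uses a synthetic dataset from `make_classification` as a stand-in for the preprocessed census features, and the `beta=0.5` setting for the F-score is an assumption made for the example.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score, fbeta_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed census features (assumption).
X, y = make_classification(n_samples=600, n_features=12, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# The three candidate models, all with default hyperparameters.
candidates = {
    "DecisionTree": DecisionTreeClassifier(random_state=0),
    "SVM": SVC(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
}

results = {}
for name, model in candidates.items():
    # Time only the fitting step.
    start = time.perf_counter()
    model.fit(X_train, y_train)
    train_time = time.perf_counter() - start

    predictions = model.predict(X_test)
    results[name] = {
        "train_time": train_time,
        "accuracy": accuracy_score(y_test, predictions),
        # beta=0.5 weights precision over recall; the value is an assumption.
        "f_score": fbeta_score(y_test, predictions, beta=0.5),
    }

for name, scores in results.items():
    print(
        f"{name}: accuracy={scores['accuracy']:.3f}, "
        f"f_score={scores['f_score']:.3f}, "
        f"train_time={scores['train_time']:.3f}s"
    )
```

Scores on this synthetic data say nothing about the census results; the sketch only shows how training time, accuracy, and F-score can be collected side by side for each model.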

For each model chosen, I described the following:

  • Real-world application in industry where the model can be applied
  • Strengths of the model (when does it perform well)
  • Weaknesses of the model (when does it perform poorly)
  • What makes the model a good candidate for the problem, given the data

Out of the three models, AdaBoost was the most appropriate for our task, for the following reasons:

  • It is the classifier that performs best on the testing data, in terms of both accuracy and F-score.
  • It also takes a reasonably short time to train on the full dataset.
  • By default, AdaBoost uses a decision stump, i.e. a decision tree of depth 1, as its base classifier, which can handle categorical and numerical data.
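
The decision-stump default can be checked with a short sketch, assuming scikit-learn's `AdaBoostClassifier` and a synthetic dataset used purely for illustration. Note that scikit-learn's trees expect numeric input, so the sketch assumes any categorical features have already been encoded (for example, one-hot encoded).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

# Synthetic numeric data standing in for encoded census features (assumption).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# No base estimator is passed, so scikit-learn's default weak learner is used.
model = AdaBoostClassifier(n_estimators=10, random_state=0)
model.fit(X, y)

# Each fitted weak learner is a decision tree; collect the depth of each one.
depths = [tree.get_depth() for tree in model.estimators_]
print(depths)
```

Each entry in `depths` corresponds to one boosted weak learner, so the list makes it easy to confirm how shallow the default base classifier is.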
