
detroit_housing_project's Introduction

Detroit In Transition

This ML classification project explores housing blight in Detroit, MI in an attempt to better understand evolving real estate trends in the city. The project consists of 3 parts:

  1. Exploratory Data Analysis on housing-related data from Detroit's Open Data Portal
  2. Training and comparing models built on this data to predict whether a home in Detroit has received at least one blight citation
  3. A post-hoc analysis of the final model to better understand the problem and develop next steps

1. EDA

The charts below show a couple of findings that confirm two of my hypotheses:

On blight trends as they relate to Detroiters:

'blight chart'

And on blight trends as they relate to homeowners vs. rentals:

'rentals chart'

The findings get more complicated once we look more closely at homeowners versus people who put their homes up for rent. Future iterations of this project will include a more detailed breakdown of real estate trends for various demographics.

2. Modeling

The final Decision Tree model does well at binary classification, with the following metrics:

  • Accuracy: 94.69%
  • F1-Score: 0.60
  • Precision Score: 0.589
  • Recall Score: 0.621
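
As a quick consistency check on the figures above, F1 is the harmonic mean of precision and recall, so it can be recomputed from the two reported scores. This is only a sketch: the function name is illustrative and does not come from the project's code.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the standard F1 definition)."""
    if precision + recall == 0:
        # Both scores are zero, so F1 is defined as zero as well.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Scores reported above for the final Decision Tree model.
reported_precision = 0.589
reported_recall = 0.621

f1 = f1_from_precision_recall(reported_precision, reported_recall)
print(f"F1 recomputed from precision and recall: {f1:.2f}")
```

The recomputed value rounds to 0.60, which agrees with the reported F1-Score.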

Note: The majority of top indicators were features engineered from the raw data: 'features chart'

3. Post-Hoc Analyses

However, feature importance evaluation shows that the total due per parcel (the most important feature, at 0.534 importance) is too highly correlated with the target variable (total tickets) to provide an independent prediction. As such, the next iteration of this project will apply methods that mitigate this confounding.
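
One rough way to spot this kind of overlap is to compute the Pearson correlation between each feature and the target and flag anything above a cutoff. The sketch below makes several assumptions: the column names `total_due` and `lot_size`, the toy values, and the 0.9 threshold are all hypothetical and are not taken from the project's data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant column carries no linear signal.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def leaky_features(features, target, threshold=0.9):
    """Names of features whose |correlation| with the target meets the threshold."""
    return [
        name
        for name, values in features.items()
        if abs(pearson(values, target)) >= threshold
    ]


# Toy data (hypothetical): the amount due grows almost in step with ticket count.
total_tickets = [0, 1, 0, 3, 8, 2]
features = {
    "total_due": [0.0, 250.0, 0.0, 800.0, 2100.0, 500.0],
    "lot_size": [4000, 4200, 3900, 4100, 4000, 4300],
}

flagged = leaky_features(features, total_tickets)
print("Features too correlated with the target:", flagged)
```

On this toy data only `total_due` is flagged. A check like this identifies a suspect feature but does not decide what to do with it.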

These methods include:

  • Up and Down Sampling
  • Various other modeling techniques such as grid search
  • Look at a baseline model and null accuracy
  • Deploy grid and random search to find optimal tree depth
  • Observe evaluation metrics (including confusion matrix) for each class
  • Run a more thorough EDA of each feature's distribution against the target, and improve upon the current visualizations
  • Scale and normalize each variable more thoroughly
  • Eventually try a multi-class target instead of binary, with categories of 0 tickets // 1-7 tickets // 8+ tickets
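
A few of the items above can be sketched in plain Python: null accuracy as a baseline, the proposed three-way ticket buckets, and simple random up-sampling. This is a minimal illustration under stated assumptions, not the project's implementation. The function names and the toy ticket counts are hypothetical, and a library resampler or stratified sampling may be preferable in practice.

```python
import random
from collections import Counter


def null_accuracy(labels):
    """Accuracy obtained by always predicting the most common class."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)


def ticket_bucket(total_tickets):
    """Map a ticket count to the proposed classes: 0, 1-7, or 8+."""
    if total_tickets < 0:
        raise ValueError("ticket count cannot be negative")
    if total_tickets == 0:
        return "0"
    if total_tickets <= 7:
        return "1-7"
    return "8+"


def upsample_minority(rows, labels, seed=0):
    """Randomly duplicate rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target_size = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target_size - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy ticket counts per parcel (hypothetical values).
ticket_counts = [0, 0, 0, 0, 0, 0, 0, 0, 2, 9]
binary_labels = [int(t > 0) for t in ticket_counts]

print("Null accuracy:", null_accuracy(binary_labels))
print("Bucket counts:", Counter(ticket_bucket(t) for t in ticket_counts))

balanced_rows, balanced_labels = upsample_minority(ticket_counts, binary_labels)
print("Balanced class counts:", Counter(balanced_labels))
```

Comparing the model's accuracy against the null accuracy shows how much of the score comes from class imbalance alone. Up-sampling should be applied only to the training split, so that duplicated rows do not leak into evaluation.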

See the full slides for this project's presentation at https://bit.ly/2MCz2Ob

All data can be found for free at Detroit's Open Data Portal: https://data.detroitmi.gov/

detroit_housing_project's People

Contributors

rebecca-hh-rosen
