Code Monkey home page Code Monkey logo

datafordiploma's Introduction

DataForDiploma

Analysis and Web App for Data for Diploma Challenge

Please refer to the website for all the information on the competition

datafordiplomas.devpost.com

The final submission website is at www.datafordiploma.com

Inspiration

DataForDiploma

Dropping out of high school is a critical issue in today's society that contributes to many of the long term issues in the nation. To tackle this issue, we need to understand the precise cause and effect relationships.

Our goal is to use data to derive actionable recommendations to help increase the national graduation rate to 90% by 2020.

We found that the most important factors for improving graduation rates are household stability and economic security. We also found that weather plays an important role. On the other hand, we found that school spending and low food access may not be very important for improving graduation rates and have very diminishing returns upon investments to improve them. Our best model had 0.68 R^2 but with 150 features. Our final selected model was with 0.6 R^2 with 7 features.

Interesting Insights!!!:

  1. Coming from educated households were NOT strong predictors of whether you would graduate high school.
  2. We found that the percent of people living under the poverty line was a much stronger predictor than median household income. This implies that it's not necessary to raise incomes for everyone, but just for the neediest families. This helps to target policy in the right direction.
  3. Salaries and Wages were impactful to grad rates but NOTHING else. Investments need to be in the right place !
  4. Food Access is NOT a big impact to grad rates.
  5. Weather in your area IS impactful !

What We Did

  • Data Clean up. The dataset provided, although detailed had a major flaw in it. Schools with cohort sizes under 60 students, had graduation rates with a very large margin of error and broad bucket.

  • Integration of many third party datasets at the tract or school district level. E.g. weather, food availability, school financials.

  • Deep analysis of the various hypotheses

  • Prediction of what specific investments to impact the key predictive features on the graduation rate in order to understand the best area to invest in with the maximum impact.

  • Interactive visualization tools to explore the final merged dataset, understand the key predictive features and deep dive into the analysis we performed.

How we built it

Most of the work was in the analysis and identifying actionable recommendations at the national level. In addition, we built a set of interactive visualization tools on a web platform in order to help the user understand what we accomplished.

Challenges

The data required significant cleaning and deep understanding. Firstly, the school districts for cohort sizes under 60 students had graduation rates that were in large buckets. In fact for 6-15 student cohorts, the rate was either <50% or โ‰ฅ50%. This could cause significant inaccuracies in our model, hence we decided to leave these cohorts outside of our modeling.

Next, we used the unmerged dataset and we needed to merge the specific tracts per school into district level data. The technique mentioned in the documentation assumed that all metrics are treated the same way, which is incorrect. We only applied the weighted average on absolute value metrics and recalculated the % value metrics.

Accomplishments

This project entailed an end to end data science methodology and implementation. Building a comprehensive data story, actionable recommendations using predictive analytics and finally a interactive visualization web platform to communicate the story.

Recommendations (More in the notebook on the site)

Given that Household Stability and Economic Security are the biggest factors, here are a few actions to take:

  1. Continued investment in community development corporations (CDCs) to strengthen communities from a financial and social perspective.

  2. Initiatives like the Low Income Housing Tax and the New Markets Tax Credit have brought vital services to low-income communities. Check out this article for more information. http://www.whatworksforamerica.org/ideas/the-future-of-community-development/#.VkqW59-rRE4

###School spending

  1. We found that more instructional spending tended to increase graduation rates, while other types tended to decrease it. Thus, all else being equal, it seems that schools should allocate more money towards instructional spending rather than support spending or administrative spending.

Inclement weather

  1. Colder areas tend to have lower graduation rates than warmer areas. One way to design around this problem is to have alternate school schedules for colder areas. It might be better to have a longer break during winter months and a shorter break during summer months for these schools

Future Work

Look into a model for graduation rates per race to understand what features makes each race's graduation rate to be different.

datafordiploma's People

Contributors

asjedh avatar dyerrington avatar kskk02 avatar

Watchers

 avatar  avatar  avatar  avatar

Forkers

dyerrington

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.