dsi_project1_satact_eda's Introduction

Data Science Immersive (DSI6) Project 1

Katy Chow

Problem Statement

We are trying to provide College Board with additional insights to increase participation rates in SAT testing.

Executive Summary

Given 2017 ACT and SAT results by US state, supplemented with 2018 results pulled from the web, we explore participation rates and section scores across both years to infer what drives participation and performance.

Data Sources

  • 2 Datasets Provided
    • SAT 2017
      • 52 Observations (50 States, DC area, and National results)
      • Math, Verbal, Overall Scores
    • ACT 2017
      • 51 Observations (50 States and the DC area)
      • Math, Reading, English, Science, Composite Scores
  • 2 Datasets Scraped from the Web / Manually Pulled into CSV
    • SAT 2018
      • 51 Observations (50 States and National results)
      • Math, Verbal, Overall Scores
    • ACT 2018
      • 51 Observations (50 States and National Results)
      • Composite Scores Only

Data Dictionary

** Data within the Jupyter notebook have prefixes of either 2017_SAT, 2017_ACT, 2018_SAT, or 2018_ACT to represent the year and standardized test the feature refers to. **

| Feature | Type | Dataset | Description |
|---|---|---|---|
| sat_state | str | SAT | Regional marker |
| sat_participation | float | SAT | Percentage of graduating class that participated in SAT testing |
| sat_evidence_based_reading_and_writing | int | SAT | SAT reading and writing section score |
| sat_math | int | SAT | SAT math section score |
| sat_total | int | SAT | SAT total score (sum of the math and reading and writing section scores) |
| act_state | str | ACT | Regional marker |
| act_participation | float | ACT | Percentage of graduating class that participated in ACT testing |
| act_english | float | ACT | ACT English score |
| act_math | float | ACT | ACT math score |
| act_reading | float | ACT | ACT reading score |
| act_science | float | ACT | ACT science score |
| act_composite | float | ACT | ACT total (composite) score (average of the individual section scores) |
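
The types and relationships in the data dictionary can be checked in a few lines of pandas. The sketch below is not taken from the project notebook. It assumes the raw participation column arrives as strings such as "5%", and it uses made-up rows rather than real state results. The helper name `clean_participation` and the one-point rounding tolerance are assumptions added for illustration.

```python
import pandas as pd


def clean_participation(series: pd.Series) -> pd.Series:
    """Convert strings such as '5%' into floats such as 5.0 (hypothetical helper)."""
    return series.astype(str).str.rstrip("%").astype(float)


# Made-up rows standing in for a raw SAT table; these are not real results.
raw = pd.DataFrame({
    "sat_state": ["State A", "State B"],
    "sat_participation": ["5%", "100%"],
    "sat_evidence_based_reading_and_writing": [600, 500],
    "sat_math": [590, 495],
    "sat_total": [1190, 995],
})

# Cast participation to float, as the data dictionary specifies.
sat = raw.assign(sat_participation=clean_participation(raw["sat_participation"]))

# sat_total is described as the sum of the two section scores.
# Allow a difference of 1 point, assuming published averages are rounded.
section_sum = sat["sat_evidence_based_reading_and_writing"] + sat["sat_math"]
mismatch = sat[(sat["sat_total"] - section_sum).abs() > 1]

print(sat.dtypes)
print(f"Rows where sat_total disagrees with the section sum: {len(mismatch)}")
```

A non-empty `mismatch` frame would point to rows worth checking against the source table by hand.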

Conclusions & Next Steps

My key takeaway is that data alone does not always yield insight; you also have to consider the outside factors that make the data look the way it does. Low participation rates are correlated with higher average exam scores. Two major factors drive this. The first is self-selection bias: the students who take an exam tend to be those motivated to go to college. The second is that states also provide for their brightest students to take the exams. Evidence can be found here.
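
One way to quantify the relationship between participation and scores is a Pearson correlation between the two columns. This is a minimal sketch and may differ from how the notebook computes it. The values below are made up purely to show the mechanics and are not actual state results.

```python
import pandas as pd

# Made-up values for illustration only; not actual state results.
df = pd.DataFrame({
    "sat_participation": [3.0, 10.0, 40.0, 70.0, 100.0],
    "sat_total": [1250, 1200, 1100, 1050, 1000],
})

# Series.corr computes the Pearson correlation coefficient by default.
r = df["sat_participation"].corr(df["sat_total"])
print(f"Pearson r between participation and total score: {r:.2f}")
```

A strongly negative `r` on the real data would be consistent with the self-selection explanation, although correlation alone cannot confirm the cause.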

For states with lower participation rates, College Board can work with the state to provide free exams once a year or free test preparation. The ACT seems to have working relationships with most states, with the exception of Texas, the West Coast, and the Northeast.

I would also like to have demographic data and county-level results for SAT and ACT testing. I believe high test scores are likely to be correlated with wealth, because wealthier students have more resources to study for exams and can pay to take them multiple times.

It would also be useful to normalize the data by grouping states with similar participation rates and comparing results only within each group. North Dakota shows why this matters: it has one of the highest average scores, but only about 7,800 students graduated and only 148 took the SAT in 2017. In a state like Florida, a single county might have that many SAT takers.
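
The grouping idea can be sketched with `pd.cut`, which bins states by participation rate so that scores are compared only among peers. The cut points (25 and 75), the group labels, and the sample rows are all assumptions for illustration; the write-up does not specify any of them. The North Dakota figures are the ones quoted above.

```python
import pandas as pd

# Participation rate implied by the North Dakota figures quoted above.
nd_rate = 148 / 7800 * 100
print(f"North Dakota SAT participation: about {nd_rate:.1f}%")

# Made-up values for illustration only; not actual state results.
df = pd.DataFrame({
    "sat_state": ["State A", "State B", "State C", "State D", "State E", "State F"],
    "sat_participation": [2.0, 4.0, 45.0, 55.0, 95.0, 100.0],
    "sat_total": [1250, 1230, 1100, 1080, 1010, 990],
})

# Hypothetical cut points; choose them to suit the real distribution.
bins = [0, 25, 75, 100]
labels = ["low", "medium", "high"]
df["participation_group"] = pd.cut(
    df["sat_participation"], bins=bins, labels=labels, include_lowest=True
)

# Rank each state only against others in the same participation group.
grouped = df.groupby("participation_group", observed=True)["sat_total"]
df["rank_in_group"] = grouped.rank(ascending=False)
group_means = grouped.mean()

print(df)
print(group_means)
```

Comparing a state to its group mean, rather than to the national table, keeps a small self-selected test population from being ranked directly against states where nearly every student sits the exam.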
