ca-collision-history's Introduction

Data Analysis - California Collision History 2001-2020

Motivation

According to the CDC, traffic collision deaths cost the state of California $5.83 billion in 2018. Furthermore, since 2001, 35,385 people have died in CA due to automobile collisions. I felt that analyzing historical automobile collision trends and correlating them with outside factors could provide useful data to state and local governments. This data can be used to plan traffic policy, create local or state traffic laws, and rightsize CHP staffing. My objective is to examine this dataset and give insight into what factors might influence collisions.

Data

  • This data comes from the California Highway Patrol Data Set. It covers collisions from Jan 1, 2001 to October 2020.

  • There are 3 main tables. The collisions table has 74 columns, the parties table has 31 columns, and the victims table has 11 columns.

    1. collisions: Contains ~9.17 million rows with data about the collision, its location, the vehicles involved, and the reporting officer
    2. parties: Contains ~18 million rows; columns include age, sex, and sobriety
    3. victims: Contains ~9.17 million rows with information about the injuries of specific people involved in the collision

Exploratory Data Analysis

  • Using the latitude and longitude columns, I produced a heatmap to show where the majority of the collisions occurred over the last 20 years. Out of 9.17 million rows, only 2.52 million had latitude and longitude data.

    picture

    • San Francisco County has the highest population density in California, followed by Orange County and Los Angeles County. It makes sense that the heat map is most red, representing a higher number of collisions, in the highest-density locations. Local governments should consider adjusting traffic policy and its implementation for these high-collision areas.
  • 86% of the total collisions in the dataset are automobile accidents. The rest of the collisions are bicycle, motorcycle, pedestrian, and truck collisions.

  • Here is a closer look at the automobile collision history. The graph below normalizes for population and shows crashes per 1000 people over the course of 20 years.

    picture

    • Crashes stayed in the range of 12-14 per 1000 people from 2001 to 2007. Starting in 2008, crashes dropped steadily for several years until 2013, when they started rising again, though never back to the initial level. In 2020, crashes dropped by almost 50% due to COVID-19.
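The per-capita normalization described above can be sketched as follows. This is a minimal illustration, not the code used for the analysis: the function name `crashes_per_thousand` and the input numbers are hypothetical, and it assumes yearly crash counts and population figures have already been aggregated into dictionaries keyed by year.

```python
def crashes_per_thousand(crashes_by_year, population_by_year):
    """Return {year: crashes per 1,000 residents} for years present in both inputs."""
    rates = {}
    for year, crashes in crashes_by_year.items():
        population = population_by_year.get(year)
        if not population:
            # Skip years with a missing or zero population figure.
            continue
        rates[year] = 1000 * crashes / population
    return rates


# Made-up illustrative numbers, NOT values from the CHP dataset.
example_crashes = {2001: 450_000, 2002: 460_000}
example_population = {2001: 34_500_000, 2002: 34_900_000}

rates = crashes_per_thousand(example_crashes, example_population)
for year in sorted(rates):
    print(year, round(rates[year], 2))
```

Dividing by population before comparing years keeps population growth from being mistaken for a change in crash frequency.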

Statistical Test

  • I was curious to plot the unemployment rate (%) alongside crashes to see how well the two correlated. The graph exaggerates the relationship because the two series use different y axes. From 2001 to 2006 there was a slight positive correlation between crash rates and the unemployment rate. From 2008 to 2020, there seems to be an inverse correlation.

    picture

  • The Pearson correlation coefficient (r) is -0.57, a moderate inverse correlation between unemployment rate % and crashes per 1000 people. This is plausible because crashes should drop when fewer people are driving to work.

  • The unemployment rate went up from 2007 to 2010 due to the Great Recession. It was the worst and longest-lasting financial crisis in the United States since the Great Depression that began in 1929. Subprime mortgage lending was the trigger.

  • In July 2008, CA Senate Bill 1613 went into effect. It banned hand-held cell phone conversations while driving. In Jan 2009, a texting ban was added.

    picture
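The Pearson correlation reported in this section can be computed from two yearly series of equal length. The sketch below uses only the standard library. The helper `pearson_r` and the sample values are illustrative assumptions, not the analysis code or its data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Made-up yearly values, NOT the real unemployment or crash figures.
unemployment_pct = [5.0, 6.0, 7.0, 8.0]
crashes_per_1000 = [13.0, 12.0, 11.0, 10.0]

r = pearson_r(unemployment_pct, crashes_per_1000)
print(round(r, 2))
```

A value near -1 means the two series move in opposite directions, and a value near 0 means little linear relationship. The coefficient describes association only and does not by itself show that unemployment causes the change in crashes.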

Hypothesis

  • I used the Fisher exact test for my hypothesis testing because it applies to binary (two-category) data arranged in a 2x2 table. I wanted to test whether cell phone involvement in automobile collisions is associated with whether the collision happened before or after the bill, or whether the two are independent.
  1. Does the CA bill reduce cell-phone-related collisions?

    • Null: The rate of automobile crashes involving cell phone use is not lower after the CA bill is passed
    • Alternate: The rate of automobile crashes involving cell phone use is lower after the CA bill is passed
    • Performing the Fisher exact test gives a p-value of 2.89e-243. With an alpha of .05, I can reject the null hypothesis: there is enough evidence to state that the rate of automobile crashes involving cell phone use is lower after the CA bill is passed.

    Government policies can make a difference.
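A one-sided Fisher exact test on a 2x2 table (before or after the bill, cell phone involved or not) can be sketched as below. This is a standard-library illustration of the test, not the code behind the reported p-value. The function name `fisher_exact_greater` and the counts are hypothetical. For tables as large as the CHP data, a library routine such as `scipy.stats.fisher_exact` with a one-sided `alternative` would be the practical choice.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Returns the probability, under independence with fixed margins, of
    seeing a count of at least `a` in the top-left cell. A small value
    suggests row 1 has a higher share of column 1 than row 2 does.
    """
    row1 = a + b          # e.g. all crashes before the bill
    col1 = a + c          # e.g. all cell-phone-involved crashes
    total = a + b + c + d
    max_k = min(row1, col1)
    # Hypergeometric tail: sum P(X = k) for k = a .. max_k.
    numer = sum(
        comb(col1, k) * comb(total - col1, row1 - k)
        for k in range(a, max_k + 1)
    )
    return numer / comb(total, row1)


# Made-up counts, NOT values from the CHP dataset.
#                 cell phone   no cell phone
# before bill         8              2
# after bill          1              5
p_value = fisher_exact_greater(8, 2, 1, 5)
print(p_value < 0.05)
```

Putting the "before" period in the first row makes the "greater" tail correspond to the alternate hypothesis above: a higher cell phone share before the bill is the same as a lower share after it.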

Conclusion

Looking at the CHP dataset, we found a correlation between the unemployment rate and crashes, and a lower rate of automobile crashes involving cell phone use after CA Senate Bill 1613. Socioeconomic events and government policies can be correlated with automobile collisions. This analysis is valuable information for local and state governments to plan traffic policy, create local or state traffic laws, and rightsize CHP staffing.

Future Research

  • I'd like to explore the data more deeply and incorporate the following data points in the future:
    1. Improved vehicle safety standards
    2. Smarter cars - sensors, AEB (automatic emergency braking)
    3. Driver distraction related accidents
    4. Carpool trends

Resources

CA 2001-2020 Traffic Collisions Database

CDC California 2018 Collision Death Costs

CA Senate Bill 1613

CA Population Statistics

CA Unemployment Rate History

ca-collision-history's People

Contributors

rena5555
