
bayeshack-transportation-railroad's Introduction

Bayes Hack 2016

Department of Transportation Prompt #2

How can data help us heal communities at high risk for suicide?

Prompt

A person or vehicle is hit by a train about once every three hours. This results in approximately 700 deaths per year, by accident and by suicide.

By creating descriptive models that examine empirical data on train fatalities and predictive models that can anticipate accidents and suicide attempts, we can decrease the number of deaths that occur and focus on communities that are at disproportionately high risk. Good data can focus national efforts to heal areas impacted by suicide, and promote smart planning and routing to prevent accidents.

In this Repo

  • data/ - Cleaned and prepared data sources
    • data/samples/ contains heavily downsampled versions of datasets, so you can poke around easily. They're in CSV, so Excel or Google Sheets should be able to load them too.
  • analysis/ - IPython notebook files (which you can view right here on GitHub) that load the data and explore a few things. Good for understanding the datasets and getting ideas for your project.
  • cleaning/ - See the data preprocessing code we used (cleaning/scripts/), and the raw data sources that preceded the clean ones (cleaning/raw-data/).

Data Quirks

Important things to know or notice

  • This is all railroad ACCIDENTS (a "casualty" is not necessarily a death)
  • The FATAL column gives a binary indication of whether someone died (~9% of records)
  • Many columns are NOT AVAILABLE FOR ALL TIME. You can figure this out yourself by looking at a given column and seeing how often it's non-null each year, but some examples are below:
    • Lat/long is not recorded until sometime in 2003
    • COUNTY is not recorded until sometime in 1997
    • Day of month (DAY) and time of day (TIMEHR, TIMEMIN) are not recorded until 1997
    • Basically, be careful how you interpret a field if you're doing all-time aggregates. Look at the example notebook (in analysis/) for a nice table of which fields were populated over time.
  • IMPORTANT: The FRA_GUIDE.pdf file is the official DoT guide to the data collection process, with appendices explaining the coding of every column (though they don't explicitly give the column names, you can infer them).
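
The "how often is it non-null each year" check described above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the year column name (YEAR), the inline sample rows, and the choice to treat an empty CSV cell as "not recorded" are all hypothetical, so check the real headers in data/samples/ before relying on it.

```python
import csv
import io
from collections import defaultdict


def coverage_by_year(rows, year_col="YEAR"):
    """Return {year: {column: fraction of rows with a non-empty value}}.

    `rows` is any iterable of dicts, e.g. a csv.DictReader.
    `year_col` is an assumed column name, not one confirmed by the dataset.
    """
    totals = defaultdict(int)                       # rows seen per year
    filled = defaultdict(lambda: defaultdict(int))  # non-empty cells per year and column
    columns = set()
    for row in rows:
        year = row[year_col]
        totals[year] += 1
        for col, value in row.items():
            if col == year_col:
                continue
            columns.add(col)
            # Assumption: an empty string (or None) means "not recorded".
            if value not in (None, ""):
                filled[year][col] += 1
    return {
        year: {col: filled[year][col] / totals[year] for col in sorted(columns)}
        for year in sorted(totals)
    }


# Made-up rows for illustration only; not taken from the real data.
SAMPLE_CSV = """YEAR,COUNTY,FATAL
1996,,0
1996,,1
1997,ALAMEDA,0
1997,,0
"""

coverage = coverage_by_year(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for year, shares in coverage.items():
    print(year, shares)
```

To run this against a real file, pass `csv.DictReader(open(path, newline=""))` instead of the inline sample. A column whose share jumps from 0 to near 1 in some year is one to exclude from all-time aggregates.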
