rcp2's Issues

October DataJam: Housing Price

There are a number of housing variables called
House_val_15K_20K
House_val_25K_30K
etc.

  • create a variable called housing_prc_under_250K which aggregates all House_val brackets beneath that value

  • create a variable called median_house_value that finds which bracket holds the cumulative 50th percentile and returns the mean of that bracket as an int
ex. 25-30K = 5%
30-50K = 20%
50-150K = 25%
150-250K = 25%
In this scenario the cumulative 50th percentile falls in the 50-150K bracket, so return 100,000 (the average of 50-150K).
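A minimal sketch of both variables, assuming the bracket shares are already proportions of households. The bracket names, shares, and midpoints below are illustrative placeholders, not the real ACS columns:

    import pandas as pd

    # Illustrative bracket shares for one block group (placeholder values).
    brackets = pd.Series({"25K_30K": 0.05, "30K_50K": 0.20,
                          "50K_150K": 0.25, "150K_250K": 0.25, "250K_500K": 0.25})
    bracket_means = {"25K_30K": 27_500, "30K_50K": 40_000, "50K_150K": 100_000,
                     "150K_250K": 200_000, "250K_500K": 375_000}

    # housing_prc_under_250K: sum of every bracket below 250K.
    under_250k = ["25K_30K", "30K_50K", "50K_150K", "150K_250K"]
    housing_prc_under_250K = brackets[under_250k].sum()

    # median_house_value: first bracket whose cumulative share reaches 50%,
    # returned as the integer mean of that bracket (100,000 in this example).
    cumulative = brackets.cumsum()
    median_bracket = cumulative[cumulative >= 0.5].index[0]
    median_house_value = int(bracket_means[median_bracket])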

ACS NFIRS EDA

Home Fire Area Profiles: Use ACS data to identify common demographic and economic themes for census block areas reporting fires. Document the methodology and results.

Only use NFIRS data from 2013-2017, which corresponds to the years covered by the ACS dataset.

See thw_1.0_EDA_NFIRS_SVI for inspiration

Draft guidelines for contributions to the project package.

Establish guidelines that will help developers contribute to a cohesive package. Guidelines should cover:

  • Module organization.
  • Documentation and doc string guidance.
  • Unit testing guidance.
  • Basic coding style guidance.

Our goal is to share manageable, practical advice that helps teammates collaborate. No need to get too detailed or heavy-handed about enforcement.

Document the guidelines in the project Sphinx docs to close out the issue.

Review other fire risk models

  • Review other fire risk models (and risk models in general): seek out other fire risk models and, where possible, document findings, relevant data used, etc.

Make an rcp2 package proof of concept.

Purpose

It should be easy for volunteers and Red Cross partners to use the rcp2 project. By packaging our code and investing in project infrastructure, we can make it easier to maintain, document, and reproduce our work.

Objectives

  1. Discuss our needs, guiding principles, and vision for the high level infrastructure.
  2. Implement an extensible proof of concept (POC) for the desired infrastructure.

The POC won't be comprehensive. We can close this issue after, say, setting up the infrastructure for one data source. After that, we can add other issues to build out the package infrastructure.

Describe the fire propensity model.

@dschon, may I have your help documenting the fire propensity model?

The goal is to enable a newcomer to understand the purpose, approach, and basic mechanics of the model so that they could get started working on it. Details welcome. We'll add your content to the documentation where it'll help all users leverage your work. Here's a sneak peek from my own rcp2 fork.

A few key questions. Feel free to add anything else that's pertinent.

  • What does the model predict?
  • What scripts / notebooks are involved?
  • What data sources do you use?
  • What are the dimensions of the data?
  • What does each record represent?
  • What years do you use?
  • What is the outcome variable?
  • What features do you use?
  • What modeling approach do you use?
  • How do you transform the raw data?
  • When / how often will more data become available?
  • Any other important details?

I'm happy to wordsmith or discuss 1:1. Thanks for your help.

Parent / Child Relationships between Fire Stations and Fire Departments

There are approximately 30k different fire departments across the country; some have one station, some have dozens. Very occasionally, two different fire departments may use the same fire station. The DHS HIFLD dataset and the US Geological Survey have a (somewhat dated) dataset that lists 53k fire stations. This task is to create parent / child relationships between fire departments and fire stations. This may involve more than simple joins and fuzzy searches: there are several different lists of fire departments that don't all agree, and there are several datasets on fire stations that don't agree. We may also need to hit the Google Places API to pull fire station location data for the most questionable records.
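To illustrate the kind of matching involved, here is a rough sketch using stdlib fuzzy string matching. The table contents, column names, and similarity cutoff are all hypothetical, and a real pass would also need location data and the multiple department lists mentioned above:

    import difflib

    import pandas as pd

    # Hypothetical inputs: one table of departments, one of stations.
    departments = pd.DataFrame({"fd_name": ["Springfield Fire Dept", "Shelbyville FD"]})
    stations = pd.DataFrame({"station_name": ["Springfield Fire Department Station 3",
                                              "Shelbyville Fire Station 1"]})

    def match_department(station_name, fd_names, cutoff=0.4):
        """Return the closest department name, or None if nothing clears the cutoff."""
        matches = difflib.get_close_matches(station_name, fd_names, n=1, cutoff=cutoff)
        return matches[0] if matches else None

    # Assign each station a candidate parent department by name similarity.
    stations["parent_fd"] = stations["station_name"].apply(
        match_department, fd_names=departments["fd_name"].tolist())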

Analyze Fire Station/GEOID reporting quality. Apply those quality scores to counties, census tracts, and block groups

Data sources: NFIRS, ARC Response

  • Fire Reporting Quality Assessment Step 1 - Determine statistical outliers for NFIRS reporting at the county level using year-over-year and month-over-month reporting. Assign a confidence score for county reporting consistency, taking "zero" reporters into account (see the sketch after this list). | NFIRS

  • Fire Reporting Quality Assessment Step 2 - Determine possible outliers for NFIRS reporting at the county level by looking at annual ARC reported totals and comparing them to NFIRS; NFIRS should be greater. | NFIRS, ARC Response

  • Fire Reporting Quality Assessment Step 3 - Determine possible NFIRS reporting deficiencies for census tracts, e.g., an FD reports fires consistently, but only in 5 of a county's 10 tracts. Determine the best way to interpret the data. If possible, assign a census tract reporting score to each county, e.g., the percentage of tracts where fires were reported based on aggregate data.
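A rough sketch of step 1, assuming NFIRS has already been rolled up to one row per county-year. Column names, thresholds, and the scoring rule are placeholders to refine:

    import pandas as pd

    # Hypothetical NFIRS roll-up: one row per county-year (values illustrative).
    counts = pd.DataFrame({
        "county_fips": ["01001"] * 5 + ["01003"] * 5,
        "year": list(range(2013, 2018)) * 2,
        "fires_reported": [120, 115, 130, 0, 125, 80, 82, 79, 85, 81],
    })

    # Year-over-year change per county; big swings or zero-report years suggest
    # inconsistent reporting rather than a real change in fire counts.
    counts = counts.sort_values(["county_fips", "year"])
    counts["yoy_change"] = counts.groupby("county_fips")["fires_reported"].pct_change()
    counts["zero_reporter"] = counts["fires_reported"].eq(0)

    # Placeholder confidence score: share of county-years that are neither
    # zero reports nor swings larger than 50% (thresholds to tune).
    flags = counts["zero_reporter"] | counts["yoy_change"].abs().gt(0.5)
    confidence = 1 - flags.groupby(counts["county_fips"]).mean()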

Calculate absolute and driving distance from GEOIDs to closest one or more fire stations

Data Sources - FD Locations, GEOID geographies

  • Assign GEOID (Full FIPS) to FD Locations File - Use the FD Location file's lat/longs to identify census block group info (GEOID) for all fire stations, and update the master file.

  • Determine distance between FD and Tracts - Determine the as-the-crow-flies distance to the nearest fire station from each census tract centroid and document the distance in the output. Denote whether the station is paid or volunteer, whether it is located in the same county, and the FDID (see the sketch after this list).

  • Determine distance between FD and Block Group - Determine the as-the-crow-flies distance to the nearest fire station from each census block group centroid and document the distance in the output. Denote whether the station is paid or volunteer, whether it is located in the same county, and the FDID.

  • Determine Avg Drive Time from FD to Census Tract Boundary - Use FD location data to determine the closest possible drive distance to the point of ingress at the census tract level. Denote whether the station is paid or volunteer, whether it is located in the same county, and the FDID.

  • Determine Avg Drive Time from FD to Census Block Group Boundary - Use FD location data to determine the closest possible drive distance to the point of ingress at the census block group level. Denote whether the station is paid or volunteer, whether it is located in the same county, and the FDID.
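A rough sketch of the as-the-crow-flies piece using the haversine formula; the coordinates and column names are placeholders, and the drive-time bullets would need a routing service instead:

    import numpy as np
    import pandas as pd

    def haversine_miles(lat1, lon1, lat2, lon2):
        """As-the-crow-flies distance in miles between two lat/lon points."""
        lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
        a = (np.sin((lat2 - lat1) / 2) ** 2
             + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
        return 3958.8 * 2 * np.arcsin(np.sqrt(a))

    # Hypothetical inputs: tract centroids and fire station locations.
    tracts = pd.DataFrame({"geoid": ["11001000100"], "lat": [38.91], "lon": [-77.03]})
    stations = pd.DataFrame({"fdid": ["DC001"], "lat": [38.90], "lon": [-77.02],
                             "career": [True]})

    # For each tract, compute the distance to every station and keep the nearest.
    for _, tract in tracts.iterrows():
        dists = haversine_miles(tract["lat"], tract["lon"],
                                stations["lat"].values, stations["lon"].values)
        nearest = stations.iloc[int(np.argmin(dists))]
        print(tract["geoid"], nearest["fdid"], round(float(dists.min()), 2))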

Use SVI and HFC Data to identify factors related to fire alarm rate, fire rate, and lives saved rate

Data Sources: SVI, HFC Home Visits, HFC Lives Saved

  • Use SVI and HFC data to identify common demographic and economic themes for Census Tract areas where alarms were installed. Document methodology and results

  • Home Fire Area Profiles - Use SVI and HFC data to identify common demographic and economic themes for Census Tract areas reporting fires. Document methodology and results

  • Lives Saved Area Profiles - Use SVI and HFC data to identify common themes for Lives Saved Locations. Document methodology and results

Import project documentation to Read the Docs.

Host our Sphinx documentation on Read the Docs so that it is easy for users to access searchable, readable documentation about this project and its codebase.

I already imported the rcp2 documentation from DataKind-DC/rcp2. The docs currently fail to build due to an issue with requirements.txt. I expect we can fix the issue with a Read the Docs config file.

Some resources for completing this issue:

Make links between the project repo and project docs.

Add details that link the project repo to the project documentation.

  • Link to the docs from the README.
  • Link to the project repo from the docs.
  • Automate documentation to build when the master branch moves.
  • Add a badge to show the documentation build status.

Visualize model outputs using Mapbox

  • Create a master dataset consisting of:
  1. the three model outputs
  2. the previous model outputs
  3. census geography
  • Upload the data to Mapbox (a rough export sketch follows below).
  • Distribute the visualization and allow fellow members to analyze it.
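A rough sketch of the master dataset and export step, assuming geopandas is available; the file paths and column names are placeholders:

    import geopandas as gpd
    import pandas as pd

    # Placeholder paths: census geography polygons and per-GEOID model scores
    # (assumed already combined into one CSV with the previous model outputs).
    geographies = gpd.read_file("data/census_block_groups.shp")
    scores = pd.read_csv("data/model_outputs.csv", dtype={"geoid": str})

    # Join the model outputs onto the geometries, then export GeoJSON, which
    # Mapbox accepts directly as an upload / tileset source.
    master = geographies.merge(scores, left_on="GEOID", right_on="geoid", how="left")
    master.to_file("outputs/model_outputs.geojson", driver="GeoJSON")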

ACS Data Exploration

We recently cleaned all of the ACS data at the block level. We need to do some exploratory data analysis to QC the data and look at some simple trends within it.
Data is located in: Master Project Data > ACS 5yr Block Data

  • Using missingno or pandas, create a visualization of the number of NaNs / missing values in each column of the data (see the sketch after this list).

  • Using seaborn (or your favorite plotting tool), create histograms of each column. Are the values normally distributed? If not, can they be easily clustered (e.g., low-income, medium-income, high-income)?

  • Using seaborn (or your favorite plotting tool), create a correlation matrix of the columns in the ACS data (see thw_1.0_EDA_NFIRS_SVI in the notebooks page for inspiration). Which columns are highly correlated with each other?
    Note: if a column isn't population adjusted (i.e., it isn't a percentage or < 1), adjust it first or it will be correlated with block size.
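A rough sketch covering all three bullets, assuming the cleaned ACS file has been downloaded locally; the path is a placeholder:

    import matplotlib.pyplot as plt
    import missingno as msno
    import pandas as pd
    import seaborn as sns

    # Placeholder path: the cleaned file lives in
    # Master Project Data > ACS 5yr Block Data on the shared drive.
    acs = pd.read_csv("data/acs_5yr_block_data.csv")

    # 1. Missing values per column.
    msno.matrix(acs)
    print(acs.isna().sum().sort_values(ascending=False).head(20))

    # 2. Histograms of every numeric column.
    acs.hist(figsize=(20, 20), bins=30)

    # 3. Correlation matrix of the numeric columns (population-adjust first where needed).
    numeric = acs.select_dtypes("number")
    sns.heatmap(numeric.corr(), cmap="coolwarm", center=0)
    plt.show()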

Write a script to download the Google Drive directory

I think it would be helpful if we could write a script to download the raw and processed data files from Google Drive, and then add that as the first step in the Makefile. I've been able to download individual files using gdown, but I haven't been able to figure out how to download a whole Google Drive directory.

If anyone knows of a package or has an approach for downloading a whole Google Drive directory as part of a script, that would be great. One possibility is sketched below.
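One possible approach, assuming a recent gdown release, which ships a download_folder helper. The folder URL and output path are placeholders, and the helper historically caps the number of files fetched per folder, so large directories may need batching:

    import gdown

    # Placeholder URL for the shared Google Drive folder with raw/processed data.
    FOLDER_URL = "https://drive.google.com/drive/folders/<folder-id>"

    # download_folder fetches an entire shared folder; check its per-folder
    # file limit against the size of our data directory.
    gdown.download_folder(url=FOLDER_URL, output="data/raw", quiet=False)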

Document the project data pipeline.

In the Sphinx documentation, describe and diagram this project's pipeline from raw data sources to model predictions. This work will help new users contribute to the project, and it will help us organize project package code.

October DataJam Munging: Housing age

The ACS data has several housing-age categories:

house_prc_built_before_1929
house_prc_built_1929_1939
...
house_prc_built_1969_1979

Create a feature called house_prc_before_1980 that aggregates all of these features into one.
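A minimal sketch, assuming every bracket column ends in its upper-bound year as in the names above; the toy values are placeholders:

    import pandas as pd

    # Hypothetical ACS frame; real column names follow the pattern in the issue.
    acs = pd.DataFrame({
        "house_prc_built_before_1929": [0.10],
        "house_prc_built_1929_1939":   [0.05],
        "house_prc_built_1969_1979":   [0.20],
    })

    # Keep every house_prc_built_* column whose final token (the bracket's
    # upper-bound year) is 1979 or earlier, then sum them row-wise.
    pre_1980_cols = [c for c in acs.columns
                     if c.startswith("house_prc_built_")
                     and int(c.rsplit("_", 1)[-1]) <= 1979]
    acs["house_prc_before_1980"] = acs[pre_1980_cols].sum(axis=1)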

Gather data from ACS and AHS

Data sources: ACS, AHS

  • Get All Available ACS Data @ BlockGroup - Pull from the Census API, 5-year estimates (see the sketch after this list).

  • Get All Available AHS Data @ Tract Level - Pull from the Census API. Desired years: 2013 (the last year with smoke alarm data) and the most recent available data.
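A minimal sketch of one ACS 5-year pull at the block group level using the public Census API. The variable (B01003_001E, total population), the DC geography clause, and the year are placeholders, and anything beyond light exploratory pulls should add an API key:

    import pandas as pd
    import requests

    # One ACS 5-year request at the block group level for a single county.
    url = ("https://api.census.gov/data/2017/acs/acs5"
           "?get=NAME,B01003_001E"
           "&for=block%20group:*"
           "&in=state:11%20county:001%20tract:*")
    rows = requests.get(url, timeout=60).json()

    # The API returns a header row followed by data rows.
    acs = pd.DataFrame(rows[1:], columns=rows[0])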

Calculate rurality of counties, census tracts, and census block groups

Data sources: SVI, GIS, Census Data

  • County Rurality (Populated Areas) - Use SVI square-mile and population estimate data (E_TOTPOP) to determine the rurality of populated areas in counties. Exclude any census tracts with 0 estimated population. Score and rank order (see the sketch after this list).

  • Census Tract Rurality (Populated Tracts) - Determine (and document) means to measure Census Tract Rurality (see Census Tract Rurality Tab). Exclude zero pop areas, score and rank order.

  • Census Block Group Rurality (Populated Block Groups) - Based on outputs from census tract rurality, determine means to assign rurality scores to block groups (based on size most likely). Should be relative to tract rurality score
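A rough sketch of the county/tract piece using population density as the rurality measure. The file path and scoring rule are placeholders, and E_TOTPOP / AREA_SQMI are the SVI population and square-mile fields as I recall them, so verify the names against the SVI data dictionary:

    import pandas as pd

    # Placeholder path to an SVI tract-level extract.
    svi = pd.read_csv("data/svi_tracts.csv", dtype={"FIPS": str})

    # Drop zero-population tracts, then use population density as a simple
    # rurality measure: lower density -> more rural.
    populated = svi[svi["E_TOTPOP"] > 0].copy()
    populated["pop_density"] = populated["E_TOTPOP"] / populated["AREA_SQMI"]
    populated["rurality_rank"] = populated["pop_density"].rank(method="min")   # 1 = most rural
    populated["rurality_score"] = 1 - populated["pop_density"].rank(pct=True)  # ~1 = most rural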

Compare Fire Propensity Model and Fire Severity Model

Determine:

  1. Are the risk factors for severe fires the same as for fire propensity?
  2. What is the overlap in the model predictions (see the sketch after this list)?
  • How many blocks rank high for severe fires but not total fires?
  • How many blocks have a high number of fires but not severe fires?
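A rough sketch of question 2, assuming each model writes one probability per block GEOID; the file paths, column names, and top-decile threshold are placeholders:

    import pandas as pd

    # Hypothetical outputs: one row per block with each model's probability.
    propensity = pd.read_csv("outputs/fire_propensity.csv", dtype={"geoid": str})
    severity = pd.read_csv("outputs/fire_severity.csv", dtype={"geoid": str})

    # Call the top decile of each model "high risk" (threshold is a placeholder).
    top_prop = set(propensity.nlargest(len(propensity) // 10, "probability")["geoid"])
    top_sev = set(severity.nlargest(len(severity) // 10, "probability")["geoid"])

    print("high severity but not high propensity:", len(top_sev - top_prop))
    print("high propensity but not high severity:", len(top_prop - top_sev))
    print("overlap:", len(top_sev & top_prop))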

ARC home fire campaign impact

A broad project to study the impact of ARC's home fire campaign. Suggested datasets:

  1. ARC response data: approximate ARC presence in a region by analyzing the proportion of recorded home fires they attended there

  2. NFIRS data: record of fires

Smoke_alarm_model missing some NFIRS GEOIDs

In the smoke alarm model, there are some blocks that have never been visited in the ARC data, and so those blocks are not included in the current model.

To fix: cross-reference the ARC data to ensure all blocks exist, and add rows with zeros for blocks with no visits.
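A minimal sketch of the fix, assuming we have a full list of block GEOIDs from the census geography files; paths and column names are placeholders:

    import pandas as pd

    # Placeholder paths: ARC visit counts keyed by block GEOID, plus the full
    # list of block GEOIDs from the census geography files.
    visits = pd.read_csv("data/arc_visits_by_block.csv", dtype={"geoid": str})
    all_blocks = pd.read_csv("data/all_block_geoids.csv", dtype={"geoid": str})["geoid"]

    # Reindex against the full block list so never-visited blocks appear with
    # zero visits instead of being dropped from the model input.
    visits = (visits.set_index("geoid")
                    .reindex(all_blocks, fill_value=0)
                    .reset_index())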

Conda environment setup fails: ResolvePackageNotFound.

Hi @kelsonSS, I tried setting up using the new environment.yml file, and I get a ResolvePackageNotFound error.

$ conda env create -f environment.yml 
Collecting package metadata (repodata.json): done
Solving environment: failed

ResolvePackageNotFound: 
  - python.app
  - appnope

This Stack Overflow response suggests installing the problem dependencies using a pip subprocess, though it doesn't cite a source or explain the reasoning much. I tried it, and while conda successfully solves the environment, pip cannot find the requirement python.app.

I'm installing on Ubuntu 18.04.4 LTS. Both python.app and appnope are macOS-only packages, which may be why the environment doesn't resolve on Linux.

Describe the smoke alarm model.

@kelsonSS, may I have your help documenting the smoke alarm model?

The goal is to enable a newcomer to understand the purpose, approach, and basic mechanics of the model so that they could get started working on it. Details welcome. We'll add your content to the documentation where it'll help all users leverage your work. Here's a sneak peek from my own rcp2 fork.

A few key questions. Feel free to add anything else that's pertinent.

  • What does the model predict?
  • What scripts / notebooks are involved?
  • What data sources do you use?
  • What are the dimensions of the data?
  • What does each record represent?
  • What years do you use?
  • What is the outcome variable?
  • What features do you use?
  • What modeling approach do you use?
  • How do you transform the raw data?
  • When / how often will more data become available?
  • Any other important details?

I'm happy to wordsmith or discuss 1:1. Thanks for your help.

Aggregate Rankings at Census Tract and County level

Jake wants to preserve as many of the features from the previous map as possible. Looking at the V1 map (link below), it had the ability to switch between tracts and counties and show the absolute ranking of each geography.

Therefore we will need to be able to aggregate and rank our models across multiple geographies.

For all three models:

  1. (If needed) convert binary classification estimates to probabilities.

  2. Create a function that ranks GEOIDs by probability.

  3. Create a function that takes these probabilities and a geography level and returns the average probability for that geography level (see the sketch below).

https://home-fire-risk.github.io/smoke_alarm_map/
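A rough sketch of steps 2 and 3, assuming block-group-level probabilities keyed by 12-digit GEOID (state 2 + county 3 + tract 6 + block group 1); the path and column names are placeholders:

    import pandas as pd

    # Placeholder path: one row per block group with a model probability.
    scores = pd.read_csv("outputs/model_probabilities.csv", dtype={"geoid": str})

    def rank_by_probability(df, prob_col="probability"):
        """Rank geographies by probability, with rank 1 = highest risk."""
        out = df.copy()
        out["rank"] = out[prob_col].rank(ascending=False, method="min")
        return out.sort_values("rank")

    def aggregate_to_geography(df, level, prob_col="probability"):
        """Average block group probabilities up to the tract or county level."""
        # Tracts are the first 11 digits of the GEOID; counties the first 5.
        digits = {"tract": 11, "county": 5}[level]
        return (df.assign(parent_geoid=df["geoid"].str[:digits])
                  .groupby("parent_geoid", as_index=False)[prob_col]
                  .mean())

    county_ranks = rank_by_probability(aggregate_to_geography(scores, "county"))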
