are219's People

Contributors

acwatt

are219's Issues

Set up tiny Linux AWS instance for downloading

Setup for AWS

  1. Create S3 bucket to save data CSVs to

  2. Create AWS free tiny Linux instance

  3. Install git and python

  4. Clone SYP repo to instance

  5. Select which sensors to download:

  • need to partition the sensors
  • use QGIS to get all PurpleAir (PA) sensors within a 25-mile radius of each EPA sensor
  • create an indicator for "in range" or not, and record which EPA sensor each PA sensor is near
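
A minimal sketch of the "in range" indicator outside QGIS, assuming each sensor is a dict with hypothetical `id`, `lat`, and `lon` keys (none of these names come from the repo). It uses the haversine great-circle distance, which should be close enough at a 25-mile scale:

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def tag_in_range(pa_sensors, epa_sensors, radius_miles=25.0):
    """Add 'in_range', 'nearest_epa', and 'dist_miles' to each PA sensor record."""
    tagged = []
    for pa in pa_sensors:
        nearest_id, nearest_dist = None, float("inf")
        for epa in epa_sensors:
            d = haversine_miles(pa["lat"], pa["lon"], epa["lat"], epa["lon"])
            if d < nearest_dist:
                nearest_id, nearest_dist = epa["id"], d
        in_range = nearest_dist <= radius_miles
        tagged.append({
            **pa,
            "in_range": in_range,
            # Only record the EPA sensor when the PA sensor is actually in range.
            "nearest_epa": nearest_id if in_range else None,
            "dist_miles": nearest_dist,
        })
    return tagged
```

The nested loop is quadratic in the number of sensors, which is fine for a few thousand sensors; a spatial index would be the next step if it gets slow.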

Housekeeping issues

Get time of reading into correct time zone -- check TS API docs

All TS times are in UTC. Need to:

  • get location of sensor
  • get timezone of location
  • specify timezone in TS API call
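
If the time zone ends up being applied after download rather than in the API call, the conversion could look like the sketch below. It assumes the timestamps arrive as ISO 8601 strings ending in `Z` and that the IANA zone name (e.g. `America/Los_Angeles`) has already been looked up from the sensor location; both are assumptions, not something confirmed from the API docs.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+; needs a system tz database or the tzdata package


def utc_to_local(utc_string, tz_name):
    """Parse a UTC timestamp like '2021-01-15T20:00:00Z' and convert it to tz_name."""
    utc_dt = datetime.strptime(utc_string, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    # astimezone applies the zone's rules for that date, including daylight saving time.
    return utc_dt.astimezone(ZoneInfo(tz_name))
```

Using a named zone instead of a fixed offset matters here because the offset changes over the year, which would otherwise shift hourly averages by one hour for part of the sample.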

Compare hourly avg calculated by TS to hand-made hourly avg.
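
A sketch of the hand-made hourly average and the comparison, assuming raw readings are `(datetime, value)` pairs and that an hour is labelled by its start time. The API may label hours differently, so that convention needs checking before reading anything into the gaps.

```python
from collections import defaultdict
from datetime import datetime


def hourly_means(readings):
    """Average (timestamp, value) pairs by the clock hour the timestamp falls in."""
    buckets = defaultdict(list)
    for ts, value in readings:
        if value is None:  # skip missing readings rather than treating them as 0
            continue
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    return {hour: sum(vals) / len(vals) for hour, vals in buckets.items()}


def max_abs_gap(hand_made, reported):
    """Largest absolute difference over the hours present in both series."""
    shared = hand_made.keys() & reported.keys()
    return max((abs(hand_made[h] - reported[h]) for h in shared), default=0.0)


# Made-up readings for illustration only.
example = [
    (datetime(2021, 1, 15, 10, 0), 10.0),
    (datetime(2021, 1, 15, 10, 30), 20.0),
    (datetime(2021, 1, 15, 11, 15), 30.0),
]
example_means = hourly_means(example)
```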

Purple Air Monitor data quality

Easiest option to remove questionable Purple Air data points: remove any monitor that is flagged or has a downgraded sensor.

This would remove all data for those monitors, even though they likely produced good data at some point.

The more challenging option would be to create a measure of quality/uncertainty for each hourly average.

  • I have the PM2.5 measurement from each channel on an hourly average level.
  • I could either treat the two channel readings as the upper and lower bound of the true reading, or report the difference between them as a measure of accuracy.
  • I could then run the analysis at various accuracy cutoffs to see whether the results are sensitive to the cutoff.
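
A minimal sketch of the per-hour quality measure, assuming each hourly row is a dict with hypothetical `pm25_a` and `pm25_b` keys for the two channel averages. The cutoffs passed to `sensitivity_counts` are placeholders, not recommended values.

```python
def channel_bounds(pm25_a, pm25_b):
    """Treat the two channel averages as lower/upper bounds; gap is their disagreement."""
    lower, upper = min(pm25_a, pm25_b), max(pm25_a, pm25_b)
    return {"lower": lower, "upper": upper, "mean": (lower + upper) / 2, "gap": upper - lower}


def filter_by_gap(hourly_rows, max_gap):
    """Keep hourly rows whose channel disagreement is at most max_gap."""
    kept = []
    for row in hourly_rows:
        quality = channel_bounds(row["pm25_a"], row["pm25_b"])
        if quality["gap"] <= max_gap:
            kept.append({**row, **quality})
    return kept


def sensitivity_counts(hourly_rows, thresholds):
    """Number of hourly rows surviving each cutoff, to see how much data each one costs."""
    return {t: len(filter_by_gap(hourly_rows, t)) for t in thresholds}
```

The analysis would then be rerun on `filter_by_gap(rows, t)` for each cutoff `t`. An absolute gap penalizes high-pollution hours more than a relative gap would, so a percentage version may be worth comparing.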

Imputation/prediction of missing values

Little & Rubin, Statistical Analysis with Missing Data:
One possibly useful approach is to estimate the missingness propensity for each variable with missing values, conditional on the observed or imputed values of other variables, and then compare the distribution of the observed values with the distribution of one set of the imputed values, within categories of the estimated missingness propensity. If the imputation model is generating reasonable imputations, these empirical distributions should look similar. For more discussion of graphical and diagnostic checks (see Abayomi et al. 2008; Bondarenko and Raghunathan 2016).

  • Gibbs sampler
  • JointAI R package for imputing missing values based on Gibbs sampler and Bayesian multiple imputation
  • Also read more from the thesis linked above
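
A rough sketch of the diagnostic described in the Little & Rubin passage, assuming the missingness propensity has already been estimated elsewhere (for example with a logistic regression, not shown). Each record is a dict with hypothetical `propensity`, `value`, and `imputed` keys. Comparing bin means is a simplification; the passage talks about comparing full distributions.

```python
from statistics import mean


def propensity_bins(records, n_bins=3):
    """Assign records to equal-count bins ordered by estimated missingness propensity."""
    ordered = sorted(records, key=lambda r: r["propensity"])
    bins = [[] for _ in range(n_bins)]
    for rank, rec in enumerate(ordered):
        bins[rank * n_bins // len(ordered)].append(rec)
    return bins


def compare_within_bins(records, n_bins=3):
    """Per bin, report the mean of observed and of imputed values (None if a group is empty)."""
    summary = []
    for group in propensity_bins(records, n_bins):
        observed = [r["value"] for r in group if not r["imputed"]]
        imputed = [r["value"] for r in group if r["imputed"]]
        summary.append({
            "n_observed": len(observed),
            "n_imputed": len(imputed),
            "mean_observed": mean(observed) if observed else None,
            "mean_imputed": mean(imputed) if imputed else None,
        })
    return summary
```

Large gaps between `mean_observed` and `mean_imputed` within a bin would be a reason to look at the imputation model more closely, not proof that it is wrong.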

5 minute presentation

Notes from Joel

CARB data teamup?

Model for decision maker (pollution recording)
Can I use the pollution forecast data at the county level to model decision behavior and come up with a correction?
Mullainathan paper (ML and Econ)
Kleinberg, Jon, Himabindu Lakkaraju, Jure Leskovec, Jens Ludwig, and Sendhil Mullainathan. “Human Decisions and Machine Predictions.” The Quarterly Journal of Economics 133, no. 1 (February 1, 2018): 237–93. https://doi.org/10.1093/qje/qjx032.

Using Satellite Imagery and Machine Learning to Estimate the Livelihood Impact of Electricity Access
Nathan Ratledge, Gabe Cadamuro, Brandon de la Cuesta, Matthieu Stigler, Marshall Burke

Zou's other paper:
https://static1.squarespace.com/static/56034c20e4b047f1e0c1bfca/t/603afc5c6607da3e67640175/1614478432535/monitor_zou_202101.pdf

Deryugina et al. (AER 2019): US Medicare data; instrument for air pollution by CBSA x wind angle
Deryugina, Tatyana, Garth Heutel, Nolan H. Miller, David Molitor, and Julian Reif. “The Mortality and Medical Costs of Air Pollution: Evidence from Changes in Wind Direction.” American Economic Review 109, no. 12 (December 2019): 4178–4219. https://doi.org/10.1257/aer.20180279.

Quarter vs month: for each i, plot 2 donor PA pollution distributions (at times when i readings are not missing vs missing).
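
Before plotting, the two donor samples could be split as in the sketch below. It assumes the target monitor i and the donor PA monitor are dicts keyed by timestamp, with `None` marking a missing reading for i; timestamps absent from the target are skipped, which is a choice, not something the notes settle.

```python
from statistics import mean, median


def split_donor_by_missingness(target, donor):
    """Return donor values at times when the target is present vs. missing (None)."""
    when_present, when_missing = [], []
    for ts, donor_value in donor.items():
        if donor_value is None or ts not in target:
            continue
        if target[ts] is None:
            when_missing.append(donor_value)
        else:
            when_present.append(donor_value)
    return when_present, when_missing


def summarize(values):
    """Small numeric summary to sit alongside the plotted distribution."""
    if not values:
        return {"n": 0}
    return {"n": len(values), "mean": mean(values), "median": median(values)}
```

Running this once per quarter and once per month of data would give the two sets of distributions to compare.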

Think more about how to sell the "unbiased estimate of EPA_it hat"

Take off the top 20% of pollution observations from each i, rerun the analysis, predict the removed values, and compare to ground truth -- the prediction error should be small if the prediction of missing high-pollution values is unbiased.
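
A sketch of the hold-out step for one monitor i, assuming its series is a dict of timestamp to value and that `predict` is a hypothetical callable standing in for whatever imputation model gets refit on the kept data:

```python
def mask_top_share(series, share=0.2):
    """Return (kept, held_out) after holding out the highest `share` of observed values."""
    observed = {ts: v for ts, v in series.items() if v is not None}
    n_hold = int(len(observed) * share)
    ranked = sorted(observed, key=observed.get, reverse=True)
    held_out = {ts: observed[ts] for ts in ranked[:n_hold]}
    kept = {ts: v for ts, v in observed.items() if ts not in held_out}
    return kept, held_out


def mean_prediction_error(held_out, predict):
    """Mean of (prediction - truth) over held-out timestamps; near 0 if unbiased."""
    errors = [predict(ts) - truth for ts, truth in held_out.items()]
    return sum(errors) / len(errors) if errors else 0.0
```

A mean error near zero supports unbiasedness on high-pollution hours only to the extent that the artificially removed hours resemble the hours that are actually missing.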

  • Plot distribution of individual components of algorithm bias sum (individual prediction errors for non-missing values)
    Plot 3-d slices of prediction error distributions (on non-missing data) over EPA_it observed values (slices are vertical, x = EPA_it)

Placebo bias is going to be artificially large for the PA monitors because I'm using the same times to omit (if the alternative hypothesis is correct that high pollution observations are being omitted). Reconsider doing a randomization test.
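
One way the randomization test could be set up, as a sketch only: compare the placebo bias at the actual omitted times against the bias at randomly drawn sets of the same size. The function names and the two-sided p-value are my assumptions about the design, not something fixed in these notes.

```python
import random


def placebo_bias(values, predictions, omit_idx):
    """Mean (prediction - truth) over the omitted indices."""
    return sum(predictions[i] - values[i] for i in omit_idx) / len(omit_idx)


def randomization_test(values, predictions, actual_omit_idx, n_draws=1000, seed=0):
    """Share of random omission sets whose |bias| is at least the observed |bias|."""
    rng = random.Random(seed)  # seeded so the test is reproducible
    observed = placebo_bias(values, predictions, actual_omit_idx)
    k = len(actual_omit_idx)
    draws = []
    for _ in range(n_draws):
        idx = rng.sample(range(len(values)), k)
        draws.append(placebo_bias(values, predictions, idx))
    p_value = sum(abs(d) >= abs(observed) for d in draws) / n_draws
    return observed, p_value
```

A small p-value would say the bias at the actual omitted times is unusual relative to arbitrary times, which is the pattern expected if high-pollution observations are the ones being omitted.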

Weather and temperature data

From Deryugina 2019
B. Atmospheric Conditions
Wind speed and wind direction data for the years 1999–2013 are obtained from the North American Regional Reanalysis (NARR) daily reanalysis data. NARR incorporates raw data from land-based weather stations, aircraft, satellites, radiosondes (essentially weather balloons), dropsondes (weather instruments dropped from aircraft), and other meteorological datasets. Wind conditions are reported on a 32-by-32 kilometer grid and consist of vector pairs, one for the east-west wind direction (u-component) and one for the north-south wind direction (v-component). We first interpolate between grid points to estimate the daily u- and v-components at each pollution monitor. We then convert the average u- and v-components into wind direction and wind speed and average up to the county-day level. We define “wind direction” as the direction the wind is blowing from.

Finally, we obtain temperature and precipitation data from Schlenker and Roberts (2009), which produces a daily weather grid using data from PRISM and weather stations. Total daily precipitation and daily maximum and minimum temperatures are reported for each point on a 2.5-by-2.5 mile grid covering the contiguous United States for the years 1999–2013. We average the daily measures across all grid points in a particular county to obtain a county-day measure.
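
The u/v conversion in the wind passage above can be sketched as follows, assuming u is positive toward the east and v is positive toward the north (the usual meteorological convention, but worth confirming against the NARR documentation). Direction is in degrees clockwise from north and is the direction the wind is blowing from, matching the definition quoted above.

```python
import math


def wind_from_uv(u, v):
    """Convert u (eastward) and v (northward) wind components to (speed, direction_from).

    direction_from is in degrees clockwise from north; 270 means wind from the west.
    For calm wind (u == v == 0) the direction is not meaningful.
    """
    speed = math.hypot(u, v)
    # Negating both components flips "blowing toward" into "blowing from".
    direction_from = math.degrees(math.atan2(-u, -v)) % 360.0
    return speed, direction_from
```

Following the passage, the components would be averaged first and converted second, since averaging angles directly breaks down around the 0/360 wrap.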
