acwatt / are219

TeX 90.81% Python 9.19%

are219's Issues

Set up tiny linux AWS for downloading

Setup for AWS

Create S3 bucket to save data CSVs to
Create a free tiny Linux instance on AWS
Install git and python
Clone SYP repo to instance
Select which sensors to download:

Need to partition the sensors
Use QGIS to get all PurpleAir (PA) sensors within a 25-mile radius of each EPA sensor
Create an indicator for "in range" or not, and record which EPA sensor each PA sensor is near.
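
The in-range indicator could also be built outside QGIS. Below is a minimal sketch in Python, assuming sensor locations are available as latitude/longitude pairs in degrees. The function names, the dictionary layout, and the choice to assign each PA sensor to its nearest EPA sensor are all assumptions for illustration, not part of the project code.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def flag_in_range(pa_sensors, epa_sensors, radius_miles=25.0):
    """Flag each PA sensor as in range or not, and record the nearest EPA sensor.

    pa_sensors, epa_sensors: dicts mapping a sensor id to (lat, lon).
    A PA sensor can sit within the radius of several EPA sensors; this
    sketch keeps only the nearest one (an assumption).
    """
    rows = []
    for pa_id, (pa_lat, pa_lon) in pa_sensors.items():
        nearest_id, nearest_dist = None, math.inf
        for epa_id, (epa_lat, epa_lon) in epa_sensors.items():
            dist = haversine_miles(pa_lat, pa_lon, epa_lat, epa_lon)
            if dist < nearest_dist:
                nearest_id, nearest_dist = epa_id, dist
        in_range = nearest_dist <= radius_miles
        rows.append({
            "pa_id": pa_id,
            "in_range": in_range,
            "epa_id": nearest_id if in_range else None,
            "distance_miles": nearest_dist,
        })
    return rows
```

The resulting rows can be written to a CSV and used to partition the download list by EPA sensor.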

Housekeeping issues

Get time of reading into the correct time zone; check the TS API docs

All TS times are in UTC. Need to:

get location of sensor
get timezone of location
specify timezone in TS API call
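
As an alternative to setting the time zone in the API call, the UTC timestamps could be converted after download. The sketch below assumes timestamps arrive as ISO-8601 strings ending in `Z` and that the sensor's IANA time zone name (for example `America/Los_Angeles`) has already been looked up from its location. Both the timestamp format and the function name are assumptions, and the location-to-time-zone lookup is not shown.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library in Python 3.9+


def utc_to_local(utc_iso, tz_name):
    """Convert a UTC timestamp string to an aware datetime in tz_name.

    utc_iso: string such as "2021-01-01T08:00:00Z" (assumed format).
    tz_name: IANA time zone name for the sensor's location.
    """
    utc_dt = datetime.strptime(utc_iso, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return utc_dt.astimezone(ZoneInfo(tz_name))
```

Converting with a named zone rather than a fixed offset handles daylight saving time changes, which matters when aggregating to local hours or days.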

Compare hourly avg calculated by TS to hand-made hourly avg.
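
A hand-made hourly average for this comparison might look like the following sketch. It assumes readings are `(datetime, value)` pairs, that each hour is labeled by its start time, and that missing values are `None`. How TS labels and weights its own hourly averages is not confirmed here, so any gap between the two series could come from those conventions rather than from bad data.

```python
from collections import defaultdict


def hourly_average(readings):
    """Average readings within each clock hour.

    readings: iterable of (datetime, value) pairs; None values are skipped.
    Returns a dict mapping the start of each hour to the mean value.
    """
    buckets = defaultdict(list)
    for ts, value in readings:
        if value is None:
            continue
        hour_start = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour_start].append(value)
    return {hour: sum(vals) / len(vals) for hour, vals in sorted(buckets.items())}


def max_abs_diff(hand_made, reported):
    """Largest absolute gap between two hourly series over their shared hours."""
    shared = hand_made.keys() & reported.keys()
    return max((abs(hand_made[h] - reported[h]) for h in shared), default=0.0)
```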

Purple Air Monitor data quality

Easiest option to remove questionable Purple Air data points: remove any monitor that is flagged or has a downgraded sensor.

This would remove all data for those monitors, even though they likely produced good data at some point.

The more challenging option would be to create a measure of quality/uncertainty for each hourly average.

I have the hourly average PM2.5 measurement from each channel.
I could either treat the two channel readings as the upper and lower bounds of the true value, or report the difference between them as a measure of accuracy.
I could then run the analysis at various accuracy thresholds to see whether the results are sensitive.
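
The two options above can be sketched together. This is a minimal illustration, assuming each hourly record holds one PM2.5 average per channel. The function names, the record layout, and the use of an absolute (rather than relative) difference are assumptions.

```python
def channel_quality(pm25_a, pm25_b):
    """Summarize agreement between the two channel readings for one hour."""
    lower, upper = min(pm25_a, pm25_b), max(pm25_a, pm25_b)
    return {
        "lower": lower,               # lower bound of the reading
        "upper": upper,               # upper bound of the reading
        "mean": (lower + upper) / 2,  # point estimate
        "abs_diff": upper - lower,    # larger gap means less trustworthy
    }


def filter_by_agreement(hourly, max_abs_diff):
    """Keep hours whose channel readings differ by at most max_abs_diff.

    hourly: iterable of (hour, pm25_a, pm25_b) tuples.
    """
    return [
        (hour, a, b)
        for hour, a, b in hourly
        if channel_quality(a, b)["abs_diff"] <= max_abs_diff
    ]
```

Rerunning the analysis on `filter_by_agreement(hourly, t)` for several thresholds `t` gives the sensitivity check described above.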

imputation/prediction of missing values

Little & Rubin, Statistical Analysis with Missing Data:
One possibly useful approach is to estimate the missingness propensity for each variable with missing values, conditional on the observed or imputed values of other variables, and then compare the distribution of the observed values with the distribution of one set of the imputed values, within categories of the estimated missingness propensity. If the imputation model is generating reasonable imputations, these empirical distributions should look similar. For more discussion of graphical and diagnostic checks (see Abayomi et al. 2008; Bondarenko and Raghunathan 2016).

Gibbs sampler
JointAI R package for imputing missing values based on Gibbs sampler and Bayesian multiple imputation
Also read more from the thesis linked above

5 minute presentation

Notes from Joel

CARB data teamup?

Model for decision maker (pollution recording)
Can I use county-level pollution forecast data to model decision behavior and come up with a correction?
Mullainathan paper (ML and Econ)
Kleinberg, Jon, Himabindu Lakkaraju, Jure Leskovec, Jens Ludwig, and Sendhil Mullainathan. “Human Decisions and Machine Predictions*.” The Quarterly Journal of Economics 133, no. 1 (February 1, 2018): 237–93. https://doi.org/10.1093/qje/qjx032.

Using Satellite Imagery and Machine Learning to Estimate the Livelihood Impact of Electricity Access
Nathan Ratledge, Gabe Cadamuro, Brandon de la Cuesta, Matthieu Stigler, Marshall Burke

Zou's other paper:
https://static1.squarespace.com/static/56034c20e4b047f1e0c1bfca/t/603afc5c6607da3e67640175/1614478432535/monitor_zou_202101.pdf

Deryugina et al., AER 2019: US Medicare data; instrument for air pollution by CBSA x wind angle
Deryugina, Tatyana, Garth Heutel, Nolan H. Miller, David Molitor, and Julian Reif. “The Mortality and Medical Costs of Air Pollution: Evidence from Changes in Wind Direction.” American Economic Review 109, no. 12 (December 2019): 4178–4219. https://doi.org/10.1257/aer.20180279.

Quarter vs month: for each i, plot two distributions of donor PA pollution, one for times when i's readings are not missing and one for times when they are missing.

Think more about how to sell the "unbiased estimate of EPA_it hat"

Remove the top 20% of pollution observations from each i, rerun the analysis, and compare the predictions to ground truth. The prediction error should be small if the prediction of missing high-pollution values is unbiased.
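
A minimal sketch of this hold-out check for a single monitor i, assuming its observed values and the model's predictions are aligned lists. The function names are hypothetical, and the prediction step itself (rerunning the analysis without the held-out values) is not shown.

```python
def split_top_fraction(values, fraction=0.2):
    """Split indices into (kept, held_out), holding out the largest values.

    held_out contains the indices of the top `fraction` of observations.
    """
    n_hold = int(round(len(values) * fraction))
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    held_out = sorted(order[:n_hold])
    kept = sorted(order[n_hold:])
    return kept, held_out


def mean_prediction_error(truth, predicted, idx):
    """Mean of (predicted - truth) over idx; near zero suggests no bias."""
    if not idx:
        raise ValueError("idx must contain at least one index")
    errors = [predicted[i] - truth[i] for i in idx]
    return sum(errors) / len(errors)
```

The model would be refit using only the `kept` indices, and `mean_prediction_error` evaluated on `held_out`, where the ground truth is known.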

Plot distribution of individual components of algorithm bias sum (individual prediction errors for non-missing values)
Plot 3-d slices of prediction error distributions (on non-missing data) over EPA_it observed values (slices are vertical, x = EPA_it)

Placebo bias is going to be artificially large for the PA monitors because I'm using the same times to omit (if the alternative hypothesis is correct that high pollution observations are being omitted). Reconsider doing a randomization test.
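
One possible form of the randomization test is sketched below. It is only a guess at the design: it assumes the statistic of interest is the mean PA reading at the omitted times and compares it to the same statistic at randomly drawn times of equal count. The function names, the statistic, and the one-sided comparison are all assumptions.

```python
import random


def _mean(xs):
    return sum(xs) / len(xs)


def randomization_p_value(pa_values, omitted_idx, n_draws=1000, seed=0):
    """One-sided randomization p-value for the mean PA reading at omitted times.

    pa_values: PA readings indexed by time.
    omitted_idx: indices of the times actually omitted.
    Returns the share of random index sets (same size) whose mean reading is
    at least as large as the observed one, with the usual +1 correction.
    """
    rng = random.Random(seed)
    observed = _mean([pa_values[i] for i in omitted_idx])
    k = len(omitted_idx)
    all_idx = range(len(pa_values))
    n_extreme = 0
    for _ in range(n_draws):
        draw = rng.sample(all_idx, k)
        if _mean([pa_values[i] for i in draw]) >= observed:
            n_extreme += 1
    return (n_extreme + 1) / (n_draws + 1)
```

A small p-value would indicate that the omitted times are unusually polluted relative to random times, which is the pattern the alternative hypothesis predicts.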

Weather and temperature data

From Deryugina 2019
B. Atmospheric Conditions
Wind speed and wind direction data for the years 1999–2013 are obtained from the North American Regional Reanalysis (NARR) daily reanalysis data. NARR incorporates raw data from land-based weather stations, aircraft, satellites, radiosondes (essentially weather balloons), dropsondes (weather instruments dropped from aircraft), and other meteorological datasets. Wind conditions are reported on a 32-by-32 kilometer grid and consist of vector pairs, one for the east-west wind direction (u-component) and one for the north-south wind direction (v-component). We first interpolate between grid points to estimate the daily u- and v-components at each pollution monitor. We then convert the average u- and v-components into wind direction and wind speed and average up to the county-day level. We define “wind direction” as the direction the wind is blowing from.

Finally, we obtain temperature and precipitation data from Schlenker and Roberts (2009), which produces a daily weather grid using data from PRISM and weather stations. Total daily precipitation and daily maximum and minimum temperatures are reported for each point on a 2.5-by-2.5 mile grid covering the contiguous United States for the years 1999–2013. We average the daily measures across all grid points in a particular county to obtain a county-day measure.
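
The conversion from u- and v-components to wind direction and speed described in the excerpt can be sketched as follows. This assumes the common meteorological sign convention (positive u blows toward the east, positive v toward the north) and reports direction in degrees clockwise from north as the direction the wind is blowing from. The function names are hypothetical, and the grid interpolation step is not shown.

```python
import math


def wind_from_uv(u, v):
    """Return (direction_from_degrees, speed) for wind components u and v.

    Direction is measured clockwise from north and is the direction the
    wind blows FROM: a wind with u > 0 and v = 0 comes from the west (270).
    """
    speed = math.hypot(u, v)
    direction_from = math.degrees(math.atan2(-u, -v)) % 360.0
    return direction_from, speed


def average_then_convert(uv_pairs):
    """Average the components first, then convert, as in the excerpt."""
    mean_u = sum(u for u, _ in uv_pairs) / len(uv_pairs)
    mean_v = sum(v for _, v in uv_pairs) / len(uv_pairs)
    return wind_from_uv(mean_u, mean_v)
```

Averaging the components before converting avoids the wrap-around problem of averaging angles directly (for example, 350 and 10 degrees averaging to 180).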
