are219's Issues
Set up a tiny Linux AWS instance for downloading
Setup for AWS
- Create an S3 bucket to save the data CSVs to
- Create a free-tier tiny Linux instance on AWS
- Install git and Python
- Clone the SYP repo to the instance
Select which sensors to download:
- Need to partition the sensors.
- Use QGIS to get all PA sensors within a 25-mile radius of an EPA sensor.
- Create an indicator for whether each sensor is "in range", and record which EPA sensor it is near.
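QGIS can do this with a buffer plus a spatial join. The same selection can also be scripted, so it reruns whenever new sensors appear. The sketch below assumes sensor locations are available as latitude/longitude pairs. It uses great-circle (haversine) distance with a mean Earth radius of 3958.8 miles. The dict layout and the `near_epa` field name are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def tag_sensors(pa_sensors, epa_sensors, radius_miles=25.0):
    """For each PA sensor, find the nearest EPA sensor and flag whether it is in range.

    pa_sensors / epa_sensors: dicts mapping sensor id -> (lat, lon).
    """
    tagged = {}
    for pa_id, (lat, lon) in pa_sensors.items():
        # Nearest EPA sensor by great-circle distance.
        nearest_id, nearest_dist = min(
            ((epa_id, haversine_miles(lat, lon, elat, elon))
             for epa_id, (elat, elon) in epa_sensors.items()),
            key=lambda pair: pair[1],
        )
        in_range = nearest_dist <= radius_miles
        tagged[pa_id] = {
            "in_range": in_range,
            "near_epa": nearest_id if in_range else None,
            "dist_miles": nearest_dist,
        }
    return tagged
```

Assigning each PA sensor only to its nearest EPA sensor keeps the partition non-overlapping. If a sensor within 25 miles of two EPA monitors should count for both, a different rule would be needed.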
Housekeeping issues
Get the time of each reading into the correct time zone (check the TS API docs).
All TS times are in UTC. Need to:
- get location of sensor
- get timezone of location
- specify timezone in TS API call
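If TS is the ThingSpeak API, its read-feed endpoint appears to document a `timezone` query parameter. In that case the offset could be requested directly, but this is worth confirming in the API docs. The alternative is to convert the UTC timestamps after download. The sketch below takes the second route using the standard-library `zoneinfo` module (Python 3.9+). It assumes the sensor's IANA time zone name (e.g. `America/Los_Angeles`) has already been looked up from its location.

```python
from datetime import datetime
from zoneinfo import ZoneInfo


def utc_to_local(created_at: str, tz_name: str) -> datetime:
    """Convert a UTC timestamp such as '2021-01-15T20:00:00Z' to local time in tz_name."""
    # fromisoformat() only accepts a trailing 'Z' from Python 3.11 on, so normalize it.
    utc_dt = datetime.fromisoformat(created_at.replace("Z", "+00:00"))
    return utc_dt.astimezone(ZoneInfo(tz_name))
```

Converting each timestamp individually handles daylight-saving changes automatically. For example, 20:00 UTC is 12:00 in Los Angeles in January but 13:00 in July.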
Compare the hourly average calculated by TS to a hand-made hourly average.
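Below is a sketch of the hand-made hourly average and the comparison. It assumes the raw readings have been parsed into `(datetime, value)` pairs, and the TS averages into a dict keyed by the start of each hour. Hours here are clock hours. If TS anchors its averaging windows differently or drops some readings, those hours will show up as mismatches. The tolerance is a hypothetical choice.

```python
from collections import defaultdict
from datetime import datetime


def hourly_means(readings: list[tuple[datetime, float]]) -> dict[datetime, float]:
    """Average raw readings into clock-hour bins (timestamp floored to the hour)."""
    sums: dict[datetime, float] = defaultdict(float)
    counts: dict[datetime, int] = defaultdict(int)
    for ts, value in readings:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        sums[hour] += value
        counts[hour] += 1
    return {hour: sums[hour] / counts[hour] for hour in sums}


def compare_to_ts(hand_made: dict[datetime, float],
                  ts_averages: dict[datetime, float],
                  tol: float = 0.01) -> dict:
    """Return hours where the two averages differ by more than tol.

    Hours present in only one dict are reported with None for the missing side.
    """
    mismatches = {}
    for hour in sorted(set(hand_made) | set(ts_averages)):
        mine, theirs = hand_made.get(hour), ts_averages.get(hour)
        if mine is None or theirs is None or abs(mine - theirs) > tol:
            mismatches[hour] = (mine, theirs)
    return mismatches
```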
Purple Air Monitor data quality
The easiest option for removing questionable Purple Air data points is to drop any sensors that are flagged or downgraded.
This would remove all data for those sensors, even though they likely reported good data at some point.
The more challenging option would be to create a measure of quality/uncertainty for each hourly average.
- I have the PM2.5 measurement from each channel at the hourly-average level.
- I could either treat the two channel readings as the upper and lower bounds of the true value, or report the difference between them as a measure of accuracy.
- Then I could run the analysis at various levels of accuracy to see whether the results are sensitive to it.
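Below is a sketch of the channel-disagreement measure. It assumes the two channels' hourly PM2.5 averages are stored as dicts keyed by hour and labelled A and B. The thresholds for the sensitivity runs are hypothetical. Absolute difference is also only one choice; a relative difference would penalize high-concentration hours less.

```python
def channel_uncertainty(hourly_a, hourly_b):
    """Combine hourly PM2.5 averages from channels A and B into bounds and a disagreement measure.

    hourly_a / hourly_b: dicts mapping hour -> PM2.5 average for each channel.
    Only hours observed on both channels are kept.
    """
    out = {}
    for hour in hourly_a.keys() & hourly_b.keys():
        a, b = hourly_a[hour], hourly_b[hour]
        out[hour] = {
            "lower": min(a, b),
            "upper": max(a, b),
            "mean": (a + b) / 2,
            "abs_diff": abs(a - b),
        }
    return out


def keep_within(uncertainty, max_abs_diff):
    """Keep only hours whose channel disagreement is at most max_abs_diff (ug/m3)."""
    return {h: v for h, v in uncertainty.items() if v["abs_diff"] <= max_abs_diff}
```

For the sensitivity check, the analysis would be rerun on `keep_within(u, t)` for a range of thresholds `t` (e.g. 1, 2, 5, 10 µg/m³). The estimates would then be compared across runs.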
Imputation/prediction of missing values
Little & Rubin, Statistical Analysis with Missing Data:
One possibly useful approach is to estimate the missingness propensity for each variable with missing values, conditional on the observed or imputed values of other variables, and then compare the distribution of the observed values with the distribution of one set of the imputed values, within categories of the estimated missingness propensity. If the imputation model is generating reasonable imputations, these empirical distributions should look similar. For more discussion of graphical and diagnostic checks (see Abayomi et al. 2008; Bondarenko and Raghunathan 2016).
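Below is a sketch of the diagnostic in the passage. It assumes the missingness propensities have already been estimated by some model, e.g. a logistic regression of the missing-value indicator on the other variables. Units are grouped into quantile bins of the propensity. Within each bin, the observed values are compared with one set of imputations. The two-sample Kolmogorov–Smirnov statistic used for that comparison is a swapped-in summary. The passage itself only asks whether the distributions look similar, which side-by-side histograms could check equally well.

```python
def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between the empirical CDFs."""
    x, y = sorted(x), sorted(y)
    i = j = 0
    gap = 0.0
    # Once one sample is exhausted the gap can only shrink, so the loop can stop there.
    while i < len(x) and j < len(y):
        v = min(x[i], y[j])
        # Step both empirical CDFs past every point equal to v (handles ties).
        while i < len(x) and x[i] <= v:
            i += 1
        while j < len(y) and y[j] <= v:
            j += 1
        gap = max(gap, abs(i / len(x) - j / len(y)))
    return gap


def propensity_bins(propensity, n_bins):
    """Assign each unit to a quantile bin (0 .. n_bins-1) of its estimated propensity."""
    order = sorted(range(len(propensity)), key=lambda k: propensity[k])
    bins = [0] * len(propensity)
    for rank, k in enumerate(order):
        bins[k] = rank * n_bins // len(propensity)
    return bins


def imputation_diagnostic(values, missing, propensity, n_bins=5):
    """Compare observed vs imputed values within propensity bins.

    values: the completed variable (observed values plus one set of imputations)
    missing: booleans, True where the value was imputed
    propensity: estimated P(missing | other variables) from a model fitted beforehand
    Returns {bin: (n_observed, n_imputed, KS statistic or None if a group is empty)}.
    """
    bins = propensity_bins(propensity, n_bins)
    report = {}
    for b in range(n_bins):
        obs = [v for v, m, g in zip(values, missing, bins) if g == b and not m]
        imp = [v for v, m, g in zip(values, missing, bins) if g == b and m]
        ks = ks_statistic(obs, imp) if obs and imp else None
        report[b] = (len(obs), len(imp), ks)
    return report
```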
- Gibbs sampler
- JointAI R package for imputing missing values using Gibbs sampling and Bayesian multiple imputation
- Also read more from the thesis linked above
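A Gibbs sampler for imputation (data augmentation) alternates between two steps. First it draws the model parameters given the completed data. Then it draws the missing values given those parameters. Below is a toy sketch for one regression `y = a + b*x + e`. It assumes `x` is fully observed, `y` is missing at random, and the standard noninformative prior proportional to 1/σ². It illustrates the idea only and is not how JointAI is implemented.

```python
import math
import random


def gibbs_impute(x, y, n_iter=2000, burn_in=500, seed=0):
    """Data-augmentation Gibbs sampler for y = a + b*x + e, e ~ N(0, s2), with some y missing.

    x: list of floats (fully observed); y: list of floats or None (None = missing).
    Prior: p(a, b, s2) proportional to 1/s2.
    Returns {missing index: list of post-burn-in draws of that value}.
    """
    rng = random.Random(seed)
    n = len(x)
    miss = [i for i, v in enumerate(y) if v is None]
    obs_mean = sum(v for v in y if v is not None) / (n - len(miss))
    y_cur = [obs_mean if v is None else v for v in y]  # start missing values at the observed mean

    # (X'X)^-1 depends only on x, so compute it once.
    sx, sxx = sum(x), sum(xi * xi for xi in x)
    det = n * sxx - sx * sx
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]
    # Cholesky factor of (X'X)^-1, used to draw (a, b) from a bivariate normal.
    l11 = math.sqrt(inv[0][0])
    l21 = inv[1][0] / l11
    l22 = math.sqrt(inv[1][1] - l21 * l21)

    draws = {i: [] for i in miss}
    s2 = 1.0
    for it in range(n_iter):
        # 1) (a, b) | s2, completed y ~ N(OLS estimate, s2 * (X'X)^-1)
        sy = sum(y_cur)
        sxy = sum(xi * yi for xi, yi in zip(x, y_cur))
        a_hat = inv[0][0] * sy + inv[0][1] * sxy
        b_hat = inv[1][0] * sy + inv[1][1] * sxy
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        sd = math.sqrt(s2)
        a = a_hat + sd * l11 * z1
        b = b_hat + sd * (l21 * z1 + l22 * z2)
        # 2) s2 | a, b, completed y ~ Inv-Gamma(n/2, SSR/2), drawn as 1 / Gamma(shape n/2, scale 2/SSR)
        ssr = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y_cur))
        s2 = 1.0 / rng.gammavariate(n / 2, 2 / ssr)
        # 3) each missing y | a, b, s2 ~ N(a + b*x, s2)
        for i in miss:
            y_cur[i] = rng.gauss(a + b * x[i], math.sqrt(s2))
            if it >= burn_in:
                draws[i].append(y_cur[i])
    return draws
```

Each missing value gets 1,500 post-burn-in draws here. Keeping several well-separated draws yields multiple imputations. The spread of the draws reflects both the noise and the parameter uncertainty.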
5 minute presentation
Notes from Joel
CARB data teamup?
Model for decision maker (pollution recording)
Can I use the county-level pollution forecast data to model decision behavior and come up with a correction?
Mullainathan paper (ML and Econ)
Kleinberg, Jon, Himabindu Lakkaraju, Jure Leskovec, Jens Ludwig, and Sendhil Mullainathan. “Human Decisions and Machine Predictions.” The Quarterly Journal of Economics 133, no. 1 (February 1, 2018): 237–93. https://doi.org/10.1093/qje/qjx032.
Using Satellite Imagery and Machine Learning to Estimate the Livelihood Impact of Electricity Access
Nathan Ratledge, Gabe Cadamuro, Brandon de la Cuesta, Matthieu Stigler, Marshall Burke
Zou's other paper:
https://static1.squarespace.com/static/56034c20e4b047f1e0c1bfca/t/603afc5c6607da3e67640175/1614478432535/monitor_zou_202101.pdf
Deryugina et al. AER 2019: US Medicare data; instruments for air pollution with CBSA × wind angle.
Deryugina, Tatyana, Garth Heutel, Nolan H. Miller, David Molitor, and Julian Reif. “The Mortality and Medical Costs of Air Pollution: Evidence from Changes in Wind Direction.” American Economic Review 109, no. 12 (December 2019): 4178–4219. https://doi.org/10.1257/aer.20180279.
Quarter vs month: for each i, plot two donor PA pollution distributions: one at times when i's readings are observed and one at times when they are missing.
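Below is a sketch of the split behind these plots. It assumes monitor i's series and the donor PA readings share the same time keys. Running it separately on each quarter's or each month's data is the quarter-vs-month choice. Quartiles stand in for the plots here. If the missing times show clearly higher donor pollution, that would suggest i's missingness is related to pollution levels.

```python
from statistics import quantiles


def donor_split(target, donors):
    """Split donor readings by whether monitor i is missing at that time.

    target: dict time -> reading for monitor i (None = missing)
    donors: dict time -> list of donor PA readings at that time
    Returns (donor readings when i is observed, donor readings when i is missing).
    """
    when_observed, when_missing = [], []
    for t, donor_vals in donors.items():
        if t not in target:
            continue  # time outside monitor i's record
        bucket = when_missing if target[t] is None else when_observed
        bucket.extend(donor_vals)
    return when_observed, when_missing


def summarize(vals, n=4):
    """Cut points splitting the readings into n groups (quartiles by default), or None if too few."""
    return quantiles(vals, n=n) if len(vals) >= 2 else None
```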
Think more about how to sell the "unbiased estimate of EPA_it hat"
Remove the top 20% of pollution observations for each i, rerun the analysis, predict the removed values, and compare them to ground truth. The prediction error should be small if the prediction of missing high-pollution readings is unbiased.
- Plot distribution of individual components of algorithm bias sum (individual prediction errors for non-missing values)
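Below is a sketch of the holdout check for one monitor. It assumes the prediction step can be wrapped as a function that fills in `None` entries. `predict` is a hypothetical stand-in for the actual model, and the loop over monitors i is omitted.

```python
def top_share_holdout(series, share=0.2):
    """Indices of the top `share` of observed readings for one monitor."""
    observed = [k for k, v in enumerate(series) if v is not None]
    n_drop = int(round(share * len(observed)))
    ranked = sorted(observed, key=lambda k: series[k], reverse=True)
    return set(ranked[:n_drop])


def holdout_errors(series, predict, share=0.2):
    """Mask the top readings, predict them, and return (mean error, individual errors).

    predict: hypothetical callable, masked series -> list of predictions (same length).
    Error = prediction - ground truth, so a negative mean means high values are under-predicted.
    """
    held = top_share_holdout(series, share)
    masked = [None if k in held else v for k, v in enumerate(series)]
    preds = predict(masked)
    errors = [preds[k] - series[k] for k in sorted(held)]
    mean_error = sum(errors) / len(errors) if errors else 0.0
    return mean_error, errors
```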
Plot 3-D slices of the prediction error distributions (on non-missing data) over observed EPA_it values (slices are vertical, x = EPA_it).
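Below is a sketch of the slicing step. It assumes paired lists of observed EPA_it values and predictions on non-missing data. Each key is one vertical slice, i.e. a band of EPA_it values. The 10-unit bin width is hypothetical, and plotting each slice's error distribution is left out.

```python
from collections import defaultdict
from statistics import mean, median


def error_slices(observed, predicted, bin_width=10.0):
    """Group prediction errors (predicted - observed) into slices of the observed EPA value.

    Returns {slice lower edge: (count, mean error, median error)}, ordered by slice.
    """
    groups = defaultdict(list)
    for obs, pred in zip(observed, predicted):
        lower = (obs // bin_width) * bin_width
        groups[lower].append(pred - obs)
    return {
        lower: (len(errs), mean(errs), median(errs))
        for lower, errs in sorted(groups.items())
    }
```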
Placebo bias is going to be artificially large for the PA monitors because I'm using the same times to omit (if the alternative hypothesis that high-pollution observations are being omitted is correct). Reconsider doing a randomization test.
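One way to set up the randomization test is to stop omitting the same times. Instead, omit an equal number of randomly chosen observed readings many times and record the bias each time. The bias from the actual omitted times can then be located within that placebo distribution. The sketch below assumes a hypothetical `predict` function that fills in `None` entries. The two-sided p-value compares absolute biases, which assumes the placebo distribution is centered near zero.

```python
import random


def randomization_test(series, predict, n_omit, observed_bias, n_draws=500, seed=0):
    """Placebo distribution of bias when n_omit observed readings are dropped at random.

    series: one PA monitor's readings (None = already missing)
    predict: hypothetical callable, masked series -> list of predictions
    observed_bias: bias obtained when omitting the actual EPA-missing times
    Returns (placebo biases, two-sided p-value).
    """
    rng = random.Random(seed)
    observed_idx = [k for k, v in enumerate(series) if v is not None]
    placebo = []
    for _ in range(n_draws):
        omit = set(rng.sample(observed_idx, n_omit))
        masked = [None if k in omit else v for k, v in enumerate(series)]
        preds = predict(masked)
        errs = [preds[k] - series[k] for k in omit]
        placebo.append(sum(errs) / n_omit)
    # Share of placebo draws at least as extreme as the observed bias (+1 correction).
    extreme = sum(abs(b) >= abs(observed_bias) for b in placebo)
    p_value = (extreme + 1) / (n_draws + 1)
    return placebo, p_value
```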
Weather and temperature data
From Deryugina 2019
B. Atmospheric Conditions
Wind speed and wind direction data for the years 1999–2013 are obtained from the North American Regional Reanalysis (NARR) daily reanalysis data. NARR incorporates raw data from land-based weather stations, aircraft, satellites, radiosondes (essentially weather balloons), dropsondes (weather instruments dropped from aircraft), and other meteorological datasets. Wind conditions are reported on a 32-by-32 kilometer grid and consist of vector pairs, one for the east-west wind direction (u-component) and one for the north-south wind direction (v-component). We first interpolate between grid points to estimate the daily u- and v-components at each pollution monitor. We then convert the average u- and v-components into wind direction and wind speed and average up to the county-day level. We define “wind direction” as the direction the wind is blowing from.
Finally, we obtain temperature and precipitation data from Schlenker and Roberts (2009), which produces a daily weather grid using data from PRISM and weather stations. Total daily precipitation and daily maximum and minimum temperatures are reported for each point on a 2.5-by-2.5 mile grid covering the contiguous United States for the years 1999–2013. We average the daily measures across all grid points in a particular county to obtain a county-day measure.
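Below is a sketch of the component-to-direction conversion described in the passage. It averages the u and v components before converting. Direction is reported as the bearing the wind blows from, in degrees clockwise from north. The grid interpolation step is omitted. The 90° direction bins are a hypothetical choice of width; they could then be interacted with CBSA to form instruments, as in the Deryugina et al. note.

```python
import math


def wind_from_uv(u, v):
    """Convert u (east-west) and v (north-south) components into (speed, direction).

    Direction is the bearing the wind blows FROM, in degrees clockwise from north
    (0 = from the north, 90 = from the east, 270 = from the west).
    """
    speed = math.hypot(u, v)
    direction = math.degrees(math.atan2(-u, -v)) % 360.0
    return speed, direction


def daily_wind(u_values, v_values):
    """Average the components first, then convert, as the passage describes."""
    u_bar = sum(u_values) / len(u_values)
    v_bar = sum(v_values) / len(v_values)
    return wind_from_uv(u_bar, v_bar)


def direction_bin(direction, width=90.0):
    """Hypothetical binning of direction into width-degree sectors (bin 0 = [0, width))."""
    return int(direction // width)
```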