usepa / camd-eia-crosswalk
A data crosswalk to integrate U.S. power sector emission and operation data from EPA to EIA
License: MIT License
generator_step_string <- "3_1_Generator (generators) match on plant and gen IDs Step 1"

camd_eia_gen_crosswalk <-
  get_manual_matches(
    unit_manual_matches,
    unit_manual_excluded,
    camd_unit,
    eia_generator,
    eia_by = c("EIA_PLANT_ID", "EIA_GENERATOR_ID")
  )

camd_eia_gen_crosswalk <- camd_eia_gen_crosswalk %>%
  bind_rows(
    match_camd_eia_units(
      get_camd_unmatched(camd_unit, camd_eia_gen_crosswalk),
      get_unmatched(eia_generator, camd_eia_gen_crosswalk, by = c("EIA_PLANT_ID", "EIA_GENERATOR_ID")),
      by = plant_generator_match,
      str_glue("{generator_step_string}a: Exact match")
    )
  )
I would like to suggest developing a Python version of the R script (or transitioning this code base to Python) to make it easier for the parts of the user community that rely on the crosswalk but work primarily in Python to contribute. Switching to Python might enable more robust contributions from the user community, most of whom, as far as I am aware, publish code exclusively in Python.
At the very least, it might be helpful to specify an Anaconda environment and a Python script or notebook that could be used to run the R script through a Python package such as rpy2.
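As a lighter-weight alternative to rpy2, the R script could be launched from Python through the `Rscript` command-line tool. The sketch below is only an illustration: it assumes R is installed and on the `PATH`, and the script path passed in is a placeholder, not a confirmed file name from this repository.

```python
import shutil
import subprocess
from pathlib import Path


def build_rscript_command(script_name):
    """Return the argument list that runs an R script non-interactively."""
    # --vanilla starts a clean R session (no saved workspace or user profile).
    return ["Rscript", "--vanilla", str(script_name)]


def run_r_script(script_path):
    """Run an R script; return its exit code, or None if Rscript is missing."""
    if shutil.which("Rscript") is None:
        return None  # R is not installed or not on PATH
    script = Path(script_path).resolve()
    # Run from the script's own directory so its relative paths resolve.
    completed = subprocess.run(
        build_rscript_command(script.name),
        cwd=script.parent,
    )
    return completed.returncode
```

A call such as `run_r_script("path/to/script.R")` (placeholder path) would then return the R process exit code, with `0` indicating success.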
It appears that the R script uses year 2018 EIA-860 data to construct the crosswalk. I suggest updating the next release to use year 2020 data (or the most recent year of data available). This might fix some of the issues with missing associations.
I've been using the CAMD-EIA crosswalk to connect data from the CAMD CEMS dataset and the EIA Form 860.
I noticed that there are some `ORISPL_CODE` values in the CEMS dataset that are missing from the crosswalk under `CAMD_PLANT_ID`, the field I believe is the crosswalk equivalent.
Here are the 140 `ORISPL_CODE` values that are in the CEMS data but not in the crosswalk:
[5, 247, 312, 334, 375, 569, 596, 604, 646, 647,
658, 668, 699, 700, 734, 964, 1294, 1360, 1372, 1392,
1458, 1470, 1496, 1555, 1557, 1585, 1589, 1918, 2397, 2473,
2497, 2502, 2529, 2531, 2629, 2640, 2642, 2858, 2867, 2877,
2947, 3099, 3109, 3110, 3112, 3114, 3120, 3134, 3139, 3142,
3143, 3144, 3145, 3146, 3147, 3154, 3155, 3182, 3334, 3419,
3436, 3438, 3440, 3442, 3451, 3454, 3455, 3461, 3471, 3480,
3493, 3503, 3523, 3524, 3526, 3527, 3549, 3610, 4036, 4233,
4938, 6025, 6598, 7185, 7945, 7996, 8058, 10114, 10252, 10321,
10430, 10522, 10616, 10618, 10628, 10883, 13213, 14013, 50459, 50468,
50855, 50954, 54088, 54089, 54138, 54656, 54807, 55082, 55209, 55303,
55373, 55486, 55683, 55858, 56186, 57185, 59882, 60589, 60698, 60925,
60926, 60927, 61028, 61035, 61241, 61242, 880009, 880013, 880020, 880021,
880022, 880026, 880066, 880068, 880070, 880077, 880081, 880091, 880094, 880109]
A good chunk of these seem to correspond directly to `EIA_PLANT_ID` values from EIA-860.
What is the best way to integrate these into the crosswalk? Should I use the manual mapping form?
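For reference, the comparison described above can be reproduced with a simple set difference. This is a minimal sketch using made-up plant IDs, not real CEMS, crosswalk, or EIA-860 values; the function names are hypothetical.

```python
def missing_plant_ids(cems_ids, crosswalk_ids):
    """Return sorted CEMS plant IDs that have no CAMD_PLANT_ID in the crosswalk."""
    return sorted(set(cems_ids) - set(crosswalk_ids))


def split_by_eia_presence(missing_ids, eia_plant_ids):
    """Split missing IDs into those that also appear as an EIA_PLANT_ID and the rest."""
    eia = set(eia_plant_ids)
    in_eia = [i for i in missing_ids if i in eia]
    not_in_eia = [i for i in missing_ids if i not in eia]
    return in_eia, not_in_eia


# Toy values for illustration only (not real plant IDs).
cems_orispl_codes = [101, 102, 103, 104]
crosswalk_camd_plant_ids = [101]
eia_860_plant_ids = [101, 102, 103]

missing = missing_plant_ids(cems_orispl_codes, crosswalk_camd_plant_ids)
same_id_in_eia, needs_review = split_by_eia_presence(missing, eia_860_plant_ids)
```

IDs in `same_id_in_eia` are candidates for a direct plant-level match, while those in `needs_review` would need to be looked up by hand.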
Add a column to flag combined cycle units in the output.
Possibly include details about the relationship (e.g., many-to-many, one-to-many, etc.).
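One possible way to derive such a relationship label is sketched below. It assumes each crosswalk row can be reduced to a (CAMD key, EIA key) pair and labels each row by how many partners each side has; the function name and the toy keys are hypothetical, and this is a row-level label rather than anything the crosswalk currently produces.

```python
from collections import defaultdict


def classify_relationships(pairs):
    """Label each (camd_key, eia_key) pair with its relationship type."""
    eia_per_camd = defaultdict(set)
    camd_per_eia = defaultdict(set)
    for camd_key, eia_key in pairs:
        eia_per_camd[camd_key].add(eia_key)
        camd_per_eia[eia_key].add(camd_key)

    labels = {}
    for camd_key, eia_key in pairs:
        many_eia = len(eia_per_camd[camd_key]) > 1   # CAMD unit maps to several EIA units
        many_camd = len(camd_per_eia[eia_key]) > 1   # EIA unit maps to several CAMD units
        if many_eia and many_camd:
            label = "many-to-many"
        elif many_eia:
            label = "one-to-many"
        elif many_camd:
            label = "many-to-one"
        else:
            label = "one-to-one"
        labels[(camd_key, eia_key)] = label
    return labels


# Toy keys for illustration only.
toy_pairs = [("U1", "G1"), ("U2", "G2"), ("U2", "G3"), ("U3", "G4"), ("U4", "G4")]
toy_labels = classify_relationships(toy_pairs)
```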
FACT API will be decommissioned
Use CAM APIs instead of FACT API
Determine a better way to indicate match types from the independent matches to EIA data (generator/boiler matches)
When either EPA or EIA has more identifiers for the same units, this creates "duplicate" outputs in the crosswalk.
For example, the following two plants, included in the manual match file, with IDs 52151 and 7903 have duplicates for CAMD units and EIA units respectively.
| CAMD_PLANT_ID | CAMD_UNIT_ID | CAMD_GENERATOR_ID | EIA_PLANT_ID | EIA_BOILER_ID | EIA_GENERATOR_ID |
|---|---|---|---|---|---|
| 52151 | 001 | GEN1 | 52151 | PB1 | GEN1 |
| 52151 | 001 | GEN1 | 52151 | RF1 | GEN1 |
| 52151 | 001 | GEN2 | 52151 | PB2 | GEN2 |
| 52151 | 001 | GEN2 | 52151 | RF2 | GEN2 |
| 7903 | MGS1A | MGS1A | 7903 | MGS1 | |
| 7903 | MGS1B | MGS1B | 7903 | MGS1 | |
| 7903 | MSG2 rows below use the IDs as given in the source | | | | |
| 7903 | MGS2A | MGS2A | 7903 | MSG2 | |
| 7903 | MGS2B | MGS2B | 7903 | MSG2 | |
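To see where the "duplicates" come from, the rows can be projected onto only the columns a user joins on and then de-duplicated. The sketch below uses the four plant 52151 rows from the table above; the helper name `project_unique` is hypothetical and not part of the crosswalk code.

```python
COLUMNS = [
    "CAMD_PLANT_ID", "CAMD_UNIT_ID", "CAMD_GENERATOR_ID",
    "EIA_PLANT_ID", "EIA_BOILER_ID", "EIA_GENERATOR_ID",
]

# Plant 52151 rows copied from the table above.
RAW_ROWS = [
    ("52151", "001", "GEN1", "52151", "PB1", "GEN1"),
    ("52151", "001", "GEN1", "52151", "RF1", "GEN1"),
    ("52151", "001", "GEN2", "52151", "PB2", "GEN2"),
    ("52151", "001", "GEN2", "52151", "RF2", "GEN2"),
]
rows = [dict(zip(COLUMNS, values)) for values in RAW_ROWS]


def project_unique(rows, columns):
    """Keep only `columns` and drop repeated combinations, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key not in seen:
            seen.add(key)
            unique.append(dict(zip(columns, key)))
    return unique


# Dropping EIA_BOILER_ID collapses the four rows to two unit-generator pairs.
unit_to_generator = project_unique(
    rows, ["CAMD_PLANT_ID", "CAMD_UNIT_ID", "EIA_PLANT_ID", "EIA_GENERATOR_ID"]
)
```

Here the extra rows exist only because EIA reports two boiler IDs per generator, so a user who needs unit-to-generator pairs can drop the boiler column before joining.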
Plant Code 7063 Generator 1 is of type Natural Gas with Compressed Air Storage but is still in the Acid Rain Program. It is also the only unit in the U.S. with this prime mover.
The EIA data includes 6_1_EnviroAssoc_Y2018.xlsx which connects EIA generator ID and EIA boiler ID. The boiler ID should be included in the spreadsheet and tested for possible matches.
Expand on the coal unit example and include combined cycle units.
Create an output with only columns that can be used to perform the crosswalk
Investigate fuzzy matches and include a manual match or exclusion where necessary
Once the EPA web page is published, add the link to the README
Depends on monitoring plan data
Based on my reading of the crosswalk documentation, the crosswalk uses the EIA-860 boiler generator association (BGA) table as one of its inputs, but it does not seem to include the complete set of associations in the BGA table. I'm not sure whether this is intentional (i.e., your goal is only to crosswalk CAMD unit IDs with EIA generator IDs) or unintentional.
For example, for plant 1391, the crosswalk associates CAMD unit ID 1A only with EIA generator ID 1A and boiler ID 1A. However, the EIA-860 BGA table shows that generator 1A is associated not only with boiler 1A, but also with boilers 2A, 3A, and 9.
It might be helpful in the next release to include the full set of boiler-generator associations for each CAMD unit ID. Otherwise, I would suggest documenting that users need to merge the crosswalk with the BGA table to get the full set of associations.
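Until then, the merge could look roughly like the sketch below, which expands each crosswalk row with every boiler the BGA table lists for its plant and generator. The tuple layouts and the function name are assumptions made for illustration; only the plant 1391 associations come from the example above.

```python
from collections import defaultdict


def expand_with_bga(crosswalk_rows, bga_rows):
    """Attach every BGA boiler to each (unit, plant, generator) crosswalk row."""
    boilers = defaultdict(list)
    for plant_id, generator_id, boiler_id in bga_rows:
        key = (plant_id, generator_id)
        if boiler_id not in boilers[key]:
            boilers[key].append(boiler_id)

    expanded = []
    for camd_unit_id, plant_id, generator_id in crosswalk_rows:
        # Generators absent from the BGA table keep one row with boiler None.
        for boiler_id in boilers.get((plant_id, generator_id), [None]):
            expanded.append((camd_unit_id, plant_id, generator_id, boiler_id))
    return expanded


# Plant 1391 example: (CAMD unit, EIA plant, EIA generator).
crosswalk_rows = [("1A", 1391, "1A")]
# Assumed BGA layout: (EIA plant, EIA generator, EIA boiler).
bga_rows = [(1391, "1A", "1A"), (1391, "1A", "2A"), (1391, "1A", "3A"), (1391, "1A", "9")]

expanded = expand_with_bga(crosswalk_rows, bga_rows)
```

With this input, CAMD unit 1A ends up associated with all four boilers rather than boiler 1A alone.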
Should unmatched units have a sequence number and if so, should they be at the bottom or mixed with the rest sorted by PLANT_ID, UNIT_ID, BOILER_ID?