
Standardized Emission and Waste Inventories (StEWI)

DOI: 10.3390/app12073447 | DOI: 10.23719/1526441

StEWI is a collection of Python modules that provide processed USEPA facility-based emission and waste generation inventory data in standard tabular formats. The standard outputs may be further aggregated or filtered based on given criteria, and can be combined based on common facility and flows across the inventories.

StEWI consists of a core module, stewi, that digests and provides the USEPA inventory data in standard formats. Two matcher modules, facilitymatcher and chemicalmatcher, provide common IDs for facilities and flows across inventories, which are used by the stewicombo module to combine the data and, optionally, remove overlaps and avoid double counting of groups of chemicals based on user preferences.

StEWI v1 was peer-reviewed internally at USEPA and externally through Applied Sciences. An article describing StEWI was published in a special issue of Applied Sciences: Advanced Data Engineering for Life Cycle Applications.

USEPA Inventories Covered By Data Reporting Year (current version)

| Source | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Discharge Monitoring Reports* | x | x | x | x | x | x | x | x | x | x | x |
| Greenhouse Gas Reporting Program | x | x | x | x | x | x | x | x | x | x | x |
| Emissions & Generation Resource Integrated Database |  |  |  | x |  | x |  | x | x | x | x |
| National Emissions Inventory** | x | i | i | x | i | i | x | i | i | x |  |
| RCRA Biennial Report* | x |  | x |  | x |  | x |  | x |  |  |
| Toxic Release Inventory* | x | x | x | x | x | x | x | x | x | x | x |

*Earlier data exist and are accessible but have not been validated

**Only point sources are included at this time from NEI. i = interim years between triennial releases, accessed through the Emissions Inventory System; these are not validated.

Standard output formats

The core stewi module produces the following output formats:

Flow-By-Facility: Each row represents the total amount of release or waste of a single type in a given year from the given facility.

Flow-By-Process: Each row represents the total amount of release or waste of a single type in a given year from a specific process within the given facility. Applicable only to NEI and GHGRP.

Facility: Each row represents a unique facility in a given inventory and given year.

Flow: Each row represents a unique flow (substance or waste) in a given inventory and given year.
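As a rough illustration (the column names follow those that appear in issue examples later in this document, not an official schema), the relationship between these formats can be pictured with a small table:

```python
import pandas as pd

# Minimal illustration of the Flow-By-Facility shape: one row per
# (facility, flow) total for a given year. Column names mirror
# StEWI output examples (FacilityID, FlowName, FlowAmount); data invented.
fbf = pd.DataFrame({
    "FacilityID": ["10123", "10123", "20456"],
    "FlowName": ["Carbon dioxide", "Methane", "Carbon dioxide"],
    "FlowAmount": [1_006_345.5, 12.3, 88_000.0],  # kg
    "Compartment": ["air", "air", "air"],
})

# A Facility table has one row per unique FacilityID, and a Flow
# table one row per unique FlowName.
facilities = fbf[["FacilityID"]].drop_duplicates()
flows = fbf[["FlowName"]].drop_duplicates()
print(len(facilities), len(flows))  # 2 2
```
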

The chemicalmatcher module produces:

Chemical Matches: Each row provides a common identifier for an inventory flow chemical

The facilitymatcher module produces:

Facility Matches: Each row provides a common identifier for an inventory facility

The stewicombo module produces:

Flow-By-Facility Combined: Analogous to the Flow-By-Facility output, with chemical and facility matches added

Data Processing

The following describes details related to dataset access, processing, and validation.

DMR

Processing of the DMR uses the custom search option of the Water Pollutant Loading Tool with the following parameters:

  • Parameter grouping: On - applies a parameter grouping function to avoid double-counting loads for pollutant parameters that represent the same pollutant
  • Detection limit: Half - set all non-detects to ½ the detection limit
  • Estimation: On - estimates loads when monitoring data are not reported for one or more monitoring periods in a reporting year
  • Nutrient Aggregation: On - Nitrogen and Phosphorus flows are converted to N and P equivalents

For validation, the sum of facility releases (excluding N & P) is compared against reported state totals. Some validation issues are expected due to differences in the default parameters used by the Water Pollutant Loading Tool when calculating state totals.

eGRID

eGRID data are sourced from EPA's eGRID site. For validation, the sum of facility releases is compared against reported U.S. totals by flow.

GHGRP

GHGRP data are sourced from EPA's Envirofacts API. For validation, the sum of facility releases by subpart is compared against reported U.S. totals by subpart and flow. The validation of some flows (HFCs, HFEs, and PFCs) is reported in carbon dioxide equivalents. Mixed reporting of these flows in the source data, in units of mass or carbon dioxide equivalents, results in validation issues.

NEI

NEI data are downloaded from the EPA Emissions Inventory System (EIS) Gateway and hosted on EPA Data Commons for access by StEWI. For validation, the sum of facility releases is compared against reported totals by flow. Validation is only available for triennial datasets.

RCRAInfo

RCRAInfo data are sourced from the Public Data Files. For validation, the sum of facility waste generation is compared against reported state totals as calculated for the National Biennial Report.

TRI

TRI data are sourced from the Basic Plus Data files. For validation, the sum of facility releases is compared to national totals by flow from the TRI Explorer.

Combined Inventories

The stewicombo module combines inventory data from within and across selected inventories by matching facilities in the Facility Registry Service (FRS) and chemical flows using the Substance Registry Service (SRS). If the remove_overlap parameter is set to True (the default), stewicombo combines records using the following default logic:

  • Records that share a common compartment, SRS ID and FRS ID within an inventory are summed.
  • Records that share a common compartment, SRS ID and FRS ID across inventories are assessed by compartment preference (see INVENTORY_PREFERENCE_BY_COMPARTMENT).
  • Additional steps are taken to avoid overlap of:
    • nutrient flow releases to water between the TRI and DMR
    • particulate matter releases to air reflecting PM10 and PM2.5 in the NEI
    • Volatile Organic Compound (VOC) releases to air for individually reported VOCs and grouped VOCs
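The default logic above can be sketched on toy data. This is a simplified sketch, not stewicombo's actual implementation; the column names mirror StEWI's combined output and the preference dictionary echoes INVENTORY_PREFERENCE_BY_COMPARTMENT:

```python
import pandas as pd

# Invented records: a flow reported twice within NEI and once in eGRID.
pref = {"air": ["eGRID", "GHGRP", "NEI", "TRI"], "water": ["DMR", "TRI"]}

records = pd.DataFrame([
    ("F1", "S1", "air", "NEI", 40.0),
    ("F1", "S1", "air", "NEI", 60.0),
    ("F1", "S1", "air", "eGRID", 90.0),
], columns=["FRS_ID", "SRS_ID", "Compartment", "Source", "FlowAmount"])

# 1) Sum records sharing compartment, SRS ID and FRS ID within an inventory.
summed = (records.groupby(["FRS_ID", "SRS_ID", "Compartment", "Source"],
                          as_index=False)["FlowAmount"].sum())

# 2) Across inventories, keep the most-preferred source per compartment.
def pick(group):
    for src in pref[group["Compartment"].iloc[0]]:
        hit = group[group["Source"] == src]
        if not hit.empty:
            return hit
    return group

combined = (summed.groupby(["FRS_ID", "SRS_ID", "Compartment"],
                           group_keys=False).apply(pick))
print(combined)  # keeps the single eGRID row (FlowAmount 90.0)
```
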

Installation Instructions

Install a release directly from GitHub using pip. From a command line interface, run:

pip install git+https://github.com/USEPA/standardizedinventories.git@v1.1.0#egg=StEWI

where 'v1.1.0' can be replaced with the version you wish to use from Releases.

Alternatively, to install from the most current point on the repository:

git clone https://github.com/USEPA/standardizedinventories.git
cd standardizedinventories
pip install . # or pip install -e . for devs

Secondary Context Installation Steps

To enable calculation and assignment of urban/rural secondary contexts, refer to esupy's README.md for installation instructions; these may require a copy of the env_sec_ctxt.yaml file included here.

Data Products

Output of StEWI can be accessed for selected releases without having to run StEWI. See the Data Product Links page for direct links to StEWI output files in Apache Parquet format.

Wiki

See the Wiki for instructions on installation and use and for citation and contact information.

Disclaimer

The United States Environmental Protection Agency (EPA) GitHub project code is provided on an "as is" basis and the user assumes responsibility for its use. EPA has relinquished control of the information and no longer has responsibility to protect the integrity, confidentiality, or availability of the information. Any reference to specific commercial products, processes, or services by service mark, trademark, manufacturer, or otherwise, does not constitute or imply their endorsement, recommendation or favoring by EPA. The EPA seal and logo shall not be used in any manner to imply endorsement of any commercial product or activity by EPA or the United States Government.


Contributors

a-w-beck, astarr8181, bergmamp, bl-young, dyoung11, ericmbell1, greatest125, jodhernandezbe, liadoverg, matthewlchambers, moli7, rwashing523, tjlca, vlahm, wesingwersen


Issues

stewicombo removing records where FRS_ID is null

e.g.
#! beware long run time
inventories_of_interest = {'eGRID': '2016', 'TRI': '2016', 'NEI': '2016', 'RCRAInfo': '2015'}
emissions_and_wastes_by_facility = stewicombo.combineInventoriesforFacilitiesinOneInventory("eGRID",inventories_of_interest,filter_for_LCI=True)
len(emissions_and_wastes_by_facility[emissions_and_wastes_by_facility['FRS_ID'].isnull()])
#0
#Null for FRS_ID removed


This should not be the case for eGRID, where not all facilities have FRS_IDs.

Refactor NEI.py to function like TRI/RCRAInfo

NEI.py would ideally function like the recently refactored TRI.py and RCRAInfo.py, with a main function that accepts parameters via argparse for year and other options. The code currently requires a code change to run a different year, which is not ideal.
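A minimal sketch of what such a main function could look like, modeled on the `python -m stewi.TRI A -Y 2016` style invocation shown elsewhere in this document. The flag names are assumptions, not the actual stewi CLI:

```python
import argparse

# Hypothetical CLI skeleton modeled on `python -m stewi.TRI A -Y 2016`;
# option and flag names here are assumptions, not the real stewi interface.
def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Process NEI data")
    parser.add_argument("Option",
                        help="A = download and process data")
    parser.add_argument("-Y", "--Year", nargs="+",
                        help="inventory year(s), e.g. 2017")
    return parser.parse_args(argv)

args = parse_args(["A", "-Y", "2017"])
print(args.Option, args.Year)  # A ['2017']
```
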

Comparison with the GHGRP FLIGHT tool: 2 facilities missing

These two facilities are missing from the STEWI generated facility file for GHGRP 2017. I have highlighted the GHGRP ID and Facility ID.

| REPORTING YEAR | FACILITY NAME | GHGRP ID | REPORTED ADDRESS | LATITUDE | LONGITUDE | CITY NAME | COUNTY NAME | STATE | ZIP CODE | PARENT COMPANIES | GHG QUANTITY (METRIC TONS CO2e) | SUBPARTS | FRS ID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2017 | Alamo San Antonio Cement Plant | 1007208 | 6055 West Green Mountain Road | 29.610148 | -98.367792 | San Antonio | BEXAR COUNTY | TX | 78266 | ALAMO CEMENT CO (100%) | 807287 | H |  |
| 2017 | Argos Puerto Rico, Corp. | 1006164 | Road PR-2, Km 26.7 | 18.3944 | -66.2976 | Dorado | DORADO MUNICIPIO | PR | 646 | ESSROC CEMENT CO (100%) | 116460 | C,H |  |

Fix DMR urls

ECHO notification:

ECHO's data service URLs are changing. Please update your code and bookmarks as soon as possible to maintain access to ECHO data. Users will need to adjust their URLs to start with https://echodata.epa.gov instead of https://ofmpub.epa.gov. The service documentation listed below has been changed to use the new URL.

FRS ID to multiple eGRID IDs

I have at least one instance (FRS ID 110021350946) where there are clearly two eGRID IDs (55077 and 56944) associated with the same FRS ID, which is understandable, but all of the NEI emissions are tagged as 56944 (solar PV plant) while eGRID emissions are tagged as 55077 (NGCC plant). When this data gets pulled into the eLCI, we end up with a large amount of NOx emissions (presumably from the NGCC plant) getting divided by relatively smaller electricity generation, and a really large NOx emission rate. I imagine the correct answer in this case would be to assign all reported emissions to the NGCC plant. Alternatively, both eGRID IDs could be passed along so that when generation data is matched later, the generation from both eGRID IDs can be pulled and summed.

[TRI] TRI -A download breaking

Tried downloading some TRI data, but ran into the same issue for both 2016 and 2017

➜ python -m stewi.TRI A -Y 2016
INFO downloading TRI files from source for 2016
Traceback (most recent call last):
  File "/Users/michaellong/miniconda/envs/epa/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/michaellong/miniconda/envs/epa/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/michaellong/miniconda/envs/epa/lib/python3.8/site-packages/stewi/TRI.py", line 413, in <module>
    extract_TRI_data_files(link_zip_TRI, TRIFiles, TRIyear)
  File "/Users/michaellong/miniconda/envs/epa/lib/python3.8/site-packages/stewi/TRI.py", line 76, in extract_TRI_data_files
    for line in txtfile:
  File "/Users/michaellong/miniconda/envs/epa/lib/python3.8/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 153: invalid start byte
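Byte 0x92 is the right single quotation mark in Windows-1252, a common encoding for EPA text exports. A possible workaround (an assumption, not the project's adopted fix) is to decode with cp1252 or to tolerate bad bytes:

```python
# 0x92 is not valid UTF-8 but decodes as the right single quote (’)
# in Windows-1252, which older EPA text exports often use.
raw = b"O\x92Hare Facility"
fixed = raw.decode("cp1252")
print(fixed)  # O’Hare Facility

# A tolerant file-reading pattern (hypothetical filename):
# with open("US_1a_2016.txt", encoding="cp1252") as txtfile:
#     for line in txtfile:
#         ...
```
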

GHGRP column format change

downloaded csv tables from GHGRP have new string appended to the columns causing issues with the column parsing here:

# for all columns in the temporary dataframe, remove subpart-specific prefixes
for col in table_df:
    table_df.rename(columns={col: col[len(table) + 1:]}, inplace=True)

`pyarrow` dependency causing problems

Line 67 in NEI.py specifies the pyarrow engine for pd.read_parquet, but the current Windows Anaconda distribution of pyarrow does not include support for the snappy codec, so I have to use the fastparquet engine instead. On the other hand, I see that @bl-young had some issues with fastparquet here. Could we leave the engine unspecified in the call to pd.read_parquet, so the user can use whichever engine works better for them?

Allow maximum reported results by inventory

But what if a facility reports 100 kg of formaldehyde to TRI and 10 kg via DMR, and DMR takes precedence in INVENTORY_PREFERENCE_BY_COMPARTMENT? Then we would want stewicombo to return 10 kg for DMR and 90 kg for TRI.
Is there a way to achieve this result?

Originally posted by @vlahm in #129 (comment)
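One way to sketch the requested behavior (illustrative only; not an existing stewicombo option) is to keep the preferred inventory's amount and assign only the remainder to the less-preferred inventories:

```python
# Toy numbers from the example above: 100 kg reported to TRI, 10 kg to
# DMR, with DMR preferred for water. Keep the preferred amount and
# allocate only the remainder, so totals reflect the maximum reported.
reports = {"TRI": 100.0, "DMR": 10.0}
preference = ["DMR", "TRI"]  # DMR takes precedence

preferred = preference[0]
allocated = {preferred: reports[preferred]}
remainder = max(reports.values()) - reports[preferred]
for src in preference[1:]:
    allocated[src] = max(min(reports[src], remainder), 0.0)
    remainder -= allocated[src]

print(allocated)  # {'DMR': 10.0, 'TRI': 90.0}
```
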

case change for TRI flows

Data downloaded from the Basic Plus Data Files are no longer in ALL CAPS. This is causing issues with validation, likely with flow mapping, and possibly with chemicalmatcher.
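A simple normalization sketch (hypothetical names; not the project's actual matching code) shows how case-insensitive comparison sidesteps the change:

```python
# Illustrative flow names: normalize case before comparing against
# reference or validation lists so mixed-case downloads still match.
downloaded = ["Benzene", "toluene", "LEAD"]
reference = {"BENZENE", "TOLUENE", "LEAD"}

matched = [name for name in downloaded if name.upper() in reference]
print(matched)  # ['Benzene', 'toluene', 'LEAD']
```
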

Error in DMR query

Query returns:

{'ErrorMessage': 'We could not process your query. Please consult our Web service documentation [https://echo.epa.gov/tools/web-services] and try again, or contact ECHO Support [https://echo.epa.gov/resources/general-info/contact-us] if you need assistance.'}

Changing INVENTORY_PREFERENCE_BY_COMPARTMENT yields different totals

Hello,

Am I right in thinking the INVENTORY_PREFERENCE_BY_COMPARTMENT parameter controls which inventory takes precedence when there is overlap? If so, changing this parameter shouldn't result in different flow totals--only different allocations of flow between inventories. The code and comments below demonstrate that changing the INVENTORY_PREFERENCE_BY_COMPARTMENT parameter currently causes different total (sum) flows to be returned.

# in stewicombo/globals.py:

INVENTORY_PREFERENCE_BY_COMPARTMENT = {"air": ["eGRID", "GHGRP", "NEI", "TRI"],
                                       "water": ["DMR", "TRI"],
                                       "soil": ["TRI"],
                                       "waste": ["RCRAInfo", "TRI"],
                                       "output": ["eGRID"]}

---

# separate script:

from stewicombo import combineFullInventories

cmb = combineFullInventories({'TRI':2015, 'NEI':2015, 'DMR':2015}, filter_for_LCI = False)
cmb['FlowAmount'].sum()

# result: 66912900555.14437

---

# back in stewicombo/globals.py:

INVENTORY_PREFERENCE_BY_COMPARTMENT = {"air": ["TRI", "NEI", "eGRID", "GHGRP"],
                                       "water": ["TRI", "DMR"],
                                       "soil": ["TRI"],
                                       "waste": ["TRI", "RCRAInfo"],
                                       "output": ["eGRID"]}

# reinstall StEWI-1.0.5

cmb = combineFullInventories({'TRI':2015, 'NEI':2015, 'DMR':2015}, filter_for_LCI = False)
cmb['FlowAmount'].sum()

# result: 67048867763.67558
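The expected invariant can be checked on toy data: if overlap removal merely selects one source per overlapping group and the candidate amounts agree, the preference order cannot change the grand total. The observed difference above therefore suggests either differing reported amounts between inventories or records being kept or dropped inconsistently. A sketch (invented numbers):

```python
# Each dict is one overlapping group: candidate amounts per source.
overlap_groups = [
    {"TRI": 5.0, "DMR": 5.0},   # same flow reported identically to both
    {"NEI": 2.0},               # reported to only one inventory
]

def total(preference):
    # Select the first preferred source present in each group and sum.
    out = 0.0
    for group in overlap_groups:
        for src in preference:
            if src in group:
                out += group[src]
                break
    return out

# Different preference orders, same total when candidate amounts agree.
print(total(["DMR", "TRI", "NEI"]), total(["NEI", "TRI", "DMR"]))  # 7.0 7.0
```
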

[GHGRP] Subpart Z missing (2016+)

Subpart Z data after 2015 are not posted within Envirofacts. The data exist and are included in PUB_DIM_FACILITY (used for validation). This will result in validation errors for subpart Z.

Envirofacts staff have been notified of the issue.

pip install errors

While running the command pip install git+https://github.com/USEPA/[email protected]#egg=standardizedinventories (also tried 0.9.7), I'm receiving the following error:

WARNING: Generating metadata for package standardizedinventories produced metadata for project name stewi. Fix your #egg=standardizedinventories fragments.
WARNING: Discarding git+https://github.com/USEPA/[email protected]#egg=standardizedinventories. Requested stewi from git+https://github.com/USEPA/[email protected]#egg=standardizedinventories has inconsistent name: filename has 'standardizedinventories', but metadata has 'StEWI'
ERROR: Could not find a version that satisfies the requirement standardizedinventories (unavailable) (from versions: none)
ERROR: No matching distribution found for standardizedinventories (unavailable)

Using python 3.8.10

config.yaml is not being included when installing

After installing electricitylci via pip, I receive the error:
FileNotFoundError: [Errno 2] No such file or directory: 'C:/Users/jamiesom/AppData/Roaming/Python/Python37/site-packages/chemicalmatcher/config.yaml'
I suspect the package data in setup.py here needs to be modified to include the file.
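A sketch of the kind of setup.py change suggested above (the keys and values are illustrative; the project's actual setup.py may differ):

```python
# Hypothetical setup.py fragment ensuring non-Python files such as
# config.yaml ship with the package; exact arguments are an assumption.
from setuptools import setup, find_packages

setup(
    name="StEWI",
    packages=find_packages(),
    include_package_data=True,
    package_data={"chemicalmatcher": ["config.yaml"]},
)
```
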

Create way to save output as CSV instead of Parquet

Firstly, this is some amazing software. Thank you for creating and actively developing it.

Working with the Parquet file format can be a hurdle for some users, can there be an option that might save the Parquet to a CSV?

Thank you!

Server not compatible with RFC 5746 secure renegotiation

I realize this isn't the ideal place to raise such an issue, but maybe you can pass it on to the relevant parties. Modern SSL clients expect servers to adhere to this proposed TLS standard which prevents a specific type of MitM attack. Attempting to use facilitymatcher with OpenSSL3 results in the following error:

SSLError: HTTPSConnectionPool(host='ofmext.epa.gov', port=443): Max retries exceeded with url: /FLA/www3/state_files/national_combined.zip (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:997)

I received this error by running facilitymatches = facilitymatcher.get_matches_for_inventories(["TRI"]) on Ubuntu 20.04, under Python 3.10.5. More information can be found here.

Facility missing from GHGRP (2017) flow by facility/flow by process file

1007033 CEMEX de Puerto Rico, Inc. State Road 123, kilometer 8.0 Ponce PR 733 18.02305 -66.63888 PONCE MUNICIPIO 327310

1006164 Argos Puerto Rico, Corp. Road PR-2, Km 26.7 Dorado PR 646 18.3944 -66.2976 DORADO MUNICIPIO 212312

The facilities are not missing from the GHGRP Facility file.

However, emissions from these facilities are missing from the Flow-By-Facility and Flow-By-Process files.

I checked the raw GHGRP database (2017) and the facility has reported emissions.

These are both in Puerto Rico.

TRI inventory no longer accessible through stewi after webpage change

Hello. It looks like the download interface on https://www.epa.gov/toxics-release-inventory-tri-program/tri-basic-data-files-calendar-years-1987-present changed in the last few months, which now prevents link_zip from identifying the correct download URLs.

Here is an example of a query that fails:

>>> stewi.getInventory('TRI', 2017)

MissingSchema: Invalid URL '2017': No scheme supplied. Perhaps you meant http://2017?

And here is what the scraped webpage linked above used to look like.

GHGRP - C_CONFIG... download breaking

It now seems to be breaking while downloading the data. It might even be breaking when getting the table URL, since I'm not seeing the log statement that precedes the generate_url function call. Since it's breaking somewhere in there, the try block fails, which means that table_df isn't actually assigned, resulting in the following error:

➜ python -m stewi.GHGRP A -Y 2019
INFO downloading and processing GHGRP data to /Users/michaellong/Library/Application Support/stewi/GHGRP Data Files/tables/2019/
INFO Downloading C_CONFIGURATION_LEVEL_INFO (rows: 15873)
Traceback (most recent call last):
  File "/Users/michaellong/miniconda/envs/epa/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/michaellong/miniconda/envs/epa/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/michaellong/miniconda/envs/epa/lib/python3.8/site-packages/stewi/GHGRP.py", line 813, in <module>
    main()
  File "/Users/michaellong/miniconda/envs/epa/lib/python3.8/site-packages/stewi/GHGRP.py", line 665, in main
    ghgrp1 = download_and_parse_subpart_tables(year)
  File "/Users/michaellong/miniconda/envs/epa/lib/python3.8/site-packages/stewi/GHGRP.py", line 304, in download_and_parse_subpart_tables
    table_df = import_or_download_table(filepath, subpart_emissions_table,
  File "/Users/michaellong/miniconda/envs/epa/lib/python3.8/site-packages/stewi/GHGRP.py", line 259, in import_or_download_table
    for col in table_df:
UnboundLocalError: local variable 'table_df' referenced before assignment

Some error statements in those blocks of code would also be useful for tracking down errors for those of us running from a pip install rather than from a local copy of the repo.

Originally posted by @michael-long88 in #76 (comment)

Speed up stewicombo

Reported by @gschivley
I did some line profiling (%lprun magic) and most of the time is in just a few places. The function aggregate_and_remove_overlap(inventories) takes 99.7% of the time. Within aggregate_and_remove_overlap(), 62% of the time is spent on line 111 (df_new = grouped_by_src.agg(func_cols_map)) and 31% on line 115 (df_new = grouped.apply(get_by_preference)). I don't have a clear picture in my head of what's happening through the whole process, but I'm attaching the lprun results, broken out by a few of the functions. Most of the time is spent on line 111 of aggregate_and_remove_overlap (original line numbers; I'm attaching a second set of lprun results with 2 new lines at 108/9).

  • func_cols_map is a dictionary with functions to apply to each column. One of the functions is get_first_item, which uses .iloc[0] to select the first row of the column. This is relatively slow and can be replaced with the built-in pandas method .first(). To implement this in the aggregate method, you only need to assign the string "first" as the value for each key in func_cols_map. The .first() method is around 50x faster than .iloc[0], and shaves a couple of minutes off the total time. You can see my change in lines 108/9.
How to interpret the lprun results: each profiled function has a "Total time" at the top, followed by the number of times each line is run, the total time per line, the time per hit, and the % of time spent on each line within that function.

The results below are for the function get_by_preference, used in a groupby().apply() on line 115 of overlaphandler.py. 55.5 seconds are spent within this function, and 91% of that time is spent iterating through the rows of each group. I might be missing something here, but slicing the dataframes is probably much faster than iterating over rows.

Total time: 55.5582 s
File: /Users/standardizedinventories/stewicombo/overlaphandler.py
Function: get_by_preference at line 36

Line  Hits    Time        Per Hit  % Time  Line Contents
37    58570   130121.0    2.2      0.2     preferences = INVENTORY_PREFERENCE_BY_COMPARTMENT[group.name]
39    119678  98851.0     0.8      0.2     for pref in preferences:
40    180786  50576395.0  279.8    91.0    for index, row in group.iterrows():
41    119678  4543449.0   38.0     8.2     if pref == row[SOURCE_COL]:
42    58570   209420.0    3.6      0.4     return row

What's important to note here is that while 55 seconds are spent within the function, 261 seconds are spent on line 115 (again, these times are just for eGRID and TRI). So ~80% of the time seems to be overhead from applying the function. Figuring out a way to achieve the same goal without the apply, maybe by slicing rather than using groupby, could save even more time.

The comment at line 113 reads: "If we have 2 or more duplicates with same compartment use INVENTORY_PREFERENCE_BY_COMPARTMENT".

Line  Hits   Time         Per Hit  % Time  Line Contents
114   29285  10662992.0   364.1    1.3     grouped = df_new.groupby(COMPARTMENT_COL)
115   29285  261084505.0  8915.3   30.8    df_new = grouped.apply(get_by_preference)

And finally, since there are so many loops over grouped items, it might be worth trying
to parallelize the code. It's pretty easy with joblib.
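The slicing idea above can be sketched on toy data (not the project's code): rank each record's Source by its position in the preference list and keep the best-ranked row per group, avoiding both iterrows and a per-group apply:

```python
import pandas as pd

# Vectorized alternative to the row-iterating get_by_preference: rank
# each record's Source by its preference-list position, then keep the
# best-ranked row per (facility, compartment) group. Toy data; column
# names mirror the combined output.
pref = {"air": ["eGRID", "GHGRP", "NEI", "TRI"], "water": ["DMR", "TRI"]}

df = pd.DataFrame({
    "FRS_ID": ["F1", "F1", "F2", "F2"],
    "Compartment": ["air", "air", "water", "water"],
    "Source": ["TRI", "eGRID", "TRI", "DMR"],
    "FlowAmount": [1.0, 2.0, 3.0, 4.0],
})

df["rank"] = df.apply(lambda r: pref[r["Compartment"]].index(r["Source"]),
                      axis=1)
best = (df.sort_values("rank")
          .drop_duplicates(subset=["FRS_ID", "Compartment"], keep="first")
          .drop(columns="rank"))
print(best.sort_values("FRS_ID")["Source"].tolist())  # ['eGRID', 'DMR']
```
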

stewicombo - duplicates when facility has more than one FRS ID

found by @TJTapajyoti
inventories_of_interest ={'eGRID': 2016, 'TRI': 2016, 'NEI': 2016, 'RCRAInfo': 2015}
emissions_and_wastes_by_facility = stewicombo.combineInventoriesforFacilitiesinOneInventory("eGRID",inventories_of_interest,filter_for_LCI=True)

returns duplicates like this

FRS_ID FacilityID FlowAmount FlowName Source
110001536197 10123 1006345.47519044 Carbon dioxide eGRID
110017313423 10123 1006345.47519044 Carbon dioxide eGRID

Use git lfs for any binary formats

The NEI parquet was added before initiating git lfs. This is OK, but please initiate it for this repo and add the .parquet extension to the list of files for git-lfs to track.

GHGRP 2017 is not working with stewi or stewicombo

Detailed error is below:

runfile('/Users/tghosh/OneDrive - NREL/work_NREL/FECM/Industrial-Emissions-Inventory/fecm_data_exploration.py', wdir='/Users/tghosh/OneDrive - NREL/work_NREL/FECM/Industrial-Emissions-Inventory', current_namespace=True)
/Users/tghosh/miniconda3/envs/stewi/lib/python3.9/site-packages/requests/__init__.py:109: RequestsDependencyWarning: urllib3 (1.26.9) or chardet (5.0.0)/charset_normalizer (2.0.12) doesn't match a supported version!
warnings.warn(
INFO GHGRP_2017 not found in /Users/tghosh/Library/Application Support/stewi/flowbyfacility
INFO requested inventory does not exist in local directory, it will be generated...
INFO downloading and processing GHGRP data to /Users/tghosh/Library/Application Support/stewi/GHGRP Data Files/tables/2017
ERROR error in url request
Traceback (most recent call last):
File "/Users/tghosh/OneDrive - NREL/work_NREL/FECM/Industrial-Emissions-Inventory/stewi/GHGRP.py", line 130, in get_row_count
table_count = int(table_count[0].firstChild.nodeValue)
IndexError: list index out of range
--- Logging error ---
Traceback (most recent call last):
File "/Users/tghosh/miniconda3/envs/stewi/lib/python3.9/logging/__init__.py", line 1083, in emit
msg = self.format(record)
File "/Users/tghosh/miniconda3/envs/stewi/lib/python3.9/logging/__init__.py", line 927, in format
return fmt.format(record)
File "/Users/tghosh/miniconda3/envs/stewi/lib/python3.9/logging/__init__.py", line 663, in format
record.message = record.getMessage()
File "/Users/tghosh/miniconda3/envs/stewi/lib/python3.9/logging/__init__.py", line 367, in getMessage
msg = msg % self.args
TypeError: %i format: a number is required, not NodeList
Call stack:
File "/Users/tghosh/miniconda3/envs/stewi/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/Users/tghosh/miniconda3/envs/stewi/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/Users/tghosh/miniconda3/envs/stewi/lib/python3.9/site-packages/spyder_kernels/console/main.py", line 24, in
start.main()
File "/Users/tghosh/miniconda3/envs/stewi/lib/python3.9/site-packages/spyder_kernels/console/start.py", line 332, in main
kernel.start()
File "/Users/tghosh/miniconda3/envs/stewi/lib/python3.9/site-packages/ipykernel/kernelapp.py", line 712, in start
self.io_loop.start()
File "/Users/tghosh/miniconda3/envs/stewi/lib/python3.9/site-packages/tornado/platform/asyncio.py", line 199, in start
self.asyncio_loop.run_forever()
File "/Users/tghosh/miniconda3/envs/stewi/lib/python3.9/asyncio/base_events.py", line 601, in run_forever
self._run_once()
File "/Users/tghosh/miniconda3/envs/stewi/lib/python3.9/asyncio/base_events.py", line 1905, in _run_once
handle._run()
File "/Users/tghosh/miniconda3/envs/stewi/lib/python3.9/asyncio/events.py", line 80, in _run
self._context.run(self._callback, *self._args)
File "/Users/tghosh/miniconda3/envs/stewi/lib/python3.9/site-packages/ipykernel/kernelbase.py", line 510, in dispatch_queue
await self.process_one()
File "/Users/tghosh/miniconda3/envs/stewi/lib/python3.9/site-packages/ipykernel/kernelbase.py", line 499, in process_one
await dispatch(*args)
File "/Users/tghosh/miniconda3/envs/stewi/lib/python3.9/site-packages/ipykernel/kernelbase.py", line 406, in dispatch_shell
await result
File "/Users/tghosh/miniconda3/envs/stewi/lib/python3.9/site-packages/ipykernel/kernelbase.py", line 730, in execute_request
reply_content = await reply_content
File "/Users/tghosh/miniconda3/envs/stewi/lib/python3.9/site-packages/ipykernel/ipkernel.py", line 383, in do_execute
res = shell.run_cell(
File "/Users/tghosh/miniconda3/envs/stewi/lib/python3.9/site-packages/ipykernel/zmqshell.py", line 528, in run_cell
return super().run_cell(*args, **kwargs)
File "/Users/tghosh/miniconda3/envs/stewi/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 2975, in run_cell
result = self._run_cell(
File "/Users/tghosh/miniconda3/envs/stewi/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3030, in _run_cell
return runner(coro)
File "/Users/tghosh/miniconda3/envs/stewi/lib/python3.9/site-packages/IPython/core/async_helpers.py", line 78, in pseudo_sync_runner
coro.send(None)
File "/Users/tghosh/miniconda3/envs/stewi/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3257, in run_cell_async
has_raised = await self.run_ast_nodes(code_ast.body, cell_name,
File "/Users/tghosh/miniconda3/envs/stewi/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3473, in run_ast_nodes
if (await self.run_code(code, result, async_=asy)):
File "/Users/tghosh/miniconda3/envs/stewi/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3553, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "/var/folders/sm/spdh5zkx26v6vh_fk8w7p8456l6sp0/T/ipykernel_18885/414800848.py", line 1, in <cell line: 1>
runfile('/Users/tghosh/OneDrive - NREL/work_NREL/FECM/Industrial-Emissions-Inventory/fecm_data_exploration.py', wdir='/Users/tghosh/OneDrive - NREL/work_NREL/FECM/Industrial-Emissions-Inventory', current_namespace=True)
File "/Users/tghosh/miniconda3/envs/stewi/lib/python3.9/site-packages/spyder_kernels/customize/spydercustomize.py", line 585, in runfile
exec_code(file_code, filename, ns_globals, ns_locals,
File "/Users/tghosh/miniconda3/envs/stewi/lib/python3.9/site-packages/spyder_kernels/customize/spydercustomize.py", line 465, in exec_code
exec(compiled, ns_globals, ns_locals)
File "/Users/tghosh/OneDrive - NREL/work_NREL/FECM/Industrial-Emissions-Inventory/fecm_data_exploration.py", line 38, in
save_data(inventory,year)
File "/Users/tghosh/OneDrive - NREL/work_NREL/FECM/Industrial-Emissions-Inventory/fecm_data_exploration.py", line 14, in save_data
flow_by_facility = stewi.getInventory(inventory, year, 'flowbyfacility',filters=['filter_for_LCI'])
File "/Users/tghosh/OneDrive - NREL/work_NREL/FECM/Industrial-Emissions-Inventory/stewi/__init__.py", line 80, in getInventory
inventory = read_inventory(inventory_acronym, year, f,
File "/Users/tghosh/OneDrive - NREL/work_NREL/FECM/Industrial-Emissions-Inventory/stewi/globals.py", line 322, in read_inventory
generate_inventory(inventory_acronym, year)
File "/Users/tghosh/OneDrive - NREL/work_NREL/FECM/Industrial-Emissions-Inventory/stewi/globals.py", line 356, in generate_inventory
GHGRP.main(Option = 'A', Year = [year])
File "/Users/tghosh/OneDrive - NREL/work_NREL/FECM/Industrial-Emissions-Inventory/stewi/GHGRP.py", line 696, in main
ghgrp1 = download_and_parse_subpart_tables(year, m)
File "/Users/tghosh/OneDrive - NREL/work_NREL/FECM/Industrial-Emissions-Inventory/stewi/GHGRP.py", line 286, in download_and_parse_subpart_tables
table_df = import_or_download_table(filepath, subpart_emissions_table,
File "/Users/tghosh/OneDrive - NREL/work_NREL/FECM/Industrial-Emissions-Inventory/stewi/GHGRP.py", line 240, in import_or_download_table
log.info('Downloading %s (rows: %i)', table, row_count)
Message: 'Downloading %s (rows: %i)'
Arguments: ('C_CONFIGURATION_LEVEL_INFO', [])
Traceback (most recent call last):

File "/Users/tghosh/OneDrive - NREL/work_NREL/FECM/Industrial-Emissions-Inventory/fecm_data_exploration.py", line 38, in
save_data(inventory,year)

File "/Users/tghosh/OneDrive - NREL/work_NREL/FECM/Industrial-Emissions-Inventory/fecm_data_exploration.py", line 14, in save_data
flow_by_facility = stewi.getInventory(inventory, year, 'flowbyfacility',filters=['filter_for_LCI'])

File "/Users/tghosh/OneDrive - NREL/work_NREL/FECM/Industrial-Emissions-Inventory/stewi/init.py", line 80, in getInventory
inventory = read_inventory(inventory_acronym, year, f,

File "/Users/tghosh/OneDrive - NREL/work_NREL/FECM/Industrial-Emissions-Inventory/stewi/globals.py", line 322, in read_inventory
generate_inventory(inventory_acronym, year)

File "/Users/tghosh/OneDrive - NREL/work_NREL/FECM/Industrial-Emissions-Inventory/stewi/globals.py", line 356, in generate_inventory
GHGRP.main(Option = 'A', Year = [year])

File "/Users/tghosh/OneDrive - NREL/work_NREL/FECM/Industrial-Emissions-Inventory/stewi/GHGRP.py", line 696, in main
ghgrp1 = download_and_parse_subpart_tables(year, m)

File "/Users/tghosh/OneDrive - NREL/work_NREL/FECM/Industrial-Emissions-Inventory/stewi/GHGRP.py", line 286, in download_and_parse_subpart_tables
table_df = import_or_download_table(filepath, subpart_emissions_table,

File "/Users/tghosh/OneDrive - NREL/work_NREL/FECM/Industrial-Emissions-Inventory/stewi/GHGRP.py", line 242, in import_or_download_table
table_df = download_chunks(table=table, table_count=row_count, m=m,

File "/Users/tghosh/OneDrive - NREL/work_NREL/FECM/Industrial-Emissions-Inventory/stewi/GHGRP.py", line 141, in download_chunks
while row_start <= table_count:

TypeError: '<=' not supported between instances of 'int' and 'NodeList'
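
The `Arguments: ('C_CONFIGURATION_LEVEL_INFO', [])` line shows that the row count reached the logger, and later the download loop, as an empty NodeList rather than an int. A defensive sketch of a fix (the `coerce_row_count` helper below is hypothetical, not StEWI's actual code) is to coerce the parsed count to an int before the comparison in the loop:

```python
def coerce_row_count(raw):
    """Return an int row count from an int, a numeric string, or a
    node-list-like object returned by an XML parser."""
    if isinstance(raw, int):
        return raw
    try:
        return int(raw)  # numeric string, e.g. "1234"
    except (TypeError, ValueError):
        pass
    # NodeList-like sequence: an empty response means nothing to download;
    # otherwise take the first node's text content as the count.
    if hasattr(raw, '__len__'):
        if len(raw) == 0:
            return 0
        first = raw[0]
        text = getattr(first, 'text', None) or str(first)
        return int(text)
    raise TypeError(f"Cannot interpret row count: {raw!r}")

row_count = coerce_row_count([])  # empty NodeList, as in the traceback above
row_start = 0
while row_start <= row_count:     # no longer raises TypeError
    row_start += 1
```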

[GHGRP] Validation issues in Subparts I, L, and T

Minor validation issues persist in the GHGRP data for Subparts I, L, and T for the less common GHGs. This is because the source data mix reporting in mass and in kg CO2e, which makes validation challenging.

Issue saving mismatched types to parquet

For example, the following code, where the eGRID year is passed as an int instead of a string, will raise an error when saving the inventory:

```python
import stewicombo
df = stewicombo.combineInventoriesforFacilitiesinBaseInventory(
    "GHGRP", {"NEI": "2018", "GHGRP": "2018", "eGRID": 2018}, remove_overlap=True)
stewicombo.saveInventory('my_file', df, {"NEI": "2018", "GHGRP": "2018", "eGRID": 2018})
```

```
pyarrow.lib.ArrowTypeError: ("Expected bytes, got a 'int' object", 'Conversion failed for column Year with type object')
```
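
A simple workaround (a sketch; `normalize_years` is a hypothetical helper, not part of stewicombo) is to coerce every year value to a string before passing the dictionary to the combine and save functions, so the resulting `Year` column has a uniform string dtype when written to parquet:

```python
def normalize_years(inventory_dict):
    """Coerce all inventory year values to strings (hypothetical helper)."""
    return {inv: str(year) for inv, year in inventory_dict.items()}

# The int 2018 for eGRID becomes the string "2018", avoiding the ArrowTypeError.
inventories = normalize_years({"NEI": "2018", "GHGRP": "2018", "eGRID": 2018})
```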

[GHGRP] Subpart AA data duplicating Subpart C

There appears to be an error in the GHGRP Subpart AA raw data on Envirofacts: all of the Subpart AA records in the `AA_SUBPART_LEVEL_INFORMATION` table actually report Subpart C data. This double-counts some Subpart C emissions and omits Subpart AA emissions entirely.
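
Until the source table is corrected, one defensive workaround (a sketch only, not StEWI's actual handling; the column names below are illustrative) is to drop the mislabeled Subpart AA rows before combining, so the Subpart C totals are not counted twice:

```python
import pandas as pd

# Toy frame standing in for the parsed GHGRP data; column names are illustrative.
df = pd.DataFrame({
    "SUBPART_NAME": ["C", "AA", "C"],
    "GHG_QUANTITY": [10.0, 5.0, 7.0],
})

# The row tagged "AA" is really Subpart C data already counted elsewhere,
# so dropping AA rows prevents the double count.
deduped = df[df["SUBPART_NAME"] != "AA"]
```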
