usepa / flowsa
Library that attributes resource use, waste, emissions, and loss to economic sectors
License: MIT License
"02270" "46113" "51515" exist in BEA State Employment but not in /flowsa/data/FIPS.csv.
@WesIngwersen I will work with @catherinebirney to investigate this.
includes for instance 2211 and 1125
applies to https://github.com/USEPA/flowsa/blob/master/flowsa/data/flowbysectormethods/Water_national_2010_m2.yaml
and
https://github.com/USEPA/flowsa/blob/master/flowsa/data/flowbysectormethods/Water_national_2015_m2.yaml
The version number in setup still shows v0.0.1, even though a tag/release already exists for v0.0.2
Line 5 in c6032b8
I find the methods can get a little bit lost within the data folder next to many of the other subfolders. Would it make sense to move flowbysectormethods and flowbyactivitymethods up a level into a folder called methods?
This is an issue discovered by @bl-young with a NAICS code coming in from RCRA via stewi that was not a valid 2012 NAICS code.
Related issue is here USEPA/useeior#83
In any NAICS-like sources we need to check that the NAICS codes are in our NAICS code list. If they are not present, we probably need to check whether they are present in an older NAICS schema and see if we can apply a mapping to get them into the current NAICS schema (2012 at the moment).
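The check described above could look roughly like the sketch below. The function name, the code list, and the legacy mapping are all hypothetical placeholders, not flowsa's actual crosswalk or API.

```python
def check_naics(codes, current_codes, legacy_to_current):
    """Sort codes into valid, remappable, and unknown groups.

    current_codes: set of codes in the current schema (2012 in the issue).
    legacy_to_current: hypothetical mapping of older-schema codes to current ones.
    """
    valid, remapped, unknown = [], {}, []
    for code in codes:
        if code in current_codes:
            valid.append(code)
        elif code in legacy_to_current:
            remapped[code] = legacy_to_current[code]
        else:
            unknown.append(code)
    return valid, remapped, unknown


# Illustrative values only; not taken from any real NAICS crosswalk.
current_2012 = {"111110", "221100", "562211"}
legacy_map = {"999991": "562211"}

valid, remapped, unknown = check_naics(
    ["221100", "999991", "000000"], current_2012, legacy_map
)
```

Codes that land in the unknown group would need manual review rather than being silently dropped.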
File "C:\Users\MelissaC\Envs\flowsa\lib\site-packages\xlrd\__init__.py", line 170, in open_workbook
raise XLRDError(FILE_FORMAT_DESCRIPTIONS[file_format]+'; not supported')
xlrd.biffh.XLRDError: Excel xlsx file; not supported
Occurred when running --year 2012 --source EIA_CBECS_Land.
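This error is what xlrd 2.0 and later raise for .xlsx input, since those releases only read the legacy .xls format. One possible workaround, sketched here with a hypothetical helper, is to choose the pandas `read_excel` engine from the file extension. How flowsa actually obtains the file (path or downloaded bytes) is not shown in the issue, so the path-based approach is an assumption.

```python
from pathlib import Path


def pick_excel_engine(path):
    """Return a pandas.read_excel engine name based on the file extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".xls":
        return "xlrd"
    if suffix in {".xlsx", ".xlsm"}:
        return "openpyxl"
    raise ValueError(f"Unrecognized Excel extension: {suffix!r}")


# Usage (requires pandas plus the selected engine to be installed):
#   df = pd.read_excel(path, engine=pick_excel_engine(path))
```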
The location is hard-coded as a string but should use the variable instead
https://github.com/USEPA/flowsa/blob/master/flowsa/EIA_MECS.py#L157
Allow flow-by-sector YAMLs to be kept outside of the package and passed for processing in a getFlowBySector() call. This will allow development of FBS methods outside of the main flowsa repo
Starting here and used in this function is pd.np.where
Line 105 in 72b7b18
It's throwing a warning of future deprecation.
FutureWarning: The pandas.np module is deprecated and will be removed from pandas in a future version. Import numpy directly instead
fba = flowsa.getFlowByActivity(datasource="EIA_MECS_Energy", year=2017)
returns a bad zip file error because 2017 doesn't exist for EIA_MECS
Perhaps flowbyactivity could check against the list of years in the FBA yaml. This would also help prevent errors when new data are released (e.g. 2018 for EIA_MECS) but not yet tested
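A minimal sketch of that check, assuming the FBA YAML has been loaded into a dict with a `years` list. The key name and the sample years are assumptions for illustration only.

```python
def validate_year(source_config, year):
    """Raise a clear error if the requested year is not listed for the source."""
    available = [int(y) for y in source_config.get("years", [])]
    if int(year) not in available:
        raise ValueError(
            f"Year {year} is not available for this source; "
            f"available years: {available}"
        )
    return int(year)


# Hypothetical config; the years are placeholders, not the real EIA_MECS list.
mecs_config = {"years": [2010, 2014]}
```

Failing early with the list of supported years would replace the bad zip file error with an actionable message.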
I've verified that the TSA data frame is correctly parsed by the tsa_parse() function from BTS_TSA.py, but then the FBA is empty. So somewhere between the data frame being parsed and the final FBA being written, all the rows are dropped. I've spent some time on it, but I may need some help figuring out this issue.
What is the case where the url_list is going to be empty?
Should this throw some kind of exception?
This check may be better off in assemble_urls or in main()
flowsa/flowsa/flowbyactivity.py
Line 124 in e81ecbe
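If an empty url_list should be treated as an error, a guard along these lines is one option. The helper name is hypothetical, and whether it belongs in assemble_urls or main() is the open question raised above.

```python
def require_urls(url_list, source_name):
    """Return url_list unchanged, or raise if no URLs were assembled."""
    if not url_list:
        raise ValueError(
            f"No URLs were assembled for {source_name}; "
            "check the source configuration."
        )
    return url_list
```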
Differences in package version requirements between flowsa and esupy throw errors
See:
https://github.com/USEPA/esupy/blob/main/requirements.txt
and
https://github.com/USEPA/flowsa/blob/master/requirements.txt
When using USDA crop data as an allocation source, we need to aggregate down from 7- to 6-digit NAICS in cases where the crosswalk is already based on a six-digit NAICS.
Similarly, we need to aggregate up from 5 to 6 digits.
That is, an activity that is split between two NAICS in the crosswalk, e.g. 111140 & 111920, will get missed in get_fba_allocation_subset because it only appears as 11192.
Or a crosswalk for 111120 will get missed, as these only appear as 111120A.
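One way to picture the matching problem is a comparison that trims both codes to a common length before testing equality. This is only a sketch with a hypothetical helper, not the logic in get_fba_allocation_subset. It is deliberately loose, since a 5-digit code will match every 6-digit code that shares its prefix.

```python
def matches_crosswalk(alloc_code, crosswalk_code, max_digits=6):
    """True if the codes agree once both are cut to the shorter length (capped)."""
    n = min(len(alloc_code), len(crosswalk_code), max_digits)
    return alloc_code[:n] == crosswalk_code[:n]
```

With this rule, the allocation code 11192 matches the crosswalk entry 111920, and 111120A matches 111120, which are the two cases described above.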
I'm getting an error occured message in this try/except here
Assigning stewicombo data to sectors uses FRS NAICS assignments. In some cases, multiple NAICS are listed for an individual dataset. The first listed NAICS for a specific inventory in the FRS system is used.
In some cases the available NAICS are wildly different.
flowsa/flowsa/data_source_scripts/stewiFBS.py
Lines 86 to 94 in c31cd44
If no git hash is identified, the log files are stored as e.g.,:
CRHW_national_2017_v0.3_None.log
When saving files in esupy there is a check first for this parameter before appending to the name.
https://github.com/USEPA/esupy/blob/main/esupy/processed_data_mgmt.py#L242-L243
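A sketch of the same guard applied to log names, appending the git hash only when one was found. The function name, signature, and the sample hash are hypothetical.

```python
def build_log_name(method, version, git_hash=None):
    """Build a log file name, skipping the hash segment when it is missing."""
    parts = [method, f"v{version}"]
    if git_hash:
        parts.append(git_hash)
    return "_".join(parts) + ".log"
```

For the example above, this would produce CRHW_national_2017_v0.3.log rather than a name ending in _None.log.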
Howdy,
I'm receiving the following error when following the installation instructions.
Collecting esupy@ git+git://github.com/USEPA/esupy@v0.1.7#egg=esupy
Cloning git://github.com/USEPA/esupy (to revision v0.1.7) to /tmp/pip-install-v0p7lwqy/esupy_0e824dc63ad04367b1ae3e24b3bdde5d
Running command git clone -q git://github.com/USEPA/esupy /tmp/pip-install-v0p7lwqy/esupy_0e824dc63ad04367b1ae3e24b3bdde5d
Running command git checkout -q c04efa5aefc82a317776a2b2b1b20fae01b5fce7
INFO: pip is looking at multiple versions of fedelemflowlist to determine which version is compatible with other requirements. This could take a while.
INFO: pip is looking at multiple versions of esupy to determine which version is compatible with other requirements. This could take a while.
INFO: pip is looking at multiple versions of flowsa to determine which version is compatible with other requirements. This could take a while.
ERROR: Cannot install flowsa and flowsa==0.2.1 because these package versions have conflicting dependencies.
The conflict is caused by:
flowsa 0.2.1 depends on esupy 0.1.7 (from git+https://github.com/USEPA/esupy@v0.1.7#egg=esupy)
stewi 0.9.9 depends on esupy 0.1.7 (from git+git://github.com/USEPA/esupy@v0.1.7#egg=esupy)
To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict
steps to recreate
$ mkdir flowsa-test
$ python3 -m venv flowsa-venv
$ source flowsa-venv/bin/activate
$ pip install git+https://github.com/USEPA/flowsa
pip version 21.0.1
python version 3.9.6
Following the installation instructions outside of a virtual environment was also unsuccessful.
I was able to successfully install a fork by changing the 'git' to 'https' in setup.py, but then I ran into more import problems and wasn't sure if I was causing more problems than I was fixing.
When indirectly calling on the code to generate the FBA for the CoA Cropland, it creates this URL where the xxxx is a valid API key
2022-01-11 10:50:10 INFO Calling https://quickstats.nass.usda.gov/api/api_GET/?key=xxxxEsource_desc=CENSUS&sector_desc=ECONOMICS&statisticcat_desc=AREA%26statisticcat_desc%3DAREA+OPERATED&commodity_desc=AG+LAND%26commodity_desc%3DFARM+OPERATIONS&unit_desc=ACRES%26unit_desc%3DOPERATIONS&agg_level_desc=NATIONAL&year=2017
ERROR Error in URL request!
I can see that after the key value there is no ampersand to indicate the next URL parameter
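Building the query string with the standard library avoids this class of bug, because the separators are inserted automatically. The sketch below is not how flowsa assembles its URLs; the helper name is hypothetical and only a few of the parameters from the log line are shown.

```python
from urllib.parse import urlencode


def build_url(base_url, api_key, params):
    """Join the key and remaining parameters with '&' and percent-encode values."""
    query = urlencode({"key": api_key, **params})
    return f"{base_url}?{query}"


url = build_url(
    "https://quickstats.nass.usda.gov/api/api_GET/",
    "xxxx",  # placeholder API key, as in the log line above
    {"source_desc": "CENSUS", "sector_desc": "ECONOMICS", "year": 2017},
)
```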
Per @a-w-beck's recommendation, set the packages in requirements.txt and setup.py to specific version numbers, rather than using ">=" or "<" to prevent flowsa code from breaking
For a particular activity, all flows are reported as zero. When that flow_subset is passed to agg_by_geoscale, all flows that are 0 get dropped after aggregating. This generates an error later when sectors are added to that flow_subset.
The OPERATIONS flows are not a unit of Land but rather of Other, so they should be stored in a separate parquet file. At this point flowsa doesn't support storing more than one class together in a parquet. If that needs to change, then we need to change the source catalog to have a list for class instead of a string.
Current method only allows for "allocation_source" and "helper_source" FBAs for FBS activity set allocation. Modify the FBS yaml to allow for unlimited FBAs to be called for allocation. Changing methodology will make methodology more transparent and limit the number of FBAs hardcoded into cleaning functions.
The pyarrow dependency conflicts with some other EPA packages. I'm not even sure it's needed anymore with esupy.
We need a table of all available flowbysector datasets, like the one that exists for flowbyactivity
Ideally the sources need to be consistent across these uses:
in fba data 'SourceName' column
in the file names of the parquet
in the Crosswalk file names
in the Source Catalog
The major issue is that we generally have a provider, an inventory/report, and a specific table or flow type.
When expanding the NAICS list, more detailed sectors will get dropped if another entry in the mapping already maps to those sectors. Line 119-120
With NEI data, some activities are more specific than others. E.g. an SCC might apply to all Agriculture (NAICS: 111) while another SCC will apply to specific crops (NAICS: 11114). When both are present, the first SCC won't get assigned to 11114 because it drops out from the mapping.
My preference would be to exclude this step, but I don't know how that would impact other mappings. @catherinebirney
Line 38 in 9a22cb0
from flowsa.USDA_CoA_Cropland_NAICS import disaggregate_usda_coa_cropland_naics
ImportError: cannot import name 'disaggregate_usda_coa_cropland_naics' from 'flowsa.USDA_CoA_Cropland_NAICS'
The Water_national_2015_m1 method is returning None for this field
FlowType needs to be one of the acceptable values; see the format specs
Raising this as an issue now for awareness and discussion. I think we are currently hardcoding too many things in the data source script .py files and not making enough use of the .yaml files. For example, information on how to modify the NEI data URL for different years, or on how columns should be renamed and how that renaming depends on the year (again, I'm thinking about the NEI data source in particular at the moment), could be included as dicts in the EPA_NEI.yaml files.
One advantage of offloading this information to the .yaml files is that it should simplify the process of adding new data years (e.g. the 2020 NEI, or 2018 and 2019 EQUATES data).
I'm working on the FBA methods and scripts for the EQUATES data right now, and I'll link to them once I'm finished, by way of an example of what I'm thinking.
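As a rough illustration of the idea, the dict below stands in for what a loaded .yaml file might contain. Every key, URL, file name, and column name here is hypothetical and is not taken from EPA_NEI.yaml or the NEI data.

```python
# Stand-in for a parsed YAML file; all values are placeholders.
source_config = {
    "url_template": "https://example.com/nei/{year}/{file}",
    "years": {
        2014: {"file": "2014_data.zip", "rename": {"pollutant_cd": "FlowName"}},
        2017: {"file": "2017_data.zip", "rename": {"pollutant code": "FlowName"}},
    },
}


def build_year_settings(config, year):
    """Return the download URL and column-rename map for one data year."""
    year_cfg = config["years"][year]
    url = config["url_template"].format(year=year, file=year_cfg["file"])
    return url, year_cfg["rename"]


url_2017, rename_2017 = build_year_settings(source_config, 2017)
```

Under this layout, adding a data year would mean adding one entry to the config rather than editing the script.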
The NEI captures emissions from landings and takeoffs (LTO) which are assigned to airports in the NEI point dataset. In most cases, these would end up assigned to NAICS that would get assigned to 48A000 - Scenic and sightseeing transportation and support activities for transportation. Instead they should be assigned to Air Transportation (481000)
See NEI TSD: https://www.epa.gov/sites/production/files/2018-06/documents/nei2014v2_tsd_09may2018.pdf; section 3.2
Emissions from LTO are noted by specific SCCs, primarily 2275020000
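The reassignment could be expressed as a simple override keyed on SCC. This is a sketch with hypothetical names. Only the one SCC named above is listed, and the facility NAICS in the usage note is an illustrative input.

```python
# Only the primary LTO SCC from the issue; others would be added after review.
LTO_SCCS = {"2275020000"}
AIR_TRANSPORT_SECTOR = "481000"


def assign_sector(scc, facility_sector):
    """Send LTO emissions to Air Transportation; keep other records as assigned."""
    return AIR_TRANSPORT_SECTOR if scc in LTO_SCCS else facility_sector
```

For example, assign_sector("2275020000", "488119") returns "481000", while any SCC outside the set keeps its facility-based sector.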
After installing with pip, attempting to access from flowsa:
...
File "C:\Users\cbirney\git_projects\flowsa\flowsa\stewi.py", line 31, in stewicombo_to_sector
import stewicombo
File "C:\Users\cbirney\AppData\Local\Programs\Python\Python37\lib\site-packages\stewicombo\__init__.py", line 5, in <module>
from stewicombo.overlaphandler import aggregate_and_remove_overlap
File "C:\Users\cbirney\AppData\Local\Programs\Python\Python37\lib\site-packages\stewicombo\overlaphandler.py", line 3, in <module>
from stewi.globals import log
ModuleNotFoundError: No module named 'stewi.globals'; 'stewi' is not a package
It appears that duplicate globals.py could be causing an issue here?
cc: @catherinebirney
I believe the nested structure of the .py files leads to import issues
see: https://github.com/USEPA/flowsa/runs/3907368775?check_suite_focus=true
I think this can be resolved by including __init__.py in each subdirectory (e.g. data_source_scripts)
https://sweetcode.io/python-file-importation-multi-level-directory-modules-packages/
For the common use case of satellite tables in USEEIO, the need is generally to have one sector associated with a flow record.
Given the amount of information in the logger, it would be nice to save this output after generating an FBS
We are still trying to decide the best way to handle both the unit conversions of flows and the mapping to the FEDEFL in the flowbysector logic
Error occurs here
Line 398 in c6032b8
Flowbysector calls this function and it fails. The user would not be expected to have the file it looks for in the output folder.
The class parameter is required and must be passed as a list, which is awkward and perhaps not necessary in fba. We could easily remove the class requirement, and users could apply a simple subset command if they wanted to get only one class.
In common.find_true_file_path(), if the directory, filename, and extension do not combine into a valid path, and if removing chunks of the filename following _ does not lead to a valid path (e.g. due to a typo, or an error in the directory or extension), the function can get stuck in an infinite loop.
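A loop that is guaranteed to terminate could look like the sketch below. It is not the existing common.find_true_file_path() implementation. The function name and the choice to return None on failure are assumptions.

```python
import os


def find_file_with_fallback(directory, filename, extension):
    """Return the first existing path, trimming trailing '_' chunks of filename.

    Returns None when no candidate exists, so the caller can raise a clear error.
    """
    candidate = filename
    while candidate:
        path = os.path.join(directory, candidate + extension)
        if os.path.isfile(path):
            return path
        if "_" not in candidate:
            return None  # nothing left to trim
        # Drop the last '_'-separated chunk; the candidate always gets shorter.
        candidate = candidate.rsplit("_", 1)[0]
    return None
```

Because each pass either returns or strictly shortens the candidate, a typo in the directory or extension ends in None instead of looping forever.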
Running "CRHW_national_2017" returns error:
import materialflowlist as mfl
ModuleNotFoundError: No module named 'materialflowlist'
@bl-young We need to add materialflowlist to package requirements/setup. Do you have a specific version? Or did you want to make this package an optional requirement?
FlowNames stored in GHG fbas need to be corrected
Flownames in the GHGI generally are CO2, CH4, N2O for main gases
reported from stored fba files
examples:
in T_2_1: Recent_Trends...
this must be replaced with the gas name like above
in T_3_10: CH4 Emissions from Stationary Combustion
should be CH4
These all need to be reviewed and fixed.
fyi @catherinebirney
Reported by @MoLi7
Reading flowbyactivity parquet files created by another user fails with a footer error
Accessing the FBS_outside_flowsa functions through the dynamic import is failing for stewi-related data. I think the YAMLs need to be revised
ModuleNotFoundError: No module named 'flowsa.data_source_scripts.stewicombo'
Currently these two parameters need to be set to None; if they are not present a KeyError is raised. As is done for some other parameters, the presence of these specs could be tested first. This would reduce unnecessary information in the FBS method files.
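Testing for presence can be as simple as using dict.get with a default. The issue does not name the two parameters, so the keys below are placeholders and the helper is hypothetical.

```python
def read_optional_specs(method_spec, keys=("optional_param_a", "optional_param_b")):
    """Return the requested specs, using None for any that are absent."""
    return {key: method_spec.get(key) for key in keys}


# A method fragment that omits both optional entries (placeholder content).
specs = read_optional_specs({"required_param": "value"})
```

With this pattern the FBS method files would only need to list the parameters they actually set.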
Since stewi is called it needs to be added. See the fedelemflowlist example for how to add a github package. Make sure the specific release is specified and not a >=.
If the joblib cache is located in '.cache', then it attempts to create the '.cache' directory in the current working directory, wherever that may be. If the user does not have permissions there, then importing flowsa fails.
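One possible mitigation is to fall back to a directory that is known to be writable. The sketch below uses the system temp directory as that fallback. The helper name and the flowsa_cache folder name are assumptions, and a per-user data directory might be a better long-term home.

```python
import os
import tempfile


def resolve_cache_dir(preferred=".cache"):
    """Return a writable cache directory, falling back to the system temp dir."""
    try:
        os.makedirs(preferred, exist_ok=True)
        if os.access(preferred, os.W_OK):
            return os.path.abspath(preferred)
    except OSError:
        pass  # e.g. no permission to create the directory here
    fallback = os.path.join(tempfile.gettempdir(), "flowsa_cache")
    os.makedirs(fallback, exist_ok=True)
    return fallback
```

The returned path could then be handed to whatever sets up the cache, so that importing the package no longer depends on the current working directory.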
The version number attached to an FBA or FBS parquet is based on the version in flowsa.egg-info/PKG-INFO in the user's directory. PKG-INFO is created when a user runs pip install -e flowsa.
To update egg-info, a user can run: python setup.py bdist_egg
Consider making an updateable version parameter that is called on when naming parquets?
As it stands, this function awkwardly requires a list of years rather than a single year, even though we have no use cases for multiple years.
It would be just as easy for the user to call the function multiple times for multiple years (like in a loop) and concatenate.
If it takes a single year it will also be more consistent with getFlowBySector.