flowsa's Issues

Check that NAICS codes coming in from NAICS-like sources are present in our NAICS list for the given year

This is an issue discovered by @bl-young with a NAICS code coming in from RCRA via stewi that was not a valid 2012 NAICS code.
Related issue is here USEPA/useeior#83

For any NAICS-like source, we need to check that its NAICS codes are in our NAICS code list. If a code is not present, we should probably check whether it exists in an older NAICS schema and, if so, apply a mapping to bring it into the current NAICS schema (2012 at the moment).
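The check could look something like the sketch below. It assumes that a set of valid current-schema codes and an older-to-current crosswalk dict are already loaded. The function name and the example codes are illustrative, not flowsa's API. Valid codes pass through, crosswalked codes are remapped, and everything else is returned for review.

```python
from typing import Dict, Iterable, List, Set, Tuple


def check_naics(
    codes: Iterable[str],
    valid_codes: Set[str],
    older_to_current: Dict[str, str],
) -> Tuple[Dict[str, str], List[str]]:
    """Split incoming NAICS codes into usable codes and unresolved ones.

    Hypothetical helper: codes already in the current schema map to themselves,
    codes found in the older-schema crosswalk are remapped, and the rest are
    returned for review instead of silently passing through.
    """
    mapped: Dict[str, str] = {}
    unresolved: List[str] = []
    for code in codes:
        code = str(code).strip()
        if code in valid_codes:
            mapped[code] = code
        elif older_to_current.get(code) in valid_codes:
            mapped[code] = older_to_current[code]
        else:
            unresolved.append(code)
    return mapped, unresolved
```

A caller could log or raise on anything in `unresolved` before sectors are attached.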

XLRD issue.

File "C:\Users\MelissaC\Envs\flowsa\lib\site-packages\xlrd\__init__.py", line 170, in open_workbook
raise XLRDError(FILE_FORMAT_DESCRIPTIONS[file_format]+'; not supported')
xlrd.biffh.XLRDError: Excel xlsx file; not supported

Occurred when running --year 2012 --source EIA_CBECS_Land.
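xlrd 2.0 and later read only legacy .xls workbooks. An .xlsx file such as the CBECS download therefore needs a different pandas engine. The minimal sketch below chooses the engine from the file extension. The helper name is hypothetical, and data read from an in-memory buffer has no extension, so there the engine would have to be passed explicitly.

```python
import os


def excel_engine_for(path: str) -> str:
    """Pick a pandas.read_excel engine from the file extension.

    xlrd >= 2.0 only supports legacy .xls files, so .xlsx/.xlsm go to openpyxl.
    """
    ext = os.path.splitext(path)[1].lower()
    if ext in (".xlsx", ".xlsm"):
        return "openpyxl"
    if ext == ".xls":
        return "xlrd"
    raise ValueError(f"Unrecognized Excel extension: {ext!r}")
```

Usage would be along the lines of `pd.read_excel(path, engine=excel_engine_for(path))`, which requires openpyxl to be installed for .xlsx files.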

Permit user passed FBS YAML

Allow flow-by-sector YAMLs to be kept outside the package and passed for processing in a getFlowBySector() call. This would allow FBS methods to be developed outside the main flowsa repo.
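One way to support this is to search a user-supplied directory before the package's own methods directory. The sketch below shows that lookup order. The `external_dir` parameter and the function name are hypothetical; getFlowBySector() may expose this differently.

```python
from pathlib import Path
from typing import Optional, Union

PathLike = Union[str, Path]


def resolve_fbs_method(
    method: str,
    package_methods_dir: PathLike,
    external_dir: Optional[PathLike] = None,
) -> Path:
    """Locate an FBS method YAML, preferring a user-supplied directory."""
    search_dirs = [Path(external_dir)] if external_dir is not None else []
    search_dirs.append(Path(package_methods_dir))
    for directory in search_dirs:
        candidate = directory / f"{method}.yaml"
        if candidate.is_file():
            return candidate
    searched = ", ".join(str(d) for d in search_dirs)
    raise FileNotFoundError(f"No {method}.yaml found in: {searched}")
```

Searching the external directory first lets a user override a packaged method without editing the repo.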

pandas.np deprecation

pd.np.where is used starting here and throughout this function:

df.loc[:, 'FlowName'] = pd.np.where(df.Description.str.contains("fresh"), "fresh",

It's throwing a future-deprecation warning:
FutureWarning: The pandas.np module is deprecated and will be removed from pandas in a future version. Import numpy directly instead
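The fix the warning suggests is to import numpy and call `np.where` directly. The sketch below shows this on a toy frame. The descriptions are made up, and the "saline" else-value is only a placeholder, because the original line is truncated.

```python
import numpy as np
import pandas as pd

# Toy data; real descriptions come from the source's parsed table
df = pd.DataFrame({"Description": ["fresh groundwater", "saline groundwater"]})

# np.where is the drop-in replacement for the deprecated pd.np.where
df["FlowName"] = np.where(df.Description.str.contains("fresh"), "fresh", "saline")
```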

Need warning when attempting to obtain data for unavailable year

fba = flowsa.getFlowByActivity(datasource="EIA_MECS_Energy", year=2017)
returns a bad zip file error because 2017 doesn't exist for EIA_MECS

Perhaps flowbyactivity could check the requested year against the list of years in the FBA YAML. This would also help prevent errors when new data are released (e.g., 2018 for EIA_MECS) but have not yet been tested.
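A minimal sketch of such a check follows. It assumes the FBA YAML has already been loaded into a dict with a `years` list; the real key name in flowsa's method files may differ. Raising before any download replaces the confusing bad-zip-file error with a message that lists the valid years.

```python
from typing import Any, Dict


def check_year_available(source_name: str, config: Dict[str, Any], year: int) -> None:
    """Raise a clear error if `year` is not listed for the source (hypothetical helper)."""
    available = sorted(int(y) for y in config.get("years", []))
    if int(year) not in available:
        raise ValueError(
            f"{source_name} has no data configured for {year}; "
            f"available years: {available}"
        )
```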

Transportation Satellite Account FBA not generating properly

I've verified that tsa_parse() in BTS_TSA.py parses the TSA data frame correctly, but the resulting FBA is empty, so all the rows are dropped somewhere between parsing and writing the final FBA. I've spent some time on it, but I may need help tracking down the cause.
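One way to find where the rows disappear is to run each processing step through a wrapper that logs row counts. The sketch below does this. The step names and lambdas are placeholders for whatever happens between tsa_parse() and writing the FBA.

```python
import logging
from typing import Callable, List, Tuple

import pandas as pd

log = logging.getLogger(__name__)

Step = Tuple[str, Callable[[pd.DataFrame], pd.DataFrame]]


def run_with_row_counts(df: pd.DataFrame, steps: List[Step]) -> pd.DataFrame:
    """Apply steps in order, logging row counts so the step that empties the frame stands out."""
    log.info("start: %d rows", len(df))
    for name, step in steps:
        before = len(df)
        df = step(df)
        log.info("%s: %d -> %d rows", name, before, len(df))
        if before and df.empty:
            log.warning("all rows dropped at step '%s'", name)
    return df
```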

get fba subset issue

When using USDA crop data as an allocation source, we need to aggregate down from 7- to 6-digit NAICS for cases where the crosswalk is already based on a 6-digit NAICS.

Similarly, we need to aggregate up from 5 to 6 digits.

That is, an activity that is split between two NAICS in the crosswalk (e.g., 111140 and 111920) will be missed in get_fba_allocation_subset because it only appears as 11192.
Similarly, a crosswalk entry for 111120 will be missed because these codes only appear as 111120A.
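The matching could work like the sketch below. Codes longer than six characters are truncated to their 6-digit parent, and shorter codes match every 6-digit crosswalk code that starts with them. The function is hypothetical, not a replacement for get_fba_allocation_subset itself.

```python
from typing import Dict, Iterable, Set


def match_to_six_digit(fba_codes: Iterable[str], crosswalk_codes: Set[str]) -> Dict[str, Set[str]]:
    """Map FBA sector codes of any length onto 6-digit crosswalk codes."""
    matches: Dict[str, Set[str]] = {}
    for code in fba_codes:
        if len(code) > 6:
            # e.g. '111120A' -> '111120'
            parents = {code[:6]} & crosswalk_codes
        elif len(code) < 6:
            # e.g. '11192' -> every 6-digit code beginning with '11192'
            parents = {c for c in crosswalk_codes if c.startswith(code)}
        else:
            parents = {code} & crosswalk_codes
        matches[code] = parents
    return matches
```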

NAICS in stewicombo highly sensitive to FRS data

Assigning stewicombo data to sectors uses FRS NAICS assignments. In some cases, multiple NAICS are listed for an individual dataset. The first listed NAICS for a specific inventory in the FRS system is used.

In some cases, the listed NAICS codes differ widely.

# use NAICS from facility matcher so drop them here
facility_mapping.drop(columns=['NAICS'], inplace=True)
# merge dataframes to assign facility information based on facility IDs
df = pd.merge(df, facility_mapping, how='left',
              on='FacilityID')
all_NAICS = obtain_NAICS_from_facility_matcher(inventory_list)
df = pd.merge(df, all_NAICS, how='left', on=['FRS_ID', 'Source'])
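To see how sensitive results are to this choice, the first-listed NAICS can be kept alongside a flag for facilities that have more than one distinct NAICS. The sketch below does this with pandas. It reuses the FRS_ID, Source and NAICS column names from the snippet above and assumes row order reflects the FRS listing order.

```python
import pandas as pd


def first_naics_with_flag(frs_naics: pd.DataFrame) -> pd.DataFrame:
    """Keep the first NAICS per facility/source and flag facilities with several distinct codes."""
    grouped = frs_naics.groupby(["FRS_ID", "Source"], sort=False)["NAICS"]
    summary = grouped.agg(NAICS="first", n_distinct_NAICS="nunique").reset_index()
    summary["multiple_NAICS"] = summary["n_distinct_NAICS"] > 1
    return summary
```

The flagged rows are the facilities whose sector assignment could change if a different NAICS were chosen.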

esupy 'dependency conflict'

Howdy,
I'm receiving the following error when following the installation instructions.

Collecting esupy@ git+git://github.com/USEPA/esupy@v0.1.7#egg=esupy
  Cloning git://github.com/USEPA/esupy (to revision v0.1.7) to /tmp/pip-install-v0p7lwqy/esupy_0e824dc63ad04367b1ae3e24b3bdde5d
  Running command git clone -q git://github.com/USEPA/esupy /tmp/pip-install-v0p7lwqy/esupy_0e824dc63ad04367b1ae3e24b3bdde5d
  Running command git checkout -q c04efa5aefc82a317776a2b2b1b20fae01b5fce7
INFO: pip is looking at multiple versions of fedelemflowlist to determine which version is compatible with other requirements. This could take a while.
INFO: pip is looking at multiple versions of esupy to determine which version is compatible with other requirements. This could take a while.
INFO: pip is looking at multiple versions of flowsa to determine which version is compatible with other requirements. This could take a while.
ERROR: Cannot install flowsa and flowsa==0.2.1 because these package versions have conflicting dependencies.

The conflict is caused by:
    flowsa 0.2.1 depends on esupy 0.1.7 (from git+https://github.com/USEPA/esupy@v0.1.7#egg=esupy)
    stewi 0.9.9 depends on esupy 0.1.7 (from git+git://github.com/USEPA/esupy@v0.1.7#egg=esupy)

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

Steps to recreate:

$ mkdir flowsa-test
$ python3 -m venv flowsa-venv
$ source flowsa-venv/bin/activate
$ pip install git+https://github.com/USEPA/flowsa

pip version 21.0.1
python version 3.9.6
Following the installation instructions outside of a virtual environment was also unsuccessful.

I was able to install a fork successfully by changing 'git' to 'https' in setup.py, but then I ran into more import problems and wasn't sure whether I was causing more problems than I was fixing.
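Both packages pin esupy 0.1.7, but pip appears to treat the two direct references as conflicting because one uses `git+https://` and the other uses `git+git://`. The small sketch below normalizes the scheme so that both requirement strings become identical, which is the effect of the setup.py change described above. The helper is illustrative only.

```python
import re


def normalize_git_scheme(requirement: str) -> str:
    """Rewrite 'git+git://' direct references to 'git+https://'."""
    return re.sub(r"git\+git://", "git+https://", requirement)


flowsa_req = "esupy @ git+https://github.com/USEPA/esupy@v0.1.7#egg=esupy"
stewi_req = "esupy @ git+git://github.com/USEPA/esupy@v0.1.7#egg=esupy"
```

Using the same `https` URL in every package's requirements avoids the mismatch at the source.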

URL concatenation missing '&' when forming Quickstats URL

When the code that generates the FBA for CoA Cropland is called indirectly, it creates this URL, where xxxx is a valid API key:

2022-01-11 10:50:10 INFO     Calling https://quickstats.nass.usda.gov/api/api_GET/?key=xxxxEsource_desc=CENSUS&sector_desc=ECONOMICS&statisticcat_desc=AREA%26statisticcat_desc%3DAREA+OPERATED&commodity_desc=AG+LAND%26commodity_desc%3DFARM+OPERATIONS&unit_desc=ACRES%26unit_desc%3DOPERATIONS&agg_level_desc=NATIONAL&year=2017
ERROR Error in URL request!

After the key value, there is no ampersand to separate it from the next URL parameter.
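Building the query with `urllib.parse.urlencode` avoids hand-concatenating `&` separators. The logged URL also shows `&` and `=` percent-encoded inside values (`AREA%26statisticcat_desc%3DAREA+OPERATED`), which suggests that repeated parameters were intended. The sketch below passes them as repeated tuples, on the assumption that the Quickstats API accepts repeated parameters. The helper name is hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "https://quickstats.nass.usda.gov/api/api_GET/"


def build_quickstats_url(api_key, params):
    """Join the key and (name, value) pairs with '&'; repeated names stay separate pairs."""
    query = urlencode([("key", api_key)] + list(params))
    return f"{BASE_URL}?{query}"


url = build_quickstats_url("xxxx", [
    ("source_desc", "CENSUS"),
    ("sector_desc", "ECONOMICS"),
    ("statisticcat_desc", "AREA"),
    ("statisticcat_desc", "AREA OPERATED"),
    ("commodity_desc", "AG LAND"),
    ("commodity_desc", "FARM OPERATIONS"),
    ("unit_desc", "ACRES"),
    ("unit_desc", "OPERATIONS"),
    ("agg_level_desc", "NATIONAL"),
    ("year", "2017"),
])
```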

Error when all flows are 0 for an activity

For a particular activity, all flows are reported as zero. When that flow_subset is passed to agg_by_geoscale (here), all zero flows are dropped after aggregation. This causes an error later, when sectors are added to that flow_subset.
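One possible guard is to check for an empty subset after zero flows are removed, and to skip sector assignment for that activity with a warning. The sketch below shows the idea. The function names and the `add_sectors` callable are hypothetical stand-ins for flowsa's own steps; `FlowAmount` is the FBA amount column.

```python
import logging
from typing import Callable, Dict, List

import pandas as pd

log = logging.getLogger(__name__)


def drop_zero_flows(flow_subset: pd.DataFrame, activity: str) -> pd.DataFrame:
    """Remove zero flows, warning when that leaves nothing for the activity."""
    nonzero = flow_subset[flow_subset["FlowAmount"] != 0]
    if nonzero.empty and not flow_subset.empty:
        log.warning("All flows for activity '%s' are 0; skipping sector assignment", activity)
    return nonzero


def process_activities(
    subsets: Dict[str, pd.DataFrame],
    add_sectors: Callable[[pd.DataFrame], pd.DataFrame],
) -> List[pd.DataFrame]:
    results = []
    for activity, subset in subsets.items():
        subset = drop_zero_flows(subset, activity)
        if subset.empty:
            continue  # nothing left to attribute, so don't call add_sectors
        results.append(add_sectors(subset))
    return results
```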

USDA_CoA_Cropland has mixed flow classes

The OPERATIONS flows are not a unit of Land but rather of Other, so they should be stored in a separate parquet file. At this point, flowsa doesn't support storing more than one class together in a parquet. If that needs to change, we will need to change the source catalog to use a list for class instead of a string.
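Storing each class separately could begin with a split like the sketch below, which groups a parsed FBA by its `Class` column. The function name is hypothetical, and the output file naming is left to the caller.

```python
from typing import Dict

import pandas as pd


def split_by_class(fba: pd.DataFrame) -> Dict[str, pd.DataFrame]:
    """Split an FBA into one frame per flow Class so each can be saved separately."""
    return {
        flow_class: group.reset_index(drop=True)
        for flow_class, group in fba.groupby("Class")
    }
```

Each resulting frame could then be written with `to_parquet` under a class-specific name, which requires a parquet engine such as pyarrow.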

Allow for more than 2 FBAs in allocation of an FBS activity set

The current method allows only "allocation_source" and "helper_source" FBAs for FBS activity set allocation. Modify the FBS YAML to allow any number of FBAs to be called for allocation. This change would make the methodology more transparent and limit the number of FBAs hardcoded into cleaning functions.
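A backward-compatible way to read such a YAML is sketched below. It assumes a hypothetical `allocation_sources` list key and falls back to the existing `allocation_source`/`helper_source` pair. The `*_year` keys are also assumptions about the activity-set layout.

```python
from typing import Any, Dict, List


def allocation_sources(activity_set: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the FBAs used to allocate an activity set, in the order they apply."""
    if "allocation_sources" in activity_set:
        return list(activity_set["allocation_sources"])
    # Fallback for existing YAMLs with at most two FBAs
    sources = []
    for key in ("allocation_source", "helper_source"):
        if activity_set.get(key):
            sources.append({"source": activity_set[key],
                            "year": activity_set.get(f"{key}_year")})
    return sources
```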

Inconsistent use of source names

Ideally, source names should be consistent across these uses:
the 'SourceName' column in FBA data
the parquet file names
the crosswalk file names
the source catalog

The major issue is that we generally have a provider, an inventory/report, and a specific table or flow type.
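If names followed a single `Provider_Report[_Table]` convention, every use could derive its components the same way, as in the sketch below. The convention is an assumption, and names that do not fit it would need renaming or a lookup table.

```python
from typing import NamedTuple, Optional


class SourceParts(NamedTuple):
    provider: str
    report: str
    table: Optional[str]


def split_source_name(source_name: str) -> SourceParts:
    """Split 'Provider_Report[_Table]' names such as 'EIA_MECS_Energy'."""
    parts = source_name.split("_", 2)
    if len(parts) < 2:
        raise ValueError(f"Expected at least Provider_Report, got {source_name!r}")
    table = parts[2] if len(parts) == 3 else None
    return SourceParts(parts[0], parts[1], table)
```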

FBS: some NAICS dropping

When the NAICS list is expanded, more detailed sectors are dropped if another entry in the mapping already maps to those sectors (lines 119-120).

With NEI data, some activities are more specific than others. For example, one SCC might apply to all of agriculture (NAICS 111) while another applies to specific crops (NAICS 11114). When both are present, the first SCC won't be assigned to 11114 because that sector drops out of its mapping.

My preference would be to exclude this step, but I don't know how that would affect other mappings. @catherinebirney
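The difference between the two behaviors can be seen in the simplified sketch below. It expands each activity's NAICS prefix to all matching 6-digit sectors, then optionally removes sectors claimed by a more specific activity. This only approximates the flowsa step, and the SCC names and sector list are illustrative.

```python
from typing import Dict, List, Set


def expand_to_six_digit(
    mapping: Dict[str, str], sectors: List[str], drop_already_mapped: bool
) -> Dict[str, Set[str]]:
    """Expand activity -> NAICS prefix into activity -> set of 6-digit sectors."""
    full = {act: {s for s in sectors if s.startswith(prefix)} for act, prefix in mapping.items()}
    if not drop_already_mapped:
        return full  # broader activities keep every sector under their prefix
    result = {}
    for act, prefix in mapping.items():
        # Sectors claimed by any more specific activity are removed from this one
        claimed: Set[str] = set()
        for other, other_prefix in mapping.items():
            if other != act and len(other_prefix) > len(prefix):
                claimed |= full[other]
        result[act] = full[act] - claimed
    return result
```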

Underutilization of `.yaml`s

Raising this as an issue now for awareness and discussion. I think we are currently hardcoding too many things in the data source script .py files and not making enough use of the .yaml files. For example, information on how to modify the NEI data url for different years, or information on how columns should be renamed (again, I'm thinking about the NEI data source in particular at the moment) and how the renaming depends on the year, could be included as dicts in the EPA_NEI.yaml files.

One advantage of offloading this information to the .yaml files is that it should simplify the process of adding new data years (e.g. the 2020 NEI, or 2018 and 2019 EQUATES data).

I'm working on the FBA methods and scripts for the EQUATES data right now, and I'll link to them once I'm finished, by way of an example of what I'm thinking.
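As an illustration of the idea, year-dependent column renames could be stored in the YAML as a mapping keyed by year and resolved at load time, as in the sketch below. The `column_renames` key, the `default` entry and the column names are all hypothetical.

```python
from typing import Any, Dict


def renames_for_year(config: Dict[str, Any], year: int) -> Dict[str, str]:
    """Merge default column renames with any year-specific overrides from a loaded YAML."""
    renames = config.get("column_renames", {})
    merged = dict(renames.get("default", {}))
    # YAML keys may load as int or str, so try both
    merged.update(renames.get(year, renames.get(str(year), {})))
    return merged
```

A parse function would then only need `df.rename(columns=renames_for_year(config, year))`, and adding a data year becomes a YAML edit.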

Air transportation emissions (NEI) misaligned

The NEI captures emissions from landings and takeoffs (LTO), which are assigned to airports in the NEI point dataset. In most cases, these end up under NAICS codes that map to 48A000 (Scenic and sightseeing transportation and support activities for transportation). Instead, they should be assigned to Air Transportation (481000).

See NEI TSD: https://www.epa.gov/sites/production/files/2018-06/documents/nei2014v2_tsd_09may2018.pdf; section 3.2

LTO emissions are identified by specific SCCs, primarily 2275020000.
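Because LTO emissions carry identifiable SCCs, one option is to override the sector for those rows after the standard NAICS-based assignment, as in the sketch below. The `SCC` and `Sector` column names are assumptions, and only the SCC named above is included; any other LTO SCCs would need to be added.

```python
import pandas as pd

LTO_SCCS = frozenset({"2275020000"})  # from the issue; extend with other LTO SCCs


def assign_lto_to_air_transport(df: pd.DataFrame, sccs=LTO_SCCS) -> pd.DataFrame:
    """Return a copy with LTO emissions reassigned to Air Transportation (481000)."""
    out = df.copy()
    is_lto = out["SCC"].astype(str).isin(sccs)
    out.loc[is_lto, "Sector"] = "481000"
    return out
```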

Import error: No module named 'stewi.globals'; 'stewi' is not a package

After installing with pip, accessing stewicombo from flowsa fails:
...
  File "C:\Users\cbirney\git_projects\flowsa\flowsa\stewi.py", line 31, in stewicombo_to_sector
    import stewicombo
  File "C:\Users\cbirney\AppData\Local\Programs\Python\Python37\lib\site-packages\stewicombo\__init__.py", line 5, in <module>
    from stewicombo.overlaphandler import aggregate_and_remove_overlap
  File "C:\Users\cbirney\AppData\Local\Programs\Python\Python37\lib\site-packages\stewicombo\overlaphandler.py", line 3, in <module>
    from stewi.globals import log
ModuleNotFoundError: No module named 'stewi.globals'; 'stewi' is not a package

Could the duplicate globals.py files be causing the issue here?
cc: @catherinebirney
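Python reports "'stewi' is not a package" when the name `stewi` resolves to a single-file module rather than a package directory. One possible cause, offered as an assumption, is that flowsa's own stewi.py is found first and shadows the installed stewi package. The sketch below reports which file a name resolves to when it is a plain module.

```python
import importlib.util
from typing import Optional


def plain_module_origin(name: str) -> Optional[str]:
    """Return the file path if `name` resolves to a plain module rather than a package.

    A plain module (e.g. a local stewi.py) cannot contain submodules such as
    stewi.globals, which is what triggers "'stewi' is not a package".
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # not importable at all
    if spec.submodule_search_locations is None:
        return spec.origin  # single-file module that would shadow a package
    return None  # a real package
```

Running `plain_module_origin("stewi")` in the failing environment would show whether the name points at flowsa\flowsa\stewi.py instead of site-packages.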

export log to txt

Given the amount of information the logger reports, it would be useful to save this output to a text file after generating an FBS.
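The standard library's `logging.FileHandler` can mirror logger output into a text file. The sketch below attaches one. It assumes flowsa's messages go through a logger named "flowsa", which may not match the package's actual logger name.

```python
import logging


def attach_log_file(log_path, logger_name="flowsa"):
    """Mirror messages from `logger_name` into a text file; returns the handler."""
    handler = logging.FileHandler(log_path, mode="w", encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)-8s %(message)s"))
    logging.getLogger(logger_name).addHandler(handler)
    return handler
```

After the FBS is generated, call `logging.getLogger(logger_name).removeHandler(handler)` and `handler.close()` so the file is flushed and released.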

Remove class param from flowbyactivity

The class parameter, in its list format, is awkward and perhaps unnecessary in the FBA call. We could easily remove the class requirement, and users could apply a simple subset command if they wanted only one class.
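Without a class argument, the subset becomes a one-line filter on the loaded FBA, wrapped here in a hypothetical helper.

```python
from typing import Optional

import pandas as pd


def subset_class(fba: pd.DataFrame, flow_class: Optional[str] = None) -> pd.DataFrame:
    """Return all rows, or only the rows of one flow Class."""
    if flow_class is None:
        return fba
    return fba[fba["Class"] == flow_class].reset_index(drop=True)
```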

Possible infinite loop in common.py

In common.find_true_file_path(), if the directory, filename, and extension do not combine into a valid path, and if removing chunks of the filename following _ does not lead to a valid path (e.g. due to a typo, or an error in the directory or extension), the function can get stuck in an infinite loop.
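A loop that always terminates either returns a path, raises, or shortens the filename on every pass. The sketch below re-implements the search under that rule. It is not flowsa's actual code and assumes `extension` is given without a leading dot.

```python
import os


def find_true_file_path(directory, filename, extension):
    """Return an existing file path, trimming trailing '_chunk' parts from filename.

    Each pass either returns, raises, or shortens `name`, so the loop cannot spin forever.
    """
    name = filename
    while True:
        candidate = os.path.join(directory, f"{name}.{extension}")
        if os.path.isfile(candidate):
            return candidate
        if "_" not in name:
            raise FileNotFoundError(
                f"No file matching {filename!r} with extension {extension!r} in {directory!r}"
            )
        name = name.rsplit("_", 1)[0]
```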

materialflowlist import error

Running "CRHW_national_2017" returns the error:

import materialflowlist as mfl

ModuleNotFoundError: No module named 'materialflowlist'

@bl-young We need to add materialflowlist to package requirements/setup. Do you have a specific version? Or did you want to make this package an optional requirement?
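If materialflowlist becomes an optional requirement, the import can be deferred until a method needs it, with an error that says what to do. The sketch below shows that pattern. The helper name and the hint text are placeholders.

```python
import importlib
from types import ModuleType


def import_optional(module_name: str, install_hint: str) -> ModuleType:
    """Import a dependency only when a method needs it, with an actionable error otherwise."""
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as err:
        raise ImportError(
            f"'{module_name}' is required for this method but is not installed. {install_hint}"
        ) from err
```

A method such as CRHW_national_2017 would then call `mfl = import_optional("materialflowlist", "See the flowsa installation instructions.")` at the point of use.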

flownames in EPA GHG fbas

FlowNames stored in the GHG FBAs need to be corrected.
In the GHGI, FlowNames are generally CO2, CH4, and N2O for the main gases.

Examples reported from the stored FBA files:

In T_2_1, Recent_Trends... must be replaced with the gas name, as above.
In T_3_10, "CH4 Emissions from Stationary Combustion" should be "CH4".

These all need to be reviewed and fixed.
fyi @catherinebirney
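For names that contain the gas formula, such as the T_3_10 example, a regex can recover the gas, as in the sketch below. Names like the T_2_1 "Recent_Trends..." rows contain no gas and would still need a table-specific mapping. Gases beyond CO2, CH4 and N2O are not covered.

```python
import re
from typing import Optional

GAS_PATTERN = re.compile(r"\b(CO2|CH4|N2O)\b")


def gas_from_flowname(flowname: str) -> Optional[str]:
    """Extract the main gas from a descriptive GHGI FlowName, or None if absent."""
    match = GAS_PATTERN.search(flowname)
    return match.group(1) if match else None
```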

dynamically_import_fxn breaking for stewi

Accessing the FBS_outside_flowsa functions through the dynamic import fails for stewi-related data. I think the YAMLs need to be revised.

ModuleNotFoundError: No module named 'flowsa.data_source_scripts.stewicombo'
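The error suggests that the YAML points to a module path that does not exist, so the entry would need to name the module that actually defines the function. A dynamic import that reports both the module and the function makes such mismatches easier to spot. The sketch below is a hypothetical helper, not flowsa's dynamically_import_fxn.

```python
import importlib
from typing import Callable


def import_function(module_path: str, function_name: str) -> Callable:
    """Import `function_name` from `module_path`, naming both in any error."""
    try:
        module = importlib.import_module(module_path)
    except ModuleNotFoundError as err:
        raise ModuleNotFoundError(
            f"Cannot import {module_path!r} for function {function_name!r}; "
            "check the module path recorded in the method YAML"
        ) from err
    try:
        return getattr(module, function_name)
    except AttributeError as err:
        raise AttributeError(f"{module_path!r} has no function {function_name!r}") from err
```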

Joblib cache arbitrarily located, can fail

If the joblib cache is located in '.cache', then it attempts to create the '.cache' directory in the current working directory, wherever that may be. If the user does not have permissions there, then importing flowsa fails.
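One option is to choose the cache directory at import time without ever failing: try a preferred location, fall back to the system temp directory, and otherwise return None. The sketch below does this. The directory names are illustrative. `joblib.Memory(location=None)` disables caching, so a None result still lets flowsa import.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional, Union


def writable_cache_dir(preferred: Union[str, Path]) -> Optional[Path]:
    """Return the first cache directory that can be created and written, else None."""
    candidates = [Path(preferred), Path(tempfile.gettempdir()) / "flowsa_joblib_cache"]
    for candidate in candidates:
        try:
            candidate.mkdir(parents=True, exist_ok=True)
        except OSError:
            continue  # no permission, or a file is in the way
        if os.access(candidate, os.W_OK):
            return candidate
    return None
```

Usage might be `memory = joblib.Memory(location=writable_cache_dir(Path.home() / ".cache" / "flowsa"))`, which avoids depending on the current working directory.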

parquet version number based on user's egg-info

The version number attached to an FBA or FBS parquet is based on the version in the user's flowsa.egg-info/PKG-INFO file. PKG-INFO is created when a user runs pip install -e flowsa.

To update egg-info, a user can run: python setup.py bdist_egg

Consider making an updatable version parameter that is used when naming parquets?

@WesIngwersen @bl-young
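The sketch below assumes a single `__version__` constant in the code serves as the source of truth. It names parquets from that constant and warns when installed metadata, which for editable installs comes from egg-info, disagrees. The constant's value and the filename pattern are hypothetical.

```python
import warnings
from importlib.metadata import PackageNotFoundError, version as installed_version

__version__ = "1.0.0"  # hypothetical single source of truth, bumped at each release


def parquet_filename(source_name: str, year: int,
                     code_version: str = __version__, package: str = "flowsa") -> str:
    """Name an output parquet from the code's version, warning if installed metadata is stale."""
    try:
        metadata_version = installed_version(package)
    except PackageNotFoundError:
        metadata_version = None
    if metadata_version is not None and metadata_version != code_version:
        warnings.warn(
            f"Installed metadata reports {package} {metadata_version} but the code is "
            f"{code_version}; the egg-info may be out of date",
            stacklevel=2,
        )
    return f"{source_name}_{year}_v{code_version}.parquet"
```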

Consider only allowing getFlowByActivity to take a single year and use an int

As it stands, accepting a list of years instead of a single year makes this function messy, and we don't have any use cases for multiple years.
It would be just as easy for the user to call the function multiple times for multiple years (like in a loop) and concatenate.

If it takes a single year it will also be more consistent with getFlowBySector.
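With a single-year signature, a caller who needs several years can loop and concatenate, as in the sketch below. The loader is injected as a callable so the helper stays independent of flowsa; the helper itself is hypothetical.

```python
from typing import Callable, Iterable

import pandas as pd


def fba_for_years(years: Iterable[int], load_one_year: Callable[[int], pd.DataFrame]) -> pd.DataFrame:
    """Call a single-year loader once per year and stack the results."""
    frames = [load_one_year(int(year)) for year in years]
    return pd.concat(frames, ignore_index=True)
```

For example, assuming both years are available for the source: `fba_for_years([2012, 2017], lambda y: flowsa.getFlowByActivity(datasource="USDA_CoA_Cropland", year=y))`.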
