
rabpro's Introduction


Package to delineate watershed basins and compute attribute statistics using Google Earth Engine.

Setup

  • Software installation
  • Data configuration
  • Software configuration

Usage

See Example notebooks:

  • Data configuration
  • Basic workflow
  • Multiple basins workflow
  • Basin stats examples

Citation

The following text is the current citation for rabpro:

Schwenk, J., T. Zussman, J. Stachelek, and J. Rowland. (2022). rabpro: global watershed boundaries, river elevation profiles, and catchment statistics. Journal of Open Source Software, 7(73), 4237, https://doi.org/10.21105/joss.04237.

If you delineate watersheds, you should also cite one or both of the underlying datasets, depending on your method. For HydroBasins:

Lehner, B., & Grill, G. (2013). Global river hydrography and network routing: baseline data and new approaches to study the world's large river systems. Hydrological Processes, 27(15), 2171–2186. https://doi.org/10.1002/hyp.9740

or MERIT-Hydro:

Yamazaki, D., Ikeshima, D., Sosa, J., Bates, P. D., Allen, G. H., & Pavelsky, T. M. (2019). MERIT Hydro: A high‐resolution global hydrography map based on latest topography dataset. Water Resources Research, 55(6), 5053-5073. https://doi.org/10.1029/2019WR024873

Development

Testing

python -m pytest
python -m pytest -k "test_img"

Local docs build

cd docs && make html

Contributing

We welcome all forms of user contributions, including feature requests, bug reports, code, and documentation; simply open an issue.

Note that rabpro adheres to Black code style and NumPy-style docstrings for documentation. We ask that contributions adhere to these standards as much as possible. For code development contributions, please contact us via email (rabpro at lanl [dot] gov) to be added to our Slack channel, where we can hash out a plan for your contribution.

rabpro's People

Contributors

actions-user, dependabot[bot], jonschwenk, jsta, rivfam, tzussman


rabpro's Issues

Remove OpenCV dependency

See #48

Only one OpenCV function is used (findContours) in regionprops() in utils.py. If the skimage equivalent has the same behavior, use that instead. I think the function is currently optimized for OpenCV, so something like this could work, but long-term it'd be better to rewrite this for skimage.
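A minimal sketch of the skimage side, assuming skimage.measure.find_contours is an acceptable stand-in. Note it returns subpixel (row, col) float coordinates, unlike the integer pixel coordinates from cv2.findContours, so the surrounding regionprops() code would need adjusting:

```python
# Sketch: extract contours from a binary mask with skimage instead of OpenCV.
import numpy as np
from skimage import measure

mask = np.zeros((10, 10), dtype=float)
mask[3:7, 3:7] = 1.0                         # a square blob

# level=0.5 splits the 0s from the 1s; returns a list of (N, 2) float arrays
contours = measure.find_contours(mask, 0.5)
n_contours = len(contours)
```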

Add function to build a GEE vector asset

Something like (basins and out_path supplied by the caller):

from pathlib import Path
import zipfile

temp_dir = Path("temp")
temp_dir.mkdir(exist_ok=True)
basins.to_file(filename=str(temp_dir / (out_path + ".shp")), driver="ESRI Shapefile")

with zipfile.ZipFile(out_path + ".zip", "w") as zipf:
    for f in temp_dir.glob("*"):
        zipf.write(f, arcname=f.name)

get_merit_dem script error

Running the example in the docs I get the following error:

merit_dem(args.target, args.username, args.password)
  File "Data/scripts/get_merit_dem.py", line 32, in merit_dem
    url = [x["href"][2:] for x in soup.findAll("a", text=re.compile(filename), href=True)][0]
IndexError: list index out of range
make: *** [Makefile:2: merit_dem] Error 1
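The [0] index on the list comprehension is what raises when the page scrape finds no matching link. A hedged sketch of a friendlier failure; first_href and the anchor dicts are made up for illustration, standing in for the BeautifulSoup tags:

```python
import re

def first_href(anchors, filename):
    # anchors: dicts standing in for BeautifulSoup <a> tags (hypothetical helper)
    match = next(
        (a["href"][2:] for a in anchors
         if re.search(filename, a.get("text", ""))),
        None,
    )
    if match is None:
        # Clearer than IndexError: the page layout (or credentials) likely changed
        raise ValueError(f"No download link matching {filename!r}")
    return match

url = first_href(
    [{"href": "./dl/elv_n30w090.tar", "text": "elv_n30w090.tar"}],
    "elv_n30w090",
)
```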

Add option to override appdirs folders

The entire merit dataset is huge. I can't store it on my C drive.

I favor checking for an environment variable like $rabpro_data:

os.environ['rabpro_data']
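A sketch of that idea, with RABPRO_DATA as a hypothetical variable name (not an existing rabpro feature) and the appdirs path passed in as the fallback:

```python
import os

def get_data_dir(appdirs_default):
    # RABPRO_DATA is a hypothetical override variable, not an existing feature
    return os.environ.get("RABPRO_DATA", appdirs_default)

os.environ["RABPRO_DATA"] = "/data/rabpro"
override = get_data_dir("/home/user/.local/share/rabpro")   # env wins
del os.environ["RABPRO_DATA"]
fallback = get_data_dir("/home/user/.local/share/rabpro")   # default wins
```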

Linting/formatting

Per Slack conversation, going through and linting/formatting .py files using Black.

Include a multibasin test dataset

Currently our tests run on only a single coordinate pair (see tests/data/test_coords.shp). Visualizing rabpro output on a single subbasin is kind of underwhelming.

Design method for tracking and including user-added datasets

When users add their own datasets (images/imagecollections) to GEE, they will also need to incorporate the metadata information somehow so rabpro has what it needs to compute statistics. It's not clear to me how this is supposed to happen (perhaps it's already designed?), but we should have a clear procedure. For datasets that we (rabpro developers) make public, these could be included in the fetched metadata file so that all users would have access to them (i.e. not just on our local machines).
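For concreteness, here is an entirely made-up example of the kind of entry a user might need to register; none of these keys reflect rabpro's actual metadata schema:

```python
# Illustrative only: one possible shape for a user-supplied dataset entry
user_dataset = {
    "id": "users/example/my_asset",   # hypothetical GEE asset path
    "type": "image",                  # or "image_collection"
    "band": "b1",
    "units": "mm",
    "resolution_m": 30,
    "time_indexed": False,            # whether date filtering applies
}
is_image = user_dataset["type"] == "image"
```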

Mismatched coords and da in test and test reference

I can't get tests/test.py to pass. One thing I see is that the coords and da are mismatched between the code used to generate the reference file (tests/basic_no_subbasin_stats.py) and the test file (tests/test.py). The former has:

coords = (32.97287, -88.15829)
da = 18680

while the latter has:

coords = (56.22659, -130.87974)
da = 1994

My investigation shows that the test object matches the info specified in test.py so the basic_no_subbasin_stats.py info should probably be changed to match.

basin_stats trouble

I'm having trouble running basin_stats:

rpo.basin_stats([Dataset("JRC/GSW1_3/GlobalSurfaceWater", "occurrence")])

Computing subbasin stats for JRC/GSW1_3/GlobalSurfaceWater...
Traceback (most recent call last):
File "", line 1, in
File "/home/jemma/Documents/Science/LosAlamos/Projects/rabpro/rabpro/core.py", line 296, in basin_stats
self.stats = ss.main(self.basins, datasets, verbose=self.verbose)
File "/home/jemma/Documents/Science/LosAlamos/Projects/rabpro/rabpro/subbasin_stats.py", line 155, in main
for f, header in reducer_funcs:
TypeError: 'NoneType' object is not iterable

On L296 of core.py the subbasin_stats.main function is run with 3 arguments (reducer_funcs is unspecified) but it seems that subbasin_stats.main expects the reducer_funcs argument to be specified (see subbasin_stats.py#L139).
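One possible guard, sketched with a generic helper (apply_reducers is hypothetical, not rabpro's actual API): treat a missing reducer_funcs as an empty list.

```python
def apply_reducers(values, reducer_funcs=None):
    # Guard: None iterates as an empty list instead of raising TypeError
    results = {}
    for func, header in (reducer_funcs or []):
        results[header] = func(values)
    return results

no_custom = apply_reducers([1, 2, 3])   # reducer_funcs omitted -> no error
with_mean = apply_reducers([1, 2, 3], [(lambda v: sum(v) / len(v), "mean")])
```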

elev_profile error

Running this formerly working snippet:

import geopandas as gpd
import rabpro
from rabpro import utils
from rabpro.subbasin_stats import Dataset

coords_file = gpd.read_file(r"tests/data/Big Blue River.geojson")
rpo = rabpro.profiler(coords_file)
rpo.delineate_basins()
rpo.elev_profile()

Gives an error:

UnboundLocalError: local variable 'flowpath' referenced before assignment
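A minimal reproduction of this error class and one fix pattern; trace and its branch condition are made up, and the real cause inside elev_profile may differ:

```python
def trace(found):
    flowpath = None             # initialize before the branch
    if found:
        flowpath = [1, 2, 3]
    if flowpath is None:        # fail loudly instead of UnboundLocalError
        raise RuntimeError("could not trace a flowpath; check the input coords")
    return flowpath

result = trace(True)
```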

Remove 'jschwenk' from paths

We should replace 'jschwenk' in utils.py's get_datapaths() with 'rabpro' or something more generic. I would rather just eliminate the extra embedded directory entirely, but if I recall correctly appdirs requires it...

Handling cases where the underlying raster resolution is comparable to or coarser than the size of the polygon feature

We haven't thought about this much. Sometimes the watershed polygons are much smaller than a single pixel of the requested raster (e.g. GLDAS at 0.25 degrees). When the polygon overlaps only 3-4 of these pixels, originally I had code that would compute an areal-weighted average by intersecting the polygon with all the nearby pixels. I don't know how this could/should be handled in GEE, but it could be important.

See the comment here: https://groups.google.com/g/google-earth-engine-developers/c/2VG0uEFmKcU/m/PH-n8csCAwAJ which suggests weighted reducers. Actually the weighted reducers help page might offer a quick and easy solution: https://developers.google.com/earth-engine/guides/reducers_weighting
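The idea behind an areal-weighted mean can be sketched in plain Python with axis-aligned boxes; all values and geometries below are made up, and GEE's weighted reducers would do this internally:

```python
def overlap_area(a, b):
    # a, b: (xmin, ymin, xmax, ymax) axis-aligned boxes
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0.0) * max(h, 0.0)

polygon = (0.1, 0.1, 0.4, 0.3)             # small watershed bounding box (deg)
pixels = [                                  # 0.25-degree cells with toy values
    ((0.00, 0.00, 0.25, 0.25), 10.0),
    ((0.25, 0.00, 0.50, 0.25), 20.0),
    ((0.00, 0.25, 0.25, 0.50), 30.0),
    ((0.25, 0.25, 0.50, 0.50), 40.0),
]
# Each pixel's weight is the area it shares with the polygon
weights = [overlap_area(polygon, box) for box, _ in pixels]
weighted_mean = sum(w * v for (_, v), w in zip(pixels, weights)) / sum(weights)
```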

Turn off Windows in build workflow

Something is off with the file hash testing on the Windows build workflow causing unittest to fail. I feel like this is unrelated to actual package operation so we can turn it off.

Don't date-filter image collection bands that are not time indexed

subbasin_stats is working fine for me using the example "image" Dataset ("JRC/GSW1_3/GlobalSurfaceWater", "occurrence")

However, "image_collection" entries like Dataset("JRC/GSW1_3/MonthlyRecurrence", "monthly_recurrence") give me either an empty file on my GDrive or an error:

TypeError: cannot unpack non-iterable NoneType object

depending on if test is True.

cli downloading outside of the rabpro package

Inside the rabpro package I can download data with:

./rabpro/cli/rabpro download merit n30w090 <username> <password>

However, with rabpro installed in a different environment it seems like I should be able to run:

rabpro download merit n30e150 <username> <password>

Instead, I get an error:

Command 'rabpro' not found, did you mean:...

Change default export directory

When I exported the basin shapefiles (self.export('all')), the export defaulted to c:\users\jon\results\name_of_run. I think the exports should probably go in the appdirs folder as well, no? Either that, or force the user to supply an output directory...

Commit data directory structure

My preference would be to commit the Data directory structure expected by rabpro. The data itself would be added to the .gitignore and the git tracked folders would only contain .gitkeep files. I plan to open a PR showing this for us to discuss.

Including a geopackage layer of the MERIT grid

Then we could intersect a given "coords file" to figure out what data needs to be downloaded for a job.

The MERIT data is prepared as 5 degree x 5 degree tiles (6000 pixel x 6000 pixel) but it's packaged as 30 degree x 30 degree "megatiles". These megatile codes are the important piece of information needed to point to a specific data download.


Make data directory structure by default

When a user instantiates the profiler(), if they have not downloaded any data, they will get an error message stating that the ../rabpro/jschwenk data directory doesn't exist. I think we should create the data folders where they belong so that the structure is there if they download e.g. MERIT tiles via web browser. They'll know where to put them instead of trying to figure it out.

So rabpro should just try to create the empty folders and print a message that says "No DEM/HydroBasins data were found. Empty directories have been created at {} to store them. You can download the MERIT data with [name of merit downloading script]".

Or something like that.
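A sketch of creating the expected layout up front; the folder names here are illustrative placeholders, not rabpro's actual structure:

```python
import tempfile
from pathlib import Path

def ensure_data_dirs(root):
    # Subfolder names are illustrative, not rabpro's actual layout
    for sub in ("DEMs", "HydroBasins"):
        (Path(root) / sub).mkdir(parents=True, exist_ok=True)
    return sorted(p.name for p in Path(root).iterdir())

created = ensure_data_dirs(tempfile.mkdtemp())
```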
