deltares / pyflwdir Goto Github PK

View Code? Open in Web Editor NEW

68.0 10.0 23.0 31.16 MB

Fast methods to work with hydro- and topography data in pure Python.

Home Page: https://deltares.github.io/pyflwdir/latest

License: MIT License

Python 100.00%

hydrography ldd d8 hydrology topography watershed basins

pyflwdir's Introduction

PyFlwDir: Fast methods to work with hydro- and topography data in pure Python

Intro

PyFlwDir contains a series of methods to work with gridded DEM and flow direction datasets, which are key to many workflows in many earth sciences. PyFlwDir supports several flow direction data conventions and can easily be extended to include more. The package contains some unique methods such as Iterative Hydrography Upscaling (IHU) method to upscale flow directions from high resolution data to coarser model resolution.

PyFlwDir is in pure python and powered by numba to keep it fast.

Featured methods

flow directions from elevation data using a steepest gradient algorithm
strahler stream order
flow direction upscaling
(sub)basin delineation
pfafstetter subbasins delineation
classic stream order
height above nearest drainage (HAND)
geomorphic floodplain delineation
up- and downstream tracing and arithmetics
hydrologically adjusting elevation
upstream accumulation
vectorizing streams
many more!

Installation

See installation guide

Quickstart

See user guide

Reference API

See reference API

Development and Testing

Welcome to the PyFlwDir project. All contributions, bug reports, bug fixes, documentation improvements, enhancements, and ideas are welcome. See Contributing to PyFlwDir for how we work.

pyflwdir's People

Contributors

Stargazers

Watchers

pyflwdir's Issues

unexpected results adjust_dem

np.random.seed(11)
dem = np.random.rand(15, 10)
flwdir = pyflwdir.from_dem(dem, outlets='min')
dem1 = flwdir.dem_adjust(dem)
np.all((dem1 - flwdir.downstream(dem1))>=0)
>> False

prioritize non-boundary cells with same elevation over boundary cells in dem.fill_depressions

prettify IHU upscaling code

streams array can be simplified, no need to save cell indices
lots of duplicate code between upscale_check and upscale_error methods
remove unused code
improve comments. consistent use of pixel vs cell in terminology.

Distance to river network

My apologies for opening this issue as I don't know how else to do this, but I was wondering if the pyfldir toolkit can be used to calculate the distance to a river network? I've been desperately looking for an open-source tool to do this. So a method that uses a flow direction raster to calculate the flow distance to the nearest stream.

error in plotting the river mouths using the nextxy.bin dataset

Hi @DirkEilander ,

I want to identify all of the river mouth and make a plot using the 3 arcmin river network dataset nextxy.bin.

But it is odd that the figure I plotted here seems reversed in the latitude. Could you please have a look? Many thanks.

Best regards,
Jiangchao

my scripts:

import pyflwdir
import numpy as np

bbox=[-180, -90, 180, 90]
data, transform = pyflwdir.read_nextxy(
fn='glb_03min/nextxy.bin', nrow=3600, ncol=7200, bbox=bbox)

import xarray as xr
xr.DataArray(data[0]).plot()

x = data[0]
lon = np.linspace(-180,180,7200)
lat = np.linspace(-90,90,3600)
lon[np.where(x==-9)[1]],lat[np.where(x==-9)[0]]

import geopandas as gpd
from shapely.geometry import Point
gdf = gpd.GeoDataFrame([Point(x, y) for x,y in zip(lon[np.where(x==-9)[1]], lat[np.where(x==-9)[0]])])
gdf.columns = ['geometry']

world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
fig, ax = plt.subplots()
gdf.plot(ax=ax,markersize=0.1, color='red')
world.plot(ax=ax);

check compatibility with latest numpy and py3.11

downstream hydromt fails because of what i think is an issue with a new numpy version

https://github.com/Deltares/hydromt/actions/runs/6430182597/job/17460685968?pr=535

Question: How can I create shapefiles of the upstream area of >10,000 points?

Hi,

I would like to create polygons of the contributing area for > 10,000 points. I have all information that I need for this, including a 30m DEM, a river network (LineString geodataframe file) and derived flow direction raster. All compiled using pyflwdir.

I'm currently using the pyflwdir tool, but this takes around 5 minutes per point.. Has anyone any clues on how to approach this easier and quicker? Also, to insert the streams from a LineString geodataframe (instead of this flow.stream_order() > 4 command). This is my current script:

with rasterio.open(flwdir_fn, "r") as src:
flwdir = src.read(1)
crs = src.crs
flw = pyflwdir.from_array(
flwdir,
ftype="d8",
transform=src.transform,
latlon=crs.is_geographic,
cache=True,
)

for point in geo_df["geometry"].head():
x = point.x
y = point.y
print(x, y)
subbasins = flw.basins(xy=(x, y), streams=flw.stream_order() >= 4)

Many many thanks!

increase short river lengths

Subgrid river lengths can locally be very short which slows down the model. Potentially we can redistribute the subgrid river lengths within a single river branch to speed up calculations without making a large computational error.

method in flwdir.py

inputs: river length, min threshold

numba jitted method in streams.py

input: idxs_ds; idxs_seq (sorted from down- to upstream); confluences (based on number of upstream cells), river lengths, min threshold

suggested algorithm

from up- to downstream:
   while not at next confluence or less than n cells:
      get lengths & indices
   average lengths

assert that the total river length is the same

User guide instructions for plotting import error

Dear all,

thanks for this package. I am running conda 4.11 on Win10 and have installed pyflwdir as instructed. When following the user guide instructions for plotting I stumble upon below error.

Python 3.8.12 | packaged by conda-forge | (default, Oct 12 2021, 21:22:46) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from utils import quickplot, colors, cm
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'utils'

Any hints on how to alleviate the situation are highly welcome. Same counts for install under Ubuntu WSL.

Kind regards

Sebastian

use pyflwdir package to parse nextxy data of CaMa-Flood!

Dear @DirkEilander ,

This is Jiangchao Qiu, a Ph.D. student from Sun Yat-sen University, China. I am major in simulating storm surge using physical model.

I know you by reading your paper : The effect of surge on riverine flood hazard and impact in deltas globally. Great works!

Recently, my collaborator provides me some global modeling result using CaMa-Flood in 15 arcmin resolution. I am trying to analyze these result and make some visualization, especially the river network and river mouths along the coastlines.

Firstly, I used the function of cmf_maps_io.py (in your compound_hotpots repositories) to transform the nextxy.bin file to nextxy.tif file
glb_15min.zip

Secondly, I want to use the pydlwdir package to visualize the river network, but some errors occurred when I parse the nextxy.tif file to
the actionable common format.

the following is the screensnaps

I try to use the function of pyflwdir.FlwdirRaster and pyflwdir.from_array, but both failed, I don't know how to fix it, could you please help me check and give some suggestion about how to solve this problem.

By the way, the environment of my package seems no problem, since all of the example in the notebook folder can be carried out smothly.

Many thanks to you and looking forward to your reply.

Best regards,
Jiangchao

Subcatchment routing hierarchy by landuse

Dear all,

I am trying to resemble a functionality similar to GisToSWMM using pyflwdir. In my understanding of this algorithm a subcatchment hierarchy is determined considering landuse. Every (sub-)subcatchment contains only one landuse and routes either to another (sub-)subcatchment or an outlet. Outlets are given in the form of enlarged nodes, thus rasterized sink information derived from outlet nodes and DEM sinks.

Reading through pyflwdir documentation I figure that the Pfafstetter method allows for deriving hierarchical subcatchment information. If understood correctly it permits to figure out a form of subcatchment routing, thus which subcatchment drains to another subcatchment or outlet.

I am seeking ways to mask this routing method using landuse, so that every subcatchment contains only one landuse. Any ideas if and/or how to achieve this goal are highly appreciated.

Kind regards

Sebastian

fix version number in docs

No implementation of function Function(<built-in function heappush>) found for signature

I'm trying to get started with pyflwdir -- it looks very powerful. But the code in the quickstart at https://deltares.github.io/pyflwdir/latest/quickstart.html fails when executed in a newly created Python 3.11.6 virtualenv (on Linux, Debian sid, if that matters) after pip installing pyflwdir and rasterio:
(and sorry for the formatting -- I'm using "code <>" but github's markdown seems to be treating python interpreter output like an email message instead of code):

import rasterio
with rasterio.open("n35_w107_1arc_v3.tif", "r") as src:
... elevtn = src.read(1)
... nodata = src.nodata
... transform = src.transform
... crs = src.crs
...
import pyflwdir
flw = pyflwdir.from_dem(
... data=elevtn,
... nodata=src.nodata,
... transform=transform,
... latlon=crs.is_geographic,
... )
Traceback (most recent call last):
File "", line 1, in
File "/home/akkana/Data/demnames/dem-full-env/lib/python3.11/site-packages/pyflwdir/pyflwdir.py", line 97, in from_dem
d8 = dem.fill_depressions(
^^^^^^^^^^^^^^^^^^^^^
File "/home/akkana/Data/demnames/dem-full-env/lib/python3.11/site-packages/numba/core/dispatcher.py", line 468, in _compile_for_args
error_rewrite(e, 'typing')
File "/home/akkana/Data/demnames/dem-full-env/lib/python3.11/site-packages/numba/core/dispatcher.py", line 409, in error_rewrite
raise e.with_traceback(None)
numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
No implementation of function Function() found for signature:

heappush(list(Tuple(int16, uint8, uint32, uint32))<iv=None>, Tuple(int64, uint8, uint32, uint32))

There are 2 candidate implementations:

Of which 2 did not match due to:
Overload in function 'heappush': File: numba/cpython/heapq.py: Line 150.
With argument(s): '(list(Tuple(int16, uint8, uint32, uint32))<iv=None>, Tuple(int64, uint8, uint32, uint32))':
Rejected as the implementation raised a specific error:
TypingError: heap type must be the same as item type
raised from /home/akkana/Data/demnames/dem-full-env/lib/python3.11/site-packages/numba/cpython/heapq.py:119

During: resolving callee type: Function()
During: typing of call at /home/akkana/Data/demnames/dem-full-env/lib/python3.11/site-packages/pyflwdir/dem.py (130)

File "dem-full-env/lib/python3.11/site-packages/pyflwdir/dem.py", line 130:
def fill_depressions(

if ~queued[r, c]: # add to queue
heapq.heappush(q, (z1, np.uint8(0), np.uint32(r), np.uint32(c)))
^
`

This error was mentioned in Issue #10 and there was a request to try a more recent numba, but that was in 2021 and I see this with the numba currently in pip.

The elevation data I'm using is SRTM downloaded four years ago -- maybe there's a newer format that pyflwdir is expecting?

support SAGA and GRASS D8 conventions

see
https://gis.stackexchange.com/questions/254939/legend-of-flow-direction-calculated-with-processing-toolbox-qgis-saga

document vector flow direction capabilities

FlwdirRaster.accuflux() segmentation faults with a raster ~ (63000, 80000) on machine with >1Tb RAM

pyFlwDir version checks

I have checked that this issue has not already been reported.
I have checked that this bug exists on the latest version of pyFlwDir.

Reproducible Example

import numpy as np
import rasterio as rio
import pyflwdir 

def accumulate_flow(
    flow_direction_filename,
    headwaters_filename
):

   with rio.open(flow_direction_filename) as src:
        data = src.read(1)
        nodata = src.nodata
        profile = src.profile

    temp = np.ndarray(shape=np.shape(data), dtype=np.uint8)

    temp[data == 1] = 1
    temp[data == 2] = 128
    temp[data == 3] = 64
    temp[data == 4] = 32
    temp[data == 5] = 16
    temp[data == 6] = 8
    temp[data == 7] = 4
    temp[data == 8] = 2
    temp[data == nodata] = 247

    flw = pyflwdir.from_array(temp, ftype='d8', check_ftype=False)

    del temp

    # Read the flow direction raster
    with rio.open(headwaters_filename) as src:
        headwaters = src.read(1)
        nodata = src.nodata

    flowaccum = flw.accuflux(headwaters, nodata=nodata, direction='up')

    ...

Current behaviour

In trying to accumulate flow (accuflux) on larger rasters generated from 1m LiDAR Data, Segmentation Faults are occurring.
The specifics:
1m LiDAR flow direction file & headwaters file (same size raster)

>>> np.shape(data)
(63708, 79780)

flow_direction_filename is 253M
head_waters_filename is 66M

Rasters generated from 3m LiDAR data will not seg fault, and process successfully.

>>> np.shape(data)
(21236, 26593)

flow_direction_filename is 107M
head_waters_filename is 7.4M

When the Python script is called from a shell script, an Exit Status of 139 is observed. Further debugging:

export PYTHONFAULTHANDLER=1 
root@f7f7e3af5d8b:/# python3 $srcDir/accumulate_headwaters.py -fd flowdir_d8_burned_filled_0.tif -wg headwaters_0.tif

Fatal Python error: Segmentation fault

Current thread 0x00007ff9300d8000 (most recent call first):
  File "/usr/local/lib/python3.10/dist-packages/pyflwdir/flwdir.py", line 215 in order_cells
  File "/usr/local/lib/python3.10/dist-packages/pyflwdir/pyflwdir.py", line 272 in idxs_seq
  File "/usr/local/lib/python3.10/dist-packages/pyflwdir/flwdir.py", line 555 in accuflux
  File "/accumulate_headwaters.py", line 73 in accumulate_flow
  File "/accumulate_headwaters.py", line 110 in <module>

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, yaml._yaml, numba.core.typeconv._typeconv, numba._helperlib, numba._dynfunc, numba._dispatcher, numba.core.runtime._nrt_python, numba.np.ufunc._internal, numba.experimental.jitclass._box, scipy._lib._ccallback_c, _cffi_backend, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg._cythonized_array_utils, scipy.linalg._flinalg, scipy.linalg._solve_toeplitz, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg.cython_lapack, scipy.linalg.cython_blas, scipy.linalg._matfuncs_expm, scipy.linalg._decomp_update, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.sparse.linalg._isolve._iterative, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, scipy.special._ufuncs_cxx, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.special._ellip_harm_2, scipy.integrate._odepack, scipy.integrate._quadpack, scipy.integrate._vode, scipy.integrate._dop, scipy.integrate._lsoda, scipy.optimize._minpack2, scipy.optimize._group_columns, scipy._lib.messagestream, scipy.optimize._trlib._trlib, numpy.linalg.lapack_lite, scipy.optimize._lbfgsb, _moduleTNC, scipy.optimize._moduleTNC, scipy.optimize._cobyla, scipy.optimize._slsqp, scipy.optimize._minpack, scipy.optimize._lsq.givens_elimination, scipy.optimize._zeros, scipy.optimize.__nnls, scipy.optimize._highs.cython.src._highs_wrapper, scipy.optimize._highs._highs_wrapper, scipy.optimize._highs.cython.src._highs_constants, scipy.optimize._highs._highs_constants, scipy.linalg._interpolative, scipy.optimize._bglu_dense, scipy.optimize._lsap, scipy.spatial._ckdtree, scipy.spatial._qhull, scipy.spatial._voronoi, scipy.spatial._distance_wrap, scipy.spatial._hausdorff, scipy.spatial.transform._rotation, scipy.optimize._direct, scipy.ndimage._nd_image, _ni_label, scipy.ndimage._ni_label, rasterio._version, rasterio._err, rasterio._filepath, rasterio._env, _brotli, rasterio._transform, rasterio._base, rasterio.crs, rasterio._features, rasterio._warp, rasterio._io (total: 98)
Segmentation fault (core dumped)

Naively, the source code was modified (call to order_cells) hardcoding method="sort". This did not work. As seen above, line 215 in order_cells, calls core.idxs_seq which appears to be the root of the problem. No further investigation has been made past this point.

Desired behaviour

Ideally larger rasters would process without segmentation faults. If not, the exception could potentially be handled a little more elegantly from python with a message stating that the raster is too big to process, or....

Another option might entail providing documentation/examples to users on how to split larger rasters into chunks, and then provide the tool/utility to join/concatenate 'blocked' or 'chunked' rasters back into a single pyflwdir.FlwdirRaster object once processing (whether it be accuflux or other FlwdirRaster methods) is finished.

Additional context

Memory usage was tracked, and it was observed that less than 1/3 of the available RAM was in use when the segmentation fault occurred.

river branch map

method to give unique ids to branches

method to get all upstream nodes for vector based flow directions

check minimal required numba version

we get errors in from_dem method with numba 0.51
see #6 for context

subgrid floodplain schematization

a method that maps subgrid cells to river cell (at model resolution) outflow points based on D8, sorts the HAND values (in future also manning?) and returns the contributing area for fixed HAND intervals

working with circular grids

Dear dev team,
first of all, thank you very much for your great work.
I would like to request the feature to be able to set up a FlwdirRaster that has a circular raster referenced. I apply your tools on a global grid defined in a lat-lon grid with lon ranging from -180 to 180 deg. The discontinuity at the boundary causes basins to be split. It would be great to have a way of defining the grid so that the longitutes are angular.
Thanks!

FlwdirRaster.stream crashed on a ~44000x20000 raster on a machine with 16 GB or less

pyFlwDir version checks

I have checked that this issue has not already been reported.
I have checked that this bug exists on the latest version of pyFlwDir.

Reproducible Example

import os.path
from datetime import datetime

import geopandas as gpd
import numpy as np
import rasterio
import pyflwdir
filename = r"c:\temp\hand\altaisk_krai_30m.tif"
min_sto = 6

start_time = datetime.now()
start_time_global = start_time
print(f"Read file: {filename}")
print(f"Start time: {start_time:%Y-%m-%d %H:%M:%S}")
with rasterio.open(filename, "r") as src:
    elevtn = src.read(1)
    nodata = src.nodata
    transform = src.transform
    crs = src.crs
    extent = np.array(src.bounds)[[0, 2, 1, 3]]
    latlon = src.crs.is_geographic
    prof = src.profile
end_time = datetime.now()
print(f"End time: {end_time:%Y-%m-%d %H:%M:%S}")
print(f"Duration: {end_time - start_time}")

start_time = datetime.now()
print(f"Calculate flow directions")
print(f"Start time: {start_time:%Y-%m-%d %H:%M:%S}")

flw = pyflwdir.from_dem(
    data=elevtn,
    nodata=src.nodata,
    transform=transform,
    latlon=latlon,
    outlets="edge",
)

print(f"End time: {datetime.now():%Y-%m-%d %H:%M:%S}")
print(f"Duration: {datetime.now() - start_time}")

start_time = datetime.now()
print(f"Calculate stream network")
print(f"Start time: {datetime.now():%Y-%m-%d %H:%M:%S}")

feats = flw.streams(min_sto=4)

print(f"End time: {datetime.now():%Y-%m-%d %H:%M:%S}")
print(f"Duration: {datetime.now() - start_time}")

start_time = datetime.now()
print(f"Vectorize stream network")
print(f"Start time: {datetime.now():%Y-%m-%d %H:%M:%S}")
gdf = gpd.GeoDataFrame.from_features(feats, crs=crs)
print(f"End time: {datetime.now():%Y-%m-%d %H:%M:%S}")
print(f"Duration: {datetime.now() - start_time}")

start_time = datetime.now()
print(f"Write stream network to file")
print(f"Start time: {datetime.now():%Y-%m-%d %H:%M:%S}")
gdf.to_file(os.path.splitext(filename)[0] + '.shp')
print(f"End time: {datetime.now():%Y-%m-%d %H:%M:%S}")
print(f"Duration: {datetime.now() - start_time}")

print(f"Total duration: {datetime.now() - start_time_global}")

Current behaviour

You cannot use rasters larger than ~44000x20000 on a computer with 16gb of memory or less

Crashed here on this line of my script

feats = flw.streams(min_sto=4)

pyflwdir.pyflwdir.FlwdirRaster.streams

pyflwdir/pyflwdir.py:977

...
        if xs is None or ys is None:
            idxs0 = np.arange(self.size, dtype=np.intp)
->            xs, ys = gis.idxs_to_coords(idxs0, self.transform, self.shape)
...

pyflwdir/gis_utils.py:222

...
->    xs, ys = transform * transform.translation(coff, roff) * (cols, rows)
...

Desired behaviour

Do not create the whole array with pixel coordinates at once, but only those that are necessary to form geometry in pyflwdir.gis_utils.features

Additional context

I tried to specify the coordinate accuracy to float32

pyflwdir/gis_utils.py:207

pyflwdir.gis_utils.xy

rows, cols = np.asarray(rows, dtype=np.float32), np.asarray(cols, dtype=np.float32)

But I also got the error

I tried to specify the coordinate accuracy to float16

pyflwdir/gis_utils.py:207

pyflwdir.gis_utils.xy

rows, cols = np.asarray(rows, dtype=np.float16), np.asarray(cols, dtype=np.float16)

I was able to do the calculations, but the accuracy was very poor

Failed read MERIT flowdir

ValueError: The flow direction data with type "d8" is invalid.

Helper function to get coordinates from linear indices

Feature request: This may be in the code but I haven't see it in the docs anywhere. The flow path trace function returns a list of pixel indices in the linear index format. A helper function to convert to the coordinate system of the raster would be helpful.

deal with missing nodata values in from_dem method

Numba error creating flow direction raster

I think I have the error narrowed down to how the JIT compiler views either the rasterio object or the parameters sent to pyflwdir.from_dem() function. I have a simple example based on the Quickstart from the documentation. The error occurs when i try to create a flow direction raster from elevation read through rasterio.
https://colab.research.google.com/drive/1engswQ4uaKB0UMMM4zt8YzC8wEAnK1o1?usp=sharing

deltares / pyflwdir Goto Github PK

pyflwdir's Introduction

PyFlwDir: Fast methods to work with hydro- and topography data in pure Python

Intro

Featured methods

Installation

Quickstart

Reference API

Development and Testing

pyflwdir's People

Contributors

Stargazers

Watchers

Forkers

pyflwdir's Issues

pyFlwDir version checks

Reproducible Example

Current behaviour

Desired behaviour

Additional context

pyFlwDir version checks

Reproducible Example

Current behaviour

Desired behaviour

Additional context

Recommend Projects

Recommend Topics

Recommend Org