noaa-owp / hydrofabric

hydrofabric meta-package

Home Page: https://noaa-owp.github.io/hydrofabric/

License: MIT License

R 64.91% Dockerfile 0.74% QML 34.21% Makefile 0.13%

hydrofabric's People

Contributors

joshuafu-noaa program-- pseudoszechwaniens anguswg-ucsb davecasson

hydrofabric's Issues

Emergent Errors in Refactor

VPU 01

1

Mainstem 1964010 - a single flowpath is made up of three pieces. Two of these (10040781 and 10040780) are represented by duplicated divides shown in red.

Fix:

Merge flowlines AND divides 10040781, 10040780, and 10012125; make the toID 10012500
10012415 --> 10012226
10012226 --> 10012125

2

Mainstem 1970573. A single flowpath crosses two divides with an extra dangler hanging out (ID 10016831)

Remove: flowpath 10016831
Merge: catchments 10016831 and 10016708, keep ID 10016708

3

Mainstem 2611598. There is a disjoint flowpath/divide pair directly upstream of a POI (Gage 01017290).

Fix:

Merge flowline and divide 10040738, 10040737. Retain ID 10040737 and flow to 10002480

4

Remove Flowpath: 10003899

Change End Node of 10003900 to the start node of 10004000

Topo change

10003900 --> 10004000
10003943 --> 10003900
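
The removal and rerouting above can be sketched on a toy network held as an id -> toID mapping. This is only an illustration: the dictionary layout, the `remove_flowpath` helper, and the "before" state are assumptions, and only the IDs and the two final edges come from the issue text.

```python
def remove_flowpath(toid, drop_id, new_edges):
    """Return a copy of the id -> toID map without drop_id, with new_edges applied."""
    updated = {k: v for k, v in toid.items() if k != drop_id}
    updated.update(new_edges)
    # Any flowpath still pointing at the removed ID would be left dangling.
    dangling = sorted(k for k, v in updated.items() if v == drop_id)
    if dangling:
        raise ValueError(f"flowpaths still route to removed ID {drop_id}: {dangling}")
    return updated


# Assumed "before" state: both upstream pieces drain through 10003899.
before = {10003943: 10003899, 10003900: 10003899, 10003899: 10004000}

after = remove_flowpath(
    before,
    drop_id=10003899,
    new_edges={10003900: 10004000, 10003943: 10003900},
)
print(after)
```

Raising on dangling references keeps a partial edit from silently leaving a flowpath that routes to an ID that no longer exists.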

NextGen Hydrolocation Inventory

To support v3.0 of the NextGen hydrofabric, we need to build a more robust method for hydrolocation identification. Below is the (evolving) notion of this process:

graph LR
  subgraph Hydrolocation Inventory
    direction LR
    GFv20_POIs --> conus_hl.gpkg
    coastal_gages --> coastal_POIs
    coastal_domain --> coastal_POIs
    coastal_POIs --> conus_hl.gpkg
    nwm_rl --> nwm_POIs
    nwm_lakes -->nwm_POIs
    nwm_reservoirs --> nwm_POIs
    calibration_gages --> nwm_POIs
    nwm_POIs --> conus_hl.gpkg
    nws_lid --> fim_POIs
    fim_POIs --> conus_hl.gpkg
    reference_fabric --> conus_hl.gpkg
  end

lakes and segments share similar IDs

Here is a list of segments and lakes that share common IDs. This is an issue for inland routing (t-route) when we create our network of connections. Common IDs confuse the 'toid' nature of the network, routing flows to incorrect places.

717696
1311881
3133581
1010832
1023120
1813525
1531545
1304859
1320604
1233435
11816
1312051
2723765
2613174
846266
1304891
1233595
1996602
2822462
2384576
1021504
2360642
1326659
1826754
572364
1336910
1332558
1023054
3133527
3053788
3101661
2043487
3056866
1296744
1233515
2045165
1230577
1010164
1031669
1291638
1637751
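
A check like the following could flag such collisions before the routing network is built. It is a minimal sketch: the `shared_ids` helper and the two input lists are hypothetical, and only 717696 and 11816 are taken from the list above.

```python
def shared_ids(segment_ids, lake_ids):
    """Return the sorted IDs that appear in both collections."""
    return sorted(set(segment_ids) & set(lake_ids))


# Hypothetical inputs; 717696 and 11816 appear in the list above.
segments = [717696, 11816, 500001, 500002]
lakes = [11816, 717696, 900001]

collisions = shared_ids(segments, lakes)
print(collisions)
```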

Incorrect NWM streamline

In the image below (HUC 08040102), the 6th order blue streamline (also identified as mainstem) leaves the main channel and follows a side loop, while the red segment on the main channel is a first order, non-mainstem stream.

VPU 03w, 03s, and 03n behave differently than the rest

Working on additional definition here. Essentially, there appears to be a different spatial reference for the one VPU that is split compared to the other VPUs.

This is coming up in the process of using the geopackage-defined "divides" polygons as bounds for an interpolation from current NWM forcing files.

steps to reproduce:

wget -P 03n https://nextgen-hydrofabric.s3.amazonaws.com/v1.2/nextgen_03N.gpkg
wget -P 02 https://nextgen-hydrofabric.s3.amazonaws.com/v1.2/nextgen_02.gpkg
wget -P data https://noaa-nwm-retrospective-2-1-pds.s3.amazonaws.com/forcing/1980/198012310800.LDASIN_DOMAIN1

import geopandas as gpd
import xarray as xr
import rasterio
from rasterstats import zonal_stats

g2_d = gpd.read_file("02/nextgen_02.gpkg", layers="divides")
g3n_d = gpd.read_file("03n/nextgen_03N.gpkg", layers="divides")
xds = xr.open_dataset("data/198012310800.LDASIN_DOMAIN1", engine="rasterio")
src = xds["U2D"]
aff = src.rio.transform()
arr = src.values[0]

zonal_stats(g3n_d, arr, affine=aff)  # Works -- lots of warnings
zonal_stats(g2_d, arr, affine=aff)  # produces pure NaNs -- no overlap

Waterbodies missing lake parameter info

In the conus.gpkg, there are lakes missing lake parameters such as 'LkArea', 'LkMxE', 'OrificeA', etc.

It appears these waterbodies are new, meaning they are not in the NHDNetwork. Will these parameters be added at some point?

Here is a list of the waterbody IDs ('hl_link' in the lakes table):
2277833, 22302965, 120053727, 18421865, 167679168, 167679192, 167679193, 166997626, 167114801, 120049055, 120053465, 167245953, 167297531, 30000600551311, 30000600524767, 5778809, 30000400156418, 10038590, 120051957, 41000400070122, 24381227, 23794331, 8009481, 120054083, 120054085

Missing info from reservoir_index_*_NWMv2.1.nc files

Prior to v4, t-route used reservoir_index* files to get information linking lake IDs to gage IDs and to determine what type of reservoir any given waterbody was (1=Levelpool, 2=USGS, 3=USACE, 4=RFC, 5=Glacially Dammed Waterbody). Could this information be included in the hydrofabric? USGS and USACE gage data are available in the 'network' table, but they are attached to segment IDs rather than waterbodies, and there are instances where a single segment has multiple gages. RFC gages are missing completely.

Create lake feature data

Need to create some lake feature data. Initially this can be a small test set.

Suggested change for Reference Topology 07

Where does it occur: VPU 10U right on the Idaho border. Mainstem ID: 1284412

All images come from the refactor_07.gpkg

The Problem:

The red line (ID 10049093) flows to ID 10049122. The U-shaped line (10047655) also flows to 10049122. The double entry to the U-shaped line causes trouble when aggregating.

Solution:

Remove flowpath 10049093
Merge divides 10049093 and 10047655
Keep ID 10047655, with the current toID of 10049122

Break in subset: error

library(hydrofabric)
#> ── Attaching packages ───────────────────────────────────── hydrofabric 0.0.9 ──
#> ✔ dplyr         1.1.4      ✔ hydrofab      0.5.0 
#> ✔ ngen.hydrofab 0.0.3      ✔ zonal         0.0.2 
#> ✔ climateR      0.3.3      ✔ sf            1.0.15
#> ✔ nhdplusTools  1.0.1      ✔ terra         1.7.67
#> Warning: package 'terra' was built under R version 4.2.3
#> ── Conflicts ──────────────────────────────────────── hydrofabric_conflicts() ──
#> ✖ terra::plot() masks climateR::plot()
#> 
#> Attaching package: 'hydrofabric'
#> The following object is masked _by_ 'package:hydrofab':
#> 
#>     hf_dm
x. = subset_network(id = 'cat-113060')
#> Error in (function (cond) : error in evaluating the argument 'x' in selecting a method for function 'unique': ℹ In argument: `hf_id == comid`.
#> Caused by error:
#> ! `..1` must be of size 3123543 or 1, not size 0.

Created on 2024-02-15 with reprex v2.0.2

hydrofabrics derived attributes for developing CONUS 3D channel ML model

A list of variables needed to develop the ML model for CONUS using climateR, zonal, and hydrofabric.

Variable name	Aggregation	Description	Reduction	Source	Literature
length_divide	Divide	Length of flowline feature	NA	Reference Fabric	link
length_catchment	Catchment	Length of flowline feature	Sum	Reference Fabric	link
area_divide	Divide	Feature area	NA	Reference Fabric	link
area_catchment	Catchment	Feature area	Sum	Reference Fabric	link
arbolatesu_divide	Divide	the sum of the lengths of all digitized flowlines upstream from the downstream end of the immediate flowline	NA	Reference Fabric	link
arbolatesu_catchment	Catchment	the sum of the lengths of all digitized flowlines upstream	Sum	Reference Fabric	link
pathlength_divide	Divide	The distance from the bottom of a flowline to the bottom of the terminal flowline along the main path	NA	Reference Fabric	link
pathlength_catchment	Catchment	The distance from the bottom of a flowline to the bottom of the terminal flowline along the main path	Sum	Reference Fabric	link
streamorde_divide	Divide	Modified Strahler stream order	NA	Reference Fabric	link
streamleve_divide	Divide	Stream level	NA	Reference Fabric	link
slope_divide	Divide	Slope of flowline	NA	Reference Fabric/DEM	link
slope_catchment	Catchment	Slope of flowline	Ave	Reference Fabric/DEM	link
roughness_divide	Divide	roughness	NA	Reference Fabric	link
roughness_catchment	Catchment	roughness	Ave	Reference Fabric	link
elevation_divide	Divide	elevation	Ave	DEM	link
elevation_catchment	Catchment	elevation	Ave	DEM	link
aspect_ave_divide	Divide	Average aspect	Ave	DEM	link
aspect_ave_catchment	Catchment	Average aspect	Ave	DEM	link
flow_acc_sum_divide	Divide	Flow accumulation sum	Sum	DEM	link
flow_acc_sum_catchment	Catchment	Flow accumulation sum	Sum	DEM	link
flow_dir_ave_divide	Divide	Flow direction average	Ave	DEM	link
flow_dir_ave_catchment	Catchment	Flow direction average	Ave	DEM	link
clay_divide	Divide	Average % clay	Ave	POLARIS	link
clay_catchment	Catchment	Average % clay	Ave	POLARIS	link
sand_divide	Divide	Average % sand	Ave	POLARIS	link
sand_catchment	Catchment	Average % sand	Ave	POLARIS	link
silt_divide	Divide	Average % silt	Ave	POLARIS	link
silt_catchment	Catchment	Average % silt	Ave	POLARIS	link
bd_divide	Divide	Average soil bulk density, (g cm-3)	Ave	POLARIS	link
bd_catchment	Catchment	Average soil bulk density, (g cm-3)	Ave	POLARIS	link
ksat_divide	Divide	Average effective saturated hydraulic conductivity, (cm hr-1)	Ave	POLARIS	link
ksat_catchment	Catchment	Average effective saturated hydraulic conductivity, (cm hr-1)	Ave	POLARIS	link
om_divide	Divide	Average organic matter content, (%)	Ave	POLARIS	link
om_catchment	Catchment	Average organic matter content, (%)	Ave	POLARIS	link
ph_mean_divide	Divide	Average soil pH	Ave	POLARIS	link
ph_mean_catchment	Catchment	Average soil pH	Ave	POLARIS	link
theta_r_divide	Divide	Average residual soil water content, (cm3 cm-3)	Ave	POLARIS	link
theta_r_catchment	Catchment	Average residual soil water content, (cm3 cm-3)	Ave	POLARIS	link
theta_s_divide	Divide	Average saturated soil water content, (cm3 cm-3)	Ave	POLARIS	link
theta_s_catchment	Catchment	Average saturated soil water content, (cm3 cm-3)	Ave	POLARIS	link
hb_mean_divide	Divide	Average Brooks-Corey parameter related to the air-entry pressure (cm)	Ave	POLARIS	link
hb_mean_catchment	Catchment	Average Brooks-Corey parameter related to the air-entry pressure (cm)	Ave	POLARIS	link
lambda_mean_divide	Divide	Average Brooks-Corey parameter the pore size distribution index, (dimensionless)	Ave	POLARIS	link
lambda_mean_catchment	Catchment	Average Brooks-Corey parameter the pore size distribution index, (dimensionless)	Ave	POLARIS	link
n_mean_divide	Divide	Average empirical shape-defining parameters in the van Genuchten equation, (dimensionless)	Ave	POLARIS	link
n_mean_catchment	Catchment	Average empirical shape-defining parameters in the van Genuchten equation, (dimensionless)	Ave	POLARIS	link
alpha_mean_divide	Divide	Average parameter of the van Genuchten equation corresponding approximately to the inverse of the air-entry value, (cm-1)	Ave	POLARIS	link
alpha_mean_catchment	Catchment	Average parameter of the van Genuchten equation corresponding approximately to the inverse of the air-entry value, (cm-1)	Ave	POLARIS	link
LAI_ave_divide	Divide	Average Leaf Area Index	Ave	MODIS	link
LAI_ave_catchment	Catchment	Average Leaf Area Index	Ave	MODIS	link
LAI_min_divide	Divide	Sum of areas with Leaf Area Index <= 5	Sum	MODIS	link
LAI_min_catchment	Catchment	Sum of areas with Leaf Area Index <= 5	Sum	MODIS	link
LAI_max_divide	Divide	Sum of areas with Leaf Area Index >= 15	Sum	MODIS	link
LAI_max_catchment	Catchment	Sum of areas with Leaf Area Index >= 15	Sum	MODIS	link
NDVI_ave_divide	Divide	Average Normalized difference vegetation index	Ave	MODIS	link
NDVI_ave_catchment	Catchment	Average Normalized difference vegetation index	Ave	MODIS	link
NDVI_min_divide	Divide	Sum of areas with Normalized difference vegetation index <= 0.2	Sum	MODIS	link
NDVI_min_catchment	Catchment	Sum of areas with Normalized difference vegetation index <= 0.2	Sum	MODIS	link
NDVI_max_divide	Divide	Sum of areas with Normalized difference vegetation index > 0.2	Sum	MODIS	link
NDVI_max_catchment	Catchment	Sum of areas with Normalized difference vegetation index > 0.2	Sum	MODIS	link
humid_divide	Divide	Average specific_humidity	Ave	NLDAS	link
humid_catchment	Catchment	Average specific_humidity	Ave	NLDAS	link
temperature_ave_divide	Divide	Average temperature	Ave	NLDAS	link
temperature_ave_catchment	Catchment	Average temperature	Ave	NLDAS	link
temperature_min_divide	Divide	Minimum temperature	Min	NLDAS	link
temperature_min_catchment	Catchment	Minimum temperature	Min	NLDAS	link
wind_u_divide	Divide	Average U wind component at 10 meters above the surface	Ave	NLDAS	link
wind_u_catchment	Catchment	Average U wind component at 10 meters above the surface	Ave	NLDAS	link
wind_v_divide	Divide	Average V wind component at 10 meters above the surface	Ave	NLDAS	link
wind_v_catchment	Catchment	Average V wind component at 10 meters above the surface	Ave	NLDAS	link
SoilMoi0_10cm_inst_min_divide	Divide	Minimum soil moisture	Min	GLDAS	link
SoilMoi0_10cm_inst_min_catchment	Catchment	Minimum soil moisture	Min	GLDAS	link
SoilMoi0_10cm_inst_ave_divide	Divide	Average soil moisture	Ave	GLDAS	link
SoilMoi0_10cm_inst_ave_catchment	Catchment	Average soil moisture	Ave	GLDAS	link
SoilMoi0_10cm_inst_max_divide	Divide	Maximum soil moisture	Max	GLDAS	link
SoilMoi0_10cm_inst_max_catchment	Catchment	Maximum soil moisture	Max	GLDAS	link
SoilTMP0_10cm_inst_min_divide	Divide	Area of soil temperature less than or equal to zero	Sum	GLDAS	link
SoilTMP0_10cm_inst_min_catchment	Catchment	Area of soil temperature less than or equal to zero	Sum	GLDAS	link
SoilTMP0_10cm_inst_ave_divide	Divide	Average soil temperature	Ave	GLDAS	link
SoilTMP0_10cm_inst_ave_catchment	Catchment	Average soil temperature	Ave	GLDAS	link
ESoil_tavg_divide	Divide	Average Direct evaporation from bare soil	Ave	GLDAS	link
ESoil_tavg_catchment	Catchment	Average Direct evaporation from bare soil	Ave	GLDAS	link
Evap_tavg_divide	Divide	Average Evapotranspiration	Ave	GLDAS	link
Evap_tavg_catchment	Catchment	Average Evapotranspiration	Ave	GLDAS	link
Qs_acc_ave_divide	Divide	Average Storm surface runoff	Ave	GLDAS	link
Qs_acc_ave_catchment	Catchment	Average Storm surface runoff	Ave	GLDAS	link
Qs_acc_sum_divide	Divide	Sum Storm surface runoff	Sum	GLDAS	link
Qs_acc_sum_catchment	Catchment	Sum Storm surface runoff	Sum	GLDAS	link
Qsb_acc_ave_divide	Divide	Average Baseflow-groundwater runoff	Ave	GLDAS	link
Qsb_acc_ave_catchment	Catchment	Average Baseflow-groundwater runoff	Ave	GLDAS	link
Qsb_acc_sum_divide	Divide	Sum Baseflow-groundwater runoff	Sum	GLDAS	link
Qsb_acc_sum_catchment	Catchment	Sum Baseflow-groundwater runoff	Sum	GLDAS	link
Qsm_acc_ave_divide	Divide	Average Snow melt	Ave	GLDAS	link
Qsm_acc_ave_catchment	Catchment	Average Snow melt	Ave	GLDAS	link
Qsm_acc_sum_divide	Divide	Sum Snow melt	Sum	GLDAS	link
Qsm_acc_sum_catchment	Catchment	Sum Snow melt	Sum	GLDAS	link
Snowf_tavg_ave_divide	Divide	Average Snow precipitation rate	Ave	GLDAS	link
Snowf_tavg_ave_catchment	Catchment	Average Snow precipitation rate	Ave	GLDAS	link
Snowf_tavg_max_divide	Divide	Max Snow precipitation rate	Max	GLDAS	link
Snowf_tavg_max_catchment	Catchment	Max Snow precipitation rate	Max	GLDAS	link
SnowDepth_inst_ave_divide	Divide	Average Snow depth	Ave	GLDAS	link
SnowDepth_inst_ave_catchment	Catchment	Average Snow depth	Ave	GLDAS	link
SnowDepth_inst_max_divide	Divide	Max Snow depth	Max	GLDAS	link

conus.gpkg v2.01 is missing lake attribute table

Need to align ref_mainstem with lp_mainstem

Recent changes in geoconnex now use a persistent ref_mainstem_id, and a reference fabric based levelpath lp_mainstem. Previously, it was expected that the lp_mainstem would be 1:1 with the ref_mainstem...

This is actually a very good thing for our processing! Now it allows us to retain the lp_mainstem as the mainstem ID, and we can add a ref_mainstem_uri that points to the proper persistent geoconnex feature.

Overall, it better disambiguates the persistent identification system and the reference fabric based processing.

segments 'toid' pointing to an upstream nexus

Looking at conus.gpkg, segments wb-1578432 and wb-1578518 seem to point to a nexus point upstream (see image below)

I believe they should be pointing to nex-1577901 just like wb-1577900.

Need to preserve outlet/terminal ID through NHD workflow

In order to use NHD aggregate outlets in the hyRefactor workflow, we need to preserve the terminal ID/type through the aggregation. This can be done within each step, or it can be executed at the end on the identities of the member_COMIDs.

Will need to test both...

Grant contributor access to hydrofabric team

Dealing with persistent "short" paths.

@shorvath-noaa raised some issues with flow paths shorter than our allowed 1 km. Currently these are expected (at least all that I have looked at). An example is here:

The upper POI is a HUC12 outlet, while the lower one is a gage. These distinct features are only .4 km apart, and since they are enforced in the network manipulations, they are persistent.

The options for dealing with these are:

Keep as is
Set a priority for exclusion (e.g. a gage takes precedence over a HUC12 outlet, so in this case the HUC12 is dropped) - I do not like this approach personally
Collapse them into the downstream POI and associate them to the same hl_id.
Others?

@NOAA-OWP/hydrofabric have any thoughts?

Suggestion: Add version, license, and other relevant metadata to `gpkg_metadata` table

It would be really nice if version and other metadata information traveled with a hydrofabric asset. This is a big can of worms over what to include, but at a minimum, it would be great to know the HF version and a copy of the license. I'm sure this will not play nicely with all HF distribution formats (e.g. geoparquet?), but it looks like this is possible with the gpkg_metadata table in the geopackage spec.
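
As a rough sketch of the idea, the snippet below writes a version and license string into a gpkg_metadata table using Python's built-in sqlite3 module, with an in-memory database standing in for a real .gpkg file. The column layout is simplified from my reading of the GeoPackage metadata extension and should be checked against the spec. The JSON keys, the md_standard_uri value, and the version and license strings are all placeholders, not anything the hydrofabric currently defines. A real file would also need the extension registered in gpkg_extensions and, most likely, a matching gpkg_metadata_reference row, both omitted here.

```python
import json
import sqlite3


def write_hf_metadata(con, version, license_text):
    """Store version and license as one JSON row in gpkg_metadata."""
    # Simplified table definition; verify against the GeoPackage spec.
    con.execute(
        """CREATE TABLE IF NOT EXISTS gpkg_metadata (
               id INTEGER PRIMARY KEY NOT NULL,
               md_scope TEXT NOT NULL DEFAULT 'dataset',
               md_standard_uri TEXT NOT NULL,
               mime_type TEXT NOT NULL DEFAULT 'text/xml',
               metadata TEXT NOT NULL DEFAULT ''
           )"""
    )
    payload = json.dumps({"hf_version": version, "license": license_text})
    cur = con.execute(
        "INSERT INTO gpkg_metadata (md_scope, md_standard_uri, mime_type, metadata) "
        "VALUES (?, ?, ?, ?)",
        # The URI below is a placeholder, not a registered metadata standard.
        ("dataset", "https://example.org/hydrofabric-metadata", "application/json", payload),
    )
    con.commit()
    return cur.lastrowid


def read_hf_metadata(con):
    """Return the most recent JSON metadata record, or None if there is none."""
    row = con.execute(
        "SELECT metadata FROM gpkg_metadata "
        "WHERE mime_type = 'application/json' ORDER BY id DESC LIMIT 1"
    ).fetchone()
    return json.loads(row[0]) if row else None


con = sqlite3.connect(":memory:")  # stand-in for sqlite3.connect("some.gpkg")
write_hf_metadata(con, "x.y.z", "LICENSE TEXT HERE")
meta = read_hf_metadata(con)
print(meta)
```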

Add VPU to divides layer.

I'm suggesting we add an optional VPU ID to divides layer DM. If we are advocating for a divides view in NextGen, the segmentation should be available w/o moving into the flowpaths. @program-- ?

Land cover and soil classification using generic categories

Current behavior

The hydrofabric determines the land cover classification for a basin by using the mode of land cover categories within its boundaries. Sometimes this leads to conflicts with the soil type, particularly if water is involved.

The below screenshot, from @mikejohnson51, shows an area in which the soil (ISLTYP = 14) is classified as water, while the land cover (IVGTYP = 15) is classified as mixed forest.

This caused runs of Noah-OWP-Modular to fail because of inconsistencies in the soil and land cover categories: NOAA-OWP/DMOD#472 (comment)

Expected behavior

The hydrofabric should return self-consistent values of soil and land cover.

Suggested changes

Add new, lumped generic categories for soil and land cover. I provide suggestions below:

Soil generic categories

From the STAS category in Noah-OWP-Modular:

land = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 17, 18, 19)
water = 14
ice = 16

Land cover generic categories

From the USGS category in Noah-OWP-Modular:

urban = 1
agricultural = c(2, 3, 4, 5, 6)
grassland = c(7, 8, 9, 10)
forest = c(11, 12, 13, 14, 15)
water = 16
wetland = c(17, 18)
barren = c(19, 25, 26, 27)
tundra = c(20, 21, 22, 23)
ice = 24

Code changes

Implement a two-step classification option that:

1. Provides the generic category mapping shown above.
2. Computes the mode of all generic land cover (e.g., forest, water, urban, agricultural) and soil categories in the basin.
3. Computes the mode of the specific land cover and soil categories within the dominant generic categories from step 2.
4. Checks for internal consistency between the soil and land cover categories. The most important check is that one of the following holds: soil == land and land_cover %in% c(urban, agricultural, grassland, forest, wetland, barren, tundra); soil == water and land_cover == water; or soil == ice and land_cover == ice.
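
The two-step option could look roughly like the sketch below. The category sets are copied from the suggestions above; the function names, the list-of-cell-values input, and the tie-breaking (first value encountered wins) are assumptions made for illustration, not existing hydrofabric behavior.

```python
from collections import Counter

SOIL_GENERIC = {
    "land": {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 17, 18, 19},
    "water": {14},
    "ice": {16},
}
LAND_COVER_GENERIC = {
    "urban": {1},
    "agricultural": {2, 3, 4, 5, 6},
    "grassland": {7, 8, 9, 10},
    "forest": {11, 12, 13, 14, 15},
    "water": {16},
    "wetland": {17, 18},
    "barren": {19, 25, 26, 27},
    "tundra": {20, 21, 22, 23},
    "ice": {24},
}


def to_generic(value, mapping):
    """Look up the generic class that contains a specific category value."""
    for name, members in mapping.items():
        if value in members:
            return name
    raise ValueError(f"unmapped category {value}")


def two_step_mode(cells, mapping):
    """Return (generic, specific): the dominant generic class, then the mode
    of the specific categories that belong to that generic class."""
    generic = Counter(to_generic(c, mapping) for c in cells).most_common(1)[0][0]
    specific = Counter(c for c in cells if c in mapping[generic]).most_common(1)[0][0]
    return generic, specific


def is_consistent(soil_generic, land_cover_generic):
    """Water and ice must match exactly; land soil excludes water/ice cover."""
    if soil_generic in ("water", "ice"):
        return land_cover_generic == soil_generic
    return land_cover_generic not in ("water", "ice")


# Toy basin: the plain mode of land cover is 16 (water, 3 cells),
# but forest categories together cover 4 cells.
land_cover_cells = [16, 16, 16, 15, 15, 11, 12]
soil_cells = [14, 14, 4, 4, 4]

lc_generic, lc_specific = two_step_mode(land_cover_cells, LAND_COVER_GENERIC)
soil_generic, soil_specific = two_step_mode(soil_cells, SOIL_GENERIC)
print(lc_generic, lc_specific, soil_generic, soil_specific,
      is_consistent(soil_generic, lc_generic))
```

Note that the two-step mode does not by itself guarantee agreement between soil and land cover, which is why the consistency check stays a separate final step.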

Misplaced 1st order streams along 7th order main flow paths

@mluck @JamesColl-NOAA @CarsonPruitt-NOAA @RyanSpies-NOAA It seems like I may be overlooking something, and there's a chance I don't have a complete grasp, but I ran into this during my testing of the machine learning-derived volume on HAND data. While navigating through the primary stem rivers, which are predominantly 7th order streams, I noticed instances where lower-order streams, often 1st order, are in place of 7th order streams. This throws off the actual flood extent because of the significantly different discharge values from the National Water Model. Here are two examples:

First:
HydroID: 12970099 (dark green catchment in image) mapping to feature_id: 2240709 which is a 1st order stream
whereas looking at reference fabric catchments (dark blue boundaries in image) and flow lines this HydroID should be associated with a 7th order stream (feature_id: 2242125)

Second:
HydroID: 12970044 mapping to feature_id: 2242309 which is a 2nd order stream
whereas looking at reference fabric catchments and flow lines this HydroID should be associated with a 7th order stream (feature_id: 2248065)

NWM v2.1 flowline issue on Carson River

The FIM dev team has noted a potential issue with the NWM flow line in HUC 16050201.

Issue: The downstream flow routing deviates from the main channel of the Carson River and instead follows the Mexican Ditch flow line (located to the west of the mainstem Carson River). This is evident when tracking the stream order attribute (see attached screenshot).

Where: The NWM 2.1 flow line on the Carson River near Carson City, NV. Featureids: 11432977, 11432991, 11432989

Impacts: flow routing is incorrectly appropriating Carson River flow values to the smaller ditch stream line; FIM for the mainstem Carson River is severely underestimated.

Reference feature characteristics needed for bankfull width and depth ML model

Variable name	Aggregation	Description	Reduction	Source	Literature
length_catchment	Catchment	Length of flowline feature	Sum	Reference Fabric	link
area_catchment	Catchment	Feature area	Sum	Reference Fabric	link
arbolatesu_catchment	Catchment	the sum of the lengths of all digitized flowlines upstream	Sum	Reference Fabric	link
pathlength_catchment	Catchment	The distance from the bottom of a flowline to the bottom of the terminal flowline along the main path	Sum	Reference Fabric	link
streamorde_divide	Divide	Modified Strahler stream order	NA	Reference Fabric	link
streamleve_divide	Divide	Stream level	NA	Reference Fabric	link
slope_divide	Divide	Slope of flowline	NA	Reference Fabric/DEM	link
roughness_divide	Divide	roughness	NA	Reference Fabric	link

Suggested change for Reference Topology 01

Problem area: Mainstem 2111164 in VPU 01

Currently a tiny little sliver is a single mainstem

Change: 10018008 Mainstem ID to 2111164

Bad digitization leads to bad topology

Redigitize: 10018009

Change IDs

10040692 --> 10018009
10018009 --> 10018008

subset_reference function not exported to hydrofabric namespace

I've been trying to run some of the network manipulation workflow (https://noaa-owp.github.io/hydrofabric/articles/03-processing-deep-dive.html) and ran into an issue with the subset_reference function.

After attaching the hydrofabric package to my R session, the subset_reference function is not available as an auto-complete and when I call it directly I get the error:

Error in subset_reference() : could not find function "subset_reference"

Digging into the package namespace, I see that the subset_reference function is not exposed.

# Generated by roxygen2: do not edit by hand

S3method(print,hydrofabric_conflicts)
export(hydrofabric_conflicts)
export(hydrofabric_packages)
export(subset_network)
import(climateR)
import(dplyr, except = c(intersect, union))
import(glue, except = trim)
import(hydrofab)
import(ngen.hydrofab)
import(nhdplusTools)
import(sf)
import(terra)
import(zonal)
importFrom(DBI,dbConnect)
importFrom(DBI,dbDisconnect)
importFrom(RSQLite,SQLite)
importFrom(arrow,open_dataset)
importFrom(magrittr,"%>%")
importFrom(purrr,keep)
importFrom(purrr,map)

My workaround has been to copy the function at line 13 of subset_network.R (https://github.com/NOAA-OWP/hydrofabric/blob/main/R/subset_network.R) and paste into my R script. This seems to be working well for now. Can you add subset_reference as an export to this package, or is there something I'm doing wrong here?

Thanks!

How do I integrate it with others?

I have two issues.
Issue 1: I tried building without a Docker container. The number of dependent R packages kept growing and I gave up.
Issue 2: I built with the Dockerfile. The build is successful, but I do not see an entry point (normally CMD).
How do I run it?
How do I integrate it with other components, such as ngen?
What knowledge am I lacking?
I appreciate any help.

Some catchments/divides have incorrectly formatted divide_id

The divides layer of the current nextgen_02.gpkg file (MD5: a4cd50cd666f4bb177e7671f253a3393) has a row with divide_id of cat-7e+05 (index 33534). No other rows in this file appear to use this irregular format, though I haven't checked all the other hydrofabric files.
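
A quick scan along these lines could check the other files for the same problem. It is a sketch that assumes valid IDs are always `cat-` followed by digits only. The `format_divide_id` helper is hypothetical and only illustrates one way to avoid scientific notation, on the guess that the ID was a floating-point number when it was pasted into the string.

```python
import re

# Assumed convention: "cat-" followed by one or more digits.
ID_PATTERN = re.compile(r"cat-\d+")


def malformed_ids(divide_ids):
    """Return the IDs that do not match the assumed cat-<digits> pattern."""
    return [d for d in divide_ids if not ID_PATTERN.fullmatch(d)]


def format_divide_id(number):
    """Build an ID from a number, casting to int so floats never print as 7e+05."""
    return f"cat-{int(number)}"


bad = malformed_ids(["cat-113060", "cat-7e+05"])
print(bad, format_divide_id(7e5))
```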

Some waterbodies are missing WBOut node

In the conus.gpkg, there are waterbodies that have one or more WBIn nodes, but no WBOut node.

Here is a list of waterbodies (lake_ids, or 'hl_link' in the lakes attribute table) that have a WBIn node, but no WBOut node:

1320604
4415746
7924833
18714340
20152469
20318000
23016866
23704173
24052303
120049589
120051936
120052233
120054084
166759841
167484062
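
The check behind this list can be sketched as a group-by over hydrolocation rows. The `(hl_link, hl_reference)` row layout and the literal `"WBIn"` / `"WBOut"` values are assumptions about how the table is encoded, and the rows below are toy data apart from the ID 1320604.

```python
from collections import defaultdict


def missing_wbout(rows):
    """rows: iterable of (hl_link, hl_reference) pairs.
    Return hl_link values that have a WBIn entry but no WBOut entry."""
    kinds = defaultdict(set)
    for hl_link, hl_reference in rows:
        kinds[hl_link].add(hl_reference)
    return sorted(k for k, v in kinds.items() if "WBIn" in v and "WBOut" not in v)


rows = [
    ("1320604", "WBIn"),
    ("1320604", "WBIn"),
    ("999", "WBIn"),   # hypothetical waterbody with both node types
    ("999", "WBOut"),
]
incomplete = missing_wbout(rows)
print(incomplete)
```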

Unable to call subset_reference() function

Scenario:

Yesterday, I created a Docker image based on rocker/geospatial for using the hydrofabric package.

Then I installed the package in a container from this image with:

$ installGithub.r NOAA-OWP/hydrofabric

When I tried to run the script:

library(hydrofabric)

nldi_feature = list(featureSource = "nwis", featureID = "05102490")

subset_reference( nldi_feature = nldi_feature,  gpkg = "...", export_gpkg = "...")

I got Error in subset_reference(...) : could not find function "subset_reference" .

It seems that the function subset_reference is not reachable from outside the package, as the output of lsf.str("package:hydrofabric") only lists the function subset_network:

hydrofabric_conflicts : function ()  
hydrofabric_packages : function (include_self = TRUE)  
subset_network : function (id = NULL, comid = NULL, hl_id = NULL, network = "data/conus_net.parquet", 
    pattern = "/Volumes/Transcend/ngen/CONUS-hydrofabric/05_nextgen/nextgen_{vpu}.gpkg", 
    lyrs = c("divides", "nexus", "flowpaths", "network", "hydrolocations"), 
    export_gpkg = NULL)

I was able to run my script by copy+pasting the whole subset_reference function from the source into my script prior to the function call.

Question:

Is it expected?
Maybe the package is missing some "export" for the subset_reference function?
Maybe I am doing something wrong, since I am learning the R language itself while learning to use the package?

Misindexed POI (Gages-06719505)

Somewhere in the hydrolocation creation, NWIS gage 06719505 has been assigned to 2 flowpaths in different VPUs:

ATTN: @joshsturtevant

> filter(l, hl_reference == "Gages") %>% 
+   filter(hl_link == '06719505')
Simple feature collection with 2 features and 8 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -105.2193 ymin: 39.75805 xmax: -103.2906 ymax: 41.99998
Geodetic CRS:  WGS 84
# A tibble: 2 × 9
  hl_link  hl_reference    hf_id       ID     X     Y VPUID hl_id                 geom
* <chr>    <chr>           <dbl>    <dbl> <dbl> <dbl> <chr> <int>          <POINT [°]>
1 06719505 Gages         2885178 10024633 -105.  39.8 10L   46845 (-105.2193 39.75805)
2 06719505 Gages        16057156 10169551 -105.  39.8 10U   56858 (-103.2906 41.99998)

lake_id 1710676 missing in lake table

Lake ID 1710676 is missing in the "lake" table of the conus.gpkg. It is listed as the waterbody_id for several segments in the "flowpath_attributes" table, and exists in the NWMv2.1 LAKEPARM file, but it looks like it was incorrectly given the ID 1711354 in the "lake" table based on the WBIn and WBOut nexus points. This lake (1711354), however, is missing all lake parameters in the "lake" table.

Waterbody ID 5569731 has two outlets

In the latest hydrofabric, v20.1, waterbody ID 5569731 has two outlets, segments 2409607 and 2409629. This confuses t-route, which relies on connections always having only one downstream segment. VPU: 12

Catchment Weights

Catchment weights (the indices within the lat/lon divides of a particular catchment relative to some grid) are needed for extracting forcing data. The method of calculating these weights can vary and yield different weights depending on the method used (see issue 28 in hfsubset). Also, these weights may change with each hydrofabric release. To help ensure catchment weights are consistent for a particular hydrofabric and grid, I suggest generating CONUS weight files for common grids (projections) that can be subset via a tool like hfsubset.

For example, NWM v3 uses a lambert conformal conic projection.

https://noaa-nwm-pds.s3.amazonaws.com/nwm.20240112/forcing_medium_range/nwm.t00z.medium_range.forcing.f001.conus.nc

>>> nwm_data.crs.esri_pe_string
'PROJCS["Lambert_Conformal_Conic",GEOGCS["GCS_Sphere",DATUM["D_Sphere",SPHEROID["Sphere",6370000.0,0.0]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]],PROJECTION["Lambert_Conformal_Conic_2SP"],PARAMETER["false_easting",0.0],PARAMETER["false_northing",0.0],PARAMETER["central_meridian",-97.0],PARAMETER["standard_parallel_1",30.0],PARAMETER["standard_parallel_2",60.0],PARAMETER["latitude_of_origin",40.0],UNIT["Meter",1.0]];-35691800 -29075200 10000;-100000 10000;-100000 10000;0.001;0.001;0.001;IsHighPrecision'

The projection type and hydrofabric can be stored as fields in the weights.json. This way forcings engines could check that the weight projection is appropriate for the nwm file being processed. Information about the grid should be stored in the json as well (dx, dy, nx, ny, x0, y0).
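
One possible shape for that check is sketched below. Every key name in the header (`hydrofabric_version`, `projection`, `grid`) is hypothetical, as are the toy grid values and version string; only the six grid parameter names (dx, dy, nx, ny, x0, y0) come from the suggestion above.

```python
import json

REQUIRED_GRID_KEYS = ("dx", "dy", "nx", "ny", "x0", "y0")


def check_weights_header(header, hf_version, projection, grid):
    """Return a list of mismatch messages; an empty list means the weights apply."""
    problems = []
    if header.get("hydrofabric_version") != hf_version:
        problems.append("hydrofabric version differs")
    if header.get("projection") != projection:
        problems.append("projection differs")
    stored_grid = header.get("grid", {})
    for key in REQUIRED_GRID_KEYS:
        if stored_grid.get(key) != grid.get(key):
            problems.append(f"grid parameter {key} differs")
    return problems


# Toy header as it might be stored at the top of a weights.json file.
header = json.loads("""
{
  "hydrofabric_version": "x.y.z",
  "projection": "Lambert_Conformal_Conic",
  "grid": {"dx": 1000.0, "dy": 1000.0, "nx": 4, "ny": 3, "x0": 0.0, "y0": 0.0}
}
""")

forcing_grid = {"dx": 1000.0, "dy": 1000.0, "nx": 4, "ny": 3, "x0": 0.0, "y0": 0.0}
problems = check_weights_header(header, "x.y.z", "Lambert_Conformal_Conic", forcing_grid)
print(problems)
```

Exact equality is used for the grid parameters here; a real check would probably compare floating-point values within a tolerance.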

This will place more responsibility on the hydrofabric, but might be worth it in the long run to avoid weight generation algorithms yielding different weights for the same hydrofabric and grid.

Waterbody outlet is disconnected from waterbody

Looking at file nextgen_03S.gpkg, waterbody (hl_link) 16769792 has two WBIn nodes and one WBOut node. However, the WBOut node is not connected to either of the WBIn nodes (see image below).

Obtaining waterbody connections from the NWMv2.1 route_link file and comparing them to the wb-IDs in the 'network' table yields the following crosswalk table:

NHDSegmentID	lake_id	link	hydroseq
16770354	16769792	wb-1578901	953.0
16770384	16769792	wb-1578901	953.0
16770780	16769792	wb-1578904	949.0
16770782	16769792	wb-1578903	950.0
16770784	16769792	wb-1578910	954.0
16770786	16769792	wb-1578903	950.0
16770788	16769792	wb-1578901	953.0
16770790	16769792	wb-1578907	956.0
16770792	16769792	wb-1578907	956.0
16770794	16769792	wb-1578901	953.0
16770796	16769792	wb-1578902	957.0
16771128	16769792	wb-1578901	953.0
16771132	16769792	wb-1578901	953.0
16771136	16769792	wb-1578905	952.0
16771138	16769792	wb-1578905	952.0
16771140	16769792	wb-1578905	952.0
16771142	16769792	wb-1578901	953.0
16771144	16769792	wb-1578901	953.0
16771146	16769792	wb-1578903	950.0
16771148	16769792	wb-1578903	950.0
16771176	16769792	wb-1578901	953.0
16771178	16769792	wb-1578901	953.0
16771184	16769792	wb-1578901	953.0
16771186	16769792	wb-1578911	951.0
16771190	16769792	wb-1578904	949.0
16771192	16769792	wb-1578904	949.0
16771194	16769792	wb-1578901	953.0
16771196	16769792	wb-1578901	953.0
16771198	16769792	wb-1578907	956.0
16771200	16769792	wb-1578901	953.0

The WBOut node is nex-1578896, with segments wb-1578895 and wb-1578897 feeding into it. It appears these segments are not part of the NHD waterbody connection from NWMv2.1.

Erroneous flowpath

I ran into a flowpath that seems to be in error. It's one of the MultiLineString features, but this one has a segment in New Jersey and another segment on Long Island. The feature's fid is 18671 (id = 'wb-36545').

Generally, I'm wondering about how these flowpaths get to be multilines in the first place. There are 289 multilinestring flowpaths, and I'm wondering how many of them are wacky like this one. I'd be interested in the reason why there are disjoint flowpaths in the first place...

This was encountered in service of finding the endpoints of a stream reach, and the multi-component geometries are throwing a spanner in the works. I now believe that the flowpath geometry is unreliable, and I should instead be looking for some clever strategy to join with the nexus table to get the endpoints I'm after. Any insight into this query would be useful. Thanks!
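
To triage the multilinestring flowpaths, a contiguity test like the one below could separate the ones whose parts simply chain together from the truly disjoint ones. It is a sketch on plain coordinate lists and assumes the parts are stored in flow order and not reversed. The function names and toy coordinates are made up for illustration.

```python
def is_contiguous(parts, tol=1e-9):
    """parts: list of lines, each a list of (x, y) tuples, in stored order.
    True when every part starts where the previous part ended."""
    for prev, nxt in zip(parts, parts[1:]):
        (x0, y0), (x1, y1) = prev[-1], nxt[0]
        if abs(x0 - x1) > tol or abs(y0 - y1) > tol:
            return False
    return True


def endpoints(parts):
    """First and last vertex; only meaningful when the parts are contiguous."""
    return parts[0][0], parts[-1][-1]


joined = [[(0.0, 0.0), (1.0, 0.0)], [(1.0, 0.0), (2.0, 1.0)]]
split = [[(0.0, 0.0), (1.0, 0.0)], [(5.0, 5.0), (6.0, 5.0)]]

print(is_contiguous(joined), is_contiguous(split), endpoints(joined))
```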

hydroATLAS use

Here is an example of using HydroATLAS data with the NextGen fabric from www.lynker-spatial.com

ATTN: @rappjer1 and @jmframe

library(dplyr); library(sf); library(arrow)

# Point to hydrofabric file (subset used here)
p = 'poudre.gpkg'

# Define HydroATLAS variables you want (can skip this and remove `select("hf_id", any_of(vars))` to get all)
vars = 'pet_mm_s01'

## READ NETWORK

net = read_sf(p,  'network') %>% 
  select(divide_id, hf_id) %>% 
  filter(complete.cases(.)) %>% 
  group_by(divide_id) %>% 
  slice(1)

## EXTRACT HydroATLAS vars

ha = open_dataset('s3://lynker-spatial/hydroATLAS/hydroatlas_vars.parquet') %>% 
  filter(hf_id %in% net$hf_id) %>% 
  select("hf_id", any_of(vars)) %>% 
  collect()

## JOIN features and variables

divide = read_sf(p,  'divides') %>% 
  left_join(net, by = "divide_id") %>% 
  left_join(ha, by = "hf_id") 

## PLOT
plot(divide[c(vars)])

Created on 2023-10-16 by the reprex package (v2.0.1)