hydrofabric's People

Contributors

anguswg-ucsb, arashmodrad, mikejohnson51, program--, snowhydrology

hydrofabric's Issues

Emergent Errors in Refactor

VPU 01

1

Mainstem 1964010 - a single flowpath is made up of three pieces. Two of these (10040781 and 10040780) are represented by duplicated divides shown in red.

image

Fix:

  • Merge flowlines AND divides 10040781, 10040780, and 10012125; make the toID 10012500
  • 10012415 --> 10012226
  • 10012226 --> 10012125

2

Mainstem 1970573. A single flowpath crosses two divides with an extra dangler hanging out (ID 10016831)

image

Remove: flowpath 10016831
Merge: catchments 10016831 and 10016708, keep ID 10016708

3

Mainstem 2611598. There is a disjoint flowpath/divide pair directly upstream of a POI (Gage 01017290).

image

Fix:

Merge flowline and divide 10040738, 10040737. Retain ID 10040737 and flow to 10002480

4

image

Remove Flowpath: 10003899

Change End Node of 10003900 to the start node of 10004000

Topo change

10003900 --> 10004000
10003943 --> 10003900

NextGen Hydrolocation Inventory

To support v3.0 of the NextGen hydrofabric, we need to build a more robust method for hydrolocation identification. Below is the (evolving) notion of this process:

graph LR
  subgraph Hydrolocation Inventory
    direction LR
    GFv20_POIs --> conus_hl.gpkg
    coastal_gages --> coastal_POIs
    coastal_domain --> coastal_POIs
    coastal_POIs --> conus_hl.gpkg
    nwm_rl --> nwm_POIs
    nwm_lakes -->nwm_POIs
    nwm_reservoirs --> nwm_POIs
    calibration_gages --> nwm_POIs
    nwm_POIs --> conus_hl.gpkg
    nws_lid --> fim_POIs
    fim_POIs --> conus_hl.gpkg
    reference_fabric --> conus_hl.gpkg
  end

lakes and segments share similar IDs

Here is a list of segments and lakes that share common IDs. This is an issue for inland routing (t-route) when we create our network of connections. Common IDs confuse the 'toid' nature of the network, routing flows to incorrect places.

717696
1311881
3133581
1010832
1023120
1813525
1531545
1304859
1320604
1233435
11816
1312051
2723765
2613174
846266
1304891
1233595
1996602
2822462
2384576
1021504
2360642
1326659
1826754
572364
1336910
1332558
1023054
3133527
3053788
3101661
2043487
3056866
1296744
1233515
2045165
1230577
1010164
1031669
1291638
1637751
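
A minimal sketch of how a list like the one above could be regenerated, assuming the segment IDs and lake IDs are already available as two plain collections. The function name and the toy inputs are hypothetical; only 717696 and 11816 come from the list above.

```python
def shared_ids(segment_ids, lake_ids):
    """Return the sorted IDs that appear in both collections."""
    return sorted(set(segment_ids) & set(lake_ids))


# Toy inputs (hypothetical apart from 717696 and 11816).
segments = [717696, 11816, 500001, 500002]
lakes = [717696, 11816, 900001]
print(shared_ids(segments, lakes))
```

Any ID returned here is ambiguous when used as a 'toid' target, which is the situation described for t-route.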

Incorrect NWM streamline

In the image below (HUC 08040102), the 6th order blue streamline (also identified as mainstem) leaves the main channel and follows a side loop while the red segment on the main channel is a first order, non-mainstem stream
image

VPU 03w, 03s, and 03n behave differently than the rest

Working on additional definition here. Essentially, there appears to be a different spatial reference for the one VPU that is split compared to the other VPUs.

This is coming up in the process of using the geopackage-defined "divides" polygons as bounds for an interpolation from current NWM forcing files.

steps to reproduce:

wget -P 03n https://nextgen-hydrofabric.s3.amazonaws.com/v1.2/nextgen_03N.gpkg
wget -P 02 https://nextgen-hydrofabric.s3.amazonaws.com/v1.2/nextgen_02.gpkg
wget -P data https://noaa-nwm-retrospective-2-1-pds.s3.amazonaws.com/forcing/1980/198012310800.LDASIN_DOMAIN1
import geopandas as gpd
import xarray as xr
import rasterio
from rasterstats import zonal_stats

g2_d = gpd.read_file("02/nextgen_02.gpkg", layers="divides")
g3n_d = gpd.read_file("03n/nextgen_03N.gpkg", layers="divides")
xds = xr.open_dataset("data/198012310800.LDASIN_DOMAIN1", engine="rasterio")
src = xds["U2D"]
aff = src.rio.transform()
arr = src.values[0]

zonal_stats(g3n_d, arr, affine=aff)  # Works -- lots of warnings
zonal_stats(g2_d, arr, affine=aff)  # produces pure Nans -- no overlap
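
One way to test the spatial reference suspicion without any geospatial libraries is to read the declared srs_id of each layer directly. A GeoPackage is a SQLite database, and its gpkg_contents table records a table_name and srs_id per layer. The sketch below only reports what each file declares; whether a difference there explains the NaNs is an assumption, and the function names are hypothetical.

```python
import sqlite3


def layer_srs_ids(gpkg_path):
    """Map each table listed in gpkg_contents to its declared srs_id."""
    con = sqlite3.connect(gpkg_path)
    try:
        rows = con.execute(
            "SELECT table_name, srs_id FROM gpkg_contents"
        ).fetchall()
    finally:
        con.close()
    return dict(rows)


def compare_layer_srs(path_a, path_b, layer="divides"):
    """Return (srs_a, srs_b, same) for one layer in two GeoPackages."""
    srs_a = layer_srs_ids(path_a).get(layer)
    srs_b = layer_srs_ids(path_b).get(layer)
    return srs_a, srs_b, srs_a == srs_b


# Intended use with the files downloaded above (not run here):
# compare_layer_srs("02/nextgen_02.gpkg", "03n/nextgen_03N.gpkg")
```

If the two srs_id values differ, reprojecting both sets of divides to the CRS of the forcing grid before calling zonal_stats would be the next thing to try.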

Waterbodies missing lake parameter info

In the conus.gpkg, there are lakes missing lake parameters such as 'LkArea', 'LkMxE', 'OrificeA', etc.

It appears these waterbodies are new, meaning they are not in the NHDNetwork. Will these parameters be added at some point?

Here is a list of the waterbody IDs ('hl_link' in the lakes table):
2277833, 22302965, 120053727, 18421865, 167679168, 167679192, 167679193, 166997626, 167114801, 120049055, 120053465, 167245953, 167297531, 30000600551311, 30000600524767, 5778809, 30000400156418, 10038590, 120051957, 41000400070122, 24381227, 23794331, 8009481, 120054083, 120054085

Missing info from reservoir_index_*_NWMv2.1.nc files

Prior to v4, t-route used reservoir_index* files to get information linking lake IDs to gage IDs and to determine what type of reservoir any given waterbody was (1=Levelpool, 2=USGS, 3=USACE, 4=RFC, 5=Glacially Dammed Waterbody). Could this information be included in the hydrofabric? USGS and USACE gage data is available in the 'network' table, but the gages are attached to segment IDs rather than waterbodies, and there are instances where a single segment has multiple gages. RFC gages are missing completely.

Suggested change for Reference Topology 07

Where does it occur: VPU 10U right on the Idaho border. Mainstem ID: 1284412

All images come from the refactor_07.gpkg

image

The Problem:

The red line (ID 10049093) flows to ID 10049122. The U-shaped line (10047655) also flows to 10049122. The double entry to the U-shaped line causes trouble when aggregating.

Solution:

  • Remove flowpath 10049093
  • Merge divides 10049093 and 10047655
  • Keep ID 10047655, with the current toID of 10049122

Break in subset: error

library(hydrofabric)
#> ── Attaching packages ───────────────────────────────────── hydrofabric 0.0.9 ──
#> ✔ dplyr         1.1.4      ✔ hydrofab      0.5.0 
#> ✔ ngen.hydrofab 0.0.3      ✔ zonal         0.0.2 
#> ✔ climateR      0.3.3      ✔ sf            1.0.15
#> ✔ nhdplusTools  1.0.1      ✔ terra         1.7.67
#> Warning: package 'terra' was built under R version 4.2.3
#> ── Conflicts ──────────────────────────────────────── hydrofabric_conflicts() ──
#> ✖ terra::plot() masks climateR::plot()
#> 
#> Attaching package: 'hydrofabric'
#> The following object is masked _by_ 'package:hydrofab':
#> 
#>     hf_dm
x. = subset_network(id = 'cat-113060')
#> Error in (function (cond) : error in evaluating the argument 'x' in selecting a method for function 'unique': ℹ In argument: `hf_id == comid`.
#> Caused by error:
#> ! `..1` must be of size 3123543 or 1, not size 0.

Created on 2024-02-15 with reprex v2.0.2

hydrofabrics derived attributes for developing CONUS 3D channel ML model

A list of variables needed to develop the ML model for CONUS using ClimateR, Zonal, and Hydrofabrics.

Variable name Aggregation Description Reduction Source Literature
length_divide Divide Length of flowline feature NA Reference Fabric link
length_catchment Catchment Length of flowline feature Sum Reference Fabric link
area_divide Divide Feature area NA Reference Fabric link
area_catchment Catchment Feature area Sum Reference Fabric link
arbolatesu_divide Divide the sum of the lengths of all digitized flowlines upstream from the downstream end of the immediate flowline NA Reference Fabric link
arbolatesu_catchment Catchment the sum of the lengths of all digitized flowlines upstream Sum Reference Fabric link
pathlength_divide Divide The distance from the bottom of a flowline to the bottom of the terminal flowline along the main path NA Reference Fabric link
pathlength_catchment Catchment The distance from the bottom of a flowline to the bottom of the terminal flowline along the main path Sum Reference Fabric link
streamorde_divide Divide Modified Strahler stream order NA Reference Fabric link
streamleve_divide Divide Stream level NA Reference Fabric link
slope_divide Divide Slope of flowline NA Reference Fabric/DEM link
slope_catchment Catchment Slope of flowline Ave Reference Fabric/DEM link
roughness_divide Divide roughness NA Reference Fabric link
roughness_catchment Catchment roughness Ave Reference Fabric link
elevation_divide Divide elevation Ave DEM link
elevation_catchment Catchment elevation Ave DEM link
aspect_ave_divide Divide Average aspect Ave DEM link
aspect_ave_catchment Catchment Average aspect Ave DEM link
flow_acc_sum_divide Divide Flow accumulation sum Sum DEM link
flow_acc_sum_catchment Catchment Flow accumulation sum Sum DEM link
flow_dir_ave_divide Divide Flow direction average Ave DEM link
flow_dir_ave_catchment Catchment Flow direction average Ave DEM link
clay_divide Divide Average % clay Ave POLARIS link
clay_catchment Catchment Average % clay Ave POLARIS link
sand_divide Divide Average % sand Ave POLARIS link
sand_catchment Catchment Average % sand Ave POLARIS link
silt_divide Divide Average % silt Ave POLARIS link
silt_catchment Catchment Average % silt Ave POLARIS link
bd_divide Divide Average soil bulk density, (g cm-3) Ave POLARIS link
bd_catchment Catchment Average soil bulk density, (g cm-3) Ave POLARIS link
ksat_divide Divide Average effective saturated hydraulic conductivity, (cm hr-1) Ave POLARIS link
ksat_catchment Catchment Average effective saturated hydraulic conductivity, (cm hr-1) Ave POLARIS link
om_divide Divide Average organic matter content, (%) Ave POLARIS link
om_catchment Catchment Average organic matter content, (%) Ave POLARIS link
ph_mean_divide Divide Average soil pH Ave POLARIS link
ph_mean_catchment Catchment Average soil pH Ave POLARIS link
theta_r_divide Divide Average residual soil water content, (cm3 cm-3) Ave POLARIS link
theta_r_catchment Catchment Average residual soil water content, (cm3 cm-3) Ave POLARIS link
theta_s_divide Divide Average saturated soil water content, (cm3 cm-3) Ave POLARIS link
theta_s_catchment Catchment Average saturated soil water content, (cm3 cm-3) Ave POLARIS link
hb_mean_divide Divide Average Brooks-Corey parameter related to the air-entry pressure (cm) Ave POLARIS link
hb_mean_catchment Catchment Average Brooks-Corey parameter related to the air-entry pressure (cm) Ave POLARIS link
lambda_mean_divide Divide Average Brooks-Corey parameter the pore size distribution index, (dimensionless) Ave POLARIS link
lambda_mean_catchment Catchment Average Brooks-Corey parameter the pore size distribution index, (dimensionless) Ave POLARIS link
n_mean_divide Divide Average empirical shape-defining parameters in the van Genuchten equation, (dimensionless) Ave POLARIS link
n_mean_catchment Catchment Average empirical shape-defining parameters in the van Genuchten equation, (dimensionless) Ave POLARIS link
alpha_mean_divide Divide Average parameter of the van Genuchten equation corresponding approximately to the inverse of the air-entry value, (cm-1) Ave POLARIS link
alpha_mean_catchment Catchment Average parameter of the van Genuchten equation corresponding approximately to the inverse of the air-entry value, (cm-1) Ave POLARIS link
LAI_ave_divide Divide Average Leaf Area Index Ave MODIS link
LAI_ave_catchment Catchment Average Leaf Area Index Ave MODIS link
LAI_min_divide Divide Sum of areas with Leaf Area Index <= 5 Sum MODIS link
LAI_min_catchment Catchment Sum of areas with Leaf Area Index <= 5 Sum MODIS link
LAI_max_divide Divide Sum of areas with Leaf Area Index >= 15 Sum MODIS link
LAI_max_catchment Catchment Sum of areas with Leaf Area Index >= 15 Sum MODIS link
NDVI_ave_divide Divide Average Normalized difference vegetation index Ave MODIS link
NDVI_ave_catchment Catchment Average Normalized difference vegetation index Ave MODIS link
NDVI_min_divide Divide Sum of areas with Normalized difference vegetation index <= 0.2 Sum MODIS link
NDVI_min_catchment Catchment Sum of areas with Normalized difference vegetation index <= 0.2 Sum MODIS link
NDVI_max_divide Divide Sum of areas with Normalized difference vegetation index > 0.2 Sum MODIS link
NDVI_max_catchment Catchment Sum of areas with Normalized difference vegetation index > 0.2 Sum MODIS link
humid_divide Divide Average specific_humidity Ave NLDAS link
humid_catchment Catchment Average specific_humidity Ave NLDAS link
temperature_ave_divide Divide Average temperature Ave NLDAS link
temperature_ave_catchment Catchment Average temperature Ave NLDAS link
temperature_min_divide Divide Minimum temperature Min NLDAS link
temperature_min_catchment Catchment Minimum temperature Min NLDAS link
wind_u_divide Divide Average U wind component at 10 meters above the surface Ave NLDAS link
wind_u_catchment Catchment Average U wind component at 10 meters above the surface Ave NLDAS link
wind_v_divide Divide Average V wind component at 10 meters above the surface Ave NLDAS link
wind_v_catchment Catchment Average V wind component at 10 meters above the surface Ave NLDAS link
SoilMoi0_10cm_inst_min_divide Divide Minimum soil moisture Min GLDAS link
SoilMoi0_10cm_inst_min_catchment Catchment Minimum soil moisture Min GLDAS link
SoilMoi0_10cm_inst_ave_divide Divide Average soil moisture Ave GLDAS link
SoilMoi0_10cm_inst_ave_catchment Catchment Average soil moisture Ave GLDAS link
SoilMoi0_10cm_inst_max_divide Divide Maximum soil moisture Max GLDAS link
SoilMoi0_10cm_inst_max_catchment Catchment Maximum soil moisture Max GLDAS link
SoilTMP0_10cm_inst_min_divide Divide Area of soil temperature less than or equal to zero Sum GLDAS link
SoilTMP0_10cm_inst_min_catchment Catchment Area of soil temperature less than or equal to zero Sum GLDAS link
SoilTMP0_10cm_inst_ave_divide Divide Average soil temperature Ave GLDAS link
SoilTMP0_10cm_inst_ave_catchment Catchment Average soil temperature Ave GLDAS link
ESoil_tavg_divide Divide Average Direct evaporation from bare soil Ave GLDAS link
ESoil_tavg_catchment Catchment Average Direct evaporation from bare soil Ave GLDAS link
Evap_tavg_divide Divide Average Evapotranspiration Ave GLDAS link
Evap_tavg_catchment Catchment Average Evapotranspiration Ave GLDAS link
Qs_acc_ave_divide Divide Average Storm surface runoff Ave GLDAS link
Qs_acc_ave_catchment Catchment Average Storm surface runoff Ave GLDAS link
Qs_acc_sum_divide Divide Sum Storm surface runoff Sum GLDAS link
Qs_acc_sum_catchment Catchment Sum Storm surface runoff Sum GLDAS link
Qsb_acc_ave_divide Divide Average Baseflow-groundwater runoff Ave GLDAS link
Qsb_acc_ave_catchment Catchment Average Baseflow-groundwater runoff Ave GLDAS link
Qsb_acc_sum_divide Divide Sum Baseflow-groundwater runoff Sum GLDAS link
Qsb_acc_sum_catchment Catchment Sum Baseflow-groundwater runoff Sum GLDAS link
Qsm_acc_ave_divide Divide Average Snow melt Ave GLDAS link
Qsm_acc_ave_catchment Catchment Average Snow melt Ave GLDAS link
Qsm_acc_sum_divide Divide Sum Snow melt Sum GLDAS link
Qsm_acc_sum_catchment Catchment Sum Snow melt Sum GLDAS link
Snowf_tavg_ave_divide Divide Average Snow precipitation rate Ave GLDAS link
Snowf_tavg_ave_catchment Catchment Average Snow precipitation rate Ave GLDAS link
Snowf_tavg_max_divide Divide Max Snow precipitation rate Max GLDAS link
Snowf_tavg_max_catchment Catchment Max Snow precipitation rate Max GLDAS link
SnowDepth_inst_ave_divide Divide Average Snow depth Ave GLDAS link
SnowDepth_inst_ave_catchment Catchment Average Snow depth Ave GLDAS link
SnowDepth_inst_max_divide Divide Max Snow depth Max GLDAS link

Need to align ref_mainstem with lp_mainstem

Recent changes in geoconnex now use a persistent ref_mainstem_id, and a reference fabric based levelpath lp_mainstem. Previously, it was expected that the lp_mainstem would be 1:1 with the ref_mainstem...

This is actually a very good thing for our processing! Now it allows us to retain the lp_mainstem as the mainstem ID, and we can add a ref_mainstem_uri that points to the proper persistent geoconnex feature.

Overall, it better disambiguates the persistent identification system and the reference fabric based processing.

segments 'toid' pointing to an upstream nexus

Looking at conus.gpkg, segments wb-1578432 and wb-1578518 seem to point to a nexus point upstream (see image below)

wb_to_nexus_upstream

I believe they should be pointing to nex-1577901 just like wb-1577900.

Need to preserve outlet/terminal ID through NHD workflow

In order to use NHD aggregate outlets in the hyRefactor workflow we need to preserve the terminal id/type through the aggregation. This can be done within each step, or it can be executed at the end on the identities of the member_COMIDs.

Will need to test both...

Dealing with persistent "short" paths.

@shorvath-noaa raised some issues with flow paths shorter than our allowed 1 km. Currently these are expected (at least all that I have looked at). An example is here:

image

The upper POI is a HUC12 outlet, while the lower one is a gage. These distinct features are only 0.4 km apart, and since they are enforced in the network manipulations, they are persistent.

The options for dealing with these are:

  1. Keep as is
  2. Set a priority for exclusion (e.g. a gage takes precedence over a HUC12 outlet, so in this case the HUC12 is dropped) - I do not like this approach personally
  3. Collapse them into the downstream POI and associate them to the same hl_id.
  4. Others?

@NOAA-OWP/hydrofabric have any thoughts?

Suggestion: Add version, license, and other relevant metadata to `gpkg_metadata` table

It would be really nice if version and other metadata information traveled with a hydrofabric asset. This is a big can of worms over what to include, but at a minimum, it would be great to know the HF version and have a copy of the license. I'm sure this will not play nicely with all HF distribution formats (e.g. geoparquet?), but it looks like this is possible with the gpkg_metadata table in the geopackage spec.
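
A rough sketch of what this could look like, using only sqlite3 since a GeoPackage is a SQLite file. The column layout follows the GeoPackage metadata extension as I understand it and should be checked against the spec. The JSON payload keys, the md_standard_uri value, and the function names are all hypothetical. A fully compliant file would also need a gpkg_metadata_reference row and an entry in gpkg_extensions, which are left out here.

```python
import json
import sqlite3

# Column layout per the GeoPackage metadata extension (verify against the spec).
CREATE_METADATA = """
CREATE TABLE IF NOT EXISTS gpkg_metadata (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  md_scope TEXT NOT NULL DEFAULT 'dataset',
  md_standard_uri TEXT NOT NULL,
  mime_type TEXT NOT NULL DEFAULT 'text/xml',
  metadata TEXT NOT NULL DEFAULT ''
)
"""

# Hypothetical identifier for the metadata "standard" used by the payload.
MD_URI = "https://example.org/hydrofabric-metadata"


def write_hf_metadata(gpkg_path, version, license_text):
    """Store version and license as one JSON record; return its row id."""
    payload = json.dumps({"hf_version": version, "license": license_text})
    con = sqlite3.connect(gpkg_path)
    try:
        con.execute(CREATE_METADATA)
        cur = con.execute(
            "INSERT INTO gpkg_metadata "
            "(md_scope, md_standard_uri, mime_type, metadata) "
            "VALUES ('dataset', ?, 'application/json', ?)",
            (MD_URI, payload),
        )
        con.commit()
        return cur.lastrowid
    finally:
        con.close()


def read_hf_metadata(gpkg_path):
    """Return every JSON metadata record as a list of dicts."""
    con = sqlite3.connect(gpkg_path)
    try:
        rows = con.execute(
            "SELECT metadata FROM gpkg_metadata "
            "WHERE mime_type = 'application/json'"
        ).fetchall()
    finally:
        con.close()
    return [json.loads(row[0]) for row in rows]
```

Keeping the payload as a single JSON document means new fields can be added later without changing the table layout.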

Add VPU to divides layer.

I'm suggesting we add an optional VPU ID to divides layer DM. If we are advocating for a divides view in NextGen, the segmentation should be available w/o moving into the flowpaths. @program-- ?

Land cover and soil classification using generic categories

Current behavior

The hydrofabric determines the land cover classification for a basin by using the mode of land cover categories within its boundaries. Sometimes this leads to conflicts with the soil type, particularly if water is involved.

The below screenshot, from @mikejohnson51, shows an area in which the soil (ISLTYP = 14) is classified as water, while the land cover (IVGTYP = 15) is classified as mixed forest.

image

This caused runs of Noah-OWP-Modular to fail because of inconsistencies in the soil and land cover categories: NOAA-OWP/DMOD#472 (comment)

Expected behavior

The hydrofabric should return self-consistent values of soil and land cover.

Suggested changes

Add new, lumped generic categories for soil and land cover. I provide suggestions below:

Soil generic categories

From the STAS category in Noah-OWP-Modular:

land = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 17, 18, 19)
water = 14
ice = 16

Land cover generic categories

From the USGS category in Noah-OWP-Modular:

urban = 1
agricultural = c(2, 3, 4, 5, 6)
grassland = c(7, 8, 9, 10)
forest = c(11, 12, 13, 14, 15)
water = 16
wetland = c(17, 18)
barren = c(19, 25, 26, 27)
tundra = c(20, 21, 22, 23)
ice = 24

Code changes

Implement a two-step classification option that:

  1. Provides generic category mapping as shown above
  2. Computes the mode of all generic land cover (e.g., forest, water, urban, agriculture) and soil categories in the basin
  3. Then computes the mode (using the specific land cover and soil categories) of the dominant generic categories from step 2
  4. Then checks for internal consistency between soil and land cover categories. The most important check is that if soil == land then the land cover corresponds to land_cover %in% c(urban, agricultural, grassland, forest, wetland, barren, tundra) or soil == water and land_cover == water or soil == ice and land_cover == ice
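
The steps above can be sketched as follows. This is an illustration in Python rather than the package's R code, and nothing here is the hydrofabric's actual implementation. The category groupings are copied from the lists above; the function names and the sample cell values are hypothetical. Ties in the mode go to the first value encountered, which is an arbitrary choice.

```python
from collections import Counter

SOIL_GROUPS = {
    "land": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 17, 18, 19],
    "water": [14],
    "ice": [16],
}
LAND_COVER_GROUPS = {
    "urban": [1],
    "agricultural": [2, 3, 4, 5, 6],
    "grassland": [7, 8, 9, 10],
    "forest": [11, 12, 13, 14, 15],
    "water": [16],
    "wetland": [17, 18],
    "barren": [19, 25, 26, 27],
    "tundra": [20, 21, 22, 23],
    "ice": [24],
}
LAND_LIKE = {"urban", "agricultural", "grassland", "forest",
             "wetland", "barren", "tundra"}


def invert(groups):
    """Turn {generic: [codes]} into {code: generic}."""
    return {code: name for name, codes in groups.items() for code in codes}


def mode(values):
    """Most common value; ties go to the first value seen."""
    return Counter(values).most_common(1)[0][0]


def two_stage_mode(cells, groups):
    """Return (generic, specific): the dominant generic class, then the
    mode of the specific codes that belong to that generic class."""
    lookup = invert(groups)
    generic = mode(lookup[c] for c in cells)
    specific = mode(c for c in cells if lookup[c] == generic)
    return generic, specific


def is_consistent(soil_generic, land_cover_generic):
    """Soil 'land' needs a land-like cover; water and ice must match."""
    if soil_generic == "land":
        return land_cover_generic in LAND_LIKE
    return soil_generic == land_cover_generic


# Hypothetical basin: water (16) is the single most common land cover code,
# but forest codes together outnumber it.
land_cover_cells = [16, 16, 16, 11, 11, 12, 13]
soil_cells = [3, 3, 3, 4, 6, 14, 14]
lc_generic, lc_specific = two_stage_mode(land_cover_cells, LAND_COVER_GROUPS)
soil_generic, soil_specific = two_stage_mode(soil_cells, SOIL_GROUPS)
print(lc_generic, lc_specific, soil_generic, soil_specific)
print(is_consistent(soil_generic, lc_generic))
```

In the sample basin a plain mode over the specific codes would return water (16) for land cover, while the two-stage approach returns forest (11), which agrees with the soil being classified as land. The failing case from the screenshot (soil 14, land cover 15) corresponds to is_consistent("water", "forest"), which is False.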

Misplaced 1st order streams along 7th order main flow paths

@mluck @JamesColl-NOAA @CarsonPruitt-NOAA @RyanSpies-NOAA I may be overlooking something, and there's a chance I don't have a complete grasp of this, but I noticed a problem during my testing of the machine learning-derived volume on HAND data. While navigating along the main stem rivers, which are predominantly 7th order streams, I found instances where lower-order streams, often 1st order, are in place of 7th order streams. This throws off the actual flood extent because the discharge values from the National Water Model are significantly different. Here are two examples:

First:
HydroID: 12970099 (dark green catchment in image) mapping to feature_id: 2240709 which is a 1st order stream
whereas looking at reference fabric catchments (dark blue boundaries in image) and flow lines this HydroID should be associated with a 7th order stream (feature_id: 2242125)

Second:
HydroID: 12970044 mapping to feature_id: 2242309 which is a 2nd order stream
whereas looking at reference fabric catchments and flow lines this HydroID should be associated with a 7th order stream (feature_id: 2248065)

error

NWM v2.1 flowline issue on Carson River

The FIM dev team has noted a potential issue with the NWM flow line in HUC 16050201.

Issue: The downstream flow routing deviates off of the main channel of the Carson River and instead follows the Mexican Ditch flow line (located to the west of the mainstem Carson River). This is evident in tracking the stream order attribute (see attached screenshot)

Where: The NWM 2.1 flow line on the Carson River near Carson City, NV. Featureids: 11432977, 11432991, 11432989

Impacts: flow routing is incorrectly appropriating Carson River flow values to the smaller ditch stream line; FIM for the mainstem Carson River is severely underestimated.

HUC_16050201_STWN2_flow_line_issue
HUC_16050201_STWN2_zoom_mexican_dam

Reference feature characteristics needed for bankfull width and depth ML model

Variable name Aggregation Description Reduction Source Literature
length_catchment Catchment Length of flowline feature Sum Reference Fabric link
area_catchment Catchment Feature area Sum Reference Fabric link
arbolatesu_catchment Catchment the sum of the lengths of all digitized flowlines upstream Sum Reference Fabric link
pathlength_catchment Catchment The distance from the bottom of a flowline to the bottom of the terminal flowline along the main path Sum Reference Fabric link
streamorde_divide Divide Modified Strahler stream order NA Reference Fabric link
streamleve_divide Divide Stream level NA Reference Fabric link
slope_divide Divide Slope of flowline NA Reference Fabric/DEM link
roughness_divide Divide roughness NA Reference Fabric link

Suggested change for Reference Topology 01

Problem area: Mainstem 2111164 in VPU 01

Currently a tiny little sliver is a single mainstem

image

Change: 10018008 Mainstem ID to 2111164

Bad digitization leads to bad topology

Redigitize: 10018009

image

Change IDs

10040692 --> 10018009
10018009 --> 10018008

subset_reference function not exported to hydrofabric namespace

I've been trying to run some of the network manipulation workflow (https://noaa-owp.github.io/hydrofabric/articles/03-processing-deep-dive.html) and ran into an issue with the subset_reference function.

After attaching the hydrofabric package to my R session, the subset_reference function is not available as an auto-complete and when I call it directly I get the error:

Error in subset_reference() : could not find function "subset_reference"

Digging into the package namespace, I see that the subset_reference function is not exposed.

# Generated by roxygen2: do not edit by hand

S3method(print,hydrofabric_conflicts)
export(hydrofabric_conflicts)
export(hydrofabric_packages)
export(subset_network)
import(climateR)
import(dplyr, except = c(intersect, union))
import(glue, except = trim)
import(hydrofab)
import(ngen.hydrofab)
import(nhdplusTools)
import(sf)
import(terra)
import(zonal)
importFrom(DBI,dbConnect)
importFrom(DBI,dbDisconnect)
importFrom(RSQLite,SQLite)
importFrom(arrow,open_dataset)
importFrom(magrittr,"%>%")
importFrom(purrr,keep)
importFrom(purrr,map)

My workaround has been to copy the function at line 13 of subset_network.R (https://github.com/NOAA-OWP/hydrofabric/blob/main/R/subset_network.R) and paste into my R script. This seems to be working well for now. Can you add subset_reference as an export to this package, or is there something I'm doing wrong here?

Thanks!

How do I integrate it with others?

I have two issues.
Issue 1: I tentatively tried to build without a Docker container. The number of dependent R packages kept growing and I gave up.
Issue 2: I built with the Dockerfile. The build was successful, but I do not see an entry point (normally CMD).
How do I run it?
How do I integrate it with other components, such as ngen?
What knowledge am I lacking?
I appreciate any help.

Some catchments/divides have incorrectly formatted divide_id

The divides layer of the current nextgen_02.gpkg file (MD5: a4cd50cd666f4bb177e7671f253a3393) has a row with divide_id of cat-7e+05 (index 33534). No other rows in this file appear to use this irregular format, though I haven't checked all the other hydrofabric files.
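
A small sketch for scanning other files for the same problem, assuming the expected form is cat- followed by digits only (an assumption based on IDs such as cat-113060 elsewhere in these issues). The value cat-7e+05 looks like a number rendered in scientific notation, though that is a guess. The function name and the third sample ID are hypothetical.

```python
import re

# Assumed regular form: "cat-" followed by one or more digits.
ID_PATTERN = re.compile(r"cat-\d+")


def irregular_ids(divide_ids):
    """Return (index, value) pairs for IDs that do not match the pattern."""
    return [
        (i, value)
        for i, value in enumerate(divide_ids)
        if not ID_PATTERN.fullmatch(value)
    ]


ids = ["cat-113060", "cat-7e+05", "cat-700001"]
print(irregular_ids(ids))
```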

Some waterbodies are missing WBOut node

In the conus.gpkg, there are waterbodies that have one or more WBIn nodes, but no WBOut node.

Here is a list of waterbodies (lake_ids, or 'hl_link' in the lakes attribute table) that have a WBIn node, but no WBOut node:

1320604
4415746
7924833
18714340
20152469
20318000
23016866
23704173
24052303
120049589
120051936
120052233
120054084
166759841
167484062
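
For reference, a sketch of the kind of check that produces such a list. It assumes the hydrolocations can be reduced to (hl_link, hl_reference) pairs in which inlet and outlet nodes carry the values "WBIn" and "WBOut"; that layout, the function name, and the second waterbody ID are assumptions for illustration.

```python
from collections import defaultdict


def missing_outlets(hydrolocations):
    """Return hl_link values that have a WBIn node but no WBOut node."""
    kinds = defaultdict(set)
    for hl_link, hl_reference in hydrolocations:
        kinds[hl_link].add(hl_reference)
    return sorted(
        link
        for link, refs in kinds.items()
        if "WBIn" in refs and "WBOut" not in refs
    )


rows = [
    ("1320604", "WBIn"),
    ("1320604", "WBIn"),
    ("999", "WBIn"),   # hypothetical waterbody with both node types
    ("999", "WBOut"),
]
print(missing_outlets(rows))
```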

Unable to call subset_reference() function

Scenario:

Yesterday, I created a Docker image based on rocker/geospatial for using the hydrofabric package.

Then I installed the package in a container from this image with:

$ installGithub.r NOAA-OWP/hydrofabric

When I tried to run the script:

library(hydrofabric)

nldi_feature = list(featureSource = "nwis", featureID = "05102490")

subset_reference( nldi_feature = nldi_feature,  gpkg = "...", export_gpkg = "...")

I got Error in subset_reference(...) : could not find function "subset_reference" .

It seems that the function subset_reference is not reachable from outside the package, as the output of lsf.str("package:hydrofabric") only lists the function subset_network:

hydrofabric_conflicts : function ()  
hydrofabric_packages : function (include_self = TRUE)  
subset_network : function (id = NULL, comid = NULL, hl_id = NULL, network = "data/conus_net.parquet", 
    pattern = "/Volumes/Transcend/ngen/CONUS-hydrofabric/05_nextgen/nextgen_{vpu}.gpkg", 
    lyrs = c("divides", "nexus", "flowpaths", "network", "hydrolocations"), 
    export_gpkg = NULL) 

I was able to run my script by copy+pasting the whole subset_reference function from the source into my script prior to the function call.

Question:

Is it expected?
Maybe the package is missing some "export" for the subset_reference function?
Maybe I am doing something wrong, as I am learning the R language itself while learning to use the package?

Misindexed POI (Gages-06719505)

Somewhere in the hydrolocation creation, NWIS gage 06719505 has been assigned to 2 flowpaths in different VPUs:

ATTN: @joshsturtevant

> filter(l, hl_reference == "Gages") %>% 
+   filter(hl_link == '06719505')
Simple feature collection with 2 features and 8 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -105.2193 ymin: 39.75805 xmax: -103.2906 ymax: 41.99998
Geodetic CRS:  WGS 84
# A tibble: 2 × 9
  hl_link  hl_reference    hf_id       ID     X     Y VPUID hl_id                 geom
* <chr>    <chr>           <dbl>    <dbl> <dbl> <dbl> <chr> <int>          <POINT [°]>
1 06719505 Gages         2885178 10024633 -105.  39.8 10L   46845 (-105.2193 39.75805)
2 06719505 Gages        16057156 10169551 -105.  39.8 10U   56858 (-103.2906 41.99998)
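
To find other POIs with the same problem, something like the sketch below could be run over the full hydrolocation table. It assumes the table can be reduced to (hl_reference, hl_link, VPUID) tuples; the function name is hypothetical, and the VPU value paired with gage 01017290 is a made-up example row.

```python
from collections import defaultdict


def links_in_multiple_vpus(rows):
    """Return {(hl_reference, hl_link): [VPUIDs]} for links in >1 VPU."""
    vpus = defaultdict(set)
    for hl_reference, hl_link, vpuid in rows:
        vpus[(hl_reference, hl_link)].add(vpuid)
    return {key: sorted(v) for key, v in vpus.items() if len(v) > 1}


rows = [
    ("Gages", "06719505", "10L"),
    ("Gages", "06719505", "10U"),
    ("Gages", "01017290", "01"),  # hypothetical single-VPU row
]
print(links_in_multiple_vpus(rows))
```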

lake_id 1710676 missing in lake table

Lake ID 1710676 is missing in the "lake" table of the conus.gpkg. It is listed as the waterbody_id for several segments in the "flowpath_attributes" table, and exists in the NWMv2.1 LAKEPARM file, but it looks like it was incorrectly given the ID 1711354 in the "lake" table based on the WBIn and WBOut nexus points. This lake (1711354), however, is missing all lake parameters in the "lake" table.

Waterbody ID 5569731 has two outlets

In the latest hydrofabric, v20.1, waterbody ID 5569731 has two outlets, segment 2409607 and 2409629. This confuses t-route as t-route relies on connections always having only 1 downstream segment. vpu: 12

image

Catchment Weights

Catchment weights (the indices within the lat/lon divides of a particular catchment relative to some grid) are needed for extracting forcing data. The method of calculating these weights can vary and yield different weights depending on the methods used (see issue 28 in hfsubset). Also, these weights may change with each hydrofabric release. To help ensure catchment weights are consistent for a particular hydrofabric and grid, I suggest generating CONUS weight files for common grids (projections) that can be subsetted via a tool like hfsubset.

For example, NWM v3 uses a lambert conformal conic projection.

https://noaa-nwm-pds.s3.amazonaws.com/nwm.20240112/forcing_medium_range/nwm.t00z.medium_range.forcing.f001.conus.nc

>>> nwm_data.crs.esri_pe_string
'PROJCS["Lambert_Conformal_Conic",GEOGCS["GCS_Sphere",DATUM["D_Sphere",SPHEROID["Sphere",6370000.0,0.0]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]],PROJECTION["Lambert_Conformal_Conic_2SP"],PARAMETER["false_easting",0.0],PARAMETER["false_northing",0.0],PARAMETER["central_meridian",-97.0],PARAMETER["standard_parallel_1",30.0],PARAMETER["standard_parallel_2",60.0],PARAMETER["latitude_of_origin",40.0],UNIT["Meter",1.0]];-35691800 -29075200 10000;-100000 10000;-100000 10000;0.001;0.001;0.001;IsHighPrecision'

The projection type and hydrofabric can be stored as fields in the weights.json. This way forcings engines could check that the weight projection is appropriate for the nwm file being processed. Information about the grid should be stored in the json as well (dx, dy, nx, ny, x0, y0).
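
A sketch of what such a header and check might look like. The key names ("hydrofabric", "crs", "grid"), the function names, and all numeric grid values are placeholders rather than an agreed format. Comparing projection strings for exact equality is a simplification; a real check would compare parsed CRS definitions.

```python
import json

GRID_KEYS = ("dx", "dy", "nx", "ny", "x0", "y0")


def build_weights_header(hydrofabric_version, crs_string, grid):
    """Assemble the metadata block that would sit at the top of weights.json."""
    missing = [k for k in GRID_KEYS if k not in grid]
    if missing:
        raise ValueError(f"grid is missing: {missing}")
    return {
        "hydrofabric": hydrofabric_version,
        "crs": crs_string,
        "grid": {k: grid[k] for k in GRID_KEYS},
    }


def header_matches(header, crs_string, grid):
    """True when the forcing file's projection and grid equal the header's."""
    same_grid = all(header["grid"][k] == grid.get(k) for k in GRID_KEYS)
    return header["crs"] == crs_string and same_grid


# Placeholder values for illustration only.
example_grid = {"dx": 1000.0, "dy": 1000.0, "nx": 10, "ny": 8,
                "x0": -5000.0, "y0": -4000.0}
header = build_weights_header("v20.1", "Lambert_Conformal_Conic", example_grid)
print(json.dumps(header, indent=2))
print(header_matches(header, "Lambert_Conformal_Conic", example_grid))
```

A forcings engine could call header_matches before using the weights and refuse to proceed on a mismatch.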

This will place more responsibility on the hydrofabric, but might be worth it in the long run to avoid weight generation algorithms yielding different weights for the same hydrofabric and grid.

Waterbody outlet is disconnected from waterbody

Looking at file nextgen_03S.gpkg, waterbody (hl_link) 16769792 has two WBIn nodes and one WBOut node. However, the WBOut node is not connected to either of the WBIn nodes (see image below).
disconnected_waterbody

Obtaining waterbody connections from the NWMv2.1 route_link file and comparing them to the wb-IDs in the 'network' table yields the following crosswalk table:

NHDSegmentID lake_id link hydroseq
16770354 16769792 wb-1578901 953.0
16770384 16769792 wb-1578901 953.0
16770780 16769792 wb-1578904 949.0
16770782 16769792 wb-1578903 950.0
16770784 16769792 wb-1578910 954.0
16770786 16769792 wb-1578903 950.0
16770788 16769792 wb-1578901 953.0
16770790 16769792 wb-1578907 956.0
16770792 16769792 wb-1578907 956.0
16770794 16769792 wb-1578901 953.0
16770796 16769792 wb-1578902 957.0
16771128 16769792 wb-1578901 953.0
16771132 16769792 wb-1578901 953.0
16771136 16769792 wb-1578905 952.0
16771138 16769792 wb-1578905 952.0
16771140 16769792 wb-1578905 952.0
16771142 16769792 wb-1578901 953.0
16771144 16769792 wb-1578901 953.0
16771146 16769792 wb-1578903 950.0
16771148 16769792 wb-1578903 950.0
16771176 16769792 wb-1578901 953.0
16771178 16769792 wb-1578901 953.0
16771184 16769792 wb-1578901 953.0
16771186 16769792 wb-1578911 951.0
16771190 16769792 wb-1578904 949.0
16771192 16769792 wb-1578904 949.0
16771194 16769792 wb-1578901 953.0
16771196 16769792 wb-1578901 953.0
16771198 16769792 wb-1578907 956.0
16771200 16769792 wb-1578901 953.0

The WBOut node is nex-1578896, with segments wb-1578895 and wb-1578897 feeding into it. It appears these segments are not part of the NHD waterbody connection from NWMv2.1.

Erroneous flowpath

I ran into a flowpath that seems to be in error. It's one of the MultiLineString features, but this one has a segment in New Jersey, and another segment on Long Island. The feature's fid is 18671 (id = 'wb-36545').

Generally, I'm wondering about how these flowpaths get to be multilines in the first place. There are 289 multilinestring flowpaths, and I'm wondering how many of them are wacky like this one. I'd be interested in the reason why there are disjoint flowpaths in the first place...

This was encountered in service of finding the endpoints of a stream reach, and the multi-component geometries are throwing a spanner in the works. I now believe that the flowpath geometry is unreliable, and I should instead be looking for some clever strategy to join with the nexus table to get the endpoints I'm after. Any insight into this query would be useful. Thanks!
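
One way to count how many of the multi-part flowpaths are genuinely disjoint is sketched below, assuming each geometry has been unpacked into a list of parts, each part being a list of (x, y) tuples. Parts are treated as connected only when their endpoints coincide within a tolerance, so a part that joins another mid-line would still be counted as separate. The function names and coordinates are hypothetical.

```python
import math


def endpoints(part):
    """First and last vertex of one line part."""
    return part[0], part[-1]


def count_components(parts, tol=1e-9):
    """Number of groups of parts linked by (nearly) coincident endpoints."""
    parent = list(range(len(parts)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(parts)):
        for j in range(i + 1, len(parts)):
            touching = any(
                math.dist(p, q) <= tol
                for p in endpoints(parts[i])
                for q in endpoints(parts[j])
            )
            if touching:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(parts))})


connected = [[(0.0, 0.0), (1.0, 0.0)], [(1.0, 0.0), (2.0, 1.0)]]
disjoint = [[(0.0, 0.0), (1.0, 0.0)], [(5.0, 5.0), (6.0, 5.0)]]
print(count_components(connected), count_components(disjoint))
```

A flowpath with more than one component is a candidate for the kind of split geometry described above.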

hydroATLAS use

Here is an example of using HydroATLAS data with the NextGen fabric from www.lynker-spatial.com

ATTN: @rappjer1 and @jmframe

library(dplyr); library(sf); library(arrow)

# Point to hydrofabric file (subset used here)
p = 'poudre.gpkg'

# Define HydroATLAS variables you want (can skip this and remove `select("hf_id", any_of(vars))` to get all)
vars = 'pet_mm_s01'

## READ NETWORK

net = read_sf(p,  'network') %>% 
  select(divide_id, hf_id) %>% 
  filter(complete.cases(.)) %>% 
  group_by(divide_id) %>% 
  slice(1)

## EXTRACT HydroATLAS vars

ha = open_dataset('s3://lynker-spatial/hydroATLAS/hydroatlas_vars.parquet') %>% 
  filter(hf_id %in% net$hf_id) %>% 
  select("hf_id", any_of(vars)) %>% 
  collect()

## JOIN features and variables

divide = read_sf(p,  'divides') %>% 
  left_join(net, by = "divide_id") %>% 
  left_join(ha, by = "hf_id") 

## PLOT
plot(divide[c(vars)])

Created on 2023-10-16 by the reprex package (v2.0.1)
