Comments (10)
Works for me! I'll add it to our docs next week and get a new HF run made with this and AHPS gages included.
from hydrofabric.
Also found `cat-1e+06` in `nextgen_05.gpkg`.
@mikejohnson51 should these id formats be considered invalid?
No, I don't see any reason they should be invalid as long as we are using strings. I'd like to clean them up for consistency and readability, but a unique string is a unique string.
I'd like to make the case for limiting the set of characters used in identifiers, specifically to exclude punctuation that has meaning in shells or regular expressions. Yes, everything downstream could be coded to handle arbitrary characters in the strings, but I think it's less work for the ecosystem at large if the hydrofabric comes with a reasonably limited specification of which characters it will actually use.
Sure, 100% agree! That said, we haven't given identifiers the full care they need (e.g., should they actually be integers?). I think having an evolving, documented best practice is great, but saying things are invalid is potentially extreme given that the hydrofabric can (will?) serve many applications outside of NextGen. In the case of these, they were an odd byproduct of R concatenation; getting rid of them is easy. I don't want this tiny issue to turn into a major thing at this point, though. The larger concern would be: if this ID made NextGen fail, why did it?
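For illustration only: an ID like `cat-1e+06` can arise when a numeric ID held as a floating-point value is pasted into a string using a general-purpose number format. The sketch below is a Python analogue, not the R code that built the hydrofabric; `make_id_naive` and `make_id` are hypothetical helper names.

```python
def make_id_naive(prefix: str, number: float) -> str:
    # General ("g") formatting switches to scientific notation for large
    # values, which introduces "e" and "+" into the identifier.
    return f"{prefix}-{number:g}"


def make_id(prefix: str, number: float) -> str:
    # Refuse non-integral values, then write the digits out in full.
    if number != int(number):
        raise ValueError(f"ID number must be integral, got {number!r}")
    return f"{prefix}-{int(number)}"


naive_id = make_id_naive("cat", 1e6)
clean_id = make_id("cat", 1e6)
print(naive_id, clean_id)
```

Converting to an integer before building the string keeps the ID free of exponent notation regardless of how large the number is.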
This particular ID caused `ngen` to fail because it contained a `+` character, and the ID was concatenated into a string that was then used as a regular expression pattern to match filenames for forcing data.
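The failure mode described above can be reproduced in a few lines. This is a minimal Python sketch of the general problem, not the actual `ngen` code (which is C++); the forcing file name `cat-1e+06.csv` is a hypothetical example.

```python
import re

catchment_id = "cat-1e+06"   # ID reported in this thread
filename = "cat-1e+06.csv"   # hypothetical forcing file name

# Unescaped: "e+" means "one or more e", so the literal "+" never matches.
naive = re.compile(catchment_id + r"\.csv")

# Escaped: every character of the ID is matched literally.
safe = re.compile(re.escape(catchment_id) + r"\.csv")

naive_hit = naive.fullmatch(filename) is not None
safe_hit = safe.fullmatch(filename) is not None
print(naive_hit, safe_hit)
```

Escaping the ID at the point where the pattern is built fixes this one consumer, but every other tool that embeds IDs in patterns would need the same care, which is the argument for restricting the character set at the source.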
The overall NWM project is going to have a huge range of tools and languages consuming names from the hydrofabric. Those IDs are being used in a broad range of ways, including in file names, as database keys, and so forth. Given that, I'd expect that any character appearing in an ID that has special meaning to any bit of code it's passing through will lead to a bug.
Given that we do seem to want to encode kinda-arbitrary information beyond a number into IDs for catchments, nexuses, waterbodies, etc, allowing for strings in general seems reasonable enough.
My proposal for the character set to allow is `A-Za-z0-9_-` (Latin alphabet in upper and lower case, digits, underscore, and hyphen). None of those have special meanings in regex, shell syntax, shell globs, or SQL to my knowledge.
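A check against the proposed character set could look something like the following Python sketch. The function name `disallowed_chars` is hypothetical; the allowed set is taken directly from the proposal above.

```python
import string

# Proposed set: A-Z, a-z, 0-9, underscore, hyphen.
ALLOWED = set(string.ascii_letters + string.digits + "_-")


def disallowed_chars(identifier: str) -> set:
    """Return the characters in `identifier` that fall outside ALLOWED."""
    return set(identifier) - ALLOWED


print(disallowed_chars("cat-1e+06"))
print(disallowed_chars("cat-1000000"))
```

Returning the offending characters, rather than a plain yes/no, makes it easier to report what needs cleaning up in a given layer.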
This is unlikely to come up, but I realize it may be worth specifying that the first character is not `-` (and maybe not `_` either) to avoid command-line utilities misparsing filenames starting with an ID as flag arguments.
I'll see if someone who knows JavaScript/JSON can say whether there are other more-specific patterns that may cause it to parse things in ways it shouldn't.
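To make the flag-misparsing concern concrete, here is a small Python sketch using the standard `argparse` module. The tool name and the file names are hypothetical; the point is only that a leading `-` makes a file name look like an option.

```python
import argparse


def parse_path(argv):
    """Return the parsed positional path, or None if argparse rejects argv."""
    parser = argparse.ArgumentParser(prog="tool")  # hypothetical CLI
    parser.add_argument("path")
    try:
        return parser.parse_args(argv).path
    except SystemExit:
        # argparse reports the error on stderr and exits; treat as a failure.
        return None


ordinary = parse_path(["cat-1.csv"])          # parsed as a path
leading_dash = parse_path(["-cat-1.csv"])     # taken for an option, rejected
with_separator = parse_path(["--", "-cat-1.csv"])  # "--" ends option parsing
print(ordinary, leading_dash, with_separator)
```

Callers can work around this with `--` or a `./` prefix, but forbidding a leading `-` in IDs removes the need for every script to remember that.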
Other input from the framework team was that it would additionally be good if we could ensure that the first character of these IDs is a letter other than `x`, to further guard against mis-parsing as numbers if consumer code tries to be stupid.
- The initial `x` apparently has been seen interpreted like `0x`, meaning a number in hexadecimal.
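Putting the suggestions so far together, a validator might be sketched as below. This is one possible reading of the proposal, not an agreed specification: it assumes only lowercase `x` is excluded as a first character, and `ID_PATTERN` and `is_valid_id` are hypothetical names.

```python
import re

# First character: an ASCII letter other than lowercase "x".
# Remaining characters: letters, digits, underscore, or hyphen.
ID_PATTERN = re.compile(r"[A-Za-wyz][A-Za-z0-9_-]*")


def is_valid_id(identifier: str) -> bool:
    """Check an identifier against the sketched naming rule."""
    return ID_PATTERN.fullmatch(identifier) is not None


for candidate in ["cat-1000000", "cat-1e+06", "-cat-1", "x10", "1e6"]:
    print(candidate, is_valid_id(candidate))
```

Requiring a leading letter also rules out IDs such as `1e6` that a lenient parser could read as a number.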
In the latest run we get:
library(arrow)  # assumption: read_parquet() comes from the arrow package
net = read_parquet('/Volumes/MyBook/conus-hydrofabric/v22/conus_net.parquet')
sum(grepl("[+]", net$id))
#> [1] 0
sum(grepl("[+]", net$divide_id))
#> [1] 0
sum(grepl("[+]", net$toid))
#> [1] 0
Created on 2023-11-28 by the reprex package (v2.0.1)
Therefore I think this is solved! Will add the context to the Rmds before AGU.