Code Monkey home page Code Monkey logo

Comments (17)

jmframe avatar jmframe commented on September 25, 2024 1

My recommendation is to completely delete the examples "hourly_all_attributes_and_forcings" because as I understand it most of the attributes are not available in the hydrofabric anyway, and we have had mondo problems trying to recreate them. So we can't use that trained model anyway.

Does this solve the problem?

from lstm.

jmframe avatar jmframe commented on September 25, 2024 1

Nice, it is settled. By the way, tests by @EmilyDeardorff et al, at Summer Institute, 2021 showed that including the basin area as a static input did not help the LSTM performance, particularly in small basins, (experiments done specifically for 3-15km^2 basins). That information is already included in the model during pre- and post- processing, going to and from area normalized runoff [L T^-1] to volume flux [L^3 T^-1]. So I guess the best thing to do is avoid any area estimates as LSTM inputs.

from lstm.

SnowHydrology avatar SnowHydrology commented on September 25, 2024 1

@jmframe I reckon you could write a good, useful LSTM state of practice review paper (if it doesn't exist already).

from lstm.

madMatchstick avatar madMatchstick commented on September 25, 2024

From *_forcings.yml,

area_sqkm: 620.38
       ...
area_gages2: 573.60000

These are both in km^2. Is one flavor preferred over the other? See comments in previous/forked version's readme under #meta-data & #static-attributes.

Know that area_sqkm is seen here in bmi.initialize(),

        # ----------- The output is area normalized, this is needed to un-normalize it
        #                         mm->m                             km2 -> m2          hour->s    
        self.output_factor_cms =  (1/1000) * (self.cfg_bmi['area_sqkm'] * 1000*1000) * (1/3600)

@lcunha0118 or @mikejohnson51 - can you speak to these variables at all, and perhaps why are the values different? -thanks!

from lstm.

mikejohnson51 avatar mikejohnson51 commented on September 25, 2024

Hey @madMatchstick,

See section 3 here for the detailed answer to this. Bottom line is they come from different delineation methods.

Most critically:

We recommend not using catchments with large error discrepancies with GAGES-II, as they are most likely erroneous in the geospatial fabric (e.g., Bock et al., 2016). Note that, in general, catchment delineation is more challenging in flat areas, but here errors in flat areas are relatively well contained, except in Florida.

I'll add that most geographic variables were summarized over the geospatial fabric mentioned above, the hydrologic data was computed for gages at the outlet of the gagesII dataset. So they are saying that in cases were these diverge there may not be an accurate accounting between the landscape and the hydrology in the included data.

from lstm.

madMatchstick avatar madMatchstick commented on September 25, 2024

Thank you @mikejohnson51 for clarifying. And for the linked article; a much needed read on my end!

Per conversation, let's stick with area_sqkm for now.

from lstm.

SnowHydrology avatar SnowHydrology commented on September 25, 2024

@madMatchstick, @mikejohnson51, @jmframe Following up on this old issue, it seems like knowing the difference between the two delineated areas is important, but it may not be necessary to keep both in the forcing file. In other words, can we remove area_gages2?

from lstm.

mikejohnson51 avatar mikejohnson51 commented on September 25, 2024

It makes sense to me given how the characteristics are determined!

from lstm.

jmframe avatar jmframe commented on September 25, 2024

This isn't a forcing file, this is a model configuration file. These attributes come from Camels. The answer to this question depends on what attributes the model was trained on. These config files are examples. Are we talking about how to turn these examples into an operational version? There is no reason to keep anything in the config file that won't be used in the model. But removing a variable that is used by the model would be bad news. As for the question of using a basin area in general, the best approach is to train the LSTM with the best estimate of basin areas, and use the best estimates of basin area during run time. It doesn't matter what the static attribute is called. Camels just happens to have two estimates of basin area, and to know what any particular trained LSTM used, you should look at the original training config, which, for this example, is here: https://github.com/NOAA-OWP/lstm/blob/master/trained_neuralhydrology_models/hourly_all_attributes_and_forcings/config_original.yml. It shows that the camels attribute that was used in training is "area_gages2". Now, the basin area used at runtime won't come from the "Gauges 2" dataset, obviously. In an ideal world, we will train an LSTM on basin area that is calculated from the exact same method as the basin area calculation from the hydrofabric.

Does that make sense?

from lstm.

SnowHydrology avatar SnowHydrology commented on September 25, 2024

Thanks for the info, @jmframe!

To confirm:

  • The original training was done with area_gages2?
  • If we delete either of the variables from the config, it should be area_sqkm? Although that is the hydrofabric-derived variable, it was not used in the original training.
  • However, we should also update initialization code to use area_gages2 because it currently uses area_sqkm?

from lstm.

SnowHydrology avatar SnowHydrology commented on September 25, 2024

The nuclear option works for me. @madMatchstick, any objections? If not, I'll make the edits and submit a PR.

from lstm.

jmframe avatar jmframe commented on September 25, 2024

That might be a great thing to do under the CIROH deep learning project, specifically for the application of NextGen, if that ever gets off the ground!

from lstm.

madMatchstick avatar madMatchstick commented on September 25, 2024

The nuclear option works for me. @madMatchstick, any objections? If not, I'll make the edits and submit a PR.

No objections on my end. Thanks you two!

from lstm.

andywood avatar andywood commented on September 25, 2024

from lstm.

andywood avatar andywood commented on September 25, 2024

from lstm.

jmframe avatar jmframe commented on September 25, 2024

@andywood I am hoping that we can completely disentangle the area calculations, and their influence on 1) model training, and 2) model runtime. the calculations used to train the model should be representative of the "true area", and the calculations used to run the model should also be representative of the "true area". It is up to the LSTM trainer (us hopefully) to make sure they use a realistic basin area for training, and it is up to the NextGen Hydrofabric team to make sure that the areas used at runtime are realistic. We should definitely continue this conversation, and make sure as best we can that we are using a similar method to train the model as is done to run it in NextGen. We should coordinate that with @mikejohnson51. By removing the example LSTM with basin area as a static attribute, hopefully we'll just avoid confusion over variable names for now. When the CIROH project "officially" starts, our first task should be to train an operational ready LSTM model, as opposed to these example models. And we'll do all we can to make sure the areas are calculated correctly then. Sound good?

from lstm.

andywood avatar andywood commented on September 25, 2024

from lstm.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.