
climatexml's Introduction

⛵️ About me

  • 🤓 I’m Nic (@nannau) and I'm a physical scientist who is interested in data science, high performance computing, and software development.
  • 👀 Interested in computer vision to solve problems in climate science, atmospheric physics, climate services, and statistical downscaling.

climatexml's People

Contributors

kdaust, nannau

Forkers

kdaust

climatexml's Issues

Implement Fully Stochastic Generator

  • Option for Generator with noise injection
  • Stochastic training option (create batch of realisations for each field)
  • Option for switching MAE for CRPS metric in content loss
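For reference on the last point, CRPS for a batch of realisations can be estimated directly from the ensemble members. The sketch below is plain Python for a single grid point and is only meant to show the estimator; the function name `ensemble_crps` is hypothetical, and a real content loss would need a batched tensor version. With a single member the estimate reduces to the absolute error, which is why it is a natural swap for MAE.

```python
from typing import Sequence


def ensemble_crps(members: Sequence[float], observation: float) -> float:
    """Empirical CRPS estimate: E|X - y| - 0.5 * E|X - X'|."""
    m = len(members)
    if m == 0:
        raise ValueError("need at least one ensemble member")
    # Mean absolute error of each member against the observation.
    accuracy = sum(abs(x - observation) for x in members) / m
    # Mean absolute difference over all member pairs (including i == j).
    spread = sum(abs(a - b) for a in members for b in members) / (m * m)
    return accuracy - 0.5 * spread
```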

Choose one msssim package

Two MS-SSIM implementations are currently in use: one in losses.py and one from torchmetrics. Which one should we use? Window size was an issue for smaller fields.
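The window size issue probably comes from the fact that MS-SSIM halves the image between scales, so the coarsest scale still has to fit the window. The sketch below assumes 5 scales, an 11-pixel window, and the rule that the smaller image side must exceed (window_size - 1) * 2**(n_scales - 1). These are assumptions, the exact check differs between packages, and both helper names are hypothetical.

```python
def min_side_for_msssim(window_size: int = 11, n_scales: int = 5) -> int:
    """Smallest image side that exceeds (window_size - 1) * 2**(n_scales - 1)."""
    if window_size < 1 or n_scales < 1:
        raise ValueError("window_size and n_scales must be positive")
    return (window_size - 1) * 2 ** (n_scales - 1) + 1


def largest_odd_window(side: int, n_scales: int = 5) -> int:
    """Largest odd window size that still satisfies the rule for a given side."""
    if side < 1 or n_scales < 1:
        raise ValueError("side and n_scales must be positive")
    factor = 2 ** (n_scales - 1)
    window = (side - 1) // factor + 1
    # Gaussian windows are normally odd, so round down to the nearest odd size.
    return window if window % 2 == 1 else window - 1
```

Under these assumptions a 64-pixel field would need a window of 3 or fewer scales, which may be one way to decide between the two implementations.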

ClimatEx domains longitudes and latitudes

We need longitudes and latitudes (lon/lat) for the created ClimatEx domains, in addition to rlon and rlat. Most ClimatEx pre-processed domains do not seem to have corresponding lon/lat values. For each grid of the ClimatEx fields created by nc2pt, an additional file containing all lon, lat, rlon, and rlat values would be really useful for analysis and plotting, similar to the hr and lr topography fields we already have in ClimatEx. This would let us compare the models more easily to other fields, such as observations. It may be something we need to add to nc2pt, or it could simply be an additional file holding the lon/lats of hr_ref.nc.

ERA5 to WRF Inference Module

Build an inference portion of the code base to download ERA5 data, preprocess it, and perform downscaling for a time slice. A script would be nice, or we could build an Emulator object that performs this operation.
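One possible shape for the Emulator object is sketched below. It only shows the chaining of the three steps; the field names, the `run` method, and the string time bounds are all hypothetical, and the actual ERA5 download, preprocessing, and generator call would be passed in as callables.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Emulator:
    """Chains download -> preprocess -> downscale for one time slice."""

    download: Callable[[str, str], Any]  # hypothetical: fetch ERA5 for [start, end]
    preprocess: Callable[[Any], Any]     # hypothetical: regrid/normalise to model input
    downscale: Callable[[Any], Any]      # hypothetical: trained generator forward pass

    def run(self, start: str, end: str) -> Any:
        raw = self.download(start, end)
        low_res = self.preprocess(raw)
        return self.downscale(low_res)
```

Keeping the steps injectable means the same object could back a command-line script and be exercised in tests with stub functions.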

Define sprint objectives and discuss workflow

The first thing to do is describe what our objectives are for this first sprint.

We can discuss them and document them here for future reference.

Then we should discuss the workflow using GitHub and how we should implement changes.

  • Create new branch
  • Implement some changes
  • Open pull request
  • Tag issues in PR
  • Obtain code review by me and Kiri
  • Merge

More Generalized Data Handling Capabilities

Each of us in the group will have different data needs that the GAN needs to handle and load. Right now it's specific for super resolution.

One way of helping us handle different data types is to factor out the data config and instantiate "translator" objects that determine how PyTorch Lightning/PyTorch Dataloaders load the data. This is related to #2.

  • This is also closely related to stochastic methods that we want to eventually add.
  • We want the ability to customize the dataloaders for each person's problem.
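As a rough illustration of the "translator" idea, the sketch below treats a translator as any callable that turns a stored record into an (input, target) pair, and wraps it in a map-style dataset (one that implements `__len__` and `__getitem__`). The names `Translator`, `super_resolution_translator`, `TranslatedDataset`, and the `"lr"`/`"hr"` keys are hypothetical.

```python
from typing import Any, Callable, Sequence, Tuple

# A translator maps one stored record to an (input, target) pair.
Translator = Callable[[Any], Tuple[Any, Any]]


def super_resolution_translator(record: dict) -> Tuple[Any, Any]:
    """Maps a record to (low-resolution input, high-resolution target)."""
    return record["lr"], record["hr"]


class TranslatedDataset:
    """Map-style dataset that delegates sample construction to a translator."""

    def __init__(self, records: Sequence[Any], translator: Translator):
        self.records = records
        self.translator = translator

    def __len__(self) -> int:
        return len(self.records)

    def __getitem__(self, index: int) -> Tuple[Any, Any]:
        return self.translator(self.records[index])
```

A different problem would then only need a different translator, which could be selected in the data config rather than in code.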

Improve Setup Documentation and README

After working through the install with @sbeale007, it's painfully clear that the currently empty README needs installation instructions!

I'll take lead on this, but @kdaust and @sbeale007 your input would be very helpful because you're working with slightly different systems and might encounter things I don't.

Restructure Config with Hydra and Instantiate

Currently, the file wgan-gp is a bit crazy, specifically SuperResolutionWGANGP, our PyTorch Lightning class. Throughout the code there is a complex nested hierarchy that comes from the hydra config dictionary, which is not ideal. I propose that we use a hierarchical structure based on dataclasses and inheritance to separate out some of this configuration information, making the code a bit cleaner and more explicit about how to access data related to training. Ultimately, my hope is that it will lead to more configurable code for scientific experimentation, where we track changes to configurations rather than changes to the underlying code.

One way of helping reduce the number of lines of code, and of factoring out the config in a more logical way, is to use hydra's instantiate feature, where you can instantiate Python objects based on a _target_: ClimatExML.object.class style header in the yaml file. https://hydra.cc/docs/advanced/instantiate_objects/overview/

Basically what I'm imagining is to factor out some of the initialization code in wgan-gp.py to inherit logical groups of parameters like:

from dataclasses import dataclass
from typing import Optional

@dataclass
class HyperParams:
    batch_size: Optional[int] = None
    beta: float = 0.1
    alpha: float = 10.0
    gamma: float = 5.0

which is inherited by

class SuperResolutionWGANGP(pl.LightningModule, HyperParams):
    ...
    instantiate(hyperparams)

Or something. I'm not sure if that syntax will work or where exactly to instantiate the object, but something along these lines should work.
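To make the idea concrete without depending on hydra, here is a standard-library sketch. `build_from_config` is a hypothetical stand-in that mimics what hydra's instantiate does with a `_target_` key (import the dotted path, call it with the remaining keys); it is not hydra's implementation. The sketch also uses composition (the module holds a `HyperParams` object) rather than inheritance, which is an assumption about what may be simpler, and `SuperResolutionModule` is a placeholder rather than the real Lightning class.

```python
from dataclasses import dataclass
from importlib import import_module
from typing import Any, Optional


@dataclass
class HyperParams:
    batch_size: Optional[int] = None
    beta: float = 0.1
    alpha: float = 10.0
    gamma: float = 5.0


def build_from_config(cfg: dict) -> Any:
    """Import cfg["_target_"] and call it with the remaining keys as kwargs."""
    module_path, _, attr = cfg["_target_"].rpartition(".")
    if not module_path:
        raise ValueError("_target_ must be a dotted path such as 'pkg.mod.Class'")
    target = getattr(import_module(module_path), attr)
    kwargs = {key: value for key, value in cfg.items() if key != "_target_"}
    return target(**kwargs)


class SuperResolutionModule:
    """Placeholder for the Lightning module; holds its config as one object."""

    def __init__(self, hp: HyperParams):
        self.hp = hp
```

With this layout, training code would read `self.hp.alpha` instead of indexing into a nested config dictionary.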

Implement an HR topography input in ClimatExML

@kdaust has made a ton of progress using HR topography in the model architecture. It would be really nice to implement this in ClimatExML. We should use this issue to track/discuss design ideas for how exactly to implement it.

At a high level, I'm thinking we can specify some HR topography file that has been processed as a .pt file like what you would get from ClimatExPrep. We can then specify the location of that file in the config.yaml file and load somewhere in the pipeline.

It would make sense to load this file in the loader.py portion of the pipeline, although we should be careful that we aren't loading the same file over and over again. At first glance, this might be the right function to add it to:

def setup(self, stage: str):
    ...
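A minimal sketch of loading the topography only once is below. It assumes `setup` can be called more than once (for example, once per stage), so the file is loaded lazily and cached on the object. `DataModuleSketch`, `hr_topography_path`, and `load_fn` are hypothetical names; in the real pipeline the loader would presumably be `torch.load` and the path would come from config.yaml.

```python
from typing import Any, Callable, Optional


class DataModuleSketch:
    """Placeholder for the data module in loader.py; only the caching is shown."""

    def __init__(self, hr_topography_path: str, load_fn: Callable[[str], Any]):
        self.hr_topography_path = hr_topography_path  # would come from config.yaml
        self.load_fn = load_fn  # e.g. torch.load in the real pipeline
        self.hr_topography: Optional[Any] = None

    def setup(self, stage: str) -> None:
        # Load lazily so repeated setup calls do not re-read the file.
        if self.hr_topography is None:
            self.hr_topography = self.load_fn(self.hr_topography_path)
```

The datasets created in `setup` could then share the single `self.hr_topography` object instead of each reading the file.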

Trigger validation steps before training begins

Currently MLFlow logs all runs, even the failed runs. This is useful if we are part way through training and something fails, but not useful for all of the bugs/user errors that accompany doing science research.

I therefore recommend that we do some checks before starting training to try and catch as many bugs as possible, possibly using pydantic validators to automatically perform checks on the inputs: https://docs.pydantic.dev/latest/concepts/validators/
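As a rough picture of what such pre-training checks could look like, the sketch below uses a standard-library dataclass with `__post_init__` as a stand-in; with pydantic, the same checks would live in validators on a model class. The class name `TrainingInputs`, its fields, and the specific rules are hypothetical examples, not the project's actual config.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class TrainingInputs:
    batch_size: int
    learning_rate: float
    data_dir: str

    def __post_init__(self) -> None:
        # Collect every problem so the user sees them all before any run is logged.
        errors = []
        if self.batch_size <= 0:
            errors.append("batch_size must be positive")
        if not 0 < self.learning_rate < 1:
            errors.append("learning_rate must be between 0 and 1")
        if not Path(self.data_dir).is_dir():
            errors.append(f"data_dir does not exist: {self.data_dir}")
        if errors:
            raise ValueError("; ".join(errors))
```

Constructing this object before the MLflow run is opened would make a bad config fail fast, without leaving a failed run in the tracking server.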

Documentation Sprint!

Write your suggestions or notes for things you want to cover during our next sprint. I'll start:

  • Work out kinks in docs
  • Complete docs for each existing empty category
