
Comments (5)

dcherian commented on May 28, 2024

See https://daops.readthedocs.io/en/latest/readme.html for ideas?

from xmip.

agstephens commented on May 28, 2024

@jbusecke I think your file/dataset/member/experiment approach makes sense for most use cases. I tend to think of CMIP6 as a multi-dimensional hypercube of facets, e.g. MIP, Institute, Model, Experiment, EnsembleMember etc.
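
One way to picture the hypercube view is to treat each CMIP6 dataset ID as a point whose coordinates are its facets. The sketch below assumes the dot-separated ESGF dataset ID without the `|data_node` suffix and uses the CMIP6 DRS facet names. `parse_dataset_id` is a hypothetical helper, not part of xmip or daops, and the example ID is illustrative (its version string is made up).

```python
from __future__ import annotations

# CMIP6 DRS facets in the order they appear in a dot-separated dataset ID
# (assumes the ESGF form without the "|data_node" suffix).
CMIP6_FACETS = (
    "mip_era", "activity_id", "institution_id", "source_id", "experiment_id",
    "member_id", "table_id", "variable_id", "grid_label", "version",
)


def parse_dataset_id(dataset_id: str) -> dict[str, str]:
    """Map a dataset ID to its facet values, i.e. its coordinates in the hypercube."""
    parts = dataset_id.split(".")
    if len(parts) != len(CMIP6_FACETS):
        raise ValueError(
            f"expected {len(CMIP6_FACETS)} facets, got {len(parts)}: {dataset_id!r}"
        )
    return dict(zip(CMIP6_FACETS, parts))


# Illustrative ID; the version string is made up.
facets = parse_dataset_id("CMIP6.CMIP.NCAR.CESM2.historical.r1i1p1f1.Amon.tas.gn.v20190101")
print(facets["source_id"], facets["experiment_id"], facets["member_id"])  # CESM2 historical r1i1p1f1
```

Slicing the hypercube then amounts to filtering parsed IDs on facet values, for example every ID whose `experiment_id` is `historical`.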

I think there are some slices through the hypercube that your approach will not cover, for example an extension over multiple experiments that can be logically aggregated. Because of that, we wanted to generalise our approach in daops so that everything was reduced to a simple dataset ID string. This has its pros and cons. The main pro is that the software does not need to know about any complex hierarchies or relationships. The main con is that you need one "fix" record per ESGF dataset, which means you need a big data store to hold them. I'm still not sure what the best approach is.
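
A minimal sketch of the "one fix record per dataset ID" idea, assuming fixes are stored in a plain dictionary keyed by the full dataset ID string. `FIX_STORE`, its `attrs_to_set` record schema, and `apply_fixes` are hypothetical illustrations, not the daops API or its storage format.

```python
from __future__ import annotations

# Hypothetical fix store: one record per dataset ID. The ID and the
# "attrs_to_set" schema are illustrative assumptions, not daops's format.
FIX_STORE: dict[str, dict] = {
    "CMIP6.CMIP.NCAR.CESM2.historical.r1i1p1f1.Amon.tas.gn.v20190101": {
        "attrs_to_set": {"units": "K"},
    },
}


def apply_fixes(dataset_id: str, attrs: dict, store: dict[str, dict] = FIX_STORE) -> dict:
    """Return a copy of `attrs` with the fixes recorded for `dataset_id` applied.

    The lookup is an exact string match, so no facet hierarchy is needed;
    the trade-off is that every affected dataset needs its own record.
    """
    fixed = dict(attrs)
    record = store.get(dataset_id)
    if record is not None:
        fixed.update(record.get("attrs_to_set", {}))
    return fixed


ds_id = "CMIP6.CMIP.NCAR.CESM2.historical.r1i1p1f1.Amon.tas.gn.v20190101"
print(apply_fixes(ds_id, {"units": "degK"}))  # {'units': 'K'}
```

Applying the same fix to a second member would need a second, otherwise identical entry, which is where the cost of a large record store comes from.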

(Part of me has been thinking a lot about STAC/Intake/Xarray/others in terms of aggregations and hierarchies. In some cases there is value in recording information at the granular (e.g. file) level; in others it is much more useful and efficient to look at higher-order aggregations. Unfortunately, we seem to need both at the same time, and the ultimate software will transition between them seamlessly.)
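
To make the granular-versus-aggregated point concrete, the sketch below keeps per-dataset IDs as the source of truth and derives a higher-order view by grouping IDs that share every facet except the ones being aggregated over. It assumes dot-separated CMIP6 dataset IDs; `group_ids` and its `over` parameter are hypothetical, and the IDs are illustrative.

```python
from __future__ import annotations

from collections import defaultdict

# CMIP6 DRS facet order in a dot-separated dataset ID (assumed).
FACETS = (
    "mip_era", "activity_id", "institution_id", "source_id", "experiment_id",
    "member_id", "table_id", "variable_id", "grid_label", "version",
)


def group_ids(dataset_ids: list[str], over: tuple[str, ...] = ("member_id", "version")) -> dict[tuple, list[str]]:
    """Group granular dataset IDs into aggregations.

    IDs that agree on every facet not listed in `over` share a group key,
    so the fine-grained IDs remain available inside each group.
    """
    groups: dict[tuple, list[str]] = defaultdict(list)
    for ds_id in dataset_ids:
        facets = dict(zip(FACETS, ds_id.split(".")))
        key = tuple(facets[name] for name in FACETS if name not in over)
        groups[key].append(ds_id)
    return dict(groups)


ids = [
    "CMIP6.CMIP.NCAR.CESM2.historical.r1i1p1f1.Amon.tas.gn.v20190101",
    "CMIP6.CMIP.NCAR.CESM2.historical.r2i1p1f1.Amon.tas.gn.v20190101",
    "CMIP6.ScenarioMIP.NCAR.CESM2.ssp585.r1i1p1f1.Amon.tas.gn.v20190101",
]
for key, members in group_ids(ids).items():
    print(key[4], len(members))  # key[4] is experiment_id once member_id and version are dropped
```

With the default `over`, this yields one ensemble group per experiment. Passing `over=("activity_id", "experiment_id", "version")` instead groups each member's historical run with its scenario run, a slice across experiments rather than across members.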

jbusecke commented on May 28, 2024

Thanks for the input, @agstephens. Indeed, I myself have been switching back and forth between processing a full experiment and single members. Ultimately I favor the more fine-grained approach here, purely for computational reasons related to xarray/dask (computations regularly blow up in my processing pipeline, even with just a single member).

As a short-term goal, I will try to migrate some of my more fine-grained fixes to the daops framework to get acquainted with it, and then we can see how to migrate the other features. That will take some time, since my current approach is essentially "check and process if necessary" for every dataset, whereas a daops-based approach would, if I understand correctly, separate this into a scan-and-parse step (or manual entry) that is carried out before the datasets are actually modified.
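
A minimal sketch contrasting the two workflows, assuming dataset metadata is held as plain attribute dictionaries keyed by dataset ID. `strip_source_id` is a hypothetical metadata check, and `check_and_process`, `scan`, and `apply_fix_store` are illustrative names rather than xmip or daops functions.

```python
from __future__ import annotations

from typing import Callable, Optional

# A check inspects a dataset's attributes and returns the changes it wants,
# or None if nothing needs fixing.
Check = Callable[[dict], Optional[dict]]


def strip_source_id(attrs: dict) -> Optional[dict]:
    """Hypothetical check: remove stray whitespace around source_id."""
    value = attrs.get("source_id")
    if isinstance(value, str) and value != value.strip():
        return {"source_id": value.strip()}
    return None


def check_and_process(datasets: dict[str, dict], checks: list[Check]) -> dict[str, dict]:
    """Inline workflow: inspect and fix each dataset in the same pass."""
    out = {}
    for ds_id, attrs in datasets.items():
        fixed = dict(attrs)
        for check in checks:
            fixed.update(check(fixed) or {})
        out[ds_id] = fixed
    return out


def scan(datasets: dict[str, dict], checks: list[Check]) -> dict[str, dict]:
    """Phase 1: record the fixes each dataset needs, without modifying anything.

    Unlike the inline version, every check here sees the original attributes.
    """
    store = {}
    for ds_id, attrs in datasets.items():
        record = {}
        for check in checks:
            record.update(check(attrs) or {})
        if record:
            store[ds_id] = record
    return store


def apply_fix_store(datasets: dict[str, dict], store: dict[str, dict]) -> dict[str, dict]:
    """Phase 2: apply stored records, looked up by dataset ID alone."""
    return {ds_id: {**attrs, **store.get(ds_id, {})} for ds_id, attrs in datasets.items()}


datasets = {"ds_a": {"source_id": "CESM2 "}, "ds_b": {"source_id": "CESM2"}}
store = scan(datasets, [strip_source_id])
print(store)  # {'ds_a': {'source_id': 'CESM2'}}
```

In this example both workflows produce the same attributes, but only the two-phase version leaves a store of records that can be reviewed, or written by hand, before anything is modified.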

Let me actually start with some easy metadata corrections. I'll open an issue over at daops for that.

jbusecke commented on May 28, 2024

Hey everyone, I did not make progress on the daops issue, but I needed to refactor my postprocessing (parsing members and combining different types of datasets, e.g. multiple variables, grid_labels, etc.; this will be published soon). While thinking about this, I tried to visualize a typical scientific workflow with CMIP6 data and where the various parts fit in.

I made a little diagram that I want to add to the docs.

[Figure: CMIP6 Workflow-2 diagram]

I will also add an actual tutorial with the new postprocessing functions, but I was just wondering if I could solicit some feedback on the chart over here 😁

jbusecke commented on May 28, 2024

I have added a modified version of this graph and some text in the docs. I will close this issue for now.
