I have been thinking about a major restructuring of the package which would go along w

Better structure for different 'levels' of preprocessing (xmip issue, closed)

jbusecke commented on May 28, 2024

Better structure for different 'levels' of preprocessing.

Comments (5)

dcherian commented on May 28, 2024

See https://daops.readthedocs.io/en/latest/readme.html for ideas?

agstephens commented on May 28, 2024

@jbusecke I think your file/dataset/member/experiment approach makes sense for most use cases. I tend to think of CMIP6 as a multi-dimensional hypercube of facets, e.g. MIP, Institute, Model, Experiment, EnsembleMember etc.

I think that there are some slices through the hypercube that will not be covered by your approach, for example an extension over multiple experiments that can logically be aggregated. Because of that, we wanted to generalise our approach in daops so that everything was reduced to a simple dataset ID string. This has its pros and cons. The main pro is that it avoids the software needing to know about any complex hierarchies and relationships. The main con is that you need one "fix" record per ESGF dataset, which means you need a big data store to hold them. I'm still not sure what the best approach is.

(Part of me has been thinking a lot about STAC/Intake/Xarray/others in terms of aggregations and hierarchies...in some cases, there is value in recording information at the granular (e.g. file) level and in others it is much more useful/efficient to look at higher order aggregations - unfortunately, we seem to need both at the same time...and the ultimate software will transition between them seamlessly).
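
To make the trade-off concrete, here is a minimal Python sketch of the "everything is a dataset ID string" idea. It assumes the dotted CMIP6 ESGF dataset ID layout (`mip_era.activity_id.institution_id.source_id.experiment_id.member_id.table_id.variable_id.grid_label`, version suffix omitted). The fix store, its record format, and the institution and model names (`INST-X`, `MODEL-A`) are hypothetical and do not reflect the actual daops schema. The sketch shows both the pro (a lookup needs no knowledge of the hierarchy) and the con (an aggregation across experiments still resolves to several IDs, each needing its own record).

```python
# Facet order of a CMIP6 ESGF dataset_id, excluding the version suffix.
FACETS = (
    "mip_era", "activity_id", "institution_id", "source_id", "experiment_id",
    "member_id", "table_id", "variable_id", "grid_label",
)


def dataset_id(facets: dict) -> str:
    """Flatten one point of the facet hypercube into a dotted dataset ID."""
    return ".".join(facets[name] for name in FACETS)


def parse_dataset_id(ds_id: str) -> dict:
    """Split a dotted dataset ID back into named facets."""
    parts = ds_id.split(".")
    if len(parts) != len(FACETS):
        raise ValueError(f"expected {len(FACETS)} facets in {ds_id!r}")
    return dict(zip(FACETS, parts))


# Hypothetical fix store with one record per dataset ID. Names and fix
# contents are made up for illustration; this is not the daops format.
FIX_STORE = {
    "CMIP6.CMIP.INST-X.MODEL-A.historical.r1i1p1f1.Omon.tos.gn": [
        {"fix": "rename_dim", "old": "nlat", "new": "y"},
    ],
}


def fixes_for(facets: dict) -> list:
    """Look up fixes by the ID string alone; no hierarchy knowledge needed."""
    return FIX_STORE.get(dataset_id(facets), [])


def ids_across_experiments(base: dict, experiments: list) -> list:
    """A slice through the hypercube: one model/member/variable across several
    experiments. activity_id changes together with experiment_id because the
    two facets are coupled in the dataset ID."""
    return [
        dataset_id({**base, "activity_id": activity, "experiment_id": exp})
        for activity, exp in experiments
    ]
```

For example, `ids_across_experiments(base, [("CMIP", "historical"), ("ScenarioMIP", "ssp585")])` yields two separate IDs, so a historical+ssp585 aggregate needs a fix record for each of them.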

jbusecke commented on May 28, 2024

Thanks for the input @agstephens. Indeed, I have been switching back and forth between processing a full experiment and single members myself. Ultimately I favor the more fine-grained approach here, purely for computational reasons related to xarray/dask: these computations regularly blow up in my processing pipeline, even with just a single member.

As a short-term goal, I will try to migrate some of my more fine-grained fixes to the daops framework to get acquainted with it, and then we can see how to migrate the other features. That will take some time, since my current approach is rather 'check and process if necessary' for every dataset, whereas a daops-based approach would, if I understand correctly, separate this into a scan-and-parse step (or manual entry) that is carried out before the datasets are actually modified.

Let me actually start with some easy metadata corrections. I'll start an issue over at daops for that.
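
A minimal sketch of the two workflows contrasted above, with each dataset reduced to a plain dict of attributes standing in for xarray metadata. The units check (`"C"` to `"degC"`) and all function names are hypothetical and only illustrate the control flow; this is not how xmip or daops implement their fixes. In the scan-then-apply variant, the records could equally come from manual entry.

```python
from typing import Dict, Optional

# Stand-in for a dataset's metadata (think xarray.Dataset.attrs).
Attrs = Dict[str, str]


def detect_fix(attrs: Attrs) -> Optional[Attrs]:
    """Hypothetical check: return attribute overrides if metadata looks wrong."""
    if attrs.get("units") == "C":
        return {"units": "degC"}
    return None


def check_and_process(datasets: Dict[str, Attrs]) -> Dict[str, Attrs]:
    """Inline style: every dataset is checked and, if needed, fixed in one pass."""
    out = {}
    for ds_id, attrs in datasets.items():
        overrides = detect_fix(attrs) or {}
        out[ds_id] = {**attrs, **overrides}
    return out


def scan(datasets: Dict[str, Attrs]) -> Dict[str, Attrs]:
    """Scan step, run ahead of time: store overrides keyed by dataset ID."""
    records = {}
    for ds_id, attrs in datasets.items():
        overrides = detect_fix(attrs)
        if overrides:
            records[ds_id] = overrides
    return records


def apply_fixes(ds_id: str, attrs: Attrs, records: Dict[str, Attrs]) -> Attrs:
    """Apply step, run at load time: a pure lookup, no checks are repeated."""
    return {**attrs, **records.get(ds_id, {})}
```

Both variants produce the same fixed metadata; the difference is when the cost of checking is paid, and that the output of `scan` can be reviewed or hand-edited before any dataset is modified.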

jbusecke commented on May 28, 2024

Hey everyone, I did not make progress on the daops issue, but I needed to refactor my postprocessing (parsing members and combining different types of datasets, e.g. multiple variables, grid_labels, etc.; this will be published soon). While thinking about this, I tried to visualize a typical scientific workflow with CMIP6 data and where the various parts fit in.

I made a little diagram that I want to add to the docs.

I will also add an actual tutorial with the new postprocessing functions, but I was just wondering if I could solicit some feedback on the chart over here 😁
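
Both the granularity question (single members vs. a full experiment) and the combination step (several variables per member; likewise for grid labels) can be sketched as grouping dataset IDs on a subset of their facets. The `group_ids` helper and the example IDs are hypothetical; this is not the postprocessing API mentioned above, and it assumes the dotted CMIP6 dataset ID layout with the version suffix omitted.

```python
from collections import defaultdict

# Dotted CMIP6 dataset ID facets (version suffix omitted).
FACETS = (
    "mip_era", "activity_id", "institution_id", "source_id", "experiment_id",
    "member_id", "table_id", "variable_id", "grid_label",
)


def group_ids(ids, combine_over):
    """Group dataset IDs that differ only in the facets named in `combine_over`.

    Returns a dict mapping a key (the remaining facet values, in FACETS order)
    to the list of IDs that could be combined into one unit.
    """
    unknown = set(combine_over) - set(FACETS)
    if unknown:
        raise ValueError(f"unknown facets: {sorted(unknown)}")
    groups = defaultdict(list)
    for ds_id in ids:
        parts = ds_id.split(".")
        if len(parts) != len(FACETS):
            raise ValueError(f"expected {len(FACETS)} facets in {ds_id!r}")
        facets = dict(zip(FACETS, parts))
        key = tuple(facets[f] for f in FACETS if f not in combine_over)
        groups[key].append(ds_id)
    return dict(groups)


# Hypothetical IDs: two variables for r1i1p1f1, one variable for r2i1p1f1.
ids = [
    "CMIP6.CMIP.INST-X.MODEL-A.historical.r1i1p1f1.Omon.tos.gn",
    "CMIP6.CMIP.INST-X.MODEL-A.historical.r1i1p1f1.Omon.so.gn",
    "CMIP6.CMIP.INST-X.MODEL-A.historical.r2i1p1f1.Omon.tos.gn",
]

# Member level: variables merged, members kept apart (2 groups).
per_member = group_ids(ids, combine_over={"variable_id"})
# Experiment level: members merged as well (1 group).
per_experiment = group_ids(ids, combine_over={"variable_id", "member_id"})
```

The fine-grained member-level grouping keeps each unit small, while adding `member_id` to `combine_over` produces the coarser, experiment-level unit.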

jbusecke commented on May 28, 2024

I have added a modified version of this graph and some text in the docs. I will close this issue for now.
