Comments (5)
See https://daops.readthedocs.io/en/latest/readme.html for ideas?
from xmip.
@jbusecke I think your file/dataset/member/experiment approach makes sense for most use cases. I tend to think of CMIP6 as a multi-dimensional hypercube of facets, e.g. MIP, Institute, Model, Experiment, EnsembleMember etc.
I think that there are some slices through the hypercube that will not be covered by your approach, for example an extension spanning multiple experiments that can be logically aggregated. Because of that, we wanted to generalise our approach in `daops` so that everything was reduced to a simple dataset ID string. This has its pros and cons. The main pro is that the software does not need to know about any complex hierarchies and relationships. The main con is that you need one "fix" record per ESGF Dataset, which means you need a big data store to hold them. I'm still not sure what the best approach is.
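To make the trade-off concrete, here is a minimal sketch of the "everything is a dataset ID string" idea. It is not the daops API: `FACETS`, `FIX_STORE`, `parse_dataset_id`, `lookup_fixes`, and the example ID (including the institute and model names) are all hypothetical, and the facet order is an assumption based on the usual dot-separated CMIP6 dataset ID layout.

```python
from typing import Dict, List

# Assumed facet order for a dot-separated CMIP6 dataset ID (illustrative only).
FACETS = [
    "mip_era", "activity_id", "institution_id", "source_id", "experiment_id",
    "member_id", "table_id", "variable_id", "grid_label", "version",
]

# Made-up dataset ID; the institute and model names are placeholders.
EXAMPLE_ID = (
    "CMIP6.CMIP.EXAMPLE-INST.EXAMPLE-MODEL.historical."
    "r1i1p1f1.Omon.tos.gn.v20200101"
)

# Hypothetical fix store: one list of fix records per dataset ID.
# The software never needs to know how datasets relate to each other,
# but the store grows with the number of datasets.
FIX_STORE: Dict[str, List[dict]] = {
    EXAMPLE_ID: [{"fix": "set_attr", "name": "units", "value": "degC"}],
}


def parse_dataset_id(dataset_id: str) -> Dict[str, str]:
    """Split a dataset ID into facets (only needed for human inspection)."""
    parts = dataset_id.split(".")
    if len(parts) != len(FACETS):
        raise ValueError(f"expected {len(FACETS)} facets, got {len(parts)}")
    return dict(zip(FACETS, parts))


def lookup_fixes(dataset_id: str) -> List[dict]:
    """Return the fix records for one dataset ID, or an empty list."""
    return FIX_STORE.get(dataset_id, [])
```

The lookup itself is a plain key access, which is the "pro"; the "con" is visible in `FIX_STORE`, which needs one entry for every dataset that requires a fix.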
(Part of me has been thinking a lot about STAC/Intake/Xarray/others in terms of aggregations and hierarchies...in some cases, there is value in recording information at the granular (e.g. file) level and in others it is much more useful/efficient to look at higher order aggregations - unfortunately, we seem to need both at the same time...and the ultimate software will transition between them seamlessly).
Thanks for the input @agstephens, indeed I myself have been switching back and forth between processing a full experiment and single members. I think ultimately I favor the more fine-grained approach here, purely for computational reasons related to xarray/dask (they regularly blow up in my processing pipeline, even with just a single member).
I think as a short-term goal I will try to migrate some of my more fine-grained fixes to the `daops` framework to get acquainted with it, and then we should see how to migrate the other features. That will take some time since my current approach is rather 'check and process if necessary' for every dataset, whereas with a `daops`-based approach this would be separated into a `scan` and `parse` step or manual entry that is carried out before the datasets are actually modified, if I understand correctly.
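A rough sketch of the difference between the two workflows, under stated assumptions: datasets are stood in for by plain attribute dictionaries, and `check_and_process`, `scan`, `apply_recorded`, and the `strip_units` fix are hypothetical names for illustration, not functions from xmip or daops.

```python
from typing import Callable, Dict, List

Attrs = Dict[str, str]  # stand-in for a dataset's metadata attributes


def check_and_process(attrs: Attrs) -> Attrs:
    """One-pass style: inspect each dataset and fix it on the spot."""
    fixed = dict(attrs)
    units = fixed.get("units", "")
    if units != units.strip():
        fixed["units"] = units.strip()
    return fixed


def scan(datasets: Dict[str, Attrs]) -> Dict[str, List[str]]:
    """Two-phase style, step 1: record which fixes each dataset ID needs.

    Nothing is modified here; the result could equally be entered by hand.
    """
    records: Dict[str, List[str]] = {}
    for ds_id, attrs in datasets.items():
        units = attrs.get("units", "")
        if units != units.strip():
            records[ds_id] = ["strip_units"]
    return records


# Hypothetical registry mapping fix names to functions.
FIXES: Dict[str, Callable[[Attrs], Attrs]] = {
    "strip_units": lambda attrs: {**attrs, "units": attrs["units"].strip()},
}


def apply_recorded(ds_id: str, attrs: Attrs, records: Dict[str, List[str]]) -> Attrs:
    """Two-phase style, step 2: apply only the fixes recorded for this ID."""
    for name in records.get(ds_id, []):
        attrs = FIXES[name](attrs)
    return attrs


datasets = {"a": {"units": " degC "}, "b": {"units": "K"}}
records = scan(datasets)
fixed_a = apply_recorded("a", datasets["a"], records)
```

Both routes produce the same corrected attributes; the two-phase version just moves the decision about what to fix into a stored record that exists before any dataset is touched.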
Let me actually start with some easy metadata corrections. I'll start an issue over at daops for that.
Hey everyone, I did not make progress on the daops issue, but I had some need to refactor my postprocessing (parsing members, combining different types of datasets (multiple variables, grid_labels, etc.); will be published soon). While thinking about this, I tried to visualize a typical scientific workflow with CMIP6 data and where the various parts fit in.
I made a little diagram that I want to add to the docs.
I will also add an actual tutorial with the new postprocessing functions, but I was just wondering if I could solicit some feedback on the chart over here 😁
I have added a modified version of this graph and some text in the docs. I will close this issue for now.