Comments (14)
Very impressive @jbusecke. I say DO IT! It's easy to set up a cron job...
from xmip.
@jbusecke, you should be seeing MANY fewer experiment_ids as I move to the new naming structure. The csv file referred to in your .json file ONLY HAS THE ORIGINAL dataset URLs. They are all available in the csv file referred to in the *-testing.json. The old datasets are being copied with a new naming structure and then the old datasets are removed. I have left the 'CMIP' and 'ScenarioMIP' datasets for last since they will take many weeks to move.
So try out the *-testing.json - if it works then I will point the original .json file to the new csv file which can use both old and new style URLs and will include the whole CMIP6 collection again.
OK, I have just merged a new workflow with slightly expanded parameters in #78. I will check how it goes and then expand further.
Ok, so it seems that I will have to add a lot of exceptions for this, but that's not too hard. I'll try to do that at the end of the day when the brain is empty.
@jbusecke, there is now a temporary version of the CMIP6 json/csv combo for testing the transition to the new naming system, with dataset names starting with gs://cmip6/CMIP6 and ending with the version in the path name. So the catalog is identical to the old one (all the same columns/keys), but some of the zstore URLs are now using the new naming convention.
Just to make sure nothing will break in intake-esm and cmip6_preprocessing, I have tested the old and new json files and both seem fine for this simple test. I am not sure what else to test before changing over to the new URLs and the new catalog and removing the data with old object names. Suggestions? If you have a chance, you could also try pangeo-cmip6-testing.json in a few of your notebooks. I plan to start deleting the datasets with old names this week.
import intake

col = intake.open_esm_datastore("https://storage.googleapis.com/cmip6/pangeo-cmip6-testing.json")
# col = intake.open_esm_datastore("https://storage.googleapis.com/cmip6/pangeo-cmip6.json")

exper = 'piSST-pdSIC'
# exper = 'historical'

query = dict(variable_id=['tas'],
             experiment_id=exper,
             table_id=['Amon'],
             source_id=['NorESM2-LM'],
             grid_label=['gn'],
             )
cat = col.search(**query)
display(cat.df)
data_dict = cat.to_dataset_dict(zarr_kwargs={'consolidated': True, 'decode_times': False}, aggregate=True)
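The naming transition described above can be sketched with a small path check. This is only an illustration: the two example URLs below are hypothetical, and the assumption that the new convention ends in a `vYYYYMMDD` version directory is inferred from the description, not confirmed.

```python
import re

# Hypothetical examples of the two conventions discussed above; the member
# and version strings are illustrative, not real datasets in the catalog.
old_url = "gs://cmip6/CMIP/NCC/NorESM2-LM/historical/r1i1p1f1/Amon/tas/gn/"
new_url = "gs://cmip6/CMIP6/CMIP/NCC/NorESM2-LM/historical/r1i1p1f1/Amon/tas/gn/v20190815/"

def uses_new_naming(zstore: str) -> bool:
    """True if the path starts with gs://cmip6/CMIP6 and ends with a version directory."""
    return zstore.startswith("gs://cmip6/CMIP6/") and re.search(r"/v\d{8}/?$", zstore) is not None

print(uses_new_naming(old_url))  # False
print(uses_new_naming(new_url))  # True
```

A check like this could be run over the `zstore` column of the testing catalog to count how many URLs have already been migrated.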
Thanks for the update @naomi-henderson.
As far as I can see, this example does not actually use cmip6_preprocessing. You would have to do something like this:
from cmip6_preprocessing.preprocessing import combined_preprocessing
...
data_dict = cat.to_dataset_dict(zarr_kwargs={'consolidated': True, 'decode_times':False}, aggregate=True, preprocess=combined_preprocessing)
to activate it. But I think for best results you would want to use the upstream master of cmip6_pp...it might just be easier to do the testing from over here?
@jbusecke, sorry, I did not mean to imply that I was testing cmip6_preprocessing directly, just making sure that intake-esm was working, since the preprocessing requires it. It would really be great if you could switch to the new json file and test from there.
Ah ok, sorry for the misunderstanding. I'll try to implement that today, but it might have to wait until tomorrow...
@naomi-henderson, what do you envision the naming structure to be like? Will "https://storage.googleapis.com/cmip6/pangeo-cmip6-testing.json" just be a temporary location? And will "https://storage.googleapis.com/cmip6/pangeo-cmip6.json" stay the main catalog (with replaced zstores)?
Ah, it really has less to do with the json file than with the link to a *.csv file within it. We will stick with "https://storage.googleapis.com/cmip6/pangeo-cmip6.json" as the name of the main intake-esm file in the future.
Actually, I don't really think of the json file as a main catalog, since it is just one interface to the data and not one I actually use much, but I understand what you mean. The *.csv file which is referenced in the .json file is what is undergoing a big change and will have the zstores replaced. I am thinking that, unless you have somehow used the actual structure of the zstore URLs to determine your preprocessing rules, everything should still work in your cmip6_preprocessing. But it would be useful to check out the *-testing version to make sure.
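The invariant being relied on here, that only the zstore URLs change while every other catalog column stays identical, can be checked mechanically. A minimal sketch with hypothetical catalog rows (in practice these would come from the CSVs referenced in pangeo-cmip6.json and pangeo-cmip6-testing.json):

```python
# Hypothetical rows from the old and new catalog CSVs; all values are illustrative.
old_row = {"source_id": "NorESM2-LM", "experiment_id": "historical",
           "variable_id": "tas", "table_id": "Amon", "grid_label": "gn",
           "zstore": "gs://cmip6/CMIP/NCC/NorESM2-LM/historical/r1i1p1f1/Amon/tas/gn/"}
new_row = {"source_id": "NorESM2-LM", "experiment_id": "historical",
           "variable_id": "tas", "table_id": "Amon", "grid_label": "gn",
           "zstore": "gs://cmip6/CMIP6/CMIP/NCC/NorESM2-LM/historical/r1i1p1f1/Amon/tas/gn/v20190815/"}

def metadata(row):
    """Everything except the zstore URL, which is the only field expected to change."""
    return {k: v for k, v in row.items() if k != "zstore"}

# The metadata columns must agree; only the zstore URLs differ.
assert metadata(old_row) == metadata(new_row)
print("only zstore differs:", old_row["zstore"] != new_row["zstore"])
```

Running the same comparison over the full old and new CSVs would confirm that preprocessing rules keyed on metadata columns are unaffected by the URL change.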
Alright, it seems a full sweep is still a bit out of reach. The limit for a single workflow is dispatching 256 jobs (in the current config a job runs a single combination of grid_label, experiment_id, and variable_id over all models). That is still quite a lot haha.
I have dialed back the number of variables and grid labels in my matrix... let's see how that goes.
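Since the job count is the product of the matrix dimensions, it is easy to check a proposed matrix against the 256-job dispatch limit before running it. The dimension values below are hypothetical placeholders, not the actual workflow configuration:

```python
from itertools import product

# Hypothetical matrix dimensions; one job per (grid_label, experiment_id,
# variable_id) combination, mirroring the workflow setup described above.
grid_labels = ["gn", "gr"]
experiment_ids = ["historical", "ssp585", "piControl"]
variable_ids = ["tas", "pr", "o2", "thetao"]

jobs = list(product(grid_labels, experiment_ids, variable_ids))
print(len(jobs))  # 24 jobs, well under the 256-job dispatch limit
assert len(jobs) <= 256
```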
There are still some kinks to be worked out, but I will have to focus on that paper review that is already way too late 😱. So I might have to postpone the results until tomorrow.
@naomi-henderson I just merged a much expanded testing suite for the cloud data in #89. I ran every one of my tests here on the 'main' and the 'testing' catalog and have found nothing out of order!!! 🍾
I had to catalog a bunch of exceptions (see example here), which can serve as a reminder of which datasets are not working with cmip6_pp yet. I plan on addressing these issues piece by piece over the next weeks, but I thought this might be of interest for you too.
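One way such a catalog of exceptions could look is a mapping from dataset identifier prefixes to a reason string, consulted before each test. This is purely a sketch of the idea; the instance_id strings and failure reasons below are hypothetical, not the actual entries from #89:

```python
# Hypothetical known-failure registry; keys and reasons are illustrative only.
known_failures = {
    "CMIP6.CMIP.AWI.AWI-CM-1-1-MR.historical": "grid not yet supported",
    "CMIP6.CMIP.E3SM-Project.E3SM-1-0.historical": "missing lon/lat bounds",
}

def expected_to_fail(instance_id: str) -> bool:
    """True if the dataset matches a cataloged known failure."""
    return any(instance_id.startswith(prefix) for prefix in known_failures)

print(expected_to_fail("CMIP6.CMIP.AWI.AWI-CM-1-1-MR.historical.r1i1p1f1"))  # True
print(expected_to_fail("CMIP6.CMIP.NCC.NorESM2-LM.historical.r1i1p1f1"))     # False
```

Keeping the registry as data rather than scattered skip decorators makes it easy to see at a glance which datasets still need work.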
As a summary (before I close this via #89): I am testing a much wider range of experiments and variables in a weekly/manual workflow. This is not getting all the files, but we could easily expand it if, e.g., problems with certain variables are raised by users.