Comments (14)
Very impressive @jbusecke. I say DO IT! It's easy to set up a cron job...
from xmip.
@jbusecke, you should be seeing MANY fewer experiment_ids as I move to the new naming structure. The csv file referred to in your .json file ONLY HAS THE ORIGINAL dataset URLs. They are all available in the csv file referred to in the *-testing.json. The old datasets are being copied with a new naming structure and then the old datasets are removed. I have left the 'CMIP' and 'ScenarioMIP' datasets for last since they will take many weeks to move.
So try out the *-testing.json - if it works then I will point the original .json file to the new csv file which can use both old and new style URLs and will include the whole CMIP6 collection again.
OK, I have just merged a new workflow with slightly expanded parameters in #78. I will check how it goes and then expand further.
Ok, so it seems that I will have to add a lot of exceptions for this, but that's not too hard. I'll try to do that at the end of the day when the brain is empty.
@jbusecke, there is now a temporary version of the CMIP6 json/csv combo for testing the transition to the new naming system, with dataset names starting with gs://cmip6/CMIP6 and ending with the version in the path name. So the catalog is identical to the old one (all the same columns/keys), but some of the zstore URLs are now using the new naming convention.
Just to make sure nothing will break in intake-esm and cmip6_preprocessing, I have tested the old and new json files and both seem fine for this simple test. I am not sure what else to test before changing over to the new URLs and the new catalog and removing the data with old object names. Suggestions? If you have a chance, you could also try pangeo-cmip6-testing.json in a few of your notebooks. I plan to start deleting the datasets with old names this week.
import intake

col = intake.open_esm_datastore("https://storage.googleapis.com/cmip6/pangeo-cmip6-testing.json")
# col = intake.open_esm_datastore("https://storage.googleapis.com/cmip6/pangeo-cmip6.json")

exper = 'piSST-pdSIC'
# exper = 'historical'

query = dict(variable_id=['tas'],
             experiment_id=exper,
             table_id=['Amon'],
             source_id=['NorESM2-LM'],
             grid_label=['gn'],
             )
cat = col.search(**query)
display(cat.df)
data_dict = cat.to_dataset_dict(zarr_kwargs={'consolidated': True, 'decode_times': False}, aggregate=True)
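The naming transition described above can be sketched with a small path check. This is only an illustration: the two example URLs below are hypothetical, and the assumption that the new convention ends in a `vYYYYMMDD` version directory is inferred from the description, not confirmed.

```python
import re

# Hypothetical examples of the two conventions discussed above; the member
# and version strings are illustrative, not real datasets in the catalog.
old_url = "gs://cmip6/CMIP/NCC/NorESM2-LM/historical/r1i1p1f1/Amon/tas/gn/"
new_url = "gs://cmip6/CMIP6/CMIP/NCC/NorESM2-LM/historical/r1i1p1f1/Amon/tas/gn/v20190815/"

def uses_new_naming(zstore: str) -> bool:
    """True if the path starts with gs://cmip6/CMIP6 and ends with a version directory."""
    return zstore.startswith("gs://cmip6/CMIP6/") and re.search(r"/v\d{8}/?$", zstore) is not None

print(uses_new_naming(old_url))  # False
print(uses_new_naming(new_url))  # True
```

A check like this could be run over the `zstore` column of the testing catalog to count how many URLs have already been migrated.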
Thanks for the update @naomi-henderson.
As far as I can see, this example does not actually use cmip6_preprocessing. You would have to do something like this:
from cmip6_preprocessing.preprocessing import combined_preprocessing
...
data_dict = cat.to_dataset_dict(zarr_kwargs={'consolidated': True, 'decode_times':False}, aggregate=True, preprocess=combined_preprocessing)
to activate it. But I think for best results you would want to use the upstream master of cmip6_pp...it might just be easier to do the testing from over here?
@jbusecke, sorry, I did not mean to imply that I was testing cmip6_preprocessing directly, just making sure that intake-esm was working, since the preprocessing requires it. It would really be great if you could switch to the new json file and test from there.
Ah ok, sorry for the misunderstanding. I'll try to implement that today, but it might have to wait until tomorrow...
@naomi-henderson, what do you envision the naming structure to be like? Will "https://storage.googleapis.com/cmip6/pangeo-cmip6-testing.json" just be a temporary location? And will "https://storage.googleapis.com/cmip6/pangeo-cmip6.json" stay the main catalog (with replaced zstores)?
Ah, it really has less to do with the json file than with the link to a *.csv file within it. We will stick with "https://storage.googleapis.com/cmip6/pangeo-cmip6.json" as the name of the main intake-esm file in the future.
Actually, I don't really think of the json file as a main catalog, since it is just one interface to the data and not one I actually use much, but I understand what you mean. The *.csv file which is referenced in the .json file is what is undergoing a big change and will have the zstores replaced. I am thinking that, unless you have somehow used the actual structure of the zstore URLs to determine your preprocessing rules, everything should still work in your cmip6_preprocessing. But it would be useful to check out the *-testing version to make sure.
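The invariant being relied on here, that only the zstore URLs change while every other catalog column stays identical, can be checked mechanically. A minimal sketch with hypothetical catalog rows (in practice these would come from the CSVs referenced in pangeo-cmip6.json and pangeo-cmip6-testing.json):

```python
# Hypothetical rows from the old and new catalog CSVs; all values are illustrative.
old_row = {"source_id": "NorESM2-LM", "experiment_id": "historical",
           "variable_id": "tas", "table_id": "Amon", "grid_label": "gn",
           "zstore": "gs://cmip6/CMIP/NCC/NorESM2-LM/historical/r1i1p1f1/Amon/tas/gn/"}
new_row = {"source_id": "NorESM2-LM", "experiment_id": "historical",
           "variable_id": "tas", "table_id": "Amon", "grid_label": "gn",
           "zstore": "gs://cmip6/CMIP6/CMIP/NCC/NorESM2-LM/historical/r1i1p1f1/Amon/tas/gn/v20190815/"}

def metadata(row):
    """Everything except the zstore URL, which is the only field expected to change."""
    return {k: v for k, v in row.items() if k != "zstore"}

# The metadata columns must agree; only the zstore URLs differ.
assert metadata(old_row) == metadata(new_row)
print("only zstore differs:", old_row["zstore"] != new_row["zstore"])
```

Running the same comparison over the full old and new CSVs would confirm that preprocessing rules keyed on metadata columns are unaffected by the URL change.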
Alright, it seems a full sweep is still a bit out of reach. The limit for a single workflow is dispatching 256 jobs (in the current config a job runs a single combination of grid_label, experiment_id, and variable_id over all models). That is still quite a lot haha.
I have dialed back the number of variables and grid labels in my matrix... let's see how that goes.
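Since the job count is the product of the matrix dimensions, it is easy to check a proposed matrix against the 256-job dispatch limit before running it. The dimension values below are hypothetical placeholders, not the actual workflow configuration:

```python
from itertools import product

# Hypothetical matrix dimensions; one job per (grid_label, experiment_id,
# variable_id) combination, mirroring the workflow setup described above.
grid_labels = ["gn", "gr"]
experiment_ids = ["historical", "ssp585", "piControl"]
variable_ids = ["tas", "pr", "o2", "thetao"]

jobs = list(product(grid_labels, experiment_ids, variable_ids))
print(len(jobs))  # 24 jobs, well under the 256-job dispatch limit
assert len(jobs) <= 256
```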
There are still some kinks to be worked out, but I will have to focus on that paper review that is already way too late 😱. So I might have to postpone the results until tomorrow.
@naomi-henderson I just merged a much expanded testing suite for the cloud data in #89. I ran every one of my tests here on the 'main' and the 'testing' catalog and have found nothing out of order!!! 🍾
I had to catalog a bunch of exceptions (see example here), which can serve as a reminder of which datasets are not working with cmip6_pp yet. I plan on addressing these issues piece by piece over the next weeks, but I thought this might be of interest for you too.
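One way such a catalog of exceptions could look is a mapping from dataset identifier prefixes to a reason string, consulted before each test. This is purely a sketch of the idea; the instance_id strings and failure reasons below are hypothetical, not the actual entries from #89:

```python
# Hypothetical known-failure registry; keys and reasons are illustrative only.
known_failures = {
    "CMIP6.CMIP.AWI.AWI-CM-1-1-MR.historical": "grid not yet supported",
    "CMIP6.CMIP.E3SM-Project.E3SM-1-0.historical": "missing lon/lat bounds",
}

def expected_to_fail(instance_id: str) -> bool:
    """True if the dataset matches a cataloged known failure."""
    return any(instance_id.startswith(prefix) for prefix in known_failures)

print(expected_to_fail("CMIP6.CMIP.AWI.AWI-CM-1-1-MR.historical.r1i1p1f1"))  # True
print(expected_to_fail("CMIP6.CMIP.NCC.NorESM2-LM.historical.r1i1p1f1"))     # False
```

Keeping the registry as data rather than scattered skip decorators makes it easy to see at a glance which datasets still need work.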
As a summary (before I close this via #89): I am testing a much wider range of experiments and variables in a weekly/manual workflow. This is not getting all the files, but we could easily expand it if, e.g., problems with certain variables are raised by users.