Comments (12)

jetesdal commented on May 24, 2024

Ah, that's good to know. I will switch to https://storage.googleapis.com/cmip6/pangeo-cmip6.json and I will report any issues in https://github.com/pangeo-forge/cmip6-pipeline. Thanks, @jbusecke and @naomi-henderson!

from xmip.

jbusecke commented on May 24, 2024

Yikes, that looks like a nasty bug. Could you tell me a bit more about the version you are using? Did you install from conda/pip or from source?

jetesdal commented on May 24, 2024

Thanks, @jbusecke, for the quick response! I think I am using the latest version from GitHub. I installed it with:
pip install git+https://github.com/jbusecke/cmip6_preprocessing.git --upgrade

Also, I checked the following:

In [1]: import cmip6_preprocessing
   ...: cmip6_preprocessing.__version__
Out[1]: '0.1.5.dev319+g20e3868.d20210215'
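
If importing the package is not convenient, the installed version can also be read from the package metadata. Here is a small sketch using only the standard library (Python 3.8+). `installed_version` is a hypothetical helper, and the distribution name is assumed to be `cmip6_preprocessing`:

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in this environment.
        return None


# Assumption: the distribution is named like the import package.
print(installed_version("cmip6_preprocessing"))
```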

jbusecke commented on May 24, 2024

I assume this is on the pangeo google cloud deployment?

Could you paste the full code (including the catalog URL you used) here? I'll see what is going on there.

jetesdal commented on May 24, 2024

Sure. I followed the steps described in the intake-esm tutorial and used the same URL:

url = 'https://raw.githubusercontent.com/NCAR/intake-esm-datastore/master/catalogs/pangeo-cmip6.json'
col = intake.open_esm_datastore(url)

You can find a notebook with the relevant code here. Currently, I have found three models with the issue, but I only looked at ~10 models (out of 53).

jbusecke commented on May 24, 2024

Awesome. I think this is caused by the reordering of longitudes, which has caused me all kinds of trouble. I am actually thinking of getting rid of that functionality altogether (#94). Checking this now.
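
To make "reordering of longitudes" concrete, here is a minimal plain-Python sketch: wrap longitudes to 0-360, then sort the columns of a field so that longitude increases along x. This is an illustration only, not the cmip6_preprocessing implementation; `wrap_lon` and `reorder_by_lon` are hypothetical helpers. It assumes one nominal longitude per column. On curvilinear grids the longitude varies along both dimensions, so a single column ordering need not suit every row, which may be part of why this kind of step is fragile.

```python
def wrap_lon(lon):
    """Map a longitude in degrees onto the [0, 360) range."""
    return lon % 360.0


def reorder_by_lon(lons, rows):
    """Sort the columns of `rows` so that wrapped longitude increases.

    lons: one nominal longitude per column
    rows: list of rows, each with one value per column
    """
    wrapped = [wrap_lon(lon) for lon in lons]
    # Column indices ordered by their wrapped longitude.
    order = sorted(range(len(wrapped)), key=lambda i: wrapped[i])
    sorted_lons = [wrapped[i] for i in order]
    sorted_rows = [[row[i] for i in order] for row in rows]
    return sorted_lons, sorted_rows


# Toy data: a -180..180 grid with two rows of values.
lons = [-180.0, -90.0, 0.0, 90.0]
field = [[10, 20, 30, 40],
         [11, 21, 31, 41]]
new_lons, new_field = reorder_by_lon(lons, field)
print(new_lons)
print(new_field)
```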

jbusecke commented on May 24, 2024

OK, I was able to reproduce the error, and it does seem to be related to the longitude ordering.

Here is a quick workaround while I try to fix that bug:

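# Workaround: the default preprocessing steps with the longitude
# reordering (replace_x_y_nominal_lat_lon) disabled.
# Assumes `col` is the catalog opened earlier with intake.open_esm_datastore.
import matplotlib.pyplot as plt
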
from cmip6_preprocessing.preprocessing import (
    rename_cmip6, 
    promote_empty_dims, 
    correct_coordinates, 
    correct_lon, 
    correct_units, 
    broadcast_lonlat,
    parse_lon_lat_bounds,
    sort_vertex_order,
    maybe_convert_bounds_to_vertex, 
    maybe_convert_vertex_to_bounds,
)
    
def modified_preprocessing(ds):
    ds = ds.copy()
    # fix naming
    ds = rename_cmip6(ds)
    # promote empty dims to actual coordinates
    ds = promote_empty_dims(ds)
    # demote coordinates from data_variables
    ds = correct_coordinates(ds)
    # broadcast lon/lat
    ds = broadcast_lonlat(ds)
    # shift all lons to consistent 0-360
    ds = correct_lon(ds)
    # fix the units
    ds = correct_units(ds)
    # replace x,y with nominal lon,lat
    # (disabled: this longitude reordering step appears to trigger the bug)
    # ds = replace_x_y_nominal_lat_lon(ds)
    # rename the `bounds` according to their style (bound or vertex)
    ds = parse_lon_lat_bounds(ds)
    # sort vertices in a consistent manner
    ds = sort_vertex_order(ds)
    # convert vertex into bounds and vice versa, so both are available
    ds = maybe_convert_bounds_to_vertex(ds)
    ds = maybe_convert_vertex_to_bounds(ds)
    return ds

for si in ['HadGEM3-GC31-MM', 'CMCC-ESM2', 'CMCC-CM2-HR4']:
    cat = col.search(activity_id='CMIP', grid_label='gn', source_id=si, variable_id=['areacello'])

    fig, axs = plt.subplots(ncols=2, constrained_layout=True, figsize=(20,6))

    # without combined_preprocessing
    ddict = cat.to_dataset_dict(zarr_kwargs={'consolidated':True, 'decode_times':True})
    ddict[next(iter(ddict))].areacello[0].plot(ax=axs[0])

    # with modified_preprocessing (longitude reordering disabled)
    ddict = cat.to_dataset_dict(zarr_kwargs={'consolidated':True, 'decode_times':True},
                                preprocess=modified_preprocessing)
    ddict[next(iter(ddict))].areacello[0].plot(ax=axs[1])

    plt.show()

Let me know if that works for you.
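
The workaround above amounts to running the usual steps in order while leaving one out. A generic sketch of that pattern follows, with hypothetical names (`run_pipeline`, `skip`) and toy steps standing in for the real preprocessing functions; it is not part of cmip6_preprocessing.

```python
def run_pipeline(ds, steps, skip=()):
    """Apply each step in order, skipping any whose function name is in `skip`."""
    for step in steps:
        if step.__name__ in skip:
            continue
        ds = step(ds)
    return ds


# Toy stand-ins for preprocessing steps; each records that it ran.
def rename(ds):
    return ds + ["rename"]


def reorder_lon(ds):
    return ds + ["reorder_lon"]


def fix_units(ds):
    return ds + ["fix_units"]


steps = [rename, reorder_lon, fix_units]
result = run_pipeline([], steps, skip={"reorder_lon"})
print(result)
```

With the real functions, `steps` would hold the functions imported from `cmip6_preprocessing.preprocessing`, and `skip` would name `replace_x_y_nominal_lat_lon`.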

jetesdal commented on May 24, 2024

I think that works for me. Reordering of longitudes is indeed very useful but might not be essential for my analysis. Thanks a lot for looking into it so quickly!

jetesdal commented on May 24, 2024

A follow-up question that I'm just going to ask here (even though it is probably not the right place): I am seeing faulty data across various CMIP6 datasets obtained from
'https://raw.githubusercontent.com/NCAR/intake-esm-datastore/master/catalogs/pangeo-cmip6.json'
Those erroneous data are not related to using combined_preprocessing but must be in the underlying dataset or introduced when downloading the data. I'm not sure where I should report these issues. Is cmip6_preprocessing the right place?

jbusecke commented on May 24, 2024

I am actually not sure that is the most up-to-date catalog. @naomi-henderson has recently refactored a lot of the cloud data.

Can you try:

import intake
col = intake.open_esm_datastore("https://storage.googleapis.com/cmip6/pangeo-cmip6.json")
col

and see if the problems persist?

Otherwise, I think this repo is always a good place to report, but https://github.com/pangeo-forge/cmip6-pipeline might be even more appropriate. @naomi-henderson, are there official guidelines for reporting on the new catalog?

naomi-henderson commented on May 24, 2024

Hmmm, I am still trying to understand why the very old NCAR version of the Pangeo CMIP6 Google Cloud's JSON file is still being used. They have a JSON file for their own collection at NCAR, but anyone using the GC collection should use the JSON file in GC. Yes, @jbusecke, your link to https://storage.googleapis.com/cmip6/pangeo-cmip6.json is correct.

The re-organization of the GC version is now complete. If you are still having trouble, please report here: https://github.com/pangeo-forge/cmip6-pipeline

The AWS copy might still be out of sync for a few more days.

jbusecke commented on May 24, 2024

> Hmmm, I am still trying to understand why the very old NCAR version of the Pangeo CMIP6 Google Cloud's JSON file is still being used. They have a JSON file for their own collection at NCAR, but anyone using the GC collection should use the JSON file in GC. Yes, @jbusecke, your link to https://storage.googleapis.com/cmip6/pangeo-cmip6.json is correct.

Probably partially my fault, since I put that one into the cmip6-preprocessing readme back at the cmip6-hackathon. I have to thoroughly refactor the docs and make it really clear that people need to switch!
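
Until the docs are updated, one way to guard against the outdated catalog is to check the URL before opening it. The sketch below uses the two URLs quoted in this thread; `pick_catalog_url` is a hypothetical helper, not part of intake-esm or cmip6_preprocessing.

```python
import warnings

# Both URLs are taken from this thread.
OUTDATED_CATALOG = (
    "https://raw.githubusercontent.com/NCAR/intake-esm-datastore/"
    "master/catalogs/pangeo-cmip6.json"
)
CURRENT_CATALOG = "https://storage.googleapis.com/cmip6/pangeo-cmip6.json"


def pick_catalog_url(url):
    """Return the Google Cloud catalog URL if `url` is the outdated one."""
    if url == OUTDATED_CATALOG:
        warnings.warn(
            "Outdated catalog URL; using the Google Cloud catalog instead.",
            stacklevel=2,
        )
        return CURRENT_CATALOG
    return url


print(pick_catalog_url(CURRENT_CATALOG))
```

The returned URL can then be passed to `intake.open_esm_datastore` as in the snippets earlier in the thread.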
