
climetlab's Introduction

CliMetLab


CliMetLab is a Python package that aims to simplify access to climate and meteorological datasets, allowing users to focus on science instead of technical issues such as data access and data formats. It is mostly intended to be used in Jupyter notebooks and to be interoperable with all popular data analytics packages, such as NumPy, Pandas, Xarray, SciPy and Matplotlib, as well as machine learning frameworks such as TensorFlow, Keras and PyTorch.

The documentation can be found at https://climetlab.readthedocs.io/.

Plugins

See https://climetlab.readthedocs.io/en/latest/guide/pluginlist.html.

License

Apache License 2.0. In applying this licence, ECMWF does not waive the privileges and immunities granted to it by virtue of its status as an intergovernmental organisation, nor does it submit to any jurisdiction.

climetlab's People

Contributors

b8raoult, emadehsan, floriankrb, jesperdramsch, longtsing, mchantry, pmaciel, stephansiemen, sylvielamythepaut

climetlab's Issues

Dataset registering change?

Is it possible that, with version 0.10.4, the way climetlab registers local datasets has changed?
Let me explain: I am using climetlab on a JupyterHub and sharing conda environments amongst users with nb_conda_kernels. I have the Eumetnet plugin installed as a local editable version in an environment that is shared with the other users. As the admin, I can use my own environment and climetlab finds the local plugin/dataset, but the other users can't: for them, climetlab tries to connect to GitHub to find the list of available plugins, does not find the Eumetnet one because it is not yet on that list, and this results in an error.

This was not the case with version 0.9.8, so I wonder if something has changed? It could also be an error of mine in dealing with the environments, but if you could confirm or deny that, it would be great.

Thank you in advance,

Jonathan

CliMetLab documentation structure

Hi @floriankrb, I would appreciate your thoughts about the structure of the CliMetLab documentation and how it could be improved.

  1. Having a Developer Guide > Datasets page when User Guide > Datasets already exists is probably redundant. The same can be said about the Data sources page. A single Datasets page describing all the dataset-related functionality (available to both users and developers) might help maintain a single source of truth.

  2. Also, I feel that having separate User Guide, Contributor Guide and Developer Guide could be a little confusing and sometimes repetitive.
    I think the hierarchy of the documentation would be much simpler and easier to navigate if it were organised by topic (e.g. Plugins, Datasets, Data sources) instead of by persona (User > Datasets, Developer > Data Sources, Contributor > Plugins).

Doing `pip install climetlab` does not install s3fs which is required to load zarr sources

Trying to load zarr sources after a simple pip install gives me the following error:

          
E           File /opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/climetlab/sources/zarr.py:14
E                11 import os
E                12 from urllib.parse import urlparse
E           ---> 14 import s3fs
E                15 import xarray as xr
E                16 import zarr
E           
E           ModuleNotFoundError: No module named 's3fs'

I see that s3fs and zarr were removed from the setup.py file. Does this mean that climetlab does not support zarr at the moment?
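A possible workaround (an assumption on my part, not something confirmed by the package metadata) is to install the missing optional dependencies manually before loading zarr sources:

pip install s3fs zarr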

`to_pandas` warning

The following happens both on my computer and in Colab:

import climetlab as cml

URL = "https://www.ncei.noaa.gov/data/international-best-track-archive-for-climate-stewardship-ibtracs/v04r00/access/csv/ibtracs.SP.list.v04r00.csv"
data = cml.load_source("url", URL)
pd = data.to_pandas()
...
DtypeWarning: Columns (1,2,8,9,14,161,162) have mixed types. Specify dtype option on import or set low_memory=False.
  return pandas.read_csv(self.path, **pandas_read_csv_kwargs)

where ... varies with the system, and the message is not exactly the same (maybe due to the Python/package versions).
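For reference, a minimal sketch of one way to silence the warning, assuming to_pandas forwards keyword arguments to pandas.read_csv as the pandas_read_csv_kwargs in the traceback suggests (this forwarding is an assumption, not verified here):

import climetlab as cml

URL = "https://www.ncei.noaa.gov/data/international-best-track-archive-for-climate-stewardship-ibtracs/v04r00/access/csv/ibtracs.SP.list.v04r00.csv"
data = cml.load_source("url", URL)
# Assumption: keyword arguments are forwarded to pandas.read_csv.
df = data.to_pandas(low_memory=False)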

Pip not able to install climetlab (v0.11.31) on Windows with Python 3.7: HDF5 headers not found

The title is quite explicit :p. I get this error in the tests for the release of a new version of the EUMETNET benchmark climetlab plugin. Is climetlab still supported on Python 3.7?

See the error log here:

Collecting climetlab
  Downloading climetlab-0.11.31.tar.gz (128 kB)
     -------------------------------------- 128.1/128.1 kB 3.8 MB/s eta 0:00:00
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Collecting numpy
  Downloading numpy-1.21.6-cp37-cp37m-win_amd64.whl (14.0 MB)
     --------------------------------------- 14.0/14.0 MB 46.7 MB/s eta 0:00:00
Collecting pandas
  Downloading pandas-1.3.5-cp37-cp37m-win_amd64.whl (10.0 MB)
     --------------------------------------- 10.0/10.0 MB 57.9 MB/s eta 0:00:00
Collecting xarray>=0.19.0
  Downloading xarray-0.20.2-py3-none-any.whl (845 kB)
     ------------------------------------- 845.2/845.2 kB 26.9 MB/s eta 0:00:00
Collecting requests
  Downloading requests-2.28.1-py3-none-any.whl (62 kB)
     ---------------------------------------- 62.8/62.8 kB 3.3 MB/s eta 0:00:00
Collecting dask
  Downloading dask-2022.2.0-py3-none-any.whl (1.1 MB)
     ---------------------------------------- 1.1/1.1 MB 22.2 MB/s eta 0:00:00
Collecting netcdf4
  Downloading netCDF4-1.6.0.tar.gz (774 kB)
     ------------------------------------- 774.2/774.2 kB 24.6 MB/s eta 0:00:00
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'error'
  error: subprocess-exited-with-error
  
  python setup.py egg_info did not run successfully.
  exit code: 1
  
  [25 lines of output]
  reading from setup.cfg...
  
      HDF5_DIR environment variable not set, checking some standard locations ..
  checking C:\Users\runneradmin\include ...
  hdf5 headers not found in C:\Users\runneradmin\include
  checking /usr/local\include ...
  hdf5 headers not found in /usr/local\include
  checking /sw\include ...
  hdf5 headers not found in /sw\include
  checking /opt\include ...
  hdf5 headers not found in /opt\include
  checking /opt/local\include ...
  hdf5 headers not found in /opt/local\include
  checking /opt/homebrew\include ...
  hdf5 headers not found in /opt/homebrew\include
  checking /usr\include ...
  hdf5 headers not found in /usr\include
  Traceback (most recent call last):
    File "<string>", line 36, in <module>
    File "<pip-setuptools-caller>", line 34, in <module>
    File "C:\Users\RUNNER~1\AppData\Local\Temp\pip-install-wpzd3q9h\netcdf4_0e84f88a843e43dab815c3ef9955b375\setup.py", line 444, in <module>
      _populate_hdf5_info(dirstosearch, inc_dirs, libs, lib_dirs)
    File "C:\Users\RUNNER~1\AppData\Local\Temp\pip-install-wpzd3q9h\netcdf4_0e84f88a843e43dab815c3ef9955b375\setup.py", line 385, in _populate_hdf5_info
Error:       raise ValueError('did not find HDF5 headers')
  ValueError: did not find HDF5 headers
  [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

Encountered error while generating package metadata.

See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
Error: Process completed with exit code 1.

Use of climetlab in a CentOS 6 environment

Hello

Maybe this is more a question than an issue...

I'm working on the Metwork project. We build RPMs providing a bunch of recent releases of free libraries (mostly C/C++ and Python 3). These RPMs can be installed on several Linux distributions (including CentOS 6) to provide libraries not available on the system. Our RPMs are installed without changing anything about the libraries previously installed on the system; we provide a specific environment that the user has to load in order to use the Metwork libraries. Our CI system builds RPMs on CentOS 6 using recent compilers such as gcc-8 (thanks to devtoolset-8). In several months (2021 or 2022, I guess) we will probably drop CentOS 6 support and move our CI system to CentOS 7, but for the time being we need to keep CentOS 6 compatibility.

Recently one of our users asked us to add climetlab to Metwork (issue here). We are having difficulty doing so because climetlab requires ecmwflibs, and ecmwflibs is only provided as a "manylinux2014" wheel, which is not compatible with glibc 2.12 on CentOS 6. It is all the more unfortunate because all the ".so" libraries provided by ecmwflibs are available in Metwork (we built them from source). Maybe we need to upgrade some of them, but if so it is something we can deal with.

Do you have any suggestions or workarounds (or maybe, you never know, ideas for changes in climetlab or ecmwflibs) to solve our problem?

Thanks

`load_source` does not work in my system with `'url'` data source

Updated

When I try to execute the following instruction

import climetlab as cml

ds = cml.load_source('url', 'https://github.com/ecmwf/climetlab/raw/main/docs/examples/test.grib')

on my computer (OS: Debian 11) I get the following output / error message:

test.grib:   0%|          | 0.00/1.03k [00:00<?, ?B/s]
CliMetLab cache: trying to free 14.9 GiB
Deleting entry {
    "path": "/tmp/climetlab-iago/grib-index-213de367fa7e1865472d2aaf9a729f7894defcf8f4ced5a23622f24928f06e5d.json",
    "owner": "grib-index",
    "args": [
        "/tmp/climetlab-iago/url-15280dbd4547333ede9ffec63d6959450329b9c003a148969685679b82657cba.grib",
        1677685539.2853959,
        1677685539.2813954,
        1052,
        0
    ],
    "creation_date": "2023-03-01 16:45:40.403550",
    "flags": 0,
    "owner_data": null,
    "last_access": "2023-03-01 16:45:40.403550",
    "type": "file",
    "parent": null,
    "replaced": null,
    "extra": null,
    "expires": null,
    "accesses": 1,
    "size": 4
}
CliMetLab cache: deleting /tmp/climetlab-iago/grib-index-213de367fa7e1865472d2aaf9a729f7894defcf8f4ced5a23622f24928f06e5d.json (4)
CliMetLab cache: grib-index ["/tmp/climetlab-iago/url-15280dbd4547333ede9ffec63d6959450329b9c003a148969685679b82657cba.grib", 1677685539.2853959, 1677685539.2813954, 1052, 0]
CliMetLab cache: could not free 14.9 GiB
CliMetLab cache: trying to free 14.9 GiB
Deleting entry {
    "path": "/tmp/climetlab-iago/url-15280dbd4547333ede9ffec63d6959450329b9c003a148969685679b82657cba.grib",
    "owner": "url",
    "args": {
        "url": "https://github.com/ecmwf/climetlab/raw/main/docs/examples/test.grib",
        "parts": null
    },
    "creation_date": "2023-03-01 16:47:48.942739",
    "flags": 0,
    "owner_data": {
        "connection": "keep-alive",
        "content-length": "1052",
        "cache-control": "max-age=300",
        "content-security-policy": "default-src 'none'; style-src 'unsafe-inline'; sandbox",
        "content-type": "application/octet-stream",
        "etag": "W/\"2bd5b56b1c0727c2971a7d94f9c3f22c13a72f1d78388827fc1261b2a9530e42\"",
        "strict-transport-security": "max-age=31536000",
        "x-content-type-options": "nosniff",
        "x-frame-options": "deny",
        "x-xss-protection": "1; mode=block",
        "x-github-request-id": "206A:0F3A:12C19C7:140943B:63FF7322",
        "accept-ranges": "bytes",
        "date": "Wed, 01 Mar 2023 15:47:49 GMT",
        "via": "1.1 varnish",
        "x-served-by": "cache-mad22020-MAD",
        "x-cache": "HIT",
        "x-cache-hits": "1",
        "x-timer": "S1677685669.193550,VS0,VE1",
        "vary": "Authorization,Accept-Encoding,Origin",
        "access-control-allow-origin": "*",
        "x-fastly-request-id": "3408d1c8e5cf268f976443336503d5442163d118",
        "expires": "Wed, 01 Mar 2023 15:52:49 GMT",
        "source-age": "131"
    },
    "last_access": "2023-03-01 16:47:48.942739",
    "type": "file",
    "parent": null,
    "replaced": null,
    "extra": null,
    "expires": null,
    "accesses": 1,
    "size": 1052
}
CliMetLab cache: deleting /tmp/climetlab-iago/url-15280dbd4547333ede9ffec63d6959450329b9c003a148969685679b82657cba.grib (1 KiB)
CliMetLab cache: url {"url": "https://github.com/ecmwf/climetlab/raw/main/docs/examples/test.grib", "parts": null}
CliMetLab cache: could not free 14.9 GiB

And if I try

URL = "https://www.ncei.noaa.gov/data/international-best-track-archive-for-climate-stewardship-ibtracs/v04r00/access/csv/ibtracs.SP.list.v04r00.csv"
data = cml.load_source("url", URL)

then I get

ibtracs.SP.list.v04r00.csv:   0%|          | 0.00/33.2M [00:00<?, ?B/s]
CliMetLab cache: trying to free 14.9 GiB
Deleting entry {
    "path": "/tmp/climetlab-iago/grib-index-4a7b3dd2dd0a13c559af337be1026033c5f30b222383355c46c7fe2bb36a2b73.json",
    "owner": "grib-index",
    "args": [
        "/tmp/climetlab-iago/url-15280dbd4547333ede9ffec63d6959450329b9c003a148969685679b82657cba.grib",
        1677685669.5551624,
        1677685669.551163,
        1052,
        0
    ],
    "creation_date": "2023-03-01 16:47:49.630884",
    "flags": 0,
    "owner_data": null,
    "last_access": "2023-03-01 16:47:49.630884",
    "type": "file",
    "parent": null,
    "replaced": null,
    "extra": null,
    "expires": null,
    "accesses": 1,
    "size": 4
}
CliMetLab cache: deleting /tmp/climetlab-iago/grib-index-4a7b3dd2dd0a13c559af337be1026033c5f30b222383355c46c7fe2bb36a2b73.json (4)
CliMetLab cache: grib-index ["/tmp/climetlab-iago/url-15280dbd4547333ede9ffec63d6959450329b9c003a148969685679b82657cba.grib", 1677685669.5551624, 1677685669.551163, 1052, 0]
CliMetLab cache: could not free 14.9 GiB

Further, using cml.load_source always produces the message "CliMetLab cache: could not free 14.9 GiB".

What might the issue be?

Thank you!

Merge strategy/options

Hi,

May I suggest a merge strategy/options (similar to what is possible with xr.merge) for to_xarray? I'm asking because I got the following error while calling this routine with some of the forecast data missing some steps:

---------------------------------------------------------------------------
DatasetBuildError                         Traceback (most recent call last)
Input In [12], in <cell line: 1>()
      3 ds = cml.load_dataset(
      4     "eumetnet-postprocessing-benchmark-training-data-gridded-reforecasts-pressure",
      5     date=dates[0],
      6     parameter="all",
      7     level=level
      8 )
     10 # download of the ensemble forecast
---> 11 fcs = ds.to_xarray()
     12 new_fcs = fcs.isel(step=time_index, longitude=lon_index, latitude=lat_index)
     14 # creating a new coordinate for the hdate

File ~/fileserver/home/anaconda3/envs/climetlab/lib/python3.9/site-packages/climetlab_eumetnet_postprocessing_benchmark/gridded/training_data_forecasts.py:61, in TrainingDataForecast.to_xarray(self, **kwargs)
     60 def to_xarray(self, **kwargs):
---> 61     return self.source.to_xarray(xarray_open_dataset_kwargs={"backend_kwargs": {"ignore_keys": ["dataType"]}}, **kwargs)

File ~/fileserver/home/anaconda3/envs/climetlab/lib/python3.9/site-packages/climetlab/readers/grib/fieldset.py:233, in FieldSet.to_xarray(self, **kwargs)
    215     xarray_open_dataset_kwargs[key] = mix_kwargs(
    216         user=user_xarray_open_dataset_kwargs.pop(key, {}),
    217         default={"errors": "raise"},
   (...)
    220         logging_main_key=key,
    221     )
    222 xarray_open_dataset_kwargs.update(
    223     mix_kwargs(
    224         user=user_xarray_open_dataset_kwargs,
   (...)
    230     )
    231 )
--> 233 result = xr.open_dataset(
    234     FieldsetAdapter(self, ignore_keys=ignore_keys),
    235     **xarray_open_dataset_kwargs,
    236 )
    238 def math_prod(lst):
    239     if not hasattr(math, "prod"):
    240         # python 3.7 does not have math.prod

File ~/fileserver/home/anaconda3/envs/climetlab/lib/python3.9/site-packages/xarray/backends/api.py:495, in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, backend_kwargs, *args, **kwargs)
    483 decoders = _resolve_decoders_kwargs(
    484     decode_cf,
    485     open_backend_dataset_parameters=backend.open_dataset_parameters,
   (...)
    491     decode_coords=decode_coords,
    492 )
    494 overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None)
--> 495 backend_ds = backend.open_dataset(
    496     filename_or_obj,
    497     drop_variables=drop_variables,
    498     **decoders,
    499     **kwargs,
    500 )
    501 ds = _dataset_from_backend_dataset(
    502     backend_ds,
    503     filename_or_obj,
   (...)
    510     **kwargs,
    511 )
    512 return ds

File ~/fileserver/home/anaconda3/envs/climetlab/lib/python3.9/site-packages/cfgrib/xarray_plugin.py:100, in CfGribBackend.open_dataset(self, filename_or_obj, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta, lock, indexpath, filter_by_keys, read_keys, encode_cf, squeeze, time_dims, errors, extra_coords)
     78 def open_dataset(
     79     self,
     80     filename_or_obj: T.Union[str, abc.MappingFieldset[T.Any, abc.Field]],
   (...)
     97     extra_coords: T.Dict[str, str] = {},
     98 ) -> xr.Dataset:
--> 100     store = CfGribDataStore(
    101         filename_or_obj,
    102         indexpath=indexpath,
    103         filter_by_keys=filter_by_keys,
    104         read_keys=read_keys,
    105         encode_cf=encode_cf,
    106         squeeze=squeeze,
    107         time_dims=time_dims,
    108         lock=lock,
    109         errors=errors,
    110         extra_coords=extra_coords,
    111     )
    112     with xr.core.utils.close_on_error(store):
    113         vars, attrs = store.load()  # type: ignore

File ~/fileserver/home/anaconda3/envs/climetlab/lib/python3.9/site-packages/cfgrib/xarray_plugin.py:40, in CfGribDataStore.__init__(self, filename, lock, **backend_kwargs)
     38 else:
     39     opener = dataset.open_fieldset
---> 40 self.ds = opener(filename, **backend_kwargs)

File ~/fileserver/home/anaconda3/envs/climetlab/lib/python3.9/site-packages/cfgrib/dataset.py:730, in open_fieldset(fieldset, indexpath, filter_by_keys, read_keys, time_dims, extra_coords, computed_keys, log, **kwargs)
    728 index = messages.FieldsetIndex.from_fieldset(fieldset, index_keys, computed_keys)
    729 filtered_index = index.subindex(filter_by_keys)
--> 730 return open_from_index(filtered_index, read_keys, time_dims, extra_coords, **kwargs)

File ~/fileserver/home/anaconda3/envs/climetlab/lib/python3.9/site-packages/cfgrib/dataset.py:706, in open_from_index(index, read_keys, time_dims, extra_coords, **kwargs)
    699 def open_from_index(
    700     index: abc.Index[T.Any, abc.Field],
    701     read_keys: T.Sequence[str] = (),
   (...)
    704     **kwargs: T.Any,
    705 ) -> Dataset:
--> 706     dimensions, variables, attributes, encoding = build_dataset_components(
    707         index, read_keys=read_keys, time_dims=time_dims, extra_coords=extra_coords, **kwargs
    708     )
    709     return Dataset(dimensions, variables, attributes, encoding)

File ~/fileserver/home/anaconda3/envs/climetlab/lib/python3.9/site-packages/cfgrib/dataset.py:660, in build_dataset_components(index, errors, encode_cf, squeeze, log, read_keys, time_dims, extra_coords)
    658     short_name = var_name
    659 try:
--> 660     dict_merge(variables, coord_vars)
    661     dict_merge(variables, {short_name: data_var})
    662     dict_merge(dimensions, dims)

File ~/fileserver/home/anaconda3/envs/climetlab/lib/python3.9/site-packages/cfgrib/dataset.py:591, in dict_merge(master, update)
    589     pass
    590 else:
--> 591     raise DatasetBuildError(
    592         "key present and new value is different: "
    593         "key=%r value=%r new_value=%r" % (key, master[key], value)
    594     )

DatasetBuildError: key present and new value is different: key='step' value=Variable(dimensions=('step',), data=array([  0.,  12.,  24.,  36.,  48.,  60.,  72.,  84.,  96., 108., 120.,
       132., 144., 156., 168., 180., 192., 204., 216., 228., 240.])) new_value=Variable(dimensions=('step',), data=array([  0.,   6.,  12.,  18.,  24.,  30.,  36.,  42.,  48.,  54.,  60.,
        66.,  72.,  78.,  84.,  90.,  96., 102., 108., 114., 120., 126.,
       132., 138., 144., 150., 156., 162., 168., 174., 180., 186., 192.,
       198., 204., 210., 216., 222., 228., 234., 240.]))

With xr.merge, it is possible to merge these data together, and the missing steps are filled with NaN. That is actually my current workaround for this problem: manually merging the different forecast data. But I wonder if I could suggest that to_xarray do this automatically?
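For illustration, a small self-contained sketch of the behaviour described above: xr.merge aligns the differing step coordinates and fills the missing steps with NaN (variable names and values are purely illustrative):

import numpy as np
import xarray as xr

# Two toy datasets with 12-hourly and 6-hourly steps, mimicking the error above.
coarse = xr.Dataset({"t2m": ("step", np.ones(3))}, coords={"step": [0, 12, 24]})
fine = xr.Dataset({"u10": ("step", np.ones(5))}, coords={"step": [0, 6, 12, 18, 24]})

merged = xr.merge([coarse, fine])
print(merged.t2m.values)  # steps 6 and 18 are filled with NaN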

Thank you in advance,

Jonathan

allow xr.open_mfdataset(**kwargs)

I would like to get more flexibility in how cached files are opened by xarray.

cds.to_xarray(chunks={'longitude':'auto', 'step': 7})

There seem to be many points where **kwargs could be supplied. Like here:

I didn't open a PR yet because there are so many occurrences of xr.open_mfdataset that we should discuss this first.

I hope to reduce the number of chunks and tasks in the s2s-ai-competition datasets by specifying chunks={'step': 7, 'forecast_time': 53}.

Issue with TFRecord from URL

I'm building a plugin for CML using TFRecord files.
When I point the MultiURL method at https://storage.ecmwf.europeanweather.cloud/MAELSTROM_AP3/TFR/TripCloud0.0.tfrecord
the file is successfully downloaded and loaded.
When I point it at https://storage.ecmwf.europeanweather.cloud/MAELSTROM_AP3/TFR/TripCloud0.1.tfrecord
the call to cml.load_source(....) fails with

Unknown file type /hugetmp/tmp-climetlab/url-63428528b56eb9c50418511287b38d1c7c53e6e203671861fb71faf86e6cbb72.man (b'PP\x00\x00\x00\x00\x00\x00'), ignoring

Examining the cache directory, the first file has been downloaded as a .tfrecord file and the second has been downloaded as a .man file.
Thanks.
Mat

Load the dataset locally

Thank you for sharing the dataset. I have downloaded the .nc file onto my own computer, so I would like to know how to load the local dataset file when I run cml.load_dataset.
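For reference, a minimal sketch of one way to open a locally downloaded file, using the generic "file" source that appears elsewhere in these issues (the path is a placeholder; whether the dataset plugin itself accepts a local path is a separate question):

import climetlab as cml

# "path/to/local_file.nc" is a placeholder for the downloaded file.
data = cml.load_source("file", "path/to/local_file.nc")
ds = data.to_xarray()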

Changelog update

What maintenance does this project need?

Could you please update the changelog at each new release? (It has been unchanged for 9 months...)
Thanks

Organisation

Meteo-France

set cache not working if never cached

After upgrading to the new climetlab 0.7, set("cache-directory", path) does not work anymore.
EDIT: maybe this has nothing to do with the new climetlab, but rather with the fact that nothing has been cached yet; at least it fails when nothing has been cached in the past.

Is that on purpose? If not, could you please add a small note to the documentation? https://climetlab.readthedocs.io/en/latest/reference/settings.html

import climetlab as cml
# no caching before ever
cml.settings.set("cache-directory", ".")
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-9-557cbc31a23e> in <module>
      1 # caching path for climetlab
      2 #cache_path = "/work/mh0727/m300524/S2S_AI/cache" # set your own path
----> 3 cml.settings.set("cache-directory", ".")

/opt/conda/lib/python3.8/site-packages/climetlab/core/settings.py in set(self, name, *args, **kwargs)
    134 
    135         self._settings[name] = value
--> 136         self._changed()
    137 
    138     def reset(self, name: str = None):

/opt/conda/lib/python3.8/site-packages/climetlab/core/settings.py in _changed(self)
    168         self._save()
    169         for cb in self._callbacks:
--> 170             cb()
    171 
    172     def on_change(self, callback: Callable[[], None]):

/opt/conda/lib/python3.8/site-packages/climetlab/decorators.py in wrapped(*args, **kwargs)
     35     def wrapped(*args, **kwargs):
     36         with LOCK:
---> 37             return func(*args, **kwargs)
     38 
     39     return wrapped

/opt/conda/lib/python3.8/site-packages/climetlab/core/caching.py in settings_changed()
     62 def settings_changed():
     63     global _connection
---> 64     if _connection.db is not None:
     65         _connection.db.close()
     66     _connection.db = None

AttributeError: '_thread._local' object has no attribute 'db'



import climetlab as cml
import climetlab_s2s_ai_challenge
print(f'Climetlab version : {cml.__version__}')
print(f'Climetlab-s2s-ai-challenge plugin version : {climetlab_s2s_ai_challenge.__version__}')
Climetlab version : 0.7.0
Climetlab-s2s-ai-challenge plugin version : 0.6.2

No module named 'climetlab.ml' when running cml.load_source

Hi there,

I'm following along with the ECMWF MOOC course and I tried running the Jupyter notebook for the following line:

cml.load_source('file', 'test.grib')

I got a ModuleNotFoundError: No module named 'climetlab.ml'. I was running climetlab 0.13.10 (the latest) and the same error occurred both locally and on Google Colab.

I managed to get it running by downgrading to climetlab 0.13.5.

Is this a bug in the latest version? Posting here to follow any developments :)

Cheers.

Does climetlab replace cdsapi?

Thank you for making this package! This solves my biggest pain point with the cdsapi, the inability to load requested data into memory without saving to disk. Does ECMWF consider cdsapi deprecated and suggest using climetlab instead?
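For context, a minimal sketch of the kind of retrieval climetlab offers through its "cds" source (the request keys follow the usual CDS conventions; treat the exact values as illustrative):

import climetlab as cml

source = cml.load_source(
    "cds",
    "reanalysis-era5-single-levels",
    variable=["2t", "msl"],
    product_type="reanalysis",
    date="2012-12-12",
    time="12:00",
)
ds = source.to_xarray()  # the data is then handled in memory as an xarray Dataset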

feature request: let user change cache folder

We spoke about this in the S2S AI call on Thursday. Would it be possible to let the user set the location of the downloaded files, in case that location gets filled up? I thought it was /tmp but I cannot find the data there. I understand that the filenames are hashed; it's just that I don't want to fill up a location on disk that I cannot control.
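For reference, a minimal sketch of what this looks like with the cache-directory setting documented on the settings page linked in an earlier issue above (the path is a placeholder):

import climetlab as cml

# Placeholder path: point the download cache at a location you control.
cml.settings.set("cache-directory", "/path/with/enough/space")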

Adding `cos_solar_zenith_angle_integrated` function from earthkit-meteo

Is your feature request related to a problem? Please describe.

It would be great to have the cos_solar_zenith_angle_integrated function from earthkit.meteo.solar included alongside the other relevant functions in climetlab/sources/constants.py, so that it is available for the constants in the anemoi-datasets zarr creation. 🙏

Describe the solution you'd like

Include the cos_solar_zenith_angle_integrated function from earthkit.meteo.solar in climetlab/sources/constants.py. This function requires both a start and an end date-time to generate the integrated cos_sza. It would most likely use the freq parameter from anemoi-datasets, so that start_date = date - freq and end_date = date.

Describe alternatives you've considered

No response

Additional context

I am creating some land surface datasets using anemoi-datasets and require some additional variables.

Organisation

ECMWF

Collaborate with Pangeo and others in the scientific python community?

Greetings, and thanks for taking the time and effort to create open source software for the meteorology / climate communities. I sincerely commend this effort! 👏 I am particularly inspired by the stated goal of the climetlab package:

reduce boilerplate code by providing high-level unified access to meteorological and climate datasets, allowing scientists to focus on their research instead of solving technical issues.

I think it's safe to say that very many people in the software community share this goal. It has been a particular focus of the Pangeo project for many years. However, it is also a difficult goal, given the vast diversity of different data providers, catalogs, and data formats that we encounter in the wild. Therefore, I believe that collaboration is essential for achieving this goal.

In this spirit, I would like to invite the climetlab developers to collaborate with the Pangeo project and related python packages. There may be some ways we can combine efforts to deliver more effective software with a lower overall maintenance burden.

A primary possible area of collaboration would be to reduce duplication in functionality across the ecosystem. Reducing duplication is good because it:

  • Minimizes end-user confusion by aligning with the Zen of Python mantra "There should be one-- and preferably only one --obvious way to do it."
  • Maximizes the value of developer time
  • Leads to better integration across the ecosystem

In that spirit, here are some existing packages that offer functionality similar to climetlab.

Intake

https://github.com/intake/intake


Intake is a lightweight set of tools for loading and sharing data in data science projects.
Intake helps you:

  • Load data from a variety of formats (see the current list of known plugins) into containers you already know, like Pandas dataframes, Python lists, NumPy arrays, and more.
  • Convert boilerplate data loading code into reusable Intake plugins
  • Describe data sets in catalog files for easy reuse and sharing between projects and with others.
  • Share catalog information (and data sets) over the network with the Intake server

Documentation is available at Read the Docs.

Status of intake and related packages is available at Status Dashboard

Weekly news about this repo and other related projects can be found on the
wiki

Intake is the main tool we currently use in Pangeo to provide "convenient" data access in Pangeo (usually via the intake-xarray plugin). Intake has similar goals but a different architecture to climetlab's data source feature. With intake, one creates a catalog yaml file (example), which specifies the data sources and options for loading the data.

For example, to load the grib example file from the climetlab docs, I would write a yaml file like this

catalog.yaml
plugins:
  source:
    - module: intake_xarray
sources:
  sample_grib_data:
    description: Sample GRIB file
    driver: netcdf
    args:
      urlpath: 'simplecache::https://github.com/ecmwf/climetlab/raw/main/docs/examples/test.grib'
      xarray_kwargs:
        engine: cfgrib

and then open it as an xarray dataset

import intake
cat = intake.open_catalog("catalog.yaml")
cat.sample_grib_data.to_dask()

Some other intake features that may be useful to this project:

  • Intake also has a rich templating system, allowing users to parametrize their data sources. I noticed that climetlab implements similar functionality.
  • The intake-xarray module supports combining / merging multiple files into a single dataset. I noticed climetlab implements similar functionality.
  • Intake uses a plugin architecture, similar to climetlab, to allow third parties to extend it to support different data formats and drivers. (See the plugin status dashboard for an exhaustive list)
  • Intake integrates with THREDDS, a highly established catalog format in the weather / climate world, via intake-thredds
  • Intake integrates with STAC (Spatiotemporal Asset Catalog) via intake-stac, an emerging catalog format that is being used heavily in the geospatial imaging world
  • Through its integration with filesystem-spec, intake can access data via a huge range of different transfer protocols

In terms of architecture:

  • Intake defines catalogs via yaml files. To expose new datasets to intake, third parties just put a yaml file online somewhere.
  • Climetlab defines catalogs via python code. There are some built in datasets which are hard coded into the core package. To expose new datasets to climetlab, third parties must write a python package that implements an entry point, publish it, and then have users install / import it. (Question: what are the criteria for including a dataset in the "core" package as opposed to a third party entrypoint?)

My unsolicited opinion is that the climetlab approach--writing python code for each new dataset--is not scalable to the volume and diversity of meteorology / climate datasets that exist in the world. Going down that path means effectively writing code to describe the structure / layout of every dataset in the world. Leveraging established community standards for data catalogs, or allowing users to very easily create their own catalogs, seems to me as the only viable path forward.

So a specific suggestion would be to refactor climetlab to use intake internally, rather than duplicating much of intake's functionality for data downloading, caching, loading, templating, etc. This would allow you to delete lots of code 🎉 and lower your maintenance burden. New functionality in terms of data loaders could be pursued upstream as intake plugins.

Pooch

https://github.com/fatiando/pooch


Does your Python package include sample datasets? Are you shipping them with the code? Are they getting too big?

Pooch is here to help! It will manage a data registry by downloading your data files from a server only when needed and storing them locally in a data cache (a folder on your computer).

Here are Pooch's main features:

  • Pure Python and minimal dependencies.
  • Download a file only if necessary (it's not in the data cache or needs to be updated).
  • Verify download integrity through SHA256 hashes (also used to check if a file needs to be updated).
  • Designed to be extended: plug in custom download (FTP, scp, etc) and post-processing (unzip, decompress, rename) functions.
  • Includes utilities to unzip/decompress the data upon download to save loading time.
  • Can handle basic HTTP authentication (for servers that require a login) and printing download progress bars.
  • Easily set up an environment variable to overwrite the data cache location.

Are you a scientist or researcher? Pooch can help you too!

  • Automatically download your data files so you don't have to keep them in your GitHub repository.
  • Make sure everyone running the code has the same version of the data files (enforced through the SHA256 hashes).

Pooch has a much narrower scope than intake. It is extremely stable and solid if what you want to do is download remote files to a local computer. It supports many of the same protocols as climetlab, and some other ones, such as Zenodo-based DOI downloads.

Here is how pooch would be used to download the climetlab test grib data

import pooch
import xarray as xr

catalog = pooch.create(
    path=pooch.os_cache("climetlab"),
    base_url="https://github.com/ecmwf/climetlab/raw/main/docs/examples/",
    registry={
        "test.grib": "md5:6395ffca06c42b8287d4d3f0e6d14d5f"
    }
)

local_file = catalog.fetch("test.grib")
xr.open_dataset(local_file, engine="cfgrib")

Here the opportunity for climetlab is to leverage Pooch's downloading / caching capabilities, rather than duplicating similar capabilities internally.


It may be that you looked at these packages and decided that they had feature gaps or bugs that made them unusable for your project. If so, an alternate path could be to work with the upstream libraries to resolve these gaps and bugs, instead of duplicating their functionality. Part of my goal in opening this issue is to state clearly that the broader scientific python community welcomes your involvement in and contributions to upstream packages. We would benefit greatly from your expertise.

Thank you for taking the time to read my long issue. I reiterate my commendation of your efforts to provide open-source software to the community and my alignment with your vision regarding the goals of this package. I welcome a discussion on these topics or any other ways you think we could be collaborating. 🙏

Is climetlab still supported on Python 3.7?

Hi there,

I maintain a plugin and all my Python 3.7 workflows fail at the moment, whatever the OS used.
Here is an example:

Collecting climetlab
  Downloading climetlab-0.12.6.tar.gz (149 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 149.5/149.5 kB 4.1 MB/s eta 0:00:00
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'error'
  error: subprocess-exited-with-error
  
  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [6 lines of output]
      Traceback (most recent call last):
        File "<string>", line 36, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-install-3gji2bae/climetlab_83cb018ea656414aaac7ca81d54e30a1/setup.py", line 43, in <module>
          + " is not supported. Python 3.8 is required."
      Exception: Python version sys.version_info(major=3, minor=7, micro=15, releaselevel='final', serial=0) is not supported. Python 3.8 is required.
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
Error: Process completed with exit code 1.

So has Python 3.7 support for climetlab been dropped? I would like to know, because then I would need to drop it for my plugin too.

Thank you in advance,

Jonathan

`assign_coords` on xarray datasets breaks `plot_map`

First of all, thanks for developing CliMetLab. It is very convenient and useful!

I'm trying to make a crop that wraps around the longitudinal coordinates, and have been facing a few issues. So far, I've been converting the CML dataset to xarray in order to do so.

import climetlab as cml
# Dataset probably doesn't matter, but to get the plugin run `pip install climetlab-maelstrom-ens10`
cmlds = cml.load_dataset('maelstrom-ens10', date='20170101', dtype='sfc')
ds = cmlds.to_xarray()

Then, I tried to select and re-sort the data using isel/sel:

cropds = ds.isel(latitude=slice(0,60), longitude=list(range(-80, 0)) + list(range(60)))

If I do so, cml.plot_map(cropds.t2m[0,0,0,0]) yields the following result:
[image: badcrop-isel]
which is visibly broken in the middle part. Checking with xarray's own plotting methods, their plot is also broken.

Since the above is probably not the best way, I tried the xarray-recommended assign_coords:

dsc = ds.assign_coords(longitude=(((ds.longitude + 180) % 360) - 180))
dsc = dsc.sortby('longitude')

In this case, xarray itself produces correct plots:

dsc.t2m[0,0,0,0].plot.pcolormesh()

[image: goodcrop-assigncoords]

but cml.plot_map raises an exception:

AssertionError                            Traceback (most recent call last)
----> 1 cml.plot_map(dsc.t2m[0,0,0,0])
~/anaconda3/envs/py38/lib/python3.8/site-packages/climetlab/plotting/__init__.py in plot_map(data, **kwargs)
--> 111     p.plot_map(data)
~/anaconda3/envs/py38/lib/python3.8/site-packages/climetlab/plotting/__init__.py in plot_map(self, data, **kwargs)
---> 72             d = get_wrapper(d)
~/anaconda3/envs/py38/lib/python3.8/site-packages/climetlab/wrappers/__init__.py in get_wrapper(data, *args, **kwargs)
---> 52         wrapper = h(data, *args, **kwargs)
~/anaconda3/envs/py38/lib/python3.8/site-packages/climetlab/wrappers/xarray.py in wrapper(data, *args, **kwargs)
--> 110             return XArrayDatasetWrapper(data.to_dataset(), *args, **kwargs)
~/anaconda3/envs/py38/lib/python3.8/site-packages/climetlab/wrappers/xarray.py in __init__(self, data)
     45         if latitude is None or longitude is None:
---> 46             assert latitude is None and longitude is None

Thanks in advance!

Loading CSV with Pandas in CML

If a CSV with only one column of floats is loaded with CML's to_pandas(), then it interprets "." as the separator,
e.g. the file

soil_temperature
6.27001953125
5.779998779296875
6.1199951171875
6.010009765625
6.1400146484375

would be loaded as the following:
[image: Screenshot 2022-04-13 at 10 47 48]

Can't merge ECMWF perturbed and control forecast together as xarray since release 0.10

Hi there,

Since release 0.10.1, the EUMETNET benchmark plugin is partly broken because cfgrib apparently no longer accepts merging the control and perturbed ECMWF forecasts together.

I use the multi source to do that (see here), but when calling to_xarray, I get the following issue:

DatasetBuildError                         Traceback (most recent call last)
File /opt/tljh/user/envs/climetlab_test/lib/python3.9/site-packages/cfgrib/dataset.py:633, in build_dataset_components(index, errors, encode_cf, squeeze, log, read_keys, time_dims, extra_coords)
    632 try:
--> 633     dims, data_var, coord_vars = build_variable_components(
    634         var_index,
    635         encode_cf,
    636         filter_by_keys,
    637         errors=errors,
    638         squeeze=squeeze,
    639         read_keys=read_keys,
    640         time_dims=time_dims,
    641         extra_coords=extra_coords,
    642     )
    643 except DatasetBuildError as ex:
    644     # NOTE: When a variable has more than one value for an attribute we need to raise all
    645     #   the values in the file, not just the ones associated with that variable. See #54.

File /opt/tljh/user/envs/climetlab_test/lib/python3.9/site-packages/cfgrib/dataset.py:471, in build_variable_components(index, encode_cf, filter_by_keys, log, errors, squeeze, read_keys, time_dims, extra_coords)
    460 def build_variable_components(
    461     index: abc.Index[T.Any, abc.Field],
    462     encode_cf: T.Sequence[str] = (),
   (...)
    469     extra_coords: T.Dict[str, str] = {},
    470 ) -> T.Tuple[T.Dict[str, int], Variable, T.Dict[str, Variable]]:
--> 471     data_var_attrs = enforce_unique_attributes(index, DATA_ATTRIBUTES_KEYS, filter_by_keys)
    472     grid_type_keys = GRID_TYPE_MAP.get(index.getone("gridType"), [])

File /opt/tljh/user/envs/climetlab_test/lib/python3.9/site-packages/cfgrib/dataset.py:273, in enforce_unique_attributes(index, attributes_keys, filter_by_keys)
    272         fbks.append(fbk)
--> 273     raise DatasetBuildError("multiple values for key %r" % key, key, fbks)
    274 if values and values[0] not in ("undef", "unknown"):

DatasetBuildError: multiple values for key 'dataType'

During handling of the above exception, another exception occurred:

DatasetBuildError                         Traceback (most recent call last)
Input In [17], in <module>
----> 1 fcs = ds.to_xarray(xarray_open_dataset_kwargs=dict(squeeze=True))
      2 fcs

File ~/climetlab-eumetnet-postprocessing-benchmark/climetlab_eumetnet_postprocessing_benchmark/gridded/training_data_forecasts.py:398, in TrainingDataForecastSurfacePostProcessed.to_xarray(self, **kwargs)
    397 def to_xarray(self, **kwargs):
--> 398     fcs = self.source.to_xarray(**kwargs)
    399     variables = list(fcs.keys())
    400     ds_list = list()

File /opt/tljh/user/envs/climetlab_test/lib/python3.9/site-packages/climetlab/readers/grib/fieldset.py:203, in FieldSet.to_xarray(self, **kwargs)
    185     xarray_open_dataset_kwargs[key] = mix_kwargs(
    186         user=user_xarray_open_dataset_kwargs.pop(key, {}),
    187         default={"errors": "raise"},
   (...)
    190         logging_main_key=key,
    191     )
    192 xarray_open_dataset_kwargs.update(
    193     mix_kwargs(
    194         user=user_xarray_open_dataset_kwargs,
   (...)
    200     )
    201 )
--> 203 result = xr.open_dataset(
    204     self,
    205     **xarray_open_dataset_kwargs,
    206 )
    208 def math_prod(lst):
    209     if not hasattr(math, "prod"):
    210         # python 3.7 does not have math.prod

File /opt/tljh/user/envs/climetlab_test/lib/python3.9/site-packages/xarray/backends/api.py:495, in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, backend_kwargs, *args, **kwargs)
    483 decoders = _resolve_decoders_kwargs(
    484     decode_cf,
    485     open_backend_dataset_parameters=backend.open_dataset_parameters,
   (...)
    491     decode_coords=decode_coords,
    492 )
    494 overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None)
--> 495 backend_ds = backend.open_dataset(
    496     filename_or_obj,
    497     drop_variables=drop_variables,
    498     **decoders,
    499     **kwargs,
    500 )
    501 ds = _dataset_from_backend_dataset(
    502     backend_ds,
    503     filename_or_obj,
   (...)
    510     **kwargs,
    511 )
    512 return ds

File /opt/tljh/user/envs/climetlab_test/lib/python3.9/site-packages/cfgrib/xarray_plugin.py:99, in CfGribBackend.open_dataset(self, filename_or_obj, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta, lock, indexpath, filter_by_keys, read_keys, encode_cf, squeeze, time_dims, errors, extra_coords)
     77 def open_dataset(
     78     self,
     79     filename_or_obj: T.Union[str, abc.MappingFieldset[T.Any, abc.Field]],
   (...)
     96     extra_coords: T.Dict[str, str] = {},
     97 ) -> xr.Dataset:
---> 99     store = CfGribDataStore(
    100         filename_or_obj,
    101         indexpath=indexpath,
    102         filter_by_keys=filter_by_keys,
    103         read_keys=read_keys,
    104         encode_cf=encode_cf,
    105         squeeze=squeeze,
    106         time_dims=time_dims,
    107         lock=lock,
    108         errors=errors,
    109         extra_coords=extra_coords,
    110     )
    111     with xr.core.utils.close_on_error(store):
    112         vars, attrs = store.load()  # type: ignore

File /opt/tljh/user/envs/climetlab_test/lib/python3.9/site-packages/cfgrib/xarray_plugin.py:39, in CfGribDataStore.__init__(self, filename, lock, **backend_kwargs)
     37 else:
     38     opener = dataset.open_fieldset
---> 39 self.ds = opener(filename, **backend_kwargs)

File /opt/tljh/user/envs/climetlab_test/lib/python3.9/site-packages/cfgrib/dataset.py:730, in open_fieldset(fieldset, indexpath, filter_by_keys, read_keys, time_dims, extra_coords, computed_keys, log, **kwargs)
    728 index = messages.FieldsetIndex.from_fieldset(fieldset, index_keys, computed_keys)
    729 filtered_index = index.subindex(filter_by_keys)
--> 730 return open_from_index(filtered_index, read_keys, time_dims, extra_coords, **kwargs)

File /opt/tljh/user/envs/climetlab_test/lib/python3.9/site-packages/cfgrib/dataset.py:706, in open_from_index(index, read_keys, time_dims, extra_coords, **kwargs)
    699 def open_from_index(
    700     index: abc.Index[T.Any, abc.Field],
    701     read_keys: T.Sequence[str] = (),
   (...)
    704     **kwargs: T.Any,
    705 ) -> Dataset:
--> 706     dimensions, variables, attributes, encoding = build_dataset_components(
    707         index, read_keys=read_keys, time_dims=time_dims, extra_coords=extra_coords, **kwargs
    708     )
    709     return Dataset(dimensions, variables, attributes, encoding)

File /opt/tljh/user/envs/climetlab_test/lib/python3.9/site-packages/cfgrib/dataset.py:654, in build_dataset_components(index, errors, encode_cf, squeeze, log, read_keys, time_dims, extra_coords)
    652         fbks.append(fbk)
    653         error_message += "\n    filter_by_keys=%r" % fbk
--> 654     raise DatasetBuildError(error_message, key, fbks)
    655 short_name = data_var.attributes.get("GRIB_shortName", "paramId_%d" % param_id)
    656 var_name = data_var.attributes.get("GRIB_cfVarName", "unknown")

DatasetBuildError: multiple values for unique key, try re-open the file with one of:
    filter_by_keys={'dataType': 'pf'}
    filter_by_keys={'dataType': 'cf'}

So the problem seems to be that it has become stricter about multiple values for keys, using the enforce_unique_attributes function.

cfgrib version is 0.9.10.

How can I resolve this issue? Can I tell it to be more forgiving about attribute uniqueness?

Thank you in advance.


To reproduce, simply install the plugin

pip install climetlab climetlab-eumetnet-postprocessing-benchmark

and run the notebook https://github.com/Climdyn/climetlab-eumetnet-postprocessing-benchmark/tree/main/notebooks/demo_ensemble_forecasts.ipynb .

Request for Enhanced Documentation on Diverse Data Retrieval in ClimetLab

What maintenance does this project need?

I am a PhD student in Atmospheric Sciences at the University of São Paulo. I recently discovered CliMetLab during the Machine Learning MOOC by ECMWF and believe it has great potential for my research. However, I've encountered challenges due to the lack of detailed examples in the documentation, particularly regarding the retrieval of various data sources such as ERA5 pressure level data and ERA5 surface fluxes.

While the existing documentation provides a good starting point, it would be immensely beneficial for users like myself if it included more comprehensive examples and guidelines for accessing different types of datasets. Such enhancements would not only aid research efficiency but also broaden the user base of CliMetLab within the atmospheric science community.

I appreciate your consideration of this request and look forward to any updates that could make CliMetLab even more user-friendly and versatile for researchers.

Organisation

University of São Paulo - Institute of Astronomy, Geophysics and Atmospheric Sciences

add support for loading single-precision data from GRIB files

@floriankrb - hi Florian, it'd be great if we could load data from GRIB files (like ERA5 / WeatherBench) in single precision (float32) instead of np.float64, which appears to be the current default. Reading float64s and converting to float32 for ML seems rather wasteful. I'd be happy to assist with this if possible. Cheers, ~M
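For reference, a minimal sketch of the current workaround this request wants to avoid, assuming the fieldset exposes to_numpy() and returns float64 arrays as described above:

import numpy as np
import climetlab as cml

fields = cml.load_source("file", "test.grib")
arr64 = fields.to_numpy()           # float64 by default, per the issue
arr32 = arr64.astype(np.float32)    # convert after reading: the wasteful step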

TypeError while saving a plot to SVG

While using cml.plot_map with a path argument to save the plot as SVG, the following error appears, even though the plot is successfully saved to a .svg file. This error does not appear when saving the file to pdf or png.

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-35-aab10829c1c0> in <module>()
      1 # cml.plot_map(data, title=True)
----> 2 cml.plot_map(data, title=True, path="test-grib-plot.svg")

/usr/local/lib/python3.7/dist-packages/climetlab/plotting/__init__.py in plot_map(data, **kwargs)
    176     p = new_plot(**kwargs)
    177     p.plot_map(data)
--> 178     p.show()
    179 
    180 

/usr/local/lib/python3.7/dist-packages/climetlab/plotting/__init__.py in show(self)
    120 
    121     def show(self):
--> 122         self.backend.show(display=display)
    123 
    124     def macro(self) -> list:

/usr/local/lib/python3.7/dist-packages/climetlab/plotting/backends/magics/backend.py in show(self, display)
    327             Display = Image  # noqa: N806
    328 
--> 329         return display(Display(path, metadata=dict(width=width)))
    330 
    331     def save(self, path):

TypeError: __init__() got an unexpected keyword argument 'metadata'

Steps to reproduce

# download the grib file that comes with CliMetLab
!wget https://raw.githubusercontent.com/ecmwf/climetlab/develop/docs/examples/test.grib

Load the file and try to plot it, also providing a path so that the plot is saved as SVG:

import climetlab as cml

data = cml.load_source("file", "test.grib")

cml.plot_map(data, title=True, path="test-grib-plot.svg")

Directions for Contributors

Hi @floriankrb, CliMetLab's documentation needs work, and a new contributor could perhaps also start by writing test cases for new features. What would be your advice for someone looking to contribute to CliMetLab? Where should they start?

I'm passionate about working on Climate Change and CliMetLab seems like a perfect start. Pedro recommended CliMetLab to me.

Example of `plot_map` method in a climetlab dataset package

Is your feature request related to a problem? Please describe.

Hi, this is a question and not a feature request. I have been playing with creating a dataset Python package for climetlab. The loader works fine, but I haven't properly worked out the best way to implement plot_map(...) as a method of my dataset class.

Is there a best-practice example package that you can point me to that implements plot_map(...) within one of the dataset plugins that already exist? Thanks

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response

Organisation

STFC-CEDA

sqlite3.OperationalError: database is locked

Moved from: ecmwf-lab/climetlab-s2s-ai-challenge#32

I get the following error when downloading the new observations. I had switched to a new cache location beforehand.

cml.load_dataset(f"s2s-ai-challenge-test-output-reference", parameter=['tp','t2m'], date=xr.cftime_range(start='20200102',freq='7D', periods=53).strftime('%Y%m%d').to_list()).to_xarray()

  4%|████▎                                                                                                              | 4/106 [00:08<03:31,  2.07s/it]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/climetlab/datasets/__init__.py", line 220, in load_dataset
    ds = klass(*args, **kwargs).mutate()
  File "/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/climetlab_s2s_ai_challenge/observations.py", line 96, in __init__
    PreprocessedObservations.__init__(self, *args, dataset="test-output-reference", **kwargs)
  File "/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/climetlab/normalize.py", line 137, in inner
    return func(**normalized)
  File "/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/climetlab_s2s_ai_challenge/observations.py", line 75, in __init__
    self.source = cml.load_source("url-pattern", PATTERN_OBS, request, merger=S2sMerger(engine="netcdf4"))
  File "/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/climetlab/sources/__init__.py", line 133, in load_source
    return source(name, *args, **kwargs).mutate()
  File "/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/climetlab/sources/__init__.py", line 102, in __call__
    source = klass(*args, **kwargs)
  File "/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/climetlab/sources/url_pattern.py", line 37, in __init__
    sources = list(tqdm(futures, leave=True, total=len(urls)))
  File "/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/tqdm/std.py", line 1130, in __iter__
    for obj in iterable:
  File "/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/concurrent/futures/_base.py", line 598, in result_iterator
    yield fs.pop().result()
  File "/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/concurrent/futures/_base.py", line 435, in result
    return self.__get_result()
  File "/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/climetlab/sources/url_pattern.py", line 31, in url_to_source
    return Url(url)
  File "/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/climetlab/sources/url.py", line 51, in __init__
    self.path = self.cache_file(url, extension=ext)
  File "/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/climetlab/sources/__init__.py", line 50, in cache_file
    return cache_file(owner, *args, extension=extension)
  File "/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/climetlab/core/caching.py", line 297, in cache_file
    update_cache()
  File "/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/climetlab/core/caching.py", line 185, in update_cache
    db.executemany("UPDATE cache SET size=?, type=? WHERE path=?", update)
sqlite3.OperationalError: database is locked
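A workaround that sometimes helps when several processes or jobs share the same cache (a hedged sketch, assuming the cache-directory setting described in the CliMetLab documentation is available in your version) is to give each job its own cache directory, so the SQLite index files do not compete for the same lock:

import climetlab as cml

# Hypothetical per-job cache path; adjust to your filesystem.
cml.settings.set("cache-directory", "/work/mh0727/m300524/climetlab-cache-job1")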

Feature preparation for weather data to use in AI/ML application

I am following an invitation from @floriankrb and Peter Dueben to share some insights into how I tackle some issues when working with weather data and AI.

We do not know yet whether this is actually helpful, but I think we need well-structured and unified preprocessing for all users, and climetlab could be the place for it.

The two points where I think weather data can easily be treated the wrong way are:

  • aligning forecast data with measurement/observation/target data
  • structuring data for algorithms that require sequences

For both issues I have built solutions, and I hope you can validate my approach.

Alignment of features

from typing import Tuple
import pandas as pd

COLUMN_DT_FORE = 'dt_fore'

def align_features(forecast_data: pd.DataFrame, target_data: pd.DataFrame) -> Tuple[pd.DataFrame, pd.DataFrame]:
    """
    Takes both predictors and target values and derives the intersection
    of both to create two matching dataframes, keyed on dt_fore.

    forecast_data has a MultiIndex with dt_calc, dt_fore, positional_index
    dt_calc: init/calculation run timestamp
    dt_fore: forecast lead (valid) timestamp
    positional_index: location-based indexer
    """
    _target_data = []
    _target_index = []
    _rows_to_take = []
    for dt_fore in forecast_data.index.get_level_values(COLUMN_DT_FORE):
        try:
            _target_data.append(target_data.loc[dt_fore, :].values)
            _target_index.append(dt_fore)
            _rows_to_take.append(True)
        except KeyError:
            _rows_to_take.append(False)

    forecast_features = forecast_data.loc[_rows_to_take, :]
    target = pd.DataFrame(_target_data, index=_target_index)
    return forecast_features, target
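
To make the expected input and output concrete, here is a minimal usage sketch; the column names, timestamps and values are made up for illustration and are not part of the original post:

import pandas as pd

idx = pd.MultiIndex.from_product(
    [
        pd.to_datetime(["2021-01-01 00:00", "2021-01-01 12:00"]),  # dt_calc
        pd.to_datetime(["2021-01-01 06:00", "2021-01-01 18:00"]),  # dt_fore
        [0],                                                       # positional_index
    ],
    names=["dt_calc", "dt_fore", "positional_index"],
)
forecast = pd.DataFrame({"t2m_fc": [280.1, 281.3, 280.8, 282.0]}, index=idx)

# Observations are only available for one of the forecast valid times.
obs = pd.DataFrame({"t2m_obs": [280.5]}, index=pd.to_datetime(["2021-01-01 06:00"]))

features, target = align_features(forecast, obs)
# `features` keeps only the rows whose dt_fore exists in `obs`;
# `target` holds the matching observations, row-aligned with `features`.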

Preprocess data according to sequences

This topic is relevant if you would like to use recurrent neural networks such as LSTMs, or convolutional layers.

import pandas as pd

COLUMN_POSITIONAL_INDEX = 'positional_index'
COLUMN_DT_CALC = 'dt_calc'

def pre_process_lstm_dataframe_with_forecast_data(
    data: pd.DataFrame,
    lstm_sequence_length: int,
) -> pd.DataFrame:
    """
    This pre-processing step builds sequences of length lstm_sequence_length for data that contains forecasts.
    A forecast dataset is characterized by a number of dt_calc values with several dt_fore values for each dt_calc.

    Note: this function requires equally spaced time intervals.

    Args:
          data: pd.DataFrame with a MultiIndex
          lstm_sequence_length: historical length of the sequence, in time steps

    Returns:
         dataframe with array-like objects as entries

    """

    def seq(n):
        """generator object to pre process data for use in lstm"""
        df = data.reset_index()
        for g in df.groupby(
            [COLUMN_POSITIONAL_INDEX, COLUMN_DT_CALC], sort=False
        ).rolling(n):
            yield g[data.columns].to_numpy().T if len(g) == n else []

    return pd.DataFrame(
        seq(lstm_sequence_length), index=data.index, columns=data.columns
    ).dropna()
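
For gridded or array data, a lighter-weight way to build fixed-length sequences is a plain NumPy sliding window; this is a hedged alternative sketch, not part of the original approach above:

import numpy as np

def make_sequences(values: np.ndarray, n: int) -> np.ndarray:
    """Return sequences of length n from an array of shape (time, features)."""
    # sliding_window_view gives shape (time - n + 1, features, n);
    # move the window axis so the result is (samples, n, features).
    windows = np.lib.stride_tricks.sliding_window_view(values, n, axis=0)
    return np.moveaxis(windows, -1, 1)

x = np.arange(12, dtype=float).reshape(6, 2)  # 6 time steps, 2 features
seqs = make_sequences(x, 3)                   # shape (4, 3, 2)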

As you can see, I am working with historical point forecasts, but I think this should work for arrays as well. In the end, any 2D data can be transformed into such a DataFrame, although for array data pandas is probably not the best tool for the job. I am sure there are smarter solutions than the ones I am presenting here.

From my point of view, these are the most important steps and the main differences from ordinary ML applications. Please let me know what you think about the topic.

Please note that I am working hard to establish our company, alitiq, so my time to contribute operational code to climetlab is limited. I will do my best to share knowledge and best practices.

I am really looking forward to discussing this with you.

cml.plot_map doesn't show map

Hello,

In the ECMWF e-learning Tier 2 course, this code doesn't show any map, only the xarray dataset:

ds = cml.load_source('file', 'test.grib')
for field in ds:
    cml.plot_map(field)
ds.to_xarray()

I run the code on my PC with this configuration:

python_version()
3.8.8

!climetlab versions
IPython            7.22.0
Magics             1.5.8
abc                c:\users\salmiah\anaconda3\lib\abc.py
argparse           1.1
asyncssh           missing
atexit             builtin
branca             0.5.0
bz2                c:\users\salmiah\anaconda3\lib\bz2.py
calendar           c:\users\salmiah\anaconda3\lib\calendar.py
cdsapi             0.6.1
climetlab          0.13.15
cmd                c:\users\salmiah\anaconda3\lib\cmd.py
codc               missing
code               c:\users\salmiah\anaconda3\lib\code.py
collections        c:\users\salmiah\anaconda3\lib\collections\__init__.py
contextlib         c:\users\salmiah\anaconda3\lib\contextlib.py
copy               c:\users\salmiah\anaconda3\lib\copy.py
csv                1.0
ctypes             1.1.0
dask               2021.4.0
dask_jobqueue      missing
datetime           c:\users\salmiah\anaconda3\lib\datetime.py
dateutil           2.8.2
distributed        2021.4.0
docutils           0.17
eccodes            1.5.2
ecmwf              namespace
ecmwfapi           1.6.3
ecmwflibs          0.5.1
entrypoints        0.3
filelock           3.0.12
findlibs           0.0.2
fnmatch            c:\users\salmiah\anaconda3\lib\fnmatch.py
folium             0.12.1.post1
functools          c:\users\salmiah\anaconda3\lib\functools.py
getpass            c:\users\salmiah\anaconda3\lib\getpass.py
glob               c:\users\salmiah\anaconda3\lib\glob.py
gribapi            2.27.0
gzip               c:\users\salmiah\anaconda3\lib\gzip.py
h5py               2.10.0
hashlib            c:\users\salmiah\anaconda3\lib\hashlib.py
imageio            2.9.0
importlib          c:\users\salmiah\anaconda3\lib\importlib\__init__.py
importlib_metadata 6.1.0
inspect            c:\users\salmiah\anaconda3\lib\inspect.py
io                 c:\users\salmiah\anaconda3\lib\io.py
ipywidgets         7.6.3
itertools          builtin
jinja2             2.11.3
json               2.0.9
logging            0.5.1.2
lzma               c:\users\salmiah\anaconda3\lib\lzma.py
markdown           3.4.3
math               builtin
metview            missing
mimetypes          c:\users\salmiah\anaconda3\lib\mimetypes.py
multiprocessing    c:\users\salmiah\anaconda3\lib\multiprocessing\__init__.py
multiurl           0.2.1
numbers            c:\users\salmiah\anaconda3\lib\numbers.py
numpngw            0.1.2
numpy              1.24.2
oauthlib           3.2.2
os                 c:\users\salmiah\anaconda3\lib\os.py
pandas             2.0.0
pathlib            c:\users\salmiah\anaconda3\lib\pathlib.py
pdbufr             0.10.1
pickle             c:\users\salmiah\anaconda3\lib\pickle.py
pkgutil            c:\users\salmiah\anaconda3\lib\pkgutil.py
platform           1.0.8
pyfdb              missing
pyodc              1.1.4
pytest             6.2.3
random             c:\users\salmiah\anaconda3\lib\random.py
re                 2.2.1
readline           c:\users\salmiah\anaconda3\lib\site-packages\readline.py
requests           2.25.1
requests_oauthlib  1.3.1
s3fs               missing
scipy              1.8.1
shlex              c:\users\salmiah\anaconda3\lib\shlex.py
shutil             c:\users\salmiah\anaconda3\lib\shutil.py
socket             c:\users\salmiah\anaconda3\lib\socket.py
sqlite3            2.6.0
stat               c:\users\salmiah\anaconda3\lib\stat.py
subprocess         c:\users\salmiah\anaconda3\lib\subprocess.py
sys                builtin
tarfile            0.9.0
tempfile           c:\users\salmiah\anaconda3\lib\tempfile.py
tensorflow         2.10.0
termcolor          2.2.0
textwrap           c:\users\salmiah\anaconda3\lib\textwrap.py
threading          c:\users\salmiah\anaconda3\lib\threading.py
time               builtin
torch              missing
tqdm               4.59.0
traceback          c:\users\salmiah\anaconda3\lib\traceback.py
typing             c:\users\salmiah\anaconda3\lib\typing.py
unittest           c:\users\salmiah\anaconda3\lib\unittest\__init__.py
urllib             c:\users\salmiah\anaconda3\lib\urllib\__init__.py
warnings           c:\users\salmiah\anaconda3\lib\warnings.py
weakref            c:\users\salmiah\anaconda3\lib\weakref.py
xarray             2023.1.0
xml                c:\users\salmiah\anaconda3\lib\xml\__init__.py
yaml               5.4.1
zarr               missing
zipfile            c:\users\salmiah\anaconda3\lib\zipfile.py

Is there something wrong?
Thanks
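
One thing worth checking (a hedged suggestion, not a confirmed fix): when the plots do not appear inline, you can ask climetlab to write them to files instead and verify that plotting itself works. The path argument is assumed from the CliMetLab plotting examples:

ds = cml.load_source('file', 'test.grib')
for i, field in enumerate(ds):
    # Writes each rendered map to a PNG file instead of relying on inline display.
    cml.plot_map(field, path=f'map-{i:03d}.png')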

ValueError: cannot include dtype 'M' in a buffer

When I run this command:
parameter = ['tp', 't2m']
ds = cml.load_source("file-pattern", data_path+"{var}.nc",var = parameter).to_xarray()

I get the error: ValueError: cannot include dtype 'M' in a buffer

The complete error message is:


ValueError Traceback (most recent call last)
/tmp/ipykernel_222242/146458586.py in <module>
7 #parameter = ['t2m']
8 parameter = ['tp', 't2m']
----> 9 ds = cml.load_source("file-pattern", data_path+"{var}.nc",var = parameter).to_xarray()
10 print(ds)

~/anaconda3/envs/S2SAI/lib/python3.7/site-packages/climetlab/sources/multi.py in to_xarray(self, **kwargs)
97
98 def to_xarray(self, **kwargs):
---> 99 return make_merger(self.merger, self.sources).to_xarray(**kwargs)
100
101 def to_tfdataset(self, **kwargs):

~/anaconda3/envs/S2SAI/lib/python3.7/site-packages/climetlab/mergers/__init__.py in to_xarray(self, **kwargs)
101 paths=self.paths,
102 reader_class=self.reader_class,
--> 103 **kwargs,
104 )
105

~/anaconda3/envs/S2SAI/lib/python3.7/site-packages/climetlab/mergers/xarray.py in merge(sources, paths, reader_class, **kwargs)
75 return reader_class.to_xarray_multi_from_paths(
76 paths,
---> 77 **options,
78 )
79

~/anaconda3/envs/S2SAI/lib/python3.7/site-packages/climetlab/readers/netcdf.py in to_xarray_multi_from_paths(cls, paths, **kwargs)
331 return xr.open_mfdataset(
332 paths,
--> 333 **options,
334 )
335

~/anaconda3/envs/S2SAI/lib/python3.7/site-packages/xarray/backends/api.py in open_mfdataset(paths, chunks, concat_dim, compat, preprocess, engine, data_vars, coords, combine, parallel, join, attrs_file, combine_attrs, **kwargs)
940 coords=coords,
941 join=join,
--> 942 combine_attrs=combine_attrs,
943 )
944 else:

~/anaconda3/envs/S2SAI/lib/python3.7/site-packages/xarray/core/combine.py in combine_by_coords(data_objects, compat, data_vars, coords, fill_value, join, combine_attrs, datasets)
989 fill_value=fill_value,
990 join=join,
--> 991 combine_attrs=combine_attrs,
992 )

~/anaconda3/envs/S2SAI/lib/python3.7/site-packages/xarray/core/merge.py in merge(objects, compat, join, fill_value, combine_attrs)
908 join,
909 combine_attrs=combine_attrs,
--> 910 fill_value=fill_value,
911 )
912 return Dataset._construct_direct(**merge_result._asdict())

~/anaconda3/envs/S2SAI/lib/python3.7/site-packages/xarray/core/merge.py in merge_core(objects, compat, join, combine_attrs, priority_arg, explicit_coords, indexes, fill_value)
639 prioritized = _get_priority_vars_and_indexes(aligned, priority_arg, compat=compat)
640 variables, out_indexes = merge_collected(
--> 641 collected, prioritized, compat=compat, combine_attrs=combine_attrs
642 )
643 assert_unique_multiindex_level_names(variables)

~/anaconda3/envs/S2SAI/lib/python3.7/site-packages/xarray/core/merge.py in merge_collected(grouped, prioritized, compat, combine_attrs)
240 variables = [variable for variable, _ in elements_list]
241 try:
--> 242 merged_vars[name] = unique_variable(name, variables, compat)
243 except MergeError:
244 if compat != "minimal":

~/anaconda3/envs/S2SAI/lib/python3.7/site-packages/xarray/core/merge.py in unique_variable(name, variables, compat, equals)
144 out = out.compute()
145 for var in variables[1:]:
--> 146 equals = getattr(out, compat)(var)
147 if not equals:
148 break

~/anaconda3/envs/S2SAI/lib/python3.7/site-packages/xarray/core/variable.py in no_conflicts(self, other, equiv)
1976 or both, contain NaN values.
1977 """
-> 1978 return self.broadcast_equals(other, equiv=equiv)
1979
1980 def quantile(

~/anaconda3/envs/S2SAI/lib/python3.7/site-packages/xarray/core/variable.py in broadcast_equals(self, other, equiv)
1958 except (ValueError, AttributeError):
1959 return False
-> 1960 return self.equals(other, equiv=equiv)
1961
1962 def identical(self, other, equiv=duck_array_ops.array_equiv):

~/anaconda3/envs/S2SAI/lib/python3.7/site-packages/xarray/core/variable.py in equals(self, other, equiv)
1942 try:
1943 return self.dims == other.dims and (
-> 1944 self._data is other._data or equiv(self.data, other.data)
1945 )
1946 except (TypeError, AttributeError):

~/anaconda3/envs/S2SAI/lib/python3.7/site-packages/xarray/core/duck_array_ops.py in array_notnull_equiv(arr1, arr2)
270 with warnings.catch_warnings():
271 warnings.filterwarnings("ignore", "In the future, 'NAT == x'")
--> 272 flag_array = (arr1 == arr2) | isnull(arr1) | isnull(arr2)
273 return bool(flag_array.all())
274 else:

~/anaconda3/envs/S2SAI/lib/python3.7/site-packages/dask/array/core.py in eq(self, other)
1745
1746 def eq(self, other):
-> 1747 return elemwise(operator.eq, self, other)
1748
1749 def gt(self, other):

~/anaconda3/envs/S2SAI/lib/python3.7/site-packages/dask/array/core.py in elemwise(op, *args, **kwargs)
3774 for a in args
3775 ),
-> 3776 **blockwise_kwargs
3777 )
3778

~/anaconda3/envs/S2SAI/lib/python3.7/site-packages/dask/array/blockwise.py in blockwise(func, out_ind, name, token, dtype, adjust_chunks, new_axes, align_arrays, concatenate, meta, *args, **kwargs)
143
144 if align_arrays:
--> 145 chunkss, arrays = unify_chunks(*args)
146 else:
147 arginds = [(a, i) for (a, i) in toolz.partition(2, args) if i is not None]

~/anaconda3/envs/S2SAI/lib/python3.7/site-packages/dask/array/core.py in unify_chunks(*args, **kwargs)
3043
3044 arginds = [
-> 3045 (asanyarray(a) if ind is not None else a, ind) for a, ind in partition(2, args)
3046 ] # [x, ij, y, jk]
3047 args = list(concat(arginds)) # [(x, ij), (y, jk)]

~/anaconda3/envs/S2SAI/lib/python3.7/site-packages/dask/array/core.py in <listcomp>(.0)
3043
3044 arginds = [
-> 3045 (asanyarray(a) if ind is not None else a, ind) for a, ind in partition(2, args)
3046 ] # [x, ij, y, jk]
3047 args = list(concat(arginds)) # [(x, ij), (y, jk)]

~/anaconda3/envs/S2SAI/lib/python3.7/site-packages/dask/array/core.py in asanyarray(a)
3618 elif hasattr(a, "to_dask_array"):
3619 return a.to_dask_array()
-> 3620 elif hasattr(a, "data") and type(a).__module__.startswith("xarray."):
3621 return asanyarray(a.data)
3622 elif isinstance(a, (list, tuple)) and any(isinstance(i, Array) for i in a):

ValueError: cannot include dtype 'M' in a buffer

Can someone help me find the reason for the error? Thank you very much.

When I run this command to open only one file, it works fine:
parameter = ['tp']
ds = cml.load_source("file-pattern", data_path+"{var}.nc",var = parameter).to_xarray()

The tp.nc data information is:
<xarray.Dataset>
Dimensions:        (category: 3, forecast_time: 53, lead_time: 2, latitude: 121, longitude: 240)
Coordinates:
  * category       (category) object 'below normal' 'near normal' 'above normal'
  * forecast_time  (forecast_time) datetime64[ns] 2020-01-02 ... 2020-12-31
  * lead_time      (lead_time) timedelta64[ns] 14 days 28 days
  * latitude       (latitude) float64 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0
  * longitude      (longitude) float64 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
    valid_time     (forecast_time, lead_time) datetime64[ns] dask.array<chunksize=(53, 2), meta=np.ndarray>
Data variables:
    tp             (category, forecast_time, lead_time, latitude, longitude) float32 dask.array<chunksize=(3, 53, 2, 121, 240), meta=np.ndarray>
Attributes:
    GRIB_edition:            1
    GRIB_centre:             ecmf
    GRIB_centreDescription:  European Centre for Medium-Range Weather Forecasts
    GRIB_subCentre:          0
    Conventions:             CF-1.7
    institution:             European Centre for Medium-Range Weather Forecasts
    history:                 2021-06-04T12:48 GRIB to CDM+CF via cfgrib-0.9.9...
    comment_lead_time:       The value of valid_time does not refer to the da...

The t2m.nc data information is:
<xarray.Dataset>
Dimensions:        (category: 3, forecast_time: 53, lead_time: 2, latitude: 121, longitude: 240)
Coordinates:
  * category       (category) object 'below normal' 'near normal' 'above normal'
  * forecast_time  (forecast_time) datetime64[ns] 2020-01-02 ... 2020-12-31
  * lead_time      (lead_time) timedelta64[ns] 14 days 28 days
  * latitude       (latitude) float64 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0
  * longitude      (longitude) float64 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
    valid_time     (forecast_time, lead_time) datetime64[ns] dask.array<chunksize=(53, 2), meta=np.ndarray>
Data variables:
    t2m            (category, forecast_time, lead_time, latitude, longitude) float32 dask.array<chunksize=(3, 53, 2, 121, 240), meta=np.ndarray>
Attributes:
    GRIB_edition:            1
    GRIB_centre:             ecmf
    GRIB_centreDescription:  European Centre for Medium-Range Weather Forecasts
    GRIB_subCentre:          0
    Conventions:             CF-1.7
    institution:             European Centre for Medium-Range Weather Forecasts
    history:                 2021-06-04T12:48 GRIB to CDM+CF via cfgrib-0.9.9...
    comment_lead_time:       The value of valid_time does not refer to the da...

Finally, can someone help me find the reason for the error? Thank you very much.
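
A hedged workaround sketch (not a confirmed fix): since each file opens fine on its own, you could try opening the two files directly with xarray and relaxing the comparison of the dask-backed valid_time coordinate that triggers the error; compat="override" simply takes that variable from the first file:

import xarray as xr

paths = [data_path + "tp.nc", data_path + "t2m.nc"]  # data_path as defined above
ds = xr.open_mfdataset(paths, combine="by_coords", compat="override")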

INFO Request is queued

I am trying to access ERA5 data using both cdsapi and climetlab, just replicating the example notebook here (https://climetlab.readthedocs.io/en/latest/examples/03-source-cds.html) on my laptop as well as via the provided Binder link.

The code works fine to plot 2m temperature and mean sea level pressure as shown in the example notebook. However, the moment I change one of the requested variables, the notebook code gets stuck indefinitely with the following messages.

source = cml.load_source(
    "cds",
    "reanalysis-era5-single-levels",
    variable=["2t", "blh"],
    product_type="reanalysis",
    area=[50, -50, 20, 50],
    date="2012-12-12",
    time="12:00",
    format="netcdf",
)
for s in source:
    cml.plot_map(s)

2021-10-25 19:31:26,858 INFO Lock 140013250624784 acquired on /tmp/climetlab-jovyan/cdsretriever-d1b901f874b7ac4e804d713920fc5c40fa5f10d2ca221b395896a72f6c9ca41b.nc.lock
2021-10-25 19:31:26,962 INFO Welcome to the CDS
2021-10-25 19:31:26,962 INFO Sending request to https://cds.climate.copernicus.eu/api/v2/resources/reanalysis-era5-single-levels
2021-10-25 19:31:27,080 INFO Request is queued

The messages above are from the Binder notebook, but I get the same result on my laptop as well.
