ecmwf / cfgrib
A Python interface to map GRIB files to the NetCDF Common Data Model following the CF Convention using ecCodes
License: Apache License 2.0
@darrenleeweber has reported on #2 a problem that deserves its own issue:
Thanks, also bumped into this while trying to read a GFS grib2 file, e.g.
import cfgrib
ds = cfgrib.Dataset.frompath('gfs_4_20110807_0000_000.grb2')
# snipped traceback
~/anaconda3/lib/python3.6/site-packages/cfgrib/dataset.py in enforce_unique_attributes(index, attributes_keys)
113 values = index[key]
114 if len(values) > 1:
--> 115 raise ValueError("multiple values for unique attribute %r: %r" % (key, values))
116 if values and values[0] not in ('undef', 'unknown'):
117 attributes['GRIB_' + key] = values[0]
ValueError: multiple values for unique attribute 'typeOfLevel': ['isobaricInhPa', 'tropopause', 'maxWind', 'isothermZero', 'unknown', 'potentialVorticity']
The workaround seems to work, but it hits another snag for this particular data example, i.e.
ds = cfgrib.Dataset.frompath('gfs_4_20110807_0000_000.grb2', filter_by_keys={'typeOfLevel': 'isobaricInhPa'})
# snipped traceback
~/anaconda3/lib/python3.6/site-packages/cfgrib/dataset.py in build_dataset_components(stream, encode_parameter, encode_time, encode_vertical, encode_geography, filter_by_keys)
374 vars = collections.OrderedDict([(short_name, data_var)])
375 vars.update(coord_vars)
--> 376 dict_merge(dimensions, dims)
377 dict_merge(variables, vars)
378 attributes = enforce_unique_attributes(index, GLOBAL_ATTRIBUTES_KEYS)
~/anaconda3/lib/python3.6/site-packages/cfgrib/dataset.py in dict_merge(master, update)
353 else:
354 raise ValueError("key present and new value is different: "
--> 355 "key=%r value=%r new_value=%r" % (key, master[key], value))
356
357
ValueError: key present and new value is different: key='air_pressure' value=26 new_value=25
It's not easy to figure out whether this is a cfgrib problem or the data is non-conforming.
It would be useful to be able to access the following constants defined in ecCodes from cfgrib:
Dear all,
lately I often receive the following error:
File "/app/src/modules/decoder/grib_decoder.py", line 18, in __init__
self._data = xarray_store.open_dataset(str(file_name))
File "/usr/local/lib/python3.6/site-packages/cfgrib/xarray_store.py", line 47, in open_dataset
store = cfgrib_.CfGribDataStore(path, **real_backend_kwargs)
File "/usr/local/lib/python3.6/site-packages/cfgrib/cfgrib_.py", line 78, in __init__
self.ds = cfgrib.open_file(filename, mode='r', **backend_kwargs)
File "/usr/local/lib/python3.6/site-packages/cfgrib/dataset.py", line 452, in open_file
return Dataset.from_path(path, **kwargs)
File "/usr/local/lib/python3.6/site-packages/cfgrib/dataset.py", line 447, in from_path
return cls(*build_dataset_components(stream, **flavour_kwargs))
File "/usr/local/lib/python3.6/site-packages/cfgrib/dataset.py", line 395, in build_dataset_components
index = stream.index(ALL_KEYS).subindex(filter_by_keys)
File "/usr/local/lib/python3.6/site-packages/cfgrib/messages.py", line 211, in index
return FileIndex.from_filestream(filestream=self, index_keys=index_keys)
File "/usr/local/lib/python3.6/site-packages/cfgrib/messages.py", line 228, in from_filestream
schema = make_message_schema(filestream.first(), index_keys)
File "/usr/local/lib/python3.6/site-packages/cfgrib/messages.py", line 208, in first
return next(iter(self))
StopIteration
Why does cfgrib raise a StopIteration error? What's wrong?
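For what it's worth, a StopIteration from FileStream.first() (i.e. from next(iter(self))) means no GRIB message could be read from the file at all, typically because the file is empty, truncated, or not a GRIB file. A minimal pre-check sketch (not part of cfgrib; the helper name is made up):

import os

def looks_like_grib(path):
    # Every GRIB edition 1/2 message starts with the 4-byte magic b'GRIB';
    # an empty or non-GRIB file fails this quick heuristic.
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        return False
    with open(path, 'rb') as f:
        return f.read(4) == b'GRIB'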
It would be nice to be able to import cfgrib even when ecCodes is not present, for example to build the documentation or explore the function signatures. We even have a small helper in place to raise the exception lazily, RaiseOnAttributeAccess, but we use it incorrectly, as we access some lib attributes during module import.
Best would be to access lib attributes only inside functions, so the exception is raised lazily.
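A minimal sketch of the pattern described above (the class body and the message are illustrative, not the actual cfgrib helper):

class RaiseOnAttributeAccess:
    """Stand-in object that defers an import failure until first attribute access."""
    def __init__(self, exc, message):
        self._exc = exc
        self._message = message

    def __getattr__(self, attr):
        raise RuntimeError(self._message) from self._exc

try:
    import eccodes as lib
except ImportError as exc:
    lib = RaiseOnAttributeAccess(exc, 'ecCodes bindings not found')

# module import always succeeds; only code paths that touch `lib` raise, e.g.:
def eccodes_version():
    return lib.codes_get_api_version()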
It's a convention for Python projects to set the __version__ attribute to a string representing the current version. This facilitates debugging and version checking by downstream projects (e.g., xarray).
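For illustration, the convention and a typical downstream check (version strings are just examples):

# in cfgrib/__init__.py
__version__ = '0.9.6'

# in a downstream project, e.g. a backend registration guard:
from distutils.version import LooseVersion
import cfgrib

if LooseVersion(cfgrib.__version__) < LooseVersion('0.9.5'):
    raise ImportError('cfgrib >= 0.9.5 is required')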
I am able to successfully load the test GRIB file that was suggested in the README; however, when I try to read a GRIB file such as the one below, I get the following error output.
> import cfgrib
> ds = cfgrib.Dataset.frompath('nam.t00z.awip1200.tm00.grib2')
Traceback (most recent call last):
File "<stdin>", line 1, in
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/cfgrib/dataset.py", line 482, in frompath
return cls(stream=messages.Stream(path, mode=mode, message_class=CfMessage), **kwargs)
File "<attrs generated init baa5906ed7dcdc8b722f343b3fe827a76110eccb>", line 7, in init
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/cfgrib/dataset.py", line 485, in attrs_post_init
dims, vars, attrs = build_dataset_components(**self.dict)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/cfgrib/dataset.py", line 457, in build_dataset_components
var_index, encode_parameter, encode_time, encode_geography, encode_vertical,
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/cfgrib/dataset.py", line 372, in build_data_var_components
data_var_attrs = enforce_unique_attributes(index, data_var_attrs_keys)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/cfgrib/dataset.py", line 150, in enforce_unique_attributes
raise ValueError("multiple values for unique attribute %r: %r" % (key, values))
ValueError: multiple values for unique attribute 'typeOfLevel': ['hybrid', 'cloudBase', 'unknown', 'cloudTop']
I've tried with both GRIB1 and GRIB2 file types, and it seems the formatting is incorrect for all the files I've tried. Any suggestions?
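The workaround from the GFS report above applies here too: pick a single typeOfLevel per open. A sketch using one of the levels from the error (hybrid is just an example):

import cfgrib

ds = cfgrib.Dataset.frompath(
    'nam.t00z.awip1200.tm00.grib2',
    filter_by_keys={'typeOfLevel': 'hybrid'},
)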
At the moment cfgrib supports only Linux and MacOS.
For Windows we need:
.idx file locking
Hi,
I was wondering if it's possible to use cfgrib to open Stage-IV GRIB rainfall data (downloaded here). I attempted this in a Jupyter notebook:
cfgrib.open_dataset('/glade/scratch/doughert/stage_iv/2008/ST4.2008031900.01h')
And received the following error:
EcCodesError: ('Key/value not found (-10).', -10)
During handling of the above exception, another exception occurred:
..
KeyError: 'latitudes'
I am guessing this is due to the structure of the Stage-IV data, which can be seen by opening the dataset via the usual xarray function:
xr.open_dataset('/glade/scratch/doughert/stage_iv/2008/ST4.2008031900.01h', engine='pynio')
<xarray.Dataset>
Dimensions: (g5_x_0: 881, g5_y_1: 1121)
Coordinates:
g5_lat_0 (g5_x_0, g5_y_1) float32 ...
g5_lon_1 (g5_x_0, g5_y_1) float32 ...
Dimensions without coordinates: g5_x_0, g5_y_1
Data variables:
g5_rot_2 (g5_x_0, g5_y_1) float32 ...
VAR_237_GDS5_SFC_acc1h (g5_x_0, g5_y_1) float32 ...
where the coordinates and variables aren't the usual "latitude" and "longitude" I've seen in the cfgrib examples so far. So does this mean this package would be unusable for Stage IV data?
I am struggling to get cfgrib working properly in an Anaconda Python 3 environment. Unfortunately, the eccodes package doesn't get along with gdal and breaks it... I will try with Homebrew, even if I don't like it...
Get ready to drop the internal ecCodes bindings in favour of the official ecCodes Python bindings (now available for Python 2 and 3) and later of eccodes-python.
The biggest change is using str in place of bytes.
Starting from version 0.9.7, cfgrib is Python 3 only, in line with xarray 0.12.0.
We intend to release additional bugfix only versions of the Python 2 compatible branch https://github.com/ecmwf/cfgrib/tree/stable/0.9.6.x
All other level variables are named after the typeOfLevel, except isobaricInhPa.
This is a major user-visible change, to be done during the beta, but the inconsistency is too big to leave in place.
As reported by @guidocioni, GRIB files with gridType == rotated_ll can be structured as 2D grids, like lambert.
Example files in: https://opendata.dwd.de/weather/nwp/cosmo-d2-eps/grib/00/t_2m/
At the moment every time a GRIB file is opened cfgrib needs to scan all the messages in the file to build the index that is then used to compute the values of the coordinates and build the hypercube representation of the variables.
Worse, when opening a GRIB file with the convenience function open_datasets, the index is discarded every time the recursive call fails, and the expensive file scan is done again.
Proposed implementation requirements for the feature are:
- save the index to path + .idx immediately after computation
- on open, look for a path + .idx index file, test that it is in sync with the GRIB file, and load it
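A minimal sketch of that scheme (the function name and the pickle format are illustrative, not the cfgrib implementation):

import os
import pickle

def load_or_build_index(path, build_index):
    """Reuse path + '.idx' when it is at least as new as the GRIB file."""
    idx_path = path + '.idx'
    if os.path.exists(idx_path) and \
            os.path.getmtime(idx_path) >= os.path.getmtime(path):
        with open(idx_path, 'rb') as f:
            return pickle.load(f)
    index = build_index(path)
    with open(idx_path, 'wb') as f:
        pickle.dump(index, f)
    return index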
I am getting the following AttributeError when I am using:
from cfgrib import xarray_store
ds=xarray_store.open_dataset('icon-eu_europe_regular-lat-lon_single-level_2018050300_000_ASOB_S.grib2')
Error:
In [3]: ds=xarray_store.open_dataset('icon-eu_europe_regular-lat-lon_single-level_2018050300_000_ASOB_S.grib2')
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-3-23c9c469eeaa> in <module>()
----> 1 ds=xarray_store.open_dataset('icon-eu_europe_regular-lat-lon_single-level_2018050300_000_ASOB_S.grib2')
/usr/local/lib/python3.6/dist-packages/cfgrib/xarray_store.py in open_dataset(path, backend_kwargs, filter_by_keys, **kwargs)
46 real_backend_kwargs.update(backend_kwargs)
47 store = cfgrib_.CfGribDataStore(path, **real_backend_kwargs)
---> 48 return xr.backends.api.open_dataset(store, **kwargs)
49
50
/usr/lib/python3/dist-packages/xarray/backends/api.py in open_dataset(filename_or_obj, group, decode_cf, mask_and_scale, decode_times, autoclose, concat_characters, decode_coords, engine, chunks, lock, cache, drop_variables)
324 store = backends.ScipyDataStore(filename_or_obj)
325
--> 326 return maybe_decode_store(store)
327
328
/usr/lib/python3/dist-packages/xarray/backends/api.py in maybe_decode_store(store, lock)
236 store, mask_and_scale=mask_and_scale, decode_times=decode_times,
237 concat_characters=concat_characters, decode_coords=decode_coords,
--> 238 drop_variables=drop_variables)
239
240 _protect_dataset_variables_inplace(ds, cache)
/usr/lib/python3/dist-packages/xarray/conventions.py in decode_cf(obj, concat_characters, mask_and_scale, decode_times, decode_coords, drop_variables)
601 vars, attrs, coord_names = decode_cf_variables(
602 vars, attrs, concat_characters, mask_and_scale, decode_times,
--> 603 decode_coords, drop_variables=drop_variables)
604 ds = Dataset(vars, attrs=attrs)
605 ds = ds.set_coords(coord_names.union(extra_coords).intersection(vars))
/usr/lib/python3/dist-packages/xarray/conventions.py in decode_cf_variables(variables, attributes, concat_characters, mask_and_scale, decode_times, decode_coords, drop_variables)
534 k, v, concat_characters=concat_characters,
535 mask_and_scale=mask_and_scale, decode_times=decode_times,
--> 536 stack_char_dim=stack_char_dim)
537 if decode_coords:
538 var_attrs = new_vars[k].attrs
/usr/lib/python3/dist-packages/xarray/conventions.py in decode_cf_variable(name, var, concat_characters, mask_and_scale, decode_times, decode_endianness, stack_char_dim)
469 for coder in [times.CFTimedeltaCoder(),
470 times.CFDatetimeCoder()]:
--> 471 var = coder.decode(var, name=name)
472
473 dimensions, data, attributes, encoding = (
/usr/lib/python3/dist-packages/xarray/coding/times.py in decode(self, variable, name)
349 units = pop_to(attrs, encoding, 'units')
350 calendar = pop_to(attrs, encoding, 'calendar')
--> 351 dtype = _decode_cf_datetime_dtype(data, units, calendar)
352 transform = partial(
353 decode_cf_datetime, units=units, calendar=calendar)
/usr/lib/python3/dist-packages/xarray/coding/times.py in _decode_cf_datetime_dtype(data, units, calendar)
105 values = indexing.ImplicitToExplicitIndexingAdapter(
106 indexing.as_indexable(data))
--> 107 example_value = np.concatenate([first_n_items(values, 1) or [0],
108 last_item(values) or [0]])
109
/usr/lib/python3/dist-packages/xarray/core/formatting.py in first_n_items(array, n_desired)
92 indexer = _get_indexer_at_least_n_items(array.shape, n_desired)
93 array = array[indexer]
---> 94 return np.asarray(array).flat[:n_desired]
95
96
/usr/local/lib/python3.6/dist-packages/numpy/core/numeric.py in asarray(a, dtype, order)
499
500 """
--> 501 return array(a, dtype, copy=False, order=order)
502
503
/usr/lib/python3/dist-packages/xarray/core/indexing.py in __array__(self, dtype)
434
435 def __array__(self, dtype=None):
--> 436 return np.asarray(self.array, dtype=dtype)
437
438 def __getitem__(self, key):
/usr/local/lib/python3.6/dist-packages/numpy/core/numeric.py in asarray(a, dtype, order)
499
500 """
--> 501 return array(a, dtype, copy=False, order=order)
502
503
/usr/lib/python3/dist-packages/xarray/core/indexing.py in __array__(self, dtype)
492 def __array__(self, dtype=None):
493 array = as_indexable(self.array)
--> 494 return np.asarray(array[self.key], dtype=None)
495
496 def transpose(self, order):
/usr/local/lib/python3.6/dist-packages/cfgrib/cfgrib_.py in __getitem__(self, key)
54
55 def __getitem__(self, key):
---> 56 return indexing.explicit_indexing_adapter(
57 key, self.shape, indexing.IndexingSupport.OUTER, self._getitem)
58
AttributeError: module 'xarray.core.indexing' has no attribute 'explicit_indexing_adapter'
How do I deal with this?
I am using cfgrib version 0.9.2.
icon-eu_europe_regular-lat-lon_single-level_2018050300_000_ASOB_S.zip
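(For what it's worth: as noted elsewhere in this tracker, cfgrib relies on a newer internal xarray API and only supports xarray >= 0.10.9, so this AttributeError about xarray.core.indexing is most likely resolved by upgrading xarray, e.g. with pip install --upgrade xarray.)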
When attempting to read a GRIB file from EFAS which contains 3 variables, cfgrib fails with the error below.
The GRIB contains 6-hourly time steps (0/to/240/by/6) for the variable sd, which is read OK as the first variable, with 41 steps. The remaining params, tp06 and dis06, hold values for total precipitation and discharge over the last 6 hours and only have 40 time steps; the time steps are written as ranges 0-6, 6-12, ..., 234-240.
Reading the variables from separate GRIBs works OK. It seems due to the number of steps of the two variables not being equal.
I cannot upload the grib but drop me an email and I will send it to you.
>>> cfgrib.__version__
'0.9.6.post1'
>>> eccodes.__version__
'2.12.0'
>>> ds = xr.open_dataset('test/all_20190201.grb',engine='cfgrib')
skipping variable: paramId==240023 shortName='dis06'
Traceback (most recent call last):
File "/home/ma/maca/.local/lib/python3.6/site-packages/cfgrib/data
set.py", line 467, in build_dataset_components
dict_merge(dimensions, dims)
File "/home/ma/maca/.local/lib/python3.6/site-packages/cfgrib/dataset.py", line 430, in dict_merge
"key=%r value=%r new_value=%r" % (key, master[key], value))
cfgrib.dataset.DatasetBuildError: key present and new value is different: key='step' value=41 new_value=40
skipping variable: paramId==260267 shortName='tp06'
Traceback (most recent call last):
File "/home/ma/maca/.local/lib/python3.6/site-packages/cfgrib/dataset.py", line 467, in build_dataset_components
dict_merge(dimensions, dims)
File "/home/ma/maca/.local/lib/python3.6/site-packages/cfgrib/dataset.py", line 430, in dict_merge
"key=%r value=%r new_value=%r" % (key, master[key], value))
cfgrib.dataset.DatasetBuildError: key present and new value is different: key='step' value=41 new_value=40
>>>
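A possible workaround (a sketch, not a fix): read the conflicting variables in separate opens by filtering on shortName, so the mismatched step dimensions never need to be merged.

import xarray as xr

ds_sd = xr.open_dataset('test/all_20190201.grb', engine='cfgrib',
                        backend_kwargs={'filter_by_keys': {'shortName': 'sd'}})
ds_tp = xr.open_dataset('test/all_20190201.grb', engine='cfgrib',
                        backend_kwargs={'filter_by_keys': {'shortName': 'tp06'}})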
I am testing cfgrib on Windows to read NOAA's HRRR model output.
Here is an example HRRR file: https://pando-rgw01.chpc.utah.edu/hrrr/sfc/20190227/hrrr.t00z.wrfsfcf08.grib2
I have read this data as follows:
dt = xarray.open_dataset('hrrr.t00z.wrfsfcf08.grib2', engine='cfgrib', backend_kwargs={'filter_by_keys': {'typeOfLevel': 'heightAboveGround', 'stepType': 'instant'}})
HRRR GRIB files have multiple messages for the U and V wind components at heightAboveGround: there is a 10-m height and an 80-m height. I see the u10 variable loaded, but not u80.
Below shows the loaded variables:
Does the "skipping variable" have anything to do with this when opening a file? This is part of the message when the data was loading...
dt = xarray.open_dataset('hrrr.t00z.wrfsfcf00.grib2', engine='cfgrib', backend_kwargs={'filter_by_keys': {'typeOfLevel': 'heightAboveGround', 'stepType': 'instant'}})
skipping variable: paramId==131 shortName='u'
Traceback (most recent call last):
File "C:\Users\------\AppData\Local\conda\conda\envs\TEST_cfgrib\lib\site-packages\cfgrib\dataset.py", line 468, in build_dataset_components
dict_merge(variables, vars)
File "C:\Users\------\AppData\Local\conda\conda\envs\TEST_cfgrib\lib\site-packages\cfgrib\dataset.py", line 430, in dict_merge
"key=%r value=%r new_value=%r" % (key, master[key], value))
cfgrib.dataset.DatasetBuildError: key present and new value is different: key='heightAboveGround' value=Variable(dimensions=('heightAboveGround',), data=array([1000, 4000])) new_value=Variable(dimensions=(), data=80)
skipping variable: paramId==132 shortName='v'
Traceback (most recent call last):
File "C:\Users\------\AppData\Local\conda\conda\envs\TEST_cfgrib\lib\site-packages\cfgrib\dataset.py", line 468, in build_dataset_components
dict_merge(variables, vars)
File "C:\Users\------\AppData\Local\conda\conda\envs\TEST_cfgrib\lib\site-packages\cfgrib\dataset.py", line 430, in dict_merge
"key=%r value=%r new_value=%r" % (key, master[key], value))
cfgrib.dataset.DatasetBuildError: key present and new value is different: key='heightAboveGround' value=Variable(dimensions=('heightAboveGround',), data=array([1000, 4000])) new_value=Variable(dimensions=(), data=80)
(continued with other variables being skipped)
How does one use key filter arguments to get 10-m height and 80-m height when opening the dataset?
Also, the u and v variables don't indicate what level they are for. How can I find out the level for those?
I am also expecting to see variables for gusts at 10-m, reflectivity at 1-km above ground level, etc. For reference, here is a GRIB table for the HRRR files: https://rapidrefresh.noaa.gov/hrrr/GRIB2Table_hrrrncep_2d.txt
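One approach that should work (a sketch): filter on the level key in addition to typeOfLevel, opening each height separately.

import xarray as xr

ds10 = xr.open_dataset(
    'hrrr.t00z.wrfsfcf08.grib2', engine='cfgrib',
    backend_kwargs={'filter_by_keys': {
        'typeOfLevel': 'heightAboveGround', 'level': 10}})
ds80 = xr.open_dataset(
    'hrrr.t00z.wrfsfcf08.grib2', engine='cfgrib',
    backend_kwargs={'filter_by_keys': {
        'typeOfLevel': 'heightAboveGround', 'level': 80}})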
Something along the lines of:
$ cfgrib dump era5-levels-members.grib
<xarray.Dataset>
Dimensions: (isobaricInhPa: 2, latitude: 61, longitude: 120, number: 10, time: 4)
Coordinates:
* number (number) int64 0 1 2 3 4 5 6 7 8 9
* time (time) datetime64[ns] 2017-01-01 ... 2017-01-02T12:00:00
step timedelta64[ns] ...
* isobaricInhPa (isobaricInhPa) int64 850 500
* latitude (latitude) float64 90.0 87.0 84.0 81.0 ... -84.0 -87.0 -90.0
* longitude (longitude) float64 0.0 3.0 6.0 9.0 ... 351.0 354.0 357.0
valid_time (time) datetime64[ns] ...
Data variables:
z (number, time, isobaricInhPa, latitude, longitude) float32 ...
t (number, time, isobaricInhPa, latitude, longitude) float32 ...
Attributes:
GRIB_edition: 1
GRIB_centre: ecmf
GRIB_centreDescription: European Centre for Medium-Range Weather Forecasts
GRIB_subCentre: 0
Conventions: CF-1.7
institution: European Centre for Medium-Range Weather Forecasts
history: ...
I'm opening multiple GRIB files that are located in a directory where I only have read permissions, so writing an index file is not possible. I included indexpath='' and it allows me to open the file, but it still prints an error, which is quite annoying.
Maybe in future versions this can be eliminated.
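For reference, this is the usage being described (the file name is illustrative):

import xarray as xr

# an empty indexpath disables writing the .idx file next to the data
ds = xr.open_dataset('data.grib', engine='cfgrib',
                     backend_kwargs={'indexpath': ''})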
This can be done easily with xarray and netcdf4.
I have downloaded a dataset from seasonal-monthly-single-levels with starting month 1 and lead time from 1 to 4 for several years.
I have loaded the grib file with:
d = xr.open_dataset('/path/to/grib', engine = 'cfgrib')
and in the step field I get some strange numbers:
<xarray.DataArray 'step' (step: 7)>
array([ 2678400000000000, 5097600000000000, 5184000000000000,
7776000000000000, 7862400000000000, 10368000000000000,
10454400000000000], dtype='timedelta64[ns]')
Coordinates:
* step (step) timedelta64[ns] 31 days 59 days ... 120 days 121 days
surface int64 ...
Attributes:
standard_name: forecast_period
long_name: time since forecast_reference_time
Apparently, the steps take into account the variable number of days in February. This is a big problem because for some values of step I have NaN in some years, making it impossible to calculate climatologies, for example.
Instead, I would expect step to have the same size as leadtime_month. Is this expected behaviour? Can you suggest a way to deal with this?
As reported in #16, a class of GRIB2 products distributed by NOAA (http://www.ftp.ncep.noaa.gov/data/nccf/com/ngac/prod/) are completely unusable, due to ecCodes (v2.6.0) not recognising the parameter metadata, for example:
$ grib_ls -n parameter ngac.t00z.a2df105.grib2
ngac.t00z.a2df105.grib2
centre paramId shortName units name
kwbc 0 unknown unknown unknown
kwbc 0 unknown unknown unknown
kwbc 0 unknown unknown unknown
kwbc 0 unknown unknown unknown
kwbc 0 unknown unknown unknown
kwbc 0 unknown unknown unknown
kwbc 0 unknown unknown unknown
kwbc 0 unknown unknown unknown
kwbc 0 unknown unknown unknown
kwbc 0 unknown unknown unknown
kwbc 0 unknown unknown unknown
kwbc 0 unknown unknown unknown
kwbc 0 unknown unknown unknown
kwbc 0 unknown unknown unknown
kwbc 0 unknown unknown unknown
kwbc 0 unknown unknown unknown
kwbc 0 unknown unknown unknown
kwbc 0 unknown unknown unknown
kwbc 0 unknown unknown unknown
kwbc 0 unknown unknown unknown
kwbc 0 unknown unknown unknown
kwbc 0 unknown unknown unknown
kwbc 0 unknown unknown unknown
kwbc 0 unknown unknown unknown
kwbc 0 unknown unknown unknown
kwbc 0 unknown unknown unknown
kwbc 0 unknown unknown unknown
kwbc 0 unknown unknown unknown
kwbc 0 unknown unknown unknown
kwbc 0 unknown unknown unknown
kwbc 0 unknown unknown unknown
kwbc 0 unknown unknown unknown
32 of 32 messages in ngac.t00z.a2df105.grib2
32 of 32 total messages in 1 files
As we rely on the ecCodes shortName key to identify the variables, the resulting Dataset is completely unusable.
This does not appear to be a bug in cfgrib; it may be that ecCodes doesn't support the specific data or, according to @bbakernoaa, there may be issues with the data themselves, so I labeled it wontfix. I keep it open anyhow for documentation purposes.
First off I think this project is an awesome idea and I am interested.
I'm trying this out on some of the NCEP data. Particularly the NGAC data.
http://www.ftp.ncep.noaa.gov/data/nccf/com/ngac/prod/
I am trying to read in some of the AOD variables and such but am getting an error
from cfgrib import xarray_store
xarray_store.open_dataset('ngac.t00z.a2df105.grib2',filter_by_keys={'typeOfLevel':'atmosphere'})
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-79-1033cece9edc> in <module>()
----> 1 test = xarray_store.open_dataset('ngac.t00z.a2df105.grib2',filter_by_keys={'typeOfLevel':'atmosphere'})
/naqfc/save/Barry.Baker/envs/monet/lib/python3.6/site-packages/cfgrib/xarray_store.py in open_dataset(path, flavour_name, filter_by_keys, errors, **kwargs)
161 if k.startswith('encode_'):
162 overrides[k] = kwargs.pop(k)
--> 163 store = GribDataStore.frompath(path, **overrides)
164 return _open_dataset(store, **kwargs)
165
/naqfc/save/Barry.Baker/envs/monet/lib/python3.6/site-packages/cfgrib/xarray_store.py in frompath(cls, path, flavour_name, errors, **kwargs)
102 config = flavour.pop('dataset', {}).copy()
103 config.update(kwargs)
--> 104 return cls(ds=cfgrib.Dataset.frompath(path, errors=errors, **config), **flavour)
105
106 def __attrs_post_init__(self):
/naqfc/save/Barry.Baker/envs/monet/lib/python3.6/site-packages/cfgrib/dataset.py in frompath(cls, path, mode, errors, **kwargs)
379 def frompath(cls, path, mode='r', errors='ignore', **kwargs):
380 stream = messages.Stream(path, mode, message_class=cfmessage.CfMessage, errors=errors)
--> 381 return cls(stream=stream, **kwargs)
382
383 def __attrs_post_init__(self):
<attrs generated init c839a147c90eb3321ada82313ce86e3ade1b1758> in __init__(self, stream, encode_parameter, encode_time, encode_vertical, encode_geography, filter_by_keys)
6 self.encode_geography = encode_geography
7 self.filter_by_keys = filter_by_keys
----> 8 self.__attrs_post_init__()
/naqfc/save/Barry.Baker/envs/monet/lib/python3.6/site-packages/cfgrib/dataset.py in __attrs_post_init__(self)
382
383 def __attrs_post_init__(self):
--> 384 dims, vars, attrs = build_dataset_components(**self.__dict__)
385 self.dimensions = dims # type: T.Dict[str, T.Optional[int]]
386 self.variables = vars # type: T.Dict[str, Variable]
/naqfc/save/Barry.Baker/envs/monet/lib/python3.6/site-packages/cfgrib/dataset.py in build_dataset_components(stream, encode_parameter, encode_time, encode_vertical, encode_geography, filter_by_keys)
352 var_index = index.subindex(paramId=param_id)
353 dims, data_var, coord_vars = build_data_var_components(
--> 354 var_index, encode_parameter, encode_time, encode_geography, encode_vertical,
355 )
356 if encode_parameter and var_name not in ('undef', 'unknown'):
/naqfc/save/Barry.Baker/envs/monet/lib/python3.6/site-packages/cfgrib/dataset.py in build_data_var_components(index, encode_parameter, encode_time, encode_geography, encode_vertical, log)
268 if data_var_attrs.get('GRIB_cfName'):
269 data_var_attrs['standard_name'] = data_var_attrs['GRIB_cfName']
--> 270 data_var_attrs['long_name'] = data_var_attrs['GRIB_name']
271 data_var_attrs['units'] = data_var_attrs['GRIB_units']
272
KeyError: 'GRIB_name'
Now, these variables are present in the NCEP GRIB tables but seem to be missing the GRIB_name attribute. From wgrib2:
$ wgrib2 colmd.grb2 -set_ext_name 1 -netcdf junk_ext_name.nc
1:0:d=2016081200:COLMD.aerosol=Total_Aerosol.aerosol_size_<1e-05.:entire atmosphere:anl:
2:71357:d=2016081200:COLMD.aerosol=Total_Aerosol.aerosol_size_<2.5e-06.:entire atmosphere:anl:
3:160149:d=2016081200:COLMD.aerosol=Dust_Dry.aerosol_size_<2.5e-06.:entire atmosphere:anl:
4:162022:d=2016081200:COLMD.aerosol=Sea_Salt_Dry.aerosol_size_<2.5e-06.:entire atmosphere:anl:
5:226224:d=2016081200:COLMD.aerosol=Black_Carbon_Dry.aerosol_size_<2.36e-08.:entire atmosphere:anl:
6:275933:d=2016081200:COLMD.aerosol=Particulate_Organic_Matter_Dry.aerosol_size_<4.24e-08.:entire atmosphere:anl:
7:312737:d=2016081200:COLMD.aerosol=Sulphate_Dry.aerosol_size_<2.5e-06.:entire atmosphere:anl:
-bash-4.2$ wgrib2 ngac.t00z.a2df105.grib2
1:0:d=2018091200:ASYSFK:entire atmosphere:105 hour fcst:aerosol=Total Aerosol:aerosol_size <2e-05:aerosol_wavelength >=3.38e-07,<=3.42e-07
2:88995:d=2018091200:SSALBK:entire atmosphere:105 hour fcst:aerosol=Total Aerosol:aerosol_size <2e-05:aerosol_wavelength >=3.38e-07,<=3.42e-07
3:173734:d=2018091200:AOTK:entire atmosphere:105 hour fcst:aerosol=Total Aerosol:aerosol_size <2e-05:aerosol_wavelength >=5.45e-07,<=5.65e-07
4:257495:d=2018091200:AOTK:entire atmosphere:105 hour fcst:aerosol=Dust Dry:aerosol_size <2e-05:aerosol_wavelength >=5.45e-07,<=5.65e-07
5:308070:d=2018091200:AOTK:entire atmosphere:105 hour fcst:aerosol=Sea Salt Dry:aerosol_size <2e-05:aerosol_wavelength >=5.45e-07,<=5.65e-07
6:381840:d=2018091200:AOTK:entire atmosphere:105 hour fcst:aerosol=Sulphate Dry:aerosol_size <7e-07:aerosol_wavelength >=5.45e-07,<=5.65e-07
7:434792:d=2018091200:AOTK:entire atmosphere:105 hour fcst:aerosol=Particulate Organic Matter Dry:aerosol_size <7e-07:aerosol_wavelength >=5.45e-07,<=5.65e-07
8:496510:d=2018091200:AOTK:entire atmosphere:105 hour fcst:aerosol=Black Carbon Dry:aerosol_size <7e-07:aerosol_wavelength >=5.45e-07,<=5.65e-07
9:551432:d=2018091200:AEMFLX:entire atmosphere:105 hour fcst:aerosol=Dust Dry:aerosol_size <2e-05:
10:559057:d=2018091200:SEDMFLX:entire atmosphere:105 hour fcst:aerosol=Dust Dry:aerosol_size <2e-05:
11:598927:d=2018091200:DDMFLX:entire atmosphere:105 hour fcst:aerosol=Dust Dry:aerosol_size <2e-05:
12:646148:d=2018091200:WLSMFLX:entire atmosphere:105 hour fcst:aerosol=Dust Dry:aerosol_size <2e-05:
13:684570:d=2018091200:WDCPMFLX:entire atmosphere:105 hour fcst:aerosol=Dust Dry:aerosol_size <2e-05:
14:718763:d=2018091200:AEMFLX:entire atmosphere:105 hour fcst:aerosol=Sea Salt Dry:aerosol_size <2e-05:
15:766737:d=2018091200:SEDMFLX:entire atmosphere:105 hour fcst:aerosol=Sea Salt Dry:aerosol_size <2e-05:
16:830919:d=2018091200:DDMFLX:entire atmosphere:105 hour fcst:aerosol=Sea Salt Dry:aerosol_size <2e-05:
17:897610:d=2018091200:WLSMFLX:entire atmosphere:105 hour fcst:aerosol=Sea Salt Dry:aerosol_size <2e-05:
18:952870:d=2018091200:WDCPMFLX:entire atmosphere:105 hour fcst:aerosol=Sea Salt Dry:aerosol_size <2e-05:
19:1000818:d=2018091200:AEMFLX:entire atmosphere:105 hour fcst:aerosol=Black Carbon Dry:aerosol_size <2.36e-08:
20:1020808:d=2018091200:SEDMFLX:entire atmosphere:105 hour fcst:aerosol=Black Carbon Dry:aerosol_size <7e-07:
21:1066505:d=2018091200:DDMFLX:entire atmosphere:105 hour fcst:aerosol=Black Carbon Dry:aerosol_size <7e-07:
22:1104627:d=2018091200:WLSMFLX:entire atmosphere:105 hour fcst:aerosol=Black Carbon Dry:aerosol_size <7e-07:
23:1140259:d=2018091200:WDCPMFLX:entire atmosphere:105 hour fcst:aerosol=Black Carbon Dry:aerosol_size <7e-07:
24:1168276:d=2018091200:AEMFLX:entire atmosphere:105 hour fcst:aerosol=Particulate Organic Matter Dry:aerosol_size <7e-07:
25:1186046:d=2018091200:SEDMFLX:entire atmosphere:105 hour fcst:aerosol=Particulate Organic Matter Dry:aerosol_size <7e-07:
26:1231200:d=2018091200:DDMFLX:entire atmosphere:105 hour fcst:aerosol=Particulate Organic Matter Dry:aerosol_size <7e-07:
27:1280346:d=2018091200:WLSMFLX:no_level:105 hour fcst:aerosol=Particulate Organic Matter Dry:aerosol_size <7e-07:
28:1327429:d=2018091200:WDCPMFLX:no_level:105 hour fcst:aerosol=Particulate Organic Matter Dry:aerosol_size <7e-07:
29:1359326:d=2018091200:MASSDEN:surface:105 hour fcst:aerosol=Dust Dry:aerosol_size <1e-05:
30:1391056:d=2018091200:MASSDEN:surface:105 hour fcst:aerosol=Dust Dry:aerosol_size <2.5e-06:
31:1427968:d=2018091200:COLMD:entire atmosphere:105 hour fcst:aerosol=Dust Dry:aerosol_size <1e-05:
32:1468168:d=2018091200:COLMD:entire atmosphere:105 hour fcst:aerosol=Dust Dry:aerosol_size <2.5e-06:
Support for dask and dask.distributed is currently quite good from the point of view of correctness. We are able to perform complex operations on 320 GB of data from 20 files on a 10 VM x 8 core dask.distributed cluster with no problem.
Performance is not stellar, and more work is needed in this respect.
Dear all,
I am working with some historical GFS GRIB files. Not all variables that are in the file (checked with wgrib2 -v) could be read.
1:0:d=2017020212:TMP Temperature [K]:850 mb:324 hour fcst:
2:2188:d=2017020212:RH Relative Humidity [%]:850 mb:324 hour fcst:
3:4879:d=2017020212:TMP Temperature [K]:1000 mb:324 hour fcst:
4:7067:d=2017020212:RH Relative Humidity [%]:1000 mb:324 hour fcst:
5:9507:d=2017020212:PRES Pressure [Pa]:surface:324 hour fcst:
6:13453:d=2017020212:TMP Temperature [K]:surface:324 hour fcst:
7:15893:d=2017020212:SNOD Snow Depth [m]:surface:324 hour fcst:
8:17771:d=2017020212:TMP Temperature [K]:2 m above ground:324 hour fcst:
9:20964:d=2017020212:DPT Dew Point Temperature [K]:2 m above ground:324 hour fcst:
10:23404:d=2017020212:RH Relative Humidity [%]:2 m above ground:324 hour fcst:
11:25592:d=2017020212:SUNSD Sunshine Duration [s]:surface:324 hour fcst:
12:29287:d=2017020212:DSWRF Downward Short-Wave Radiation Flux [W/m^2]:surface:312-324 hour ave fcst:
13:31248:d=2017020212:DLWRF Downward Long-Wave Rad. Flux [W/m^2]:surface:312-324 hour ave fcst:
14:33209:d=2017020212:USWRF Upward Short-Wave Radiation Flux [W/m^2]:surface:312-324 hour ave fcst:
15:34919:d=2017020212:ULWRF Upward Long-Wave Rad. Flux [W/m^2]:surface:312-324 hour ave fcst:
Only the SNOD, SUNSD, RH, TMP, and PRES variables and one of the radiation fluxes could be detected by cfgrib's xarray_store.open_dataset().
I think I may have found a bug, or at least something I do not understand, in the implementation of the filter_by_keys argument.
Here is some output I receive when trying to open one of those NAM grib files:
>>> xr.open_dataset('nam.t06z.awip3d00.tm00.grib2',
engine='cfgrib',
backend_kwargs={
'filter_by_keys': {'typeOfLevel': 'isobaricInhPa'},
'errors': 'ignore'
})
skipping variable: paramId==3041 shortName='absv'
Traceback (most recent call last):
File "/Users/bbonenfant/PycharmProjects/wrf2wxl/venv/lib/python3.6/site-packages/cfgrib/dataset.py", line 429, in build_dataset_components
dict_merge(dimensions, dims)
File "/Users/bbonenfant/PycharmProjects/wrf2wxl/venv/lib/python3.6/site-packages/cfgrib/dataset.py", line 407, in dict_merge
"key=%r value=%r new_value=%r" % (key, master[key], value))
cfgrib.dataset.DatasetBuildError: key present and new value is different: key='isobaricInhPa' value=39 new_value=7
skipping variable: paramId==1 shortName='strf'
Traceback (most recent call last):
File "/Users/bbonenfant/PycharmProjects/wrf2wxl/venv/lib/python3.6/site-packages/cfgrib/dataset.py", line 430, in build_dataset_components
dict_merge(variables, vars)
File "/Users/bbonenfant/PycharmProjects/wrf2wxl/venv/lib/python3.6/site-packages/cfgrib/dataset.py", line 407, in dict_merge
"key=%r value=%r new_value=%r" % (key, master[key], value))
cfgrib.dataset.DatasetBuildError: key present and new value is different: key='isobaricInhPa' value=Variable(dimensions=('isobaricInhPa',), data=array([1000, 975, 950, 925, 900, 875, 850, 825, 800, 775, 750,
725, 700, 675, 650, 625, 600, 575, 550, 525, 500, 475,
450, 425, 400, 375, 350, 325, 300, 275, 250, 225, 200,
175, 150, 125, 100, 75, 50])) new_value=Variable(dimensions=(), data=250)
skipping variable: paramId==3017 shortName='dpt'
Traceback (most recent call last):
File "/Users/bbonenfant/PycharmProjects/wrf2wxl/venv/lib/python3.6/site-packages/cfgrib/dataset.py", line 429, in build_dataset_components
dict_merge(dimensions, dims)
File "/Users/bbonenfant/PycharmProjects/wrf2wxl/venv/lib/python3.6/site-packages/cfgrib/dataset.py", line 407, in dict_merge
"key=%r value=%r new_value=%r" % (key, master[key], value))
cfgrib.dataset.DatasetBuildError: key present and new value is different: key='isobaricInhPa' value=39 new_value=6
skipping variable: paramId==260022 shortName='mconv'
Traceback (most recent call last):
File "/Users/bbonenfant/PycharmProjects/wrf2wxl/venv/lib/python3.6/site-packages/cfgrib/dataset.py", line 429, in build_dataset_components
dict_merge(dimensions, dims)
File "/Users/bbonenfant/PycharmProjects/wrf2wxl/venv/lib/python3.6/site-packages/cfgrib/dataset.py", line 407, in dict_merge
"key=%r value=%r new_value=%r" % (key, master[key], value))
cfgrib.dataset.DatasetBuildError: key present and new value is different: key='isobaricInhPa' value=39 new_value=2
<xarray.Dataset>
Dimensions: (isobaricInhPa: 39, x: 185, y: 129)
Coordinates:
time datetime64[ns] ...
step timedelta64[ns] ...
* isobaricInhPa (isobaricInhPa) int64 1000 975 950 925 900 ... 125 100 75 50
latitude (y, x) float64 ...
longitude (y, x) float64 ...
valid_time datetime64[ns] ...
Dimensions without coordinates: x, y
Data variables:
gh (isobaricInhPa, y, x) float32 ...
t (isobaricInhPa, y, x) float32 ...
r (isobaricInhPa, y, x) float32 ...
q (isobaricInhPa, y, x) float32 ...
w (isobaricInhPa, y, x) float32 ...
u (isobaricInhPa, y, x) float32 ...
tke (isobaricInhPa, y, x) float32 ...
clwmr (isobaricInhPa, y, x) float32 ...
cice (isobaricInhPa, y, x) float32 ...
snmr (isobaricInhPa, y, x) float32 ...
strf (y, x) float32 ...
Attributes:
GRIB_edition: 2
GRIB_centre: kwbc
GRIB_centreDescription: US National Weather Service - NCEP
GRIB_subCentre: 0
Conventions: CF-1.7
institution: US National Weather Service - NCEP
history: GRIB to CDM+CF via cfgrib-0.9.5.1/ecCodes-2.8.0 ...
You can see that it successfully returns a Dataset, but looking at the variables it returns, there is the U component of the wind but not the V component. I'm not sure why this is the case, since I've inspected the GRIB and found nothing apparently wrong with the v-winds. I've additionally tried this on other NAM GRIBs with similar results (even in your comment above on Sept. 30 this was the case).
I am unsure if this is an error on my part or if there is a way around this.
Thank you.
Originally posted by @bbonenfant in #2 (comment)
I'm trying to open ERA5 pressure-level data in a GRIB file downloaded from the CDS (see the bottom of the post for the download command).
I'm trying to open the data at different time indices like this, which works fine for any value of t between 0 and 18:
ds = cfgrib.open_file(r'F:\era5_for_wam2\pl_2008_01.grib')
var = ds.variables['q']
t = 18
q = var.data[t,:,:,:]
However, as soon as t is larger than 18, e.g.:
q = var.data[19,:,:,:]
I get the following error:
Traceback (most recent call last):
File "<ipython-input-30-a8a5f441ed1e>", line 1, in <module>
q = var.data[19,:,:,:]
File "C:\ProgramData\Miniconda3\lib\site-packages\cfgrib\dataset.py", line 259, in __getitem__
message = self.stream.message_from_file(file, offset=offset[0])
File "C:\ProgramData\Miniconda3\lib\site-packages\cfgrib\messages.py", line 244, in message_from_file
return self.message_class.from_file(file=file, offset=offset, **kwargs)
File "C:\ProgramData\Miniconda3\lib\site-packages\cfgrib\messages.py", line 66, in from_file
file.seek(offset)
ValueError: negative seek position -5
I have many files like the one in this example (pl_2008_01.grib, pl_2008_02.grib, pl_2008_03.grib, etc.), and each of them crashes when trying to access data with t > 18.
Any ideas what is causing this and how to solve it? I'm on a Windows machine, with Python 3.7.1 and cfgrib 0.9.6.
I'm getting my grib files like this:
import cdsapi
c = cdsapi.Client()
c.retrieve(
'reanalysis-era5-pressure-levels',
{
'product_type':'reanalysis',
'format':'grib',
'pressure_level':[
'50','150','250','350',
'450','550','650',
'750','800','850',
'900','950','1000'
],
'month':'1',
'day':[
'01','02','03',
'04','05','06',
'07','08','09',
'10','11','12',
'13','14','15',
'16','17','18',
'19','20','21',
'22','23','24',
'25','26','27',
'28','29','30',
'31'
],
'time':
[
'00:00','01:00','02:00',
'03:00','04:00','05:00',
'06:00','07:00','08:00',
'09:00','10:00','11:00',
'12:00','13:00','14:00',
'15:00','16:00','17:00',
'18:00','19:00','20:00',
'21:00','22:00','23:00',],
'year':'2008',
'variable':[
'specific_humidity','u_component_of_wind','v_component_of_wind',
'vertical_velocity'
]
},
r'F:\era5_for_wam2\pl_2008_01.grib')
I have downloaded a GRIB from the CDS with the following API request:
import cdsapi
c = cdsapi.Client()
r = c.retrieve(
'seasonal-monthly-single-levels',
{
'originating_centre':'ukmo',
'variable':'2m_temperature',
'product_type':'monthly_mean',
'year':[
'2014','2015'
],
'month':[
'06','07'
],
'leadtime_month':'1',
'format':'grib'
})
r.download('download.grib')
my_data = xarray_store.open_dataset('/path/to/download.grib')
I get:
ValueError Traceback (most recent call last)
<ipython-input-63-523d0bd9e50f> in <module>()
----> 1 my_data = xarray_store.open_dataset('/path/to/download.grib')
~/miniconda2/envs/cds/lib/python3.6/site-packages/cfgrib/xarray_store.py in open_dataset(path, flavour_name, filter_by_keys, **kwargs)
160 if k.startswith('encode_'):
161 overrides[k] = kwargs.pop(k)
--> 162 store = GribDataStore.frompath(path, **overrides)
163 return _open_dataset(store, **kwargs)
164
~/miniconda2/envs/cds/lib/python3.6/site-packages/cfgrib/xarray_store.py in frompath(cls, path, flavour_name, **kwargs)
102 config = flavour.pop('dataset', {}).copy()
103 config.update(kwargs)
--> 104 return cls(ds=cfgrib.Dataset.frompath(path, **config), **flavour)
105
106 def __attrs_post_init__(self):
~/miniconda2/envs/cds/lib/python3.6/site-packages/cfgrib/dataset.py in frompath(cls, path, mode, **kwargs)
393 @classmethod
394 def frompath(cls, path, mode='r', **kwargs):
--> 395 return cls(stream=messages.Stream(path, mode, message_class=cfmessage.CfMessage), **kwargs)
396
397 def __attrs_post_init__(self):
<attrs generated init c839a147c90eb3321ada82313ce86e3ade1b1758> in __init__(self, stream, encode_parameter, encode_time, encode_vertical, encode_geography, filter_by_keys)
6 self.encode_geography = encode_geography
7 self.filter_by_keys = filter_by_keys
----> 8 self.__attrs_post_init__()
~/miniconda2/envs/cds/lib/python3.6/site-packages/cfgrib/dataset.py in __attrs_post_init__(self)
396
397 def __attrs_post_init__(self):
--> 398 dims, vars, attrs = build_dataset_components(**self.__dict__)
399 self.dimensions = dims # type: T.Dict[str, T.Optional[int]]
400 self.variables = vars # type: T.Dict[str, Variable]
~/miniconda2/envs/cds/lib/python3.6/site-packages/cfgrib/dataset.py in build_dataset_components(stream, encode_parameter, encode_time, encode_vertical, encode_geography, filter_by_keys)
367 var_index = index.subindex(paramId=param_id)
368 dims, data_var, coord_vars = build_data_var_components(
--> 369 var_index, encode_parameter, encode_time, encode_geography, encode_vertical,
370 )
371 if encode_parameter and var_name != 'undef':
~/miniconda2/envs/cds/lib/python3.6/site-packages/cfgrib/dataset.py in build_data_var_components(index, encode_parameter, encode_time, encode_geography, encode_vertical, log)
281 data_var_attrs_keys = DATA_ATTRIBUTES_KEYS[:]
282 data_var_attrs_keys.extend(GRID_TYPE_MAP.get(index.getone('gridType'), []))
--> 283 data_var_attrs = enforce_unique_attributes(index, data_var_attrs_keys)
284 if encode_parameter:
285 data_var_attrs['standard_name'] = data_var_attrs.get('GRIB_cfName', 'undef')
~/miniconda2/envs/cds/lib/python3.6/site-packages/cfgrib/dataset.py in enforce_unique_attributes(index, attributes_keys)
113 values = index[key]
114 if len(values) > 1:
--> 115 raise ValueError("multiple values for unique attribute %r: %r" % (key, values))
116 if values:
117 attributes['GRIB_' + key] = values[0]
ValueError: multiple values for unique attribute 'longitudeOfFirstGridPointInDegrees': [0.0, 0.5]
The test_cfgrib_cli_to_netcdf test fails with xarray==0.11.1 with:
> assert res.exit_code == 0
E AssertionError: assert -1 == 0
E + where -1 = <Result TypeError('Timestamp subtraction must have the same timezones or no timezones')>.exit_code
xarray v0.11 introduced a native cfgrib backend; the only supported version older than that is v0.10.9 (as we depend on a new internal API), and it is not worth keeping the complexity just for it.
Now the flavour is just a definition of which encode_ options to set. On the other hand, the name encode is wrong and the options are very long.
Probably change to something like:
>>> cfgrib.open_dataset('data.grib', backend_kwargs={'encode_cf': ['parameter', 'time', 'vertical', 'geography']})
where the accepted strings are the ecCodes namespaces.
I use wgrib2 to list all the variables in GFS or FNL data.
238:124466869:d=2016091318:TMP:1000 mb:anl:
239:124885334:d=2016091318:RH:1000 mb:anl:
240:125601738:d=2016091318:VVEL:1000 mb:anl:
241:126734494:d=2016091318:UGRD:1000 mb:anl:
242:127239459:d=2016091318:VGRD:1000 mb:anl:
243:127726020:d=2016091318:ABSV:1000 mb:anl:
244:128427260:d=2016091318:CLWMR:1000 mb:anl:
246:129424957:d=2016091318:HGT:1000 mb:anl:
But cfgrib only reads these variables on pressure levels; it misses many variables.
Data variables:
gh (isobaricInhPa, latitude, longitude) float32 ...
t (isobaricInhPa, latitude, longitude) float32 ...
r (isobaricInhPa, latitude, longitude) float32 ...
u (isobaricInhPa, latitude, longitude) float32 ...
v (isobaricInhPa, latitude, longitude) float32 ...
Are there any solutions?
I am trying to open IFS forecast data (provided by KNMI):
ds = xarray_store.open_dataset('ECMWF_DET_MCONTROL2_2018072912_009_GB')
and got this error message:
EcCodesError: ('Wrong message length (-23).', -23)
There is no problem opening the data with other ecCodes tools, e.g.
grib_get_data ECMWF_DET_MCONTROL2_2018072912_009_GB
ECMWF_DET_MCONTROL2_2018072912_009_GB.zip
System settings:
Python 3.6.5 (default, Apr 1 2018, 05:46:30)
[GCC 7.3.0] on linux
and cfgrib version: 0.8.4.1
At low level we use an explicit file path and file offset in several places.
Note that xr.open_mfdataset handles opening and merging of multiple files without any additional support from the low-level driver, so this feature is low priority.
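For example, a sketch of merging several GRIB files at the xarray level (the file pattern is illustrative):

import xarray as xr

# each file is opened through the cfgrib backend; xarray does the merging
ds = xr.open_mfdataset('pl_2008_*.grib', engine='cfgrib')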
I have downloaded an ERA5 GRIB from the moda (monthly means) stream. Here is the code to download the file to replicate the issue:
#!/usr/bin/env python
from ecmwfapi import ECMWFDataServer
server = ECMWFDataServer()
server.retrieve({
"class": "ea",
"dataset": "era5",
"date": "20000101/20000201",
"decade": "2000",
"expver": "1",
"levtype": "sfc",
"param": "167.128",
"stream": "moda",
"type": "an",
"target": "ERA5-moda-test.grib",
})
(the download of the ~2MB file will take about 15 minutes! I hope that all the ERA5 streams will be moved to the CDS soon)
I have the latest version of cfgrib installed with pip in a Python 3.6 environment (MacOS). With this small piece of code:
from cfgrib import xarray_store
x = xarray_store.open_dataset('ERA5-moda-test.grib')
I get this error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-3-68ffc55e9926> in <module>()
----> 1 x = xarray_store.open_dataset('ERA5-moda-test.grib')
~/miniconda2/envs/cds/lib/python3.6/site-packages/cfgrib/xarray_store.py in open_dataset(path, flavour_name, filter_by_keys, **kwargs)
160 if k.startswith('encode_'):
161 overrides[k] = kwargs.pop(k)
--> 162 store = GribDataStore.frompath(path, **overrides)
163 return _open_dataset(store, **kwargs)
164
~/miniconda2/envs/cds/lib/python3.6/site-packages/cfgrib/xarray_store.py in frompath(cls, path, flavour_name, **kwargs)
102 config = flavour.pop('dataset', {}).copy()
103 config.update(kwargs)
--> 104 return cls(ds=cfgrib.Dataset.frompath(path, **config), **flavour)
105
106 def __attrs_post_init__(self):
~/miniconda2/envs/cds/lib/python3.6/site-packages/cfgrib/dataset.py in frompath(cls, path, mode, **kwargs)
393 @classmethod
394 def frompath(cls, path, mode='r', **kwargs):
--> 395 return cls(stream=messages.Stream(path, mode, message_class=cfmessage.CfMessage), **kwargs)
396
397 def __attrs_post_init__(self):
<attrs generated init c839a147c90eb3321ada82313ce86e3ade1b1758> in __init__(self, stream, encode_parameter, encode_time, encode_vertical, encode_geography, filter_by_keys)
6 self.encode_geography = encode_geography
7 self.filter_by_keys = filter_by_keys
----> 8 self.__attrs_post_init__()
~/miniconda2/envs/cds/lib/python3.6/site-packages/cfgrib/dataset.py in __attrs_post_init__(self)
396
397 def __attrs_post_init__(self):
--> 398 dims, vars, attrs = build_dataset_components(**self.__dict__)
399 self.dimensions = dims # type: T.Dict[str, T.Optional[int]]
400 self.variables = vars # type: T.Dict[str, Variable]
~/miniconda2/envs/cds/lib/python3.6/site-packages/cfgrib/dataset.py in build_dataset_components(stream, encode_parameter, encode_time, encode_vertical, encode_geography, filter_by_keys)
360 filter_by_keys={},
361 ):
--> 362 index = stream.index(ALL_KEYS).subindex(filter_by_keys)
363 param_ids = index['paramId']
364 dimensions = collections.OrderedDict()
~/miniconda2/envs/cds/lib/python3.6/site-packages/cfgrib/messages.py in index(self, index_keys)
267
268 def index(self, index_keys):
--> 269 return Index.fromstream(stream=self, index_keys=index_keys)
~/miniconda2/envs/cds/lib/python3.6/site-packages/cfgrib/messages.py in fromstream(cls, stream, index_keys)
198 header_values.append(value)
199 offset = message.message_get('offset', eccodes.CODES_TYPE_LONG)
--> 200 offsets.setdefault(tuple(header_values), []).append(offset)
201 return cls(stream=stream, index_keys=index_keys, offsets=offsets)
202
TypeError: unhashable type: 'list'
Here is my conda environment:
$ conda list
# packages in environment at /Users/matteodefelice/miniconda2/envs/cds:
#
# Name Version Build Channel
appnope 0.1.0 py36_0 conda-forge
asn1crypto 0.24.0 py_1 conda-forge
attrs 18.1.0 <pip>
backcall 0.1.0 py_0 conda-forge
bokeh 0.13.0 py36_0 conda-forge
bottleneck 1.2.1 py36h7eb728f_1 conda-forge
ca-certificates 2018.4.16 0 conda-forge
cartopy 0.16.0 py36_0 conda-forge
cdsapi 0.1.1 <pip>
certifi 2018.4.16 py36_0 conda-forge
cffi 1.11.5 py36_0 conda-forge
cfgrib 0.8.2 <pip>
cftime 1.0.0 py36h7eb728f_1 conda-forge
chardet 3.0.4 py36_2 conda-forge
chardet 3.0.4 <pip>
clangdev 6.0.0 default_0 conda-forge
click 6.7 py_1 conda-forge
click-plugins 1.0.3 <pip>
cligj 0.4.0 <pip>
cloudpickle 0.5.3 py_0 conda-forge
cryptography 2.3 py36hdffb7b8_0 conda-forge
cryptography-vectors 2.3 py36_0 conda-forge
curl 7.61.0 h93b3f91_0 conda-forge
cycler 0.10.0 py_1 conda-forge
cytoolz 0.9.0.1 py36h470a237_0 conda-forge
dask 0.18.2 py_0 conda-forge
dask-core 0.18.2 py_0 conda-forge
decorator 4.3.0 py_0 conda-forge
distributed 1.22.0 py36_0 conda-forge
eccodes 2.8.0 0 conda-forge
ecmwf-api-client 1.5.0 <pip>
Fiona 1.7.13 <pip>
freetype 2.8.1 0 conda-forge
future 0.16.0 <pip>
geopandas 0.4.0 <pip>
geos 3.6.2 hfc679d8_2 conda-forge
h5netcdf 0.6.1 py_0 conda-forge
h5py 2.8.0 py36h470a237_0 conda-forge
hdf4 4.2.13 0 conda-forge
hdf5 1.10.1 2 conda-forge
heapdict 1.0.0 py36_0 conda-forge
icu 58.2 hfc679d8_0 conda-forge
idna 2.7 py36_2 conda-forge
idna 2.7 <pip>
intel-openmp 2018.0.3 0
ipython 6.4.0 py36_0 conda-forge
ipython_genutils 0.2.0 py_1 conda-forge
jasper 1.900.1 4 conda-forge
jedi 0.12.1 py36_0 conda-forge
jinja2 2.10 py_1 conda-forge
jpeg 9c h470a237_0 conda-forge
kiwisolver 1.0.1 py36_1 conda-forge
krb5 1.14.6 0 conda-forge
libcxx 6.0.0 0 conda-forge
libffi 3.2.1 3 conda-forge
libgfortran 3.0.1 h93005f0_2
libiconv 1.15 h470a237_1 conda-forge
libnetcdf 4.6.1 2 conda-forge
libpng 1.6.34 ha92aebf_1 conda-forge
libssh2 1.8.0 h5b517e9_2 conda-forge
libtiff 4.0.9 he6b73bb_1 conda-forge
libxml2 2.9.8 h422b904_2 conda-forge
libxslt 1.1.32 h88dbc4e_1 conda-forge
llvm-meta 6.0.0 0 conda-forge
llvmdev 6.0.0 default_4 conda-forge
locket 0.2.0 py_2 conda-forge
lxml 4.2.3 py36hc9114bc_0 conda-forge
markupsafe 1.0 py36_0 conda-forge
matplotlib 2.2.2 py36_1 conda-forge
mkl 2018.0.3 1
mkl_fft 1.0.4 py36_0 conda-forge
mkl_random 1.0.1 py36_0 conda-forge
msgpack-python 0.5.6 py36h2d50403_2 conda-forge
munch 2.3.2 <pip>
ncurses 6.1 0 conda-forge
netcdf4 1.4.0 py36_0 conda-forge
numpy 1.14.2 py36ha9ae307_1
olefile 0.45.1 py_1 conda-forge
openssl 1.0.2o 0 conda-forge
owslib 0.16.0 py_0 conda-forge
packaging 17.1 py_0 conda-forge
pandas 0.23.3 py36_0 conda-forge
parso 0.3.1 py_0 conda-forge
partd 0.3.8 py_1 conda-forge
pexpect 4.6.0 py36_0 conda-forge
pickleshare 0.7.4 py36_0 conda-forge
pillow 5.2.0 py36_0 conda-forge
pip 18.0 py36_0 conda-forge
proj4 4.9.3 5 conda-forge
prompt_toolkit 1.0.15 py36_0 conda-forge
psutil 5.4.6 py36_0 conda-forge
ptyprocess 0.6.0 py36_0 conda-forge
pycparser 2.18 py_1 conda-forge
pycparser 2.18 <pip>
pyepsg 0.3.2 py_1 conda-forge
pygments 2.2.0 py_1 conda-forge
pyopenssl 18.0.0 py36_0 conda-forge
pyparsing 2.2.0 py_1 conda-forge
pyproj 1.9.5.1 py36_0 conda-forge
pyshp 1.2.12 py_0 conda-forge
pysocks 1.6.8 py36_1 conda-forge
python 3.6.5 1 conda-forge
python-dateutil 2.7.3 py_0 conda-forge
pytz 2018.5 py_0 conda-forge
pyyaml 3.12 py36_1 conda-forge
readline 7.0 haf1bffa_1 conda-forge
regionmask 0.4.0 <pip>
requests 2.19.1 <pip>
requests 2.19.1 py36_1 conda-forge
scipy 1.1.0 py36hcaad992_0
setuptools 40.0.0 py36_0 conda-forge
shapely 1.6.4 py36h164cb2d_1 conda-forge
simplegeneric 0.8.1 py_1 conda-forge
six 1.11.0 py36_1 conda-forge
sortedcontainers 2.0.4 py_1 conda-forge
sqlite 3.20.1 0 conda-forge
tblib 1.3.2 py_1 conda-forge
tk 8.6.8 0 conda-forge
toolz 0.9.0 py_0 conda-forge
tornado 5.1 py36_0 conda-forge
traitlets 4.3.2 py36_0 conda-forge
typing 3.6.4 <pip>
urllib3 1.23 <pip>
urllib3 1.23 py36_0 conda-forge
wcwidth 0.1.7 py_1 conda-forge
wheel 0.31.1 py36_0 conda-forge
xarray 0.10.8 py36_0 conda-forge
xz 5.2.3 0 conda-forge
yaml 0.1.7 0 conda-forge
zict 0.1.3 py_0 conda-forge
zlib 1.2.11 h470a237_3 conda-forge
Using the Reading University on-line checker on current master, we fail in an embarrassing way: WARN: (2.6.1): No 'Conventions' attribute present.
Including some level of compliance checking in the unit tests would be ideal.
Hey,
I tried to read the following GRIB data with xarray_store.open_dataset('cosmo-d2_germany_regular-lat-lon_single-level_2018073000_002_ASWDIR_S.grib2')
and got the following error:
ValueError: multiple values for unique attribute 'stepUnits': [1, 0]
How do I deal with it?
cosmo-d2_germany_regular-lat-lon_single-level_2018073000_00.zip
My setup:
Python 3.6.5 (default, Apr 1 2018, 05:46:30)
[GCC 7.3.0] on linux
and cfgrib Version: 0.8.4.1
Currently cfgrib can handle the gridType unstructured_grid when reading files (see #28). However, when writing a GRIB file using this data, an error is raised: NotImplementedError: Unsupported 'gridType': u'unstructured_grid'. Is it too difficult to implement this feature?
Due to the fact that the NetCDF data model is much more free and extensible than the GRIB one, it is not possible to write a generic xarray.Dataset to a GRIB file. The aim for cfgrib is to implement write support for a subset of carefully crafted datasets that fit the GRIB data model.
In particular, the only coordinates that we target at the moment are the ones returned by opening a GRIB with the cfgrib flavour of cfgrib.open_dataset, namely: number, time, step, a vertical coordinate (isobaricInhPa, heightAboveGround, surface, etc.), and the horizontal coordinates (for example latitude and longitude for a regular_ll grid type).
Note that all passed GRIB_ attributes are used to set keys in the output file; it is left to the user to ensure coherence among them.
Some of the keys are autodetected from the coordinates, namely:
Horizontal coordinate gridTypes:
- regular_ll and regular_gg
- lambert, etc. (can be controlled with GRIB_ attributes)
- reduced_ll and reduced_gg (can be controlled with GRIB_ attributes)
Vertical coordinate typeOfLevel:
- surface, meanSea, etc.
- isobaricInhPa and isobaricInPa
- hybrid
GRIB edition:
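For reference, a minimal round-trip sketch using the experimental write API (module path as in the current experimental code, subject to change):

import xarray as xr
from cfgrib.xarray_to_grib import to_grib

# open a dataset produced with the cfgrib flavour of open_dataset...
ds = xr.open_dataset('era5-levels-members.grib', engine='cfgrib')
# ...and write it back out; the GRIB_ attributes drive the encoding
to_grib(ds, 'out.grib')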
It seems to add the step, which is in hours, as if it is actually in seconds.
For example, try with this GRIB file in the sample-data folder:
multi_param_on_multi_dims.grib
time is 2018-04-04T12:00:00
steps are 0, 12, 24, 36 hours
valid_time is
array(['2018-04-04T12:00:00.000000000', '2018-04-04T12:00:12.000000000',
'2018-04-04T12:00:24.000000000', '2018-04-04T12:00:36.000000000'],
I'm not a GRIB expert, so I could be missing something about the available cfgrib functionality. My employee @katherinekolman and I have been working on converting some code that used pygrib to use cfgrib. The end result for our software is an xarray DataArray, so cfgrib seemed like a good solution over eccodes-python. However, we're having trouble reading some GRIB files from NCEP, which are the main GRIB files we want to support.
We run into the case a lot where some variables conflict with previously loaded versions of that same variable. We've even tried using the experimental open_datasets, but that seems to fail in some cases where open_dataset with manual filter_by_keys succeeds. I think this is similar to #66 and #63.
So my question is: is there an interface (either in cfgrib or eccodes) that would allow us to programmatically list the metadata of a file's messages, to see what filter_by_keys could be set to without first failing to load the file? I'm trying to find a way for our software to be given a GRIB file, analyze what can be loaded, and then provide that information to the user so they can request to load specific pieces of data.
We're willing to help if this is a time issue. If this is a "cfgrib doesn't plan on supporting these types of files or this type of reading" situation, then do you have other ideas? Perhaps we could customize the existing open_datasets?
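Until then, one way to survey a file's messages is with the low-level eccodes-python API directly (a sketch; the key selection is up to you):

import eccodes

def list_message_keys(path, keys=('shortName', 'typeOfLevel', 'level')):
    """Collect selected metadata keys from every message in a GRIB file."""
    records = []
    with open(path, 'rb') as f:
        while True:
            gid = eccodes.codes_grib_new_from_file(f)
            if gid is None:  # end of file
                break
            records.append({k: eccodes.codes_get(gid, k) for k in keys})
            eccodes.codes_release(gid)
    return records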
cfgrib relies on the ECMWF ecCodes C library for encoding and decoding the GRIB format, including everything related to the coordinate systems. GRIB files encoded in a gridType not supported by the installed ecCodes version will print the warning: No latitudes/longitudes provided by ecCodes for gridType ....
The GRIB file will be opened, but the geographic portion of the data will be represented as a single dimension without coordinates, named values.
For example:
>>> cfgrib.open_dataset('../ST4.2018080204.01h')
No latitudes/longitudes provided by ecCodes for gridType = 'polar_stereographic'
<xarray.Dataset>
Dimensions: (values: 987601)
Coordinates:
time datetime64[ns] ...
step timedelta64[ns] ...
surface int64 ...
valid_time datetime64[ns] ...
Dimensions without coordinates: values
Data variables:
tp (values) float32 ...
Attributes:
GRIB_edition: 1
GRIB_centre: kwbc
GRIB_centreDescription: US National Weather Service - NCEP
GRIB_subCentre: 4
history: GRIB to CDM+CF via cfgrib-0.9.../ecCodes-2.8...
The list of known supported gridTypes is:

latitude and longitude as dimension coordinates:
- regular_ll
- regular_gg

x and y as dimensions, with latitude and longitude as non-dimension coordinates:
- rotated_ll
- rotated_gg
- lambert
- lambert_azimuthal_equal_area
- albers
- polar_stereographic (ecCodes > 2.9.0)

values as dimension, with latitude and longitude as non-dimension coordinates:
- reduced_ll
- reduced_gg
- unstructured_grid

See http://xarray.pydata.org/en/stable/data-structures.html#coordinates for details on xarray naming conventions.
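As a quick check of the second case, a sketch of what to expect when opening a rotated grid (the file name here is hypothetical):

import xarray as xr

ds = xr.open_dataset("rotated_ll_example.grib", engine="cfgrib")
print(ds.dims)           # expected to include 'y' and 'x'
print(ds.latitude.dims)  # ('y', 'x'): a 2-D non-dimension coordinate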
Initially noted in #27.
MARS distributes the ERA5 wave spectra, which have two additional coordinates, direction and frequency, that are not yet supported by cfgrib.
Example MARS request:
#!/usr/bin/env python
from ecmwfapi import ECMWFDataServer

server = ECMWFDataServer()
server.retrieve({
    "class": "ei",
    "dataset": "interim",
    "expver": "1",
    "stream": "wave",
    "type": "an",
    "date": "2016-01-01/to/2016-01-02",
    "time": "00:00:00/06:00:00/12:00:00/18:00:00",
    "param": "251.140",
    "direction": "1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19/20/21/22/23/24",
    "frequency": "1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19/20/21/22/23/24/25/26/27/28/29/30",
    "target": "2d_spectra_201601",
})
Hello,
I just installed cfgrib in our infrastructure using EasyBuild (dependencies: Python 2.7.9, ecCodes 2.7.3, pandas 0.23.3 and xarray 0.10.8). The import seems to work, but I get these warnings:
[kserrade@bscearth370 ~ ]$ python
Python 2.7.9 (default, Feb 5 2016, 16:59:17)
[GCC 4.9.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from cfgrib import xarray_store
/shared/earth/software/pandas/0.23.3-foss-2015a-Python-2.7.9/lib/python2.7/site-packages/pandas-0.23.3-py2.7-linux-x86_64.egg/pandas/_libs/__init__.py:4: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
from .tslib import iNaT, NaT, Timestamp, Timedelta, OutOfBoundsDatetime
/shared/earth/software/pandas/0.23.3-foss-2015a-Python-2.7.9/lib/python2.7/site-packages/pandas-0.23.3-py2.7-linux-x86_64.egg/pandas/__init__.py:26: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
from pandas._libs import (hashtable as _hashtable,
[... the same numpy.dtype size RuntimeWarning is repeated for a dozen more pandas._libs imports: core/dtypes/common.py, core/util/hashing.py, core/indexes/base.py, tseries/offsets.py, core/ops.py, core/indexes/interval.py, core/internals.py, core/sparse/array.py, core/window.py, core/groupby/groupby.py, core/reshape/reshape.py, io/parsers.py and io/pytables.py ...]
Is this OK?
Nice work! I hope this simplifies our daily work with GRIB files!
At the moment ecCodes must be installed on the system for cfgrib to work.
Following the strategy of https://github.com/mapbox/rasterio it would be possible to package the whole ecCodes library and its data files in the manylinux wheel binary package.
Pro:
- pip install cfgrib just works, with no system dependency on ecCodes.
Con:
The infrastructure used by rasterio is here: https://github.com/sgillies/frs-wheel-builds
I am trying to convert a multi-file GRIB xarray dataset into Zarr. Reading the files is relatively quick, but writing to Zarr is very slow. What I am doing:
import xarray as xr

chunks = {"time": 24, "latitude": 103, "longitude": 180}
ds = xr.open_mfdataset(
    "{}/*/*/*_{}_*.grb".format(source, var),
    chunks=chunks,
    engine="cfgrib",
    backend_kwargs={"indexpath": "/mnt/test/era5/grib_index/idx"},
    concat_dim="time",
)
ds.to_zarr("mypath")
It writes about 1 MB per 10 seconds while using 100% CPU. There is plenty of spare capacity on the disk, so I assume the bottleneck is in how the GRIB files are read.
>>> xarray.__version__
'0.11.0'
$ python -V
Python 3.7.0
$ pip list | grep cfgrib
cfgrib 0.9.4.1
Each file is about 1.5 GB, and I have about 216 files to write as Zarr.
Thanks for providing this valuable package. I used it with xarray and ERA5 data for lab exercises in a grad-level data science class a few weeks ago. It went very well.
Unfortunately, I'm now unable to run the same notebook that worked last week. Same environment, same JupyterHub deployment. I have conda install lines early in the notebook to install eccodes and cfgrib; these were executed when I restarted the kernel and ran all cells. So it is likely a conda packaging issue. I've since tried multiple installations (and orderings) of cfgrib and eccodes with conda and pip. No luck. Any ideas on your end based on recent modifications?
conda install -y -c conda-forge eccodes
Fetching package metadata .............
Solving package specifications: .
Package plan for installation in environment /opt/conda:
The following NEW packages will be INSTALLED:
cfgrib: 0.9.6.1.post1-py_0 conda-forge
eccodes: 2.12.3-h4fa793d_0 conda-forge
future: 0.17.1-py36_1000 conda-forge
libaec: 1.0.4-hf484d3e_0 conda-forge
typing: 3.6.4-py36_0 conda-forge
The following packages will be UPDATED:
ipyleaflet: 0.9.2-py36_1001 conda-forge --> 0.10.1-py36_0 conda-forge
xarray: 0.10.9-py36_0 conda-forge --> 0.12.0-py_0 conda-forge
libaec-1.0.4-h 100% |################################| Time: 0:00:00 19.70 MB/s
future-0.17.1- 100% |################################| Time: 0:00:00 28.15 MB/s
typing-3.6.4-p 100% |################################| Time: 0:00:00 31.01 MB/s
eccodes-2.12.3 100% |################################| Time: 0:00:00 64.42 MB/s
xarray-0.12.0- 100% |################################| Time: 0:00:00 62.48 MB/s
cfgrib-0.9.6.1 100% |################################| Time: 0:00:00 28.62 MB/s
ipyleaflet-0.1 100% |################################| Time: 0:00:00 63.98 MB/s
wa_t = xr.open_dataset(wa_fn[0], engine='cfgrib')
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
/opt/conda/lib/python3.6/site-packages/cfgrib/bindings.py in <module>()
51 try:
---> 52 lib = ffi.dlopen(libname)
53 LOG.info("ecCodes library found using name '%s'.", libname)
/opt/conda/lib/python3.6/site-packages/cffi/api.py in dlopen(self, name, flags)
139 with self._lock:
--> 140 lib, function_cache = _make_ffi_library(self, name, flags)
141 self._function_caches.append(function_cache)
/opt/conda/lib/python3.6/site-packages/cffi/api.py in _make_ffi_library(ffi, libname, flags)
785 backend = ffi._backend
--> 786 backendlib = _load_backend_lib(backend, libname, flags)
787 #
/opt/conda/lib/python3.6/site-packages/cffi/api.py in _load_backend_lib(backend, name, flags)
780 msg = "%s. Additionally, %s" % (first_error, msg)
--> 781 raise OSError(msg)
782 return backend.load_library(path, flags)
OSError: ctypes.util.find_library() did not manage to locate a library called 'libeccodes'
The above exception was the direct cause of the following exception:
RuntimeError Traceback (most recent call last)
<ipython-input-4-b7eec47bbe65> in <module>()
----> 1 wa_t = xr.open_dataset(wa_fn[0], engine='cfgrib')
2 wa_t = wa_t.drop(['number', 'surface', 'step', 'valid_time'])
3 wa_t -= 273.15
4 wa_t['t2m'].attrs['units'] = 'C'
/opt/conda/lib/python3.6/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, group, decode_cf, mask_and_scale, decode_times, autoclose, concat_characters, decode_coords, engine, chunks, lock, cache, drop_variables, backend_kwargs, use_cftime)
378 elif engine == 'cfgrib':
379 store = backends.CfGribDataStore(
--> 380 filename_or_obj, lock=lock, **backend_kwargs)
381
382 else:
/opt/conda/lib/python3.6/site-packages/xarray/backends/cfgrib_.py in __init__(self, filename, lock, **backend_kwargs)
34 """
35 def __init__(self, filename, lock=None, **backend_kwargs):
---> 36 import cfgrib
37 if lock is None:
38 lock = ECCODES_LOCK
/opt/conda/lib/python3.6/site-packages/cfgrib/__init__.py in <module>()
19
20 # cfgrib core API depends on the ECMWF ecCodes C-library only
---> 21 from .cfmessage import CfMessage
22 from .dataset import Dataset, DatasetBuildError, open_file
23 from .messages import Message, FileStream
/opt/conda/lib/python3.6/site-packages/cfgrib/cfmessage.py in <module>()
29 import numpy as np # noqa
30
---> 31 from . import messages
32
33 LOG = logging.getLogger(__name__)
/opt/conda/lib/python3.6/site-packages/cfgrib/messages.py in <module>()
38 import attr
39
---> 40 from . import bindings
41
42
/opt/conda/lib/python3.6/site-packages/cfgrib/bindings.py in <module>()
88 # Helper values to discriminate key types
89 #
---> 90 CODES_TYPE_UNDEFINED = lib.GRIB_TYPE_UNDEFINED
91 CODES_TYPE_LONG = lib.GRIB_TYPE_LONG
92 CODES_TYPE_DOUBLE = lib.GRIB_TYPE_DOUBLE
/opt/conda/lib/python3.6/site-packages/cfgrib/bindings.py in __getattr__(self, attr)
45
46 def __getattr__(self, attr):
---> 47 raise_from(RuntimeError(self.message), self.exc)
48
49
/opt/conda/lib/python3.6/site-packages/future/utils/__init__.py in raise_from(exc, cause)
398 myglobals['__python_future_raise_from_cause'] = cause
399 execstr = "raise __python_future_raise_from_exc from __python_future_raise_from_cause"
--> 400 exec(execstr, myglobals, mylocals)
401
402 def raise_(tp, value=None, tb=None):
/opt/conda/lib/python3.6/site-packages/cfgrib/bindings.py in <module>()
RuntimeError: ecCodes library not found on the system.
cf2cdm is a small infrastructure to translate between different CF data models, for example ECMWF (e.g. vertical pressure levels are named level and have hPa as unit) and CDS (e.g. vertical pressure levels are named plev and have Pa as unit).
This is useful both on read, to convert from the canonical data model to any other, and on write, to convert a generic CF dataset to the canonical form.
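A minimal usage sketch (the GRIB file name is hypothetical; translate_coords and the CDS coordinate model are part of cf2cdm):

import xarray as xr
import cf2cdm

ds = xr.open_dataset("era5-levels.grib", engine="cfgrib")
# Translate from the ECMWF data model (level in hPa) to the CDS one (plev in Pa).
ds_cds = cf2cdm.translate_coords(ds, coord_model=cf2cdm.CDS)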
GRIB files that produce non-dimension coordinates (for example those with gridType rotated_ll) crash when attempting to call cf2cdm.translate_coords:
>>> cf2cdm.translate_coords(rotated_ll_data, coord_model=cf2cdm.CDS)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
~/devel/MPY/cfgrib/cf2cdm/cfcoords.py in translate_coords(data, coord_model, errors, coord_translators)
172 try:
--> 173 data = translator(cf_name, data, coord_model=coord_model)
174 except:
~/devel/MPY/cfgrib/cf2cdm/cfcoords.py in coord_translator(default_out_name, default_units, default_direction, is_cf_type, cf_type, data, coord_model)
79 data.coords[out_name].attrs['units'] = units
---> 80 data = translate_direction(data, out_name, stored_direction)
81 return data
~/devel/MPY/cfgrib/cf2cdm/cfcoords.py in translate_direction(data, out_name, stored_direction)
49 values = data.coords[out_name].values
---> 50 if values[-1] > values[0] and stored_direction == 'decreasing':
51 data = data.isel({out_name: slice(None, None, -1)})
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
During handling of the above exception, another exception occurred:
RuntimeError Traceback (most recent call last)
<ipython-input-25-f05da76a7f8c> in <module>
----> 1 cosmo = cf2cdm.translate_coords(cosmo, coord_model=cf2cdm.CDS)
~/devel/MPY/cfgrib/cf2cdm/cfcoords.py in translate_coords(data, coord_model, errors, coord_translators)
174 except:
175 if errors != 'ignore':
--> 176 raise RuntimeError("error while translating coordinate: %r" % cf_name)
177 return data
178
RuntimeError: error while translating coordinate: 'latitude'
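The root cause appears to be that for rotated grids latitude is a 2-D non-dimension coordinate, so values[-1] > values[0] compares whole arrays. A possible guard (our assumption, not the project's actual fix) would skip direction handling for multi-dimensional coordinates:

def translate_direction_safe(data, out_name, stored_direction):
    # data is an xarray.Dataset; out_name is the translated coordinate name.
    values = data.coords[out_name].values
    if values.ndim != 1:
        # 2-D non-dimension coordinates (e.g. on rotated grids) have no
        # single storage direction; leave them untouched.
        return data
    if values[-1] > values[0] and stored_direction == "decreasing":
        data = data.isel({out_name: slice(None, None, -1)})
    return data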