Comments (7)

dcherian commented on June 2, 2024

The second issue is a non-conforming CF attribute: scale_factor should be of the unpacked type (some floating point type). If you change it to floating point, it works as intended.

We could cast and raise a warning. It should be OK to open a non-conforming file with xarray.
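
For illustration, that cast-and-warn could look something like the following inside the scale/offset decoding path (a rough sketch with a hypothetical helper, not xarray's actual implementation):

import warnings

import numpy as np


def _ensure_float_scale_factor(attrs):
    # Hypothetical helper: CF requires scale_factor to be of the unpacked
    # (floating point) type, so cast a non-conforming integer value and warn.
    scale_factor = attrs.get("scale_factor")
    if scale_factor is not None and not np.issubdtype(
        np.asarray(scale_factor).dtype, np.floating
    ):
        warnings.warn(
            "scale_factor is not of a floating point type as required by CF; "
            "casting to float64",
            stacklevel=2,
        )
        attrs = {**attrs, "scale_factor": np.float64(scale_factor)}
    return attrs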

Thomas-Z commented on June 2, 2024

My problem is more about the fact that we can no longer read these types of variables without setting mask_and_scale to False (and I broke some of my colleagues' CryoSat processing tools when I updated xarray 😇).

Decoding it as a timedelta64, with the option to disable that via decode_timedelta=False, seems OK to me, but I'm not the end user of the data.
I'm not a fan of having different or hard-to-predict decoding behavior depending on whether something is a coordinate or a variable, or on whether it has that specific attribute.

Simple rules (when possible) will not satisfy everyone, but we will not have any surprises and we can adapt.
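
For reference, the two opening modes being compared here (the file name is a placeholder):

import xarray as xr

# Current workaround: disable masking/scaling entirely, which affects
# every variable in the file, not just the offending one.
ds_raw = xr.open_dataset("cryosat_example.nc", mask_and_scale=False)

# Discussed alternative: keep masking/scaling but opt out of
# timedelta decoding for variables with time-like units.
ds_no_td = xr.open_dataset("cryosat_example.nc", decode_timedelta=False)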

kmuehlbauer commented on June 2, 2024

@Thomas-Z Thanks for the well-written issue.

The first issue is with Timedelta decoding. If you remove the units attribute the pipeline works. This indicates that there is a regression in that part. I'll have a closer look in the next few days. One remark here: packed data can't be of type int64 (see https://cfconventions.org/Data/cf-conventions/cf-conventions-1.11/cf-conventions.html#packed-data).

The second issue is a non-conforming CF attribute: scale_factor should be of the unpacked type (some floating point type). If you change it to floating point, it works as intended.
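
A minimal sketch of a conforming setup along those lines (variable name and values are made up):

import numpy as np
import xarray as xr

# Unpacked (in-memory) values that we want to store packed.
ds = xr.Dataset({"delta": ("x", np.array([0.001, 0.002, 0.003]))})

# Pack into int32 (CF allows byte/short/int, not int64) and use a
# floating point scale_factor, as the packed-data rules require.
ds["delta"].encoding.update({"dtype": "int32", "scale_factor": 1e-3})
ds.to_netcdf("conforming.nc")  # expected to round-trip without the errors above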

Thomas-Z commented on June 2, 2024

The first issue is with Timedelta decoding. If you remove the units attribute the pipeline works.

Not sure if it helps, but keeping the units and removing the _FillValue makes it work too.

The second issue is a non-conforming CF attribute: scale_factor should be of the unpacked type (some floating point type). If you change it to floating point, it works as intended.

Right, I was not aware of that.
Using 1000.0 as scale_factor does work, but it changes the unpacked data type (to float), which is kind of disturbing to me but seems to conform to the CF convention.
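
That promotion is just the cast-then-multiply the decoder applies (values made up here):

import numpy as np

packed = np.array([1, 2, 3], dtype="int8")
scale_factor = np.float64(1000.0)

# CF unpacking takes the type of scale_factor, so a floating point
# scale_factor necessarily yields floating point unpacked values.
unpacked = packed.astype(scale_factor.dtype) * scale_factor
print(unpacked.dtype)  # float64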

We could cast and raise a warning. It should be OK to open a non-conforming file with xarray.

In my example I can open the non-conforming file.
I just cannot write a non-conforming file and this is maybe not a bad thing.

kmuehlbauer commented on June 2, 2024

The first issue is with Timedelta decoding. If you remove the units attribute the pipeline works.

Not sure if it helps, but keeping the units and removing the _FillValue makes it work too.

Yes, I would have thought so. The CF mask coder is only applied when _FillValue is given. As the time decoding happens after masking, that leads to an issue in this case. We possibly need to special-case time units in the CF mask coder. But aren't we doing that already?

We could cast and raise a warning. It should be OK to open a non-conforming file with xarray.

In my example I can open the non-conforming file. I just cannot write a non-conforming file and this is maybe not a bad thing.

So, for the second case we already allow reading int64 packed into int8 (which is not CF-conforming). But then it might be good to raise a more specific error on write here (non-conforming CF).
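
A small illustration of that ordering using xr.decode_cf (made-up values):

import numpy as np
import xarray as xr

data = np.array([1, 2, -1], dtype="int32")
attrs = {"units": "seconds", "_FillValue": np.int32(-1)}
ds = xr.Dataset({"delta": ("x", data, attrs)})

# Because _FillValue is present, the mask coder runs first and turns the raw
# integers into floats with NaN ...
masked_only = xr.decode_cf(ds, decode_times=False, decode_timedelta=False)
print(masked_only["delta"].dtype)  # float64

# ... and only afterwards does timedelta decoding see the data, which is
# where the interaction described above can bite.
decoded = xr.decode_cf(ds, decode_timedelta=True)
print(decoded["delta"].dtype)  # timedelta64[ns] when decoding succeeds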

kmuehlbauer commented on June 2, 2024

@Thomas-Z Is your issue related to #1621? Do you want your data converted to timedelta64, or do you want to keep the floating point representation?

jsolbrig commented on June 2, 2024

I'm having similar issues, but with reading a preexisting data file from Metop-C's ASCAT instrument. Maybe these files are non-conforming (I'm not sure), but they are official files from EUMETSAT.

Unless I'm misunderstanding something, though, the file appears to follow the rules regarding packed data linked by @Thomas-Z. The data are packed as an int32 and the scale factor is a float64.

Opening the file with df = xr.open_dataset(fname) succeeds, but I get the same error as above if I attempt to access values from df.time.values.

> df.time.values
---------------------------------------------------------------------------
UFuncTypeError                            Traceback (most recent call last)
Cell In[35], line 1
----> 1 dat.time.values

File ~/anaconda3/envs/test-1.12.2-release/lib/python3.10/site-packages/xarray/core/dataarray.py:784, in DataArray.values(self)
    771 @property
    772 def values(self) -> np.ndarray:
    773     """
    774     The array's data converted to numpy.ndarray.
    775 
   (...)
    782     to this array may be reflected in the DataArray as well.
    783     """
--> 784     return self.variable.values

File ~/anaconda3/envs/test-1.12.2-release/lib/python3.10/site-packages/xarray/core/variable.py:525, in Variable.values(self)
    522 @property
    523 def values(self):
    524     """The variable's data as a numpy.ndarray"""
--> 525     return _as_array_or_item(self._data)

File ~/anaconda3/envs/test-1.12.2-release/lib/python3.10/site-packages/xarray/core/variable.py:323, in _as_array_or_item(data)
    309 def _as_array_or_item(data):
    310     """Return the given values as a numpy array, or as an individual item if
    311     it's a 0d datetime64 or timedelta64 array.
    312 
   (...)
    321     TODO: remove this (replace with np.asarray) once these issues are fixed
    322     """
--> 323     data = np.asarray(data)
    324     if data.ndim == 0:
    325         if data.dtype.kind == "M":

File ~/anaconda3/envs/test-1.12.2-release/lib/python3.10/site-packages/xarray/core/indexing.py:806, in MemoryCachedArray.__array__(self, dtype)
    805 def __array__(self, dtype: np.typing.DTypeLike = None) -> np.ndarray:
--> 806     return np.asarray(self.get_duck_array(), dtype=dtype)

File ~/anaconda3/envs/test-1.12.2-release/lib/python3.10/site-packages/xarray/core/indexing.py:809, in MemoryCachedArray.get_duck_array(self)
    808 def get_duck_array(self):
--> 809     self._ensure_cached()
    810     return self.array.get_duck_array()

File ~/anaconda3/envs/test-1.12.2-release/lib/python3.10/site-packages/xarray/core/indexing.py:803, in MemoryCachedArray._ensure_cached(self)
    802 def _ensure_cached(self):
--> 803     self.array = as_indexable(self.array.get_duck_array())

File ~/anaconda3/envs/test-1.12.2-release/lib/python3.10/site-packages/xarray/core/indexing.py:760, in CopyOnWriteArray.get_duck_array(self)
    759 def get_duck_array(self):
--> 760     return self.array.get_duck_array()

File ~/anaconda3/envs/test-1.12.2-release/lib/python3.10/site-packages/xarray/core/indexing.py:630, in LazilyIndexedArray.get_duck_array(self)
    625 # self.array[self.key] is now a numpy array when
    626 # self.array is a BackendArray subclass
    627 # and self.key is BasicIndexer((slice(None, None, None),))
    628 # so we need the explicit check for ExplicitlyIndexed
    629 if isinstance(array, ExplicitlyIndexed):
--> 630     array = array.get_duck_array()
    631 return _wrap_numpy_scalars(array)

File ~/anaconda3/envs/test-1.12.2-release/lib/python3.10/site-packages/xarray/coding/variables.py:81, in _ElementwiseFunctionArray.get_duck_array(self)
     80 def get_duck_array(self):
---> 81     return self.func(self.array.get_duck_array())

File ~/anaconda3/envs/test-1.12.2-release/lib/python3.10/site-packages/xarray/coding/variables.py:399, in _scale_offset_decoding(data, scale_factor, add_offset, dtype)
    397 data = data.astype(dtype=dtype, copy=True)
    398 if scale_factor is not None:
--> 399     data *= scale_factor
    400 if add_offset is not None:
    401     data += add_offset

UFuncTypeError: Cannot cast ufunc 'multiply' output from dtype('float64') to dtype('int64') with casting rule 'same_kind'

I can get around this by opening the file with df = xr.open_dataset(fname, mask_and_scale=False) but that has obvious repercussions for the other DataArrays.

The time DataArray looks like this:

<xarray.DataArray 'time' (NUMROWS: 3264, NUMCELLS: 82)> Size: 2MB
[267648 values with dtype=int64]
Coordinates:
    lat      (NUMROWS, NUMCELLS) float64 2MB ...
    lon      (NUMROWS, NUMCELLS) float64 2MB ...
Dimensions without coordinates: NUMROWS, NUMCELLS
Attributes:
    valid_min:      0
    valid_max:      2147483647
    standard_name:  time
    long_name:      time
    units:          seconds since 1990-01-01 00:00:00

When read with mask_and_scale=False, the DataArray's attributes are:

{'_FillValue': -2147483647,
 'missing_value': -2147483647,
 'valid_min': 0,
 'valid_max': 2147483647,
 'standard_name': 'time',
 'long_name': 'time',
 'scale_factor': 1.0,
 'add_offset': 0.0}
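
For what it's worth, a hand-rolled decode of just the time variable after such a raw read might look like this (a sketch only, using the attributes shown above and the epoch from the units attribute):

import numpy as np
import xarray as xr

ds = xr.open_dataset(fname, mask_and_scale=False, decode_times=False)
time = ds["time"]

# Mask the fill value, apply scale/offset, then add the result to the epoch
# from the units attribute ("seconds since 1990-01-01 00:00:00").
raw = time.astype("float64").where(time != time.attrs["_FillValue"])
seconds = raw * time.attrs["scale_factor"] + time.attrs["add_offset"]
decoded = seconds.astype("timedelta64[s]") + np.datetime64("1990-01-01")  # masked entries end up as NaT in practice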

I can replicate the error by attempting to do an in-place operation on some of the time data after reading with mask_and_scale=False, decode_times=False:

In [54]: df = xr.open_dataset(fname, mask_and_scale=False, decode_times=False)

In [55]: tmp = df.time.values[0:10, 0:10]

In [56]: tmp.dtype
Out[56]: dtype('int32')

In [57]: df.time.attrs['scale_factor'].dtype
Out[57]: dtype('float64')

In [58]: tmp *= df.time.attrs.get('scale_factor')
---------------------------------------------------------------------------
UFuncTypeError                            Traceback (most recent call last)
Cell In[58], line 1
----> 1 tmp *= df.time.attrs.get('scale_factor')

UFuncTypeError: Cannot cast ufunc 'multiply' output from dtype('float64') to dtype('int32') with casting rule 'same_kind'

Am I doing something wrong? Is the file non-conformant? Is there a way to solve this issue without doing all of my own masking, scaling, and conversion to datetime?
