Comments (7)
> The second issue is non-conforming CF attribute. `scale_factor` should be of unpacked type (some floating point type). If you change it to floating point it works as intended.

We could cast and raise a warning. It should be OK to open a non-conforming file with xarray.
My problem is more about the fact that we can no longer read these types of variables without setting `mask_and_scale` to False (and I broke some of my colleagues' CryoSat processing tools when I updated xarray 😇).

Decoding it as a timedelta64, with the option to disable that via `decode_timedelta=False`, seems OK to me, but I'm not the end user of the data.

I'm not a fan of having different or hard-to-predict decoding behavior depending on whether something is a coordinate or a variable, or on whether it has a specific attribute. Simple rules (when possible) will not satisfy everyone, but they avoid surprises and we can adapt.
@Thomas-Z Thanks for the well-written issue.

The first issue is with timedelta decoding. If you remove the `units` attribute the pipeline works, which indicates a regression in that part. I'll have a closer look in the next few days. One remark here: packed data can't be of type int64 (see https://cfconventions.org/Data/cf-conventions/cf-conventions-1.11/cf-conventions.html#packed-data).

The second issue is a non-conforming CF attribute. `scale_factor` should be of the unpacked type (some floating-point type). If you change it to floating point it works as intended.
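The packing rules referenced above can be sketched in plain NumPy (the values here are made up for illustration): the unpacked result takes the floating-point type of `scale_factor`, and the packed type must be a small integer type, not int64.

```python
import numpy as np

# CF packed data: unpacked = packed * scale_factor + add_offset.
# scale_factor / add_offset must be of the *unpacked* (floating-point) type.
unpacked = np.array([0.5, 1.0, 2.5], dtype=np.float64)
scale_factor = np.float64(0.5)   # conforming: floating point
add_offset = np.float64(0.0)

# Pack into a small integer type (CF disallows int64 as a packed type).
packed = np.round((unpacked - add_offset) / scale_factor).astype(np.int16)

# Unpacking promotes to the dtype of scale_factor, i.e. float64.
restored = packed * scale_factor + add_offset
assert restored.dtype == np.float64
assert np.allclose(restored, unpacked)
```

With an integer `scale_factor`, the last step could not produce a floating-point result without an explicit cast, which is where the reported error comes from.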
> The first issue is with Timedelta decoding. If you remove the units attribute the pipeline works.

Not sure if it helps, but keeping the unit and removing the fill_value makes it work too.

> The second issue is non-conforming CF attribute. scale_factor should be of unpacked type (some floating point type). If you change it to floating point it works as intended.

Right, I was not aware of that. Using 1000.0 as `scale_factor` does work, but it changes the unpacked data type (to float), which is kind of disturbing to me but seems to conform to the CF convention.

> We could cast and raise a warning. It should be OK to open a non-conforming file with xarray.

In my example I can open the non-conforming file. I just cannot write a non-conforming file, and this is maybe not a bad thing.
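The dtype change observed above is ordinary NumPy promotion rather than anything xarray-specific: multiplying packed integers by a floating-point `scale_factor` yields a float array. A minimal illustration (not xarray's actual code path):

```python
import numpy as np

packed = np.array([1, 2, 3], dtype=np.int32)

# Out-of-place multiply promotes: the unpacked data becomes float64.
unpacked = packed * 1000.0
assert unpacked.dtype == np.float64
```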
> The first issue is with Timedelta decoding. If you remove the units attribute the pipeline works.

> Not sure if it helps but keeping the unit and removing the fill_value makes it work too.

Yes, I would have thought so. The CF mask coder is only applied when `_FillValue` is given. As the time decoding happens after masking, that leads to the issue in this case. We possibly need to special-case time units in the CF mask coder. But aren't we doing that already?
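A plain-NumPy sketch of the ordering problem (assumed mechanics for illustration, not xarray's actual implementation): applying the mask first replaces fill values with NaN, which forces the integer data to float64, and the later time/timedelta conversion then has to cope with that.

```python
import numpy as np

fill = np.int64(-9999)                       # hypothetical _FillValue
raw = np.array([10, 20, -9999], dtype=np.int64)

# Mask coder runs first: fill values become NaN, forcing float64.
masked = np.where(raw == fill, np.nan, raw.astype(np.float64))
assert masked.dtype == np.float64

# Time decoding runs afterwards on the float data; NaN maps to NaT.
td = masked.astype("timedelta64[s]")
assert np.isnat(td[2])
```

Without the `_FillValue` attribute the mask step is skipped entirely, which would explain why removing it makes the pipeline work.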
> We could cast and raise a warning. It should be OK to open a non-conforming file with xarray.

> In my example I can open the non-conforming file. I just cannot write a non-conforming file and this is maybe not a bad thing.

So, for the second case we already allow reading int64 packed into int8 (which is not CF conforming). But then it might be good to raise a more specific error on write here (non-conforming CF).
@Thomas-Z Is your issue related to #1621? Do you want your data converted to timedelta64, or do you want to keep the floating-point representation?
I'm having similar issues, but with reading a preexisting data file from Metop-C's ASCAT instrument. Maybe these files are non-conforming (I'm not sure), but they are official files from EUMETSAT.

Unless I'm misunderstanding something, though, the file appears to follow the rules regarding packed data linked by @Thomas-Z. The data are packed as an `int32` and the scale factor is a `float64`.

Opening the file with `df = xr.open_dataset(fname)` succeeds, but I get the same error as above if I attempt to access values from `df.time.values`.
```
> df.time.values
---------------------------------------------------------------------------
UFuncTypeError                            Traceback (most recent call last)
Cell In[35], line 1
----> 1 dat.time.values

File ~/anaconda3/envs/test-1.12.2-release/lib/python3.10/site-packages/xarray/core/dataarray.py:784, in DataArray.values(self)
    771 @property
    772 def values(self) -> np.ndarray:
    773     """
    774     The array's data converted to numpy.ndarray.
    775
        (...)
    782     to this array may be reflected in the DataArray as well.
    783     """
--> 784 return self.variable.values

File ~/anaconda3/envs/test-1.12.2-release/lib/python3.10/site-packages/xarray/core/variable.py:525, in Variable.values(self)
    522 @property
    523 def values(self):
    524     """The variable's data as a numpy.ndarray"""
--> 525 return _as_array_or_item(self._data)

File ~/anaconda3/envs/test-1.12.2-release/lib/python3.10/site-packages/xarray/core/variable.py:323, in _as_array_or_item(data)
    309 def _as_array_or_item(data):
    310     """Return the given values as a numpy array, or as an individual item if
    311     it's a 0d datetime64 or timedelta64 array.
    312
        (...)
    321     TODO: remove this (replace with np.asarray) once these issues are fixed
    322     """
--> 323 data = np.asarray(data)
    324 if data.ndim == 0:
    325     if data.dtype.kind == "M":

File ~/anaconda3/envs/test-1.12.2-release/lib/python3.10/site-packages/xarray/core/indexing.py:806, in MemoryCachedArray.__array__(self, dtype)
    805 def __array__(self, dtype: np.typing.DTypeLike = None) -> np.ndarray:
--> 806     return np.asarray(self.get_duck_array(), dtype=dtype)

File ~/anaconda3/envs/test-1.12.2-release/lib/python3.10/site-packages/xarray/core/indexing.py:809, in MemoryCachedArray.get_duck_array(self)
    808 def get_duck_array(self):
--> 809     self._ensure_cached()
    810     return self.array.get_duck_array()

File ~/anaconda3/envs/test-1.12.2-release/lib/python3.10/site-packages/xarray/core/indexing.py:803, in MemoryCachedArray._ensure_cached(self)
    802 def _ensure_cached(self):
--> 803     self.array = as_indexable(self.array.get_duck_array())

File ~/anaconda3/envs/test-1.12.2-release/lib/python3.10/site-packages/xarray/core/indexing.py:760, in CopyOnWriteArray.get_duck_array(self)
    759 def get_duck_array(self):
--> 760     return self.array.get_duck_array()

File ~/anaconda3/envs/test-1.12.2-release/lib/python3.10/site-packages/xarray/core/indexing.py:630, in LazilyIndexedArray.get_duck_array(self)
    625 # self.array[self.key] is now a numpy array when
    626 # self.array is a BackendArray subclass
    627 # and self.key is BasicIndexer((slice(None, None, None),))
    628 # so we need the explicit check for ExplicitlyIndexed
    629 if isinstance(array, ExplicitlyIndexed):
--> 630     array = array.get_duck_array()
    631 return _wrap_numpy_scalars(array)

File ~/anaconda3/envs/test-1.12.2-release/lib/python3.10/site-packages/xarray/coding/variables.py:81, in _ElementwiseFunctionArray.get_duck_array(self)
    80 def get_duck_array(self):
--> 81     return self.func(self.array.get_duck_array())

File ~/anaconda3/envs/test-1.12.2-release/lib/python3.10/site-packages/xarray/coding/variables.py:399, in _scale_offset_decoding(data, scale_factor, add_offset, dtype)
    397 data = data.astype(dtype=dtype, copy=True)
    398 if scale_factor is not None:
--> 399     data *= scale_factor
    400 if add_offset is not None:
    401     data += add_offset

UFuncTypeError: Cannot cast ufunc 'multiply' output from dtype('float64') to dtype('int64') with casting rule 'same_kind'
```
I can get around this by opening the file with `df = xr.open_dataset(fname, mask_and_scale=False)`, but that has obvious repercussions for the other DataArrays.

The `time` DataArray looks like this:
```
<xarray.DataArray 'time' (NUMROWS: 3264, NUMCELLS: 82)> Size: 2MB
[267648 values with dtype=int64]
Coordinates:
    lat      (NUMROWS, NUMCELLS) float64 2MB ...
    lon      (NUMROWS, NUMCELLS) float64 2MB ...
Dimensions without coordinates: NUMROWS, NUMCELLS
Attributes:
    valid_min:      0
    valid_max:      2147483647
    standard_name:  time
    long_name:      time
    units:          seconds since 1990-01-01 00:00:00
```
When read with `mask_and_scale=False`, the DataArray's attributes are:
```
{'_FillValue': -2147483647,
 'missing_value': -2147483647,
 'valid_min': 0,
 'valid_max': 2147483647,
 'standard_name': 'time',
 'long_name': 'time',
 'scale_factor': 1.0,
 'add_offset': 0.0}
```
I can replicate the error by attempting an in-place operation on some of the time data after reading with `mask_and_scale=False, decode_times=False`:
```
In [54]: df = xr.open_dataset(fname, mask_and_scale=False, decode_times=False)

In [55]: tmp = df.time.values[0:10, 0:10]

In [56]: tmp.dtype
Out[56]: dtype('int32')

In [57]: df.time.attrs['scale_factor'].dtype
Out[57]: dtype('float64')

In [58]: tmp *= dat2.time.attrs.get('scale_factor')
---------------------------------------------------------------------------
UFuncTypeError                            Traceback (most recent call last)
Cell In[58], line 1
----> 1 tmp *= dat2.time.attrs.get('scale_factor')

UFuncTypeError: Cannot cast ufunc 'multiply' output from dtype('float64') to dtype('int32') with casting rule 'same_kind'
```
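The failure is NumPy's `same_kind` casting rule for in-place ufuncs: an in-place multiply must write the float64 product back into the integer array, which NumPy refuses. Casting to the unpacked float type before scaling avoids it. A minimal reproduction:

```python
import numpy as np

tmp = np.array([0, 86400], dtype=np.int32)

# In-place multiply must store a float64 result in an int32 array -> error.
try:
    tmp *= np.float64(1.0)
except TypeError:  # UFuncTypeError is a subclass of TypeError
    pass

# Casting to the unpacked (float) type first makes the multiply legal.
out = tmp.astype(np.float64)
out *= np.float64(1.0)
assert out.dtype == np.float64
```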
Am I doing something wrong? Is the file non-conforming? Is there a way to solve this without doing all of my own masking, scaling, and conversion to datetime?
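For what it's worth, until this is fixed, the masking, scaling, and time conversion can be done by hand after reading with `mask_and_scale=False` (a sketch using made-up values and the attributes shown above; not an endorsed xarray recipe):

```python
import numpy as np

# Hypothetical raw int32 seconds-since-epoch data, as stored in the file.
raw = np.array([0, 86400, -2147483647], dtype=np.int32)
attrs = {"_FillValue": -2147483647, "scale_factor": 1.0, "add_offset": 0.0}

# 1) Cast to the unpacked float type first (in-place *= on int32 would fail).
data = raw.astype(np.float64)
# 2) Mask fill values with NaN.
data[raw == attrs["_FillValue"]] = np.nan
# 3) Apply scale and offset.
data = data * attrs["scale_factor"] + attrs["add_offset"]
# 4) Convert to datetimes relative to the units epoch; NaN becomes NaT.
time = np.datetime64("1990-01-01") + data.astype("timedelta64[s]")
```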