Comments (20)
@spencerkclark says that this is due to the fact that netcdf4 now returns cftime objects per default (always)
Thanks @fmaussion. I think it's still important to distinguish between objects that subclass datetime.datetime
(e.g. cftime.real_datetime
) and objects that subclass cftime.datetime
(e.g. cftime.DatetimeGregorian
). For instance for xarray's CFTimeIndex we require only cftime.datetime
objects be used, and we use the only_use_cftime_datetimes
flag in num2date
to guarantee this when decoding times (it is still False
by default). The issue here stems from #66, where the real_datetime
subclass of datetime.datetime
was introduced.
I would not expect cftime.datetime
objects to be handled by default by pandas (because those could have non-standard calendar types), but perhaps we might try to make cftime.real_datetime
objects compatible, because they are at their core ordinary datetime.datetime
objects.
from cftime.
It looks like all they are missing is a nanosecond attribute, which can just always be 0 for it to work as expected.
from cftime.
python datetime objects don't have a nanosecond attribute either
from cftime.
They indeed don't, but pandas assumes that whenever an object is "PyDateTime_Check(val) and not PyDateTime_CheckExact(val)", i.e. a subclass of datetime, that it has a nanosecond attribute, which is what causes the error in the first place:
https://github.com/pandas-dev/pandas/blob/7191af9b47f6f57991abdb20625d9563877370a2/pandas/_libs/tslib.pyx#L539
from cftime.
So this is a bug in pandas then?
from cftime.
from cftime.
Sorry for likely prompting that issue to be closed there...one could try to fix this issue in pandas, but I understand to some extent their pushback. They do not make any guarantees that non-pandas.Timestamp
subclasses of datetime.datetime
objects will be supported, and the logic there regarding datetimes is already quite complicated.
Barring rolling back the change in #66 to add dayofwk
and dayofyr
properties to datetime.datetime
via a subclass, as unsavory as it might sound, I think the most painless workaround here is in your code to convert the datetimes returned from num2date
to numpy.datetime64
before passing them to the Series:
In [1]: import cftime, pandas
In [2]: times = cftime.num2date([0,1,2,3,4,5,500,1000], 'days since 1801-10-01')
In [3]: times
Out[3]:
array([real_datetime(1801, 10, 1, 0, 0), real_datetime(1801, 10, 2, 0, 0),
real_datetime(1801, 10, 3, 0, 0), real_datetime(1801, 10, 4, 0, 0),
real_datetime(1801, 10, 5, 0, 0), real_datetime(1801, 10, 6, 0, 0),
real_datetime(1803, 2, 13, 0, 0), real_datetime(1804, 6, 27, 0, 0)], dtype=object)
In [4]: nptimes = times.astype('datetime64[ns]')
In [5]: nptimes
Out[5]:
array(['1801-10-01T00:00:00.000000000', '1801-10-02T00:00:00.000000000',
'1801-10-03T00:00:00.000000000', '1801-10-04T00:00:00.000000000',
'1801-10-05T00:00:00.000000000', '1801-10-06T00:00:00.000000000',
'1803-02-13T00:00:00.000000000', '1804-06-27T00:00:00.000000000'], dtype='datetime64[ns]')
In [6]: pandas.Series(range(len(nptimes)), index=nptimes)
Out[6]:
1801-10-01 0
1801-10-02 1
1801-10-03 2
1801-10-04 3
1801-10-05 4
1801-10-06 5
1803-02-13 6
1804-06-27 7
dtype: int64
At least for now, NumPy seems to handle subclasses of datetime.datetime
objects just fine in this conversion, and pandas is OK with numpy.datetime64
.
from cftime.
I think the most painless workaround here is in your code to convert the datetimes
Yes, I don't worry too much about workarounds - currently we are running with the nanosecond
attr, but I'll consider yours. It surprises me however that I'm the only one wanting to use pandas with netcdf files output, and I find this recent evolution (and the necessary workaround) quite unfortunate for simple users.
But as I said, I just don't understand dates in python well enough to make any recommendation here. In this perspective I am like most scientists I guess: there are way too many date formats in python. They are all equivalent to me, until I find out they are not ;-)
from cftime.
It surprises me however that I'm the only one wanting to use pandas with netcdf files output, and I find this recent evolution (and the necessary workaround) quite unfortunate for simple users.
You're definitely not the only one :). I'm sorry if I came across as being in support of doing nothing in response to this issue.
there are way too many date formats in python
I agree; in hindsight #66 effectively created another one, for arguably little gain (and clearly at least one regression).
From my perspective I think cftime should at least consider the option of reverting back to by default using pure datetime.datetime
objects for standard calendars (and not a subclass, i.e. real_datetime
) before going the route of trying to change the behavior downstream in pandas (which they seem somewhat reluctant to doing) or adding a nanosecond
attribute to real_datetime
(which seems somewhat fragile).
As I understand it, in addition to creating real_datetime
, #66 established working dayofwk
and dayofyr
properties for all cftime.datetime
objects. There are cftime.datetime
objects for all types of calendars (including standard ones). If one would always like those properties on their datetimes, they could toggle the only_use_cftime_datetimes
option in num2date
.
Ultimately, I think the benefits of returning standard Python datetime
objects (for interoperability with the rest of the stack) would outweigh the minor downside that by default, not all datetime objects returned by num2date
would have dayofwk
and dayofyr
attributes (because there would still be a way to achieve that behavior; it would just not be there automatically).
That's my two cents, but I am not a core developer of cftime, so take that with a grain of salt.
from cftime.
I do think that the actual issue in this case is on the pandas side. As they are the ones assuming that every subclass of datetime automatically has to have a nanosecond attribute, which there is absolutely no guarantee for.
from cftime.
@TimoRoth I understand that it would fix things in pandas if that was allowed, but I think it's important to note that pandas doesn't explicitly say they will support subclasses of datetime objects, which is the reason for their reluctance. Therefore I'm not sure I would frame it as a "bug" or "regression" in pandas; it would be more of a request for them to relax their assumptions.
The change that caused this was in cftime (and was made quite recently). For that reason I think we should at least reflect on whether that was a good decision before approaching pandas again.
from cftime.
For what it's worth though, I will note that others apparently would like something of this sort to be supported in pandas too -- this issue seems related: pandas-dev/pandas#21142.
from cftime.
@TimoRoth how much of an effort would it be to send a patch in Pandas? Jeff Reback seemed opened to it (pandas-dev/pandas#23419 (comment))
Let's just hope it doesn't imply messing around with C code...
from cftime.
The problem with this, and the reason it was changed in the first place, is that returning a different kind of object depending on the calendar and time period resulting in problems for users. I'm not in favor of reverting to this just as a workaround for how pandas deals with datetime subclasses.
Sorry @jswhit I was a little confused by your comment, though I can't seem to find it on GitHub anymore.
As a subclass of datetime.datetime
or as a pure datetime.datetime
object, cftime.real_datetime
is still not the same type as cftime.datetime
, so maybe I am not following, but it is not clear to me how #66 addressed the issue of returning different kinds of objects depending on the calendar and time period. As I noted earlier, #39 is a way to work around that particular issue.
That being said, I really don't feel very strongly about this, so I don't want to belabor the discussion. I merely thought given the recency of the change that it might warrant reconsidering in this case, but I see the other side of the argument too (i.e. that one might want dayofwk
and dayofyr
attributes on all date types returned by num2date
). Whatever you ultimately decide to do is OK with me. Since I'm responsible for bringing discussion here, if the decision is to fix things in cftime, I'm happy to volunteer to do that work, but if the decision is to push the fix downstream to pandas (as you suggested in your comment), I would rather someone else take it on, as I am not particularly invested in this problem.
from cftime.
I deleted that comment after I realized it was completely off base, sorry. You are right that this particular change only enabled the extra two attributes (dayofwk, dayofyr). I would still prefer the issue be addressed in pandas, but I'm not opposed to adding a bogus nanosecond attribute as a workaround.
from cftime.
I would still prefer the issue be addressed in pandas, but I'm not opposed to adding a bogus nanosecond attribute as a workaround.
Perfect, thanks. Hopefully things work out in pandas, but it's good to have that as a fallback option here.
from cftime.
@fmaussion, can you verify that pull request #95 (branch issue77
) fixes the issue?
from cftime.
I know I'm late on this, but thanks for adding the workaround @jswhit.
from cftime.
Sorry for likely prompting that issue to be closed there...one could try to fix this issue in pandas, but I understand to some extent their pushback. They do not make any guarantees that non-
pandas.Timestamp
subclasses ofdatetime.datetime
objects will be supported, and the logic there regarding datetimes is already quite complicated.Barring rolling back the change in #66 to add
dayofwk
anddayofyr
properties todatetime.datetime
via a subclass, as unsavory as it might sound, I think the most painless workaround here is in your code to convert the datetimes returned fromnum2date
tonumpy.datetime64
before passing them to the Series:In [1]: import cftime, pandas In [2]: times = cftime.num2date([0,1,2,3,4,5,500,1000], 'days since 1801-10-01') In [3]: times Out[3]: array([real_datetime(1801, 10, 1, 0, 0), real_datetime(1801, 10, 2, 0, 0), real_datetime(1801, 10, 3, 0, 0), real_datetime(1801, 10, 4, 0, 0), real_datetime(1801, 10, 5, 0, 0), real_datetime(1801, 10, 6, 0, 0), real_datetime(1803, 2, 13, 0, 0), real_datetime(1804, 6, 27, 0, 0)], dtype=object) In [4]: nptimes = times.astype('datetime64[ns]') In [5]: nptimes Out[5]: array(['1801-10-01T00:00:00.000000000', '1801-10-02T00:00:00.000000000', '1801-10-03T00:00:00.000000000', '1801-10-04T00:00:00.000000000', '1801-10-05T00:00:00.000000000', '1801-10-06T00:00:00.000000000', '1803-02-13T00:00:00.000000000', '1804-06-27T00:00:00.000000000'], dtype='datetime64[ns]') In [6]: pandas.Series(range(len(nptimes)), index=nptimes) Out[6]: 1801-10-01 0 1801-10-02 1 1801-10-03 2 1801-10-04 3 1801-10-05 4 1801-10-06 5 1803-02-13 6 1804-06-27 7 dtype: int64
At least for now, NumPy seems to handle subclasses of
datetime.datetime
objects just fine in this conversion, and pandas is OK withnumpy.datetime64
.
Thx for your answer, but I found something wrong:
'''
In [6]: pandas.Series(range(len(nptimes)), index=nptimes)
Out[6]:
1801-10-01 0
1801-10-02 1
1801-10-03 2
1801-10-04 3
1801-10-05 4
1801-10-06 5
1803-02-13 6
1804-06-27 7
dtype: int64
'''
Maybe the right code shoud be below:
'''
pandas.Series(nptimes, index = range(len(nptimes)))
'''
from cftime.
I think it depends on your purpose. The motivation for using the integer range as the values and the times as the index
was based on the usage in the initial comment: #77 (comment).
from cftime.
Related Issues (20)
- Potential approaches for performance improvements in `cftime.num2date` HOT 3
- Test failure with Cython 3 TypeError: unsupported operand type(s) for -: 'datetime.datetime' and 'cftime._cftime.DatetimeGregorian' HOT 11
- Inconsistent date2index (netcdf4-python #1155) HOT 1
- 1.6.0: sphinx warning `WARNING: autodoc: failed to import module 'cftime'` HOT 5
- 1.6.0: pytest warnings
- Implement a `strptime` method HOT 12
- New Numpy Version Issues HOT 2
- `cftime.num2date` fails on an empty integer-dtype array
- test_num2date_precision failure on various architectures HOT 3
- OverflowError time values outside range of 64 bit signed integers HOT 13
- Dealing with netcdf time:units with unexpected characters HOT 5
- Future plans for the `longdouble` argument HOT 2
- Support time units that are a multiple of second, hour, or day (with a reference date) HOT 5
- Unexpected results when not using separators in format code with cftime.datetime.strptime HOT 7
- Matplotlib date formatter and cftime : incompatible ? HOT 8
- Parsing "months since ..." dates HOT 4
- Build `musllinux` wheels HOT 4
- Missing 1.6.3 sdist on pypi
- reformat axis, calendar, units or all? Repeating time ranges HOT 1
- Passing empty array to date2num raises `IndexError: index 0 is out of bounds for axis 0 with size 0` HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from cftime.