Code Monkey home page Code Monkey logo

Comments (20)

spencerkclark avatar spencerkclark commented on June 24, 2024

@spencerkclark says that this is due to the fact that netcdf4 now returns cftime objects per default (always)

Thanks @fmaussion. I think it's still important to distinguish between objects that subclass datetime.datetime (e.g. cftime.real_datetime) and objects that subclass cftime.datetime (e.g. cftime.DatetimeGregorian). For instance for xarray's CFTimeIndex we require only cftime.datetime objects be used, and we use the only_use_cftime_datetimes flag in num2date to guarantee this when decoding times (it is still False by default). The issue here stems from #66, where the real_datetime subclass of datetime.datetime was introduced.

I would not expect cftime.datetime objects to be handled by default by pandas (because those could have non-standard calendar types), but perhaps we might try to make cftime.real_datetime objects compatible, because they are at their core ordinary datetime.datetime objects.

from cftime.

TimoRoth avatar TimoRoth commented on June 24, 2024

It looks like all they are missing is a nanosecond attribute, which can just always be 0 for it to work as expected.

from cftime.

jswhit avatar jswhit commented on June 24, 2024

python datetime objects don't have a nanosecond attribute either

from cftime.

TimoRoth avatar TimoRoth commented on June 24, 2024

They indeed don't, but pandas assumes that whenever an object is "PyDateTime_Check(val) and not PyDateTime_CheckExact(val)", i.e. a subclass of datetime, that it has a nanosecond attribute, which is what causes the error in the first place:
https://github.com/pandas-dev/pandas/blob/7191af9b47f6f57991abdb20625d9563877370a2/pandas/_libs/tslib.pyx#L539

from cftime.

jswhit avatar jswhit commented on June 24, 2024

So this is a bug in pandas then?

from cftime.

TimoRoth avatar TimoRoth commented on June 24, 2024

from cftime.

spencerkclark avatar spencerkclark commented on June 24, 2024

Sorry for likely prompting that issue to be closed there...one could try to fix this issue in pandas, but I understand to some extent their pushback. They do not make any guarantees that non-pandas.Timestamp subclasses of datetime.datetime objects will be supported, and the logic there regarding datetimes is already quite complicated.

Barring rolling back the change in #66 to add dayofwk and dayofyr properties to datetime.datetime via a subclass, as unsavory as it might sound, I think the most painless workaround here is in your code to convert the datetimes returned from num2date to numpy.datetime64 before passing them to the Series:

In [1]: import cftime, pandas

In [2]: times = cftime.num2date([0,1,2,3,4,5,500,1000], 'days since 1801-10-01')

In [3]: times
Out[3]:
array([real_datetime(1801, 10, 1, 0, 0), real_datetime(1801, 10, 2, 0, 0),
       real_datetime(1801, 10, 3, 0, 0), real_datetime(1801, 10, 4, 0, 0),
       real_datetime(1801, 10, 5, 0, 0), real_datetime(1801, 10, 6, 0, 0),
       real_datetime(1803, 2, 13, 0, 0), real_datetime(1804, 6, 27, 0, 0)], dtype=object)

In [4]: nptimes = times.astype('datetime64[ns]')

In [5]: nptimes
Out[5]:
array(['1801-10-01T00:00:00.000000000', '1801-10-02T00:00:00.000000000',
       '1801-10-03T00:00:00.000000000', '1801-10-04T00:00:00.000000000',
       '1801-10-05T00:00:00.000000000', '1801-10-06T00:00:00.000000000',
       '1803-02-13T00:00:00.000000000', '1804-06-27T00:00:00.000000000'], dtype='datetime64[ns]')

In [6]: pandas.Series(range(len(nptimes)), index=nptimes)
Out[6]:
1801-10-01    0
1801-10-02    1
1801-10-03    2
1801-10-04    3
1801-10-05    4
1801-10-06    5
1803-02-13    6
1804-06-27    7
dtype: int64

At least for now, NumPy seems to handle subclasses of datetime.datetime objects just fine in this conversion, and pandas is OK with numpy.datetime64.

from cftime.

fmaussion avatar fmaussion commented on June 24, 2024

I think the most painless workaround here is in your code to convert the datetimes

Yes, I don't worry too much about workarounds - currently we are running with the nanosecond attr, but I'll consider yours. It surprises me however that I'm the only one wanting to use pandas with netcdf files output, and I find this recent evolution (and the necessary workaround) quite unfortunate for simple users.

But as I said, I just don't understand dates in python well enough to make any recommendation here. In this perspective I am like most scientists I guess: there are way too many date formats in python. They are all equivalent to me, until I find out they are not ;-)

from cftime.

spencerkclark avatar spencerkclark commented on June 24, 2024

It surprises me however that I'm the only one wanting to use pandas with netcdf files output, and I find this recent evolution (and the necessary workaround) quite unfortunate for simple users.

You're definitely not the only one :). I'm sorry if I came across as being in support of doing nothing in response to this issue.

there are way too many date formats in python

I agree; in hindsight #66 effectively created another one, for arguably little gain (and clearly at least one regression).

From my perspective I think cftime should at least consider the option of reverting back to by default using pure datetime.datetime objects for standard calendars (and not a subclass, i.e. real_datetime) before going the route of trying to change the behavior downstream in pandas (which they seem somewhat reluctant to doing) or adding a nanosecond attribute to real_datetime (which seems somewhat fragile).

As I understand it, in addition to creating real_datetime, #66 established working dayofwk and dayofyr properties for all cftime.datetime objects. There are cftime.datetime objects for all types of calendars (including standard ones). If one would always like those properties on their datetimes, they could toggle the only_use_cftime_datetimes option in num2date.

Ultimately, I think the benefits of returning standard Python datetime objects (for interoperability with the rest of the stack) would outweigh the minor downside that by default, not all datetime objects returned by num2date would have dayofwk and dayofyr attributes (because there would still be a way to achieve that behavior; it would just not be there automatically).

That's my two cents, but I am not a core developer of cftime, so take that with a grain of salt.

from cftime.

TimoRoth avatar TimoRoth commented on June 24, 2024

I do think that the actual issue in this case is on the pandas side. As they are the ones assuming that every subclass of datetime automatically has to have a nanosecond attribute, which there is absolutely no guarantee for.

from cftime.

spencerkclark avatar spencerkclark commented on June 24, 2024

@TimoRoth I understand that it would fix things in pandas if that was allowed, but I think it's important to note that pandas doesn't explicitly say they will support subclasses of datetime objects, which is the reason for their reluctance. Therefore I'm not sure I would frame it as a "bug" or "regression" in pandas; it would be more of a request for them to relax their assumptions.

The change that caused this was in cftime (and was made quite recently). For that reason I think we should at least reflect on whether that was a good decision before approaching pandas again.

from cftime.

spencerkclark avatar spencerkclark commented on June 24, 2024

For what it's worth though, I will note that others apparently would like something of this sort to be supported in pandas too -- this issue seems related: pandas-dev/pandas#21142.

from cftime.

fmaussion avatar fmaussion commented on June 24, 2024

@TimoRoth how much of an effort would it be to send a patch in Pandas? Jeff Reback seemed opened to it (pandas-dev/pandas#23419 (comment))

Let's just hope it doesn't imply messing around with C code...

from cftime.

spencerkclark avatar spencerkclark commented on June 24, 2024

The problem with this, and the reason it was changed in the first place, is that returning a different kind of object depending on the calendar and time period resulting in problems for users. I'm not in favor of reverting to this just as a workaround for how pandas deals with datetime subclasses.

Sorry @jswhit I was a little confused by your comment, though I can't seem to find it on GitHub anymore.

As a subclass of datetime.datetime or as a pure datetime.datetime object, cftime.real_datetime is still not the same type as cftime.datetime, so maybe I am not following, but it is not clear to me how #66 addressed the issue of returning different kinds of objects depending on the calendar and time period. As I noted earlier, #39 is a way to work around that particular issue.

That being said, I really don't feel very strongly about this, so I don't want to belabor the discussion. I merely thought given the recency of the change that it might warrant reconsidering in this case, but I see the other side of the argument too (i.e. that one might want dayofwk and dayofyr attributes on all date types returned by num2date). Whatever you ultimately decide to do is OK with me. Since I'm responsible for bringing discussion here, if the decision is to fix things in cftime, I'm happy to volunteer to do that work, but if the decision is to push the fix downstream to pandas (as you suggested in your comment), I would rather someone else take it on, as I am not particularly invested in this problem.

from cftime.

jswhit avatar jswhit commented on June 24, 2024

I deleted that comment after I realized it was completely off base, sorry. You are right that this particular change only enabled the extra two attributes (dayofwk, dayofyr). I would still prefer the issue be addressed in pandas, but I'm not opposed to adding a bogus nanosecond attribute as a workaround.

from cftime.

spencerkclark avatar spencerkclark commented on June 24, 2024

I would still prefer the issue be addressed in pandas, but I'm not opposed to adding a bogus nanosecond attribute as a workaround.

Perfect, thanks. Hopefully things work out in pandas, but it's good to have that as a fallback option here.

from cftime.

jswhit avatar jswhit commented on June 24, 2024

@fmaussion, can you verify that pull request #95 (branch issue77) fixes the issue?

from cftime.

spencerkclark avatar spencerkclark commented on June 24, 2024

I know I'm late on this, but thanks for adding the workaround @jswhit.

from cftime.

Westbrook077 avatar Westbrook077 commented on June 24, 2024

Sorry for likely prompting that issue to be closed there...one could try to fix this issue in pandas, but I understand to some extent their pushback. They do not make any guarantees that non-pandas.Timestamp subclasses of datetime.datetime objects will be supported, and the logic there regarding datetimes is already quite complicated.

Barring rolling back the change in #66 to add dayofwk and dayofyr properties to datetime.datetime via a subclass, as unsavory as it might sound, I think the most painless workaround here is in your code to convert the datetimes returned from num2date to numpy.datetime64 before passing them to the Series:

In [1]: import cftime, pandas

In [2]: times = cftime.num2date([0,1,2,3,4,5,500,1000], 'days since 1801-10-01')

In [3]: times
Out[3]:
array([real_datetime(1801, 10, 1, 0, 0), real_datetime(1801, 10, 2, 0, 0),
       real_datetime(1801, 10, 3, 0, 0), real_datetime(1801, 10, 4, 0, 0),
       real_datetime(1801, 10, 5, 0, 0), real_datetime(1801, 10, 6, 0, 0),
       real_datetime(1803, 2, 13, 0, 0), real_datetime(1804, 6, 27, 0, 0)], dtype=object)

In [4]: nptimes = times.astype('datetime64[ns]')

In [5]: nptimes
Out[5]:
array(['1801-10-01T00:00:00.000000000', '1801-10-02T00:00:00.000000000',
       '1801-10-03T00:00:00.000000000', '1801-10-04T00:00:00.000000000',
       '1801-10-05T00:00:00.000000000', '1801-10-06T00:00:00.000000000',
       '1803-02-13T00:00:00.000000000', '1804-06-27T00:00:00.000000000'], dtype='datetime64[ns]')

In [6]: pandas.Series(range(len(nptimes)), index=nptimes)
Out[6]:
1801-10-01    0
1801-10-02    1
1801-10-03    2
1801-10-04    3
1801-10-05    4
1801-10-06    5
1803-02-13    6
1804-06-27    7
dtype: int64

At least for now, NumPy seems to handle subclasses of datetime.datetime objects just fine in this conversion, and pandas is OK with numpy.datetime64.

Thx for your answer, but I found something wrong:
'''
In [6]: pandas.Series(range(len(nptimes)), index=nptimes)
Out[6]:
1801-10-01 0
1801-10-02 1
1801-10-03 2
1801-10-04 3
1801-10-05 4
1801-10-06 5
1803-02-13 6
1804-06-27 7
dtype: int64
'''

Maybe the right code shoud be below:
'''
pandas.Series(nptimes, index = range(len(nptimes)))
'''

from cftime.

spencerkclark avatar spencerkclark commented on June 24, 2024

I think it depends on your purpose. The motivation for using the integer range as the values and the times as the index was based on the usage in the initial comment: #77 (comment).

from cftime.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.