unidata / cftime
Time-handling functionality from netcdf4-python.
Home Page: https://unidata.github.io/cftime
License: MIT License
We are working on a cftime-compatible version of pandas' resample functionality in xarray (pydata/xarray#2593). In order to accomplish this, we are making heavy use of the datetime arithmetic functionality in cftime. There are two contexts where this occurs:
- Adding a datetime.timedelta object to a cftime.datetime object to produce another cftime.datetime object
- Subtracting two cftime.datetime objects to produce a datetime.timedelta object
It is my understanding that in the first context, integer arithmetic is used, so the result is microsecond-precise; in the second context, however, things follow a different code path (using date2num) and arithmetic is not necessarily exact (in my tests on a Mac this issue seems to have mostly gone away in cftime version 1.0.3.4; however, @jwenfai has found on Windows that issues persist).
We are finding that having an exact result for the difference between two datetimes is important for our new use-case. One potential solution is to write a function that computes exact datetime differences.
Here is a potential function I came up with. Do you think that this would be a safe workaround for this issue, or might there be issues I'm not anticipating? Thanks for your help.
from datetime import timedelta


def exact_cftime_datetime_difference(a, b):
    """Exact computation of b - a

    Assumes:

        a = a_0 + a_m
        b = b_0 + b_m

    Here a_0, and b_0 represent the input dates rounded
    down to the nearest second, and a_m, and b_m represent
    the remaining microseconds associated with date a and
    date b.

    We can then express the value of b - a as:

        b - a = (b_0 + b_m) - (a_0 + a_m) = b_0 - a_0 + b_m - a_m

    By construction, we know that b_0 - a_0 must be a round number
    of seconds. Therefore we can take the result of b_0 - a_0 using
    ordinary cftime.datetime arithmetic and round to the nearest
    second. b_m - a_m is the remainder, in microseconds, and we
    can simply add this to the rounded timedelta.

    Parameters
    ----------
    a : cftime.datetime
        Input datetime
    b : cftime.datetime
        Input datetime

    Returns
    -------
    datetime.timedelta
    """
    seconds = b.replace(microsecond=0) - a.replace(microsecond=0)
    seconds = int(round(seconds.total_seconds()))
    microseconds = b.microsecond - a.microsecond
    return timedelta(seconds=seconds, microseconds=microseconds)
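One case worth checking in the function above is when b has fewer microseconds than a, so that the microsecond term is negative. The short sketch below uses only the standard library and made-up values (the variable names are illustrative, not from cftime) to show that datetime.timedelta normalizes a negative microsecond argument, so the final line of the function should not need special handling:

```python
from datetime import timedelta

# Made-up values: the whole-second parts differ by 1 s, and b has fewer
# microseconds than a, so the microsecond difference is negative.
whole_seconds = 1                  # stands in for the rounded b_0 - a_0
microseconds = 250000 - 750000     # stands in for b.microsecond - a.microsecond

# timedelta normalizes the negative microseconds into the seconds field.
delta = timedelta(seconds=whole_seconds, microseconds=microseconds)
print(delta)  # 0:00:00.500000
```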
I am getting strange results with the function cftime.DateFromJulianDay
using the Gregorian Calendar:
cftime.DateFromJulianDay(1684958.5,calendar='gregorian')
cftime.DatetimeProlepticGregorian(-100, 3, 2, 23, 59, 59, 999967, -1, 1)
That is, effectively the 3rd of March, -100 BC.
A very small change to the Julian Day gives the 2nd of March, -100 BC:
cftime.DateFromJulianDay(1684958.50001,calendar='gregorian')
cftime.DatetimeProlepticGregorian(-100, 3, 2, 0, 0, 0, 863974, -1, 1)
The issue also affects the standard calendar:
cftime.DateFromJulianDay(1684958.5,calendar='standard')
# returns cftime.DatetimeGregorian(-100, 3, 2, 23, 59, 59, 999967, -1, 1)
cftime.DateFromJulianDay(1684958.50001,calendar='standard')
# returns cftime.DatetimeGregorian(-100, 3, 2, 0, 0, 0, 863974, -1, 1)
According to this conversion website (http://aa.usno.navy.mil/jdconverter?ID=AA&jd=1684958.5), the 2nd of March, -100 BC should be correct.
I think that cftime uses the Jean Meeus algorithm, which should be correct for negative years (as long as the JD is positive).
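As an independent cross-check of the expected date, here is a minimal sketch that converts a Julian Day to a Julian-calendar date using integer arithmetic for the calendar part. It follows the integer algorithm usually attributed to Richards, not cftime's implementation, and the function name is hypothetical. Note the assumption about year numbering: the result uses astronomical years (year 0 is 1 BC), so 100 BC appears as -99, which may differ from the convention cftime uses in its output.

```python
import math


def jd_to_julian_calendar(jd):
    """Convert a Julian Day to (year, month, day, day_fraction), Julian calendar.

    The year uses astronomical numbering (year 0 is 1 BC, so -99 is 100 BC).
    Hypothetical helper for illustration; valid for non-negative Julian Days.
    """
    # Julian Days start at noon, so shift by half a day before splitting.
    shifted = jd + 0.5
    jdn = math.floor(shifted)        # integer Julian Day Number
    day_fraction = shifted - jdn     # fraction of the civil day, in [0, 1)

    # Integer-only conversion (Julian-calendar branch of the algorithm).
    f = jdn + 1401
    e = 4 * f + 3
    g = (e % 1461) // 4
    h = 5 * g + 2
    day = (h % 153) // 5 + 1
    month = (h // 153 + 2) % 12 + 1
    year = e // 1461 - 4716 + (14 - month) // 12
    return year, month, day, day_fraction


print(jd_to_julian_calendar(1684958.5))  # (-99, 3, 2, 0.0)
```

Because the date is derived from an integer day number, a tiny change in the fractional part (for example 1684958.50001) cannot move the result to a different day.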
Right now we just put a source tarball on PyPI, along with binary wheels for Windows from Christoph Gohlke's site. We could set up a MacPython project to build macOS and Linux wheels, as we do for netcdf4-python, or use cibuildwheel. Does anyone here know anything about cibuildwheel?
In pydata/xarray#1252 I am working on a NetCDFTimeIndex that is intended to bring some of the features of pandas's DatetimeIndex (for now namely field accessors and partial datetime string indexing) to time indexes that use netcdftime._netcdftime.datetime objects.
Currently one can construct nonsensical datetimes using netcdftime._netcdftime.datetime objects:
In [1]: from netcdftime import DatetimeNoLeap
In [2]: DatetimeNoLeap(1, 45, 45)
Out[2]: netcdftime._netcdftime.DatetimeNoLeap(1, 45, 45, 0, 0, 0, 0, -1, 1)
Would it be possible for this kind of expression to raise an error? This would be nice, because that way if one tries to index with a slice involving an out-of-bounds datetime, it would automatically cause an error, rather than behave in the fashion below:
In [1]: import xarray as xr
In [2]: from xarray.conventions.netcdftimeindex import NetCDFTimeIndex
In [3]: from netcdftime import DatetimeNoLeap
In [4]: dates = [DatetimeNoLeap(1, 1, 1), DatetimeNoLeap(1, 2, 1), DatetimeNoLeap(2, 1, 1), DatetimeNoLeap(2, 2, 1)]
In [5]: da = xr.DataArray([1, 2, 3, 4], coords=[NetCDFTimeIndex(dates)], dims=['time'])
In [6]: da.sel(time=slice(DatetimeNoLeap(1, 1, 1), DatetimeNoLeap(1, 45, 45)))
Out[6]:
<xarray.DataArray (time: 2)>
array([1, 2])
Coordinates:
* time (time) object 1-01-01 00:00:00 1-02-01 00:00:00
cc @shoyer
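To make the request concrete, here is a minimal sketch of the kind of check that could reject such a date at construction time. It is not netcdftime code: `check_noleap_date` and the month-length table are hypothetical names, and only the noleap calendar is covered.

```python
# Month lengths in a noleap (365_day) calendar, January first.
DAYS_IN_MONTH_NOLEAP = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)


def check_noleap_date(year, month, day):
    """Raise ValueError if (year, month, day) is not a valid noleap date."""
    if not 1 <= month <= 12:
        raise ValueError(
            "invalid month %d in noleap date %d-%d-%d" % (month, year, month, day))
    if not 1 <= day <= DAYS_IN_MONTH_NOLEAP[month - 1]:
        raise ValueError(
            "invalid day %d in noleap date %d-%d-%d" % (day, year, month, day))


try:
    check_noleap_date(1, 45, 45)
except ValueError as err:
    print(err)  # invalid month 45 in noleap date 1-45-45
```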
#66 enabled working dayofyr and dayofwk attributes for all cftime dates constructed manually, which is great!
I've noticed that if one constructs datetime objects using replace, the dayofyr and dayofwk attributes are not automatically updated:
In [21]: date = cftime.DatetimeNoLeap(1, 2, 1)
In [22]: date.dayofyr
Out[22]: 32
In [23]: date.replace(year=2, month=5).dayofyr
Out[23]: 32
A workaround is to specify dayofwk=-1 within replace:
In [24]: date.replace(year=2, month=5, dayofwk=-1).dayofyr
Out[24]: 121
Is this intentional? Should we not pass down the old dayofyr and dayofwk attributes to the new date object in replace? I'm happy to provide a PR to change this behavior, if desired.
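For reference, in a noleap calendar the day of year depends only on the month and day, so it is cheap to recompute for a replaced date. A minimal sketch, not cftime's implementation (`noleap_dayofyr` is a hypothetical helper), that reproduces the two values shown above:

```python
from itertools import accumulate

# Month lengths in a noleap (365_day) calendar, January first.
DAYS_IN_MONTH_NOLEAP = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)

# Days elapsed before the first of each month: 0, 31, 59, 90, 120, ...
DAYS_BEFORE_MONTH = (0,) + tuple(accumulate(DAYS_IN_MONTH_NOLEAP))[:-1]


def noleap_dayofyr(month, day):
    """Return the 1-based day of year for a noleap calendar date."""
    return DAYS_BEFORE_MONTH[month - 1] + day


print(noleap_dayofyr(2, 1))  # 32
print(noleap_dayofyr(5, 1))  # 121
```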
cftime seems to think it requires a different set of packages to run in the conda recipe:
- python
- setuptools
- numpy x.x
and in setup.py (via requirements.txt):
numpy
cython
setuptools>=18.0
The requirement for cython appears to be retained in such a way that a conda environment that includes cftime but not cython, and that tries to make use of pkg_resources, gets the error:
...
pkg_resources.DistributionNotFound: The 'cython' distribution was not found and is required by cftime
The error is resolved by installing cython or including it as an (unnecessary) dependency of any conda package that depends on cftime, but it would be more convenient if cython were only included in setup.py in setup_requires and not in install_requires.
So far, I haven't been able to create a super simple test that reproduces the issue. I see it when I try to build a conda package from a repo that I'm just starting to develop. Here is where I seem to need to include cython, though I don't think it should be needed:
https://github.com/xylar/misomip1analysis/blob/initial_stub/conda/recipe/meta.yaml#L28
The IRI Data Library contains tons of datasets with the following time attributes:
float32 T(T) ;
T:standard_name = time ;
T:pointwidth = 1.0 ;
T:long_name = Time ;
T:calendar = 360 ;
T:expires = 1538524800 ;
T:gridtype = 0 ;
T:units = months since 1960-01-01 ;
(edited to correct typo in original post)
I would like to be able to open and decode these datasets in xarray, with time decoding handled by cftime.
There are two problems:
1. The calendar attribute is 360 rather than 360_day. But that is easy to fix by rewriting the calendar attribute.
2. Even with calendar = 360_day, months is not considered a valid time unit.
However, in a 360-day calendar month==30 days, so this should be valid.
This was discussed over in Unidata/netcdf4-python#434 (comment), where @jswhit commented:
A pull request allowing months since when calendar is 360_day would be welcome
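Since every month in a 360_day calendar has exactly 30 days, decoding "months since" reduces to plain arithmetic. The sketch below illustrates the idea with the standard library only; it is not cftime code, and `decode_months_360_day` is a hypothetical helper name.

```python
import math


def decode_months_360_day(months, ref_year, ref_month=1, ref_day=1):
    """Decode 'months since <ref>' in a 360_day calendar.

    Returns (year, month, day, day_fraction). Every month has 30 days,
    so a (possibly fractional) number of months is months * 30 days.
    """
    # Days from 0000-01-01 of the 360-day calendar to the reference date.
    ref_days = ref_year * 360 + (ref_month - 1) * 30 + (ref_day - 1)
    total_days = ref_days + months * 30
    whole_days = math.floor(total_days)
    day_fraction = total_days - whole_days
    year, day_of_year = divmod(whole_days, 360)
    month, day = divmod(day_of_year, 30)
    return year, month + 1, day + 1, day_fraction


# "months since 1960-01-01" with a value of 12.5 lands in mid-January 1961.
print(decode_months_360_day(12.5, 1960))  # (1961, 1, 16, 0.0)
```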
Unlike the built-in datetime, cftime.datetime.timetuple() returns a plain tuple and not a time.struct_time. I'll submit a PR to correct this.
There have been 134 commits since the release of cftime 1.0.0 in May.
Given the large number of recent bugfixes and feature additions, perhaps a new release soon would be appropriate?
Thanks to everyone who works on this important package.
There are two separate definitions of date2index in _netcdftime.pyx, at lines 286 and 1309. I assume one of these should go away.
cythonize should not be executed in the clean target; this breaks the Debian package build by modifying the source tree outside the build chroot.
The following patch fixes the issue:
--- a/setup.py
+++ b/setup.py
@@ -75,9 +75,14 @@ if FLAG_COVERAGE in sys.argv or os.envir
     sys.argv.remove(FLAG_COVERAGE)
     print('enable: "linetrace" Cython compiler directive')
 
-extension = Extension('{}._{}'.format(NAME, NAME),
-                      sources=[CYTHON_FNAME],
-                      define_macros=DEFINE_MACROS)
+ext_modules = []
+if "clean" not in sys.argv:
+    extension = Extension('{}._{}'.format(NAME, NAME),
+                          sources=[CYTHON_FNAME],
+                          define_macros=DEFINE_MACROS)
+    ext_modules = cythonize(extension,
+                            compiler_directives=COMPILER_DIRECTIVES,
+                            language_level=2)
 
 setup(
     name=NAME,
@@ -89,9 +94,7 @@ setup(
     cmdclass={'clean_cython': CleanCython},
     packages=[NAME],
     version=extract_version(),
-    ext_modules=cythonize(extension,
-                          compiler_directives=COMPILER_DIRECTIVES,
-                          language_level=2),
+    ext_modules=ext_modules,
     setup_requires=load('setup.txt'),
     install_requires=load('requirements.txt'),
     tests_require=load('requirements-dev.txt'))
As discussed in #32, when netCDF4 is installed its version of netcdftime overrides any installation of this package. It is unrealistic to expect scientific Python users to have the most up-to-date version of netCDF4, one so up to date that it does not exist yet. This produces very strange behavior that is difficult to debug unless you've already run into it.
There is a really simple fix for this, which is to rename the netcdftime package, or provide an alias package name. This would remove a massive package incompatibility problem.
At the very least, netcdftime should require that these older versions of netCDF4 are not installed. This would allow me to tell users that they need to install netcdftime to get X functionality, and have a guarantee that when they do so they will not run into these import problems. One way to do this would be to check in netcdftime whether netCDF4 is installed, and if it is installed, to add a dependency on netCDF4 having at least the required version.
If you have already classified this software with ECCN, please confirm the applicable number.
If you do not have your software classified with an ECCN, please kindly answer the following questions so that we may self-assess:
1. Does the Software perform any encryption or utilize any encryption processes? Y/N
2. If the answer is YES to question 1, please indicate if the encryption is coded into the application or separately called (such as using SSL)
3. If the answer is YES to question 1, please indicate what function(s) the cryptography/encryption serves
A. Copyright protection purposes (includes using a license key/code)
B. User authentication purposes
C. A core part of the functionality, such as to encrypt databases
D. To encrypt communications between the software and a host system.
There's an error in the num2date function when using the Julian calendar in cftime. num2date does not correctly produce the day of year, e.g.:
import cftime
In [6]: cftime.num2date(16,'day since 1950-01-01 00:00:00.0000000',calendar='365_day')
Out[6]: cftime._cftime.DatetimeNoLeap(1950, 1, 17, 0, 0, 0, 0, 1, 17)
In [7]: cftime.num2date(16,'day since 1950-01-01 00:00:00.0000000',calendar='julian')
Out[7]: cftime._cftime.DatetimeJulian(1950, 1, 17, 0, 0, 0, 0, -1, 1)
The day of year (final number in the date output) is calculated correctly in the 365_day calendar, but is set to 1 in the julian calendar.
The timezone parsing does not seem to follow the same standard that I can find on Wikipedia (https://en.wikipedia.org/w/index.php?title=ISO_8601&oldid=793312108#Time_zone_designators). Unfortunately, I cannot find any primary ISO source for the exact timezone format, so I have to rely solely on Wikipedia for the time being.
The page referenced in the code (https://github.com/Unidata/netcdftime/blob/1904de5c81fff7e40f3bbb23906dc2db83d7aaa1/netcdftime/_netcdftime.pyx#L33) is also no longer available. But judging from the archived version (https://web.archive.org/web/20161009150515/http://delete.me.uk/2005/03/iso8601.html), it suffers from some, but not all, of the format errors (see below).
What I think is wrong:
Both problems are relatively easy to fix by slightly modifying the regexes and maybe the parser code itself. If we can agree on the above points, I can work on a pull request to correct this.
cftime._cftime._dateparse returns quite opaque error messages that can make errors hard to track down.
For example:
>>> cftime._cftime._dateparse('days_since_1900-01-01')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "cftime/_cftime.pyx", line 56, in cftime._cftime._dateparse
IndexError: list index out of range
The code assumes the string can be split by whitespace into two strings. In the above example it cannot.
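A minimal sketch of the kind of up-front check that would give a clearer message is shown below. It is not the cftime parser: `split_time_units` is a hypothetical helper, and it only validates the overall "<units> since <reference date>" shape before any date parsing happens.

```python
def split_time_units(unit_string):
    """Split '<units> since <reference date>' into (units, reference_date).

    Raises ValueError with a descriptive message when the string does not
    have that shape, instead of failing later with an IndexError.
    """
    parts = unit_string.split()
    if len(parts) < 3 or parts[1].lower() != "since":
        raise ValueError(
            "time units must look like '<units> since <reference date>', "
            "got %r" % unit_string)
    units = parts[0].lower()
    reference_date = " ".join(parts[2:])
    return units, reference_date


print(split_time_units("days since 1900-01-01 00:00:00"))
try:
    split_time_units("days_since_1900-01-01")
except ValueError as err:
    print(err)
```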
On Debian in a Dockerfile with Pipfile:
RUN pip install pipenv && pipenv install
with cftime = "*" in Pipfile
results in:
pipenv.patched.notpip._internal.exceptions.InstallationError: Command "python setup.py egg_info" failed with error code 1 in /tmp/tmp3dy86zfkbuild/cftime/
Versions 1.0.0 and 1.0.2.1 do not have this problem.
Installation with pip install cftime works for both versions.
I've written a PR for Sympl (code) to start integrating netcdftime datetime-like objects into Sympl models. Since the documentation is quite sparse and a little vague, I thought I should ask if there are any differences between the built-in datetime and netcdftime's datetime-like objects that I should be aware of and warn users about? You're welcome to comment on the PR directly.
I was also confused as to why there was a DatetimeProlepticGregorian to override the built-in datetime, so I've used the built-in one instead for that calendar option. Any insight into whether this is a good idea would be appreciated.
It has been suggested that my issue may be of interest/discussed/resolved through this group. I’ll present the questions pre-emptively as it may help understand the description of my problem:
Are CF dates constrained as being positive years?
Are code updates planned/required to allow for the use of the ISO/DIS 8601-2 standard to allow for negative dates?
We have NetCDF data files (with SeaDataNet and CF conventions) with a date channel as http://vocab.nerc.ac.uk/collection/P01/current/CJDY1101/
double TIME(INSTANCE) ;
TIME:long_name = "Chronological Julian Date" ;
TIME:sdn_parameter_urn = "SDN:P01::CJDY1101" ;
TIME:sdn_parameter_name = "Julian Date (chronological)" ;
TIME:sdn_uom_urn = "SDN:P06::UTAA" ;
TIME:sdn_uom_name = "Days" ;
TIME:units = "days since -4713-01-01T00:00:00Z" ;
TIME:standard_name = "time" ;
TIME:axis = "T" ;
TIME:ancillary_variables = "TIME_SEADATANET_QC" ;
TIME:calendar = "julian" ;
TIME:_FillValue = -99999. ;
byte TIME_SEADATANET_QC(INSTANCE) ;
….
Running the data file through the CFchecker software (http://pumatest.nerc.ac.uk/cgi-bin/cf-checker.pl) fails with
File "netcdftime/_netcdftime.pyx", line 715, in netcdftime._netcdftime.utime.__init__ (netcdftime/_netcdftime.c:11201)
ValueError: negative reference year in time units, must be >= 1
This was reported to the CFchecker software, with the response from the developer that
cfunits is throwing the error
"netCDF4-python throws an error if real world calendars have negative years (https://github.com/Unidata/cftime/blob/master/cftime/_cftime.pyx#L140-L147)"
The following trail seems to imply that we are working correctly with the date, in particular having a negative year for the time origin:
http://cfconventions.org/Conformance/conformance.html
https://www.unidata.ucar.edu/software/netcdf/docs/BestPractices.html#bp_Calendar-Date-Time
refers to the udunits using ISO8601
https://en.wikipedia.org/wiki/ISO_8601
refers to
To represent years before 0000 or after 9999, the standard also permits the expansion of the year representation but only by prior agreement between the sender and the receiver.
https://www.iso.org/news/2017/02/Ref2164.html
refers to simply adding a minus sign.
https://www.unidata.ucar.edu/software/thredds/current/netcdf-java/CDM/CalendarDateTime.html refers to a minus date.
I hope this rather long description makes some sort of sense.
Ray
Good afternoon,
Here is an issue I posted on Unidata/netcdf4-python#810 two weeks ago, and I have been advised to submit it here. Here it is below:
You may already be aware of this problem, but I didn't find how to solve it.
I have an output with hourly intervals written in days since 1979-01-01 00:00:00.
The first time step is 0. The second is 0.041666666667.
I have noticed that num2date does not give the same results when reading the second time step depending on the calendar used. It sometimes gives 1979-01-01 01:00:00 (which is what we want), and sometimes 1979-01-01 00:59:59 (see below).
How can we fix it so num2date would give the correct result (01:00:00, 02:00:00, etc.) whatever calendar is used?
Thank you for your help,
Marie-Estelle
time_in1=0.041666666667
print(time_in_units)
days since 1979-01-01 00:00:00
dt_in1 = num2date(time_in1,time_in_units,calendar='standard')
str(dt_in1)
'1979-01-01 01:00:00'
dt_in1 = num2date(time_in1,time_in_units,calendar='gregorian')
str(dt_in1)
'1979-01-01 01:00:00'
dt_in1 = num2date(time_in1,time_in_units,calendar='proleptic_gregorian')
str(dt_in1)
'1979-01-01 01:00:00'
dt_in1 = num2date(time_in1,time_in_units,calendar='noleap')
str(dt_in1)
'1979-01-01 00:59:59'
dt_in1 = num2date(time_in1,time_in_units,calendar='365_day')
str(dt_in1)
'1979-01-01 00:59:59'
dt_in1 = num2date(time_in1,time_in_units,calendar='360_day')
str(dt_in1)
'1979-01-01 00:59:59'
dt_in1 = num2date(time_in1,time_in_units,calendar='julian')
str(dt_in1)
'1979-01-01 01:00:00'
dt_in1 = num2date(time_in1,time_in_units,calendar='all_leap')
str(dt_in1)
'1979-01-01 00:59:59'
dt_in1 = num2date(time_in1,time_in_units,calendar='366_day')
str(dt_in1)
'1979-01-01 00:59:59'
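The 00:59:59 results are what one would expect if a value a few microseconds short of the hour is truncated rather than rounded. As a possible workaround on the user side, the sketch below rounds the offset to the nearest second before building the date. It uses the standard-library datetime as a stand-in (so it only covers the real-world calendar) and is not how cftime itself computes dates; `add_days_rounded` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def add_days_rounded(origin, days):
    """Add fractional days to origin, rounded to the nearest whole second."""
    total_seconds = round(days * 86400)
    return origin + timedelta(seconds=total_seconds)


origin = datetime(1979, 1, 1)
print(add_days_rounded(origin, 0.041666666667))  # 1979-01-01 01:00:00

# Truncation versus rounding of a value slightly short of one hour:
almost_one_hour = 3599.99996
print(int(almost_one_hour), round(almost_one_hour))  # 3599 3600
```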
I have been trying to build with icc, but that fails because gcc is used instead for linking the shared object (.so).
Other extensions seem to work fine.
You can reproduce this issue as well without an extra compiler:
CC=echo CXX=echo python3 setup.py build_ext
I have tested other extensions, and they use echo for compiling and linking, whereas cftime uses it only for compiling, and gcc for linking.
This issue is present in master, as well as cftime-1.0.0 from pypi.
netcdftime.datetime objects constructed directly don't have the correct day of year or day of week; they always give a value of 1 and -1 respectively. Objects returned by utime.num2date have the correct values, except for those with Julian calendars, which still have the issue.
>>> import netcdftime
>>> print netcdftime.__version__
1.4.1
>>> date = netcdftime.DatetimeNoLeap(2000, 1, 2)
>>> print repr(date)
netcdftime._netcdftime.DatetimeNoLeap(2000, 1, 2, 0, 0, 0, 0, -1, 1)
>>> print date.dayofyr
1
>>> print date.dayofwk
-1
Python version: 2.7
netCDF4 version: 1.2.7
Installed via conda forge.
The sdist archive, cftime-1.0.3.1.tar.gz, on PyPI contains an absolute path in cftime.egg-info\SOURCES.txt such that installation from source fails:
<snip>
running egg_info
writing cftime.egg-info\PKG-INFO
writing dependency_links to cftime.egg-info\dependency_links.txt
writing requirements to cftime.egg-info\requires.txt
writing top-level names to cftime.egg-info\top_level.txt
reading manifest file 'cftime.egg-info\SOURCES.txt'
Traceback (most recent call last):
File "setup.py", line 122, in <module>
'License :: OSI Approved'])
File "X:\Python36\lib\site-packages\setuptools\__init__.py", line 143, in setup
return distutils.core.setup(**attrs)
File "X:\Python36\lib\distutils\core.py", line 148, in setup
dist.run_commands()
File "X:\Python36\lib\distutils\dist.py", line 955, in run_commands
self.run_command(cmd)
File "X:\Python36\lib\distutils\dist.py", line 974, in run_command
cmd_obj.run()
File "X:\Python36\lib\site-packages\wheel\bdist_wheel.py", line 224, in run
self.run_command('install')
File "X:\Python36\lib\distutils\cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "X:\Python36\lib\distutils\dist.py", line 974, in run_command
cmd_obj.run()
File "X:\Python36\lib\site-packages\setuptools\command\install.py", line 61, in run
return orig.install.run(self)
File "X:\Python36\lib\distutils\command\install.py", line 557, in run
self.run_command(cmd_name)
File "X:\Python36\lib\distutils\cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "X:\Python36\lib\distutils\dist.py", line 974, in run_command
cmd_obj.run()
File "X:\Python36\lib\site-packages\setuptools\command\install_egg_info.py", line 34, in run
self.run_command('egg_info')
File "X:\Python36\lib\distutils\cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "X:\Python36\lib\distutils\dist.py", line 974, in run_command
cmd_obj.run()
File "X:\Python36\lib\site-packages\setuptools\command\egg_info.py", line 296, in run
self.find_sources()
File "X:\Python36\lib\site-packages\setuptools\command\egg_info.py", line 303, in find_sources
mm.run()
File "X:\Python36\lib\site-packages\setuptools\command\egg_info.py", line 534, in run
self.add_defaults()
File "X:\Python36\lib\site-packages\setuptools\command\egg_info.py", line 577, in add_defaults
self.read_manifest()
File "X:\Python36\lib\site-packages\setuptools\command\sdist.py", line 199, in read_manifest
self.filelist.append(line)
File "X:\Python36\lib\site-packages\setuptools\command\egg_info.py", line 476, in append
path = convert_path(item)
File "X:\Python36\lib\distutils\util.py", line 125, in convert_path
raise ValueError("path '%s' cannot be absolute" % pathname)
ValueError: path '/private/tmp/cftime/cftime/_cftime.c' cannot be absolute
It would be nice if the super class for all netcdftime datetime objects (netcdftime._netcdftime.datetime) were exposed in the public API; this would allow one to succinctly do type checking.
Currently, netcdftime.datetime refers to the DatetimeProlepticGregorian object. Is this intended, or could it be changed to point to netcdftime._netcdftime.datetime?
In [1]: from netcdftime import datetime, DatetimeAllLeap
In [2]: datetime(1, 1, 1)
Out[2]: netcdftime._netcdftime.DatetimeProlepticGregorian(1, 1, 1, 0, 0, 0, 0, -1, 1)
In [3]: test = DatetimeAllLeap(1, 1, 1)
In [4]: isinstance(test, datetime)
Out[4]: False
In [5]: from netcdftime._netcdftime import datetime as super_datetime
In [6]: isinstance(test, super_datetime)
Out[6]: True
cc @shoyer
xref: pydata/xarray#1252 (comment)
Recently in the xarray mailing list (here) and in aospy (here) the need has arisen for the partial datetime string parsing that is currently implemented in xarray as private API, which we therefore don't want to rely on. @spencerkclark argued, and I think I agree, that this functionality is outside the scope of xarray in terms of making it public API.
Separately, @mcgibbon raised the notion of porting it to cftime, an idea I also like. Does this seem reasonable? It seems like such a common datetime-related need that it would fit well within the scope of cftime.
CC: @shoyer for any thoughts from the xarray side
Not sure what's going on here:
import numpy as np
from cftime import num2date
num2date(np.ma.array([0., 1., 2.]), 'hours since 1980-06-03 12:00')
gives
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-56-fbdbe6e3afd0> in <module>()
3 time_units = 'hours since 1980-06-03 12:00'
4
----> 5 num2date(np.ma.array([0., 1., 2.]), time_units)
cftime/_cftime.pyx in cftime._cftime.num2date()
cftime/_cftime.pyx in cftime._cftime.utime.num2date()
~/miniconda3/envs/py36/lib/python3.6/site-packages/numpy/core/fromnumeric.py in reshape(a, newshape, order)
277 [5, 6]])
278 """
--> 279 return _wrapfunc(a, 'reshape', newshape, order=order)
280
281
~/miniconda3/envs/py36/lib/python3.6/site-packages/numpy/core/fromnumeric.py in _wrapfunc(obj, method, *args, **kwds)
49 def _wrapfunc(obj, method, *args, **kwds):
50 try:
---> 51 return getattr(obj, method)(*args, **kwds)
52
53 # An AttributeError occurs if the object does not have
ValueError: cannot reshape array of size 1 into shape (3,)
It works if I convert the data to a regular array.
xref: pydata/xarray#1252 (comment)
When encoding datetime-like objects for storage in netCDF files, xarray follows the standard convention (converting dates to a series of numbers representing some time unit since a reference date). The units are encoded as a string attribute of a variable in a netCDF file, e.g. 'days since 2000-01-01 00:00:00'. In the process of constructing this string upon saving the file, xarray uses the repr of the datetime object stored in the array. Additionally, xarray's datetime decoding logic depends on the use of the pandas.Timestamp constructor to assess whether a reference date with a standard calendar (encoded as a string) can be represented using a Timestamp object (allowing the time series to be decoded fully without the optional dependency of netCDF4) or needs to be represented using a netcdftime object (requiring the optional dependency of netCDF4 to be decoded).
In experimenting with round-tripping arrays containing datetime objects outside the Timestamp-valid range to netCDF files and back to xarray objects within pydata/xarray#1252, I found that the non-zero-padded repr of netcdftime objects (used in encoding the dates) and the datetime parser used by pandas in attempting to decode the dates do not always play well together. See for example:
In [1]: import xarray as xr
In [2]: from netcdftime import DatetimeProlepticGregorian
In [3]: da = xr.DataArray([DatetimeProlepticGregorian(1, 1, 1), DatetimeProlepticGregorian(1, 2, 1)])
In [4]: da
Out[4]:
<xarray.DataArray (dim_0: 2)>
array([netcdftime._netcdftime.DatetimeProlepticGregorian(1, 1, 1, 0, 0, 0, 0, -1, 1),
netcdftime._netcdftime.DatetimeProlepticGregorian(1, 2, 1, 0, 0, 0, 0, -1, 1)], dtype=object)
Dimensions without coordinates: dim_0
In [5]: da.to_dataset(name='time').to_netcdf('test-roundtrip.nc')
In [6]: xr.open_dataset('test-roundtrip.nc')
Out[6]:
<xarray.Dataset>
Dimensions: (dim_0: 2)
Dimensions without coordinates: dim_0
Data variables:
time (dim_0) datetime64[ns] 2001-01-01 2001-02-01
Upon closer inspection, one can find that this could be fixed by using a repr with a zero-padded year when encoding the datetimes:
In [7]: ds = xr.open_dataset('test-roundtrip.nc', decode_times=False)
In [8]: ds.time
Out[8]:
<xarray.DataArray 'time' (dim_0: 2)>
array([ 0, 31])
Dimensions without coordinates: dim_0
Attributes:
units: days since 1-01-01 00:00:00
calendar: proleptic_gregorian
In [9]: ds.time.attrs['units'] = 'days since 0001-01-01 00:00:00'
In [10]: xr.decode_cf(ds)
/Users/spencerclark/xarray-dev/xarray/xarray/conventions/coding.py:416: RuntimeWarning: Unable to decode time axis into full numpy.datetime64 objects, continuing using dummy netCDF4.datetime objects instead, reason: dates out of range
result = decode_cf_datetime(example_value, units, calendar)
Out[10]: /Users/spencerclark/xarray-dev/xarray/xarray/conventions/coding.py:435: RuntimeWarning: Unable to decode time axis into full numpy.datetime64 objects, continuing using dummy netCDF4.datetime objects instead, reason: dates out of range
calendar=self.calendar)
<xarray.Dataset>
Dimensions: (dim_0: 2)
Dimensions without coordinates: dim_0
Data variables:
time (dim_0) object 0001-01-01 0001-02-01
Would using repr with a zero-padded year possibly make sense for netcdftime, or should we look for an alternative solution to this issue?
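For illustration, a zero-padded reference date can be produced with ordinary string formatting. The sketch below is not netcdftime or xarray code: `format_reference_date` is a hypothetical helper, and it assumes non-negative years (a negative year would need separate handling of the sign).

```python
def format_reference_date(year, month, day, hour=0, minute=0, second=0):
    """Format a reference date with a four-digit, zero-padded year."""
    return "%04d-%02d-%02d %02d:%02d:%02d" % (
        year, month, day, hour, minute, second)


units = "days since " + format_reference_date(1, 1, 1)
print(units)  # days since 0001-01-01 00:00:00
```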
@dopplershift - can you activate travis CI for this repo?
1.0.3 is lacking Windows wheels. Releasing a new patch version without releasing the wheels at the same time means that anyone trying to pip install it on Windows without specifying an exact version now sees failures. 1.0.2.1 has Windows wheels, and 1.0.3 as a patch release should be fully compatible with the old version, but this is not the case if the wheels are missing. This is especially annoying since cftime is often pulled in as a transitive dependency, e.g. from the netCDF4 package. So pinning netCDF4 to a fixed version alone doesn't help; I would also have to pin the cftime version to 1.0.2.1 in that case to prevent any possible issues.
The point of this issue is to highlight the problem of release timelines. It's not enough to upload binary wheels days or weeks after the source package was released to PyPI. Ideally it should be an atomic operation where a new version immediately comes with all the wheels.
My apologies for getting this started and then letting it slide for so long. I'd like to aim for getting a first release of this package by the end of February.
I have created a milestone on the issue tracker for this release. Some other things to consider:
cc @dopplershift, @spencerkclark, @jswhit
from @dopplershift:
I'm happy to help step up on the infrastructure side (well, in a couple weeks probably), but I'm a little uncomfortable with stepping up fully since I don't have much technical knowledge in the space netcdftime is operating. So I'm happy to cut releases and stuff, but I'm not in possession of answers when it comes to solutions to problems.
If you could help get the documentation working, I can spend some cycles on getting the test infrastructure working again. I'm hoping @spencerkclark can help wrap up his open issues in that time frame.
I'm looking to submit a documentation PR, but am having trouble getting documentation to build locally with the same result as the web docs.
Steps I am following:
python setup.py build
python setup.py install
cd docs && make html
Result I expect: The docs in _build/html should be the same as on the web.
Result I get: During build, this is the log:
[mcgibbon@stcu docs]$ make html
Running Sphinx v1.5.2
making output directory...
loading pickled environment... not yet created
loading intersphinx inventory from https://docs.python.org/objects.inv...
intersphinx inventory has moved: https://docs.python.org/objects.inv -> https://docs.python.org/3/objects.inv
building [mo]: targets for 0 po files that are out of date
building [html]: targets for 1 source files that are out of date
updating environment: 1 added, 0 changed, 0 removed
reading sources... [100%] index
/home/disk/eos4/mcgibbon/python/netcdftime/docs/index.rst:11: WARNING: missing attribute mentioned in :members: or __all__: module netcdftime, attribute date2num
/home/disk/eos4/mcgibbon/python/netcdftime/docs/index.rst:11: WARNING: missing attribute mentioned in :members: or __all__: module netcdftime, attribute num2date
looking for now-outdated files... none found
pickling environment... done
checking consistency... done
preparing documents... done
writing output... [100%] index
generating indices... genindex py-modindex
writing additional pages... search
copying static files... done
copying extra files... done
dumping search index in English (code: en) ... done
dumping object inventory... done
build succeeded, 2 warnings.
Build finished. The HTML pages are in _build/html.
In the resulting html docs, date2num and num2date are missing.
Further to all this, if I add netcdftime.datetime to the list of automodule members on netcdftime, I get the docstring "alias of DatetimeProlepticGregorian", rather than the docstring I get in IPython:
In [4]: nt.datetime?
Docstring:
Phony datetime object which mimics the python datetime object,
but allows for dates that don't exist in the proleptic gregorian calendar.
Supports timedelta operations by overloading + and -.
Has strftime, timetuple, replace, __repr__, and __str__ methods. The
format of the string produced by __str__ is controlled by self.format
(default %Y-%m-%d %H:%M:%S). Supports comparisons with other phony
datetime instances using the same calendar; comparison with
datetime.datetime instances is possible for netcdftime.datetime
instances using 'gregorian' and 'proleptic_gregorian' calendars.
Instance variables are year,month,day,hour,minute,second,microsecond,dayofwk,dayofyr,
format, and calendar.
File: /home/disk/p/mcgibbon/anaconda/lib/python2.7/site-packages/netcdftime/_netcdftime.so
Type: type
The following was raised in SciTools/cf-units#77:
>>> import cftime
>>> cftime.DateFromJulianDay(2450022.5, "standard")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "cftime/_cftime.pyx", line 709, in cftime._cftime.DateFromJulianDay
ValueError: day is out of range for month
I haven't taken the time to dig further into this, but this looks like reasonable input to the function.
When doing a bit of digging, I also tried:
>>> cftime.DateFromJulianDay(2450022.5, "standard", only_use_cftime_datetimes=True)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "cftime/_cftime.pyx", line 709, in cftime._cftime.DateFromJulianDay
File "cftime/_cftime.pyx", line 1759, in cftime._cftime.DatetimeGregorian.__init__
File "cftime/_cftime.pyx", line 1903, in cftime._cftime.assert_valid_date
ValueError: invalid day number provided in cftime.DatetimeGregorian(1995, 10, 32, 23, 59, 59, 999952, -1, 1)
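As a quick sanity check that the input is reasonable, the same Julian day can be converted with the standard library alone. This is only a sketch: it assumes the usual reference point JD 2451544.5 = 2000-01-01 00:00 and ignores the Julian/Gregorian switch, which does not matter for a date in 1995. `julian_day_to_datetime` is a hypothetical helper, not part of cftime.

```python
from datetime import datetime, timedelta

# Assumption: JD 2451544.5 corresponds to 2000-01-01 00:00 (proleptic Gregorian).
JD_2000_01_01 = 2451544.5


def julian_day_to_datetime(jd):
    """Hypothetical helper: convert a Julian day to a datetime.datetime."""
    return datetime(2000, 1, 1) + timedelta(days=jd - JD_2000_01_01)


print(julian_day_to_datetime(2450022.5))  # → 1995-11-01 00:00:00
```

The input falls exactly on midnight of 1 November 1995, so the 1995-10-32 23:59:59.999952 in the traceback suggests a result that lands slightly short of midnight and is then rounded past the end of October.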
Building off of #33, it would be ideal to have complete docstrings that include method signatures and method docstrings for each datetime-like class. Currently these do not appear in the documentation.
The accuracy of the current algorithm is about a millisecond, which can cause surprising results due to roundoff errors (see issue #54).
One simple way to improve the accuracy would be to modify the routines so that they represent the Julian day as two floats, instead of just one, as is done with jdcal. From the jdcal pypi page:
"Julian dates are stored in two floating point numbers (double). Julian dates, and Modified Julian dates, are large numbers. If only one number is used, then the precision of the time stored is limited. Using two numbers, time can be split in a manner that will allow maximum precision. For example, the first number could be the Julian date for the beginning of a day and the second number could be the fractional day. Calculations that need the latter part can now work with maximum precision."
netcdf4 version 1.4.0 uses cftime and it will be released soon on PyPI. (It is already available on GitHub and conda-forge is shipping it.)
It would be nice if we could have a stable release of cftime to accompany netcdf4 1.4.0.
See pandas-dev/pandas#23419 for context.
The issue is the following:
import pandas, numpy, netCDF4
time=netCDF4.num2date([0,1,2,3,4,5,500,1000], 'days since 1801-10-01')
pandas.Series(numpy.arange(len(time)), index=time)
Fails with
AttributeError: 'real_datetime' object has no attribute 'nanosecond'
@spencerkclark says that this is due to the fact that netcdf4 now returns cftime objects per default (always). I guess it is a necessary change (having different time objects to deal with depending on the calendar is a pain), but it makes me wonder about the level of compatibility between cftime objects and the rest of the pydata stack.
I'm no specialist at all here, so I don't know what it would involve, but it would be nice if cftime objects could be handled by pandas.
The easiest fix seems to be to simply add a nanosecond attribute to the cftime.real_datetime objects.
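A minimal sketch of that idea, assuming real_datetime is a thin subclass of datetime.datetime. `RealDatetimeSketch` is a hypothetical stand-in, not the cftime class, and whether this attribute alone is enough for pandas is not established here.

```python
from datetime import datetime


class RealDatetimeSketch(datetime):
    """Hypothetical stand-in for cftime.real_datetime (assumed to subclass datetime)."""

    # datetime only resolves microseconds, so the extra field is always zero.
    nanosecond = 0


d = RealDatetimeSketch(1801, 10, 1)
print(d.nanosecond, d.microsecond)  # → 0 0
```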
The Debian package build for cftime 1.0.3 failed on 32bit architectures due to test failures, see:
https://buildd.debian.org/status/package.php?p=cftime
The buildlog contains many TypeError issues like this:
=================================== FAILURES ===================================
____________________________ cftimeTestCase.runTest ____________________________
self = <test_cftime.cftimeTestCase testMethod=runTest>
def runTest(self):
"""testing cftime"""
# test mixed julian/gregorian calendar
# check attributes.
self.assertTrue(self.cdftime_mixed.units == 'hours')
self.assertTrue(
str(self.cdftime_mixed.origin) == '0001-01-01 00:00:00')
self.assertTrue(
self.cdftime_mixed.unit_string == 'hours since 0001-01-01 00:00:00')
self.assertTrue(self.cdftime_mixed.calendar == 'standard')
# check date2num method. (date before switch)
d = datetime(1582, 10, 4, 23)
t1 = self.cdftime_mixed.date2num(d)
assert_almost_equal(t1, 13865687.0)
# check num2date method.
> d2 = self.cdftime_mixed.num2date(t1)
test/test_cftime.py:99:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
cftime/_cftime.pyx:874: in cftime._cftime.utime.num2date
???
cftime/_cftime.pyx:492: in cftime._cftime.DateFromJulianDay
???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> ???
E TypeError: object of type 'numpy.int32' has no len()
Full buildlogs for the architectures in question: armel, armhf, i386, mips, mipsel, hppa, hurd-i386
I think we should probably test this package on Windows. I have no experience setting up AppVeyor, so I'm looking for someone who does and wants to see this package supported on Windows.
I don't know if this is related to #54 but I have noticed what I would call "unwanted precision" when interconverting between dates and numbers:
In [1]: from cftime import num2date, date2num
In [2]: calendar = 'noleap'
In [3]: u1970 = 'days since 1970-01-01'
In [4]: u1980 = 'days since 1980-01-01'
In [5]: date_1980 = num2date(0,u1980,calendar)
In [6]: date_1980
Out[6]: cftime.DatetimeNoLeap(1980, 1, 1, 0, 0, 0, 0, 6, 1)
In [7]: date2num(date_1980,u1970,calendar)
Out[7]: 3650.000000000001
I think this could be characterised as erroneous. The return value of the last date2num call should be exactly 3650.
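For reference, the expected value can be reproduced with integer arithmetic only. This is a sketch for the 'noleap' calendar with hypothetical helper names; it is not how cftime computes the result.

```python
# Month lengths in the 'noleap' (365-day) calendar.
DAYS_IN_MONTH = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)


def noleap_day_number(year, month, day):
    """Days elapsed since 0001-01-01 when every year has exactly 365 days."""
    return (year - 1) * 365 + sum(DAYS_IN_MONTH[:month - 1]) + (day - 1)


def noleap_days_between(start, end):
    """Exact integer day count between two (year, month, day) tuples."""
    return noleap_day_number(*end) - noleap_day_number(*start)


print(noleap_days_between((1970, 1, 1), (1980, 1, 1)))  # → 3650
```

Because no floating-point value is involved, the ten 365-day years come out as exactly 3650 with no trailing digits.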
xref: pydata/xarray#1929
Using the current version of netCDF4.num2date the following works:
>>> from netCDF4 import num2date
>>> import numpy as np
>>> num2date(np.arange(2), 'days since 1000-01-01', 'gregorian')
array([datetime.datetime(1000, 1, 1, 0, 0),
datetime.datetime(1000, 1, 2, 0, 0)], dtype=object)
However, using the current netcdftime.num2date, it does not:
>>> from netcdftime import num2date
>>> import numpy as np
>>> num2date(np.arange(2), 'days since 1000-01-01', 'gregorian')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "netcdftime/_netcdftime.pyx", line 283, in netcdftime._netcdftime.num2date
File "netcdftime/_netcdftime.pyx", line 1165, in netcdftime._netcdftime.utime.num2date
File "netcdftime/_netcdftime.pyx", line 673, in netcdftime._netcdftime.DateFromJulianDay
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
This error came up in the Xarray test suite (not in a real-world use-case). In my reading, it seems the Gregorian calendar does not start until October 15th, 1582, so should this be possible (as it is in netCDF4) or should it not be? In other words, in this case is Xarray testing something that doesn't make sense? How does (or should?) netcdftime handle this situation?
In a Travis build running pip install cftime (line 458), the command is failing with a message that indicates there is no cftime available for Python 2.7. This seems wrong to me since there is a source distribution available on PyPI. Is this a problem on your end or on mine, and any ideas what the cause may be?
How should we host this package's documentation? Do we want it to live with netcdf4-python, or should we host it elsewhere?
We need to document that netcdftime is available on pypi and conda-forge. Probably also how to build from source.
copy of Unidata/netcdf4-python#639
When a netcdftime datetime object is unable to be compared to another object, the richcmp method raises a TypeError. In previous versions of the module the method would return NotImplemented, causing the appropriate method on the other object to be called and allowing classes to be written which could be compared against netcdftime datetime objects. With the TypeError behaviour this is no longer the case.
Can the previous behaviour be restored?
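The difference between the two behaviours can be shown with plain Python classes. In this sketch `ToyDate` is a hypothetical stand-in for a netcdftime datetime and `Deadline` is a hypothetical third-party class; neither is part of the library.

```python
class ToyDate:
    """Hypothetical stand-in for a netcdftime datetime (ordinal day only)."""

    def __init__(self, ordinal):
        self.ordinal = ordinal

    def __lt__(self, other):
        if isinstance(other, ToyDate):
            return self.ordinal < other.ordinal
        # Defer instead of raising TypeError: Python then tries other.__gt__(self).
        return NotImplemented


class Deadline:
    """A third-party class that knows how to compare itself with ToyDate."""

    def __init__(self, ordinal):
        self.ordinal = ordinal

    def __gt__(self, other):
        if isinstance(other, ToyDate):
            return self.ordinal > other.ordinal
        return NotImplemented


print(ToyDate(10) < Deadline(20))  # → True
```

If `ToyDate.__lt__` raised TypeError itself, `Deadline.__gt__` would never be consulted. Returning NotImplemented still produces a TypeError when neither side can handle the comparison, for example `ToyDate(1) < 5`.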
There is a test class in test_netcdftime.py that uses netcdf4-python. This can probably be removed or conditionally tested only when netcdf4-python is available.
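One way to do the conditional variant with the standard library is a class-level skip. This is a sketch only: the class name and the test body are hypothetical and do not reproduce what test_netcdftime.py actually checks.

```python
import importlib.util
import unittest

# True only when the optional dependency can be found on the import path.
HAS_NETCDF4 = importlib.util.find_spec("netCDF4") is not None


@unittest.skipUnless(HAS_NETCDF4, "netcdf4-python is not installed")
class Netcdf4IntegrationTest(unittest.TestCase):
    """Hypothetical test class that needs netcdf4-python."""

    def test_num2date_is_exposed(self):
        import netCDF4  # imported lazily so collection works without it

        self.assertTrue(hasattr(netCDF4, "num2date"))
```

When netcdf4-python is missing, the test runner reports the class as skipped instead of failing at import time.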
In response to pydata/xarray#1252 (comment), @shoyer wrote:
I think netcdftime should have a version of num2date that always returns
netcdftime.datetime objects. It's really error prone to have functions that return
different types dependent on input values.
Would this be something folks would be open to? Somewhat selfishly this would make things easier downstream in pydata/xarray#1252, because we would only need to keep track of two datetime types for indexing and it would hopefully simplify the logic to encode datetimes for faithful roundtripping via netcdftime.num2date.
num2date largely does this already. I think it currently only returns datetime.datetime objects in the following situations (though please correct me if there are others):
'gregorian' or 'proleptic_gregorian'
num2date includes a date after the beginning of the Gregorian calendar and the dates are encoded using a calendar type of 'standard'.
If we could agree on how this alternative version of num2date would work in these situations I would be happy to put together a PR. I'm not sure if we might want to define a new function or expose this via a keyword argument in the existing num2date.
A fix for fully functioning rich comparisons on Python 3 was introduced in #53 but without a unit test. I just want to record in an issue what such a unit test would look like so that I haven't forgotten by the time I get to write it!
I am imagining creating two test classes that can be compared to cftime instances: one that is designed to play nice, and one that isn't. Instances of these classes should be compared to a cftime instance in both directions, i.e. a .cmp. b and b .cmp. a. The results will then depend on which version of Python is being used.
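A rough skeleton of such a test might look like the following. It is a Python 3 only sketch: `StandInDate` is a hypothetical placeholder for a cftime instance so that the example runs without cftime, and `Friendly` and `Unfriendly` are the two hypothetical test classes.

```python
import unittest


class StandInDate:
    """Hypothetical placeholder for a cftime datetime; defers on unknown types."""

    def __init__(self, ordinal):
        self.ordinal = ordinal

    def __lt__(self, other):
        if isinstance(other, StandInDate):
            return self.ordinal < other.ordinal
        return NotImplemented

    def __gt__(self, other):
        if isinstance(other, StandInDate):
            return self.ordinal > other.ordinal
        return NotImplemented


class Friendly:
    """Plays nice: implements the ordering comparisons itself."""

    def __init__(self, ordinal):
        self.ordinal = ordinal

    def __lt__(self, other):
        return self.ordinal < other.ordinal

    def __gt__(self, other):
        return self.ordinal > other.ordinal


class Unfriendly:
    """Does not play nice: defines no ordering at all."""


class RichComparisonTest(unittest.TestCase):
    def setUp(self):
        self.date = StandInDate(10)

    def test_friendly_both_directions(self):
        self.assertTrue(self.date < Friendly(20))  # falls back to Friendly.__gt__
        self.assertTrue(Friendly(20) > self.date)  # handled by Friendly directly

    def test_unfriendly_both_directions(self):
        with self.assertRaises(TypeError):
            self.date < Unfriendly()
        with self.assertRaises(TypeError):
            Unfriendly() < self.date
```

In a real test the placeholder would be replaced by an actual cftime instance, and the expected outcomes would be branched on the Python version.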
As discussed in Unidata/netcdf4-python#767, setup.py calls from Cython.Build import cythonize before it requires Cython to be installed. As a consequence, building without Cython previously installed fails with the following error:
Traceback (most recent call last):
File "setup.py", line 3, in <module>
from Cython.Build import cythonize
ImportError: No module named Cython.Build
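A common way to avoid this failure is to guard the import and fall back to a pre-generated C file. The sketch below assumes such a C file is shipped alongside the .pyx source; the `.c` path and the `extension_sources` helper are hypothetical, and this is not necessarily the fix adopted upstream.

```python
try:
    from Cython.Build import cythonize
except ImportError:
    # Cython is not installed yet; fall back to a pre-generated C file.
    cythonize = None


def extension_sources(use_cython=cythonize is not None):
    """Pick the source file for the extension module (paths are hypothetical)."""
    suffix = ".pyx" if use_cython else ".c"
    return ["netcdftime/_netcdftime" + suffix]


print(extension_sources())
```

With this guard, importing setup.py no longer fails outright when Cython is absent; whether the build then succeeds depends on the C file actually being present.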
@jhamman Hope all is well with you since the AOSPy Workshop! I'm putting an issue in here to address some missing capabilities that are CF compliant, but not fully supported by the netcdftime module.
This issue can be immediately seen by trying to initialize a utime instance with the 'common_year' or 'year' units (which are UDUNITS supported):
utime('common_years since 1-1-1 0:0:0', calendar='noleap')
which immediately fails with:
ValueError: units must be one of 'seconds', 'minutes', 'hours' or 'days' (or singular version of these), got 'years'
My understanding is that this should work, since 'common_year' is a fixed multiple of 'day' (i.e., 365 days, regardless of calendar). However, netcdftime does not support any year-like units.
I'm interested in implementing this ASAP, so I might fork the repo and put in a PR soon-ish.
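Since 'common_year' is described above as a fixed 365 days, supporting it could be as simple as one more entry in a unit conversion table. The sketch below is illustrative only: `DAYS_PER_UNIT` and `to_days` are hypothetical names, not netcdftime internals.

```python
# Length of each unit expressed in days. The 365-day common year follows the
# description above; the table itself is illustrative, not netcdftime's code.
DAYS_PER_UNIT = {
    "seconds": 1.0 / 86400.0,
    "minutes": 1.0 / 1440.0,
    "hours": 1.0 / 24.0,
    "days": 1.0,
    "common_years": 365.0,
}


def to_days(value, units):
    """Convert a time offset in `units` to days (hypothetical helper)."""
    try:
        return value * DAYS_PER_UNIT[units]
    except KeyError:
        raise ValueError(f"unsupported units: {units!r}") from None


print(to_days(2, "common_years"))  # → 730.0
```

Units whose length depends on the calendar, such as a plain 'year', would need more than a constant factor and are left out of this table on purpose.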