Comments (6)
@clausmichele is right; a workaround for now is to drop the offending item. Or you could manually fix the input datetime properties so they all use the same format.
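For the second workaround, here is a minimal stdlib sketch (the helper name is mine, not stackstac's) that rewrites a datetime string so every item carries fractional seconds before stacking:

```python
from datetime import datetime

def normalize_rfc3339(s: str) -> str:
    """Rewrite an RFC 3339 timestamp so it always has fractional seconds."""
    # fromisoformat on Python < 3.11 doesn't accept a trailing 'Z', so swap it first
    dt = datetime.fromisoformat(s.replace("Z", "+00:00"))
    return dt.strftime("%Y-%m-%dT%H:%M:%S.%f") + "Z"

# e.g. item.properties["datetime"] = normalize_rfc3339(item.properties["datetime"])
print(normalize_rfc3339("2022-09-03T10:32:05Z"))  # 2022-09-03T10:32:05.000000Z
```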
However, I consider this a stackstac bug. The problem indeed is with that one datetime near the end that doesn't have fractional seconds:
[...,
 '2022-09-03T10:32:02.517000Z',
 '2022-09-03T10:32:05Z',
 '2022-09-05T10:22:19.490000Z',
 '2022-09-05T10:22:20.519000Z']
The STAC spec just says that datetimes should be formatted according to RFC 3339, section 5.6. We can see there that time-secfrac is an optional part of partial-time, so this is still a valid datetime and should be parsed correctly.
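To illustrate with the stdlib alone (not stackstac code): both strings parse fine once the optional fraction is accounted for.

```python
from datetime import datetime

# Both timestamps are valid RFC 3339; time-secfrac is simply optional.
ts_with_frac = "2022-09-03T10:32:02.517000Z"
ts_without = "2022-09-03T10:32:05Z"

# fromisoformat on Python < 3.11 doesn't accept a trailing 'Z', so swap it for '+00:00'
for ts in (ts_with_frac, ts_without):
    parsed = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    print(parsed.isoformat())
```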
I found that the issue seems to be related to the errors="coerce" argument to pd.to_datetime here, along with infer_datetime_format:
stackstac/stackstac/prepare.py
Lines 408 to 412 in 694c686
Weirdly, with pandas 1.3.5, using errors='raise' doesn't raise any error, but seems to parse correctly:
>>> ts = sorted(item.properties["datetime"] for item in items)
>>> pd.to_datetime(ts, infer_datetime_format=True, errors='raise')
DatetimeIndex(['2022-08-04 10:32:05.634000+00:00',
...
'2022-09-03 10:32:02.517000+00:00',
'2022-09-03 10:32:05+00:00',
'2022-09-05 10:22:19.490000+00:00',
'2022-09-05 10:22:20.519000+00:00'],
dtype='datetime64[ns, UTC]', freq=None)
>>> pd.to_datetime(ts, infer_datetime_format=True, errors='coerce')
DatetimeIndex(['2022-08-04 10:32:05.634000', '2022-08-04 10:32:08.099000',
'2022-08-06 10:22:16.627000', '2022-08-06 10:22:18.455000',
'2022-08-09 10:32:12.888000', '2022-08-09 10:32:15.351000',
'2022-08-11 10:22:08.959000', '2022-08-11 10:22:10.785000',
'2022-08-14 10:32:04.138000', '2022-08-14 10:32:06.617000',
'2022-08-16 10:22:17.889000', '2022-08-16 10:22:19.720000',
'2022-08-19 10:32:13.767000', '2022-08-19 10:32:16.217000',
'2022-08-21 10:22:06.588000', '2022-08-21 10:22:08.412000',
'2022-08-24 10:32:01.411000', '2022-08-24 10:32:03.910000',
'2022-08-26 10:22:20.430000', '2022-08-26 10:22:21.453000',
'2022-08-29 10:32:13.577000', '2022-08-29 10:32:16.024000',
'2022-08-31 10:22:06.328000', '2022-08-31 10:22:08.153000',
'2022-09-03 10:32:02.517000', 'NaT',
'2022-09-05 10:22:19.490000', '2022-09-05 10:22:20.519000'],
dtype='datetime64[ns]', freq=None)
However, after switching to pandas 2, we do get a helpful and informative error:
>>> pd.__version__
2.0.3
>>> pd.to_datetime(ts, errors='raise')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-7-0ac50347e0b9> in <cell line: 1>()
----> 1 pd.to_datetime(ts, errors='raise')
...
ValueError: time data "2022-09-03T10:32:05Z" doesn't match format "%Y-%m-%dT%H:%M:%S.%f%z", at position 25. You might want to try:
- passing `format` if your strings have a consistent format;
- passing `format='ISO8601'` if your strings are all ISO8601 but not necessarily in exactly the same format;
- passing `format='mixed'`, and the format will be inferred for each element individually. You might want to use `dayfirst` alongside this.
This makes sense: there are multiple datetime formats present, so infer_datetime_format (which became True by default in pandas 2) rightfully complains about this. The fact that it didn't raise an error in older versions was probably a bug.
Given that the STAC spec indicates all datetimes should be a subset of ISO 8601, format='ISO8601' seems like just what we want:
>>> pd.to_datetime(ts, errors='raise', format='ISO8601')
DatetimeIndex(['2022-08-04 10:32:05.634000+00:00',
...
'2022-08-31 10:22:08.153000+00:00',
'2022-09-03 10:32:02.517000+00:00',
'2022-09-03 10:32:05+00:00',
'2022-09-05 10:22:19.490000+00:00',
'2022-09-05 10:22:20.519000+00:00'],
dtype='datetime64[ns, UTC]', freq=None)
format='ISO8601' is only available in pandas 2, so that's a compelling reason to require pandas 2, xref #213. (pandas-dev/pandas#41047 is also fixed in newer versions of pandas, allowing us to remove some old code for #2.) I think I'll probably fix this by using format='ISO8601' and requiring pandas 2.
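A sketch of what that fix could look like (a hypothetical helper, not the actual prepare.py change), assuming pandas >= 2:

```python
import pandas as pd

def parse_stac_datetimes(values):
    # format='ISO8601' (pandas >= 2) accepts RFC 3339 strings with or
    # without fractional seconds, so mixed precision no longer coerces to NaT
    return pd.to_datetime(list(values), format="ISO8601", errors="coerce")

ts = ["2022-09-03T10:32:02.517000Z", "2022-09-03T10:32:05Z"]
print(parse_stac_datetimes(ts))
```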
from stackstac.
You can drop them before trying to crop to the AOI like this: stack = stack.where(~np.isnat(stack.time), drop=True)
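The drop pattern (stack.where(~np.isnat(stack.time), drop=True)) can be seen on a toy DataArray standing in for a stackstac stack:

```python
import numpy as np
import xarray as xr

# Toy stand-in for a stackstac stack, with one NaT time step
times = np.array(["2022-09-03T10:32:02", "NaT", "2022-09-05T10:22:19"],
                 dtype="datetime64[ns]")
stack = xr.DataArray(np.arange(3.0), dims="time", coords={"time": times})

# Drop every time step whose coordinate failed to parse
cleaned = stack.where(~np.isnat(stack.time), drop=True)
print(cleaned.time.values)  # the two valid timestamps remain
```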
Thanks guys, that worked great. stack = stack.where(~np.isnat(stack.time), drop=True)
Glad that works for you, but dropping data doesn't seem like an ideal workaround to me, so I'll reopen this until the underlying problem is fixed.
To provide a bit more context to this bug, I have been using stackstac to document HLS (Landsat 8/9 + Sentinel-2) clear observation availability over Canada and came across these NaT time-steps.
My "fix" has been to drop the NaT time-steps in a similar manner as demonstrated here, and I have been documenting how prevalent they are across the HLS archive. Originally, I thought this was a metadata issue within the HLS archive, and only recently realized it was a stackstac bug; otherwise I would have fixed the missing dates rather than removing them.
For HLS, this impacts Sentinel-2 more than Landsat. Across the couple years of imagery I have documented so far (2018 - 2020), it impacts <0.1% of Landsat images and 0.2 - 1% of Sentinel-2 images depending on the year. It seems to be more prevalent in 2018 for S30 (1%), dropping to 0.2% by 2020.
Most of the time, the NaT time-steps are a couple images here and there (e.g., <5 in a year looking at all available imagery). However, I have noticed a few cases where almost every time-step in a year is NaT.
Here is a basic reproducible example of this case:
import os
import pystac_client as pc
import stackstac as ss  # imported up front: ss is used below for DEFAULT_GDAL_ENV

lon, lat = -91.4888563, 48.8885808   # All but the first S30 will be NaT datetime
# lon, lat = -89.4888563, 48.8885808 # No NaT datetime
start = '2019-01-01'                 # In 2020, or starting on 01-03, no issue
end = '2019-12-31'

gdalEnv = ss.DEFAULT_GDAL_ENV.updated(dict(
    GDAL_DISABLE_READDIR_ON_OPEN = 'TRUE',
    GDAL_HTTP_COOKIEFILE = os.path.expanduser('~/cookies.txt'),
    GDAL_HTTP_COOKIEJAR = os.path.expanduser('~/cookies.txt'),
    GDAL_HTTP_MAX_RETRY = 10,
    GDAL_HTTP_RETRY_DELAY = 15))

catalog = pc.Client.open('https://cmr.earthdata.nasa.gov/stac/LPCLOUD')
itemsS30 = catalog.search(
    intersects = dict(type = 'Point', coordinates = [lon, lat]),
    datetime = f'{start}/{end}',
    collections = ['HLSS30.v2.0'],
    limit = 100).item_collection()
# itemsS30  # Datetime values are present in all items

s30 = ss.stack(itemsS30, assets = ['Fmask'], epsg = 32615, resolution = 30, gdal_env = gdalEnv)
# s30  # NaT datetimes show up here
s30.time.values  # All but the first are NaT
In this case, the first datetime in the stack is '2019-01-02T17:10:02.000000000'. It appears that since this datetime does not have fractional seconds, and it is the first datetime in the stack, all the other datetimes are considered invalid and become NaT. Obviously a much more problematic result than the occasional single NaT here and there! These situations are pretty rare (I have noticed them only 3-4 times across the few years of work I have done over Canada so far), but I wanted to mention it here now that I know it is a stackstac issue rather than an HLS metadata issue.
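The poisoning mechanism can be reproduced by hand: format inference derives a single strftime pattern from the first element, and with errors='coerce' anything that doesn't match becomes NaT. (The format string below is spelled out explicitly to mimic what inference would choose from a fraction-less first element.)

```python
import pandas as pd

# First element has no fractional seconds, so inference settles on a
# fraction-less format; every timestamp that DOES have fractions then
# fails to match the pattern and is coerced to NaT.
ts = ["2019-01-02T17:10:02",          # matches "%Y-%m-%dT%H:%M:%S"
      "2019-01-05T17:20:04.123000"]   # does not match -> NaT

out = pd.to_datetime(ts, format="%Y-%m-%dT%H:%M:%S", errors="coerce")
print(out)
```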
When doing NRT work or phenology, for example, every image counts! So it's good to document and hopefully fix these edge cases.