
nowcasting_dataset's Issues

Try simple approach: Multiple DataLoader workers, each loads samples at random

Do #15 first.

No manually-coded multi-process stuff. Just use DataLoader's worker processes. prefetch_factor should be high (especially if the dataset yields individual samples).

Each worker samples totally at random from the entire dataset. No pre-loading. No careful aligning with Zarr chunk boundaries.

To speed things up, pick the geographical location ahead of time (as per #1), and write efficient data-loading code. Load a complete batch at once using dask.compute(), so dask can parallelise loading and processing each sample in the batch. Use dask to parallelise optical flow, too.

If that's not fast enough, maybe re-create the Zarr with each chunk being a single timestep long.
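
A minimal sketch of that approach, assuming an IterableDataset whose workers each draw samples at random (the dataset class and its sample-picking method are placeholders, not the repo's actual code):

import dask
from torch.utils.data import DataLoader, IterableDataset


class RandomSampleDataset(IterableDataset):
    """Placeholder dataset: each DataLoader worker samples at random from the whole Zarr."""

    def _get_delayed_sample_at_random(self):
        # Placeholder: in reality this would lazily select satellite/NWP/PV data
        # for a random t0 and location.
        return dask.delayed(dict)(sat_data=None, pv_yield=None)

    def __iter__(self):
        while True:
            # Build the sample lazily, then materialise it with a single
            # dask.compute() so dask can parallelise loading and processing.
            delayed_sample = self._get_delayed_sample_at_random()
            yield dask.compute(delayed_sample)[0]


data_loader = DataLoader(
    RandomSampleDataset(),
    batch_size=32,
    num_workers=16,     # plain DataLoader worker processes; no hand-rolled multiprocessing
    prefetch_factor=8,  # keep plenty of samples in flight per worker
)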

Loading slows down with large dataset

The problem

When using the 3,600 timesteps of test Zarr data, loading is super-quick (40 it/s with batch_size=32, image_size_pixels=128, n_samples_per_timestep=4, num_workers=16). This test Zarr has chunk sizes: time=1, y=704, x=548, variable=1. It reads data at almost 200 MB/s.

But, using the full Zarr dataset (with exactly the same chunk size and compression), it struggles to get more than about 5 it/s; and reads data at a few tens of MB/s.

Experimenting, I don't think the bottleneck is gcsfs. Reading a single file, or searching using glob, both seem to run at about the same speed on the two Zarr datasets.

Instead, it looks like Dask takes a long time to consider what to do with all those little chunks! The full Zarr dataset has 2 million chunks. Reading is even slower when using the Zarr array with quarter spatial resolution.

Potential solutions

  • First thing I'm trying is preparing a dataset with just HRV. UPDATE: This seems to work!
  • When we need more channels, re-create the dataset and put the other channels in the same chunk, so the total number of chunks stays the same.
  • Use bigger chunks!
  • Can xarray read data without dask? Update: Yes: xr.open_zarr(filename, chunks=None) (see the example below).
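
For reference, opening a Zarr without dask looks like this (the path is a placeholder):

import xarray as xr

# chunks=None disables dask, so xarray returns plain numpy-backed arrays.
dataset = xr.open_zarr("path/to/satellite.zarr", chunks=None)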

Use cubic interpolation when upsampling NWPs to 5-minutely

Needs a 1-hour buffer on start_hourly and end_hourly in get_nwp_example. But some fiddly things also need fixing (see the sketch after this list):

  • When computing the datetimes available for training, we need to take this 1-hour buffer into consideration
  • In NWPDataLoader we need to load data with an extra buffer. Maybe this means dropping an hour from the start and end of all the contiguous segments, to leave that hour buffer for the NWPs.
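
A rough sketch of the interpolation itself, assuming the NWP data is an xarray DataArray with a target_time coordinate (the coordinate name and function are illustrative, not the repo's actual API):

import pandas as pd
import xarray as xr


def upsample_nwp_to_5_minutes(nwp: xr.DataArray, start_dt, end_dt) -> xr.DataArray:
    """Upsample hourly NWP data to 5-minutely with cubic interpolation.

    Assumes nwp was loaded with the 1-hour buffer either side of
    [start_dt, end_dt] described above, so the cubic fit has enough support.
    """
    target_times = pd.date_range(start_dt, end_dt, freq="5T")
    return nwp.interp(target_time=target_times, method="cubic")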

BUG: Sat data sometimes returns images of wrong size when training

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-13-7b6b8391c42e> in <module>
----> 1 trainer.fit(model, data_module)

~/miniconda3/envs/nowcasting_dataset/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py in fit(self, model, train_dataloader, val_dataloaders, datamodule)
    456         )
    457 
--> 458         self._run(model)
    459 
    460         assert self.state.stopped

~/miniconda3/envs/nowcasting_dataset/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py in _run(self, model)
    754 
    755         # dispatch `start_training` or `start_evaluating` or `start_predicting`
--> 756         self.dispatch()
    757 
    758         # plugin will finalized fitting (e.g. ddp_spawn will load trained model)

~/miniconda3/envs/nowcasting_dataset/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py in dispatch(self)
    795             self.accelerator.start_predicting(self)
    796         else:
--> 797             self.accelerator.start_training(self)
    798 
    799     def run_stage(self):

~/miniconda3/envs/nowcasting_dataset/lib/python3.9/site-packages/pytorch_lightning/accelerators/accelerator.py in start_training(self, trainer)
     94 
     95     def start_training(self, trainer: 'pl.Trainer') -> None:
---> 96         self.training_type_plugin.start_training(trainer)
     97 
     98     def start_evaluating(self, trainer: 'pl.Trainer') -> None:

~/miniconda3/envs/nowcasting_dataset/lib/python3.9/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py in start_training(self, trainer)
    142     def start_training(self, trainer: 'pl.Trainer') -> None:
    143         # double dispatch to initiate the training loop
--> 144         self._results = trainer.run_stage()
    145 
    146     def start_evaluating(self, trainer: 'pl.Trainer') -> None:

~/miniconda3/envs/nowcasting_dataset/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py in run_stage(self)
    805         if self.predicting:
    806             return self.run_predict()
--> 807         return self.run_train()
    808 
    809     def _pre_training_routine(self):

~/miniconda3/envs/nowcasting_dataset/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py in run_train(self)
    867                 with self.profiler.profile("run_training_epoch"):
    868                     # run train epoch
--> 869                     self.train_loop.run_training_epoch()
    870 
    871                 if self.max_steps and self.max_steps <= self.global_step:

~/miniconda3/envs/nowcasting_dataset/lib/python3.9/site-packages/pytorch_lightning/trainer/training_loop.py in run_training_epoch(self)
    489         is_last_batch = None
    490 
--> 491         for batch_idx, (batch, is_last_batch) in train_dataloader:
    492             self.trainer.batch_idx = batch_idx
    493             self.trainer.is_last_batch = is_last_batch

~/miniconda3/envs/nowcasting_dataset/lib/python3.9/site-packages/pytorch_lightning/profiler/profilers.py in profile_iterable(self, iterable, action_name)
    110             try:
    111                 self.start(action_name)
--> 112                 value = next(iterator)
    113                 self.stop(action_name)
    114                 yield value

~/miniconda3/envs/nowcasting_dataset/lib/python3.9/site-packages/pytorch_lightning/trainer/supporters.py in prefetch_iterator(iterable)
    532         return
    533 
--> 534     for val in it:
    535         # yield last and has next
    536         yield last, False

~/miniconda3/envs/nowcasting_dataset/lib/python3.9/site-packages/pytorch_lightning/trainer/supporters.py in __next__(self)
    462 
    463         """
--> 464         return self.request_next_batch(self.loader_iters)
    465 
    466     @staticmethod

~/miniconda3/envs/nowcasting_dataset/lib/python3.9/site-packages/pytorch_lightning/trainer/supporters.py in request_next_batch(loader_iters)
    476 
    477         """
--> 478         return apply_to_collection(loader_iters, Iterator, next)
    479 
    480     @staticmethod

~/miniconda3/envs/nowcasting_dataset/lib/python3.9/site-packages/pytorch_lightning/utilities/apply_func.py in apply_to_collection(data, dtype, function, wrong_dtype, *args, **kwargs)
     83     # Breaking condition
     84     if isinstance(data, dtype) and (wrong_dtype is None or not isinstance(data, wrong_dtype)):
---> 85         return function(data, *args, **kwargs)
     86 
     87     # Recursively apply to collection items

~/miniconda3/envs/nowcasting_dataset/lib/python3.9/site-packages/torch/utils/data/dataloader.py in __next__(self)
    519             if self._sampler_iter is None:
    520                 self._reset()
--> 521             data = self._next_data()
    522             self._num_yielded += 1
    523             if self._dataset_kind == _DatasetKind.Iterable and \

~/miniconda3/envs/nowcasting_dataset/lib/python3.9/site-packages/torch/utils/data/dataloader.py in _next_data(self)
   1181             if len(self._task_info[self._rcvd_idx]) == 2:
   1182                 data = self._task_info.pop(self._rcvd_idx)[1]
-> 1183                 return self._process_data(data)
   1184 
   1185             assert not self._shutdown and self._tasks_outstanding > 0

~/miniconda3/envs/nowcasting_dataset/lib/python3.9/site-packages/torch/utils/data/dataloader.py in _process_data(self, data)
   1227         self._try_put_index()
   1228         if isinstance(data, ExceptionWrapper):
-> 1229             data.reraise()
   1230         return data
   1231 

~/miniconda3/envs/nowcasting_dataset/lib/python3.9/site-packages/torch/_utils.py in reraise(self)
    423             # have message field
    424             raise self.exc_type(message=msg)
--> 425         raise self.exc_type(msg)
    426 
    427 

RuntimeError: Caught RuntimeError in DataLoader worker process 1.
Original Traceback (most recent call last):
  File "/home/jack/miniconda3/envs/nowcasting_dataset/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/jack/miniconda3/envs/nowcasting_dataset/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 34, in fetch
    data = next(self.dataset_iter)
  File "/home/jack/dev/ocf/nowcasting_dataset/nowcasting_dataset/dataset.py", line 62, in __iter__
    yield self._get_batch()
  File "/home/jack/dev/ocf/nowcasting_dataset/nowcasting_dataset/dataset.py", line 88, in _get_batch
    return dask.compute(batch_delayed)[0]
  File "/home/jack/miniconda3/envs/nowcasting_dataset/lib/python3.9/site-packages/dask/base.py", line 567, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/home/jack/miniconda3/envs/nowcasting_dataset/lib/python3.9/site-packages/dask/threaded.py", line 79, in get
    results = get_async(
  File "/home/jack/miniconda3/envs/nowcasting_dataset/lib/python3.9/site-packages/dask/local.py", line 514, in get_async
    raise_exception(exc, tb)
  File "/home/jack/miniconda3/envs/nowcasting_dataset/lib/python3.9/site-packages/dask/local.py", line 325, in reraise
    raise exc
  File "/home/jack/miniconda3/envs/nowcasting_dataset/lib/python3.9/site-packages/dask/local.py", line 223, in execute_task
    result = _execute_task(task, data)
  File "/home/jack/miniconda3/envs/nowcasting_dataset/lib/python3.9/site-packages/dask/core.py", line 121, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/home/jack/miniconda3/envs/nowcasting_dataset/lib/python3.9/site-packages/torch/utils/data/_utils/collate.py", line 74, in default_collate
    return {key: default_collate([d[key] for d in batch]) for key in elem}
  File "/home/jack/miniconda3/envs/nowcasting_dataset/lib/python3.9/site-packages/torch/utils/data/_utils/collate.py", line 74, in <dictcomp>
    return {key: default_collate([d[key] for d in batch]) for key in elem}
  File "/home/jack/miniconda3/envs/nowcasting_dataset/lib/python3.9/site-packages/torch/utils/data/_utils/collate.py", line 64, in default_collate
    return default_collate([torch.as_tensor(b) for b in batch])
  File "/home/jack/miniconda3/envs/nowcasting_dataset/lib/python3.9/site-packages/torch/utils/data/_utils/collate.py", line 56, in default_collate
    return torch.stack(batch, 0, out=out)
RuntimeError: stack expects each tensor to be equal size, but got [2, 128, 128, 1] at entry 0 and [1, 128, 128, 1] at entry 9

Look into odd chunks in sat zarr

When writing test data, I ran into this problem with the int16 dataset:

Specified zarr chunks encoding['chunks']=(36, 704, 548, 1) for variable named 'stacked_eumetsat_data' would overlap multiple dask chunks ((32, 36, 4), (704,), (548,), (1,)). Writing this array in parallel with dask could lead to corrupted data. Consider either rechunking using chunk(), deleting or modifying encoding['chunks'], or specify safe_chunks=False.
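
Two hedged ways around the warning, following its own suggestions (the variable, dimension, and path names are placeholders; the "time" chunk size of 36 matches the encoding in the message above):

# Option 1: re-chunk the dask-backed dataset so the dask chunks match the
# on-disk chunks requested in encoding['chunks'].
dataset = dataset.chunk({"time": 36})
dataset.to_zarr("test_data.zarr", mode="w")

# Option 2: drop the stale chunk encoding inherited from the source Zarr and
# let the dask chunking define the on-disk chunks instead.
del dataset["stacked_eumetsat_data"].encoding["chunks"]
dataset.to_zarr("test_data.zarr", mode="w")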

Create tidy Python library for loading data :)

With automated unit tests :)

User can easily say "I want this many historical timesteps; and this many forecast timesteps; and including these satellite channels, and these NWP params, and compute optical flow predictions based on the most recent pair of satellite images".
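
A purely hypothetical sketch of what that user-facing configuration could look like; none of these names exist in the library yet:

from dataclasses import dataclass, field
from typing import List


@dataclass
class NowcastingDataConfig:
    """Hypothetical configuration object; every field name here is illustrative."""
    n_historical_timesteps: int = 6            # e.g. 30 minutes of history at 5-minute resolution
    n_forecast_timesteps: int = 12             # e.g. 1 hour of forecast
    satellite_channels: List[str] = field(default_factory=lambda: ["HRV"])
    nwp_params: List[str] = field(default_factory=lambda: ["t"])
    compute_optical_flow: bool = True          # flow from the most recent pair of satellite images


config = NowcastingDataConfig(satellite_channels=["HRV", "IR_016"], nwp_params=["t", "dswrf"])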

Optical flow: Predict future PV yield

Build an OpticalFlowDataSource class, which inherits from DataSource and adds optical_flow_predictions to the Sample dict.

Pre-compute and save in the NetCDF batches.

See #18 for some more notes
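
A hedged sketch of the optical-flow prediction itself, using OpenCV's Farnebäck method on the most recent pair of satellite images; the algorithm choice and function names are assumptions, not the repo's actual implementation:

import cv2
import numpy as np


def predict_next_image(prev_image: np.ndarray, curr_image: np.ndarray) -> np.ndarray:
    """Estimate dense optical flow between two satellite images and advect the
    most recent image forward by one timestep."""
    # Farneback needs 8-bit single-channel images, so rescale first.
    prev8 = cv2.normalize(prev_image, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    curr8 = cv2.normalize(curr_image, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    flow = cv2.calcOpticalFlowFarneback(prev8, curr8, None, 0.5, 3, 15, 3, 5, 1.2, 0)

    # Backward-warp along the flow to extrapolate the image one step into the future.
    height, width = curr_image.shape
    grid_x, grid_y = np.meshgrid(np.arange(width), np.arange(height))
    map_x = (grid_x - flow[..., 0]).astype(np.float32)
    map_y = (grid_y - flow[..., 1]).astype(np.float32)
    return cv2.remap(curr_image.astype(np.float32), map_x, map_y, cv2.INTER_LINEAR)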

Tidy up data loading & prep code

  • All code for each DataSource should live in its own class (or a python file with helper functions)
  • Standardise interface for getting list of available datetimes (for pre-computing valid datetimes)
  • Clean up the naming of zarr_chunk_sequences, segments, chunks etc. It feels confusing. The code is also inconsistent: sometimes it uses start and end, and sometimes it uses a Segment. We could try ripping out the whole concept of a 'Zarr chunk sequence' and see if it slows the code down noticeably. That is, randomly pick a contiguous section (with probability proportional to its length), and then randomly pick any start date, whether or not it aligns perfectly with Zarr chunks.
  • Consistent capitalisation of PV in class names.
  • Convert all datetimes to UTC, then make them naive, before continuing. Then rip out all the to_naive stuff (see the sketch after this list)
  • Convert satellite data timestamps to 00, 05, 10, ... (instead of 04, 09, etc.)
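
A small sketch of those two datetime clean-ups with pandas (the example timestamps are made up):

import pandas as pd

# Convert timezone-aware datetimes to UTC, then drop the timezone to get naive UTC.
datetimes = pd.DatetimeIndex(
    ["2019-01-01 12:04:00+01:00", "2019-01-01 12:09:00+01:00"])
naive_utc = datetimes.tz_convert("UTC").tz_localize(None)

# Round satellite timestamps ending in :04, :09, ... to the nearest 5 minutes.
rounded = naive_utc.round("5T")  # 11:04 -> 11:05, 11:09 -> 11:10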

Ideas for speeding up:

  1. Have a 'master' data loader class, which:
    • Takes an ordered list of DataSource objects.
    • Before training: Computes the intersection of the datetimes for each DataSource.
    • During training:
      • Pre-fetch data into memory: Each process randomly selects a time segment (perhaps continue with time segments aligned to the satellite Zarr's boundaries; or perhaps simplify the code by throwing away that idea?), and then hands off to worker threads for each data source to pre-load the data in parallel into memory.
      • Loop round data in memory. Construct a Sample by passing it in sequence through the DataSource objects
      • Then the Transforms are responsible for selecting
  2. Maybe need to load NWPs in 'Zarr-friendly' chunks, and then iterate around those in memory. Not entirely sure how to coordinate that with loading satellite-data Zarr-friendly chunks

Get more data

  • NWP (more UKV from CEDA, but we need to talk to the Met Office to get access again. Maybe MOGREPS. Maybe ECMWF).
  • EUMETSAT SEVIRI RSS (extend Future Energy Associate's Airflow pipeline code for ingesting data from EUMETSAT's API)
  • PV (sub-tasks: get more data from PVOutput.org for UK, using OCF's PVOutput Python code. Get data from PassivSystems (UK only). Get PV data from European PV systems)
  • CM-SAF irradiance?
  • Precipitation (UK rainfall radar. Jacob has already loaded this into his code, I think)
  • EUMETSAT cloud mask? (Jacob has cloud mask data, I think)

Compute NWP means & std over the complete dataset

Currently computed using:

nwp_ds.data.isel(init_time=slice(0, 10)).mean(dim=['step', 'x', 'init_time', 'y']).compute()

Using 100 init_times crashes (dask tries to use > 64 GB of RAM).

Try again with a VM with more RAM
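
A hedged sketch of computing the statistics in slices of init_time instead, so dask only ever materialises a small piece at a time (the Zarr path is a placeholder, and the simple average over slices assumes each slice holds the same number of init_times):

import numpy as np
import xarray as xr

nwp_ds = xr.open_zarr("path/to/UKV.zarr")  # placeholder path
dims = ["step", "x", "init_time", "y"]
slice_size = 10

slice_means, slice_sq_means = [], []
for start in range(0, len(nwp_ds.init_time), slice_size):
    # Cast to float so squaring integer-packed data can't overflow.
    chunk = nwp_ds.data.isel(init_time=slice(start, start + slice_size)).astype("float64")
    slice_means.append(chunk.mean(dim=dims).compute())
    slice_sq_means.append((chunk ** 2).mean(dim=dims).compute())

# Combine the per-slice statistics: std = sqrt(E[x^2] - E[x]^2).
mean = sum(slice_means) / len(slice_means)
std = np.sqrt(sum(slice_sq_means) / len(slice_sq_means) - mean ** 2)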

Implement PVDataSource

See notebooks/design.ipynb for ideas of interface.

Might want to share memory across worker processes; perhaps by constructing PV data & metadata as Dask DataFrames?

Implement PVDataSource.pick_locations()

PVDataSource must use dask.delayed

Currently, using PV data roughly halves the training speed (from 30 it/s down to 17 it/s), even though all the PV data is in memory and so should be very fast to load.

Probably need the pv_power data (and metadata?) to be dask arrays. Perhaps it's as simple as keeping it in xarray instead of converting to pandas?
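
A minimal sketch of the direction suggested above: keep pv_power as a dask-backed xarray DataArray and select each example lazily via dask.delayed, so the PV work is computed alongside the rest of the batch (the shapes, window lengths, and names here are invented):

import dask
import numpy as np
import pandas as pd
import xarray as xr

# Invented PV power DataArray: (datetime, system_id), kept dask-backed instead
# of being converted to a pandas DataFrame.
datetimes = pd.date_range("2019-06-01", periods=288, freq="5T")
pv_power = xr.DataArray(
    np.random.rand(len(datetimes), 4),
    coords={"datetime": datetimes, "system_id": [1, 2, 3, 4]},
    dims=["datetime", "system_id"],
).chunk({"datetime": 72})


@dask.delayed
def get_pv_example(t0_dt: pd.Timestamp, system_id: int) -> xr.DataArray:
    """Lazily select a short history + forecast window of PV yield for one system."""
    return pv_power.sel(
        datetime=slice(t0_dt - pd.Timedelta(minutes=30), t0_dt + pd.Timedelta(hours=1)),
        system_id=system_id,
    ).load()


# Build a batch of delayed PV examples, then compute them all in one go.
delayed_examples = [get_pv_example(datetimes[100 + i], system_id=1) for i in range(8)]
pv_batch = dask.compute(*delayed_examples)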

Get contiguous examples (for plotting several hours of predictions)

Maybe implement as a child class of NowcastingDataset which overrides _get_t0_datetimes_for_batch() and _get_locations_for_batch().

For each batch:

  • Pick a random start datetime for the first example, then use contiguous datetimes for the subsequent examples.
  • Pick a random location for the first example, then use that location throughout (a sketch follows below).
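
A small sketch of the two overrides; the method names come from the description above, but the bodies and the 5-minute frequency are assumptions:

import numpy as np
import pandas as pd


def _get_t0_datetimes_for_batch(available_t0_datetimes: pd.DatetimeIndex,
                                batch_size: int) -> pd.DatetimeIndex:
    """Pick one random start, then use contiguous 5-minutely datetimes for the batch."""
    start = np.random.choice(available_t0_datetimes[:-batch_size])
    return pd.date_range(start, periods=batch_size, freq="5T")


def _get_locations_for_batch(x_centers: np.ndarray, y_centers: np.ndarray,
                             batch_size: int):
    """Pick one random location and repeat it for every example in the batch."""
    i = np.random.randint(len(x_centers))
    return np.repeat(x_centers[i], batch_size), np.repeat(y_centers[i], batch_size)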

Try Satellite Zarr with quarter spatial extent (again)

Replace get_sample() with get_batch(): give each DataSource the full list of locations and timestamps for the batch, so the DataSource can load the necessary chunks from disk in a ThreadPoolExecutor. It needs to figure out which spatial chunks to load, given the Zarr chunk boundaries and the locations of the examples, then load those chunks into memory in parallel and return a full batch (with the examples in the correct order).
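
A hedged sketch of that get_batch() shape; the chunk-selection and example-assembly helpers here are placeholders for the real Zarr logic:

from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def find_required_chunks(t0_datetimes, x_centers, y_centers) -> List[Tuple]:
    """Placeholder: map each example's (t0, x, y) onto the Zarr chunks that cover it,
    given the Zarr chunk boundaries, and de-duplicate."""
    return sorted(set(zip(t0_datetimes, x_centers, y_centers)))


def load_chunk(chunk_key) -> dict:
    """Placeholder: read one Zarr chunk into memory (e.g. an xarray .sel(...).load())."""
    return {"chunk_key": chunk_key}


def get_batch(t0_datetimes, x_centers, y_centers) -> list:
    """Load every chunk the batch needs in parallel, then assemble the examples in order."""
    chunk_keys = find_required_chunks(t0_datetimes, x_centers, y_centers)

    # Zarr/GCS reads mostly release the GIL, so a thread pool parallelises them well.
    with ThreadPoolExecutor(max_workers=8) as executor:
        loaded_chunks = list(executor.map(load_chunk, chunk_keys))

    # Placeholder assembly: slice each example out of the in-memory chunks,
    # keeping the original batch order.
    return [(t0, x, y, len(loaded_chunks))
            for t0, x, y in zip(t0_datetimes, x_centers, y_centers)]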

Each DataSource should have its own history_len and forecast_len

Also, get_sample() should just take t0_dt (not start_dt or end_dt), and should take the example so far. Convert start_datetimes to t0_datetimes by adding history_duration.

So we can do this:

HISTORY_LEN = 2
FORECAST_LEN = 12

data_sources = [
    PVDataSource(
        history_len=HISTORY_LEN,
        forecast_len=FORECAST_LEN,
        pv_system_selection=DISJOINT_HISTORY_AND_FORECAST),
    SatelliteDataSource(
        history_len=HISTORY_LEN,
        forecast_len=0,
        image_size_pixels=192,
        transform=OpticalFlow(
            include_flow_in_example=False,
            output_image_size_pixels=128,
            forecast_len=FORECAST_LEN
        ),
    ),
    NWPDataSource(
        history_len=HISTORY_LEN,
        forecast_len=FORECAST_LEN,
        params=['t'],
        transform=SinglePointAtCenter
    )
]

Re-create NWP Zarr

  • Combine the 4 existing Zarrs (or maybe re-load from the GRIB files to fix the few issues)
  • Use minimal data types for each variable, e.g. uint8 for temperature? (see the sketch below)
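
A hedged sketch of what the compact-dtype write could look like with xarray's Zarr encoding, assuming nwp_dataset is the combined Dataset and 't' is surface temperature; the scale/offset values and output path are illustrative:

# Pack temperature into uint8 via a scale factor and offset; values are restored
# on read as decoded = stored * scale_factor + add_offset.
encoding = {
    "t": {
        "dtype": "uint8",
        "scale_factor": 0.5,   # 0.5 K resolution
        "add_offset": 200.0,   # covers roughly 200 K to 327 K
        "_FillValue": 255,
    }
}
nwp_dataset.to_zarr("UKV_recreated.zarr", mode="w", encoding=encoding)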

Ingest numerical weather prediction data (NWP)

Use temperature at surface, precipitation, irradiance, cloud fraction, accumulated snow cover.

  • Finish NWPDataSource
  • Resample to 5-minutely (see the sketch after this list)
  • Standardise
  • Convert to float32
  • Plot timeseries data just before data goes into ML model
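
A rough sketch of the resample/standardise/float32 steps on an NWP DataArray; the target_time coordinate name is an assumption, and the mean/std would come from the "Compute NWP means & std" issue above:

import pandas as pd
import xarray as xr


def prepare_nwp(nwp: xr.DataArray, mean: xr.DataArray, std: xr.DataArray) -> xr.DataArray:
    """Resample to 5-minutely, standardise, and convert to float32."""
    five_minutely = pd.date_range(
        nwp.target_time.values[0], nwp.target_time.values[-1], freq="5T")
    nwp = nwp.interp(target_time=five_minutely)  # see the cubic-interpolation issue above

    nwp = (nwp - mean) / std                     # standardise per channel
    return nwp.astype("float32")                 # compact dtype for the ML model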

BUG: InvalidIndexError

DEBUG:nowcasting_dataset:Opening satellite data: gs://solar-pv-nowcasting-data/satellite/EUMETSAT/SEVIRI_RSS/OSGB36/all_zarr_int16_single_timestep.zarr
[the line above is repeated 12 times]
DEBUG:nowcasting_dataset:Opening NWP data: gs://solar-pv-nowcasting-data/NWP/UK_Met_Office/UKV_zarr
[the line above is repeated 12 times]
ERROR:nowcasting_dataset:Exception! start_hourly=2019-11-07 15:00:00, t0_hourly=2019-11-07 16:00:00, end_hourly=2019-11-07 16:00:00, target_times_hourly=DatetimeIndex(['2019-11-07 15:00:00', '2019-11-07 16:00:00'], dtype='datetime64[ns]', freq='H'), Reindexing only valid with uniquely valued Index objects, is_increasing=True, is_unique=True
Traceback (most recent call last):
  File "/home/jack/dev/ocf/nowcasting_dataset/nowcasting_dataset/data_sources/data_source.py", line 64, in _get_cached_time_slice
    return self._cache[t0_dt]
KeyError: Timestamp('2019-11-07 15:55:00')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/jack/dev/ocf/nowcasting_dataset/nowcasting_dataset/data_sources/nwp_data_source.py", line 102, in _get_time_slice
    init_times = self.data.sel(
  File "/home/jack/miniconda3/envs/nowcasting_dataset/lib/python3.9/site-packages/xarray/core/dataarray.py", line 1271, in sel
    ds = self._to_temp_dataset().sel(
  File "/home/jack/miniconda3/envs/nowcasting_dataset/lib/python3.9/site-packages/xarray/core/dataset.py", line 2365, in sel
    pos_indexers, new_indexes = remap_label_indexers(
  File "/home/jack/miniconda3/envs/nowcasting_dataset/lib/python3.9/site-packages/xarray/core/coordinates.py", line 421, in remap_label_indexers
    pos_indexers, new_indexes = indexing.remap_label_indexers(
  File "/home/jack/miniconda3/envs/nowcasting_dataset/lib/python3.9/site-packages/xarray/core/indexing.py", line 274, in remap_label_indexers
    idxr, new_idx = convert_label_indexer(index, label, dim, method, tolerance)
  File "/home/jack/miniconda3/envs/nowcasting_dataset/lib/python3.9/site-packages/xarray/core/indexing.py", line 200, in convert_label_indexer
    indexer = get_indexer_nd(index, label, method, tolerance)
  File "/home/jack/miniconda3/envs/nowcasting_dataset/lib/python3.9/site-packages/xarray/core/indexing.py", line 101, in get_indexer_nd
    flat_indexer = index.get_indexer(flat_labels, method=method, tolerance=tolerance)
  File "/home/jack/miniconda3/envs/nowcasting_dataset/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3442, in get_indexer
    raise InvalidIndexError(self._requires_unique_msg)
pandas.errors.InvalidIndexError: Reindexing only valid with uniquely valued Index objects
ERROR:nowcasting_dataset:Exception!  t0_dt=2019-11-07 15:55:00, x_meters_center=40000, y_meters_center=20000, Reindexing only valid with uniquely valued Index objects
Traceback (most recent call last):
  File "/home/jack/dev/ocf/nowcasting_dataset/nowcasting_dataset/data_sources/data_source.py", line 64, in _get_cached_time_slice
    return self._cache[t0_dt]
KeyError: Timestamp('2019-11-07 15:55:00')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/jack/dev/ocf/nowcasting_dataset/nowcasting_dataset/dataset.py", line 122, in _get_example
    example_from_source = data_source.get_example(
  File "/home/jack/dev/ocf/nowcasting_dataset/nowcasting_dataset/data_sources/data_source.py", line 148, in get_example
    selected_data = self._get_cached_time_slice(t0_dt)
  File "/home/jack/dev/ocf/nowcasting_dataset/nowcasting_dataset/data_sources/data_source.py", line 66, in _get_cached_time_slice
    data = self._get_time_slice(t0_dt)
  File "/home/jack/dev/ocf/nowcasting_dataset/nowcasting_dataset/data_sources/nwp_data_source.py", line 102, in _get_time_slice
    init_times = self.data.sel(
  File "/home/jack/miniconda3/envs/nowcasting_dataset/lib/python3.9/site-packages/xarray/core/dataarray.py", line 1271, in sel
    ds = self._to_temp_dataset().sel(
  File "/home/jack/miniconda3/envs/nowcasting_dataset/lib/python3.9/site-packages/xarray/core/dataset.py", line 2365, in sel
    pos_indexers, new_indexes = remap_label_indexers(
  File "/home/jack/miniconda3/envs/nowcasting_dataset/lib/python3.9/site-packages/xarray/core/coordinates.py", line 421, in remap_label_indexers
    pos_indexers, new_indexes = indexing.remap_label_indexers(
  File "/home/jack/miniconda3/envs/nowcasting_dataset/lib/python3.9/site-packages/xarray/core/indexing.py", line 274, in remap_label_indexers
    idxr, new_idx = convert_label_indexer(index, label, dim, method, tolerance)
  File "/home/jack/miniconda3/envs/nowcasting_dataset/lib/python3.9/site-packages/xarray/core/indexing.py", line 200, in convert_label_indexer
    indexer = get_indexer_nd(index, label, method, tolerance)
  File "/home/jack/miniconda3/envs/nowcasting_dataset/lib/python3.9/site-packages/xarray/core/indexing.py", line 101, in get_indexer_nd
    flat_indexer = index.get_indexer(flat_labels, method=method, tolerance=tolerance)
  File "/home/jack/miniconda3/envs/nowcasting_dataset/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3442, in get_indexer
    raise InvalidIndexError(self._requires_unique_msg)
pandas.errors.InvalidIndexError: Reindexing only valid with uniquely valued Index objects
ERROR:nowcasting_dataset:Exception! start_hourly=2019-09-30 13:00:00, t0_hourly=2019-09-30 14:00:00, end_hourly=2019-09-30 14:00:00, target_times_hourly=DatetimeIndex(['2019-09-30 13:00:00', '2019-09-30 14:00:00'], dtype='datetime64[ns]', freq='H'), Reindexing only valid with uniquely valued Index objects, is_increasing=True, is_unique=True
Traceback (most recent call last):
  File "/home/jack/dev/ocf/nowcasting_dataset/nowcasting_dataset/data_sources/data_source.py", line 64, in _get_cached_time_slice
    return self._cache[t0_dt]
KeyError: Timestamp('2019-09-30 13:45:00')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/jack/dev/ocf/nowcasting_dataset/nowcasting_dataset/data_sources/nwp_data_source.py", line 102, in _get_time_slice
    init_times = self.data.sel(
  File "/home/jack/miniconda3/envs/nowcasting_dataset/lib/python3.9/site-packages/xarray/core/dataarray.py", line 1271, in sel
    ds = self._to_temp_dataset().sel(
  File "/home/jack/miniconda3/envs/nowcasting_dataset/lib/python3.9/site-packages/xarray/core/dataset.py", line 2365, in sel
    pos_indexers, new_indexes = remap_label_indexers(
  File "/home/jack/miniconda3/envs/nowcasting_dataset/lib/python3.9/site-packages/xarray/core/coordinates.py", line 421, in remap_label_indexers
    pos_indexers, new_indexes = indexing.remap_label_indexers(
  File "/home/jack/miniconda3/envs/nowcasting_dataset/lib/python3.9/site-packages/xarray/core/indexing.py", line 274, in remap_label_indexers
    idxr, new_idx = convert_label_indexer(index, label, dim, method, tolerance)
  File "/home/jack/miniconda3/envs/nowcasting_dataset/lib/python3.9/site-packages/xarray/core/indexing.py", line 200, in convert_label_indexer
    indexer = get_indexer_nd(index, label, method, tolerance)
  File "/home/jack/miniconda3/envs/nowcasting_dataset/lib/python3.9/site-packages/xarray/core/indexing.py", line 101, in get_indexer_nd
    flat_indexer = index.get_indexer(flat_labels, method=method, tolerance=tolerance)
  File "/home/jack/miniconda3/envs/nowcasting_dataset/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3442, in get_indexer
    raise InvalidIndexError(self._requires_unique_msg)
pandas.errors.InvalidIndexError: Reindexing only valid with uniquely valued Index objects
ERROR:nowcasting_dataset:Exception!  t0_dt=2019-09-30 13:45:00, x_meters_center=40000, y_meters_center=250000, Reindexing only valid with uniquely valued Index objects
Traceback (most recent call last):
  File "/home/jack/dev/ocf/nowcasting_dataset/nowcasting_dataset/data_sources/data_source.py", line 64, in _get_cached_time_slice
    return self._cache[t0_dt]
KeyError: Timestamp('2019-09-30 13:45:00')

Train across all PV systems, then fine tune on single PV system

Modify the PV Data Source so it samples from all PV systems for a few epochs, and then fixes on one, to explore the use case where we want good forecasts for a single PV system of interest.

Compare this to not fine tuning, and embedding the PV system's identity.

Implement NWPDataSource

See notebooks/design.ipynb for ideas of interface.

We will almost certainly also have to re-create the NWP Zarr (#11) so we load smaller files, e.g. each chunk should be a single init_time and a single step.
