gtsa's Issues

Taking stock of existing efforts and reflecting on directions

Below are existing efforts I found that could be useful for discussing and defining a clear objective for GTSA, and for building its core structure during the Hackweek:

The obvious dependencies that are now more stable:

Apart from GeoWombat's Time Series section, I don't see anything that does what GTSA currently does (scalable spatiotemporal prediction). GeoWombat is also the only one providing an interface to ingest raster data + chunk it + process it. The limitation is that they have to maintain all of these aspects at once in a single package, while GTSA can leave the ingestion + chunking + vector operations to Rioxarray + Geocube for the most part, and focus on making it easier to apply scalable methods on the processing side. I really like their approach of allowing any PyTorch or other algorithm to be passed; we should probably aim towards something similar.
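
To make that concrete, here is a minimal sketch (all names are hypothetical, not existing GTSA code) of what a "pass any algorithm" interface could look like on top of Xarray/Dask: the user supplies any per-pixel callable (a SciPy fit, a wrapped PyTorch model, ...) and it is mapped lazily over the chunks with xarray.apply_ufunc. It assumes an already-stacked (time, y, x) dataset ds with a "z" variable, chunked in space only.

import numpy as np
import xarray as xr


def apply_along_time(ds, func, var="z"):
    """Map any user-supplied per-pixel callable func(t, y) -> scalar over the cube, lazily."""
    # Decimal days since the first observation, as plain floats
    t = (ds["time"] - ds["time"][0]) / np.timedelta64(1, "D")
    return xr.apply_ufunc(
        func,
        t,
        ds[var],
        input_core_dims=[["time"], ["time"]],
        vectorize=True,        # loop over pixels within each spatial chunk
        dask="parallelized",   # keep the computation lazy / out-of-memory
        output_dtypes=[float],
    )


# Any algorithm can be plugged in, e.g. a plain linear trend per pixel:
trend = apply_along_time(ds, lambda t, y: np.polyfit(t, y, 1)[0])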

So, in terms of package objectives, I see two core aspects:

  1. Provide routines built on top of Rioxarray to create temporal raster stacks, possibly with multiple variables, in an out-of-memory fashion from a list of rasters with various extents/projections/dates (most of the heavy lifting is done by Rioxarray). One main issue I see is that rasters don't natively have dates in their metadata, so GTSA would need a generic interface for that (an Xarray accessor that reads dates from most auxiliary files/filenames for rasters would be super useful; we want to copy the functionalities of the SatelliteImage class in GeoUtils: https://geoutils.readthedocs.io/en/latest/satimg_class.html, but it'll take a while). See the first sketch after this list.
  2. Provide routines to perform spatiotemporal, error-aware prediction in a scalable manner, using already-implemented methods from SciPy, PyTorch, etc., wherever possible. Here again, creating routines that specifically support scalable algorithms can be a challenge. For predicting with GPs, methods exist for this, such as batch GPs: https://docs.gpytorch.ai/en/stable/examples/08_Advanced_Usage/index.html#batch-gps. For applying GPs, one only needs a chunk at the scale of the covariance; beyond that, points are independent! But this is not natively supported in most packages, so we'd have to write it. Another challenge would be to provide "error-aware" methods wherever possible, i.e. mostly methods that understand and propagate uncertainties in the prediction. That would allow us to later rely on ObsArray (or similar) to use the predicted datasets at different scales! (The GP + OLS + WLS in pyddem all have this!) See the second sketch after this list.
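
For (1), here is a minimal sketch of what the stacking routine could look like internally. It is only an assumption of mine, not existing GTSA code: it presumes all rasters already share a common grid (otherwise rio.reproject_match would be needed first) and that an ISO-like date can be parsed from each filename.

import re

import numpy as np
import rioxarray
import xarray as xr


def open_rasterstack(list_raster_files, zarr_file):
    """Hypothetical sketch: stack single-band rasters into a chunked (time, y, x) cube on disk."""
    slices = []
    for f in list_raster_files:
        # Parse a YYYY-MM-DD or YYYYMMDD date from the filename (placeholder heuristic)
        date = re.search(r"\d{4}-?\d{2}-?\d{2}", f).group(0).replace("-", "")
        da = (
            rioxarray.open_rasterio(f, chunks={"x": 1024, "y": 1024})  # lazy, dask-backed
            .squeeze("band", drop=True)
            .expand_dims(time=[np.datetime64(f"{date[:4]}-{date[4:6]}-{date[6:]}")])
        )
        slices.append(da)
    ds = xr.concat(slices, dim="time").sortby("time").to_dataset(name="z")
    ds.to_zarr(zarr_file, mode="w")  # streams chunk by chunk, never loading the full stack
    return xr.open_zarr(zarr_file)

For (2), a rough sketch of an error-aware prediction for a single pixel's time series, using scikit-learn's GP here only for brevity (GPyTorch batch GPs would be the more scalable option): per-observation measurement variances enter the GP noise via alpha, and the predicted standard deviation comes back with the mean. Mapping this over chunks could then reuse the apply_ufunc pattern sketched above. All names and kernel parameters are illustrative.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel


def predict_gp_1d(t_obs, y_obs, sigma_obs, t_pred):
    """Error-aware GP prediction for one pixel: measurement errors propagate to the output."""
    kernel = 1.0 * RBF(length_scale=365.0) + WhiteKernel(noise_level=0.1)
    gp = GaussianProcessRegressor(kernel=kernel, alpha=sigma_obs**2)  # per-point noise variance
    gp.fit(t_obs.reshape(-1, 1), y_obs)
    y_mean, y_std = gp.predict(t_pred.reshape(-1, 1), return_std=True)
    return y_mean, y_std  # prediction and its propagated 1-sigma uncertainty


# e.g. yearly observations with 0.1 (1-sigma) errors, predicted monthly
t_obs = np.arange(0, 3650, 365, dtype=float)
y_mean, y_std = predict_gp_1d(t_obs, np.random.rand(len(t_obs)),
                              np.full(len(t_obs), 0.1), np.arange(0, 3650, 30, dtype=float))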

In terms of ideal code structure: I'm not sure what is best... Definitely not a class-based object. I feel that an Xarray accessor could maybe work quite nicely? But we'd need to grasp all the implications for out-of-memory ops.
For instance:

import gtsa

# The package itself would only be called to open the list of files and stack them out-of-memory to a certain disk location
ds = gtsa.open_rasterstack(list_raster_files=..., zarr_file=...)
# (or this could be several functions if needed: define different tiling types? areas with different projections?)

# Then the Xarray accessor would do everything else
# For example, define additional Xarray attributes to ensure the time/space units are known, or to store the covariance of the data in space and time (based on ObsArray, maybe, if it takes off)
ds.gtsa.time_unit
ds.gtsa.space_unit

# For prediction: have a fit/apply function that returns predicted values at new spatiotemporal locations
ds_pred = ds.gtsa.predict(method=..., time_pred=..., x_pred=...)
ds_pred.to_zarr(store=...)

Do you think that would work (even out-of-memory)?
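
For reference, registering such an accessor is supported by Xarray's public API (xr.register_dataset_accessor); below is a minimal sketch where the body is only a hypothetical placeholder, not actual GTSA code.

import xarray as xr


@xr.register_dataset_accessor("gtsa")
class GTSAAccessor:
    """Hypothetical accessor exposing time/space metadata and a predict() entry point."""

    def __init__(self, xarray_obj):
        self._ds = xarray_obj

    @property
    def time_unit(self):
        # Placeholder: read from attrs set at stacking time
        return self._ds["time"].attrs.get("units", "datetime64")

    def predict(self, method=None, time_pred=None, x_pred=None):
        # Dispatch to a scalable backend (apply_ufunc / map_blocks) would go here
        raise NotImplementedError


# After import, ds.gtsa.time_unit and ds.gtsa.predict(...) become available as used above.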

That's all I've got for now 😛!
