
Comments (13)

fkiraly commented on May 23, 2024

That's what I wrote, tuning should not be part of the task. So where would you put this information? With the forecaster/strategy?

No, I don't think specifying the tuning horizon is part of evaluation, as you write - but part of a strategy (or meta-strategy). I, personally, would simply make the "tuning horizon" part of a "sliding tuning horizon grid search" wrapper, similar to GridSearchCV. One of this first-order wrapper's parameters would be the tuning offset = temporal length of the tuning window.
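A minimal sketch of what such a wrapper could look like, assuming an integer-indexed series and one-step-ahead squared error as the tuning criterion. The names `tune_on_sliding_window`, `moving_average_forecast` and `tuning_offset` are hypothetical and not part of any existing interface; the moving-average forecaster is only a stand-in for the wrapped strategy.

```python
from itertools import product
from statistics import fmean


def moving_average_forecast(history, window):
    """Stand-in base forecaster: predict the mean of the last `window` values."""
    return fmean(history[-window:])


def tune_on_sliding_window(series, param_grid, tuning_offset):
    """Grid search over `param_grid`, scored on the last `tuning_offset` points.

    The cut-off slides through the tuning window; at each cut-off the
    candidate only sees observations strictly before it.
    """
    names = sorted(param_grid)
    best_params, best_score = None, float("inf")
    start = len(series) - tuning_offset
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        errors = []
        for cutoff in range(start, len(series)):
            prediction = moving_average_forecast(series[:cutoff], **params)
            errors.append((series[cutoff] - prediction) ** 2)
        score = fmean(errors)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
best_params, best_score = tune_on_sliding_window(
    series, {"window": [1, 2, 3]}, tuning_offset=4
)
```

Here the tuning window length is a parameter of the wrapper only; neither the task nor the wrapped forecaster needs to know about it.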

from sktime.

fkiraly commented on May 23, 2024

Part I: informal description of (sup.) forecasting tasks

Based on the paper I'm writing with Ahmed, I've thought about the task and model structure for (supervised) forecasting. I think it makes sense to split these into three aspects, which I describe below.

All tasks revolve around making predictions of observations at future time points, but how this is to be done varies.

(1) supervised vs not. In supervised forecasting, there are i.i.d. training examples of the same situation. In (narrow sense) forecasting, there are no training examples, only a single (potentially multivariate) time series. In the supervised setting, independent examples are expected in rows of the data frame. Columns may contain named univariate or multivariate series, or primitive data.

(2) fixed or moving observation horizon ("cut-off"). When predicting the values for time points t'_1, ..., t'_k, may only observations up to time T be used - irrespective of t'_i? -> fixed cut-off.
Or, to predict the value at t'_i, may observations up to time t'_i - h be used, for fixed h? Including those at t'_j where t'_j < t'_i - h? -> moving cut-off
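To make the distinction in (2) concrete, here is a small sketch assuming integer time stamps. The function name `visible_observations` and its arguments `cutoff` (the fixed T) and `lag` (the fixed h) are hypothetical.

```python
def visible_observations(obs_times, target_time, cutoff=None, lag=None):
    """Return the observation times that may be used to predict `target_time`.

    Fixed cut-off: pass `cutoff` (T); the result ignores `target_time`.
    Moving cut-off: pass `lag` (h); the limit is `target_time - lag`.
    """
    if (cutoff is None) == (lag is None):
        raise ValueError("pass exactly one of cutoff or lag")
    limit = cutoff if cutoff is not None else target_time - lag
    return [t for t in obs_times if t <= limit]


obs_times = list(range(10))  # observations at times 0..9
targets = (7, 8, 9)

# Fixed cut-off T = 5: the same observations for every target time.
fixed = [visible_observations(obs_times, t, cutoff=5) for t in targets]

# Moving cut-off with h = 2: later targets may use more observations.
moving = [visible_observations(obs_times, t, lag=2) for t in targets]
```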

(3) Functional vs concrete forecasting horizon.
Do we already know at time of prediction which (finite number of) time points the prediction is for?
If yes, this is "concrete" forecasting and the task is to produce forecasts to known time points.
If no, then forecasting is "functional", and the prediction output is a function or algorithm which, given a (possibly later known) time point, produces the value.
In the "functional" case, a time point range for the functional forecasts needs to be provided when fitting.
In the "concrete" case, the finite set of time points to forecast for (absolute, or relative to cut-off) needs to be specified when fitting.
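The difference in (3) can be sketched as follows, assuming numeric time stamps and a plain least-squares trend line as a stand-in model. All function names here are hypothetical.

```python
def fit_linear_trend(times, values):
    """Least-squares line through (times, values); returns (slope, intercept)."""
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_t


def forecast_concrete(times, values, horizon_times):
    """Concrete horizon: the time points are known when fitting."""
    slope, intercept = fit_linear_trend(times, values)
    return {t: slope * t + intercept for t in horizon_times}


def forecast_functional(times, values, range_start, range_end=float("inf")):
    """Functional horizon: return a callable valid on [range_start, range_end]."""
    slope, intercept = fit_linear_trend(times, values)

    def predict_at(t):
        if not range_start <= t <= range_end:
            raise ValueError("time point outside the fitted forecasting range")
        return slope * t + intercept

    return predict_at


times, values = [0, 1, 2, 3], [1.0, 3.0, 5.0, 7.0]
concrete = forecast_concrete(times, values, horizon_times=[4, 5])
predict_at = forecast_functional(times, values, range_start=4)
later_value = predict_at(10)  # time point supplied only after fitting
```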

Other related time series tasks that this will not cover, for now:

  • time series segmentation - label parts of the time series, supervised or semi-supervised.
  • interpolation - given later and earlier time points, predict something in the middle
  • nowcasting - predict one missing time series given others that are observed at the same time

These are different tasks that need a different interface, i.m.o., and might complicate things too much in the 1st instance.


fkiraly commented on May 23, 2024

Part II: tasks

Tasks are classes with parameters/attributes and utility methods
(essentially descriptors)

Parameters - private variables. Are set by constructor and should remain unchanged.

References - private variables. Reference to dataset or data provider.

Constructor __init__
arguments: all parameters, explicitly, with sensible default setting.
behavior: Sets self.parameters to the values provided.

Parameters - private variables (set by init)

supervised - boolean (corresponds to (1) above)

data - pointer to dataset (optional)

predictfrom - list of names of covariates/inputs/features (columns) to predict from
(in prediction, features are used only until observation cut-off)

predicttargets - list of names of targets/outputs/labels (columns) to predict

predicttypes - inferred by init (would be easier with xpandas)
Necessary to distinguish "classification" vs "regression" task type
later on will have to include probabilistic vs deterministic

usetargettoforecast - boolean. If "false", for forecasting, methods may not access the target time series, even if observations lie in the past. If "true" (default), the targets are used for forecasting and visible to the forecaster method.

obshorizon - dictionary with following fields
obshorizon.type - one of "single" (default), "moving", "custom"

further fields if obshorizon.type = "single"
obshorizon.time - a timestamp

further fields if obshorizon.type = "moving"
obshorizon.start - a timestamp
obshorizon.end - a timestamp
obshorizon.step - a time difference
(models an equally spaced vector of times, starting at start inclusive, ending at end exclusive, stepping by the step value)
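As an illustration of the "moving" fields, the following sketch expands start/end/step into the list of cut-off times, assuming `datetime` time stamps and a `timedelta` step. The helper name `moving_cutoffs` is hypothetical.

```python
from datetime import datetime, timedelta


def moving_cutoffs(start, end, step):
    """Equally spaced cut-offs: start inclusive, end exclusive."""
    if step <= timedelta(0):
        raise ValueError("step must be a positive time difference")
    cutoffs = []
    current = start
    while current < end:
        cutoffs.append(current)
        current += step
    return cutoffs


cutoffs = moving_cutoffs(
    start=datetime(2020, 1, 1),
    end=datetime(2020, 1, 4),
    step=timedelta(days=1),
)
```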

further fields if obshorizon.type = "custom"
obshorizon.times - list of timestamps

predhorizon - dictionary with following fields
predhorizon.type - one of "concrete" (default), "functional"
predhorizon.anchor - one of "relative" (default), "absolute"

further fields if predhorizon.type = "concrete"
predhorizon.times - vector of absolute or relative timestamps/diffs (depending on anchor type)

further fields if predhorizon.type = "functional"
predhorizon.rangestart - timestamp/diff where horizon starts (default = 0)
predhorizon.rangeend - timestamp/diff where horizon ends (default = infinite)

public method setdata
arguments: df (data frame)
behaviour: sets self.data to df

public method getpredtimestamps
arguments: none
behaviour: utility function - returns data frame that has only columns with target variables, entries are sequences of timestamps to predict for
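The outline above could be sketched as a dataclass. This is only an illustration under several assumptions: the class name `ForecastingTask` is hypothetical, the default for `supervised` is a guess since none is given above, horizons are plain dictionaries with integer times, and `getpredtimestamps` returns a dictionary as a stand-in for the data frame and only covers a single cut-off with a concrete horizon.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ForecastingTask:
    predicttargets: list
    predictfrom: list = field(default_factory=list)
    supervised: bool = False  # assumed default; the outline does not give one
    usetargettoforecast: bool = True
    obshorizon: dict = field(
        default_factory=lambda: {"type": "single", "time": None}
    )
    predhorizon: dict = field(
        default_factory=lambda: {"type": "concrete", "anchor": "relative", "times": [1]}
    )
    data: Any = None  # optional reference to the dataset

    def __post_init__(self):
        if self.obshorizon["type"] not in ("single", "moving", "custom"):
            raise ValueError("unknown obshorizon type")
        if self.predhorizon["type"] not in ("concrete", "functional"):
            raise ValueError("unknown predhorizon type")

    def setdata(self, df):
        self.data = df

    def getpredtimestamps(self):
        """Map each target to the time stamps to predict for."""
        if self.predhorizon["type"] != "concrete":
            raise ValueError("time stamps are only known for a concrete horizon")
        if self.obshorizon["type"] != "single":
            raise NotImplementedError("this sketch covers a single cut-off only")
        times = list(self.predhorizon["times"])
        if self.predhorizon["anchor"] == "relative":
            cutoff = self.obshorizon["time"]
            times = [cutoff + diff for diff in times]
        return {target: list(times) for target in self.predicttargets}


task = ForecastingTask(
    predicttargets=["y"],
    obshorizon={"type": "single", "time": 10},
    predhorizon={"type": "concrete", "anchor": "relative", "times": [1, 2, 3]},
)
pred_times = task.getpredtimestamps()
```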


fkiraly commented on May 23, 2024

Part III - forecasters.

unless stated otherwise, forecasters follow the consensus high-level design (see Marcus' "heavy" branch)

class forecaster
uses abstract base class
private (class) variables:
Parameters - private variables. Correspond to "hyper-parameters" to be set or tuned by the user (e.g., in init)
Attributes - private variables. Correspond to "model parameters" set by the fit method. Not to be set by the user or via an interface.

task - stores the task when provided
data - stores a reference to the data. Should be (or be set to be) equal to the reference in task, if not null.
fithorizon - latest timestamp at which data has been used for fitting

static/class variables:
tasktypes - a 3D boolean array with axis labels being the possible values for the task's "supervised", "predhorizon.type" and "obshorizon.type" fields. True if the method supports that task.

Constructor __init__
arguments: all parameters, explicitly, with sensible default setting.
behavior: Sets self.parameters to the values provided.

public method issuitablefortask
arguments: task, a task object
behaviour: returns true if the forecaster can deal with the task (checks tasktype static)
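A possible shape for the `tasktypes` lookup, sketched with a dictionary keyed by the three task fields as a stand-in for the labelled 3D boolean array. The class `LastValueForecaster` and the use of `SimpleNamespace` as a task object are assumptions made for illustration only.

```python
from types import SimpleNamespace


class LastValueForecaster:
    # Keys are (supervised, predhorizon type, obshorizon type).
    # Combinations not listed are treated as unsupported.
    tasktypes = {
        (False, "concrete", "single"): True,
        (False, "concrete", "moving"): True,
    }

    @classmethod
    def issuitablefortask(cls, task):
        key = (task.supervised, task.predhorizon["type"], task.obshorizon["type"])
        return cls.tasktypes.get(key, False)


plain_task = SimpleNamespace(
    supervised=False,
    predhorizon={"type": "concrete"},
    obshorizon={"type": "single"},
)
functional_task = SimpleNamespace(
    supervised=False,
    predhorizon={"type": "functional"},
    obshorizon={"type": "single"},
)
can_do_plain = LastValueForecaster.issuitablefortask(plain_task)
can_do_functional = LastValueForecaster.issuitablefortask(functional_task)
```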

public method fit
arguments:
task - a task object
data (optional) - only needed if task does not point to data. If provided, supersedes data in task - checks for variable name compatibility needed.
behaviour: fits the model and stores it in attribute variables. May access but not modify parameters.

public method update
arguments:
newdata - new data from the same source as "data". Default: call the data source's own incremental "retrieve update" if exists. If data frame, it is considered as new data.

public method predict
behaviour: returns a data frame with the predictions

public methods set_params/get_params
work as usual
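A rough sketch of how the fit/update/predict outline might look in code. It assumes the task is a plain dictionary, data is a list of (timestamp, value) pairs with integer time stamps, and predictions come back as a dictionary rather than a data frame. `BaseForecaster`, `WindowMeanForecaster` and `_param_names` are hypothetical names, not an existing API.

```python
from abc import ABC, abstractmethod


class BaseForecaster(ABC):
    _param_names = ()  # names of the hyper-parameters, set by subclasses

    def __init__(self):
        self.task = None        # task stored by fit
        self.data = None        # observations used so far
        self.fithorizon = None  # latest timestamp used for fitting

    def get_params(self):
        return {name: getattr(self, name) for name in self._param_names}

    def set_params(self, **params):
        for name, value in params.items():
            if name not in self._param_names:
                raise ValueError(f"unknown parameter: {name}")
            setattr(self, name, value)
        return self

    def fit(self, task, data=None):
        self.task = task
        # Data passed explicitly supersedes the data referenced by the task.
        self.data = list(data if data is not None else task["data"])
        self._fit()
        return self

    def update(self, newdata):
        self.data.extend(newdata)
        self._fit()
        return self

    @abstractmethod
    def _fit(self):
        """Set model parameters and fithorizon from self.data."""

    @abstractmethod
    def predict(self):
        """Return the predictions for the task's prediction horizon."""


class WindowMeanForecaster(BaseForecaster):
    _param_names = ("window",)

    def __init__(self, window=3):
        super().__init__()
        self.window = window  # hyper-parameter

    def _fit(self):
        recent = [value for _, value in self.data[-self.window:]]
        self.mean_ = sum(recent) / len(recent)  # model parameter set by fitting
        self.fithorizon = self.data[-1][0]

    def predict(self):
        steps = self.task["predhorizon"]["times"]  # relative to the cut-off
        return {self.fithorizon + step: self.mean_ for step in steps}


task = {
    "data": [(0, 1.0), (1, 2.0), (2, 3.0)],
    "predhorizon": {"type": "concrete", "anchor": "relative", "times": [1, 2]},
}
forecaster = WindowMeanForecaster(window=2).fit(task)
first = forecaster.predict()
forecaster.update([(3, 5.0)])
second = forecaster.predict()
```

In this sketch, `update` simply refits on the extended data; an incremental update would be a forecaster-specific refinement.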


fkiraly commented on May 23, 2024

Example forecasters to interface/implement:

  • ARMA (or ARIMA if not much more difficult)
  • panel ARMA (or ARIMA)
  • wrapper: pysf fixed observation horizon and concrete forecast horizon interface
  • wrapper: that time series package's sliding window thing
  • wrapper: supervised time series regression for concrete forecast horizon (loop over targets)


mloning commented on May 23, 2024

This is good, a few suggestions:

  • Change obshorizon.type options from single, moving and custom to fixed and moving (as in Part I), where fixed takes a list or single cut-off point (timestamp or integer/index)
  • For the moving obshorizon.type add expanding (boolean, if True expanding window, if False rolling/sliding window)
  • For tuning, we may want to have an additional tuninghorizon between obshorizon and predhorizon, but that seems to be part of the evaluation setup and not the task or forecaster, so it should be dealt with in the Orchestrator or something like a ForecastingGridSearch?


fkiraly commented on May 23, 2024

Change obshorizon.type options from single, moving and custom to fixed and moving (as in Part I), where fixed takes a list or single cut-off point (timestamp or integer/index)

Ok, agreed - easier to implement since it removes a weird case that might not be necessary in 1st draft.

For the moving obshorizon.type add expanding (boolean, if True expanding window, if False rolling/sliding window)

Disagreed - in which situation is data on the past actually deleted? Or what is the case you have in mind?

Note that the task should not restrict what the strategy chooses to look at, but only what it may look at if it wants.

For tuning, we may want to have an additional tuninghorizon between obshorizon and predhorizon, but that seems to be part of the evaluation setup and not the task or forecaster, so it should be dealt with in the Orchestrator or something like a ForecastingGridSearch?

Disagreed - tuning is not the task's business, since it is nothing the strategy is evaluated against, merely something the strategy chooses to do (or not).


mloning commented on May 23, 2024

For the moving obshorizon.type add expanding (boolean, if True expanding window, if False rolling/sliding window)

Disagreed - in which situation is data on the past actually deleted? Or what is the case you have in mind?
Note that the task should not restrict what the strategy chooses to look at, but only what it may look at if it wants.

Yes, you're right, should be part of strategy.

For tuning, we may want to have an additional tuninghorizon between obshorizon and predhorizon, but that seems to be part of the evaluation setup and not the task or forecaster, so it should be dealt with in the Orchestrator or something like a ForecastingGridSearch?

Disagreed - tuning is not the task's business, since it is nothing the strategy is evaluated against, merely something the strategy chooses to do (or not).

That's what I wrote, tuning should not be part of the task. So where would you put this information? With the forecaster/strategy?


fkiraly commented on May 23, 2024

As a side note, I think the task should specify only, as the name says, the task the strategy is evaluated against. It should not specify anything about how the strategy ought to solve it.


mloning commented on May 23, 2024

Here are some ideas for the forecasting task object thanks to @sajaysurya. The main intuition was to keep it as simple as possible and similar to the TSC task interface, possibly allowing us to have a single task class rather than one for each use case. Going through @fkiraly's design above:

supervised - boolean (corresponds to (1) above)

may not be necessary, as it's identifiable from data/metadata (one or multiple rows)

predictfrom - list of names of covariates/inputs/features (columns) to predict from
(in prediction, features are used only until observation cut-off)

is identical to features in TSC task

predicttargets - list of names of targets/outputs/labels (columns) to predict

is identical to target in TSC task

usetargettoforecast - boolean. If "false", for forecasting, methods may not access the target time series, even if observations lie in the past. If "true" (default), the targets are used for forecasting and visible to the forecaster method.

not necessary as a separate argument, target column could simply be both in features and target

obshorizon - dictionary with following fields
obshorizon.type - one of "single" (default), "moving", "custom"

For classical forecasting, obshorizon could be handled with a time series splitter (similar to train_test_split, but for time indices) which supports fixed and moving splits and which accepts one or multiple fixed cut-off points or instructions for a moving cut-off point, possibly refactoring sklearn's TimeSeriesSplit.

For the supervised forecasting case, the splitter would be a 2-dimensional splitter, splitting both temporally and across iid samples, e.g.

(train_rows, train_times), (test_rows, test_times) = SupForecastCrossValidator().split()

For predictions at multiple cut-off points, the orchestrator could be used to automatically take care of the splitting, fitting and predicting.
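To illustrate the 2-dimensional split, here is a minimal sketch assuming integer row indices and integer time stamps. The generator name `sup_forecast_split` and its arguments are hypothetical; it also assumes that held-out rows contribute nothing to training, which is one of several possible design choices.

```python
def sup_forecast_split(n_rows, times, test_rows, cutoffs):
    """Yield (train_rows, train_times), (test_rows, test_times) per cut-off.

    Rows are split across the i.i.d. samples; times are split at the cut-off,
    with the cut-off itself counted as observed.
    """
    held_out = set(test_rows)
    train_rows = [row for row in range(n_rows) if row not in held_out]
    for cutoff in cutoffs:
        train_times = [t for t in times if t <= cutoff]
        test_times = [t for t in times if t > cutoff]
        yield (train_rows, train_times), (sorted(held_out), test_times)


splits = list(
    sup_forecast_split(n_rows=4, times=[0, 1, 2, 3, 4], test_rows=[3], cutoffs=[1, 2])
)
(train_rows, train_times), (test_rows, test_times) = splits[0]
```

Passing a single cut-off gives the fixed case; passing several gives the moving case that an orchestrator could loop over.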

predhorizon - dictionary with following fields
predhorizon.type - one of "concrete" (default), "functional"
predhorizon.anchor - one of "relative" (default), "absolute"

predhorizon has to be in task but could for example be passed together with target as a tuple to the task, i.e. target=(target_name, predhorizon), rather than a separate argument.


mloning commented on May 23, 2024

  • transformer interface for transformation compatible with multiple-sample (supervised) and single-sample forecasting setting
  • in-sample forecasts
  • forecast intervals #97
  • update functionality


fkiraly commented on May 23, 2024

For reference, I'm uploading the notes I wrote in preparation to last week's discussion on @mloning 's architecture document.

Architecture document:
https://github.com/alan-turing-institute/sktime/wiki/Forecasting-API-proposal

My notes:
Forecasting design comments.docx


mloning commented on May 23, 2024

See #218 and corresponding API design document

