Comments (13)
That's what I wrote, tuning should not be part of the task. So where would you put this information? With the forecaster/strategy?
No, I don't think specifying the tuning horizon is part of evaluation, as you write - but part of a strategy (or meta-strategy). I, personally, would simply make the "tuning horizon" part of a "sliding tuning horizon grid search" wrapper, similar to GridSearchCV. One of this first-order wrapper's parameters would be the tuning offset = temporal length of the tuning window.
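To make the wrapper idea a bit more concrete, here is a minimal sketch in plain Python. Everything in it is hypothetical: the names `sliding_tuning_grid_search`, `window_mean_forecaster` and `tuning_offset` are illustrative and not sktime API. It only shows how a strategy could tune itself over a sliding window at the end of the observed series, without the task knowing anything about it.

```python
from itertools import product
from statistics import mean


def window_mean_forecaster(window):
    """Toy strategy: forecast the next value as the mean of the last `window` points."""
    def forecast(history):
        return mean(history[-window:])
    return forecast


def sliding_tuning_grid_search(y, make_forecaster, param_grid, tuning_offset):
    """Return the parameter setting with the lowest mean squared one-step-ahead
    error over the last `tuning_offset` points of `y` (the tuning window)."""
    names = sorted(param_grid)
    best_params, best_score = None, float("inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        forecast = make_forecaster(**params)
        errors = []
        # Slide the cut-off through the tuning window; only the past is visible.
        for cutoff in range(len(y) - tuning_offset, len(y)):
            errors.append((forecast(y[:cutoff]) - y[cutoff]) ** 2)
        score = mean(errors)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


y = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
best_params, best_score = sliding_tuning_grid_search(
    y, window_mean_forecaster, {"window": [1, 2, 3]}, tuning_offset=4
)
```

On this linear toy series the shortest window wins, since longer averages lag further behind the trend.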
from sktime.
Part I: informal description of (sup.) forecasting tasks
Based on the paper I'm writing with Ahmed, I've thought about the task and model structure for (supervised) forecasting. I think it makes sense to split these into three aspects, which I describe below.
All tasks revolve around making predictions of observations at future time points, but how this is to be done varies.
(1) supervised vs not. In supervised forecasting, there are i.i.d. training examples of the same situation. In (narrow sense) forecasting, there are no training examples, only a single (potentially multivariate) time series. In the supervised setting, independent examples are expected in rows of the data frame. Columns may contain named univariate or multivariate series, or primitive data.
(2) fixed or moving observation horizon ("cut-off"). When predicting the values at time points t'_1, ..., t'_k, may only observations up to a time T be used, irrespective of t'_i? -> fixed cut-off.
Or, to predict the value at t'_i, may observations up to time t'_i - h be used, for fixed h, including those at t'_j where t'_j < t'_i - h? -> moving cut-off.
(3) Functional vs concrete forecasting horizon.
Do we already know at time of prediction which (finite number of) time points the prediction is for?
If yes, this is "concrete" forecasting and the task is to produce forecasts to known time points.
If no, then forecasting is "functional", and the prediction output is a function or algorithm which, given a (possibly later known) time point, produces the value.
In the "functional" case, a time point range for the functional forecasts needs to be provided when fitting.
In the "concrete" case, the finite set of time points to forecast for (absolute, or relative to cut-off) needs to be specified when fitting.
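The fixed versus moving cut-off distinction in (2) can be illustrated with a small sketch. The function name `visible_until` and the use of integer time points are assumptions made for illustration only; the sketch just computes, for each prediction time, the latest observation time a forecaster would be allowed to use.

```python
def visible_until(pred_times, cutoff_type, cutoff=None, h=None):
    """Map each prediction time to the latest observation time that may be used."""
    if cutoff_type == "fixed":
        # Same cut-off T for every prediction time, irrespective of t'_i.
        return {t: cutoff for t in pred_times}
    if cutoff_type == "moving":
        # Cut-off trails each prediction time t'_i by a fixed lag h.
        return {t: t - h for t in pred_times}
    raise ValueError(f"unknown cut-off type: {cutoff_type!r}")


fixed = visible_until([11, 12, 13], "fixed", cutoff=10)
moving = visible_until([11, 12, 13], "moving", h=1)
```

With a fixed cut-off all three forecasts see data up to time 10; with a moving cut-off and h = 1 the forecast for time 13 may also use the observations at times 11 and 12.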
Other related time series tasks that this will not cover, for now:
- time series segmentation - label parts of the time series, supervised or semi-supervised.
- interpolation - given later and earlier time points, predict something in the middle
- nowcasting - predict one missing time series given others that are observed at the same time
These are different tasks that need a different interface, i.m.o., and covering them might complicate things too much in the first instance.
Part II: tasks
Tasks are classes with parameters/attributes and utility methods
(essentially descriptors)
Parameters - private variables. Are set by constructor and should remain unchanged.
References - private variables. Reference to dataset or data provider.
Constructor __init__
arguments: all parameters, explicitly, with sensible default setting.
behavior: Sets self.parameters to the values provided.
Parameters - private variables (set by init)
supervised - boolean (corresponds to (1) above)
data - pointer to dataset (optional)
predictfrom - list of names of covariates/inputs/features (columns) to predict from
(in prediction, features are used only until observation cut-off)
predicttargets - list of names of targets/outputs/labels (columns) to predict
predicttypes - inferred by init (would be easier with xpandas)
Necessary to distinguish "classification" vs "regression" task type
later on will have to include probabilistic vs deterministic
usetargettoforecast - boolean. If "false", for forecasting, methods may not access the target time series, even if observations lie in the past. If "true" (default), the targets are used for forecasting and visible to the forecaster method.
obshorizon - dictionary with following fields
obshorizon.type - one of "single" (default), "moving", "custom"
further fields if obshorizon.type = "single"
obshorizon.time - a timestamp
further fields if obshorizon.type = "moving"
obshorizon.start - a timestamp
obshorizon.end - a timestamp
obshorizon.step - a time difference
(models an equally spaced vector of times, starting at start inclusive, ending at end exclusive, stepping by the step value)
further fields if obshorizon.type = "custom"
obshorizon.times - list of timestamps
predhorizon - dictionary with following fields
predhorizon.type - one of "concrete" (default), "functional"
predhorizon.anchor - one of "relative" (default), "absolute"
further fields if predhorizon.type = "concrete"
predhorizon.times - vector of absolute or relative timestamps/diffs (depending on anchor type)
further fields if predhorizon.type = "functional"
predhorizon.rangestart - timestamp/diff where horizon starts (default = 0)
predhorizon.rangeend - timestamp/diff where horizon ends (default = infinite)
public method setdata
arguments: df (data frame)
behaviour: sets self.data to df
public method getpredtimestamps
arguments: none
behaviour: utility function - returns a data frame that has only columns with target variables; entries are sequences of timestamps to predict for
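As a rough illustration of the two horizon descriptors above, here is a sketch using frozen dataclasses, so that parameters stay unchanged after construction. It is not an agreed implementation: the class names `ObsHorizon` and `PredHorizon`, the helper `pred_timestamps`, and the use of integers in place of timestamps are all assumptions, and the "functional" horizon type is left out.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ObsHorizon:
    type: str = "single"          # "single", "moving" or "custom"
    time: Optional[int] = None    # used if type == "single"
    start: Optional[int] = None   # used if type == "moving" (inclusive)
    end: Optional[int] = None     # used if type == "moving" (exclusive)
    step: Optional[int] = None    # used if type == "moving"
    times: Tuple[int, ...] = ()   # used if type == "custom"

    def cutoffs(self):
        """List all observation cut-off points described by this horizon."""
        if self.type == "single":
            return [self.time]
        if self.type == "moving":
            return list(range(self.start, self.end, self.step))
        if self.type == "custom":
            return list(self.times)
        raise ValueError(f"unknown obshorizon type: {self.type!r}")


@dataclass(frozen=True)
class PredHorizon:
    type: str = "concrete"        # only "concrete" is handled in this sketch
    anchor: str = "relative"      # "relative" or "absolute"
    times: Tuple[int, ...] = ()

    def timestamps(self, cutoff):
        """Absolute time points to predict for, given one cut-off."""
        if self.type != "concrete":
            raise ValueError("only concrete horizons list their time points")
        if self.anchor == "relative":
            return [cutoff + diff for diff in self.times]
        return list(self.times)


def pred_timestamps(obs, pred):
    """For each cut-off, the time points a forecaster has to predict."""
    return {cutoff: pred.timestamps(cutoff) for cutoff in obs.cutoffs()}


schedule = pred_timestamps(
    ObsHorizon(type="moving", start=10, end=13, step=1),
    PredHorizon(times=(1, 2)),
)
```

Here `schedule` plays the role of getpredtimestamps for a single series: a moving cut-off over 10, 11, 12 with a relative horizon of one and two steps ahead.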
Part III - forecasters.
unless stated otherwise, forecasters follow the consensus high-level design (see Marcus' "heavy" branch)
class forecaster
uses abstract base class
private (class) variables:
Parameters - private variables. Correspond to "hyper-parameters" to be set or tuned by the user (e.g., in init)
Attributes - private variables. Correspond to "model parameters" set by the fit method. Not to be set by the user or via an interface.
task - stores the task when provided
data - stores a reference to the data. Should be (or be set to be) equal to the reference in task, if not null.
fithorizon - latest timestamp at which data has been used for fitting
static/class variables:
tasktypes - a 3D boolean array with axis labels being the possible values for the task's "supervised", "predhorizon.type" and "obshorizon.type" fields. True if the method supports that task.
Constructor __init__
arguments: all parameters, explicitly, with sensible default setting.
behavior: Sets self.parameters to the values provided.
public method issuitablefortask
arguments: task, a task object
behaviour: returns true if the forecaster can deal with the task (checks tasktype static)
public method fit
arguments:
task - the task to fit for. Fitting stores the model in attribute variables; it may access but not modify parameters.
data (optional) - only needed if task does not point to data. If provided, supersedes data in task - checks for variable name compatibility needed.
public method update
arguments:
newdata - new data from the same source as "data". Default: call the data source's own incremental "retrieve update" if exists. If data frame, it is considered as new data.
public method predict
behaviour: returns a data frame with the predictions
public methods set_params/get_params
work as usual
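A minimal sketch of how this interface could look in Python follows. It simplifies in several places, all of which are assumptions: the task is a hypothetical namedtuple with flattened field names, `tasktypes` is a set of supported combinations rather than a 3D boolean array, data is a plain list, and set_params/get_params are omitted.

```python
from abc import ABC, abstractmethod
from collections import namedtuple

# Hypothetical minimal task descriptor, just enough for the suitability check.
Task = namedtuple("Task", ["supervised", "predhorizon_type", "obshorizon_type", "steps"])


class BaseForecaster(ABC):
    # Supported (supervised, predhorizon.type, obshorizon.type) combinations.
    tasktypes = frozenset()

    def issuitablefortask(self, task):
        """Return True if this forecaster supports the given task type."""
        key = (task.supervised, task.predhorizon_type, task.obshorizon_type)
        return key in self.tasktypes

    @abstractmethod
    def fit(self, task, data):
        """Fit to data for the given task and return self."""

    @abstractmethod
    def predict(self):
        """Return predictions for the task stored at fit time."""


class LastValueForecaster(BaseForecaster):
    """Toy forecaster: repeats the last observed value."""

    tasktypes = frozenset({(False, "concrete", "single")})

    def fit(self, task, data):
        if not self.issuitablefortask(task):
            raise ValueError("task type not supported by this forecaster")
        self._task = task
        self._last = data[-1]
        self._fithorizon = len(data) - 1  # index of latest observation used
        return self

    def update(self, newdata):
        # Incorporate new observations without refitting from scratch.
        self._last = newdata[-1]
        self._fithorizon += len(newdata)
        return self

    def predict(self):
        return [self._last] * self._task.steps


task = Task(supervised=False, predhorizon_type="concrete",
            obshorizon_type="single", steps=2)
forecaster = LastValueForecaster().fit(task, [1, 2, 3])
first = forecaster.predict()
second = forecaster.update([4, 5]).predict()
```

Keeping the supported task types as a class-level set makes the suitability check a simple membership test, which is one possible reading of the "tasktypes" static.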
Example forecasters to interface/implement:
- ARMA (or ARIMA if not much more difficult)
- panel ARMA (or ARIMA)
- wrapper: pysf fixed observation horizon and concrete forecast horizon interface
- wrapper: that time series package's sliding window thing
- wrapper: supervised time series regression for concrete forecast horizon (loop over targets)
This is good, a few suggestions:
- Change obshorizon.type options from single, moving and custom to fixed and moving (as in Part I), where fixed takes a list or single cut-off point (timestamp or integer/index)
- For the moving obshorizon.type add expanding (boolean, if True expanding window, if False rolling/sliding window)
- For tuning, we may want to have an additional tuninghorizon between obshorizon and predhorizon, but that seems to be part of the evaluation setup and not the task or forecaster, so it should be dealt with in the Orchestrator or something like a ForecastingGridSearch?
Change obshorizon.type options from single, moving and custom to fixed and moving (as in Part I), where fixed takes a list or single cut-off point (timestamp or integer/index)
Ok, agreed - easier to implement since it removes a weird case that might not be necessary in 1st draft.
For the moving obshorizon.type add expanding (boolean, if True expanding window, if False rolling/sliding window)
Disagreed - in which situation is data on the past actually deleted? Or what is the case you have in mind?
Note that the task should not restrict what the strategy chooses to look at, but only what it may look at if it wants.
For tuning, we may want to have an additional tuninghorizon between obshorizon and predhorizon, but that seems to be part of the evaluation setup and not the task or forecaster, so it should be dealt with in the Orchestrator or something like a ForecastingGridSearch?
Disagreed - tuning is not the task's business, since it is nothing the strategy is evaluated against, merely something the strategy chooses to do (or not).
For the moving obshorizon.type add expanding (boolean, if True expanding window, if False rolling/sliding window)
Disagreed - in which situation is data on the past actually deleted? Or what is the case you have in mind?
Note that the task should not restrict what the strategy chooses to look at, but only what it may look at if it wants.
Yes, you're right, should be part of strategy.
For tuning, we may want to have an additional tuninghorizon between obshorizon and predhorizon, but that seems to be part of the evaluation setup and not the task or forecaster, so it should be dealt with in the Orchestrator or something like a ForecastingGridSearch?
Disagreed - tuning is not the task's business, since it is nothing the strategy is evaluated against, merely something the strategy chooses to do (or not).
That's what I wrote, tuning should not be part of the task. So where would you put this information? With the forecaster/strategy?
As a side note, I think the task should specify only, as the name says, the task the strategy is evaluated against. It should not specify anything about how the strategy ought to solve it.
Here are some ideas for the forecasting task object, thanks to @sajaysurya. The main intuition was to keep it as simple as possible and similar to the TSC task interface, possibly allowing us to have a single task class rather than one for each use case. Going through @fkiraly's design above:
supervised - boolean (corresponds to (1) above)
may not be necessary, as it's identifiable from data/metadata (one or multiple rows)
predictfrom - list of names of covariates/inputs/features (columns) to predict from
(in prediction, features are used only until observation cut-off)
is identical to features in TSC task
predicttargets - list of names of targets/outputs/labels (columns) to predict
is identical to target in TSC task
usetargettoforecast - boolean. If "false", for forecasting, methods may not access the target time series, even if observations lie in the past. If "true" (default), the targets are used for forecasting and visible to the forecaster method.
not necessary as a separate argument, target column could simply be both in features and target
obshorizon - dictionary with following fields
obshorizon.type - one of "single" (default), "moving", "custom"
For classical forecasting, obshorizon could be handled with a time series splitter (similar to train_test_split, but for time indices) that supports fixed and moving splits and accepts either one or multiple fixed cut-off points or instructions for a moving cut-off point, possibly by refactoring sklearn's TimeSeriesSplit.
For the supervised forecasting case, the splitter would be a 2-dimensional splitter, splitting both temporally and across iid samples, e.g.
(train_rows, train_times), (test_rows, test_times) = SupForecastCrossValidator().split()
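One possible reading of such a 2-dimensional split is sketched below. The function `sup_forecast_split` and its arguments are hypothetical stand-ins for the proposed SupForecastCrossValidator, and the semantics are an assumption: held-out rows form the test samples, and time indices are split at a single fixed cut-off.

```python
def sup_forecast_split(n_rows, n_times, test_rows, cutoff):
    """Split row indices across samples and time indices at a fixed cut-off."""
    held_out = set(test_rows)
    train_rows = [row for row in range(n_rows) if row not in held_out]
    test_rows = sorted(held_out)
    train_times = list(range(cutoff + 1))          # up to and including the cut-off
    test_times = list(range(cutoff + 1, n_times))  # strictly after the cut-off
    return (train_rows, train_times), (test_rows, test_times)


(train_rows, train_times), (test_rows, test_times) = sup_forecast_split(
    n_rows=4, n_times=6, test_rows=[1, 3], cutoff=3
)
```

A moving cut-off would then correspond to yielding one such pair of tuples per cut-off point.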
For predictions at multiple cut-off points, the orchestrator could be used to automatically take care of the splitting, fitting and predicting.
predhorizon - dictionary with following fields
predhorizon.type - one of "concrete" (default), "functional"
predhorizon.anchor - one of "relative" (default), "absolute"
predhorizon has to be in the task, but could, for example, be passed together with the target as a tuple, i.e. target=(target_name, predhorizon), rather than as a separate argument.
- transformer interface for transformation compatible with multiple-sample (supervised) and single-sample forecasting setting
- in-sample forecasts
- forecast intervals #97
- update functionality
For reference, I'm uploading the notes I wrote in preparation for last week's discussion of @mloning's architecture document.
Architecture document:
https://github.com/alan-turing-institute/sktime/wiki/Forecasting-API-proposal
My notes:
Forecasting design comments.docx
See #218 and corresponding API design document