FFORMA: Feature-based Forecast Model Averaging

Introduction & Citation Info

The fforma package provides tools for forecasting using a model combination approach. It can be used for model averaging or model selection. It works by training a ‘classifier’ that learns to select/combine different forecast models.

For more information about metalearning for forecasting, read and cite the paper:

This package came out of the FFORMA method presented to the M4 forecasting competition, but it has since been improved and can no longer be used to reproduce those results. For exact reproducibility of the results, check the M4metalearning GitHub repo. For empirical performance, use the fforma package, not M4metalearning.

Installation

Temporarily, as a workaround, a custom version of the xgboost package is required. You may install it manually from:

# install.packages("devtools")
devtools::install_github("pmontman/customxgboost")

Please note that if you have the latest version of the official xgboost package installed, it will be overwritten. We will remove the need for this workaround as soon as possible.

The package can then be installed:

#install.packages("devtools")
devtools::install_github("pmontman/fforma")

Usage

The package is used through two main functions: train_metalearning and forecast_metalearning. Both functions work on a list of elements with a particular data structure: a time series and some metadata. Each element in this list has at least the component $x, a ts object holding the series we want to forecast.
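As a minimal sketch of this input format, the following base R code builds a small dataset by hand. The series values and the name my_dataset are made up for illustration; only the $x and $h components come from the description above.

```r
# A dataset is a plain list; each element is itself a list describing one series.
# The values below are synthetic and only illustrate the structure.
my_dataset <- list(
  # monthly series with an explicit forecast horizon of 12 months
  list(
    x = ts(seq(100, by = 2, length.out = 36), frequency = 12, start = c(2015, 1)),
    h = 12
  ),
  # quarterly series with only the mandatory $x component
  list(
    x = ts(seq(50, by = 5, length.out = 20), frequency = 4, start = c(2010, 1))
  )
)

# Inspect the first element: a ts object in $x and an integer horizon in $h
str(my_dataset[[1]])
```

The second element omits $h, which the text above allows because only $x is mandatory.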

Training FFORMA

The train_metalearning function looks for the component $h in the elements of the input list, where $h is the desired prediction horizon. If $h is not found, h is set to one seasonal period of the series. The function then removes the last h observations from the time series $x and stores them as the true future values in the component $xx. This process is called temporal holdout. If a series in the training set already has the $xx component, FFORMA uses it instead of removing the last h observations of that series.

Then the metalearning model is trained (this takes a bit of time, see the Parallelism section below). The output of train_metalearning has three components: the metalearning model, the training dataset (after the temporal holdout) and information about the training process. This output can be passed to the forecast_metalearning function to produce forecasts.

In the example, we use a dataset of time series from the Mcomp package as the training set. It already follows the required format: a list whose elements have the $x component, and $h is provided as well.
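The temporal holdout step can be sketched in base R as follows. This is not the package's internal code: temporal_holdout_sketch is a hypothetical helper written only to illustrate the rule described above (default h of one seasonal period, last h observations moved to $xx, existing $xx left untouched).

```r
# Hypothetical helper illustrating temporal holdout on one dataset element.
# Assumption: this mirrors the description in the text, not fforma's actual code.
temporal_holdout_sketch <- function(elem) {
  # If the true future values are already provided, keep the element as it is
  if (!is.null(elem$xx)) {
    return(elem)
  }
  x <- elem$x
  # Default horizon: one seasonal period of the series
  h <- if (is.null(elem$h)) frequency(x) else elem$h
  n <- length(x)
  stopifnot(h >= 1, h < n)
  tt <- time(x)
  # The last h observations become the "true" future values
  elem$xx <- window(x, start = tt[n - h + 1])
  # The remaining observations are what the models get to see
  elem$x <- window(x, end = tt[n - h])
  elem$h <- h
  elem
}

# Two years of monthly data, no $h given, so h defaults to 12
series <- list(x = ts(1:24, frequency = 12, start = c(2000, 1)))
held <- temporal_holdout_sketch(series)
length(held$x)   # observations kept for training
length(held$xx)  # observations held out
```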

set.seed(1234)
library(fforma)
#The dataset of time series we want to forecast
ts_dataset <- Mcomp::M3[sample(length(Mcomp::M3), 30)]
#train!
fforma_fit <- train_metalearning(ts_dataset)

Forecasting with FFORMA

The forecast_metalearning function takes a metalearning model (the output of train_metalearning or equivalent) and a dataset of time series we want to forecast. This dataset is a list in the same format, though now the $h component is required, not optional. The dataset for forecasting can be the same as the one used for training, since training uses cross-validation by temporal holdout.

fforma_forec <- forecast_metalearning(fforma_fit, ts_dataset)

That's it, two lines of code! If the dataset we forecast has the $xx component in its elements, fforma will use it as the ‘true’ future values of each series $x and calculate the OWA, MASE and SMAPE forecast errors.
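For readers unfamiliar with these measures, here is a rough sketch of sMAPE and MASE as they are usually defined for the M4 competition. The function names smape_sketch and mase_sketch are hypothetical, and the package's own implementation may differ in details such as edge-case handling. OWA is then the average of the sMAPE and MASE values, each taken relative to a benchmark method, which is not computed here.

```r
# Symmetric MAPE in percent, following the usual M4 definition.
# Note: returns NaN when an actual and its forecast are both zero.
smape_sketch <- function(actual, forecast) {
  mean(200 * abs(actual - forecast) / (abs(actual) + abs(forecast)))
}

# MASE: out-of-sample mean absolute error scaled by the in-sample
# mean absolute error of the seasonal naive method.
mase_sketch <- function(insample, actual, forecast) {
  m <- frequency(insample)
  scale <- mean(abs(diff(as.numeric(insample), lag = m)))
  mean(abs(actual - forecast)) / scale
}

# Toy example with synthetic numbers
insample <- ts(1:10)       # non-seasonal series, frequency 1
actual   <- c(11, 12)      # "true" future values, i.e. the role of $xx
forecast <- c(10, 10)      # some forecast for the same two steps
smape_sketch(actual, forecast)
mase_sketch(insample, actual, forecast)
```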

forecast_metalearning outputs a dataset of time series similar to its input, but with the added forecasts in the component $ff_meta_avg of each element of the list.

#get the forecasts of the first series
fforma_forec$dataset[[1]]$ff_meta_avg

Advanced: Customizing the base forecast models

FFORMA learns to combine individual forecast models. By default it uses a set of forecasting models implemented in the forecast R package, such as auto.arima, ets, thetaf, etc. The set of methods that FFORMA learns to combine is passed in the forec_methods argument of the train_metalearning function. This argument should be a list of strings. Each string in the list should match the name of an existing forecasting function. A forecasting function is a simple function with two arguments: a ts object and the forecast horizon. When forecasting, FFORMA assumes that the custom functions used for training are available in the environment. To illustrate this process, we will fully customize fforma to learn to combine two forecast models of our own design, one forecasting the mean of the series and the other outputting zeroes.

#a function that takes x, a ts() and h an integer with the desired forecast horizon
my_mean_forec <- function(x, h) { 
  rep(mean(x), h)
}

my_zero_forec <- function(x,h) {
  rep(0, h)
}

#a list of strings, each the name of the forecasting function
list_of_methods <- list("my_mean_forec", "my_zero_forec")

#call fforma with the customized forecast functions
custom_fforma_fit <- train_metalearning(ts_dataset, forec_methods=list_of_methods)
#the actual forecasting from the customized fforma
custom_fforma_forec <- forecast_metalearning(custom_fforma_fit, ts_dataset)
#errors should be quite high
custom_fforma_forec$owa_errors
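As a further illustration of the two-argument contract, the sketch below defines a seasonal naive forecaster in base R. The name my_snaive_forec is hypothetical and not part of the package; it follows the same (x, h) shape as the functions above and returns a plain numeric vector of length h.

```r
# Hypothetical custom forecaster: repeat the last observed seasonal cycle.
# x is a ts object, h is the forecast horizon.
my_snaive_forec <- function(x, h) {
  m <- frequency(x)
  # tail() returns fewer than m values if the series is shorter than one period
  last_cycle <- tail(as.numeric(x), m)
  rep(last_cycle, length.out = h)
}

# Quick check on a synthetic monthly series
x <- ts(1:24, frequency = 12, start = c(2000, 1))
my_snaive_forec(x, 3)
```

Under the convention described above, adding the string "my_snaive_forec" to the list passed as forec_methods would presumably make it one of the methods to combine.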

Parallelism and Save/Restore progress

Forecasting with FFORMA can take a bit of time, depending on the individual models that are going to be combined and the size of the dataset. Parallelism through the future package is provided, and the processing can be periodically saved to disk and resumed in case of failure (like a power outage, or an impending Windows update).

The user just needs to select the future::plan and then parallelism is handled transparently. More information about plans and their capabilities is available in the documentation of the future package.

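#the user enables, in this case, basic multicore parallelism through several background R sessions
#(future::multiprocess, used in earlier versions of this example, is deprecated in recent releases of future)
future::plan(future::multisession)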
#train with parallelism enabled, no changes to the code
fforma_fit <- train_metalearning(ts_dataset)
#forecast with parallelism enabled, no changes to the code
fforma_forec <- forecast_metalearning(fforma_fit, ts_dataset)

For saving intermediate results, train_metalearning and forecast_metalearning have the save_foldername parameter, which must be set to the name of the folder where the intermediate results are saved. If this parameter is set to NULL, no saving/resuming is done. If save_foldername is set to an existing folder, the functions will try to resume the processing from the state saved in that folder. So the basic use is to launch train_metalearning or forecast_metalearning with a specific save_foldername and, if the process is interrupted, launch it again with the same save_foldername value so that the process resumes.

An important additional parameter is chunk_size, which indicates how many time series are processed between saves. If we set chunk_size=1000, the training/forecasting process will stop to save progress every 1000 series. A value that is too large risks losing a lot of progress, while a value that is too small wastes a lot of time saving to disk. An automatic guess is used if chunk_size=NULL, but it is highly recommended that users set it manually according to their needs.
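The trade-off behind chunk_size can be seen in a generic save/resume loop. The sketch below is not how fforma implements it: process_in_chunks and the chunk file naming are hypothetical, and only base R (saveRDS, readRDS) is used. Work is done chunk by chunk, each finished chunk is written to the folder, and a rerun skips chunks whose files already exist.

```r
# Hypothetical chunked processing with save/resume, for illustration only.
process_in_chunks <- function(items, fun, chunk_size, save_foldername) {
  dir.create(save_foldername, showWarnings = FALSE, recursive = TRUE)
  # Assign every item to a chunk: 1,1,...,2,2,...
  chunk_ids <- ceiling(seq_along(items) / chunk_size)
  chunks <- split(seq_along(items), chunk_ids)
  results <- vector("list", length(items))
  for (k in seq_along(chunks)) {
    idx <- chunks[[k]]
    chunk_file <- file.path(save_foldername, sprintf("chunk_%05d.rds", k))
    if (file.exists(chunk_file)) {
      # Resume: this chunk was finished in a previous run
      results[idx] <- readRDS(chunk_file)
    } else {
      chunk_res <- lapply(items[idx], fun)
      # Progress is only safe once the whole chunk is on disk
      saveRDS(chunk_res, chunk_file)
      results[idx] <- chunk_res
    }
  }
  results
}

# 25 items in chunks of 10 gives three chunk files
tmp_folder <- file.path(tempdir(), "chunk_sketch_demo")
res <- process_in_chunks(as.list(1:25), function(v) v^2, 10, tmp_folder)
list.files(tmp_folder)
```

In this sketch an interruption loses at most the chunk in progress, which is why a large chunk size risks more lost work and a small one spends more time writing files.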

Saving can be combined with parallelism.

An example of saving to disk

#run with saving to disk (NOTE chunk_size=10 is too low!, just for example)
fforma_fit <- train_metalearning(ts_dataset, chunk_size = 10, save_foldername = "my_tmp_fforma")
#imagine that the power goes off when series 14 is being processed...
#...
#... BOOM!
#...
# Now we want to resume!
#We just use the same function call, now it will try to resume
#train_metalearning will start from series 11 if it finds
#the temp files in save_foldername
fforma_fit <- train_metalearning(ts_dataset, chunk_size = 10, save_foldername = "my_tmp_fforma")

Forecast methods

Users can select which basic forecast methods are combined through fforma. The default set is based on the fforma submission to the M4 competition (see the reference).

Combination by Model Averaging or Model Selection

The training can be fine-tuned towards either model selection or model averaging by setting the objective parameter in train_metalearning to either "averaging" (default) or "selection".

fforma_fit <- train_metalearning(ts_dataset, objective = "selection")
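The conceptual difference between the two ways of combining can be sketched with plain matrices. The forecasts and weights below are invented numbers, and the code does not call the package; it only assumes that a metalearning model yields one weight per method for a given series.

```r
# Forecasts of three hypothetical methods for one series (rows) over h = 3 steps
forecasts <- rbind(
  method_a = c(10, 12, 14),
  method_b = c(20, 22, 24),
  method_c = c(30, 32, 34)
)

# Invented weights for this series, one per method, summing to 1
weights <- c(method_a = 0.5, method_b = 0.3, method_c = 0.2)

# Model averaging: weighted combination of all methods at every step
averaged <- as.numeric(weights %*% forecasts)

# Model selection: keep only the forecast of the highest-weighted method
selected <- forecasts[which.max(weights), ]

averaged
selected
```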

Advanced use

The package provides functions for manually tuning the training/forecasting processes. TO BE COMPLETED
