nixtla / statsforecast

Lightning ⚡️ fast forecasting with statistical and econometric models.

Home Page: https://nixtlaverse.nixtla.io/statsforecast

License: Apache License 2.0

Python 99.78% Makefile 0.17% Dockerfile 0.04% Shell 0.01%
time-series statistics forecasting arima econometrics machine-learning python exponential-smoothing ets baselines predictions naive seasonal-naive fbprophet prophet neuralprophet mstl theta automl data-science

statsforecast's Introduction

Statistical ⚡️ Forecast

Lightning fast forecasting with statistical and econometric models

StatsForecast offers a collection of widely used univariate time series forecasting models, including automatic ARIMA, ETS, CES, and Theta modeling optimized for high performance using numba. It also includes a large battery of benchmarking models.

Installation

You can install StatsForecast with:

pip install statsforecast

or

conda install -c conda-forge statsforecast

Visit our Installation Guide for further instructions.

Quick Start

Minimal Example

from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA
from statsforecast.utils import AirPassengersDF

df = AirPassengersDF
sf = StatsForecast(
    models = [AutoARIMA(season_length = 12)],
    freq = 'M'
)

sf.fit(df)
sf.predict(h=12, level=[95])

Get Started with this quick guide.

Follow this end-to-end walkthrough for best practices.

Why?

Current Python alternatives for statistical models are slow, inaccurate and don't scale well. So we created a library that can be used to forecast in production environments or as benchmarks. StatsForecast includes an extensive battery of models that can efficiently fit millions of time series.

Features

  • Fastest and most accurate implementations of AutoARIMA, AutoETS, AutoCES, MSTL and Theta in Python.
  • Out-of-the-box compatibility with Spark, Dask, and Ray.
  • Probabilistic Forecasting and Confidence Intervals.
  • Support for exogenous variables and static covariates.
  • Anomaly Detection.
  • Familiar sklearn syntax: .fit and .predict.

Highlights

  • Inclusion of exogenous variables and prediction intervals for ARIMA.
  • 20x faster than pmdarima.
  • 1.5x faster than R.
  • 500x faster than Prophet.
  • 4x faster than statsmodels.
  • Compiled to high performance machine code through numba.
  • 1,000,000 series in 30 min with ray.
  • Replace FB-Prophet in two lines of code and gain speed and accuracy. Check the experiments here.
  • Fit 10 benchmark models on 1,000,000 series in under 5 min.

Missing something? Please open an issue or write to us on Slack.

Examples and Guides

📚 End to End Walkthrough: Model training, evaluation and selection for multiple time series

🔎 Anomaly Detection: detect anomalies for time series using in-sample prediction intervals.

👩‍🔬 Cross Validation: robust evaluation of model performance.

❄️ Multiple Seasonalities: how to forecast data with multiple seasonalities using an MSTL.

🔌 Predict Demand Peaks: electricity load forecasting for detecting daily peaks and reducing electric bills.

📈 Intermittent Demand: forecast series with very few non-zero observations.

🌡️ Exogenous Regressors: include regressors such as weather or prices.

Models

Automatic Forecasting

Automatic forecasting tools search for the best parameters and select the best possible model for a group of time series. These tools are useful for large collections of univariate time series.

Model Point Forecast Probabilistic Forecast Insample fitted values Probabilistic fitted values Exogenous features
AutoARIMA
AutoETS
AutoCES
AutoTheta

ARIMA Family

These models exploit the existing autocorrelations in the time series.

Model Point Forecast Probabilistic Forecast Insample fitted values Probabilistic fitted values Exogenous features
ARIMA
AutoRegressive

Theta Family

Fit two theta lines to a deseasonalized time series, using different techniques to obtain and combine the two theta lines to produce the final forecasts.

Model Point Forecast Probabilistic Forecast Insample fitted values Probabilistic fitted values Exogenous features
Theta
OptimizedTheta
DynamicTheta
DynamicOptimizedTheta

Multiple Seasonalities

Suited for signals with more than one clear seasonality. Useful for low-frequency data like electricity and logs.

Model Point Forecast Probabilistic Forecast Insample fitted values Probabilistic fitted values Exogenous features
MSTL If trend forecaster supports

GARCH and ARCH Models

Suited for modeling time series that exhibit non-constant volatility over time. The ARCH model is a particular case of GARCH.

Model Point Forecast Probabilistic Forecast Insample fitted values Probabilistic fitted values Exogenous features
GARCH
ARCH

Baseline Models

Classical models for establishing baseline.

Model Point Forecast Probabilistic Forecast Insample fitted values Probabilistic fitted values Exogenous features
HistoricAverage
Naive
RandomWalkWithDrift
SeasonalNaive
WindowAverage
SeasonalWindowAverage

Exponential Smoothing

Uses a weighted average of all past observations where the weights decrease exponentially into the past. Suitable for data with clear trend and/or seasonality. Use the SimpleExponential family for data with no clear trend or seasonality.

Model Point Forecast Probabilistic Forecast Insample fitted values Probabilistic fitted values Exogenous features
SimpleExponentialSmoothing
SimpleExponentialSmoothingOptimized
SeasonalExponentialSmoothing
SeasonalExponentialSmoothingOptimized
Holt
HoltWinters
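
As a rough illustration of the "weights decrease exponentially" idea, here is a minimal pure-Python sketch of simple exponential smoothing. It is not the library's implementation: the function names are hypothetical, and initialising the level with the first observation is an assumption made for brevity.

```python
def simple_exponential_smoothing(y, alpha):
    """Return the one-step-ahead forecast after smoothing every value in y."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = y[0]  # assumption: start the level at the first observation
    for obs in y[1:]:
        # New level = weighted average of the new observation and the old level.
        level = alpha * obs + (1 - alpha) * level
    return level


def implied_weights(n, alpha):
    """Weights on the n most recent observations, newest first (initial level ignored)."""
    return [alpha * (1 - alpha) ** k for k in range(n)]


forecast = simple_exponential_smoothing([10.0, 12.0, 11.0, 13.0], alpha=0.5)
weights = implied_weights(3, alpha=0.5)
print(forecast, weights)
```

A larger alpha puts more weight on recent observations; a smaller alpha produces a smoother, slower-moving level.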

Sparse or Intermittent

Suited for series with very few non-zero observations

Model Point Forecast Probabilistic Forecast Insample fitted values Probabilistic fitted values Exogenous features
ADIDA
CrostonClassic
CrostonOptimized
CrostonSBA
IMAPA
TSB

🔨 How to contribute

See CONTRIBUTING.md.

Citing

@misc{garza2022statsforecast,
    author={Federico Garza, Max Mergenthaler Canseco, Cristian Challú, Kin G. Olivares},
    title = {{StatsForecast}: Lightning fast forecasting with statistical and econometric models},
    year={2022},
    howpublished={{PyCon} Salt Lake City, Utah, US 2022},
    url={https://github.com/Nixtla/statsforecast}
}

Contributors ✨

Thanks goes to these wonderful people (emoji key):

  • fede: 💻 🚧
  • José Morales: 💻 🚧
  • Sugato Ray: 💻
  • Jeff Tackes: 🐛
  • darinkist: 🤔
  • Alec Helyar: 💬
  • Dave Hirschfeld: 💬
  • mergenthaler: 💻
  • Kin: 💻
  • Yasslight90: 🤔
  • asinig: 🤔
  • Philip Gillißen: 💻
  • Sebastian Hagn: 🐛 📖
  • Han Wang: 💻
  • Ben Jeffrey: 🐛
  • Beliavsky: 📖
  • Mariana Menchero García: 💻
  • Nikhil Gupta: 🐛
  • JD: 🐛
  • josh attenberg: 💻
  • JeroenPeterBos: 💻
  • Jeroen Van Der Donckt: 💻
  • Roymprog: 📖
  • Nelson Cárdenas Bolaño: 📖
  • Kyle Schmaus: 💻
  • Akmal Soliev: 💻
  • Nick To: 💻
  • Kevin Kho: 💻
  • Yiben Huang: 📖
  • Andrew Gross: 📖
  • taniishkaaa: 📖
  • Manuel Calzolari: 💻

This project follows the all-contributors specification. Contributions of any kind welcome!

statsforecast's Issues

No xreg argument on forecast method

Hi, thanks for the package! I cannot reproduce the example because of a peculiar error: forecast() got an unexpected keyword argument 'xreg'. I installed the package from pip for Python 3.8. The package version is 0.3.0. I am on an M1 Mac, but I am not currently using the arm64 version.

I also checked the module with the inspect package. The forecast method really does not have the "xreg" argument. Here are the code and the response.

import inspect
from statsforecast import StatsForecast
inspect.getfullargspec(StatsForecast.forecast)

The response is FullArgSpec(args=['self', 'h'], varargs=None, varkw=None, defaults=None, kwonlyargs=[], kwonlydefaults=None, annotations={})

P.S. I just installed from GitHub and xreg seems to be there, so it will probably work.

Nondaily time series breaks

Using non-daily time series throws the following error:

image

The minimum reproducible example would be,

import numpy as np
import pandas as pd

from statsforecast import StatsForecast
from statsforecast.models import random_walk_with_drift

rng = np.random.RandomState(0)
serie1 = np.arange(1, 8)[np.arange(100) % 7] + rng.randint(-1, 2, size=100)
serie2 = np.arange(100) + rng.rand(100)
series = pd.DataFrame(
    {
        'ds': pd.date_range('2000-01-01', periods=serie1.size + serie2.size, freq='M'),
        'y': np.hstack([serie1, serie2]),
    },
    index=pd.Index([0] * serie1.size + [1] * serie2.size, name='unique_id')
)

fcst = StatsForecast(series, models=[random_walk_with_drift], freq='M')
forecasts = fcst.forecast(5)                                       

I think the problem might be solved using pd.DatetimeIndex on self.last_dates. I'll open a PR soon.

Creation of forecast dates takes too long

Describe the bug
Creation of forecast dates takes too long.
Computation is wasted when last_dates of all time series are the same (expected in most applications).

To Reproduce
image

Possible solution
image
Desktop (please complete the following information):

  • OS: Mac
  • Version 0.5.3

[FEAT] Add environment variables for `njit`'s `cache=True` and `nogil=True`

Is your feature request related to a problem? Please describe.
To speed up numba functions, cache=True can be used to avoid compiling the function each time it is invoked, and nogil=True can be used to release Python's GIL. This can be useful because, when multiprocessing, numba compiles the function in every process.

Describe the solution you'd like
I've been thinking about a solution for a while and I think the best thing to do is to include environment variables for each argument, as suggested here.

Describe alternatives you've considered
Maybe default both arguments to True, but it is too restrictive.
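
One way the environment-variable idea could look is sketched below. The variable names EXAMPLE_NUMBA_CACHE and EXAMPLE_NUMBA_RELEASE_GIL and the helper env_flag are hypothetical, chosen only for illustration; the sketch does not import numba.

```python
import os


def env_flag(name, default=False):
    """Read a boolean flag from the environment, falling back to a default."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


# Set one flag here purely for the demonstration (hypothetical variable names).
os.environ["EXAMPLE_NUMBA_CACHE"] = "1"
os.environ.pop("EXAMPLE_NUMBA_RELEASE_GIL", None)

njit_kwargs = {
    "cache": env_flag("EXAMPLE_NUMBA_CACHE"),
    "nogil": env_flag("EXAMPLE_NUMBA_RELEASE_GIL"),
}
print(njit_kwargs)
```

The resulting dictionary could then be unpacked into the decorator, for example numba.njit(**njit_kwargs), so users opt in without code changes.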

StatsForecast's `compute_forecasts` unintuitive tuples (model, *args)

I have been trying to explore the use of an auto_arima with a filtered hyperparameter space.

StatsForecast's calls to the compute_forecasts method are fairly unintuitive. Leveraging the order of unnamed hyperparameters in a tuple to define different models is just weird.

Would it be better for the hyperparameters' control and visibility to define the model with a partial function with the fixed hyperparameters?
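
To make the suggestion concrete, here is a small sketch using functools.partial. The seasonal_naive function below is a toy stand-in written for this example, not the library's model, and the dictionary of named models is an assumption about how such an interface might look.

```python
from functools import partial


def seasonal_naive(y, h, season_length):
    """Toy stand-in model: repeat the last observed season for h steps."""
    last_season = y[-season_length:]
    return [last_season[i % season_length] for i in range(h)]


# Hyperparameters are fixed by name, so each model's configuration is visible.
models = {
    "seasonal_naive_4": partial(seasonal_naive, season_length=4),
    "naive": partial(seasonal_naive, season_length=1),
}

y = [1, 2, 3, 4, 5, 6, 7, 8]
forecasts = {name: model(y, h=3) for name, model in models.items()}
print(forecasts)
```

Compared with positional tuples such as (model, 12), the keyword makes it clear which hyperparameter each value belongs to.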

Singular matrix error.

The following problem appears,

image

Since we already have the inverse of the Hessian, I think we could use,

sol = np.matmul(res.hess_inv, A) / n_used

Instead of,

hess = np.linalg.inv(res.hess_inv)
sol = np.linalg.solve(hess * n_used, A)

After that change, the problem disappears.

ARIMA model parameters

While AutoARIMA from pmdarima is slow, it shows the model hyperparameters in a convenient manner. How can I get the same for statsforecast AutoARIMA?

A consistent interface with pmdarima (as much as possible) would be appreciated so it can be a drop in replacement.

image

Compute residuals

I'm currently trying to forecast a set of daily time series, and I was wondering whether there is a way to get the predictions on the training data, which are used to compute the residuals (the difference between actuals and predictions on the training set). The StatsForecast class offers no way to do that. I'm mainly interested in obtaining them with the auto_arima approach, but this could also be extended to the remaining approaches.

Is it possible to add a method or attribute to get them?

Thank you

[FEAT] Add n_windows for cross validation tasks

Is your feature request related to a problem? Please describe.
Now, the cross_validation method receives test_size, but it is unintuitive. I would like to have n_windows.

Describe the solution you'd like
Preserve test_size while adding n_windows and step_size. Both parameters should be mutually exclusive.

Describe alternatives you've considered
Since n_windows = test_size - horizon + 1 (assuming step_size=1), the inclusion of the parameter should be easy.

Uninstalling statsforecast does not completely uninstall module?

Describe the bug
Uninstalling statsforecast library does not completely uninstall the library. This causes issues in sktime checks and the following passes without raising an error after the library has been uninstalled.

 _check_soft_dependencies("statsforecast", severity="error", object=self)

To Reproduce

!pip install statsforecast

import statsforecast
statsforecast.__version__

!pip uninstall statsforecast

# This passes without any problem, but should have failed since the library has been uninstalled.
# This is essentially what sktime `_check_soft_dependencies` does
import statsforecast

# This fails, but it should have failed above itself
from statsforecast.arima import AutoARIMA as _AutoARIMA

Expected behavior
After uninstalling, the following should raise an exception (ModuleNotFoundError or ImportError)

Screenshots
image
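
Within a single interpreter session, a module that was already imported stays in sys.modules, so a bare import can succeed after the package has been uninstalled. A check that does not depend on import state is to ask for the installed distribution's metadata. The helper name below is hypothetical; this is a sketch, not what sktime does.

```python
from importlib import metadata


def distribution_installed(name):
    """Return True if a distribution with this name is currently installed."""
    try:
        metadata.version(name)
    except metadata.PackageNotFoundError:
        return False
    return True


# A name that should not exist, to show the negative case.
missing = distribution_installed("surely-not-an-installed-distribution-xyz")
print(missing)
```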

Wrong usage of exogenous variables using parallel processing

n_jobs>1 uses the wrong exogenous variables. In the following example, the time series indexed by 0 has [0,...,143] as exogenous variable and the time series indexed by 1 has [144,...,287] as exogenous variable.

ap_df_2 = pd.DataFrame(
    {'ds': np.hstack([np.arange(ap.size), np.arange(ap.size)]), 
     'y': np.hstack([ap, ap])}, 
    index=pd.Index([0] * ap.size + [1] * ap.size, name='unique_id')
)
ap_df_2['x'] = np.arange(2 * ap.size)
ap_df_2 = ap_df_2.reset_index()
ap_df_2_test = ap_df_2.groupby('unique_id').tail(7)
ap_df_2_train = ap_df_2.drop(ap_df_2_test.index)

In the following image, I print x using n_jobs=1 and the data is correct.

image

But when I print x using n_jobs>1 the issue appears: the same exogenous data for both series,

image

I think the problem is related to the following lines,

for i, grp in enumerate(self):
    if xreg is not None:
        xr = xreg[i*h : (i+1)*h]

The index i does not consider the gas partition.
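
The effect can be reproduced without the library. In this sketch each worker receives a partition of series indices; slicing with the local loop index repeats the first slices in every partition, while slicing with the global series index does not. The names slice_xreg and partitions are hypothetical.

```python
def slice_xreg(xreg, series_index, h):
    """Rows of future exogenous values belonging to one series."""
    return xreg[series_index * h:(series_index + 1) * h]


h = 2
xreg = list(range(8))          # future exogenous values for 4 series, h rows each
partitions = [[0, 1], [2, 3]]  # global series indices assigned to each worker

buggy, fixed = [], []
for part in partitions:
    for local_i, global_i in enumerate(part):
        buggy.append(slice_xreg(xreg, local_i, h))   # restarts at 0 in every partition
        fixed.append(slice_xreg(xreg, global_i, h))  # accounts for the partition offset

print(buggy)
print(fixed)
```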

ValueError: math domain error (difference between python and R)

When I tried to run auto_arima I got a math domain error inside armafn (inside arima) due to a negative s2.

return 0.5 * (math.log(s2) + res[1] / res[2])

R returns NaN when log is used on a negative value, so they don't have this problem.
A possible solution would be:

if s2 < 0:
     return math.nan
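
A slightly more general sketch of the same idea is a logarithm helper that follows R's conventions (NaN for negative input, -Inf for zero) instead of raising. The helper names and the argument names sumlog and nu are assumptions of this sketch, standing in for res[1] and res[2].

```python
import math


def log_like_r(x):
    """Logarithm with R's conventions: NaN for negatives, -inf for zero."""
    if x < 0:
        return math.nan
    if x == 0:
        return -math.inf
    return math.log(x)


def objective(s2, sumlog, nu):
    """Same expression as in the issue, but using the non-raising logarithm."""
    return 0.5 * (log_like_r(s2) + sumlog / nu)


print(objective(-1.0, 0.0, 1))  # no ValueError is raised
print(objective(1.0, 2.0, 1))
```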

Wrong Badge license

Describe the bug

The badge license in the README shows GPLv3, but it should be MIT.

image

Add update feature

Thanks a lot! I look forward to them.

Another valuable feature to add is .update(). Then we don't need to re-train the models with hourly observations until the models get outdated.

Originally posted by @tuttoaposto in #71 (reply in thread)

Add ARIMA class

Is your feature request related to a problem? Please describe.
The library already has the AutoARIMA class but it would be helpful to have the ARIMA class.

Describe the solution you'd like
An ARIMA class.

`ic` key error.

For some series an ic key error occurs for the fit dictionary.
I think the problem arises because of the following lines:

if not math.isnan(fit['aic']):
    fit['bic'] = fit['aic'] + npar * (math.log(nstar) - 2)
    fit['aicc'] = fit['aic'] + 2 * npar * (npar + 1) / (nstar - npar - 1)
    fit['ic'] = fit[ic]
else:
    fit['aic'] = fit['bic'] = fit['aicc'] = math.inf

I haven't checked the R code, but adding fit['ic'] in the else statement worked for me.
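
The workaround described above can be sketched as a standalone function. The wrapper name add_information_criteria and its signature are hypothetical; only the branch logic mirrors the snippet in the issue.

```python
import math


def add_information_criteria(fit, npar, nstar, ic="aicc"):
    """Fill in bic, aicc and ic; every key is set on both branches."""
    if not math.isnan(fit["aic"]):
        fit["bic"] = fit["aic"] + npar * (math.log(nstar) - 2)
        fit["aicc"] = fit["aic"] + 2 * npar * (npar + 1) / (nstar - npar - 1)
        fit["ic"] = fit[ic]
    else:
        # Also set 'ic' here so later lookups cannot raise KeyError.
        fit["aic"] = fit["bic"] = fit["aicc"] = fit["ic"] = math.inf
    return fit


failed = add_information_criteria({"aic": math.nan}, npar=2, nstar=30)
ok = add_information_criteria({"aic": 100.0}, npar=2, nstar=30, ic="aic")
print(failed["ic"], ok["ic"])
```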

Stability of the API

Hi, great work on statsforecasts! This package looks very nice.
I'm considering integrating a couple of the models in Darts (https://github.com/unit8co/darts). I'm wondering about your future plans: do you intend to maintain this package in the long term? How likely are API changes in future releases?

Also as a side note - I took a quick look at the Croston method, and it looks like the method accepts h and future_xreg, which I'm not sure is intended as those are not used.

In general I think slightly more documentation on your different models could be helpful for users :)

[question] Division by Zero error

When running on many groups of time series, some groups are giving me a 'division by zero' error, and the script stops.
Is there a way to pass through this error and complete the forecasts without errors?

exogenous variables on auto_arima

Hi there,

is it possible to add an exogenous variable to auto_arima? I can only see four parameters: y, h, season_length, and approximation.
If yes, would you please show me how?

Thanks

Yassine

[FEAT] Use Dask and Spark clusters

Is your feature request related to a problem? Please describe.
Scale StatsForecast using Dask and Spark clusters.

Describe the solution you'd like
Include fugue as a backend just as we did with Ray.

Fitting AutoARIMA on constant time series causes TypeError

Version: v0.5.5

Description
When fitting an instance of AutoARIMA with default parameters on a constant time series, the Arima function gets called with the keyword argument "fixed", but this argument is not specified in the function.

To Reproduce

from statsforecast.arima import AutoARIMA
import numpy as np

AutoARIMA().fit(np.array([1]*36))

Expected behaviour
AutoARIMA should call the Arima function only with available arguments (see function signature below):

def Arima(
    x,
    order=(0, 0, 0),
    seasonal={'order': (0, 0, 0), 'period': 1},
    xreg=None,
    include_mean=True,
    include_drift=False,
    include_constant=None,
    blambda=None,
    biasadj=False,
    method='CSS',
    model=None,
)

Screenshots
image

Questions about auto_arima()

Discussed in #71

Originally posted by tuttoaposto March 8, 2022

  1. Is it possible to get the bestfit model from the auto_arima_f() step in auto_arima()? It would be nice to get the same level of details as in model.arima_res_.params and model.summary() in pmdarima.
  2. Is it possible to enable setting max_p, max_q , etc in auto_arima()?

Thank you!

ValueError: math domain error

I get ValueError: math domain error, caused by

tmp['bic'] = tmp['aic'] + npar*(math.log(nstar) - 2)

in statsforecast/arima.py, line 1225.

I guess nstar is not > 0.

Division by zero (python and R difference)

When I tried to run auto_arima I got a division by zero error inside armafn (inside arima):

s2 = res[0] / res[2]

R returns Inf when a division by zero happens, so they don't have this problem.
A possible solution would be:

if res[2] == 0.:
     return math.inf
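
For comparison, a division helper that follows the floating-point conventions R uses (x/0 is signed infinity, 0/0 is NaN) could look like the sketch below. The name divide_like_r is hypothetical, and whether returning infinity is the right value for the optimiser is a separate question that this sketch does not settle.

```python
import math


def divide_like_r(num, den):
    """Division that returns inf or NaN instead of raising ZeroDivisionError."""
    if den != 0:
        return num / den
    if num == 0 or math.isnan(num):
        return math.nan
    # The sign follows both operands, as in IEEE 754 arithmetic.
    return math.copysign(math.inf, num) * math.copysign(1.0, den)


print(divide_like_r(1.0, 0.0))
print(divide_like_r(-1.0, 0.0))
print(divide_like_r(0.0, 0.0))
```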

Various bugs in documentation

Describe the bug
There are several bugs in the documentation:

  • Broken links
  • Links not visible
  • Background only partially applied

Below are a few screenshots of the issue.

"Broken link"
image
image

"Background only partially applied"

image

Error when n_series * n_models < n_jobs

When the number of series is less than n_jobs, the following problem arises:

image

I think we could change the following lines,

gas = self.ga.split(self.n_jobs)
with ProcessPoolExecutor(self.n_jobs) as executor:

To,

n_jobs = min(self.n_jobs, len(self.ga) * len(self.models))
gas = self.ga.split(n_jobs) 
with ProcessPoolExecutor(n_jobs) as executor: 
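
The reason the cap helps can be shown with plain lists: splitting a few items across too many workers leaves some workers with empty partitions. The helpers effective_n_jobs and split_evenly are hypothetical stand-ins, not the library's split method.

```python
def effective_n_jobs(requested, n_series, n_models):
    """Never use more workers than there are (series, model) units of work."""
    return max(1, min(requested, n_series * n_models))


def split_evenly(items, n_parts):
    """Split items into n_parts contiguous chunks of near-equal size."""
    size, extra = divmod(len(items), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        parts.append(items[start:end])
        start = end
    return parts


series = ["a", "b", "c"]
empty_parts = [p for p in split_evenly(series, 8) if not p]  # uncapped: 5 empty chunks

n_jobs = effective_n_jobs(8, n_series=len(series), n_models=1)
parts = split_evenly(series, n_jobs)
print(len(empty_parts), n_jobs, parts)
```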

tqdm like expected time

I would like to have an estimated time for the completion of the jobs.
tqdm can monitor the time taken by the code and estimate the time of completion.
The parallelized version is extremely fast, but it would still be handy to monitor its progress.

error when arima_like's gain is 0

I got the following error:
image
I saw that it was related to the following lines within arima_like:

for j in range(d):
    gain += delta[j] * M[r + j]
if gain < 1e4:
    nu += 1
    ssq += resid * resid / gain
    sumlog += math.log(gain)
if use_resid:
     rsResid[l] = resid / math.sqrt(gain)
for i in range(rd):
    a[i] = anew[i] + M[i] * resid / gain
for i in range(rd):
    for j in range(rd):
        P[i + j * rd] = Pnew[i + j * rd] - M[i] * M[j] / gain

Using

gain = M[0]
if gain == 0.:
    gain += 1e-18

Solved the issue.

Typos in arima.ipynb

Several typos in the notebook :

  • 'Would nn autorregresive '
  • 'testing purporses,'
  • 'will let auto_arima to handle'

n_jobs = -1 breaks

Describe the bug
In sklearn, we can pass n_jobs = -1. Here, however, it breaks

To Reproduce
Run

import numpy as np
import pandas as pd

from statsforecast import StatsForecast
from statsforecast.models import seasonal_naive, auto_arima
from statsforecast.utils import AirPassengers

horizon = 12
ap_train = AirPassengers[:-horizon]
ap_test = AirPassengers[-horizon:]

series_train = pd.DataFrame(
    {
        'ds': pd.date_range(start='1949-01-01', periods=ap_train.size, freq='M'),
        'y': ap_train
    },
    index=pd.Index([0] * ap_train.size, name='unique_id')
)

fcst = StatsForecast(
    series_train,
    models=[(auto_arima, 12), (seasonal_naive, 12)],
    freq='M',
    n_jobs=-1
)
forecasts = fcst.forecast(12, level=(80, 95))

Expected behavior
n_jobs could take -1
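
One possible way to support this is to translate -1 into the number of available CPUs before creating the worker pool, since a process pool needs a positive worker count. The helper name resolve_n_jobs is hypothetical, and this sketch only handles -1 and positive integers.

```python
import os


def resolve_n_jobs(n_jobs):
    """Map -1 to the CPU count; pass positive integers through unchanged."""
    if n_jobs == -1:
        # os.cpu_count() can return None, so fall back to a single worker.
        return os.cpu_count() or 1
    if n_jobs < 1:
        raise ValueError("n_jobs must be -1 or a positive integer")
    return n_jobs


print(resolve_n_jobs(4))
print(resolve_n_jobs(-1))
```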

Desktop (please complete the following information):

  • OS: mac
  • Browser: huh
  • Version:
>>> statsforecast.__version__
'0.5.3'

[BUG] `ds` object error

Describe the bug
When ds is an object, the forecast method raises an error, but only at the end of the pipeline, once the forecasts have been computed.

Expected behavior
Check ds type at the beginning.
