mlforecast's Introduction

mlforecast

Machine Learning 🤖 Forecast

Scalable machine learning for time series forecasting

mlforecast is a framework to perform time series forecasting using machine learning models, with the option to scale to massive amounts of data using remote clusters.

Install

PyPI

pip install mlforecast

conda-forge

conda install -c conda-forge mlforecast

For more detailed instructions you can refer to the installation page.

Quick Start

Get Started with this quick guide.

Follow this end-to-end walkthrough for best practices.

Sample notebooks

Why?

Current Python alternatives for machine learning models are slow, inaccurate and don't scale well. So we created a library that can be used to forecast in production environments. MLForecast includes efficient feature engineering to train any machine learning model (with fit and predict methods, such as sklearn's) to fit millions of time series.

Features

  • Fastest implementations of feature engineering for time series forecasting in Python.
  • Out-of-the-box compatibility with pandas, polars, spark, dask, and ray.
  • Probabilistic Forecasting with Conformal Prediction.
  • Support for exogenous variables and static covariates.
  • Familiar sklearn syntax: .fit and .predict.

Missing something? Please open an issue or write to us in Slack.

Examples and Guides

📚 End to End Walkthrough: model training, evaluation and selection for multiple time series.

🔎 Probabilistic Forecasting: use Conformal Prediction to produce prediction intervals.

👩‍🔬 Cross Validation: robust evaluation of model performance.

🔌 Predict Demand Peaks: electricity load forecasting for detecting daily peaks and reducing electric bills.

📈 Transfer Learning: pretrain a model using a set of time series and then predict another one using that pretrained model.

🌑 Distributed Training: use a Dask, Ray or Spark cluster to train models at scale.

How to use

The following provides a very basic overview. For a more detailed description, see the documentation.

Data setup

Store your time series in a pandas dataframe in long format, that is, each row represents an observation for a specific series and timestamp.

from mlforecast.utils import generate_daily_series

series = generate_daily_series(
    n_series=20,
    max_length=100,
    n_static_features=1,
    static_as_categorical=False,
    with_trend=True
)
series.head()
unique_id ds y static_0
0 id_00 2000-01-01 17.519167 72
1 id_00 2000-01-02 87.799695 72
2 id_00 2000-01-03 177.442975 72
3 id_00 2000-01-04 232.704110 72
4 id_00 2000-01-05 317.510474 72
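To make the long format concrete, the library-free sketch below reshapes a small wide table (one list of values per series) into long rows. The `wide_to_long` helper and the sample values are hypothetical and only illustrate the layout; `unique_id`, `ds` and `y` are the column names shown in the output above.

```python
from datetime import date, timedelta


def wide_to_long(wide, start):
    """Turn {series_id: [values...]} into long rows with unique_id, ds and y."""
    rows = []
    for uid, values in wide.items():
        for offset, y in enumerate(values):
            rows.append(
                {"unique_id": uid, "ds": start + timedelta(days=offset), "y": y}
            )
    return rows


# Two series of different lengths; in long format each observation is one row.
wide = {"id_00": [1.0, 2.0, 3.0], "id_01": [10.0, 20.0]}
long_rows = wide_to_long(wide, date(2000, 1, 1))
for row in long_rows:
    print(row["unique_id"], row["ds"].isoformat(), row["y"])
```

Series of different lengths need no padding in this layout, since every row carries its own identifier and timestamp.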

Models

Next define your models. These can be any regressor that follows the scikit-learn API.

import lightgbm as lgb
from sklearn.linear_model import LinearRegression
models = [
    lgb.LGBMRegressor(random_state=0, verbosity=-1),
    LinearRegression(),
]

Forecast object

Now instantiate an MLForecast object with the models and the features that you want to use. The features can be lags, transformations on the lags and date features. You can also define transformations to apply to the target before fitting, which will be restored when predicting.

from mlforecast import MLForecast
from mlforecast.lag_transforms import ExpandingMean, RollingMean
from mlforecast.target_transforms import Differences
fcst = MLForecast(
    models=models,
    freq='D',
    lags=[7, 14],
    lag_transforms={
        1: [ExpandingMean()],
        7: [RollingMean(window_size=28)]
    },
    date_features=['dayofweek'],
    target_transforms=[Differences([1])],
)
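As a rough illustration of what lag features and lag transformations compute, here is a library-free sketch on a single short series. The helpers `lag`, `expanding_mean` and `rolling_mean` are hypothetical stand-ins written for this example, not mlforecast's implementations, and details such as how many samples are required before a value is produced may differ in the library.

```python
def lag(values, k):
    """Value observed k steps earlier; None where there is no history yet."""
    return ([None] * k + list(values))[: len(values)]


def expanding_mean(values):
    """Mean of all non-missing values seen so far."""
    out, total, count = [], 0.0, 0
    for v in values:
        if v is None:
            out.append(None)
            continue
        total += v
        count += 1
        out.append(total / count)
    return out


def rolling_mean(values, window_size):
    """Mean over the last window_size values; None until the window is full."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - window_size + 1) : i + 1]
        if len(window) < window_size or any(v is None for v in window):
            out.append(None)
        else:
            out.append(sum(window) / window_size)
    return out


y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
lag1 = lag(y, 1)
expanding_mean_lag1 = expanding_mean(lag1)
rolling_mean_lag1_window_size2 = rolling_mean(lag1, 2)
print(lag1)
print(expanding_mean_lag1)
print(rolling_mean_lag1_window_size2)
```

Because every transformation is applied to a lagged copy of the target, the feature at time t only uses values observed before t.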

Training

To compute the features and train the models call fit on your Forecast object.

fcst.fit(series)
MLForecast(models=[LGBMRegressor, LinearRegression], freq=D, lag_features=['lag7', 'lag14', 'expanding_mean_lag1', 'rolling_mean_lag7_window_size28'], date_features=['dayofweek'], num_threads=1)

Predicting

To get the forecasts for the next n days call predict(n) on the forecast object. This will automatically handle the updates required by the features using a recursive strategy.

predictions = fcst.predict(14)
predictions
unique_id ds LGBMRegressor LinearRegression
0 id_00 2000-04-04 299.923771 311.432371
1 id_00 2000-04-05 365.424147 379.466214
2 id_00 2000-04-06 432.562441 460.234028
3 id_00 2000-04-07 495.628000 524.278924
4 id_00 2000-04-08 60.786223 79.828767
... ... ... ... ...
275 id_19 2000-03-23 36.266780 28.333215
276 id_19 2000-03-24 44.370984 33.368228
277 id_19 2000-03-25 50.746222 38.613001
278 id_19 2000-03-26 58.906524 43.447398
279 id_19 2000-03-27 63.073949 48.666783

280 rows × 4 columns
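The recursive strategy mentioned above can be sketched without any library: each prediction is appended to the history so that the lag features of the next step can be computed from it. The `recursive_forecast` function and the toy `mean_model` are assumptions made for illustration and do not reflect mlforecast's internals.

```python
def recursive_forecast(history, horizon, predict_one, lags):
    """Forecast `horizon` steps, feeding each prediction back into the history."""
    values = list(history)
    preds = []
    for _ in range(horizon):
        # Lag features are built from the most recent values, predictions included.
        features = [values[-k] for k in lags]
        y_hat = predict_one(features)
        preds.append(y_hat)
        values.append(y_hat)
    return preds


def mean_model(features):
    """Toy one-step model: the average of its lag features."""
    return sum(features) / len(features)


history = [1.0, 2.0, 3.0, 4.0]
preds = recursive_forecast(history, horizon=3, predict_one=mean_model, lags=[1, 2])
print(preds)  # prints [3.5, 3.75, 3.625]
```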

Visualize results

from utilsforecast.plotting import plot_series
fig = plot_series(series, predictions, max_ids=4, plot_random=False)

How to contribute

See CONTRIBUTING.md.

mlforecast's People

Contributors

adriaanvh1, azulgarza, dependabot[bot], hahnbeelee, jmoralez, jose-moralez, marcogorelli, mergenthaler, naren8520, rpmccarter, tblume1992, tracykteal, uumami

mlforecast's Issues

lightgbmCV with MAE error

Currently mlforecast raises an error if you select a metric other than MAPE or RMSE, saying that it is not implemented yet.

[distributed] Make all categoricals known in a single pass

Summary

Use dask.dataframe.categorize to make all categorical columns known with a single pass over the data.

Motivation

When reading from parquet in distributed mode each categorical column is made known individually, making as many passes over the data as there are categorical columns.

Description

for col in data.select_dtypes(include='category'):
    data[col] = data[col].cat.as_known()

Should be changed to something like:

categorical_columns = data.select_dtypes(include='category').columns
data = data.categorize(columns=categorical_columns)

References

https://docs.dask.org/en/latest/dataframe-categoricals.html

Adding Lags to dynamic features

Summary

  • Add lags to dynamic regressors

Motivation

  • I wish to include historical values of dynamic variables

Description

  • currently I have to define a fake forecast function, then preprocess, then change the names, then make a model forecast function

Example

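from mlforecast import MLForecast


def add_lags(df, lags, var, name_conv):
    # Model-less MLForecast object used only to compute lags 1..lags of `var`.
    fcst_data = MLForecast(
        models=[],
        freq='W',
        lags=[i + 1 for i in range(lags)]
    )
    new_full_df = fcst_data.preprocess(data=df, id_col='unique_id', time_col='ds', target_col=var)
    # Columns added by preprocess (the generated lags).
    new_names = list(set(new_full_df.columns) - set(df.columns))
    # Prefix the generated columns so they can be told apart from the target's lags.
    return new_full_df.rename(columns={name: name_conv + name for name in new_names})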
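For comparison, the same idea can be written without the library. The sketch below adds lagged copies of a dynamic column to each series separately; `add_feature_lags`, the `price` column and the sample rows are hypothetical and only show the intended result.

```python
from collections import defaultdict


def add_feature_lags(rows, column, lags, id_col="unique_id", time_col="ds"):
    """Return copies of `rows` with `<column>_lag<k>` added within each series."""
    by_id = defaultdict(list)
    for row in sorted(rows, key=lambda r: (r[id_col], r[time_col])):
        by_id[row[id_col]].append(dict(row))
    out = []
    for series in by_id.values():
        for i, row in enumerate(series):
            for k in lags:
                # Lags never look across series boundaries.
                row[f"{column}_lag{k}"] = series[i - k][column] if i >= k else None
            out.append(row)
    return out


rows = [
    {"unique_id": "a", "ds": 1, "price": 10.0},
    {"unique_id": "a", "ds": 2, "price": 11.0},
    {"unique_id": "b", "ds": 1, "price": 5.0},
    {"unique_id": "a", "ds": 3, "price": 12.0},
]
lagged = add_feature_lags(rows, "price", lags=[1, 2])
for row in lagged:
    print(row)
```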

No regression to a dynamic exogenous feature in an autoregressive model with no lags

Description

MLForecast outputs a constant prediction at every step when the auto-regressive forecasting model is given a dynamic exogenous feature and no lag features. This is a common scenario in PV power output forecasting, where the input features are typically weather forecasts and the target is the power generated by a solar panel.

Reproducible example

import pandas as pd
from mlforecast.utils import generate_daily_series, generate_prices_for_series
from sklearn.linear_model import LinearRegression
from mlforecast import MLForecast

series = generate_daily_series(100, equal_ends=True, n_static_features=2, static_as_categorical=False)
dynamic_series = series.rename(columns={'static_1': 'product_id'})
prices_catalog = generate_prices_for_series(dynamic_series)
series_with_prices = dynamic_series.merge(prices_catalog, how='left')
# drop static features
series_with_prices = series_with_prices.drop(columns=['static_0', 'product_id'])

models = [
    LinearRegression(),
]
fcst = MLForecast(
    models,
    freq='D',
)
fcst.fit(series_with_prices)

preds = fcst.predict(7, dynamic_dfs=[prices_catalog])
print(preds)

Environment info

Install method: pip
'0.7.0'

Additional information

rename `backtest` to `cross_validation` and have it return a single dataframe

Summary

Currently the backtest method returns a generator for as many items as there are windows, this should be changed to cross_validation and return a single dataframe containing the results for all windows.

Motivation

Allows this library to be more in sync with statsforecast.

Description

Rename the method and collect all results before returning. This should also include a column indicating the last training date used in each window.
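A minimal sketch of the proposed behaviour, assuming a single series stored as a plain list: the generator yields one window at a time, and the wrapper collects every window into one table with a `cutoff` column holding the index of the last training observation. All names here (`backtest_windows`, `cross_validation`, `naive`) are hypothetical and not the library's code.

```python
def backtest_windows(values, n_windows, window_size):
    """Yield (n_train, train, valid) for each window, oldest window first."""
    n = len(values)
    for w in range(n_windows, 0, -1):
        n_train = n - w * window_size
        yield n_train, values[:n_train], values[n_train : n_train + window_size]


def cross_validation(values, n_windows, window_size, forecast):
    """Collect all windows into one list of rows, tagging each with its cutoff."""
    rows = []
    for n_train, train, valid in backtest_windows(values, n_windows, window_size):
        preds = forecast(train, len(valid))
        for step, (y, y_hat) in enumerate(zip(valid, preds)):
            rows.append(
                {"ds": n_train + step, "cutoff": n_train - 1, "y": y, "y_pred": y_hat}
            )
    return rows


def naive(train, h):
    """Toy forecaster: repeat the last observed value."""
    return [train[-1]] * h


values = [float(v) for v in range(10)]
results = cross_validation(values, n_windows=2, window_size=3, forecast=naive)
for row in results:
    print(row)
```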

References

https://nixtla.github.io/statsforecast/core.html#statsforecast.cross_validation

[Core, Discussion] Make arguments compatible with the nixtlaverse

Description

Certain methods within MLForecast accept similar arguments to other components within the nixtlaverse (StatsForecast and NeuralForecast), but these arguments are named differently. We aim to standardize these argument names for better consistency across the package. Specifically:

  • In the fit method, the argument data should be renamed to df.
  • In the predict method, the argument horizon should be renamed to h, new_data should be changed to new_df, and the naming of dynamic_dfs needs discussion (it's currently referred to as X_df in StatsForecast and futr_df in NeuralForecast).
  • In the cross_validation method, data should be renamed to df, and window_size should be changed to h.
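One way such a rename could be introduced without breaking existing calls is a decorator that still accepts the old keyword names and forwards them under the new ones. The sketch below is a generic pattern under that assumption; `renamed_kwargs` and the toy `predict` function are hypothetical and are not taken from the library.

```python
import functools
import warnings


def renamed_kwargs(**old_to_new):
    """Accept deprecated keyword names and forward them under their new names."""

    def decorator(func):
        @functools.wraps(func)
        def inner(*args, **kwargs):
            for old, new in old_to_new.items():
                if old in kwargs:
                    if new in kwargs:
                        raise TypeError(f"pass either {old!r} or {new!r}, not both")
                    warnings.warn(
                        f"{old!r} is deprecated, use {new!r}",
                        FutureWarning,
                        stacklevel=2,
                    )
                    kwargs[new] = kwargs.pop(old)
            return func(*args, **kwargs)

        return inner

    return decorator


@renamed_kwargs(horizon="h", new_data="new_df")
def predict(h, new_df=None):
    return {"h": h, "new_df": new_df}


with warnings.catch_warnings():
    warnings.simplefilter("ignore", FutureWarning)
    old_style = predict(horizon=7)  # old name still works, with a warning
new_style = predict(h=7)
print(old_style == new_style)  # prints True
```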

Use case

No response

[Core] Add interpretability capabilities through SHAP

Description

To enhance the interpretability of models trained using MLForecast, we propose leveraging SHAP (SHapley Additive exPlanations). SHAP is compatible with XGBoost, LightGBM, and scikit-learn models. Currently, if we want to use it, we need to create the dataset for which we desire forecast explanations (using preprocess) and iterate over each trained model using the following:

explainer = shap.Explainer(model)
shap_values = explainer(X)

The goal is to introduce a method, possibly named shap_values, to generate SHAP values for the forecasts from all trained models.

Use case

No response

MLForecast pipeline : Feature extraction/creation + Feature selection + Hyperparameter tuning

Hi,

It would be nice if it was possible to configure the following pipeline :

  • Create from a target (or other data) a bunch of features (from domain knowledge or TS analysis or "usual" used features).
  • Select ONLY the best features (the number of features should be decided by a model, for example LASSO, and not by the user).
  • Tuning one or multiple ML models
  • Feature importance
  • I don't know much about this part, but getting some insights or explainability...

I hope the request is clear.

Lag interaction features

Is there any way to pass lag interaction features like (lag_1 - lag_2) or (lag_1 / lag_2) directly to MLForecast()?
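To pin down what is being asked for, the sketch below computes the two interactions from a plain list of target values. It says nothing about whether MLForecast accepts such features directly; `lag_interactions` is a hypothetical helper, and one possible workaround under that assumption would be to precompute columns like these before fitting.

```python
def lag_interactions(values, a, b):
    """Per-step difference and ratio between lag `a` and lag `b` of `values`."""
    diff, ratio = [], []
    for t in range(len(values)):
        if t < max(a, b):
            # Not enough history for both lags yet.
            diff.append(None)
            ratio.append(None)
            continue
        lag_a, lag_b = values[t - a], values[t - b]
        diff.append(lag_a - lag_b)
        ratio.append(lag_a / lag_b if lag_b != 0 else None)
    return diff, ratio


y = [2.0, 4.0, 8.0, 16.0]
diff, ratio = lag_interactions(y, 1, 2)
print(diff)   # prints [None, None, 2.0, 4.0]
print(ratio)  # prints [None, None, 2.0, 2.0]
```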

Remove `TimeSeries` from `Forecast` constructor

Summary

Make Forecast the only required import and initialize it with the same arguments that TimeSeries currently takes.

Motivation

Makes the interface simpler and more like statsforecast.

Description

The basic workflow would instead be:

fcst = Forecast(
    models=[lgb.LGBMRegressor(), xgb.XGBRegressor()],
    lags=[7, 14],
    lag_transforms={
        1: [expanding_mean],
        7: [(rolling_mean, 7), (rolling_mean, 14)]
    },
    date_features=['dayofweek', 'month']
)
fcst.fit(data)
fcst.predict(10)

[Core] Add parameter to use the ID as a feature

Description

Currently you have to create a copy of the column and pass it in like:

import lightgbm as lgb
from mlforecast import MLForecast
from mlforecast.utils import generate_daily_series


series = generate_daily_series(100, equal_ends=True, n_static_features=2, static_as_categorical=True)

series['my_id_col'] = series['unique_id'].copy()

mlf = MLForecast(
    models=lgb.LGBMRegressor(),
    freq='D',
    lags=[7],
)
mlf.fit(series, static_features=['my_id_col', 'static_0', 'static_1'])

preds = mlf.predict(12)

If this could be a boolean parameter passed to fit or object creation that would make it a smoother user experience.

Use case

Using the ID as a feature can increase performance in many scenarios, or at least alleviate the need for as many window functions.

Request for Implementation of fill_gaps Function to Handle Time Series with Gaps

Description

Hi

First of all, I would like to congratulate everyone at Nixtla for the implementations, your ecosystem is really good.

I hope this message finds you well. I am reaching out to request the implementation of the fill_gaps function, similar to the one available in the R tsibble library, which allows us to handle time series that have gaps in the dates, as well as different start and end dates between time series.

The fill_gaps function is an extremely useful tool for dealing with time series, especially in situations where data is collected non-uniformly, resulting in implicit missing values in the series. It offers the flexibility to fill these gaps with explicit values, making time series analysis more accurate and efficient.

Follow the link to the help page: https://tsibble.tidyverts.org/reference/fill_gaps.html

The library timetk has a similar tool, called pad_by_time() : https://search.r-project.org/CRAN/refmans/timetk/html/pad_by_time.html

Use case

Key Features and Importance of the fill_gaps function:

Gap Filling: The function allows us to fill gaps in time series with default NA values, helping to maintain the data structure and ensuring that no values are overlooked during analysis.

Support for Multiple Time Series (panel data): The function can handle multiple time series, where each series may have its own start and end dates, as well as possible gaps in observations.

Flexibility of Filling: It enables users to specify custom values to replace implicit missing values, giving the freedom to choose the most suitable method for filling.

Handling Unbalanced Panel Data: When time series have different time periods, the function can appropriately fill the gaps, ensuring that each series is treated independently, even if their observations are not aligned.
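The requested behaviour can be sketched in a few lines of plain Python, assuming daily data stored as (unique_id, ds, y) tuples. This `fill_gaps` is a hypothetical illustration of the semantics described above (each series is filled between its own start and end, with a configurable fill value), not an existing function of this library.

```python
from datetime import date, timedelta


def fill_gaps(rows, step=timedelta(days=1), fill_value=None):
    """Add explicit rows for missing dates between each series' own start and end."""
    by_id = {}
    for uid, ds, y in rows:
        by_id.setdefault(uid, {})[ds] = y
    filled = []
    for uid in sorted(by_id):
        observed = by_id[uid]
        current, end = min(observed), max(observed)
        while current <= end:
            filled.append((uid, current, observed.get(current, fill_value)))
            current += step
    return filled


rows = [
    ("a", date(2024, 1, 1), 1.0),
    ("a", date(2024, 1, 4), 4.0),  # 2024-01-02 and 2024-01-03 are missing
    ("b", date(2024, 1, 2), 2.0),
    ("b", date(2024, 1, 3), 3.0),
]
filled = fill_gaps(rows)
for row in filled:
    print(row[0], row[1].isoformat(), row[2])
```

Passing `fill_value=0.0` instead of the default would replace the implicit gaps with zeros rather than missing values.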

Size of cross-validation windows used to calibrate the intervals

Hello, can you please explain why the size of the window used in cross-validation for conformal interval calibration depends on the length of the prediction horizon? For example, why can't I have a train length of 1000, n_windows = 50, window_size = 50, and then predict one point ahead (test length 1, or horizon = 1)?
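For context, here is a generic split-conformal sketch, assuming calibration scores are collected separately for each step of the horizon. Under that assumption each cross-validation window has to cover at least as many steps as the forecast horizon, because step k of the forecast is calibrated with the errors observed at step k of every window. This is not a statement of how mlforecast computes its intervals; `conformal_half_widths` and the error values are hypothetical.

```python
import math


def conformal_half_widths(cv_errors, level):
    """cv_errors[w][step] is the absolute error of window w at horizon step `step`."""
    n_windows = len(cv_errors)
    horizon = len(cv_errors[0])
    # Rank of the empirical quantile, capped at the largest available score.
    rank = min(n_windows, math.ceil((n_windows + 1) * level / 100))
    widths = []
    for step in range(horizon):
        scores = sorted(errors[step] for errors in cv_errors)
        widths.append(scores[rank - 1])
    return widths


# Four calibration windows, each three steps long.
cv_errors = [
    [1.0, 2.0, 4.0],
    [0.5, 1.5, 3.0],
    [2.0, 2.5, 5.0],
    [1.5, 3.0, 6.0],
]
widths = conformal_half_widths(cv_errors, level=50)
print(widths)  # prints [1.5, 2.5, 5.0]
```

The interval for step k is then the point forecast plus or minus `widths[k]`.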

[Forecast] Implement cross_validation fitted values

Description

Add a fitted argument to MLForecast.cross_validation to compute the in-sample predictions for each fold, and an MLForecast.cross_validation_fitted_values method to retrieve them.

Use case

Analyzing error on training set for each fold in CV, originally requested here

Method predict uses new_data's first row as model input but associates result date with new_data's last timestamp + 1*freq

Description

Method predict uses new_data's first row as model input but associates the result date with new_data's last timestamp + 1*freq.
The code example shows that the first sample is used as X, but the result is associated with the last available timestamp + 1*freq.

Reproducible example

import lightgbm as lgb
from sklearn.ensemble import RandomForestRegressor

models = [
    lgb.LGBMRegressor(),
    RandomForestRegressor(random_state=0),
]

from mlforecast import MLForecast

fcst = MLForecast(
    models=models,
    freq=f'3600s',
)


# train_df definition

train_df[0:2]
# HUFL	HULL	MUFL	MULL	LUFL	LULL	OT	id_col	date
# 0	5.827	2.009	1.599	0.462	4.203	1.340	30.531000	1	2016-07-01 00:00:00
# 1	5.693	2.076	1.492	0.426	4.142	1.371	27.787001	1	2016-07-01 01:00:00


def inspect_input(x):
    from IPython.display import display
    print('inspect_input')
    display(x)
    return x

preds = fcst.predict(horizon=2, new_data=train_df[0:2], before_predict_callback=inspect_input)
# inspect_input
# id_col	HUFL	HULL	MUFL	MULL	LUFL	LULL
# 1	5.827	2.009	1.599	0.462	4.203	1.34

print(preds)
# 	id_col	date	LGBMRegressor	RandomForestRegressor
# 0	1	2016-07-01 02:00:00	30.530473	30.530290
# 1	1	2016-07-01 03:00:00	29.476429	28.449670

Error message
# Stacktrace

Environment info

Install method (pip, conda, github):
pip
Package version:

0.6.0

Additional information

Access to fit params of the underlying model

Summary

I would like a parameter to be added to the MLForecast.fit() function that specifies the fit params for the underlying ML model. This parameter can be called something like "model_fit_params" and should be a set of arguments that are passed to the underlying model.fit() function.

Motivation

I would like to have access to the sample weights of my LightGBM model, which is a parameter of the fit function and therefore currently cannot be passed to the MLForecast object.
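A minimal sketch of what such a parameter could look like, assuming a dictionary keyed by model name whose values are forwarded as keyword arguments to each model's fit. The `model_fit_params` name comes from the request above; `fit_models` and the toy `WeightedMeanModel` are hypothetical.

```python
class WeightedMeanModel:
    """Toy regressor whose fit accepts sample_weight, like many real estimators."""

    def fit(self, X, y, sample_weight=None):
        weights = sample_weight if sample_weight is not None else [1.0] * len(y)
        self.mean_ = sum(w * v for w, v in zip(weights, y)) / sum(weights)
        return self

    def predict(self, X):
        return [self.mean_] * len(X)


def fit_models(models, X, y, model_fit_params=None):
    """Fit every model, forwarding per-model keyword arguments to its fit()."""
    model_fit_params = model_fit_params or {}
    return {
        name: model.fit(X, y, **model_fit_params.get(name, {}))
        for name, model in models.items()
    }


X = [[0], [1], [2]]
y = [1.0, 2.0, 6.0]
fitted = fit_models(
    {"plain": WeightedMeanModel(), "weighted": WeightedMeanModel()},
    X,
    y,
    model_fit_params={"weighted": {"sample_weight": [0.0, 0.0, 1.0]}},
)
print(fitted["plain"].mean_, fitted["weighted"].mean_)  # prints 3.0 6.0
```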

Cross-validation produces no output

I have a dataset with multiple static and dynamic exogenous features and am using MLForecast.cross_validation, but get no output: the resulting data frame is empty. Curiously, the predict function works fine.
I have checked that cross_validation works with the same forecast object and the sample data from the Cross-validation example. What could be the issue? Sorry, I can't share the data to provide a reproducible example.

[Forecast] support predicting only a subset of the training series

Description

Right now MLForecast.predict generates predictions for all series that were seen during training. It would be useful to have it compute forecasts for only a subset of the series.

Use case

This can be desirable in a serving setting where forecasts are requested but only for a single or a few series. Also interactive use cases as described in #165 (reply in thread)

mlforecast for multivariate time series analysis

Hello,

I want to use "mlforecast" library for my Multivariate Time Series problem and I want to know how could I add new features, like holidays or temperature, to the dataset besides 'lags' and 'date_features'.
Below is flow configuration:

fcst = Forecast(
    models=model,
    freq='W-MON',
    lags=[1,2,3,4,5,6,7,8],
    date_features=['month', 'week']
)

Is there a way to add exogenous variables to the training process? I could not find relevant information to be able to do this.

Thank you!

Accept list of models in forecast

Summary

Make it possible to accept a list of models when using the forecast object.

Motivation

This would allow to re-use the processed series and compute predictions more efficiently. Also allows this library to be more in sync with statsforecast.

Description

Something along the lines of Forecast([Model1(), Model2()], ts) where the predict method will return a dataframe with as many columns as models.

References

https://nixtla.github.io/statsforecast/core.html#statsforecast models argument.

dynamics_df in predict method doesn't work as expected

What happened + What you expected to happen

I think that when an exogenous variable (X) is present, the predict method simply uses the first row of X (the first observation) instead of the last row (the latest observation). Hence
preds = fcst.predict(7, dynamic_dfs=[X]) returns the same values as
preds = fcst.predict(7), where X is not provided.

Versions / Dependencies

  • Python == 3.10.12
  • mlforecast == 0.7.4

Reproduction script

from mlforecast.utils import generate_daily_series
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from mlforecast import MLForecast
import numpy as np 
series = generate_daily_series(
    n_series=1,
    max_length=100,
    n_static_features=1,
    static_as_categorical=False,
    with_trend=True
)
series['important_col'] = series['y'] + np.random.normal(0, 1, len(series))
df_train, df_test = train_test_split(series, test_size=12, shuffle=False)

fcst = MLForecast(
    models=[LinearRegression(), ], 
    freq='M',
    lags=[1, 2, 3],
)
fcst.fit(df_train)
yhat = fcst.predict(12, dynamic_dfs = [df_test])['LinearRegression'].values
yhat_without_X = fcst.predict(12, )['LinearRegression'].values

np.testing.assert_allclose(yhat, yhat_without_X)

Issue Severity

High: It blocks me from completing my task.

exogenous variables

Description

Thank you for your labors.
Please, if it is possible, add to this package the possibility of supplying not only lag / sliding variables, but also exogenous variables

Use case

No response

Bug: When using Forecast.backtest on a series with freq='W', y_pred contains null values

Code to reproduce:

import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
from mlforecast.core import TimeSeries
from mlforecast.forecast import Forecast

#Generate weekly data
#https://towardsdatascience.com/forecasting-with-machine-learning-models-95a6b6579090

rng = np.random.RandomState(90)
serie_length = 52 * 4  #4 years' weekly data
dates = pd.date_range('2000-01-01', freq='W', periods=serie_length, name='ds')
y = dates.dayofweek + rng.randint(-1, 2, size=dates.size)
data = pd.DataFrame({'y': y.astype(np.float64)}, index=dates)
#data.plot(marker='.', figsize=(20, 6));

train_mlfcst = data.reset_index()[['ds', 'y']]
train_mlfcst.index = pd.Index(np.repeat(0, data.shape[0]), name='unique_id')

backtest_fcst = Forecast(
    LinearRegression(fit_intercept=False), TimeSeries(lags=[4, 8])
)
backtest_results = backtest_fcst.backtest(train_mlfcst, n_windows=2, window_size=52)

result1 = next(backtest_results)
result1

	ds	y	y_pred
unique_id			
0	2001-12-30	6.0	5.105716
0	2002-01-06	5.0	5.026820
0	2002-01-13	7.0	4.640784
0	2002-01-20	5.0	6.145316
0	2002-01-27	6.0	4.746834
0	2002-02-03	6.0	4.635672
0	2002-02-10	7.0	4.271653
0	2002-02-17	7.0	NaN
0	2002-02-24	7.0	NaN
0	2002-03-03	5.0	NaN
0	2002-03-10	5.0	NaN
0	2002-03-17	7.0	NaN
0	2002-03-24	7.0	NaN
0	2002-03-31	5.0	NaN
0	2002-04-07	7.0	NaN
0	2002-04-14	5.0	NaN
0	2002-04-21	6.0	NaN
0	2002-04-28	5.0	NaN
0	2002-05-05	7.0	NaN
0	2002-05-12	7.0	NaN
0	2002-05-19	5.0	NaN
0	2002-05-26	6.0	NaN
0	2002-06-02	5.0	NaN
0	2002-06-09	6.0	NaN
0	2002-06-16	5.0	NaN
0	2002-06-23	6.0	NaN
0	2002-06-30	6.0	NaN
0	2002-07-07	6.0	NaN
0	2002-07-14	7.0	NaN
0	2002-07-21	5.0	NaN
0	2002-07-28	6.0	NaN
0	2002-08-04	6.0	NaN
0	2002-08-11	5.0	NaN
0	2002-08-18	7.0	NaN
0	2002-08-25	7.0	NaN
0	2002-09-01	6.0	NaN
0	2002-09-08	5.0	NaN
0	2002-09-15	6.0	NaN
0	2002-09-22	5.0	NaN
0	2002-09-29	5.0	NaN
0	2002-10-06	6.0	NaN
0	2002-10-13	5.0	NaN
0	2002-10-20	6.0	NaN
0	2002-10-27	5.0	NaN
0	2002-11-03	6.0	NaN
0	2002-11-10	5.0	NaN
0	2002-11-17	7.0	NaN
0	2002-11-24	7.0	NaN
0	2002-12-01	6.0	NaN
0	2002-12-08	5.0	NaN
0	2002-12-15	5.0	NaN
0	2002-12-22	6.0	NaN

[DOCS] Add guide/recipe for target transforms

Summary

We would like to have a guide on how to use target transforms and also how to define custom target transforms (for example BoxCox).

Motivation

Currently, documentation for target transformations is only available in the end-to-end walkthrough tutorial.

References

Here's an example.

Only being able to pass one transformation per lag

It seems we can only pass one transformation per lag (only the last one is accepted). Am I missing something, or is this part of the design or a bug?

series = generate_daily_series(
    n_series=20,
    max_length=100,
    n_static_features=1,
    static_as_categorical=False,
    with_trend=True
)
models = [lgb.LGBMRegressor()]
fcst = MLForecast(
    models=models,
    freq='D',
    lags=[1, 7],
    lag_transforms={
        1: [(rolling_mean, 2)],
        1: [(rolling_mean, 4)],
        1: [(rolling_mean, 6)],
        7: [(rolling_mean, 28)]
    },
    date_features=['dayofweek'],
    differences=[1],
)
fcst

MLForecast(models=[LGBMRegressor], freq=<Day>, lag_features=['lag1', 'lag7', 'rolling_mean_lag1_window_size6', 'rolling_mean_lag7_window_size28'], date_features=['dayofweek'], num_threads=1)
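The output above is consistent with how Python evaluates dict literals: when a key is repeated, only the last value for that key is kept, so the dictionary never contains the first two entries for lag 1. The sketch below reproduces this with plain tuples (strings stand in for the `rolling_mean` function so the example runs on its own) and shows the alternative of listing several transformations under one key.

```python
# A dict literal with a repeated key keeps only the last value for that key.
duplicated = {
    1: [("rolling_mean", 2)],
    1: [("rolling_mean", 4)],
    1: [("rolling_mean", 6)],
    7: [("rolling_mean", 28)],
}
print(duplicated[1])  # prints [('rolling_mean', 6)]

# Listing every transformation under a single key keeps all of them.
combined = {
    1: [("rolling_mean", 2), ("rolling_mean", 4), ("rolling_mean", 6)],
    7: [("rolling_mean", 28)],
}
print(len(combined[1]))  # prints 3
```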

Unable to import Forecast from mlforecast

Description

Unable to import Forecast from mlforecast

Reproducible example

# code goes here
from mlforecast import Forecast
ImportError: cannot import name 'Forecast' from 'mlforecast' (/home//mambaforge/envs/dev/lib/python3.7/site-packages/mlforecast/__init__.py)
# Stacktrace

Environment info

python=3.7
pip installation of mlforecast

Package version:
mlforecast=0.2.0

Additional information

in sample forecast

Description

Not sure how to make in sample predictions using the current library.
To use it together with the hierarchicalforecast library and reconcile, y_fitted_df is required, which contains the model's predictions for the training data (unique_id, ds, y and the model predictions are all the columns needed).

Reproducible example

In the current setup there are fit and predict, where predict only needs to receive the horizon as input.
I wonder how to make predictions for the training dataset.

# code goes here

https://nixtla.github.io/hierarchicalforecast/examples/australianprisonpopulation.html

# Stacktrace

Environment info

Install method (pip, conda, github):

Package version:

Additional information

raise error when series is too short for backtest

Description

When doing backtesting if a series is empty in one of the windows all its predictions are null. This should raise an error at the start to warn the user instead.

Reproducible example

import numpy as np
import pandas as pd
from mlforecast.core import TimeSeries
from mlforecast.forecast import Forecast
from mlforecast.utils import generate_daily_series

class DummyModel:
    def fit(*args):
        ...
    def predict(self, X):
        return np.full(X.shape[0], 1)

series = generate_daily_series(10, min_length=10, max_length=50)
ts = TimeSeries(lags=[1])
model = DummyModel()
fcst = Forecast(model, ts)
res = list(fcst.backtest(series, n_windows=1, window_size=30))[0]
pd.concat(
    [
        res.groupby('unique_id').size().lt(30).rename('size_lt_window'),
        res['y_pred'].isnull().groupby(res.index).all().rename('all_predictions_nulls'),
    ],
    axis=1,
)
unique_id size_lt_window all_predictions_nulls
id_0 1 1
id_1 1 1
id_2 1 1
id_3 0 0
id_4 1 1
id_5 1 1
id_6 0 0
id_7 0 0
id_8 0 0
id_9 1 1
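A sketch of the kind of up-front check described above, assuming the only requirement is that a series has at least one training observation before the earliest window. The exact threshold would also depend on the lags in use, so the `required` formula, the `check_backtest_lengths` helper and the sample lengths are all assumptions for illustration.

```python
def check_backtest_lengths(lengths, n_windows, window_size):
    """Raise early if any series is too short to appear in every window."""
    # Assumed minimum: all validation windows plus one training observation.
    required = n_windows * window_size + 1
    too_short = sorted(uid for uid, n in lengths.items() if n < required)
    if too_short:
        raise ValueError(
            f"series {too_short} have fewer than {required} observations, "
            f"needed for n_windows={n_windows} and window_size={window_size}"
        )


# Hypothetical number of observations per series.
lengths = {"id_0": 12, "id_3": 45, "id_6": 31}
try:
    check_backtest_lengths(lengths, n_windows=1, window_size=30)
    message = ""
except ValueError as exc:
    message = str(exc)
print(message)
```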

[Core] Parameter for scaling/normalizing

Description

Currently, scalers or transformers such as Box-Cox or a standard scaler need to be supplied by the user, but we may want to enforce a pattern around when the transformation occurs. For example, transforming after differencing will place the values in different magnitudes than transforming first. It would also be nice if the transforms were numba based!
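The effect of the ordering can be seen on a tiny example using only the standard library. The helpers `difference` and `standardize` are hypothetical and written for this illustration; they are not the library's transforms.

```python
from statistics import mean, pstdev


def difference(values):
    """First differences: y[t] - y[t-1]."""
    return [b - a for a, b in zip(values, values[1:])]


def standardize(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


y = [10.0, 12.0, 15.0, 19.0, 24.0]

scale_then_diff = difference(standardize(y))
diff_then_scale = standardize(difference(y))

print([round(v, 3) for v in scale_then_diff])
print([round(v, 3) for v in diff_then_scale])
```

Scaling first and differencing afterwards leaves a series that is all positive for this trending input, while differencing first and scaling afterwards yields values centred on zero, so a model would see targets of different magnitudes depending on the order.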

Use case

It would be great if there were a list of featured scalers we could pass as a string to the fit method or class creation such as:

import lightgbm as lgb
from mlforecast import MLForecast
from mlforecast.utils import generate_daily_series


series = generate_daily_series(100, equal_ends=True, n_static_features=2, static_as_categorical=True)

series['my_id_col'] = series['unique_id'].copy()

mlf = MLForecast(
    models=lgb.LGBMRegressor(),
    freq='D',
    lags=[7],
    scaler='standard' #new parameter
)
mlf.fit(series)

preds = mlf.predict(12)

[Core] Add support for polars

Description

Our codebase primarily relies on pandas for data handling and manipulation tasks. However, we have identified potential performance improvements that could be gained by incorporating support for Polars, a fast DataFrame library implemented in Rust and available in Python.

Polars is designed to outperform pandas in various scenarios and could provide significant speed-ups for our data processing tasks. This can benefit larger datasets and more complex operations, making our toolset more versatile and efficient.

The task would involve reviewing the codebase and integrating the possibility of using Polars as input instead of Pandas. We must ensure the transition is seamless and keeps existing functionalities intact.

This is a substantial task that might require time and careful testing. Any contributors willing to help with this task are welcome. Please feel free to comment below if you'd like to assist or have suggestions on approaching this task.

Use case

No response

ValueError when trying to use prediction_intervals on multivariate forecasting task

Hello team,

I really appreciate the work that you do, I wish I would have found this library earlier :-D.
Anyway, I am following this tutorial to try and create forecasts for a multivariate forecasting task. My dataset consists of the date column 'ds', 'unique_id', the target 'y', some lags 'y_lag_N' and multiple regressor columns of type float. I cannot share the data due to confidentiality, but here is what my code looks like (where dataset_train is the training dataframe described above):

from mlforecast import MLForecast
from mlforecast.target_transforms import Differences
from mlforecast.utils import PredictionIntervals
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor

models = [
    KNeighborsRegressor(),
    Lasso(),
    LinearRegression(),
    MLPRegressor(),
    Ridge(),
]

mlf = MLForecast(
    models=models,
    target_transforms=[Differences([1])],
)

H = 12

# convert ds to int
dataset_train['ds'] = dataset_train['ds'].apply(lambda x: int(x.timestamp()))

mlf.fit(
    dataset_train, 
    id_col='unique_id', 
    time_col='ds', 
    target_col='y', 
    prediction_intervals=PredictionIntervals(
        n_windows=len(dataset_train)//H, window_size=H),
    
)

levels = [50, 80, 95]
forecasts = mlf.predict(H, level=levels)
forecasts.head()

This results in:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
[~\AppData\Local\Temp\ipykernel_20272\4290687288.py](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/user/Documents/projectname/notebooks/~/AppData/Local/Temp/ipykernel_20272/4290687288.py) in ()
     27 # dataset_train['ds'] = dataset_train['ds'].apply(lambda x: int(x.timestamp()))
     28 
---> 29 mlf.fit(
     30     dataset_train_2,
     31     id_col='unique_id',

[c:\Users\user\Anaconda3\envs\endash\lib\site-packages\mlforecast\utils.py](file:///C:/Users/user/Anaconda3/envs/endash/lib/site-packages/mlforecast/utils.py) in inner(*args, **kwargs)
    184                             new_args.append(kwargs.pop(arg_names[i]))
    185                         new_args.append(kwargs.pop(old_name))
--> 186             return f(*new_args, **kwargs)
    187 
    188         return inner

[c:\Users\user\Anaconda3\envs\endash\lib\site-packages\mlforecast\forecast.py](file:///C:/Users/user/Anaconda3/envs/endash/lib/site-packages/mlforecast/forecast.py) in fit(self, df, id_col, time_col, target_col, static_features, dropna, keep_last_n, max_horizon, prediction_intervals, data)
    374         if prediction_intervals is not None:
    375             self.prediction_intervals = prediction_intervals
--> 376             self._cs_df = self._conformity_scores(
    377                 df=df,
    378                 id_col=id_col,

c:\Users\user\Anaconda3\envs\endash\lib\site-packages\mlforecast\forecast.py in _conformity_scores(self, df, id_col, time_col, target_col, static_features, dropna, keep_last_n, max_horizon, n_windows, h)
    306         is the same for all the forecasting horizon (`h=1`).
    307         """
--> 308         cv_results = self.cross_validation(
    309             df=df,
    310             n_windows=n_windows,

c:\Users\user\Anaconda3\envs\endash\lib\site-packages\mlforecast\utils.py in inner(*args, **kwargs)
    184                             new_args.append(kwargs.pop(arg_names[i]))
    185                         new_args.append(kwargs.pop(old_name))
--> 186             return f(*new_args, **kwargs)
    187 
    188         return inner

c:\Users\user\Anaconda3\envs\endash\lib\site-packages\mlforecast\forecast.py in cross_validation(self, df, n_windows, h, id_col, time_col, target_col, step_size, static_features, dropna, keep_last_n, refit, max_horizon, before_predict_callback, after_predict_callback, prediction_intervals, level, input_size, fitted, data, window_size)
    696             )
    697             if result.shape[0] < valid.shape[0]:
--> 698                 raise ValueError(
    699                     "Cross validation result produced less results than expected. "
    700                     "Please verify that the frequency set on the MLForecast constructor matches your series' "

ValueError: Cross validation result produced less results than expected. Please verify that the frequency set on the MLForecast constructor matches your series' and that there aren't any missing periods.
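
The error message names two things to check: the frequency set on the constructor and missing periods in the series. Below is a minimal sketch of the second check, assuming datetime timestamps and a known frequency. The helper name `count_missing_periods` and the toy data are hypothetical and are not part of mlforecast.

```python
import pandas as pd


def count_missing_periods(df, freq, id_col="unique_id", time_col="ds"):
    """Count, per series, the timestamps absent from a regular grid (hypothetical helper)."""

    def missing(ts):
        # Regular grid between the first and last observed timestamp of one series.
        expected = pd.date_range(ts.min(), ts.max(), freq=freq)
        return len(expected.difference(pd.DatetimeIndex(ts)))

    return df.groupby(id_col)[time_col].apply(missing)


# Toy monthly data: series "b" has no row for 2023-03-01.
toy = pd.DataFrame(
    {
        "unique_id": ["a"] * 3 + ["b"] * 3,
        "ds": pd.to_datetime(
            [
                "2023-01-01", "2023-02-01", "2023-03-01",
                "2023-01-01", "2023-02-01", "2023-04-01",
            ]
        ),
        "y": [1, 2, 3, 4, 5, 6],
    }
)

gaps = count_missing_periods(toy, freq="MS")
print(gaps)
```

A non-zero count for any series suggests that it has gaps relative to the chosen frequency. This does not by itself diagnose the data in the report above.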

By the way, I had to convert the datestamp column to integers; otherwise I got this error:
"TypeError: Addition/subtraction of integers and integer-arrays with DatetimeArray is no longer supported. Instead of adding/subtracting n, use n * obj.freq"

I am working in an Anaconda environment using Python 3.9.16 on Windows 10.
I use the following packages:
mlforecast 0.9.0
statsforecast 1.5.0
pandas 2.0.1
scikit-learn 1.2.2

Any advice on what could be the problem here is highly appreciated!

Cheers,
Micha

_get_dataframe_mask scales very poorly

When backtesting with many time series, the train-test split takes up most of the run time because the groupby.transform call becomes a bottleneck.

Prediction results date column is shifted one step into future

Description

At training time, the preprocess method creates samples that map the current row's features to X and the current row's target to y.
The model is therefore trained to predict the current time step, not the next one.
In the prediction results, however, the model's prediction is associated with one step into the future.
Expected: The model's prediction should be associated with the Nth time step, on which it was trained.
Actual: The model's prediction is associated with the (N+1)th time step, although the model was trained to predict the Nth time step.

In the case when the max_horizon > 1, all rows of the prediction result DataFrame are shifted one step into the future.
Expected: The 0th row should have timestamp equal to input sample's timestamp and not next step's timestamp.
Actual: The 0th row has input sample timestamp + 1 * freq.

date + (i + 1) * self.freq

The issue comes from line 665 in core.py: date + (i + 1) * self.freq
I suggest removing the +1.
This would solve the problem for _predict_multi, but _predict_recursive requires an additional fix.

It is also worth mentioning that predicting the 0th step can be a valid use case.
However, this could also be viewed as a bug in the sample generator logic, which associates the current row's X with the current row's y instead of the next row's y.
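
To make the offset concrete, here is a small sketch that evaluates the expression quoted from core.py with and without the `+ 1`. It uses plain pandas objects in place of `self.freq`; the variable names are illustrative and the code is not taken from the library.

```python
import pandas as pd

freq = pd.Timedelta(seconds=3600)  # stands in for self.freq ('3600s')
date = pd.Timestamp("2016-07-02 00:00:00")  # timestamp of the input sample
horizon = 2

# Expression quoted in the report: date + (i + 1) * self.freq
current = [date + (i + 1) * freq for i in range(horizon)]

# Proposed variant with the +1 removed
proposed = [date + i * freq for i in range(horizon)]

print("current: ", current)
print("proposed:", proposed)
```

With the `+ 1`, the first timestamp is one step after the input sample, as in the output shown in the reproducible example. Without it, the first timestamp equals the input sample's timestamp.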

Reproducible example

import lightgbm as lgb
from sklearn.ensemble import RandomForestRegressor

models = [
    lgb.LGBMRegressor(),
    RandomForestRegressor(random_state=0),
]

from mlforecast import MLForecast

fcst = MLForecast(
    models=models,
    freq='3600s',
)

# train_df definition

train_df[0:2]
# date	HUFL	HULL	MUFL	MULL	LUFL	LULL	OT	id_col
# 0	2016-07-02 00:00:00	5.224	3.081	2.701	1.315	2.437	1.523	21.104000	1
# 1	2016-07-02 01:00:00	5.157	3.014	2.878	1.350	2.345	1.432	19.697001	1

fcst.fit(train_df, id_col=id_col, time_col=dt_col, target_col=target, dropna=True, max_horizon=10)


fcst.predict(horizon=2, new_data=train_df[0:1])
# 	id_col	date	LGBMRegressor	RandomForestRegressor
# 0	1	2016-07-02 01:00:00	20.141456	20.57573
# 1	1	2016-07-02 02:00:00	20.141456	20.57573

# 0th row date should be equal to 2016-07-02 00:00:00
Error message

No error message

Environment info

Install method (pip, conda, github):
pip
Package version:
0.6.0

Additional information

MLForecast: Create a MLForecast object with pre-trained models for inference deployments

Description

Allow creating an MLForecast object with pre-trained models.

class MLForecast:
    def __init__(
        self,
        models: Models,
        freq: Optional[Freq] = None,
        lags: Optional[Lags] = None,
        lag_transforms: Optional[LagTransforms] = None,
        date_features: Optional[Iterable[DateFeature]] = None,
        differences: Optional[Iterable[int]] = None,
        num_threads: int = 1,
        target_transforms: Optional[List[BaseTargetTransform]] = None,
        is_models_pretrained: bool = False,  # proposed new argument
    ):
        ...

Use case

An example use case is a real-time inference deployment in which each server creates an MLForecast object on startup for inference, without needing to fit the models.

[<Library component: Model|Core|etc... >]

What happened + What you expected to happen

Hello Nixtla team.

I am building a machine learning model, but when I make predictions with the predict method and pass the level (confidence interval) parameter, I get an error:

[Screenshot, 2023-06-28 12:07:18 a.m.]

Versions / Dependencies

python 3.10.11

Reproduction script

mlf = MLForecast(
    models=modelos,
    freq='W',  # Our series has monthly frequency
    lags=[6 * (i+1) for i in range(7)],
    lag_transforms={
        1: [expanding_mean],
        48: [(rolling_mean, 48)]},
    differences=[1],
    date_features=["year", "week"]  # Seasonality features
)

mlf.fit(df1, id_col='unique_id', time_col='ds', target_col='y', prediction_intervals=PredictionIntervals(n_windows=5, window_size=18))

levels = [50, 80, 95]
preds = mlf.predict(18, level=levels)  # Predict the next 18 months
preds.head()

Issue Severity

High: It blocks me from completing my task.

Feature Request: Detect missing values

fit raises no warning when the dataset contains NaN, and predict does not return an array with NaN either; it raises the ValueError below.

series = generate_daily_series(20)
series.iloc[1,1] = np.nan

ts = TimeSeries(
    lags=[7, 14],
    lag_transforms={
        1: [expanding_mean],
        7: [(rolling_mean, 7), (rolling_mean, 14)]
    },
    date_features=['dayofweek', 'month']
)
model = RandomForestRegressor()
fcst = Forecast(model, ts)
fcst.fit(series)
predictions = fcst.predict(14)

ValueError                                Traceback (most recent call last)
<ipython-input-40-e9418f34e879> in <module>
----> 1 predictions = fcst.predict(14)

~/opt/anaconda3/envs/ws/lib/python3.8/site-packages/mlforecast/forecast.py in predict(self, horizon, predict_fn, **predict_fn_kwargs)
     60         `features_order` is the list of column names that were used in the training step.
     61         """
---> 62         return self.ts.predict(self.model, horizon, predict_fn, **predict_fn_kwargs)
     63 
     64     def backtest(

~/opt/anaconda3/envs/ws/lib/python3.8/site-packages/mlforecast/core.py in predict(self, model, horizon, predict_fn, **predict_fn_kwargs)
    388         for _ in range(horizon):
    389             new_x = self._update_features()
--> 390             predictions = predict_fn(
    391                 model, new_x, self.features_order_, **predict_fn_kwargs
    392             )

~/opt/anaconda3/envs/ws/lib/python3.8/site-packages/mlforecast/core.py in simple_predict(model, new_x, *args)
    162 
    163         new_x = xgb.DMatrix(new_x)
--> 164     return model.predict(new_x)
    165 
    166 

~/opt/anaconda3/envs/ws/lib/python3.8/site-packages/sklearn/ensemble/_forest.py in predict(self, X)
    782         check_is_fitted(self)
    783         # Check data
--> 784         X = self._validate_X_predict(X)
    785 
    786         # Assign chunk of trees to jobs

~/opt/anaconda3/envs/ws/lib/python3.8/site-packages/sklearn/ensemble/_forest.py in _validate_X_predict(self, X)
    420         check_is_fitted(self)
    421 
--> 422         return self.estimators_[0]._validate_X_predict(X, check_input=True)
    423 
    424     @property

~/opt/anaconda3/envs/ws/lib/python3.8/site-packages/sklearn/tree/_classes.py in _validate_X_predict(self, X, check_input)
    405         """Validate the training data on predict (probabilities)."""
    406         if check_input:
--> 407             X = self._validate_data(X, dtype=DTYPE, accept_sparse="csr",
    408                                     reset=False)
    409             if issparse(X) and (X.indices.dtype != np.intc or

~/opt/anaconda3/envs/ws/lib/python3.8/site-packages/sklearn/base.py in _validate_data(self, X, y, reset, validate_separately, **check_params)
    419             out = X
    420         elif isinstance(y, str) and y == 'no_validation':
--> 421             X = check_array(X, **check_params)
    422             out = X
    423         else:

~/opt/anaconda3/envs/ws/lib/python3.8/site-packages/sklearn/utils/validation.py in inner_f(*args, **kwargs)
     61             extra_args = len(args) - len(all_args)
     62             if extra_args <= 0:
---> 63                 return f(*args, **kwargs)
     64 
     65             # extra_args > 0

~/opt/anaconda3/envs/ws/lib/python3.8/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
    718 
    719         if force_all_finite:
--> 720             _assert_all_finite(array,
    721                                allow_nan=force_all_finite == 'allow-nan')
    722 

~/opt/anaconda3/envs/ws/lib/python3.8/site-packages/sklearn/utils/validation.py in _assert_all_finite(X, allow_nan, msg_dtype)
    101                 not allow_nan and not np.isfinite(X).all()):
    102             type_err = 'infinity' if allow_nan else 'NaN, infinity'
--> 103             raise ValueError(
    104                     msg_err.format
    105                     (type_err,

ValueError: Input contains NaN, infinity or a value too large for dtype('float32').

level in prediction for 50

When I use level 50, why does it return low and high bounds for 50? Shouldn't 50 be the median, so that only the median is returned?
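
Under the usual convention for prediction intervals, a level is a coverage percentage, not a quantile. A level of 50 asks for a central interval expected to contain the true value about half the time, which is why it has a lower and an upper bound. The sketch below shows the quantiles that bound such a central interval. The function name `interval_quantiles` is hypothetical, and the snippet illustrates the general convention rather than how mlforecast computes its conformal intervals.

```python
from typing import Tuple


def interval_quantiles(level: float) -> Tuple[float, float]:
    """Return the (lower, upper) quantiles bounding a central interval of `level` percent."""
    alpha = 1 - level / 100  # probability mass left outside the interval
    return alpha / 2, 1 - alpha / 2


for level in (50, 80, 95):
    lo, hi = interval_quantiles(level)
    print(f"level={level}: quantiles {lo:.3f} to {hi:.3f}")
```

Under this reading, level 50 corresponds to the range between the 25th and 75th percentiles, while the median would be the single 0.5 quantile.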

[Core] same values are scored although different time series

What happened + What you expected to happen

When using MLForecast's fit/predict or cross-validation routines with a regression model such as LightGBM or XGBoost, the id_col ('unique_id') is not used during fitting, so entries with different unique ids get the same predicted values. This happens even if the dtype of the column is set to the pandas categorical type. I would expect either that a separate model is trained and used for each time series when scoring ('local model'), or that all time series are used for a single model with unique_id as a categorical feature ('global model').
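
One plausible reading of the reproduction script further down is that the only features are date features, which depend solely on the timestamp. The sketch below rebuilds equivalent toy data with pandas alone and derives the same four date features for the next step of each series. It is an illustration under that assumption and does not call mlforecast.

```python
import pandas as pd

stamps = pd.date_range("2017-01-01 00:15:00", periods=4, freq="15min")
df = pd.DataFrame(
    {
        "unique_id": ["ts1"] * 4 + ["ts2"] * 4 + ["ts3"] * 4,
        "ds": list(stamps) * 3,
        "y": range(12),
    }
)

# Next timestamp to forecast for each series: last observed stamp plus one step.
next_ds = df.groupby("unique_id")["ds"].max() + pd.Timedelta(minutes=15)

# Features derived only from the timestamp, mirroring the script's date_features list.
features = pd.DataFrame(
    {
        "hour": next_ds.dt.hour,
        "minute": next_ds.dt.minute,
        "day": next_ds.dt.day,
        "month": next_ds.dt.month,
    }
)

identical_rows = len(features.drop_duplicates()) == 1
print(features)
print("All series share one feature row:", identical_rows)
```

If all series present the same feature row, a single deterministic model returns the same value for each of them, which would be consistent with the behavior described.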

Versions / Dependencies

mlforecast==0.7.4
xgboost==1.7.5

Reproduction script

import pandas as pd
import numpy as np
import xgboost as xgb  # needed for xgb.XGBRegressor below
from mlforecast import MLForecast

df = (
    pd.DataFrame(
        {
            "ds" : 3*[pd.Timestamp("2017-01-01 00:15:00"), pd.Timestamp("2017-01-01 00:30:00"), pd.Timestamp("2017-01-01 00:45:00"), pd.Timestamp("2017-01-01 01:00:00")],
            "unique_id" : 4*["ts1"] + 4*["ts2"] + 4*["ts3"],
            "y" : np.arange(0,12)
        },
    )
    .assign(unique_id = lambda df_: pd.Categorical(df_.unique_id))
)

fcst = MLForecast(
    models=[
        xgb.XGBRegressor(),
    ],
    freq='15T',
    date_features=["hour", "minute", "day", "month"]
)

scores = (
    fcst.fit(
        df,
    )
    .predict(horizon=1)
)

Result
[Screenshot, 2023-07-24 12:00:54]

Issue Severity

Medium: It is a significant difficulty but I can work around it.

Bug in PredictionIntervals

Description

If I pass prediction_intervals to MLForecast.fit and the target_col is not named y, it throws this error: "['y'] not found in axis".

Reproducible example

Rename the y column in the example code to something else.

# code goes here
import pandas as pd 
from mlforecast import MLForecast
from mlforecast.target_transforms import Differences
from mlforecast.utils import PredictionIntervals
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor

train = pd.read_csv('https://auto-arima-results.s3.amazonaws.com/M4-Hourly.csv')
test = pd.read_csv('https://auto-arima-results.s3.amazonaws.com/M4-Hourly-test.csv').rename(columns={'y': 'y_test'})
train.columns = ['unique_id', 'ds', 'demand']
n_series = 8 
uids = train['unique_id'].unique()[:n_series] # select first n_series of the dataset
train = train.query('unique_id in @uids')
test = test.query('unique_id in @uids')

models = [
    KNeighborsRegressor(),
    Lasso(),
    LinearRegression(),
    MLPRegressor(),
    Ridge(),
]

mlf = MLForecast(
    models=[Ridge(), Lasso(), LinearRegression(), KNeighborsRegressor(), MLPRegressor()],
    target_transforms=[Differences([1])],
    lags=[24 * (i+1) for i in range(7)],
)

mlf.fit(
    train, 
    id_col='unique_id', 
    time_col='ds', 
    target_col='demand', 
    prediction_intervals=PredictionIntervals(n_windows=10, window_size=48),
)

Error message

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[18], line 32
     18 models = [
     19     KNeighborsRegressor(),
     20     Lasso(),
   (...)
     23     Ridge(),
     24 ]
     26 mlf = MLForecast(
     27     models=[Ridge(), Lasso(), LinearRegression(), KNeighborsRegressor(), MLPRegressor()],
     28     target_transforms=[Differences([1])],
     29     lags=[24 * (i+1) for i in range(7)],
     30 )
---> 32 mlf.fit(
     33     train, 
     34     id_col='unique_id', 
     35     time_col='ds', 
     36     target_col='demand', 
     37     prediction_intervals=PredictionIntervals(n_windows=10, window_size=48),
     38 )

File ~/opt/anaconda3/envs/thesis/lib/python3.9/site-packages/mlforecast/forecast.py:359, in MLForecast.fit(self, data, id_col, time_col, target_col, static_features, dropna, keep_last_n, max_horizon, prediction_intervals)
    357 if prediction_intervals is not None:
    358     self.prediction_intervals = prediction_intervals
--> 359     self._cs_df = self._conformity_scores(
    360         data=data,
    361         id_col=id_col,
    362         time_col=time_col,
    363         target_col=target_col,
    364         static_features=static_features,
    365         dropna=dropna,
    366         keep_last_n=keep_last_n,
    367         n_windows=prediction_intervals.n_windows,
    368         window_size=prediction_intervals.window_size,
    369     )
    370 X, y = self.preprocess(
    371     data,
    372     id_col=id_col,
   (...)
    379     return_X_y=True,
    380 )
    381 X = X[self.ts.features_order_]

File ~/opt/anaconda3/envs/thesis/lib/python3.9/site-packages/mlforecast/forecast.py:314, in MLForecast._conformity_scores(self, data, id_col, time_col, target_col, static_features, dropna, keep_last_n, max_horizon, n_windows, window_size)
    311 for model in self.models.keys():
    312     # compute absolute error for each model
    313     cv_results[model] = np.abs(cv_results[model] - cv_results[target_col])
--> 314 return cv_results.drop("y", axis=1)

File ~/opt/anaconda3/envs/thesis/lib/python3.9/site-packages/pandas/util/_decorators.py:331, in deprecate_nonkeyword_arguments.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
    325 if len(args) > num_allow_args:
    326     warnings.warn(
    327         msg.format(arguments=_format_argument_list(allow_args)),
    328         FutureWarning,
    329         stacklevel=find_stack_level(),
    330     )
--> 331 return func(*args, **kwargs)

File ~/opt/anaconda3/envs/thesis/lib/python3.9/site-packages/pandas/core/frame.py:5399, in DataFrame.drop(self, labels, axis, index, columns, level, inplace, errors)
   5251 @deprecate_nonkeyword_arguments(version=None, allowed_args=["self", "labels"])
   5252 def drop(  # type: ignore[override]
   5253     self,
   (...)
   5260     errors: IgnoreRaise = "raise",
   5261 ) -> DataFrame | None:
   5262     """
   5263     Drop specified labels from rows or columns.
   5264 
   (...)
   5397             weight  1.0     0.8
   5398     """
-> 5399     return super().drop(
   5400         labels=labels,
   5401         axis=axis,
   5402         index=index,
   5403         columns=columns,
   5404         level=level,
   5405         inplace=inplace,
   5406         errors=errors,
   5407     )

File ~/opt/anaconda3/envs/thesis/lib/python3.9/site-packages/pandas/util/_decorators.py:331, in deprecate_nonkeyword_arguments.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
    325 if len(args) > num_allow_args:
    326     warnings.warn(
    327         msg.format(arguments=_format_argument_list(allow_args)),
    328         FutureWarning,
    329         stacklevel=find_stack_level(),
    330     )
--> 331 return func(*args, **kwargs)

File ~/opt/anaconda3/envs/thesis/lib/python3.9/site-packages/pandas/core/generic.py:4505, in NDFrame.drop(self, labels, axis, index, columns, level, inplace, errors)
   4503 for axis, labels in axes.items():
   4504     if labels is not None:
-> 4505         obj = obj._drop_axis(labels, axis, level=level, errors=errors)
   4507 if inplace:
   4508     self._update_inplace(obj)

File ~/opt/anaconda3/envs/thesis/lib/python3.9/site-packages/pandas/core/generic.py:4546, in NDFrame._drop_axis(self, labels, axis, level, errors, only_slice)
   4544         new_axis = axis.drop(labels, level=level, errors=errors)
   4545     else:
-> 4546         new_axis = axis.drop(labels, errors=errors)
   4547     indexer = axis.get_indexer(new_axis)
   4549 # Case for non-unique axis
   4550 else:

File ~/opt/anaconda3/envs/thesis/lib/python3.9/site-packages/pandas/core/indexes/base.py:6934, in Index.drop(self, labels, errors)
   6932 if mask.any():
   6933     if errors != "ignore":
-> 6934         raise KeyError(f"{list(labels[mask])} not found in axis")
   6935     indexer = indexer[~mask]
   6936 return self.delete(indexer)

KeyError: "['y'] not found in axis"
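
The traceback ends in `cv_results.drop("y", axis=1)`, where the column name is hard-coded. The sketch below reproduces that failure with plain pandas on a hypothetical stand-in for `cv_results` and shows one way the drop could use `target_col` instead. It is not the maintainers' actual fix.

```python
import pandas as pd

# Hypothetical stand-in for the cross-validation results with a renamed target.
cv_results = pd.DataFrame(
    {
        "unique_id": ["H1", "H1"],
        "ds": [1, 2],
        "demand": [10.0, 12.0],
        "Ridge": [9.5, 12.5],
    }
)
target_col = "demand"

# Dropping the hard-coded name fails because no column is called "y".
try:
    cv_results.drop("y", axis=1)
    hardcoded_ok = True
except KeyError as err:
    hardcoded_ok = False
    print("hard-coded name fails:", err)

# Dropping by the configured target column works for any target name.
scores = cv_results.drop(columns=target_col)
print(scores.columns.tolist())
```

Until such a change is available, renaming the target column to `y` before calling fit may serve as a workaround.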

Environment info

Install method (pip, conda, github):

Package version:

Additional information

Having problem using prediction intervals

Thanks for this very useful module!

I have run into a problem: when I try to instantiate the PredictionIntervals class, it fails with:

name 'PredictionIntervals' is not defined

I am not sure what is happening. For other functions, it all works for my dataset.

Best,

Log, log1p, and box-cox target transformations

Description

Add Log, Log1p, and Box-Cox options for the target_transforms argument of MLForecast.

Use case

Differences and LocalStandardScaler can already be passed in the target_transforms argument of MLForecast() to transform and un-transform the target variable when forecasting. It would be great to have the same for Log, Log(x+1) (similar to np.log1p), and Box-Cox transforms.
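
As a rough illustration of what the requested transforms would need to do, here is a NumPy-only sketch of invertible log1p and Box-Cox transforms. The class names and the `fit_transform`/`inverse_transform` method names are assumptions made for this example. They do not implement mlforecast's target transform interface, and the Box-Cox lambda is fixed by hand rather than estimated.

```python
import numpy as np


class Log1pTransform:
    """Hypothetical transform: log(1 + y) forward, exp(z) - 1 backward."""

    def fit_transform(self, y):
        return np.log1p(np.asarray(y, dtype=float))

    def inverse_transform(self, z):
        return np.expm1(np.asarray(z, dtype=float))


class BoxCoxTransform:
    """Hypothetical Box-Cox transform with a user-supplied lambda (requires y > 0)."""

    def __init__(self, lmbda):
        self.lmbda = lmbda

    def fit_transform(self, y):
        y = np.asarray(y, dtype=float)
        if self.lmbda == 0:
            return np.log(y)
        return (y ** self.lmbda - 1) / self.lmbda

    def inverse_transform(self, z):
        z = np.asarray(z, dtype=float)
        if self.lmbda == 0:
            return np.exp(z)
        return (self.lmbda * z + 1) ** (1 / self.lmbda)


y = np.array([0.5, 1.0, 10.0, 100.0])
roundtrips = {}
for name, transform in [("log1p", Log1pTransform()), ("boxcox", BoxCoxTransform(0.5))]:
    # A model would be trained on the transformed values; predictions are mapped back.
    roundtrips[name] = transform.inverse_transform(transform.fit_transform(y))
    print(name, roundtrips[name])
```

The round trip recovering the original values is the property that matters here: forecasts made on the transformed scale must map back to the original scale.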

Enable support of Cyclic Boosting Machines

I tried using this model but got errors about categorical variables and bins.

fcst = MLForecast(
    models={
        'cbm': cbm.CBM(),
    },
    lags=[],
    date_features=[],
    #target_transforms=[MinMaxScaler()],
    freq='W',
    num_threads=4
)

Error during conformal prediction -- cannot reshape array

I get ValueError: cannot reshape array of size 65 into shape (3,1,24) with PredictionIntervals(n_windows=3, window_size=24). This happens with only some datasets. Is it because the CV results do not have enough data points?
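
The arithmetic behind the message can be checked directly: a (3, 1, 24) array needs 3 * 1 * 24 = 72 values, and only 65 are available. The sketch below reproduces the error with NumPy. Reading the middle dimension as the number of series is an assumption, and the array contents are placeholders.

```python
import numpy as np

n_windows, n_series, window_size = 3, 1, 24  # n_series = 1 is an assumption
expected = n_windows * n_series * window_size
available = 65  # size reported in the error message

scores = np.arange(available, dtype=float)  # placeholder values
try:
    scores.reshape(n_windows, n_series, window_size)
    reshaped = True
except ValueError as err:
    reshaped = False
    print(err)

shortfall = expected - available
print(f"expected {expected} values, got {available}, short by {shortfall}")
```

A shortfall of this kind would be consistent with the cross-validation step returning fewer rows than n_windows * window_size for the affected dataset, although the snippet alone cannot confirm the cause.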

backtest requires input to be sorted

When using Forecast.backtest the input series must be sorted by id and date, otherwise the training-validation masks aren't computed correctly.
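
Until this is handled inside the library, sorting the input before calling backtest avoids the problem. Here is a minimal pandas sketch, assuming the id and date live in columns named `unique_id` and `ds`. These names are assumptions for this example, and a frame indexed by id would need `reset_index` first.

```python
import pandas as pd

# Toy input in scrambled order.
series = pd.DataFrame(
    {
        "unique_id": ["b", "a", "b", "a"],
        "ds": pd.to_datetime(["2021-01-02", "2021-01-02", "2021-01-01", "2021-01-01"]),
        "y": [4.0, 2.0, 3.0, 1.0],
    }
)

# Sort by id first, then by date within each id.
sorted_series = series.sort_values(["unique_id", "ds"]).reset_index(drop=True)
print(sorted_series)
```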
