Comments (5)
The expected behavior would be to use the last row (the most recent time step) of the dataset instead of the first one; the rest would then work as expected. The current workaround is to pass only a single row rather than the whole df, so that it is clear what the input and the expected output are.
from mlforecast.
Having the same issue.
I created an X and a y variable, where y is simply a slightly deviated value of X.
However, the model cannot be fit at all using the external variable.
Again, the first value of the training data always goes into the model in the predict() function.
When using an sklearn model for comparison, it fits very well (error: close to 0% with sklearn, around 30% with mlforecast).
Suggested change:
In core.py, replace df with another variable in lines 465 and 466 (it seems to be using the global df variable), and do a 'merge' on 'unique_id' and 'ds'?
@FedericoGarza @jose-moralez what do you think?
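For illustration, a merge on 'unique_id' and 'ds' could look roughly like the sketch below. The frames `future` and `exog` and their values are hypothetical and made up for this example; this is not the code in core.py, only a picture of the join being suggested.

```python
import pandas as pd

# Hypothetical frame of future timestamps per series (the rows to be predicted).
future = pd.DataFrame({
    "unique_id": [1, 1, 1],
    "ds": pd.to_datetime(["2020-08-23", "2020-08-30", "2020-09-06"]),
})

# Hypothetical exogenous values known for those same timestamps.
exog = pd.DataFrame({
    "unique_id": [1, 1, 1],
    "ds": pd.to_datetime(["2020-08-23", "2020-08-30", "2020-09-06"]),
    "X": [2510.0, 1730.0, 2950.0],
})

# Left-join on series id and timestamp so every forecast row gets its own X.
features = future.merge(exog, on=["unique_id", "ds"], how="left")
print(features)
```

With a join like this, each forecast step would see the X value for its own date rather than one value repeated over the horizon.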
To replicate:
import numpy as np
import pandas as pd

# Weekly series for 2020 with one exogenous feature X and a target y derived from it
date_range = pd.date_range(start='2020-01-01', end='2021-01-01', freq='W')
df = pd.DataFrame(date_range, columns=['ds'])
df['unique_id'] = 1
df['X'] = df['ds'].dt.year + np.random.uniform(low=-1, high=1, size=df.shape[0]) * 1000
df['y'] = df['X'] * 0.8 + np.random.rand()

# Hold out the last 20 weeks as the test set
cut_off = 20
train_df = df[:-cut_off]
train_df_X = train_df.loc[:, 'X']
train_df_Y = train_df[['ds', 'y', 'unique_id']]
test_df = df[-cut_off:]
test_df_X = test_df.loc[:, ['X', 'ds', 'unique_id']]
test_df_Y = test_df[['ds', 'y', 'unique_id']]
df.head()
from statsforecast import StatsForecast
StatsForecast.plot(df)
import lightgbm as lgb
from window_ops.expanding import expanding_mean
from window_ops.rolling import rolling_mean
from mlforecast import MLForecast
from sklearn.linear_model import LinearRegression

mlf = MLForecast(
    models=[LinearRegression(), lgb.LGBMRegressor()],
    # lags=[1, 12],
    # lag_transforms={
    #     1: [expanding_mean],
    #     12: [(rolling_mean, 48)],
    # },
    freq='W',
    # date_features=['week', 'month', 'year'],
)

def inspect_input(x):
    # Show the features passed to the models at each prediction step
    from IPython.display import display
    print('inspect_input')
    display(x)
    return x

prep_df = mlf.preprocess(train_df, id_col='unique_id', time_col='ds', target_col='y')
mlf.fit(train_df, id_col='unique_id', time_col='ds', target_col='y')
y_hat = mlf.predict(cut_off, dynamic_dfs=[test_df_X], before_predict_callback=inspect_input)
As a result, the forecast is flat, because the model does not pick up the information from the external variable.
For comparison, if I fit a model from sklearn directly, it fits well with an error of almost 0.
# Copy so that adding prediction columns does not write into a slice of test_df
y_hat = test_df[['ds', 'unique_id']].copy()

from sklearn.linear_model import LinearRegression
clf = LinearRegression()
clf.fit(train_df.loc[:, ['X']], train_df.loc[:, 'y'])
y_hat_val = clf.predict(test_df.loc[:, ['X']])
y_hat['lr'] = y_hat_val

from sklearn.ensemble import GradientBoostingRegressor
clf = GradientBoostingRegressor()
clf.fit(train_df.loc[:, ['X']], train_df.loc[:, 'y'])
y_hat_val = clf.predict(test_df.loc[:, ['X']])
y_hat['gb'] = y_hat_val

StatsForecast.plot(test_df_Y, y_hat)
The difference can also be seen in the numbers, in case the fit is hard to see in the plot:
Error for the mlforecast model
Error from the sklearn model
Hi @iamyihwa, thanks for the thorough example. I believe this is the same as #122. The TL;DR is that you have to set static_features=[] in your fit call, because right now mlforecast is treating X as a static feature and thus it only repeats its last value forward when predicting.
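To see why repeating the last value forward gives a flat forecast, here is a minimal NumPy-only sketch. It does not use mlforecast or reproduce its internals; the data, the seed, and names such as `X_static` and `pred_dynamic` are assumptions made up for this illustration, and the linear fit stands in for whatever model is being trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data mirroring the report: y depends linearly on an exogenous feature X.
n, horizon = 50, 20
X = 2020 + rng.uniform(-1, 1, size=n) * 1000
y = 0.8 * X + 0.5

X_train, y_train = X[:-horizon], y[:-horizon]
X_future, y_future = X[-horizon:], y[-horizon:]

# Fit y = slope * X + intercept by ordinary least squares.
slope, intercept = np.polyfit(X_train, y_train, deg=1)

# "Static" treatment: the last training value of X is repeated over the horizon.
X_static = np.full(horizon, X_train[-1])
pred_static = slope * X_static + intercept

# "Dynamic" treatment: the known future value of X is used at each step.
pred_dynamic = slope * X_future + intercept

mae_static = np.mean(np.abs(pred_static - y_future))
mae_dynamic = np.mean(np.abs(pred_dynamic - y_future))
print(f"static MAE:  {mae_static:.3f}")
print(f"dynamic MAE: {mae_dynamic:.3f}")
```

The static predictions are all the same number, which matches the flat forecast described above, while the dynamic predictions follow the future X values closely.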
Thanks @jmoralez for your quick answer!
Yes!!! indeed it solves the issue!!
Thanks again to the team for providing a great tool, and also for the help!
Closing as it seems we've solved the original issue.