[timeseries] DirectTabular & RecursiveTabular models fail if eval_metric = "WAPE" about autogluon HOT 7 OPEN

Antoine-Schwartz commented on June 1, 2024

[timeseries] DirectTabular & RecursiveTabular models fail if eval_metric = "WAPE"

Comments (7)

Antoine-Schwartz commented on June 1, 2024

To be a little more precise, I made a loop to test the combinations with all metrics. Here are the results:

SQL with DirectTabular ==> OK
SQL with RecursiveTabular ==> Error
WQL with DirectTabular ==> OK
WQL with RecursiveTabular ==> Error
MAE with DirectTabular ==> Error
MAE with RecursiveTabular ==> Error
MAPE with DirectTabular ==> Error
MAPE with RecursiveTabular ==> Error
MASE with DirectTabular ==> Error
MASE with RecursiveTabular ==> Error
MSE with DirectTabular ==> Error
MSE with RecursiveTabular ==> Error
RMSE with DirectTabular ==> OK
RMSE with RecursiveTabular ==> OK
RMSSE with DirectTabular ==> OK
RMSSE with RecursiveTabular ==> OK
SMAPE with DirectTabular ==> OK
SMAPE with RecursiveTabular ==> OK
WAPE with DirectTabular ==> Error
WAPE with RecursiveTabular ==> Error
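
A loop like the one described above could look roughly like the following. This is only a sketch: `run_grid` and `fit_fn` are hypothetical names, and the commented-out AutoGluon call simply mirrors the three-line reproduction used elsewhere in this thread.

```python
from itertools import product

METRICS = ["SQL", "WQL", "MAE", "MAPE", "MASE", "MSE", "RMSE", "RMSSE", "SMAPE", "WAPE"]
MODELS = ["DirectTabular", "RecursiveTabular"]


def run_grid(fit_fn, metrics=METRICS, models=MODELS):
    """Call fit_fn(metric, model) for every combination and record OK / Error."""
    results = {}
    for metric, model in product(metrics, models):
        try:
            fit_fn(metric, model)
            results[(metric, model)] = "OK"
        except Exception as exc:  # keep going so every combination gets tested
            results[(metric, model)] = f"Error: {exc}"
    return results


# Hypothetical fit_fn wrapping the reproduction from this thread (not run here):
#
# from autogluon.timeseries import TimeSeriesDataFrame, TimeSeriesPredictor
# data = TimeSeriesDataFrame(".../m4_hourly_tiny/train.csv")
# def fit_fn(metric, model):
#     TimeSeriesPredictor(eval_metric=metric).fit(data, hyperparameters={model: {}})
#
# for (metric, model), status in run_grid(fit_fn).items():
#     print(f"{metric} with {model} ==> {status}")
```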

Always the same error pattern: Can't pickle <function mean_absolute_percentage_error at 0x7f8fab2b0b80>: it's not the same object as sklearn.metrics._regression.mean_absolute_percentage_error
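
For context, this kind of pickle error can be reproduced without scikit-learn or AutoGluon. The sketch below assumes the cause is a stale function reference: pickle stores functions by module and name, and refuses when that name no longer resolves to the same object. `fake_metrics` and `mape` are made-up stand-ins, and whether this is what actually happens on Databricks is not confirmed here.

```python
import pickle
import sys
import types

# Hypothetical stand-in for a library module; "fake_metrics" and "mape" are
# illustrative names, not part of scikit-learn or AutoGluon.
SOURCE = "def mape(y_true, y_pred):\n    return 0.0\n"

module = types.ModuleType("fake_metrics")
exec(SOURCE, module.__dict__)
sys.modules["fake_metrics"] = module

old_mape = module.mape   # reference captured before the module changes
pickle.dumps(old_mape)   # fine: fake_metrics.mape is this very object

# Simulate the module being re-imported or replaced by another version:
# the same name now points at a new function object.
exec(SOURCE, module.__dict__)

try:
    pickle.dumps(old_mape)
    stale_reference_pickles = True
except pickle.PicklingError as exc:
    stale_reference_pickles = False
    print(exc)
```

The current `module.mape` still pickles fine; only the reference captured before the redefinition fails.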

shchur commented on June 1, 2024

Hi, thank you for reporting the problem.

Unfortunately, I wasn't able to reproduce it on my machine. In a clean Python 3.10 environment on Linux with AutoGluon installed via pip install autogluon, the code that you shared trains both models without failures.

Can you please share the output of the following command?

from autogluon.core.utils import show_versions
show_versions()

Antoine-Schwartz commented on June 1, 2024

Hey @shchur, thanks for your prompt reply.

It seems to be a conflict with my ML environment: Databricks Runtime 14.3 LTS for ML (same with 13.3 LTS ML).
I have no problem with the non-ML version, but in that case I don't have access to the GPUs on Databricks...

Here are the versions under 14.3 LTS ML (not working):

INSTALLED VERSIONS
------------------
date                : 2024-02-20
time                : 10:46:37.012192
python              : 3.10.12.final.0
OS                  : Linux
OS-release          : 5.15.0-1052-aws
Version             : #57~20.04.1-Ubuntu SMP Mon Jan 15 17:04:56 UTC 2024
machine             : x86_64
processor           : x86_64
num_cores           : 16
cpu_ram_mb          : 58770.0
cuda version        : 12.535.54.03
num_gpus            : 1
gpu_ram_mb          : [22511]
avail_disk_size_mb  : 10240

async-timeout       : 4.0.3
autogluon           : None
autogluon.common    : 1.0.0
autogluon.core      : 1.0.0
autogluon.features  : 1.0.0
autogluon.tabular   : 1.0.0
autogluon.timeseries: 1.0.0
boto3               : 1.24.59
catboost            : 1.2.2
fastai              : None
gluonts             : 0.14.4
hyperopt            : 0.2.7
imodels             : None
joblib              : 1.2.0
lightgbm            : 4.1.0
lightning           : 2.0.9.post0
matplotlib          : 3.7.0
mlforecast          : 0.10.0
networkx            : 3.2.1
numpy               : 1.23.5
onnxruntime-gpu     : None
orjson              : 3.9.14
pandas              : 2.1.4
psutil              : 5.9.0
pytorch-lightning   : 2.0.9.post0
ray                 : 2.6.3
requests            : 2.28.1
scikit-learn        : 1.4.1.post1
scikit-learn-intelex: None
scipy               : 1.10.0
setuptools          : 65.6.3
skl2onnx            : None
statsforecast       : 1.4.0
statsmodels         : 0.13.5
tabpfn              : None
tensorboard         : 2.14.1
torch               : 2.0.1+cu118
tqdm                : 4.64.1
utilsforecast       : 0.0.10
vowpalwabbit        : None
xgboost             : 1.7.6

Here are the versions under 14.3 LTS (working):

INSTALLED VERSIONS
------------------
date                : 2024-02-20
time                : 10:43:32.424047
python              : 3.10.12.final.0
OS                  : Linux
OS-release          : 5.15.0-1052-aws
Version             : #57~20.04.1-Ubuntu SMP Mon Jan 15 17:04:12 UTC 2024
machine             : aarch64
processor           : aarch64
num_cores           : 32
cpu_ram_mb          : 122340.0
cuda version        : None
num_gpus            : 0
gpu_ram_mb          : []
avail_disk_size_mb  : 10240

async-timeout       : 4.0.3
autogluon           : None
autogluon.common    : 1.0.0
autogluon.core      : 1.0.0
autogluon.features  : 1.0.0
autogluon.tabular   : 1.0.0
autogluon.timeseries: 1.0.0
boto3               : 1.24.59
catboost            : 1.2.2
fastai              : None
gluonts             : 0.14.4
hyperopt            : 0.2.7
imodels             : None
joblib              : 1.2.0
lightgbm            : 4.1.0
lightning           : 2.0.9.post0
matplotlib          : 3.7.0
mlforecast          : 0.10.0
networkx            : 3.2.1
numpy               : 1.26.4
onnxruntime-gpu     : None
orjson              : 3.9.14
pandas              : 2.1.4
psutil              : 5.9.0
pytorch-lightning   : 2.0.9.post0
ray                 : 2.6.3
requests            : 2.28.1
scikit-learn        : 1.4.1.post1
scikit-learn-intelex: None
scipy               : 1.10.0
setuptools          : 65.6.3
skl2onnx            : None
statsforecast       : 1.4.0
statsmodels         : 0.13.5
tabpfn              : None
tensorboard         : 2.16.2
torch               : 2.0.1
tqdm                : 4.66.2
utilsforecast       : 0.0.10
vowpalwabbit        : None
xgboost             : 2.0.3

shchur commented on June 1, 2024

Are there any additional import statements in your script/notebook, or do you only execute the following 3 lines?

from autogluon.timeseries import TimeSeriesDataFrame, TimeSeriesPredictor
data = TimeSeriesDataFrame("https://autogluon.s3-us-west-2.amazonaws.com/datasets/timeseries/m4_hourly_tiny/train.csv")
predictor = TimeSeriesPredictor(eval_metric="WAPE").fit(data, hyperparameters={"DirectTabular": {}, "RecursiveTabular": {}})

Antoine-Schwartz commented on June 1, 2024

I install autogluon.timeseries==1.0.0 via pip, as well as a private utilities package whose requirements are:

boto3==1.24.59
pandas==1.3.5
freezegun==1.2.2
pyyaml==5.4.1
uritools==4.0.0
pyarrow==8.0.0
fsspec==2022.8.2
chispa==0.9.2
pydeequ==1.0.1

Then I only execute the 3 lines.

shchur commented on June 1, 2024

According to the documentation, the ML Runtime comes with scikit-learn-1.1.1, while the AutoGluon installation installs scikit-learn-1.4.1.post1 (as you showed here #3927 (comment)).

I guess that the following happens (but I'm not sure how to verify that this is the source of the problem, or how to fix it):

1. scikit-learn v1.1 gets imported
2. You install autogluon.timeseries, which installs scikit-learn v1.4 (probably in a notebook cell with !pip install autogluon.timeseries)
3. The predictor cannot pickle the metric during training because the scikit-learn version changed
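
One possible way to check this hypothesis is to compare the version of the module already loaded in the running interpreter with the version pip reports as installed. This is a sketch, not a confirmed diagnosis; `loaded_vs_installed` is a hypothetical helper name, and it assumes the module exposes `__version__`.

```python
import importlib.metadata
import sys


def loaded_vs_installed(module_name, dist_name):
    """Return (version loaded in this process, version installed on disk)."""
    module = sys.modules.get(module_name)  # None if not imported yet
    loaded = getattr(module, "__version__", None) if module is not None else None
    try:
        installed = importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        installed = None
    return loaded, installed


# In the notebook, after the pip install cell:
# print(loaded_vs_installed("sklearn", "scikit-learn"))
```

If the two values differ, the process is still running the old scikit-learn, which would be consistent with the steps above.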

I would recommend trying one of the following:

- restarting the Jupyter notebook after installing autogluon
- running pip install outside of the notebook
- executing the code in a .py script rather than a notebook

If that doesn't work, I would recommend using a CPU instance. In my experience, TimeSeriesPredictor may train even faster on a machine with 32 CPU cores and no GPU than on a machine with a GPU but fewer CPU cores.

Antoine-Schwartz commented on June 1, 2024

Unfortunately, I tried every kind of installation (compute level, notebook level, forcing different versions of sklearn, etc.), and it was impossible to make the code work with the ML runtime (in a notebook).

As for my CPU vs GPU experiments: for the deep learning algorithms (which work best for my problem), with PyTorch and G5 machines on AWS, 32 CPU cores cannot compete at all. They are about 10 times slower (except for RNN-based inference).
