
amazon-science / chronos-forecasting


Chronos: Pretrained (Language) Models for Probabilistic Time Series Forecasting

Home Page: https://arxiv.org/abs/2403.07815

License: Apache License 2.0

Language: Python (100%)

Topics: forecasting, large-language-models, llm, machine-learning, time-series, foundation-models, pretrained-models, time-series-forecasting, timeseries, artificial-intelligence, huggingface, huggingface-transformers, transformers

chronos-forecasting's Introduction

Chronos: Learning the Language of Time Series


🚀 News

  • 27 June 2024: 🚀 Released datasets used in the paper and an evaluation script to compute the WQL and MASE scores reported in the paper.
  • 17 May 2024: 🐛 Fixed an off-by-one error in bin indices in the output_transform. This simple fix significantly improves the overall performance of Chronos. We will update the results in the next revision on ArXiv.
  • 10 May 2024: 🚀 We added the code for pretraining and fine-tuning Chronos models. You can find it in this folder. We also added a script for generating synthetic time series data from Gaussian processes (KernelSynth; see Section 4.2 in the paper for details). Check out the usage examples.
  • 19 Apr 2024: 🚀 Chronos is now supported on AutoGluon-TimeSeries, the powerful AutoML package for time series forecasting which enables model ensembles, cloud deployments, and much more. Get started with the tutorial.
  • 08 Apr 2024: 🧪 Experimental MLX inference support added. If you have an Apple Silicon Mac, you can now obtain significantly faster forecasts from Chronos compared to CPU inference. This provides an alternative way to exploit the GPU on your Apple Silicon Macs together with the "mps" support in PyTorch.
  • 25 Mar 2024: 🚀 v1.1.0 released with inference optimizations and pipeline.embed to extract encoder embeddings from Chronos.
  • 13 Mar 2024: 🚀 Chronos paper and inference code released.

✨ Introduction

Chronos is a family of pretrained time series forecasting models based on language model architectures. A time series is transformed into a sequence of tokens via scaling and quantization, and a language model is trained on these tokens using the cross-entropy loss. Once trained, probabilistic forecasts are obtained by sampling multiple future trajectories given the historical context. Chronos models have been trained on a large corpus of publicly available time series data, as well as synthetic data generated using Gaussian processes.
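
To make the scaling-and-quantization step concrete, here is a minimal sketch (not the package's actual implementation, which is the MeanScaleUniformBins tokenizer with special tokens and edge handling; the bin count and limits below are assumptions): each series is divided by its mean absolute value, the scaled values are bucketed into uniform bins whose indices serve as tokens, and dequantization maps indices back to bin centers and rescales.

import torch

def tokenize(context: torch.Tensor, n_bins: int = 4094, low: float = -15.0, high: float = 15.0):
    # mean scaling, then uniform binning of the scaled values
    scale = context.abs().mean().clamp_min(1e-10)
    centers = torch.linspace(low, high, n_bins)
    boundaries = (centers[1:] + centers[:-1]) / 2
    token_ids = torch.bucketize(context / scale, boundaries)
    return token_ids, scale

def detokenize(token_ids: torch.Tensor, scale: torch.Tensor, n_bins: int = 4094, low: float = -15.0, high: float = 15.0):
    # map each token back to its bin center and undo the scaling
    centers = torch.linspace(low, high, n_bins)
    return centers[token_ids] * scale

series = torch.tensor([112.0, 118.0, 132.0, 129.0, 121.0])
token_ids, scale = tokenize(series)
print(token_ids, detokenize(token_ids, scale))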

For details on Chronos models, training data and procedures, and experimental results, please refer to the paper Chronos: Learning the Language of Time Series.


Fig. 1: High-level depiction of Chronos. (Left) The input time series is scaled and quantized to obtain a sequence of tokens. (Center) The tokens are fed into a language model which may either be an encoder-decoder or a decoder-only model. The model is trained using the cross-entropy loss. (Right) During inference, we autoregressively sample tokens from the model and map them back to numerical values. Multiple trajectories are sampled to obtain a predictive distribution.

Architecture

The models in this repository are based on the T5 architecture. The only difference is in the vocabulary size: Chronos-T5 models use 4096 different tokens, compared to 32128 of the original T5 models, resulting in fewer parameters.
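
If you want to check this yourself, the vocabulary sizes can be read from the model configs on the Hugging Face Hub (a quick sketch; the expected values are shown as comments):

from transformers import AutoConfig

chronos_cfg = AutoConfig.from_pretrained("amazon/chronos-t5-small")
t5_cfg = AutoConfig.from_pretrained("google/t5-efficient-small")
print(chronos_cfg.vocab_size)  # 4096
print(t5_cfg.vocab_size)       # 32128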

Zero-Shot Results

The following figure showcases the remarkable zero-shot performance of Chronos models on 27 datasets against local models, task-specific models and other pretrained models. For details on the evaluation setup and other results, please refer to the paper.


Fig. 2: Performance of different models on Benchmark II, comprising 27 datasets not seen by Chronos models during training. This benchmark provides insights into the zero-shot performance of Chronos models against local statistical models, which fit parameters individually for each time series, task-specific models trained on each task, and pretrained models trained on a large corpus of time series. Pretrained Models (Other) indicates that some (or all) of the datasets in Benchmark II may have been in the training corpus of these models. The probabilistic (WQL) and point (MASE) forecasting metrics were normalized using the scores of the Seasonal Naive baseline and aggregated through a geometric mean to obtain the Agg. Relative WQL and MASE, respectively.

📈 Usage

To perform inference with Chronos models, install this package by running:

pip install git+https://github.com/amazon-science/chronos-forecasting.git

Tip

The recommended way of using Chronos for production use cases is through AutoGluon, which features ensembling with other statistical and machine learning models for time series forecasting as well as seamless deployments on AWS with SageMaker 🧠. Check out the AutoGluon Chronos tutorial.
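
For reference, using Chronos through AutoGluon looks roughly like the sketch below; the preset name "chronos_small" is an assumption based on the AutoGluon-TimeSeries documentation, so check the tutorial for the exact API:

import numpy as np
import pandas as pd
from autogluon.timeseries import TimeSeriesDataFrame, TimeSeriesPredictor

# toy long-format frame with the columns AutoGluon expects
df = pd.DataFrame({
    "item_id": ["A"] * 100,
    "timestamp": pd.date_range("2020-01-01", periods=100, freq="D"),
    "target": np.sin(np.arange(100) / 7.0),
})
train_data = TimeSeriesDataFrame.from_data_frame(df, id_column="item_id", timestamp_column="timestamp")

predictor = TimeSeriesPredictor(prediction_length=12).fit(
    train_data,
    presets="chronos_small",  # assumed preset name; see the AutoGluon Chronos tutorial
)
forecasts = predictor.predict(train_data)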

Forecasting

A minimal example showing how to perform forecasting using Chronos models:

import pandas as pd  # requires: pip install pandas
import torch
from chronos import ChronosPipeline

pipeline = ChronosPipeline.from_pretrained(
    "amazon/chronos-t5-small",
    device_map="cuda",  # use "cpu" for CPU inference and "mps" for Apple Silicon
    torch_dtype=torch.bfloat16,
)

df = pd.read_csv("https://raw.githubusercontent.com/AileenNielsen/TimeSeriesAnalysisWithPython/master/data/AirPassengers.csv")

# context must be either a 1D tensor, a list of 1D tensors,
# or a left-padded 2D tensor with batch as the first dimension
# forecast shape: [num_series, num_samples, prediction_length]
forecast = pipeline.predict(
    context=torch.tensor(df["#Passengers"]),
    prediction_length=12,
    num_samples=20,
)

More options for pipeline.predict can be found with:

print(ChronosPipeline.predict.__doc__)

We can now visualize the forecast:

import matplotlib.pyplot as plt  # requires: pip install matplotlib
import numpy as np

forecast_index = range(len(df), len(df) + 12)
low, median, high = np.quantile(forecast[0].numpy(), [0.1, 0.5, 0.9], axis=0)

plt.figure(figsize=(8, 4))
plt.plot(df["#Passengers"], color="royalblue", label="historical data")
plt.plot(forecast_index, median, color="tomato", label="median forecast")
plt.fill_between(forecast_index, low, high, color="tomato", alpha=0.3, label="80% prediction interval")
plt.legend()
plt.grid()
plt.show()

Extracting Encoder Embeddings

A minimal example showing how to extract encoder embeddings from Chronos models:

import pandas as pd
import torch
from chronos import ChronosPipeline

pipeline = ChronosPipeline.from_pretrained(
    "amazon/chronos-t5-small",
    device_map="cuda",
    torch_dtype=torch.bfloat16,
)

df = pd.read_csv("https://raw.githubusercontent.com/AileenNielsen/TimeSeriesAnalysisWithPython/master/data/AirPassengers.csv")

# context must be either a 1D tensor, a list of 1D tensors,
# or a left-padded 2D tensor with batch as the first dimension
context = torch.tensor(df["#Passengers"])
embeddings, tokenizer_state = pipeline.embed(context)
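
The returned embeddings tensor is the encoder output, roughly of shape [batch_size, context_length (possibly +1 for an EOS token), d_model], and tokenizer_state holds the per-series scale used for mean scaling; for example:

print(embeddings.shape)   # roughly [1, len(df) (+1), d_model] for chronos-t5-small
print(tokenizer_state)    # scale needed to undo the normalization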

Pretraining, fine-tuning and evaluation

Scripts for pretraining, fine-tuning and evaluating Chronos models can be found in this folder.

💾 Datasets

Datasets used in the Chronos paper for pretraining and evaluation (both in-domain and zero-shot) are available through the HuggingFace repos: autogluon/chronos_datasets and autogluon/chronos_datasets_extra. Check out these repos for instructions on how to download and use the datasets.
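
A minimal sketch of loading one of these datasets with the Hugging Face datasets library (the config name "m4_hourly" is one assumed example; see the dataset cards for the available configs):

from datasets import load_dataset

ds = load_dataset("autogluon/chronos_datasets", "m4_hourly", split="train")
ds.set_format("numpy")        # return target/timestamp columns as numpy arrays
print(ds[0]["target"][:10])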

🔥 Coverage

πŸ“ Citation

If you find Chronos models useful for your research, please consider citing the associated paper:

@article{ansari2024chronos,
  author  = {Ansari, Abdul Fatir and Stella, Lorenzo and Turkmen, Caner and Zhang, Xiyuan and Mercado, Pedro and Shen, Huibin and Shchur, Oleksandr and Rangapuram, Syama Sundar and Pineda Arango, Sebastian and Kapoor, Shubham and Zschiegner, Jasper and Maddix, Danielle C. and Wang, Hao and Mahoney, Michael W. and Torkkola, Kari and Gordon Wilson, Andrew and Bohlke-Schneider, Michael and Wang, Yuyang},
  title   = {Chronos: Learning the Language of Time Series},
  journal = {arXiv preprint arXiv:2403.07815},
  year    = {2024}
}

πŸ›‘οΈ Security

See CONTRIBUTING for more information.

📃 License

This project is licensed under the Apache-2.0 License.

chronos-forecasting's People

Contributors

abdulfatir, amazon-auto, canerturkmen, hugosenetaire, huibinshen, lostella, michaelfeil, pixeeai


chronos-forecasting's Issues

HuggingFace interface is broken

Not sure if this is the right place to open an issue, but the HuggingFace interface for using Amazon's Chronos models for time-series forecasting seems to be broken. Using the official code to import a model:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("time-series-forecasting", model="amazon/chronos-t5-large")

KeyError Traceback (most recent call last)
Cell In[3], line 4
1 # Use a pipeline as a high-level helper
2 from transformers import pipeline
----> 4 pipe = pipeline("time-series-forecasting", model="amazon/chronos-t5-large")

File ~/anaconda3/envs/BS5E_data_science/lib/python3.9/site-packages/transformers/pipelines/__init__.py:859, in pipeline(task, model, config, tokenizer, feature_extractor, image_processor, framework, revision, use_fast, token, device, device_map, torch_dtype, trust_remote_code, model_kwargs, pipeline_class, **kwargs)
852 pipeline_class = get_class_from_dynamic_module(
853 class_ref,
854 model,
855 code_revision=code_revision,
856 **hub_kwargs,
857 )
858 else:
--> 859 normalized_task, targeted_task, task_options = check_task(task)
860 if pipeline_class is None:
861 pipeline_class = targeted_task["impl"]

File ~/anaconda3/envs/BS5E_data_science/lib/python3.9/site-packages/transformers/pipelines/__init__.py:543, in check_task(task)
498 def check_task(task: str) -> Tuple[str, Dict, Any]:
499 """
500 Checks an incoming task string, to validate it's correct and return the default Pipeline and Model classes, and
501 default models if they exist.
(...)
541
542 """
--> 543 return PIPELINE_REGISTRY.check_task(task)

File ~/anaconda3/envs/BS5E_data_science/lib/python3.9/site-packages/transformers/pipelines/base.py:1281, in PipelineRegistry.check_task(self, task)
1278 return task, targeted_task, (tokens[1], tokens[3])
1279 raise KeyError(f"Invalid translation task {task}, use 'translation_XX_to_YY' format")
-> 1281 raise KeyError(
1282 f"Unknown task {task}, available tasks are {self.get_supported_tasks() + ['translation_XX_to_YY']}"
1283 )

KeyError: "Unknown task time-series-forecasting, available tasks are ['audio-classification', 'automatic-speech-recognition', 'conversational', 'depth-estimation', 'document-question-answering', 'feature-extraction', 'fill-mask', 'image-classification', 'image-feature-extraction', 'image-segmentation', 'image-to-image', 'image-to-text', 'mask-generation', 'ner', 'object-detection', 'question-answering', 'sentiment-analysis', 'summarization', 'table-question-answering', 'text-classification', 'text-generation', 'text-to-audio', 'text-to-speech', 'text2text-generation', 'token-classification', 'translation', 'video-classification', 'visual-question-answering', 'vqa', 'zero-shot-audio-classification', 'zero-shot-classification', 'zero-shot-image-classification', 'zero-shot-object-detection', 'translation_XX_to_YY']"

# Load model directly
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("amazon/chronos-t5-large")
model = AutoModelForSeq2SeqLM.from_pretrained("amazon/chronos-t5-large")

OSError Traceback (most recent call last)
Cell In[2], line 4
1 # Load model directly
2 from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
----> 4 tokenizer = AutoTokenizer.from_pretrained("amazon/chronos-t5-large")
5 model = AutoModelForSeq2SeqLM.from_pretrained("amazon/chronos-t5-large")

File ~/anaconda3/envs/BS5E_data_science/lib/python3.9/site-packages/transformers/models/auto/tokenization_auto.py:855, in AutoTokenizer.from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
853 tokenizer_class_py, tokenizer_class_fast = TOKENIZER_MAPPING[type(config)]
854 if tokenizer_class_fast and (use_fast or tokenizer_class_py is None):
--> 855 return tokenizer_class_fast.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
856 else:
857 if tokenizer_class_py is not None:

File ~/anaconda3/envs/BS5E_data_science/lib/python3.9/site-packages/transformers/tokenization_utils_base.py:2070, in PreTrainedTokenizerBase.from_pretrained(cls, pretrained_model_name_or_path, cache_dir, force_download, local_files_only, token, revision, trust_remote_code, *init_inputs, **kwargs)
2064 logger.info(
2065 f"Can't load following files from cache: {unresolved_files} and cannot check if these "
2066 "files are necessary for the tokenizer to operate."
2067 )
2069 if all(full_file_name is None for full_file_name in resolved_vocab_files.values()):
-> 2070 raise EnvironmentError(
2071 f"Can't load tokenizer for '{pretrained_model_name_or_path}'. If you were trying to load it from "
2072 "'https://huggingface.co/models', make sure you don't have a local directory with the same name. "
2073 f"Otherwise, make sure '{pretrained_model_name_or_path}' is the correct path to a directory "
2074 f"containing all relevant files for a {cls.__name__} tokenizer."
2075 )
2077 for file_id, file_path in vocab_files.items():
2078 if file_id not in resolved_vocab_files:

OSError: Can't load tokenizer for 'amazon/chronos-t5-large'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'amazon/chronos-t5-large' is the correct path to a directory containing all relevant files for a T5TokenizerFast tokenizer.

I tried running this on my local computer with Linux Mint as well as on AWS SageMaker instances; all attempts have failed.

How to finetune on custom loss function?

Hi, I want to fine-tune the model on my own dataset, but with my own custom loss function. Could you give an example of how to do that? It would be very helpful for my research. I am unclear on how to do this with your model.

Thanks Gopal

FineTuning input dimensions for clarity

Hello there,
to help me and others avoid formatting the data incorrectly for the fine-tuning script: what should the dimensions be when serializing with the following?

    dataset = [
        {"start": start, "target": ts} for ts, start in zip(time_series, start_times)
    ]
    ArrowWriter(compression=compression).write_to_file(
        dataset,
        path=path,
    )

So if I use contextlen=512 and pred_len=64 with numtimeseries=100: should the 'start' variable be an array of length 100, where the i-th element is of datetime64 type and gives the starting point of the i-th sequence (corresponding to the i-th sequence in the ts array), and should each element of the ts array be an array of length 'contextlen' + 'pred_len'?

My additional question: is there a way to set the time step so the model knows the gap between data points, e.g. whether one tick is 10 minutes or 5 minutes?

How to generate forecasts with `prediction_length > 64`?

Hi,

I have time series data that I split into train and test sets (keeping the test set unseen) by slicing the end of the DataFrame. I used your pipeline on data_train and tried, unsuccessfully, to forecast a horizon as long as data_test, as shown below:

#-----------------------------------------------------------
# Libs
#-----------------------------------------------------------
# for plotting, run: pip install pandas matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import torch
from chronos import ChronosPipeline

#-----------------------------------------------------------
# LOAD THE DATASET
#-----------------------------------------------------------

df = pd.read_csv('https://raw.githubusercontent.com/amcs1729/Predicting-cloud-CPU-usage-on-Azure-data/master/azure.csv')
df['timestamp'] =  pd.to_datetime(df['timestamp'])
data = df.rename(columns={'min cpu': 'min_cpu',
                          'max cpu': 'max_cpu',
                          'avg cpu': 'avg_cpu',})



# Data preparation
# ==============================================================================
sliced_df = data[['timestamp', 'avg_cpu']]

# Convert data from Hz to MHz
# ==============================================================================
sliced_df['avg_cpu_Mhz'] = sliced_df['avg_cpu'] / 1000000
sliced_df

# Configuration
# ==============================================================================
name_columns='avg_cpu_Mhz'
lags=288
steps=288
n_backtest=3

step_size = steps * n_backtest
data_train = sliced_df[:-step_size]
data_test  = sliced_df[-step_size:] #unseen

# Pipeline
# ==============================================================================
pipeline = ChronosPipeline.from_pretrained(
    "amazon/chronos-t5-small",
    device_map="cuda",
    torch_dtype=torch.bfloat16,
)

# context must be either a 1D tensor, a list of 1D tensors,
# or a left-padded 2D tensor with batch as the first dimension
context = torch.tensor(data_train['avg_cpu_Mhz'])
prediction_length = 64 #len(data_test) #12

forecast = pipeline.predict(
    context,
    prediction_length,
    num_samples=288, #20,
    temperature=1.0,
    top_k=50,
    top_p=1.0,
) # forecast shape: [num_series, num_samples, prediction_length]

but the result is as follows:

# visualize the forecast
forecast_index = range(len(data_train), len(data_train) + prediction_length)
low, median, high = np.quantile(forecast[0].numpy(), [0.1, 0.5, 0.9], axis=0)

plt.figure(figsize=(8, 4))
plt.plot(data_train['avg_cpu_Mhz'], color="royalblue", label="historical train data")
plt.plot(data_test['avg_cpu_Mhz'] , color="navy",      label="historical test data", linestyle='dashed')
plt.plot(forecast_index, median,    color="tomato",    label="median forecast")
plt.fill_between(forecast_index, low, high, color="tomato", alpha=0.3, label="80% prediction interval")

plt.title('Chronos forecast result')
plt.ylabel(' CPU usage [MHz]',   fontsize=15)
plt.xlabel('Timestamp', fontsize=15)
plt.legend()
plt.grid()
plt.show()

img

  • How can I configure the arguments of predict() to forecast autoregressively over the unseen data_test?
  • Why is prediction_length recommended to be <= 64?
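
One possible workaround for horizons longer than 64 steps (a sketch, not an official recommendation; accuracy typically degrades as forecasts are fed back in as context) is to roll the prediction forward in chunks of at most 64 steps, appending the median forecast to the context each time:

import torch

def rolling_forecast(pipeline, context: torch.Tensor, horizon: int, chunk: int = 64):
    # autoregressively extend forecasts beyond the model's suggested prediction_length
    medians = []
    ctx = context.clone()
    remaining = horizon
    while remaining > 0:
        step = min(chunk, remaining)
        samples = pipeline.predict(ctx, prediction_length=step)  # [1, num_samples, step]
        median = samples[0].median(dim=0).values                 # [step]
        medians.append(median)
        ctx = torch.cat([ctx, median.to(ctx.dtype)])             # feed the forecast back as context
        remaining -= step
    return torch.cat(medians)

extended = rolling_forecast(pipeline, torch.tensor(data_train["avg_cpu_Mhz"].values), horizon=288)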

How to increase the context length?

Thanks for your great work. Chronos achieves good performance on our own dataset with zero-shot prediction. But it seems that Chronos is unable to capture long-period patterns: for example, I have a feature with a weekly cycle and my data has one step every 15 minutes, giving a sequence length of 96 * 7 = 672 for a single cycle, which exceeds the context_length of the model. Is there any way for the model to capture such periodic features, or can I only retrain the model?

How to reproduce the results as shown in the paper?

Hi chronos team,

Thanks for the great work! I would like to know how we can reproduce the results as shown in the paper, e.g., Figure 4. Also could we have some evaluation scripts/code to facilitate the model evaluation?

I am aware that some code snippets are provided at #75. But as mentioned "While many datasets in GluonTS have the same name as the ones used in the paper, they may be different from the evaluation in the paper in crucial aspects such as prediction length and number of rolls.", I wonder if we can have scripts to help us reproduce the results.

Training and fine tuning protocols

In the Chronos paper, training is run for a fixed number of steps ("The models were optimized for 200K steps using the AdamW optimizer with a weight decay of 0.01. The learning rate was annealed linearly from its initial value of 0.001 to 0 over the training steps"). What is the logic behind this configuration? Since there is no downstream-task fine-tuning as with LLM counterparts, how is overfitting avoided? Are there heuristics, e.g. the step count corresponding to roughly 1 or 2 epochs?
Also, fine-tuning is done in a dataset-agnostic fashion with an initial learning rate of 0.001, annealed linearly to 0 over 1000 steps. What are the insights behind this?

ENH: Add Functionality to extract embeddings

Summary

This issue is a suggestion for adding functionality to Chronos that allows users to extract embeddings directly. This feature would greatly enhance the model's utility by enabling more detailed analysis and application in various forecasting tasks.

How to do inference without connecting to HuggingFace?

The connection between my server and Hugging Face is not very reliable. I have already downloaded the model weights. I would like to know whether it is possible to avoid connecting to Hugging Face when calling Chronos, since the connection often takes a lot of time and may fail. Thanks!
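
One way to do this (a sketch; it assumes the huggingface_hub package and that HF_HUB_OFFLINE=1 is exported in the shell before the offline run) is to download the weights once with snapshot_download and then point from_pretrained at the local directory:

from huggingface_hub import snapshot_download

# one-time download while connectivity is available
snapshot_download("amazon/chronos-t5-small", local_dir="./chronos-t5-small")

# later, e.g. with HF_HUB_OFFLINE=1 set in the environment, load from the local path
import torch
from chronos import ChronosPipeline

pipeline = ChronosPipeline.from_pretrained(
    "./chronos-t5-small",
    device_map="cpu",  # or "cuda"
    torch_dtype=torch.bfloat16,
)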

How to do model fine tuning?

Really cool project! Enjoy the paper and have had fun testing it out. Will instructions on fine tuning be released?

Thanks for your time

What is the recommended `torch_dtype`?

Hello there,
What would you recommend as the best torch_dtype parameter, given the trade-offs?
Or was the model trained only using bfloat16?
Thanks for the answer.

Prediction changes for every run

Hi, I was trying to tune the hyperparameters, and every time I run .predict it gives me different predictions (sometimes they converge very well, sometimes very badly). Is there any way to solve this issue?

Thank you!

Inference speed worse on AMD CPU than on Intel CPU

I tested Chronos with an Intel Core CPU (Mac Pro), Linux with an Intel CPU (server), and Linux with an AMD CPU (server), using the same code. The AMD CPU appears to be ~30x slower at inference.

On the Intel CPU it takes approximately 0.7 s with batch_num = 1, predict_len = 1, context_len = 70; on the AMD CPU it takes about 30 s.

I don't know whether this is specific to my case, but I found a report that enabling AMP on an AMD CPU by auto-casting to bfloat16 can decrease performance: Bfloat16 CPU inference speed is too slow on AMD CPU.

I'm quite a newbie with torch, so if someone finds a solution, please post it here. Thanks!

GPU memory usage

Thank you for your contributions with this library!

Quick question on GPU memory usage. I haven't examined the underlying library code in-depth, but I'm noticing a more-than-linear increase in GPU memory usage with the number of samples that are requested.

I'm seeing the large model at bfloat16 taking up about 1.5GB in GPU memory which is what I was expecting based on the T5 documentation.
With a 4000 element time series and num_samples=100, I'm seeing my GPU memory usage increase to 7.5GB.
Doubling the num_samples to 200, increases the memory usage to over 17GB.

Just curious if you might be able to share more information surrounding GPU memory usage and any best practices on managing.

Thanks!

How to use the pre-trained or fine-tuned model for high-frequency and long-term data?

Hello, I am interested in using this model for predicting high-frequency (1s) and long-term (1e6 to 5e7 s) data. I fine-tuned the chronos-t5-mini model with the configuration below:

training_data_paths:
- "<path_to_my_arrow_file>"
probability:
- 1.0
context_length: 512
prediction_length: 64
min_past: 60
max_steps: 200_000
save_steps: 100_000
log_steps: 500
per_device_train_batch_size: 32
learning_rate: 0.001
optim: adamw_torch_fused
num_samples: 20
shuffle_buffer_length: 100_000
gradient_accumulation_steps: 1
model_id: google/t5-efficient-mini
model_type: seq2seq
random_init: true
tie_embeddings: true
output_dir: ./output/
tf32: true
torch_compile: true
tokenizer_class: "MeanScaleUniformBins"
tokenizer_kwargs:
  low_limit: -15.0
  high_limit: 15.0
n_tokens: 4096
lr_scheduler_type: linear
warmup_ratio: 0.0
dataloader_num_workers: 1
max_missing_prop: 0.9
use_eos_token: true

As the model is suggested to predict at most 64 time steps at a time, I made the model predict 64 steps and then used the predictions as context to request the next 64 predictions. I found that the predictions performed quite well for the first 6 rounds. From the 7th round on, the amplitude of the predictions dropped sharply and the predictions converged to 0, as shown in the plot below. Even though the model performs quite well up to around 1000 steps, that is far from the length I need. I would like to ask whether you have tested any case like this, or whether you have any suggestions. I have thought about fine-tuning the model with a larger context and prediction length, but that cannot solve the fundamental problem due to the limitation of GPU memory.

image

How to set num_samples, top_k, top_p and temperature?

Opening this as a FAQ.

  • num_samples specifies the number of sample forecast paths that will be generated to construct the probabilistic forecast. We used 20 samples in the paper for all models. Using a larger value will improve the estimation of the quantiles, likely resulting in better probabilistic metrics.
  • top_k, top_p and temperature mean the same thing as in the case of LLMs. Check out transformers documentation for details. We used the defaults from the transformers library and did not tune these parameters. However, one might improve the accuracy of the forecasts further by carefully selecting these parameters (Let us know if you do that and have some insights!). Particularly, selecting a larger value for top_k may lead to a better coverage for certain time series. In our qualitative analysis (e.g., Fig. 12 in the paper), we set top_k = vocab_size which led to better prediction intervals than the default value of 50.
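
For example, reusing pipeline and context from the Usage section above, a call with a larger sample count and an effectively disabled top-k could look like this (a sketch; 4096 is the Chronos vocabulary size):

forecast = pipeline.predict(
    context,
    prediction_length=12,
    num_samples=100,   # more sample paths -> better quantile estimates (the paper uses 20)
    temperature=1.0,
    top_k=4096,        # = vocabulary size, i.e. effectively no top-k truncation
    top_p=1.0,
)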

More details on input shapes?

From the docs:

# context must be either a 1D tensor, a list of 1D tensors,
# or a left-padded 2D tensor with batch as the first dimension
# forecast shape: [num_series, num_samples, prediction_length]

Where could additional details on the interpretation and limitations of the inputs be found? Tangentially, replying to @abdulfatir in #13

If you have specific multivariate use cases/datasets to share with us, please do. It will helpful for us to understand the types of practical multivariate problems.

Suppose I want to model raster time-series for rainfall with this toy example:

import matplotlib.pyplot as plt
import numpy as np

# Create three 3x3 rasters to mimic rainfall over time
raster1 = np.array([[0, 2, 5], [7, 8, 1], [3, 6, 4]])
raster2 = np.array([[1, 4, 7], [5, 2, 3], [8, 6, 0]])
raster3 = np.array([[2, 5, 8], [6, 3, 7], [1, 4, 9]])

# Set up the subplots with shared axes
fig, axs = plt.subplots(1, 3, figsize=(12, 4), sharex=True, sharey=True)

# Plot the rasters
cax1 = axs[0].matshow(raster1, cmap="Blues")
cax2 = axs[1].matshow(raster2, cmap="Blues")
cax3 = axs[2].matshow(raster3, cmap="Blues")

# Add colorbars
fig.colorbar(cax1, ax=axs[0])
fig.colorbar(cax2, ax=axs[1])
fig.colorbar(cax3, ax=axs[2])

# Set titles
axs[0].set_title("Rainfall Day 1")
axs[1].set_title("Rainfall Day 2")
axs[2].set_title("Rainfall Day 3")

# Display the plot
plt.tight_layout()
plt.show()

image

Could these be modeled with the ... list of 1D tensors from the docs, by flattening? If so, how should each 1D tensor in the list be interpreted? Or is this not a valid use case? I have so far tried:

Flattening the list of tensors into one 1D tensor

# context must be either a 1D tensor, a list of 1D tensors,
# or a left-padded 2D tensor with batch as the first dimension
my_data = np.array([raster1, raster2, raster3]).flatten()
context = torch.tensor(my_data)
prediction_length = 9
forecast = pipeline.predict(
    context, prediction_length
)  # shape [num_series, num_samples, prediction_length]
# visualize the forecast
forecast_index = range(len(my_data), len(my_data) + prediction_length)
low, median, high = np.quantile(forecast[0].numpy(), [0.1, 0.5, 0.9], axis=0)
plt.figure(figsize=(8, 4))
plt.plot(my_data, color="royalblue", label="historical data")
plt.plot(forecast_index, median, color="tomato", label="median forecast")
plt.fill_between(
    forecast_index,
    low,
    high,
    color="tomato",
    alpha=0.3,
    label="80% prediction interval",
)
plt.legend()
plt.grid()
plt.show()

image

Or, if each raster is flattened into a list of 1D tensors, is that a more appropriate representation to model? Visually this looks incorrect to me.

# context must be either a 1D tensor, a list of 1D tensors,
# or a left-padded 2D tensor with batch as the first dimension
my_data = np.array([raster1.flatten(), raster2.flatten(), raster3.flatten()]) # <-- ONLY DIF IS HERE
context = torch.tensor(my_data)
prediction_length = 9
forecast = pipeline.predict(
    context, prediction_length
)  # shape [num_series, num_samples, prediction_length]
# visualize the forecast
forecast_index = range(len(my_data), len(my_data) + prediction_length)
low, median, high = np.quantile(forecast[0].numpy(), [0.1, 0.5, 0.9], axis=0)
plt.figure(figsize=(8, 4))
plt.plot(my_data, color="royalblue", label="historical data")
plt.plot(forecast_index, median, color="tomato", label="median forecast")
plt.fill_between(
    forecast_index,
    low,
    high,
    color="tomato",
    alpha=0.3,
    label="80% prediction interval",
)
plt.legend()
plt.grid()
plt.show()

image

About KERNEL_BANK in scripts\kernel-synth.py

Are there any specific rules for the list of kernel functions in KERNEL_BANK, or are random combinations of kernel functions acceptable just to increase diversity?
Can I use other functions in gpytorch.kernels instead?
Looking forward to your reply, thank you~

About the T5 Architecture

In my experiments, I have found that Chronos' inference time is significantly related to the prediction length, and not so much to the historical context length. I don't know much about NLP. I'm curious if T5 is an autoregressive architecture similar to GPT, where it has to generate sequentially one by one, or if it can output all the values at once in parallel (with the help of mask). Thanks!

ValueError: `decoder_start_token_id` or `bos_token_id` has to be defined for encoder-decoder generation, can't get Chronos to run today

Yesterday Chronos was working great; today, for some reason, it's throwing this error, even with the sample code using the airline passenger data:

ValueError: decoder_start_token_id or bos_token_id has to be defined for encoder-decoder generation

Chronos is throwing the same error for both AWS Sagemaker in Jupyter Lab and also ML-Azure, so the error seems to be platform independent.

thanks,

What is the size of embeddings?

I'm interested in using embeddings generated by Chronos for training a downstream anomaly detection model. For these models, if I use your sample example for generation of embeddings, I get 144 embedding vectors, which is the same length of time series in the example you provide. However, with my test data case, I have a time series of length 300000, and when I run that through your model for embedding generation I end up with 512 embedding vectors. Is there an upper bound of time series length that I should be using with this model, or is this expected output? Thanks so much!

How to scale binned predicted values back to the raw value scale?

I have a question about scaling back to the raw data scale in the paper. I've read both the paper and the reference paper on quantization of time series, but I'm still unsure. I understand that the data is transformed and then binned, and the LLM predicts a bin index. My question is: how do we scale it back to the raw data scale? Do we use the mid-point value of the bin? I'm concerned that this approach may introduce a resolution issue in the results.
Thanks!
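
For reference, a sketch of the inverse mapping as described in the paper: predicted bin indices are mapped to bin centers and multiplied back by the per-series scale (the exact bin count and limits here are assumptions; see the MeanScaleUniformBins tokenizer in the code for the real values):

import torch

low, high, n_bins = -15.0, 15.0, 4094          # assumed values
centers = torch.linspace(low, high, n_bins)    # one representative value per bin

def dequantize(bin_indices: torch.Tensor, scale: float) -> torch.Tensor:
    # bin index -> bin center -> undo mean scaling
    return centers[bin_indices] * scale

# worst-case rounding error is half a bin width (in scaled units) times the scale
print((high - low) / (n_bins - 1) / 2)         # ~0.0037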

Adjusting training script for early stopping callback

Hello there,
I made these adjustments to train.py, adding the following to my config and to the script's main():

eval_data_paths: str,
patience: int = 20,
frequency: str = "5min"

eval_datasets = [
    Filter(
        partial(
            has_enough_observations,
            min_length=min_past + prediction_length,
            max_missing_prop=max_missing_prop,
        ),
        FileDataset(path=Path(data_path), freq=frequency),
    )
    for data_path in eval_data_paths
]

eval_dataset = ChronosDataset(
        datasets=eval_datasets,
        probabilities=probability,
        tokenizer=chronos_config.create_tokenizer(),
        context_length=context_length,
        prediction_length=prediction_length,
        min_past=min_past,
        mode="validation",
    )

Inside the training_args I put:

load_best_model_at_end=True,
metric_for_best_model="eval_loss", 
greater_is_better=False,  

and inside the Trainer I put:

logging_strategy="epoch",
save_strategy="epoch",
eval_strategy="epoch",
eval_dataset=eval_dataset,
callbacks=[EarlyStoppingCallback(early_stopping_patience=patience)]

I tried fine-tuning the 'small' model with 1K epochs; however, I am not sure the early stopping had any effect, since training ran for the full 1K epochs and serialization produced both checkpoint-1000 and checkpoint-final as outputs of that run.

Here I would like to ask: if I set the number of epochs to 1K, does checkpoint-final contain the best model over the whole training session, while checkpoint-1000 holds the weights after all 1K epochs? Is that what I should assume?

Predictions change with batch size of context

First of all thanks for this great pretrained model!

I am however facing an issue wrt the consistency of the model's predictions with inputs of different batch sizes.
Since this is essentially a univariate model, I expected it to have same predictions for the same input context for each time-series irrespective of the batch size.

Some context:
My team is experimenting with Chronos to predict the yearly demand (365 days) for certain products within our company.
I have fine-tuned the model with our internal data to have a context_length of 512 and prediction_length of 365. Now I am evaluating the performance of this model for our use-case.

To reproduce:
(Sorry, cannot share the input data due to company policies, but including a scaled visualization of the context and forecast.)
I create two batched inputs, one of size 8 ($B_1$) and the other of size 32 ($B_2$). Here, $B_1 \subset B_2$, and I focus on the forecasts for one time series $T_1$ contained in both batches.

To produce the forecast I use (replacing the context (shape: [batch_size, 512]) with $B_1$ and $B_2$):

with torch.no_grad():
    transformers.set_seed(seed=seed)
    forecast = pipeline.predict(
        context,
        prediction_length=365,
        num_samples=20,
        temperature=2.0,
        top_k=200,
        top_p=1.0
    )
    low, median, high = np.quantile(forecast.numpy(), [0.1, 0.5, 0.9], axis=1)

The issue is that the predicted (median) values for $T_1$ from $B_1$ and $B_2$ are different, even though the input for this series is the same in both batches.
Showing the visualization of the context and predicted values using different batch size below.
image

Is there anything I am missing here or is this expected?
Thanks!

What is the loss function used by the model?

Hello there,
I would like to ask about the loss function, given that I want to create my own loss function, say a moving-average MASE, for the model. Everything is in place, except that the model outputs contain the loss, logits, and other fields, but no predicted values directly.
So, is there a way to use the tokenizer that created the input_ids, labels and attention_mask to turn the logits back into predicted values, i.e. the reverse operation?

Fine-tuning on a single time series

Hello,

I only have a single time series, and I want to do forecasting on it. Does it make sense to do fine-tuning in this case?

I was thinking maybe I could split the data chronologically (use data from 2020 to 2022 for training and data from 2022 to 2024 for testing), but I'm not sure if that makes sense.

Thanks

Pretraining dataset and recipe

Hello, would it be possible to also release the pretraining dataset (used for TSMixup), and perhaps describe a successful training recipe?

I would like to try pretraining from scratch as well, extending the vocabulary size much further, and using a custom dataset.

Why was the number of tokens reduced for these Chronos models compared to the T5 models?

Hello there,
I would like to ask why the vocabulary was reduced to only 4096 tokens compared to the model it was built from.
And if I have the compute, wouldn't I be better off using the original model Chronos was based on, given the larger number of tokens?
I am guessing it would just be an untrained model, but the upside would be that I could perhaps use covariates?
Thanks for answering.

How to evaluate the models' performance through metrics such as MASE?

Hi Chronos team--

Howdy!! I'm a PhD student in the States and I'm using this as a baseline for my research... thanks for building this model :)

I'm currently implementing evaluation metrics like in the paper to work for the Chronos model, and I'm starting with MASE. One thing that's unclear to me at the moment: in Appendix D of the arXiv preprint, the authors say that the MASE computation involves a seasonality parameter $S$ from the seasonal naive model.

What seasonality parameter should I use to obtain metrics similar to how the authors did it in the paper? In other scenarios, I've seen that some people try to automatically compute a seasonality S for each dataset; I've also seen people use information about the original dataset to select $S$ (e.g. if it's a taxi dataset with hourly counts, then choosing $S=7*24$ would be a reasonable heuristic); and I've seen other people just use $S=1$, but that to me seems like a "seasonal very naive model".

Thanks in advance for your help!!

Cheers
Nate

How to perform inference on large datasets?

Opening this as a FAQ.

The pipeline.predict interface accepts either a 1D/2D tensor or a list of tensors. If you want to perform inference on a large dataset, you can either:

  • Send batches of shape [batch_size, context_length] to the predict function in a loop over batches in your dataset. Note: you would need to pad the time series with torch.nan on the left, if they don't have the same length.
  • (Easier) Send lists of tensors of length batch_size to the predict function in a loop over batches in your dataset. No need to pad here, it will be done internally.

If you're running OOM, decrease the batch_size.
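
A sketch of the second (list-of-tensors) approach, assuming pipeline is already loaded and all_series is a list of 1D tensors of possibly different lengths:

import torch

batch_size = 32
prediction_length = 12

forecasts = []
for i in range(0, len(all_series), batch_size):
    batch = all_series[i : i + batch_size]       # padding is handled internally
    forecasts.append(pipeline.predict(batch, prediction_length))

forecasts = torch.cat(forecasts)  # [num_series, num_samples, prediction_length]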

How to do time series classification?

Hi all,

Thanks for open sourcing this library.

I am working on the task of classifying numeric, multivariate series. I wanted to know how I can use Chronos to achieve that.

Thanks!
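
Chronos itself is a forecaster, but one possible (unofficial) approach is to use the encoder embeddings as fixed features for a separate classifier. A sketch, assuming X is a list of 1D tensors (one per series) and y holds the class labels:

import torch
from sklearn.linear_model import LogisticRegression
from chronos import ChronosPipeline

pipeline = ChronosPipeline.from_pretrained(
    "amazon/chronos-t5-small",
    device_map="cpu",
    torch_dtype=torch.float32,
)

def featurize(series_list):
    # mean-pool the encoder embeddings of each series into a single feature vector
    embeddings, _ = pipeline.embed(series_list)
    return embeddings.mean(dim=1).float().numpy()

clf = LogisticRegression(max_iter=1000).fit(featurize(X), y)

For multivariate series, one simple option is to embed each channel separately and concatenate the pooled vectors before fitting the classifier.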

How to utilize META information for forecasting

It seems that Chronos only supports univariate input, but if there is additional information (meta information such as events) at the future time points we want to predict, can we utilize it?

Use efficient implementation of attention

I am wondering what's the best way to use efficient implementations of attention. PyTorch provides the experimental torch.nn.functional.scaled_dot_product_attention (SDPA) which supports three different implementations, including flash attention. Unfortunately, we cannot use flash attention because it doesn't support arbitrary attention masks yet (something which is critical for Chronos). It's not clear when attention mask support will be added to flash attention (see Dao-AILab/flash-attention#840). Meanwhile, SDPA falls back to another efficient implementation when a mask is provided.

I monkey patched the T5Attention implementation in transformers and here are the results (script below).

Results

TL;DR: SDPA is clearly faster than the implementation in transformers that we're currently using, even without flash attention.

V100 (float32)

Note: V100 doesn't support bfloat16, so SDPA won't work with bf16 because the custom kernels won't exist.

Using transformers (current version):

                         MASE[0.5]  mean_weighted_sum_quantile_loss  inference_time
model                                                                              
amazon/chronos-t5-base    0.907140                         0.029036      108.118132
amazon/chronos-t5-large   0.950026                         0.021954      313.266375
amazon/chronos-t5-mini    0.874078                         0.024838       21.096206
amazon/chronos-t5-small   0.858876                         0.026758       31.885802
amazon/chronos-t5-tiny    1.001285                         0.029381       11.453301

Using SDPA:

                         MASE[0.5]  mean_weighted_sum_quantile_loss  inference_time
model                                                                              
amazon/chronos-t5-base    0.906459                         0.028953       92.497118
amazon/chronos-t5-large   0.943967                         0.021321      278.541993
amazon/chronos-t5-mini    0.867597                         0.026133       17.471496
amazon/chronos-t5-small   0.861364                         0.026423       26.355608
amazon/chronos-t5-tiny    0.983139                         0.028681        9.756106

A100 (float32)

Using transformers (current version):

                         MASE[0.5]  mean_weighted_sum_quantile_loss  inference_time
model                                                                              
amazon/chronos-t5-base    0.907520                         0.029853       76.029036
amazon/chronos-t5-large   0.938383                         0.021884      217.341671
amazon/chronos-t5-mini    0.875678                         0.025812       13.985228
amazon/chronos-t5-small   0.860030                         0.025327       20.903673
amazon/chronos-t5-tiny    0.984638                         0.029327        8.722677

Using SDPA:

                         MASE[0.5]  mean_weighted_sum_quantile_loss  inference_time
model                                                                              
amazon/chronos-t5-base    0.901114                         0.029077       63.078673
amazon/chronos-t5-large   0.944282                         0.022607      185.249409
amazon/chronos-t5-mini    0.870160                         0.026177       11.738740
amazon/chronos-t5-small   0.850184                         0.026167       18.250515
amazon/chronos-t5-tiny    0.975677                         0.029291        8.546939

A100 (bfloat16)

Using transformers (current version):

                         MASE[0.5]  mean_weighted_sum_quantile_loss  inference_time
model                                                                              
amazon/chronos-t5-base    0.903433                         0.026808       52.598027
amazon/chronos-t5-large   0.945507                         0.022141      149.007310
amazon/chronos-t5-mini    0.874791                         0.024425       10.292101
amazon/chronos-t5-small   0.871871                         0.027540       14.947764
amazon/chronos-t5-tiny    0.994311                         0.030779        7.021869

Using SDPA:

                         MASE[0.5]  mean_weighted_sum_quantile_loss  inference_time
model                                                                              
amazon/chronos-t5-base    0.902784                         0.029677       36.885420
amazon/chronos-t5-large   0.938067                         0.020137      134.648429
amazon/chronos-t5-mini    0.867450                         0.025005        5.402657
amazon/chronos-t5-small   0.861055                         0.027413        7.715756
amazon/chronos-t5-tiny    0.979267                         0.029882        5.227138

Script

import timeit

import numpy as np
import pandas as pd
import torch
from gluonts.dataset.repository import get_dataset
from gluonts.dataset.split import split
from gluonts.ev.metrics import MASE, MeanWeightedSumQuantileLoss
from gluonts.model.evaluation import evaluate_forecasts
from gluonts.model.forecast import SampleForecast
from torch.nn.functional import scaled_dot_product_attention as sdpa
from transformers.models.t5.modeling_t5 import T5Attention

from chronos import ChronosPipeline


def sdpa_forward(
    self,
    hidden_states,
    mask=None,
    key_value_states=None,
    position_bias=None,
    past_key_value=None,
    layer_head_mask=None,
    query_length=None,
    use_cache=False,
    output_attentions=False,
):
    """
    Self-attention (if key_value_states is None) or attention over source sentence (provided by key_value_states).
    """
    # Input is (batch_size, seq_length, dim)
    # Mask is (batch_size, key_length) (non-causal) or (batch_size, key_length, key_length)
    # past_key_value[0] is (batch_size, n_heads, q_len - 1, dim_per_head)
    batch_size, seq_length = hidden_states.shape[:2]

    real_seq_length = seq_length

    if past_key_value is not None:
        if len(past_key_value) != 2:
            raise ValueError(
                f"past_key_value should have 2 past states: keys and values. Got { len(past_key_value)} past states"
            )
        real_seq_length += (
            past_key_value[0].shape[2] if query_length is None else query_length
        )

    key_length = (
        real_seq_length if key_value_states is None else key_value_states.shape[1]
    )

    def shape(states):
        """projection"""
        return states.view(
            batch_size, -1, self.n_heads, self.key_value_proj_dim
        ).transpose(1, 2)

    def unshape(states):
        """reshape"""
        return states.transpose(1, 2).contiguous().view(batch_size, -1, self.inner_dim)

    def project(hidden_states, proj_layer, key_value_states, past_key_value):
        """projects hidden states correctly to key/query states"""
        if key_value_states is None:
            # self-attn
            # (batch_size, n_heads, seq_length, dim_per_head)
            hidden_states = shape(proj_layer(hidden_states))
        elif past_key_value is None:
            # cross-attn
            # (batch_size, n_heads, seq_length, dim_per_head)
            hidden_states = shape(proj_layer(key_value_states))

        if past_key_value is not None:
            if key_value_states is None:
                # self-attn
                # (batch_size, n_heads, key_length, dim_per_head)
                hidden_states = torch.cat([past_key_value, hidden_states], dim=2)
            elif past_key_value.shape[2] != key_value_states.shape[1]:
                # checking that the `sequence_length` of the `past_key_value` is the same as
                # the provided `key_value_states` to support prefix tuning
                # cross-attn
                # (batch_size, n_heads, seq_length, dim_per_head)
                hidden_states = shape(proj_layer(key_value_states))
            else:
                # cross-attn
                hidden_states = past_key_value
        return hidden_states

    # get query states
    query_states = shape(
        self.q(hidden_states)
    )  # (batch_size, n_heads, seq_length, dim_per_head)

    # get key/value states
    key_states = project(
        hidden_states,
        self.k,
        key_value_states,
        past_key_value[0] if past_key_value is not None else None,
    )
    value_states = project(
        hidden_states,
        self.v,
        key_value_states,
        past_key_value[1] if past_key_value is not None else None,
    )

    if position_bias is None:
        if not self.has_relative_attention_bias:
            position_bias = torch.zeros(
                (1, self.n_heads, real_seq_length, key_length),
                device=query_states.device,
                dtype=query_states.dtype,
            )
            if self.gradient_checkpointing and self.training:
                position_bias.requires_grad = True
        else:
            position_bias = self.compute_bias(
                real_seq_length, key_length, device=query_states.device
            )

        # if key and values are already calculated
        # we want only the last query position bias
        if past_key_value is not None:
            position_bias = position_bias[:, :, -hidden_states.size(1) :, :]

        if mask is not None:
            position_bias = (
                position_bias + mask
            )  # (batch_size, n_heads, seq_length, key_length)

    if self.pruned_heads:
        mask = torch.ones(position_bias.shape[1])
        mask[list(self.pruned_heads)] = 0
        position_bias_masked = position_bias[:, mask.bool()]
    else:
        position_bias_masked = position_bias

    assert layer_head_mask is None, "Cannot use layer_head_mask when using SDPA kernel"
    assert not output_attentions, "Cannot output attn_weights when using SDPA kernel"
    attn_output = unshape(
        sdpa(
            query_states,
            key_states,
            value_states,
            attn_mask=position_bias_masked,
            dropout_p=self.dropout if self.training else 0.0,
            scale=1.0,
        )
    )
    attn_output = self.o(attn_output)
    present_key_value_state = (
        (key_states, value_states) if (self.is_decoder and use_cache) else None
    )
    outputs = (attn_output,) + (present_key_value_state,) + (position_bias,)

    return outputs


def benchmark_model(
    pipeline: ChronosPipeline,
    gluonts_dataset: str = "m4_hourly",
    batch_size: int = 32,
):
    dataset = get_dataset(gluonts_dataset)
    prediction_length = dataset.metadata.prediction_length
    _, test_template = split(dataset.test, offset=-prediction_length)
    test_data = test_template.generate_instances(prediction_length)
    test_data_input = list(test_data.input)

    start_time = timeit.default_timer()
    forecasts = []
    for idx in range(0, len(test_data_input), batch_size):
        batch = [
            torch.tensor(item["target"])
            for item in test_data_input[idx : idx + batch_size]
        ]
        batch_forecasts = pipeline.predict(batch, prediction_length)
        forecasts.append(batch_forecasts)
    forecasts = torch.cat(forecasts)
    end_time = timeit.default_timer()

    print(f"Inference time: {end_time-start_time:.2f}s")

    results_df = evaluate_forecasts(
        forecasts=[
            SampleForecast(fcst.numpy(), start_date=label["start"])
            for fcst, label in zip(forecasts, test_data.label)
        ],
        test_data=test_data,
        metrics=[MASE(), MeanWeightedSumQuantileLoss(np.arange(0.1, 1, 0.1))],
    )
    results_df["inference_time"] = end_time - start_time
    return results_df


if __name__ == "__main__":
    gluonts_dataset = "m4_hourly"
    models = [
        "amazon/chronos-t5-tiny",
        "amazon/chronos-t5-mini",
        "amazon/chronos-t5-small",
        "amazon/chronos-t5-base",
        "amazon/chronos-t5-large",
    ]
    batch_sizes = [64, 32, 32, 8, 4]

    # Comment out the following line to run the regular transformers version
    T5Attention.forward = sdpa_forward  # Monkey patch forward

    results = []
    for model_name, batch_size in zip(models, batch_sizes):
        pipeline = ChronosPipeline.from_pretrained(
            model_name,
            device_map="cuda:0",
            torch_dtype=torch.float32,
        )
        result_df = benchmark_model(
            pipeline, gluonts_dataset=gluonts_dataset, batch_size=batch_size
        )
        result_df["model"] = model_name
        print(result_df)
        results.append(result_df)
    results = pd.concat(results).set_index("model").sort_index()
    print(results)

>64 sample sizes returns results for 64 samples.

Hi there! I recently utilized Chronos for a school project and encountered an issue regarding prediction length. Specifically, when setting the prediction length higher than 64 and using the sample size as the prediction length, Chronos consistently returns the same predictions, regardless of the specified length. Despite changing the prediction length, the predictions remain consistent at [64, 64, 64, 9].

        
df = ali[['date', 'open']]
context = torch.tensor(df["open"]) 
prediction_length = 201  
forecast = pipeline.predict(
    context,
    prediction_length,
    num_samples= 201,
    temperature=1.0,
    limit_prediction_length=False,
    top_k=50,
    top_p=1.0,
)
    

And the result is;
Ekran Resmi 2024-04-18 08 54 17

How to make predictions deterministic?

Hello everyone,
How can I prevent it from producing different predictions after each training run?
Of course it is an LLM, but the problem is that the predictions change every time; for example, my MAPE value fluctuates a lot, sometimes better, sometimes worse.
Do you have any recommendations?

Thank you in advance.
Abdül
