
SparseTSF

Welcome to the official repository of the SparseTSF paper: "SparseTSF: Modeling Long-term Time Series Forecasting with 1k Parameters"

Updates

🚩 News (2024.07) We have fixed a long-standing bug in the code framework (see the descriptions in FITS and TFB) and supplemented the full results (including MSE and MAE) of SparseTSF after the fix in the table below.

🚩 News (2024.06) SparseTSF paper has been selected for an Oral presentation at ICML 2024 (acceptance rate less than 1.5%).

🚩 News (2024.05) SparseTSF has been accepted as a paper at ICML 2024, receiving an average rating of 7 with a confidence of 4.5.

Introduction

SparseTSF is a novel, extremely lightweight model for Long-term Time Series Forecasting (LTSF). At the heart of SparseTSF lies the Cross-Period Sparse Forecasting technique, which simplifies the forecasting task by decoupling the periodicity and trend in time series data.

Technically, it first downsamples the original sequences with constant periodicity into subsequences, then performs predictions on each downsampled subsequence, simplifying the original time series forecasting task into a cross-period trend prediction task.


Intuitively, SparseTSF can be perceived as a sparsely connected linear layer performing sliding prediction across periods, as sketched below.
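To make this concrete, here is a minimal sketch of the cross-period idea in PyTorch. This is our illustration under stated assumptions, not the official implementation: the class name SparseForecaster and the single-channel interface are ours, and the released model additionally aggregates information within each period via a 1D convolution before the sparse prediction step.

import torch
import torch.nn as nn

class SparseForecaster(nn.Module):
    """Cross-period sparse forecasting sketch: one linear layer shared
    across the w downsampled subsequences (assumes the lookback L and
    horizon H are both divisible by the period length w)."""
    def __init__(self, seq_len: int, pred_len: int, period_len: int):
        super().__init__()
        assert seq_len % period_len == 0 and pred_len % period_len == 0
        self.w = period_len
        # Shared layer mapping each subsequence of length L/w to length H/w.
        self.linear = nn.Linear(seq_len // period_len,
                                pred_len // period_len, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, L = x.shape  # x: (batch, seq_len), a single channel
        # Downsample: subsequence j collects every w-th point at phase j.
        sub = x.reshape(b, L // self.w, self.w).transpose(1, 2)  # (b, w, L/w)
        out = self.linear(sub)                                   # (b, w, H/w)
        # Upsample: interleave the predicted subsequences back into one series.
        return out.transpose(1, 2).reshape(b, -1)                # (b, pred_len)

model = SparseForecaster(seq_len=720, pred_len=720, period_len=24)
print(sum(p.numel() for p in model.parameters()))  # 900 trainable weights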


This approach yields two benefits: (i) effective decoupling of data periodicity and trend, enabling the model to stably identify and extract periodic features while focusing on predicting trend changes, and (ii) extreme compression of the model's parameter size, significantly reducing the demand for computational resources.


SparseTSF achieves near state-of-the-art prediction performance with fewer than 1k trainable parameters, making it 1 to 4 orders of magnitude smaller than its counterparts.


Additionally, SparseTSF showcases remarkable generalization capabilities (cross-domain), making it well-suited for scenarios with limited computational resources, small samples, or low-quality data.


From the distribution of normalized weights for both the trained Linear model and the SparseTSF model, it can be observed that SparseTSF learns more distinct, evenly spaced weight distribution stripes compared to the Linear model. This indicates that SparseTSF has a stronger capability in extracting periodic features. This benefit arises because the Sparse technique enables the model to focus more effectively on cross-period historical information.


Note a special case where the dataset's period is excessively large (for instance, a period of 144 for ETTm1). Resampling with too large a period yields very short subsequences with sparse connections, leading to underutilization of information. In such cases, setting the period length to a small value (e.g., 2 to 6), i.e., adopting a denser sparse strategy, can be beneficial. This may be because an appropriate sparse strategy helps the model focus on useful information and reduces the influence of irrelevant noise. We will continue to explore this aspect in future research.


Examining SparseTSF's performance under varying input lengths, we observe a significant performance shift between input lengths of 96 and 192 on the Electricity and Traffic datasets. This is because Traffic not only has a significant daily periodic pattern (w = 24) but also a noticeable weekly periodic pattern (w = 168); a look-back of 96 cannot cover the entire weekly pattern, leading to a significant performance drop.


This underscores the necessity of sufficiently long look-back lengths (at least covering the entire cycle length) for accurate prediction. Given the extremely lightweight nature of SparseTSF, we strongly recommend providing sufficiently long look-back windows whenever feasible. Therefore, SparseTSF defaults to using an input length of 720. Even with this configuration, predicting a super long horizon of 720 on these datasets requires only 925 parameters (less than 1K).
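As a rough sanity check on that count (our arithmetic based on the released code, not a breakdown quoted in the paper): with period w = 24, the shared cross-period linear layer holds (720 / 24) × (720 / 24) = 900 weights, and the intra-period aggregation convolution, whose kernel size is 1 + 2 * (24 // 2) = 25, contributes the remaining 25 parameters, for a total of 900 + 25 = 925.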

Getting Started

Environment Requirements

To get started, ensure you have Conda installed on your system and follow these steps to set up the environment:

conda create -n SparseTSF python=3.8
conda activate SparseTSF
pip install -r requirements.txt

Data Preparation

All the datasets needed for SparseTSF can be obtained from the Google Drive link provided in Autoformer. Create a separate folder named ./dataset and place all the CSV files directly in this directory (e.g., "./dataset/ETTh1.csv"), without nesting them in subfolders.

Training Example

You can easily reproduce the results from the paper by running the provided script command. For instance, to reproduce the main results, execute the following command:

sh run_all.sh

Similarly, you can run individual scripts for independent tasks, such as obtaining results on ETTh1:

sh scripts/SparseTSF/etth1.sh

Usage on Your Data

SparseTSF relies on the inherent periodicity in the data. If you intend to use SparseTSF on your data, please first ascertain whether your data exhibits periodicity, which can be determined through ACF analysis.

We provide an example in the ACF_ETTh1.ipynb notebook for determining the primary period of the ETTh1 dataset. You can use it to ascertain the periodicity of your dataset and set the period_len parameter accordingly. Alternatively, set it to a small value (e.g., 2 to 6) when the period length is excessively large, as mentioned earlier.
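If you prefer a quick script to the notebook, the following minimal sketch (ours, not the repository's; it assumes statsmodels and scipy are installed and uses the conventional 'OT' target column of ETTh1) reads off candidate periods as local peaks of the autocorrelation function:

import pandas as pd
from scipy.signal import find_peaks
from statsmodels.tsa.stattools import acf

# Load one target channel; 'OT' is the standard target in the ETT datasets.
series = pd.read_csv("dataset/ETTh1.csv")["OT"]

# Autocorrelation up to 200 lags; local maxima are candidate period lengths.
lags = acf(series, nlags=200)
peaks, _ = find_peaks(lags)
print("candidate periods (local ACF peaks):", peaks[:5])
# For hourly ETTh1 the first pronounced peak sits at lag 24, so period_len = 24.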

Further Reading

The objective of this work is to explore an ultra-lightweight yet sufficiently powerful method that is applicable in resource-limited edge scenarios and that supports transfer learning and generalization on small datasets.

If you seek higher predictive performance, we recommend our alternative work, SegRNN, which is an innovative RNN-based model specifically designed for LTSF. By integrating Segment-wise Iterations and Parallel Multi-step Forecasting (PMF) strategies, SegRNN achieves state-of-the-art results with just a single layer of GRU, making it extremely lightweight and efficient.

Citation

If you find this repo useful, please cite our paper.

@article{lin2024sparsetsf,
  title={SparseTSF: Modeling Long-term Time Series Forecasting with 1k Parameters},
  author={Lin, Shengsheng and Lin, Weiwei and Wu, Wentai and Chen, Haojun and Yang, Junjie},
  journal={arXiv preprint arXiv:2405.00946},
  year={2024}
}

Full results

There was a long-standing bug in our framework whereby the last batch of data was discarded during the testing phase (i.e., drop_last=True). This may have affected measured performance, especially when using a large batch size on small datasets. We have now fixed this issue by setting drop_last=False for testing (see data_provider/data_factory.py and exp/exp_main.py).
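For reference, a minimal sketch of the corrected loader construction (our illustration of the fix described above, not the repository's exact code; the helper name make_loader is ours):

from torch.utils.data import DataLoader

def make_loader(dataset, batch_size, flag):
    is_train = (flag == "train")
    return DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=is_train,
        # Only training may drop the last incomplete batch; validation and
        # test must keep it so every sample contributes to the metrics.
        drop_last=is_train,
    )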

We provide the full results (including MSE and MAE) of SparseTSF after fixing the bug below. Throughout, we used a lookback length of 720 and MSE as the loss function. For FITS, we defaulted to a COF at the 5th harmonic.

| Dataset     | Horizon | SegRNN MSE | SegRNN MAE | FITS MSE | FITS MAE | SparseTSF MSE | SparseTSF MAE |
|-------------|---------|------------|------------|----------|----------|---------------|---------------|
| ETTh1       | 96      | 0.351      | 0.392      | 0.382    | 0.405    | 0.362         | 0.388         |
| ETTh1       | 192     | 0.390      | 0.418      | 0.417    | 0.425    | 0.403         | 0.411         |
| ETTh1       | 336     | 0.449      | 0.452      | 0.436    | 0.442    | 0.434         | 0.428         |
| ETTh1       | 720     | 0.492      | 0.494      | 0.433    | 0.455    | 0.426         | 0.447         |
| ETTh2       | 96      | 0.275      | 0.338      | 0.272    | 0.336    | 0.294         | 0.346         |
| ETTh2       | 192     | 0.338      | 0.380      | 0.333    | 0.375    | 0.339         | 0.377         |
| ETTh2       | 336     | 0.419      | 0.445      | 0.355    | 0.396    | 0.359         | 0.397         |
| ETTh2       | 720     | 0.431      | 0.464      | 0.378    | 0.423    | 0.383         | 0.424         |
| ETTm1       | 96      | 0.295      | 0.356      | 0.311    | 0.354    | 0.312         | 0.354         |
| ETTm1       | 192     | 0.334      | 0.382      | 0.340    | 0.369    | 0.347         | 0.376         |
| ETTm1       | 336     | 0.359      | 0.401      | 0.367    | 0.385    | 0.367         | 0.386         |
| ETTm1       | 720     | 0.415      | 0.435      | 0.416    | 0.412    | 0.419         | 0.413         |
| ETTm2       | 96      | 0.165      | 0.251      | 0.163    | 0.254    | 0.163         | 0.252         |
| ETTm2       | 192     | 0.226      | 0.300      | 0.217    | 0.291    | 0.217         | 0.290         |
| ETTm2       | 336     | 0.282      | 0.341      | 0.268    | 0.326    | 0.270         | 0.327         |
| ETTm2       | 720     | 0.361      | 0.392      | 0.349    | 0.378    | 0.352         | 0.379         |
| Electricity | 96      | 0.130      | 0.228      | 0.145    | 0.248    | 0.138         | 0.233         |
| Electricity | 192     | 0.152      | 0.251      | 0.159    | 0.260    | 0.151         | 0.244         |
| Electricity | 336     | 0.170      | 0.272      | 0.175    | 0.275    | 0.166         | 0.260         |
| Electricity | 720     | 0.203      | 0.304      | 0.212    | 0.305    | 0.205         | 0.293         |
| Solar       | 96      | 0.175      | 0.236      | 0.192    | 0.241    | 0.195         | 0.243         |
| Solar       | 192     | 0.193      | 0.268      | 0.214    | 0.253    | 0.215         | 0.254         |
| Solar       | 336     | 0.209      | 0.263      | 0.231    | 0.261    | 0.232         | 0.262         |
| Solar       | 720     | 0.205      | 0.264      | 0.237    | 0.265    | 0.237         | 0.263         |
| Traffic     | 96      | 0.356      | 0.255      | 0.398    | 0.286    | 0.389         | 0.268         |
| Traffic     | 192     | 0.374      | 0.268      | 0.409    | 0.289    | 0.398         | 0.270         |
| Traffic     | 336     | 0.393      | 0.273      | 0.421    | 0.294    | 0.411         | 0.275         |
| Traffic     | 720     | 0.434      | 0.294      | 0.457    | 0.311    | 0.448         | 0.297         |
| Weather     | 96      | 0.141      | 0.205      | 0.170    | 0.225    | 0.169         | 0.223         |
| Weather     | 192     | 0.185      | 0.250      | 0.212    | 0.260    | 0.214         | 0.262         |
| Weather     | 336     | 0.241      | 0.297      | 0.258    | 0.294    | 0.257         | 0.293         |
| Weather     | 720     | 0.318      | 0.352      | 0.320    | 0.339    | 0.321         | 0.340         |

Contact

If you have any questions or suggestions, feel free to contact us.

Acknowledgement

We extend our heartfelt appreciation to the following GitHub repositories for providing valuable code bases and datasets:

https://github.com/lss-1138/SegRNN

https://github.com/VEWOXIC/FITS

https://github.com/yuqinie98/patchtst

https://github.com/cure-lab/LTSF-Linear

https://github.com/zhouhaoyi/Informer2020

https://github.com/thuml/Autoformer

https://github.com/MAZiqing/FEDformer

https://github.com/alipay/Pyraformer

https://github.com/ts-kim/RevIN

https://github.com/timeseriesAI/tsai


sparsetsf's Issues

categorize model

Dear developer,
Is this model a convolutional neural network, or how should it properly be categorized?

Thanks.

Model Reproducibility Issue

Hello, when I used the command sh scripts/SparseTSF/etth1.sh to reproduce the results from the paper, I found that the predictions for horizons of 336 and 720 differ significantly from the results reported in the paper, while the results for the first two horizons are within a reasonable range. Could you please help me understand why this discrepancy occurs?

odd period_len and kernel size

In Sparse/models/SparseTSF.py, the kernel size of nn.Conv1d() is set to 1 + 2 * self.period_len // 2. When I try to use an odd period_len, an error is raised saying that the shape of the input x cannot be matched after the 1D convolution aggregation. I guess the kernel size should instead be 1 + self.period_len // 2 * 2 or 1 + 2 * (self.period_len // 2)?
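For what it's worth, a quick illustration of the reported issue (our sketch, not code from the repository): * and // have equal precedence and associate left to right, so the original expression reduces to 1 + period_len, which is even whenever period_len is odd.

for period_len in (4, 5):
    original = 1 + 2 * period_len // 2    # parsed as 1 + ((2 * period_len) // 2) == 1 + period_len
    proposed = 1 + 2 * (period_len // 2)  # always odd
    print(period_len, original, proposed)
# With period_len = 5 the original yields an even kernel of 6; assuming
# padding = period_len // 2 = 2, the Conv1d output length is L + 2*2 - 6 + 1
# = L - 1, which no longer matches the input, hence the shape error.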

Stat_models example

Dear developer, thank you for your work. A very interesting approach.

I noticed that Stat_models appears in both model repositories (SegRNN and SparseTSF), although the papers do not report experiments with this code. Do you have an example script showing how to run the models from the Stat_models file?

Problem about transformer+sparse boost

I replaced the linear layer in the prediction part with a traditional Transformer, but the results were not satisfactory (ETTh1 dataset). The performance is far from the transformer+sparse boost numbers in Table 5 of the paper. May I ask how the authors conducted this experiment? My experimental results are as follows:
ETTh1_720_96_SparseTSF_ETTh1_ftM_sl720_pl96_test_0_seed2023
mse:0.6976203322410583, mae:0.560204029083252, rse:0.7933546900749207

ETTh1_720_192_SparseTSF_ETTh1_ftM_sl720_pl192_test_0_seed2023
mse:0.7232381701469421, mae:0.5843048095703125, rse:0.807603120803833

ETTh1_720_336_SparseTSF_ETTh1_ftM_sl720_pl336_test_0_seed2023
mse:0.7192806601524353, mae:0.5902762413024902, rse:0.8074232339859009

ETTh1_720_720_SparseTSF_ETTh1_ftM_sl720_pl720_test_0_seed2023
mse:0.7250006794929504, mae:0.6054915189743042, rse:0.8151193261146545

I got SOTA result with hparam search

I put SparseTSF in the PatchTST repo and ran a hyperparameter search. I got an MSE of 0.31 for a prediction length of 336, which is better than SOTA. Am I doing something wrong, or are the reported SOTA results simply not fully optimized?

Bug Issue: drop_last=True for test dataset may lead to inaccurate test results

Dear authors:

I am impressed by this amazing work! While reading another ICLR 2024 paper on time series forecasting (FITS: Modeling Time Series with 10k Parameters), I noticed that its authors reported this bug in their repo (https://github.com/vewoxic/fits). The PVLDB 2024 benchmark paper "TFB: Towards Comprehensive and Fair Benchmarking of Time Series Forecasting Methods" also pointed out that the drop_last operation should be abandoned during testing, as it otherwise causes unfair comparisons.

In short, this bug, which originated in Informer (AAAI 2021), drops some test samples in the last test batch, resulting in inaccurately measured results. (Please refer to the FITS repo for a detailed description.) Could you please confirm whether your results are affected by this piece of code? If so, could you modify it and correct the results in the tables of your paper?

Thanks!

results in icml talk

Was the results table in your ICML talk fixed with respect to the evaluation bug?


Is the paper in arXiv final version?

I noticed that SparseTSF adopts the channel-independent strategy in Section 3.1.

However, the channel-dependent strategy is used in the source code.

Is there a conflict between them?

How to adjust the period_len parameter for my data

I read your source code and found that the expected input data is indexed by "y-MM-dd HH:mm:ss" with n columns of data. My input data is indexed only by "y-MM-dd", i.e., one row represents a day, and each day contains 96 time points. How can I find the right period_len, and does the network need any adjustment to the way I feed in the data?
