The awesome_imputation from wenjiedu

This is the repository for the paper TSI-Bench: Benchmarking Time Series Imputation from PyPOTS Research. The code and configurations for reproducing the paper's experimental results are available in the folder benchmark_code. This README also maintains a list of must-read papers on time-series imputation, along with a collection of time-series imputation toolkits and resources.

🤗 Contributions to update new resources and articles are very welcome!

❖ Time-Series Imputation Toolkits

`Datasets`

TSDB (Time Series Data Beans): a Python toolkit that can load 170 public time-series datasets with a single line of code.

BenchPOTS: a Python suite that provides standard preprocessing pipelines for 170 public datasets for benchmarking machine learning on POTS (Partially-Observed Time Series).
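As a rough illustration of what such a preprocessing pipeline can involve, the sketch below segments a long series into fixed-length windows, splits the windows chronologically, and standardizes every split using training-set statistics only. It is a minimal pure-Python sketch under assumed settings: the function names, the 60/20/20 split ratios, and the window parameters are hypothetical and are not BenchPOTS's API.

```python
from statistics import mean, pstdev


def sliding_windows(series, window, stride):
    """Cut a 1-D series into fixed-length (possibly overlapping) samples."""
    return [series[i:i + window] for i in range(0, len(series) - window + 1, stride)]


def chronological_split(samples, train_frac=0.6, val_frac=0.2):
    """Keep temporal order: the earliest samples train, the latest samples test."""
    n_train = int(len(samples) * train_frac)
    n_val = int(len(samples) * val_frac)
    return samples[:n_train], samples[n_train:n_train + n_val], samples[n_train + n_val:]


def standardize(train, *others):
    """Z-score every split with the mean/std of the training split only."""
    values = [v for sample in train for v in sample]
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # avoid division by zero on a constant split

    def scale(split):
        return [[(v - mu) / sigma for v in sample] for sample in split]

    return [scale(split) for split in (train, *others)]


# Toy usage: 20 time steps -> 16 overlapping windows of length 5.
series = [float(t) for t in range(20)]
samples = sliding_windows(series, window=5, stride=1)
train, val, test = chronological_split(samples)
train_z, val_z, test_z = standardize(train, val, test)
```

Fitting the normalization on the training split alone keeps information from the validation and test periods out of preprocessing.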

`Missingness`

PyGrinder: a Python library that grinds data beans into incomplete data by introducing missing values with different missing patterns.
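The simplest such pattern is missing completely at random (MCAR), where every observed value has the same chance of being hidden, regardless of its value or position. The sketch below is a hypothetical pure-Python illustration of that idea, not PyGrinder's API (its function names and signatures differ). Here `None` stands in for a missing value, and the returned indices record which entries were artificially masked so that an imputer can later be scored on them.

```python
import random


def mcar_mask(series, rate, seed=None):
    """Hide a fraction `rate` of the observed values completely at random.

    Returns a corrupted copy (None = missing) and the sorted indices that were
    artificially masked; values that were already missing are never selected.
    """
    rng = random.Random(seed)
    observed = [i for i, v in enumerate(series) if v is not None]
    masked = sorted(rng.sample(observed, round(len(observed) * rate)))
    corrupted = list(series)  # leave the ground-truth series untouched
    for i in masked:
        corrupted[i] = None
    return corrupted, masked


series = [1.0, 2.0, None, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0]
corrupted, masked = mcar_mask(series, rate=0.2, seed=0)
```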

`Algorithms`

PyPOTS: a Python toolbox for data mining on POTS (Partially-Observed Time Series)

MICE: Multivariate Imputation by Chained Equations

AutoImpute: a Python package for Imputation Methods

Impyute: a library of missing data imputation algorithms
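To make the chained-equations idea behind MICE concrete: missing entries are first filled with column means, then each incomplete column is repeatedly regressed on all the other columns, and its missing entries are replaced by that model's predictions. The sketch below is a simplified, deterministic single-imputation variant in plain Python using ordinary least squares. Proper MICE draws imputations from each model's predictive distribution and produces several imputed datasets, and none of the function names here belong to the MICE, AutoImpute, or Impyute packages.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]  # assumes a non-singular system
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(xs, ys):
    """Least-squares coefficients [intercept, slopes...] via the normal equations."""
    design = [[1.0] + list(x) for x in xs]
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)


def predictors(filled, r, target):
    """Current values of every column except `target` in row r."""
    return [v for k, v in enumerate(filled[r]) if k != target]


def chained_imputation(rows, n_iter=20):
    n_rows, n_cols = len(rows), len(rows[0])
    filled = [list(row) for row in rows]
    # Step 1: initialise every missing entry with its column mean.
    for c in range(n_cols):
        observed = [row[c] for row in rows if row[c] is not None]
        col_mean = sum(observed) / len(observed)
        for row in filled:
            if row[c] is None:
                row[c] = col_mean
    # Step 2: cycle over incomplete columns, regressing each on all the others.
    for _ in range(n_iter):
        for c in range(n_cols):
            missing = [r for r in range(n_rows) if rows[r][c] is None]
            if not missing:
                continue
            observed = [r for r in range(n_rows) if rows[r][c] is not None]
            beta = fit_ols([predictors(filled, r, c) for r in observed],
                           [rows[r][c] for r in observed])
            for r in missing:
                x = predictors(filled, r, c)
                filled[r][c] = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
    return filled


# Toy data following y = 2x + 1, with one gap in each column.
rows = [[1.0, 3.0], [2.0, 5.0], [3.0, None], [4.0, 9.0], [None, 11.0], [6.0, 13.0]]
imputed = chained_imputation(rows)
```

On this noise-free toy data the loop settles on the exact values (y = 7 at x = 3, x = 5 at y = 11) within a few iterations.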

❖ Must-Read Papers on Time-Series Imputation

The papers listed here may not all come from top venues, and some of them are not even deep-learning methods, but all of them are interesting papers related to time-series imputation that are worth reading for researchers and practitioners interested in this field.

`Year 2024`

[KDD] ImputeFormer: Low Rankness-Induced Transformers for Generalizable Spatiotemporal Imputation [paper] [official code]

[ICML] BayOTIDE: Bayesian Online Multivariate Time Series Imputation with Functional Decomposition [paper] [official code]

[ICLR] Conditional Information Bottleneck Approach for Time Series Imputation [paper] [official code]

[AISTATS] SADI: Similarity-Aware Diffusion Model-Based Imputation for Incomplete Temporal EHR Data [paper] [official code]

`Year 2023`

[ICLR] Multivariate Time-series Imputation with Disentangled Temporal Representations [paper] [official code]

[ICDE] PriSTI: A Conditional Diffusion Framework for Spatiotemporal Imputation [paper] [official code]

[ESWA] SAITS: Self-Attention-based Imputation for Time Series [paper] [official code]

[TMLR] Diffusion-based Time Series Imputation and Forecasting with Structured State Space Models [paper] [official code]

[ICML] Modeling Temporal Data as Continuous Functions with Stochastic Process Diffusion [paper] [official code]

[ICML] Provably Convergent Schrödinger Bridge with Applications to Probabilistic Time Series Imputation [paper] [official code]

[ICML] Probabilistic Imputation for Time-series Classification with Missing Data [paper]

[KDD] Source-Free Domain Adaptation with Temporal Imputation for Time Series Data [paper] [official code]

[KDD] Networked Time Series Imputation via Position-aware Graph Enhanced Variational Autoencoders [paper]

[KDD] An Observed Value Consistent Diffusion Model for Imputing Missing Values in Multivariate Time Series [paper]

[TKDE] Selective Imputation for Multivariate Time Series Datasets With Missing Values [paper] [official code]

[TKDE] PATNet: Propensity-Adjusted Temporal Network for Joint Imputation and Prediction using Binary EHRs with Observation Bias [paper]

[TKDD] Multiple Imputation Ensembles for Time Series (MIE-TS) [paper]

[CIKM] Density-Aware Temporal Attentive Step-wise Diffusion Model For Medical Time Series Imputation [paper]

`Year 2022`

[ICLR] Filling the G_ap_s: Multivariate Time Series Imputation by Graph Neural Networks [paper] [official code]

[AAAI] Online Missing Value Imputation and Change Point Detection with the Gaussian Copula [paper] [official code]

[AAAI] Dynamic Nonlinear Matrix Completion for Time-Varying Data Imputation [paper]

[AAAI] Fairness without Imputation: A Decision Tree Approach for Fair Prediction with Missing Values [paper]

`Year 2021`

[NeurIPS] CSDI: Conditional Score-based Diffusion Models for Probabilistic Time Series Imputation [paper] [official code]

[AAAI] Generative Semi-supervised Learning for Multivariate Time Series Imputation [paper]

[VLDB] Missing Value Imputation on Multidimensional Time Series [paper]

[ICDM] STING: Self-attention based Time-series Imputation Networks using GAN [paper]

`Year 2020`

[AISTATS] GP-VAE: Deep Probabilistic Time Series Imputation [paper] [official code]

[CVPR] Imitative Non-Autoregressive Modeling for Trajectory Forecasting and Imputation [paper]

[ICLR] Why Not to Use Zero Imputation? Correcting Sparsity Bias in Training Neural Networks [paper]

[TNNLS] Adversarial Recurrent Time Series Imputation [paper]

`Year 2019`

[NeurIPS] NAOMI: Non-Autoregressive Multiresolution Sequence Imputation [paper] [official code]

[IJCAI] E²GAN: End-to-End Generative Adversarial Network for Multivariate Time Series Imputation [paper] [official code]

[WWW] How Do Your Neighbors Disclose Your Information: Social-Aware Time Series Imputation [paper] [official code]

`Year 2018`

[NeurIPS] BRITS: Bidirectional Recurrent Imputation for Time Series [paper] [official code]

[Scientific Reports] Recurrent Neural Networks for Multivariate Time Series with Missing Values [paper] [official code]

[NeurIPS] Multivariate Time Series Imputation with Generative Adversarial Networks [paper] [official code]

`Year 2017`

[IEEE Transactions on Biomedical Engineering] Estimating Missing Data in Temporal Data Streams Using Multi-Directional Recurrent Neural Networks [paper] [official code]

`Year 2016`

[IJCAI] ST-MVL: Filling Missing Values in Geo-sensory Time Series Data [paper] [official code]

❖ Other Resources

`Articles about General Missingness and Imputation`

[blog] Data Imputation: An essential yet overlooked problem in machine learning

[Journal of Big Data] A survey on missing data in machine learning [paper]

`Repos about General Time Series`

Transformers in Time Series

LLMs and Foundation Models for Time Series and Spatio-Temporal Data

AI for Time Series (AI4TS) Papers, Tutorials, and Surveys

❖ Citing This Work

If you find this repository and PyPOTS Ecosystem helpful to your work, please kindly star it and cite our benchmark paper, survey paper, and PyPOTS as follows:

@article{du2024tsibench,
title={TSI-Bench: Benchmarking Time Series Imputation},
author={Wenjie Du and Jun Wang and Linglong Qian and Yiyuan Yang and Fanxing Liu and Zepu Wang and Zina Ibrahim and Haoxin Liu and Zhiyuan Zhao and Yingjie Zhou and Wenjia Wang and Kaize Ding and Yuxuan Liang and B. Aditya Prakash and Qingsong Wen},
journal={arXiv preprint arXiv:2406.12747},
year={2024}
}

@article{wang2024deep,
title={Deep Learning for Multivariate Time Series Imputation: A Survey},
author={Jun Wang and Wenjie Du and Wei Cao and Keli Zhang and Wenjia Wang and Yuxuan Liang and Qingsong Wen},
journal={arXiv preprint arXiv:2402.04059},
year={2024}
}

@article{du2023pypots,
title={{PyPOTS: a Python toolbox for data mining on Partially-Observed Time Series}},
author={Wenjie Du},
journal={arXiv preprint arXiv:2305.18811},
year={2023},
}

About comparison fairness and dataset splitting

Dear Authors,

Thank you for your invaluable contributions to this repository. I am currently exploring the field of time series imputation and have encountered some aspects regarding the evaluation protocols that I believe could benefit from further discussion.

Dataset Splitting: Splitting the dataset chronologically is well suited to time-series forecasting because it prevents data leakage. For imputation tasks, however, where the goal is to fill in the missing values of the available data, such a split may not be necessary. Since the primary concern in imputation is handling data that is inherently missing, a non-chronological split might be more appropriate: it reflects real-world scenarios where all available data, not only the most recent portion, is subject to imputation.
Evaluation Comparisons: The evaluation process raises some questions about fairness and consistency across different methods. Consider, for instance, comparing the Transformer with the mean imputer. While the Transformer model is assessed on the test data, it remains unclear how the mean imputer should be evaluated. Should the mean imputer also have access to the test data, given that the non-missing values in the test data would also be available at serving time? There are two options:

Training the mean imputer only on the train-eval set seems unfair, since the non-missing values in the test set should be available to the mean imputer too; using them causes no leakage, and the NN models already exploit them.
Training the mean imputer exclusively on the test set does not leverage the potentially informative train and eval sets, which seems equally unfair.
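These options differ only in which observed values the mean is computed from; the held-out test entries themselves are never used. The toy sketch below makes this explicit in plain Python and also scores the combined train-plus-test-observed variant raised in the suggestions that follow. The function names and numbers are hypothetical, chosen only to show how much the fitting choice can move the score when the test period sits at a different level from the training history; they are not results from TSI-Bench.

```python
from statistics import mean


def fit_mean(*splits):
    """Mean of all observed (non-None) values in the given splits."""
    return mean(v for split in splits for v in split if v is not None)


def impute(series, value):
    """Fill every missing entry with a constant."""
    return [value if v is None else v for v in series]


def masked_mae(truth, imputed, held_out):
    """Mean absolute error over the artificially held-out positions only."""
    return mean(abs(truth[i] - imputed[i]) for i in held_out)


train = [1.0, 2.0, None, 3.0]          # observed history (None = naturally missing)
test_truth = [10.0, 11.0, 12.0, 13.0]  # complete test window, used only for scoring
held_out = [1, 3]                      # entries hidden from every imputer
test_input = [None if i in held_out else v for i, v in enumerate(test_truth)]


def score(fit_on):
    """Fit the mean on `fit_on`, impute the test window, score the held-out entries."""
    return masked_mae(test_truth, impute(test_input, fit_mean(*fit_on)), held_out)


scores = {
    "train only": score([train]),
    "test observed only": score([test_input]),
    "train + test observed": score([train, test_input]),
}
```

In this toy setting the three variants give MAEs of 10.0, 1.0, and 6.4 respectively, so the fitting choice alone changes the mean imputer's score by an order of magnitude.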

In view of these points, I suggest the following:

For generalized imputation methods like those in HyperImpute, should we treat only the missing values in the test set as unavailable, while considering the rest of the data usable (including the non-missing values in the test data and the train and eval data)?
Could we use a non-chronological train-val-test split, given that in practical applications the emphasis is on imputing the entire dataset rather than only the most recent months? More importantly, in the case of missing value imputation, the non-missing data is often unavailable (kindly see the protocol of HyperImpute for reference).
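The difference between the two splitting schemes can be sketched in a few lines of plain Python. Both helpers below are hypothetical and operate on a list of already-segmented samples; the 20% test fraction and the seed are arbitrary choices for illustration.

```python
import random


def chronological_split(samples, test_frac=0.2):
    """Hold out the most recent samples, as in forecasting-style protocols."""
    n_test = max(1, round(len(samples) * test_frac))
    return samples[:-n_test], samples[-n_test:]


def random_split(samples, test_frac=0.2, seed=0):
    """Hold out a random subset, so test samples span the whole time range."""
    n_test = max(1, round(len(samples) * test_frac))
    test_idx = set(random.Random(seed).sample(range(len(samples)), n_test))
    train = [s for i, s in enumerate(samples) if i not in test_idx]
    test = [s for i, s in enumerate(samples) if i in test_idx]
    return train, test


samples = list(range(10))  # stand-ins for ten consecutive windows
chrono_train, chrono_test = chronological_split(samples)
rand_train, rand_test = random_split(samples)
```

One caveat with the random variant: if the samples are overlapping sliding windows, overlapping segments can land in both the training and test sets, so non-overlapping windows are the safer input for it.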

I look forward to your insights and any suggestions you might have on aligning the evaluation framework with real-world imputation tasks.

Best regards,

Hao
