wenjiedu / awesome_imputation

Awesome Deep Learning for Time-Series Imputation, including a must-read paper list about applying neural networks to impute incomplete time series containing NaN missing values/data

License: BSD 3-Clause "New" or "Revised" License

Languages: Python 55.26%, Jupyter Notebook 42.73%, Shell 2.01%
Topics: benchmark, data-mining, deep-learning, imputation, machine-learning, missing-data, missing-values, neural-network, probablistic, survey, time-series, interpolation, missingness, nan, incomplete-time-series, irregular-time-series, time-series-imputation

awesome_imputation's Introduction

Time Series Imputation Survey and Benchmark

The repository for the paper TSI-Bench: Benchmarking Time Series Imputation from PyPOTS Research. The code and configurations for reproducing the experimental results in the paper are available under the folder benchmark_code. The README file here maintains a list of must-read papers on time-series imputation, and a collection of time-series imputation toolkits and resources.

🤗 Contributions to update new resources and articles are very welcome!

❖ Time-Series Imputation Toolkits

Datasets

TSDB (Time Series Data Beans): a Python toolkit that can load 170 public time-series datasets with a single line of code.

BenchPOTS: a Python suite that provides standard preprocessing pipelines for 170 public datasets, for benchmarking machine learning on POTS (Partially-Observed Time Series).

Missingness

PyGrinder: a Python library that grinds data beans into incomplete ones by introducing missing values with different missing patterns.
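To make "introducing missing values" concrete, here is a minimal sketch of the simplest pattern, MCAR (missing completely at random), in plain Python. It is not PyGrinder's API: the function name `mcar`, its parameters, and the toy series are assumptions made for illustration only.

```python
import math
import random


def mcar(series, rate, seed=0):
    """Return (corrupted, mask); mask[t][d] is True where a value was removed.

    Hypothetical helper for illustration, not a PyGrinder function.
    """
    rng = random.Random(seed)
    corrupted, mask = [], []
    for row in series:
        new_row, mask_row = [], []
        for value in row:
            # Only observed values can be removed; each is dropped
            # independently with probability `rate`.
            drop = (not math.isnan(value)) and rng.random() < rate
            new_row.append(math.nan if drop else value)
            mask_row.append(drop)
        corrupted.append(new_row)
        mask.append(mask_row)
    return corrupted, mask


# Toy series (assumed data): 200 time steps, 3 features, fully observed.
series = [[float(t + d) for d in range(3)] for t in range(200)]
corrupted, mask = mcar(series, rate=0.2, seed=42)
n_removed = sum(sum(row) for row in mask)
print(f"removed {n_removed} of {200 * 3} values")
```

The returned mask records which entries were removed artificially, so their original values can later serve as ground truth when scoring an imputation method.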

Algorithms

PyPOTS: a Python toolbox for data mining on POTS (Partially-Observed Time Series)

MICE: Multivariate Imputation by Chained Equations

AutoImpute: a Python package for Imputation Methods

Impyute: a library of missing data imputation algorithms

❖ Must-Read Papers on Time-Series Imputation

The papers listed here may not be from top venues, and some are not even deep-learning methods, but all of them are interesting papers on time-series imputation that are worth reading for researchers and practitioners interested in this field.

Year 2024

[KDD] ImputeFormer: Low Rankness-Induced Transformers for Generalizable Spatiotemporal Imputation [paper] [official code]

[ICML] BayOTIDE: Bayesian Online Multivariate Time Series Imputation with Functional Decomposition [paper] [official code]

[ICLR] Conditional Information Bottleneck Approach for Time Series Imputation [paper] [official code]

[AISTATS] SADI: Similarity-Aware Diffusion Model-Based Imputation for Incomplete Temporal EHR Data [paper] [official code]

Year 2023

[ICLR] Multivariate Time-series Imputation with Disentangled Temporal Representations [paper] [official code]

[ICDE] PriSTI: A Conditional Diffusion Framework for Spatiotemporal Imputation [paper] [official code]

[ESWA] SAITS: Self-Attention-based Imputation for Time Series [paper] [official code]

[TMLR] Diffusion-based Time Series Imputation and Forecasting with Structured State Space Models [paper] [official code]

[ICML] Modeling Temporal Data as Continuous Functions with Stochastic Process Diffusion [paper] [official code]

[ICML] Provably Convergent Schrödinger Bridge with Applications to Probabilistic Time Series Imputation [paper] [official code]

[ICML] Probabilistic Imputation for Time-series Classification with Missing Data [paper]

[KDD] Source-Free Domain Adaptation with Temporal Imputation for Time Series Data [paper] [official code]

[KDD] Networked Time Series Imputation via Position-aware Graph Enhanced Variational Autoencoders [paper]

[KDD] An Observed Value Consistent Diffusion Model for Imputing Missing Values in Multivariate Time Series [paper]

[TKDE] Selective Imputation for Multivariate Time Series Datasets With Missing Values [paper] [official code]

[TKDE] PATNet: Propensity-Adjusted Temporal Network for Joint Imputation and Prediction using Binary EHRs with Observation Bias [paper]

[TKDD] Multiple Imputation Ensembles for Time Series (MIE-TS) [paper]

[CIKM] Density-Aware Temporal Attentive Step-wise Diffusion Model For Medical Time Series Imputation [paper]

Year 2022

[ICLR] Filling the G_ap_s: Multivariate Time Series Imputation by Graph Neural Networks [paper] [official code]

[AAAI] Online Missing Value Imputation and Change Point Detection with the Gaussian Copula [paper] [official code]

[AAAI] Dynamic Nonlinear Matrix Completion for Time-Varying Data Imputation [paper]

[AAAI] Fairness without Imputation: A Decision Tree Approach for Fair Prediction with Missing Values [paper]

Year 2021

[NeurIPS] CSDI: Conditional Score-based Diffusion Models for Probabilistic Time Series Imputation [paper] [official code]

[AAAI] Generative Semi-supervised Learning for Multivariate Time Series Imputation [paper]

[VLDB] Missing Value Imputation on Multidimensional Time Series [paper]

[ICDM] STING: Self-attention based Time-series Imputation Networks using GAN [paper]

Year 2020

[AISTATS] GP-VAE: Deep Probabilistic Time Series Imputation [paper] [official code]

[CVPR] Imitative Non-Autoregressive Modeling for Trajectory Forecasting and Imputation [paper]

[ICLR] Why Not to Use Zero Imputation? Correcting Sparsity Bias in Training Neural Networks [paper]

[TNNLS] Adversarial Recurrent Time Series Imputation [paper]

Year 2019

[NeurIPS] NAOMI: Non-Autoregressive Multiresolution Sequence Imputation [paper] [official code]

[IJCAI] E²GAN: End-to-End Generative Adversarial Network for Multivariate Time Series Imputation [paper] [official code]

[WWW] How Do Your Neighbors Disclose Your Information: Social-Aware Time Series Imputation [paper] [official code]

Year 2018

[NeurIPS] BRITS: Bidirectional Recurrent Imputation for Time Series [paper] [official code]

[Scientific Reports] Recurrent Neural Networks for Multivariate Time Series with Missing Values [paper] [official code]

[NeurIPS] Multivariate Time Series Imputation with Generative Adversarial Networks [paper] [official code]

Year 2017

[IEEE Transactions on Biomedical Engineering] Estimating Missing Data in Temporal Data Streams Using Multi-Directional Recurrent Neural Networks [paper] [official code]

Year 2016

[IJCAI] ST-MVL: Filling Missing Values in Geo-sensory Time Series Data [paper] [official code]

❖ Other Resources

Articles about General Missingness and Imputation

[blog] Data Imputation: An essential yet overlooked problem in machine learning

[Journal of Big Data] A survey on missing data in machine learning [paper]

Repos about General Time Series

Transformers in Time Series

LLMs and Foundation Models for Time Series and Spatio-Temporal Data

AI for Time Series (AI4TS) Papers, Tutorials, and Surveys

❖ Citing This Work

If you find this repository and the PyPOTS Ecosystem helpful to your work, please star the repository and cite our benchmark paper, survey paper, and PyPOTS as follows:

@article{du2024tsibench,
title={TSI-Bench: Benchmarking Time Series Imputation},
author={Wenjie Du and Jun Wang and Linglong Qian and Yiyuan Yang and Fanxing Liu and Zepu Wang and Zina Ibrahim and Haoxin Liu and Zhiyuan Zhao and Yingjie Zhou and Wenjia Wang and Kaize Ding and Yuxuan Liang and B. Aditya Prakash and Qingsong Wen},
journal={arXiv preprint arXiv:2406.12747},
year={2024}
}
@article{wang2024deep,
title={Deep Learning for Multivariate Time Series Imputation: A Survey},
author={Jun Wang and Wenjie Du and Wei Cao and Keli Zhang and Wenjia Wang and Yuxuan Liang and Qingsong Wen},
journal={arXiv preprint arXiv:2402.04059},
year={2024}
}
@article{du2023pypots,
title={{PyPOTS: a Python toolbox for data mining on Partially-Observed Time Series}},
author={Wenjie Du},
journal={arXiv preprint arXiv:2305.18811},
year={2023},
}

awesome_imputation's People

Contributors

augustjw, fanxingliu2020, linglongqian, qingsongedu, wenjiedu, yyysjz1997

awesome_imputation's Issues

About comparison fairness and dataset splitting

Dear Authors,

Thank you for your invaluable contributions to this repository. I am currently exploring the field of time series imputation and have encountered some aspects regarding the evaluation protocols that I believe could benefit from further discussion.

  1. Dataset Splitting: Splitting the dataset chronologically is well suited to time series forecasting, where it prevents data leakage. For imputation tasks, however, where the goal is to fill in the missing values in the available data, such a split may not be necessary. A non-chronological split might be more appropriate, since it reflects real-world scenarios in which all available data is subject to imputation, not only the most recent portion.
  2. Evaluation Comparisons: The evaluation process raises some questions about fairness and consistency across methods. Take the Transformer and the mean imputer as an example. The Transformer model is assessed on the test data, but how the mean imputer is evaluated remains unclear. Should the mean imputer also have access to the test data, given that the non-missing values in the test data would also be available at serving time? There are two options:
  • Training the mean imputer on the train-eval set is unfair, since the non-missing data in the test set should be available to the mean imputer too; using it does not cause leakage, and neural network models already exploit it.
  • Training the mean imputer exclusively on the test set does not leverage the potentially informative train-eval sets, which seems equally unfair.
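The two options can be made concrete with a small sketch. It is not the benchmark's actual evaluation code: the helper names `feature_means` and `masked_mae`, and the toy data with a deliberate level shift between train and test, are assumptions for illustration.

```python
import math


def feature_means(rows):
    """Per-feature mean over observed (non-NaN) entries."""
    means = []
    for d in range(len(rows[0])):
        observed = [row[d] for row in rows if not math.isnan(row[d])]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return means


def masked_mae(truth, means, eval_mask):
    """MAE of mean imputation, computed only on entries flagged in eval_mask."""
    errors = [
        abs(truth[t][d] - means[d])
        for t in range(len(truth))
        for d in range(len(means))
        if eval_mask[t][d]
    ]
    return sum(errors) / len(errors)


# Toy data (assumed): the test period sits at a higher level than the train period.
train = [[1.0, 10.0], [2.0, math.nan], [3.0, 12.0]]
test_truth = [[7.0, 20.0], [8.0, 21.0], [9.0, 22.0]]
eval_mask = [[True, False], [False, True], [False, False]]  # held-out test entries

# What the imputer is allowed to see of the test set: everything except held-out entries.
test_observed = [
    [math.nan if eval_mask[t][d] else value for d, value in enumerate(row)]
    for t, row in enumerate(test_truth)
]

# Option 1: statistics fitted on the training data only.
mae_train_fit = masked_mae(test_truth, feature_means(train), eval_mask)
# Option 2: statistics fitted on the observed test values only.
mae_test_fit = masked_mae(test_truth, feature_means(test_observed), eval_mask)
print(mae_train_fit, mae_test_fit)
```

On this toy data the train-fitted mean scores an MAE of 7.5 and the test-fitted mean 0.75. The gap comes entirely from the level shift built into the example, so it only shows that the fitting choice can change the reported score, not which option is fairer.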

In view of these points, I suggest the following:

  • For generalized imputation methods like those in HyperImpute, should we treat only the missing values in the test set as unavailable, and consider the rest of the data usable (including the non-missing values in the test data and the train and eval data)?
  • Could we use a non-chronological train-val-test split, given that in practical applications the emphasis is on imputing the entire dataset rather than only the most recent months? More importantly, in the case of missing value imputation, the non-missing data is often unavailable (kindly see the protocol of HyperImpute for reference).
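For reference, the difference between the two splitting schemes can be sketched in a few lines. This assumes the dataset is a sequence of samples ordered by time; the function `split_samples` and its parameters are hypothetical and are not taken from the benchmark code or from HyperImpute.

```python
import random


def split_samples(n_samples, n_train, n_val, chronological=True, seed=0):
    """Return (train, val, test) index lists over time-ordered samples."""
    indices = list(range(n_samples))
    if not chronological:
        # Non-chronological: shuffle first, so every period can land in any subset.
        random.Random(seed).shuffle(indices)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


chrono = split_samples(10, n_train=7, n_val=1, chronological=True)
shuffled = split_samples(10, n_train=7, n_val=1, chronological=False, seed=1)
print("chronological test indices:", chrono[2])
print("shuffled test indices:", sorted(shuffled[2]))
```

With the chronological split the test set is always the most recent block of samples. With the shuffled split it is drawn from the whole time range, which matches the setting where the entire dataset is to be imputed.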

I look forward to your insights and any suggestions you might have on aligning the evaluation framework with real-world imputation tasks.

Best regards,

Hao
