
Awesome Machine Unlearning


A collection of academic articles, published methodologies, and datasets on the subject of machine unlearning.

A sortable version is available here: https://awesome-machine-unlearning.github.io/

Please read and cite our paper:

Nguyen, T.T., Huynh, T.T., Nguyen, P.L., Liew, A.W.C., Yin, H. and Nguyen, Q.V.H., 2022. A Survey of Machine Unlearning. arXiv preprint arXiv:2209.02299.

Citation

@article{nguyen2022survey,
  title={A Survey of Machine Unlearning},
  author={Nguyen, Thanh Tam and Huynh, Thanh Trung and Nguyen, Phi Le and Liew, Alan Wee-Chung and Yin, Hongzhi and Nguyen, Quoc Viet Hung},
  journal={arXiv preprint arXiv:2209.02299},
  year={2022}
}

A Framework of Machine Unlearning

(Figure: timeline diagram of the machine unlearning framework)


Existing Surveys

Paper Title Venue Year
An Introduction to Machine Unlearning arXiv 2022
Machine Unlearning: Its Need and Implementation Strategies IC3 2021
Making machine learning forget Annual Privacy Forum 2019
“Amnesia” - A Selection of Machine Learning Models That Can Forget User Data Very Fast CIDR 2019
Humans forget, machines remember: Artificial intelligence and the Right to Be Forgotten Computer Law & Security Review 2018
Algorithms that remember: model inversion attacks and data protection law Philosophical Transactions of the Royal Society A 2018

Model-Agnostic Approaches

Model-agnostic machine unlearning methodologies cover unlearning processes or frameworks that are applicable across different models. In some cases they provide theoretical guarantees only for a restricted class of models (e.g., linear models), but we still consider them model-agnostic because their core ideas extend to complex models (e.g., deep neural networks) with practical results.

Paper Title Year Author Venue Model Code
Certified Data Removal in Sum-Product Networks 2022 Becker and Liebig ICKG UNLEARNSPN [Code]
Learning with Recoverable Forgetting 2022 Ye et al. ECCV LIRF -
Continual Learning and Private Unlearning 2022 Liu et al. CoLLAs CLPU [Code]
Verifiable and Provably Secure Machine Unlearning 2022 Eisenhofer et al. arXiv - [Code]
VeriFi: Towards Verifiable Federated Unlearning 2022 Gao et al. arXiv VERIFI -
FedRecover: Recovering from Poisoning Attacks in Federated Learning using Historical Information 2022 Cao et al. S&P FedRecover -
Fast Yet Effective Machine Unlearning 2022 Tarun et al. arXiv UNSIR -
Membership Inference via Backdooring 2022 Hu et al. IJCAI MIB [Code]
Forget Unlearning: Towards True Data-Deletion in Machine Learning 2022 Chourasia et al. ICLR - -
Zero-Shot Machine Unlearning 2022 Chundawat et al. arXiv - -
Efficient Attribute Unlearning: Towards Selective Removal of Input Attributes from Feature Representations 2022 Guo et al. arXiv attribute unlearning -
Few-Shot Unlearning 2022 Yoon et al. ICLR - -
Federated Unlearning: How to Efficiently Erase a Client in FL? 2022 Halimi et al. UpML Workshop - -
Machine Unlearning Method Based On Projection Residual 2022 Cao et al. DSAA - -
Hard to Forget: Poisoning Attacks on Certified Machine Unlearning 2022 Marchant et al. AAAI - [Code]
Athena: Probabilistic Verification of Machine Unlearning 2022 Sommer et al. PoPETs ATHENA -
FP2-MIA: A Membership Inference Attack Free of Posterior Probability in Machine Unlearning 2022 Lu et al. ProvSec FP2-MIA -
Deletion Inference, Reconstruction, and Compliance in Machine (Un)Learning 2022 Gao et al. PETS - -
Prompt Certified Machine Unlearning with Randomized Gradient Smoothing and Quantization 2022 Zhang et al. NeurIPS PCMU -
The Right to be Forgotten in Federated Learning: An Efficient Realization with Rapid Retraining 2022 Liu et al. INFOCOM - [Code]
Backdoor Defense with Machine Unlearning 2022 Liu et al. INFOCOM BAERASER -
Markov Chain Monte Carlo-Based Machine Unlearning: Unlearning What Needs to be Forgotten 2022 Nguyen et al. ASIA CCS MCU -
Federated Unlearning for On-Device Recommendation 2022 Yuan et al. arXiv - -
Can Bad Teaching Induce Forgetting? Unlearning in Deep Networks using an Incompetent Teacher 2022 Chundawat et al. arXiv - -
Efficient Two-Stage Model Retraining for Machine Unlearning 2022 Kim and Woo CVPR Workshop - -
Learn to Forget: Machine Unlearning Via Neuron Masking 2021 Ma et al. IEEE Forsaken -
Adaptive Machine Unlearning 2021 Gupta et al. NeurIPS - [Code]
Descent-to-Delete: Gradient-Based Methods for Machine Unlearning 2021 Neel et al. ALT - -
Remember What You Want to Forget: Algorithms for Machine Unlearning 2021 Sekhari et al. NeurIPS - -
FedEraser: Enabling Efficient Client-Level Data Removal from Federated Learning Models 2021 Liu et al. IWQoS FedEraser -
Federated Unlearning 2021 Liu et al. IWQoS FedEraser [Code]
Machine Unlearning via Algorithmic Stability 2021 Ullah et al. COLT TV -
EMA: Auditing Data Removal from Trained Models 2021 Huang et al. MICCAI EMA [Code]
Knowledge-Adaptation Priors 2021 Khan and Swaroop NeurIPS K-prior [Code]
PrIU: A Provenance-Based Approach for Incrementally Updating Regression Models 2020 Wu et al. NeurIPS PrIU -
Eternal Sunshine of the Spotless Net: Selective Forgetting in Deep Networks 2020 Golatkar et al. CVPR - -
Learn to Forget: User-Level Memorization Elimination in Federated Learning 2020 Liu et al. arXiv Forsaken -
Certified Data Removal from Machine Learning Models 2020 Guo et al. ICML - -
Class Clown: Data Redaction in Machine Unlearning at Enterprise Scale 2020 Felps et al. arXiv - -
A Novel Online Incremental and Decremental Learning Algorithm Based on Variable Support Vector Machine 2019 Chen et al. Cluster Computing - -
Making AI Forget You: Data Deletion in Machine Learning 2019 Ginart et al. NeurIPS - -
Lifelong Anomaly Detection Through Unlearning 2019 Du et al. CCS - -
Learning Not to Learn: Training Deep Neural Networks With Biased Data 2019 Kim et al. CVPR - -
Efficient Repair of Polluted Machine Learning Systems via Causal Unlearning 2018 Cao et al. ASIACCS KARMA [Code]
Understanding Black-box Predictions via Influence Functions 2017 Koh et al. ICML - [Code]
Towards Making Systems Forget with Machine Unlearning 2015 Cao and Yang S&P - -
Incremental and decremental training for linear classification 2014 Tsai et al. KDD - [Code]
Multiple Incremental Decremental Learning of Support Vector Machines 2009 Karasuyama et al. NIPS - -
Incremental and Decremental Learning for Linear Support Vector Machines 2007 Romero et al. ICANN - -
Decremental Learning Algorithms for Nonlinear Lagrangian and Least Squares Support Vector Machines 2007 Duan et al. OSB - -
Multicategory Incremental Proximal Support Vector Classifiers 2003 Tveit et al. KES - -
Incremental and Decremental Proximal Support Vector Classification using Decay Coefficients 2003 Tveit et al. DaWak - -
Incremental and Decremental Support Vector Machine Learning 2000 Cauwenberghs and Poggio NIPS - -
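Several of the certified/approximate removal methods above (e.g., Guo et al., 2020; Izzo et al., 2021) share one core idea for linear models: undo a deleted point's influence with a single Newton step on the retained data's objective. Below is a minimal sketch for ridge regression, where the step happens to be exact because the loss is quadratic; for general convex losses it is only an approximation, which is the regime those papers analyze. Function names are ours, not from any of the listed papers.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Exact ridge regression solution: argmin ||Xw - y||^2 + lam*||w||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def newton_unlearn(w, X_keep, y_keep, lam):
    """One Newton step on the retained-data objective, starting from the
    model trained on the full data. For a quadratic (ridge) loss this
    recovers the exactly retrained model without touching the deleted rows."""
    d = X_keep.shape[1]
    grad = X_keep.T @ (X_keep @ w - y_keep) + lam * w
    hess = X_keep.T @ X_keep + lam * np.eye(d)
    return w - np.linalg.solve(hess, grad)

# Train on all data, then "forget" the last 10 rows via one Newton step.
rng = np.random.default_rng(0)
X, y, lam = rng.normal(size=(50, 5)), rng.normal(size=50), 0.1
w_full = ridge_fit(X, y, lam)
w_unlearned = newton_unlearn(w_full, X[:40], y[:40], lam)
w_retrained = ridge_fit(X[:40], y[:40], lam)  # ground truth: retrain from scratch
```

The appeal is cost: the update touches only the retained data once, instead of re-running the full training procedure.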

Model-Intrinsic Approaches

The model-intrinsic approaches include unlearning methods designed for a specific type of model. Although they are model-specific, their applications are not necessarily narrow, since many ML models share the same type.

Paper Title Year Author Venue Model Code
Unrolling SGD: Understanding Factors Influencing Machine Unlearning 2022 Thudi et al. EuroS&P - [Code]
Graph Unlearning 2022 Chen et al. CCS GraphEraser [Code]
Certified Graph Unlearning 2022 Chien et al. GLFrontiers Workshop - [Code]
Skin Deep Unlearning: Artefact and Instrument Debiasing in the Context of Melanoma Classification 2022 Bevan and Atapour-Abarghouei ICML - [Code]
Near-Optimal Task Selection for Meta-Learning with Mutual Information and Online Variational Bayesian Unlearning 2022 Chen et al. AISTATS - -
Unlearning Protected User Attributes in Recommendations with Adversarial Training 2022 Ganhor et al. SIGIR ADV-MULTVAE [Code]
Recommendation Unlearning 2022 Chen et al. TheWebConf RecEraser [Code]
Knowledge Neurons in Pretrained Transformers 2022 Dai et al. ACL - [Code]
Memory-Based Model Editing at Scale 2022 Mitchell et al. MLR SERAC [Code]
Forgetting Fast in Recommender Systems 2022 Liu et al. arXiv AltEraser -
Unlearning Nonlinear Graph Classifiers in the Limited Training Data Regime 2022 Pan et al. arXiv - -
Deep Regression Unlearning 2022 Tarun et al. arXiv Blindspot -
Quark: Controllable Text Generation with Reinforced Unlearning 2022 Lu et al. arXiv Quark [Code]
Forget-SVGD: Particle-Based Bayesian Federated Unlearning 2022 Gong et al. DSL Workshop Forget-SVGD -
Machine Unlearning of Federated Clusters 2022 Pan et al. arXiv SCMA -
Machine Unlearning for Image Retrieval: A Generative Scrubbing Approach 2022 Zhang et al. MM - -
Privacy Matters! Efficient Graph Representation Unlearning with Data Removal Guarantee 2022 Cong and Mahdavi - PROJECTOR -
Machine Unlearning: Linear Filtration for Logit-based Classifiers 2022 Baumhauer et al. Machine Learning normalizing filtration -
Deep Unlearning via Randomized Conditionally Independent Hessians 2022 Mehta et al. CVPR L-CODEC [Code]
Challenges and Pitfalls of Bayesian Unlearning 2022 Rawat et al. UPML Workshop - -
Federated Unlearning via Class-Discriminative Pruning 2022 Wang et al. WWW - -
Active forgetting via influence estimation for neural networks 2022 Meng et al. Int. J. Intel. Systems SCRUBBER -
Variational Bayesian unlearning 2022 Nguyen et al. NeurIPS VI -
Revisiting Machine Learning Training Process for Enhanced Data Privacy 2021 Goyal et al. IC3 - -
Knowledge Removal in Sampling-based Bayesian Inference 2021 Fu et al. ICLR - [Code]
Mixed-Privacy Forgetting in Deep Networks 2021 Golatkar et al. CVPR - -
HedgeCut: Maintaining Randomised Trees for Low-Latency Machine Unlearning 2021 Schelter et al. SIGMOD HedgeCut [Code]
A Unified PAC-Bayesian Framework for Machine Unlearning via Information Risk Minimization 2021 Jose et al. MLSP PAC-Bayesian -
DeepObliviate: A Powerful Charm for Erasing Data Residual Memory in Deep Neural Networks 2021 He et al. arXiv DEEPOBLIVIATE -
Approximate Data Deletion from Machine Learning Models: Algorithms and Evaluations 2021 Izzo et al. AISTATS PRU [Code]
Bayesian Inference Forgetting 2021 Fu et al. arXiv BIF [Code]
Online Forgetting Process for Linear Regression Models 2021 Li et al. AISTATS FIFD-OLS -
RevFRF: Enabling Cross-domain Random Forest Training with Revocable Federated Learning 2021 Liu et al. IEEE RevFRF -
Coded Machine Unlearning 2021 Aldaghri et al. IEEE Access - -
Machine Unlearning for Random Forests 2021 Brophy and Lowd ICML DaRE RF -
Bayesian Variational Federated Learning and Unlearning in Decentralized Networks 2021 Gong et al. SPAWC - -
Forgetting Outside the Box: Scrubbing Deep Networks of Information Accessible from Input-Output Observations 2020 Golatkar et al. ECCV - -
Influence Functions in Deep Learning Are Fragile 2020 Basu et al. arXiv - -
Deep Autoencoding Topic Model With Scalable Hybrid Bayesian Inference 2020 Zhang et al. IEEE DATM -
Eternal Sunshine of the Spotless Net: Selective Forgetting in Deep Networks 2020 Golatkar et al. CVPR - -
Uncertainty in Neural Networks: Approximately Bayesian Ensembling 2020 Pearce et al. AISTATS - [Code]
Certified Data Removal from Machine Learning Models 2020 Guo et al. ICML - -
DeltaGrad: Rapid retraining of machine learning models 2020 Wu et al. ICML DeltaGrad [Code]
Making AI Forget You: Data Deletion in Machine Learning 2019 Ginart et al. NeurIPS - -
“Amnesia” – Towards Machine Learning Models That Can Forget User Data Very Fast 2019 Schelter AIDB Workshop - [Code]
A Novel Online Incremental and Decremental Learning Algorithm Based on Variable Support Vector Machine 2019 Chen et al. Cluster Computing - -
Neural Text Degeneration With Unlikelihood Training 2019 Welleck et al. arXiv unlikelihood training [Code]
Bayesian Neural Networks with Weight Sharing Using Dirichlet Processes 2018 Roth et al. IEEE DP [Code]
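As a concrete model-intrinsic example, the linear filtration approach of Baumhauer et al. (listed above) unlearns an entire class from a logit-based classifier by applying a fixed linear map to the model's outputs, with no retraining. The sketch below shows only the simplest ("normalizing") variant, applied to softmax probabilities; the paper operates on logits and analyzes several filtration matrices, so treat this as an illustration of the idea rather than the authors' implementation.

```python
import numpy as np

def normalizing_filtration(probs, forget_class):
    """Delete one class from a classifier's softmax output by dropping its
    column and renormalizing the remaining class probabilities.

    probs: array of shape (n_samples, n_classes), rows summing to 1.
    Returns an array of shape (n_samples, n_classes - 1)."""
    p = np.delete(probs, forget_class, axis=-1)
    return p / p.sum(axis=-1, keepdims=True)

# A classifier over 3 classes; unlearn class 1.
probs = np.array([[0.5, 0.3, 0.2],
                  [0.1, 0.6, 0.3]])
filtered = normalizing_filtration(probs, forget_class=1)
```

Because the transformation is applied at the output layer, it works for any classifier that exposes class probabilities, which is what makes it "intrinsic" to logit-based models rather than to one architecture.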

Data-Driven Approaches

The approaches in this category use data partitioning, data augmentation, and data influence to speed up the retraining process. Attack methods based on data manipulation (e.g., data poisoning) are also included for reference.

Paper Title Year Author Venue Model Code
Hidden Poison: Machine Unlearning Enables Camouflaged Poisoning Attacks 2022 Di et al. NeurIPS - [Code]
Forget Unlearning: Towards True Data Deletion in Machine Learning 2022 Chourasia et al. ICLR - -
ARCANE: An Efficient Architecture for Exact Machine Unlearning 2022 Yan et al. IJCAI ARCANE -
PUMA: Performance Unchanged Model Augmentation for Training Data Removal 2022 Wu et al. AAAI PUMA -
Certifiable Unlearning Pipelines for Logistic Regression: An Experimental Study 2022 Mahadevan and Mathioudakis MAKE - [Code]
Zero-Shot Machine Unlearning 2022 Chundawat et al. arXiv - -
GRAPHEDITOR: An Efficient Graph Representation Learning and Unlearning Approach 2022 Cong and Mahdavi - GRAPHEDITOR [Code]
Fast Model Update for IoT Traffic Anomaly Detection with Machine Unlearning 2022 Fan et al. IEEE IoT-J ViFLa -
Learning to Refit for Convex Learning Problems 2021 Zeng et al. arXiv OPTLEARN -
Fast Yet Effective Machine Unlearning 2021 Ayush et al. arXiv - -
Learning with Selective Forgetting 2021 Shibata et al. IJCAI - -
SSSE: Efficiently Erasing Samples from Trained Machine Learning Models 2021 Peste et al. NeurIPS SSSE -
How Does Data Augmentation Affect Privacy in Machine Learning? 2021 Yu et al. AAAI - [Code]
Coded Machine Unlearning 2021 Aldaghri et al. IEEE - -
Machine Unlearning 2021 Bourtoule et al. IEEE SISA [Code]
Amnesiac Machine Learning 2021 Graves et al. AAAI AmnesiacML [Code]
Unlearnable Examples: Making Personal Data Unexploitable 2021 Huang et al. ICLR - [Code]
Descent-to-Delete: Gradient-Based Methods for Machine Unlearning 2021 Neel et al. ALT - -
Fawkes: Protecting Privacy against Unauthorized Deep Learning Models 2020 Shan et al. USENIX Sec. Sym. Fawkes [Code]
PrIU: A Provenance-Based Approach for Incrementally Updating Regression Models 2020 Wu et al. SIGMOD PrIU/PrIU-opt -
DeltaGrad: Rapid retraining of machine learning models 2020 Wu et al. ICML DeltaGrad [Code]
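The best-known data-partitioning method above, SISA (Bourtoule et al., 2021), trains one constituent model per disjoint data shard and aggregates predictions by majority vote; unlearning a point then only requires retraining the single shard that contained it. Below is a toy sketch of the sharding idea; the paper's slicing and checkpointing optimizations are omitted, and `fit_fn`/`predict_fn` are placeholders for any learner.

```python
import numpy as np

class SISAEnsemble:
    """Toy sketch of SISA-style sharded training: each constituent model
    sees only its own shard, so deleting a point retrains one shard
    instead of the whole ensemble."""

    def __init__(self, n_shards, fit_fn, predict_fn):
        self.n_shards = n_shards
        self.fit_fn = fit_fn          # fit_fn(X, y) -> model
        self.predict_fn = predict_fn  # predict_fn(model, X) -> int labels

    def fit(self, X, y):
        self.X, self.y = X, y
        self.shard_of = np.arange(len(X)) % self.n_shards  # disjoint shards
        self.models = [self.fit_fn(X[self.shard_of == s], y[self.shard_of == s])
                       for s in range(self.n_shards)]

    def unlearn(self, idx):
        s = self.shard_of[idx]                 # only this shard is affected
        keep = np.ones(len(self.X), dtype=bool)
        keep[idx] = False
        self.X, self.y = self.X[keep], self.y[keep]
        self.shard_of = self.shard_of[keep]
        mask = self.shard_of == s
        self.models[s] = self.fit_fn(self.X[mask], self.y[mask])

    def predict(self, X):
        votes = np.stack([self.predict_fn(m, X) for m in self.models])
        # Majority vote across shard models.
        return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
```

For example, with a trivial majority-class learner, unlearning index 3 retrains only shard `3 % n_shards` while the other shard models are untouched, which is exactly the source of SISA's speedup over full retraining.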

Datasets

Type: Image

Dataset #Items Disk Size Downstream Application #Papers Used
MNIST 70K 11MB Classification 29+ papers
CIFAR 60K 163MB Classification 16+ papers
SVHN 600K 400MB+ Classification 8+ papers
LSUN 69M+ 1TB+ Classification 1 paper
ImageNet 14M+ 166GB Classification 6 papers

Type: Tabular

Dataset #Items Disk Size Downstream Application #Papers Used
Adult 48K+ 10MB Classification 8+ papers
Breast Cancer 569 <1MB Classification 2 papers
Diabetes 442 <1MB Regression 3 papers

Type: Text

Dataset #Items Disk Size Downstream Application #Papers Used
IMDB Review 50k 66MB Sentiment Analysis 1 paper
Reuters 11K+ 73MB Categorization 1 paper
Newsgroup 20K 1GB+ Categorization 1 paper

Type: Sequence

Dataset #Items Disk Size Downstream Application #Papers Used
Epileptic Seizure 11K+ 7MB Timeseries Classification 1 paper
Activity Recognition 10K+ 26MB Timeseries Classification 1 paper
Botnet 72M 3GB+ Clustering 1 paper

Type: Graph

Dataset #Items Disk Size Downstream Application #Papers Used
OGB 100M+ 59MB Classification 2 papers
Cora 2K+ 4.5MB Classification 3 papers
MovieLens 1B+ 3GB+ Recommender Systems 1 paper

Evaluation Metrics

Metrics Formula/Description Usage
Accuracy Accuracy of the unlearned model on the forget set and retain set Evaluating the predictive performance of the unlearned model
Completeness The overlap (e.g., Jaccard distance) of the output space between the retrained and the unlearned model Evaluating the indistinguishability between model outputs
Unlearn Time The time taken to complete an unlearning request Evaluating the unlearning efficiency
Relearn Time The number of epochs required for the unlearned model to reach the accuracy of the source model Evaluating the unlearning efficiency (relearning with some data samples)
Layer-wise Distance The weight difference between the original model and the retrained model Evaluating the indistinguishability between model parameters
Activation Distance The average L2 distance between the unlearned and retrained models' predicted probabilities on the forget set Evaluating the indistinguishability between model outputs
JS-Divergence Jensen-Shannon divergence between the predictions of the unlearned and retrained models Evaluating the indistinguishability between model outputs
Membership Inference Attack Recall (#detected items / #forget items) Verifying the influence of forget data on the unlearned model
ZRF score $\mathcal{ZRF} = 1 - \frac{1}{n_f}\sum\limits_{i=0}^{n_f} \mathcal{JS}(M(x_i), T_d(x_i))$ The unlearned model should not intentionally give wrong output $(\mathcal{ZRF} = 0)$ or random output $(\mathcal{ZRF} = 1)$ on the forget item
Anamnesis Index (AIN) $AIN = \frac{r_t (M_u, M_{orig}, \alpha)}{r_t (M_s, M_{orig}, \alpha)}$ Zero-shot machine unlearning
Epistemic Uncertainty if $i(w;D) > 0$, then $\mathrm{efficacy}(w;D) = \frac{1}{i(w;D)}$; otherwise $\mathrm{efficacy}(w;D) = \infty$ Evaluating how much information the model exposes
Model Inversion Attack Visualization Qualitative verifications and evaluations
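Two of the output-space metrics above are simple to compute once the unlearned and retrained models' predicted probabilities on the forget set are available. A sketch in plain NumPy (the `eps` clipping is our addition to avoid `log(0)`; the metric definitions follow the table):

```python
import numpy as np

def activation_distance(p_unlearned, p_retrained):
    """Mean L2 distance between the two models' predicted probability
    vectors on the forget set (lower = more indistinguishable)."""
    return np.linalg.norm(p_unlearned - p_retrained, axis=1).mean()

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence (natural log), averaged over the forget
    set. Zero iff the two models' output distributions coincide."""
    p, q = np.clip(p, eps, 1.0), np.clip(q, eps, 1.0)
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log(a / b), axis=1)
    return np.mean(0.5 * kl(p, m) + 0.5 * kl(q, m))

# p_u / p_r would be softmax outputs of the unlearned / retrained model.
p_u = np.array([[1.0, 0.0], [0.5, 0.5]])
p_r = np.array([[0.0, 1.0], [0.5, 0.5]])
```

On this toy input, the first row contributes $\ln 2$ (maximally distinguishable one-hot outputs) and the second contributes 0, so the averaged JS-divergence is $\ln 2 / 2$.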

Disclaimer

Feel free to contact us if you have any queries or exciting news about machine unlearning. We also welcome all researchers to contribute to this repository and to the broader knowledge of the machine unlearning field.

If you have other related references, please feel free to create a GitHub issue with the paper information. We will gladly update the repository according to your suggestions. (You can also create pull requests, but it might take some time for us to merge them.)
