
Awesome Machine Unlearning


A collection of academic articles, published methodologies, and datasets on the subject of machine unlearning.

A sortable version is available here: https://awesome-machine-unlearning.github.io/

Please read and cite our paper:

Nguyen, T.T., Huynh, T.T., Nguyen, P.L., Liew, A.W.C., Yin, H. and Nguyen, Q.V.H., 2022. A Survey of Machine Unlearning. arXiv preprint arXiv:2209.02299.

Citation

@article{nguyen2022survey,
  title={A Survey of Machine Unlearning},
  author={Nguyen, Thanh Tam and Huynh, Thanh Trung and Nguyen, Phi Le and Liew, Alan Wee-Chung and Yin, Hongzhi and Nguyen, Quoc Viet Hung},
  journal={arXiv preprint arXiv:2209.02299},
  year={2022}
}

A Framework of Machine Unlearning

(Figure: timeline diagram of the machine unlearning framework)


Existing Surveys

Paper Title Venue Year
An Introduction to Machine Unlearning arXiv 2022
Machine Unlearning: Its Need and Implementation Strategies IC3 2021
Making machine learning forget Annual Privacy Forum 2019
“Amnesia” - A Selection of Machine Learning Models That Can Forget User Data Very Fast CIDR 2019
Humans forget, machines remember: Artificial intelligence and the Right to Be Forgotten Computer Law & Security Review 2018
Algorithms that remember: model inversion attacks and data protection law Philosophical Transactions of the Royal Society A 2018

Model-Agnostic Approaches

Model-agnostic machine unlearning methodologies cover unlearning processes or frameworks that are applicable across different models. In some cases they provide theoretical guarantees only for a restricted class of models (e.g., linear models), but we still consider them model-agnostic because their core ideas extend to complex models (e.g., deep neural networks) with practical results.

Paper Title Year Author Venue Model Code
Certified Data Removal in Sum-Product Networks 2022 Becker and Liebig ICKG UNLEARNSPN [Code]
Learning with Recoverable Forgetting 2022 Ye et al. ECCV LIRF -
Continual Learning and Private Unlearning 2022 Liu et al. CoLLAs CLPU [Code]
Verifiable and Provably Secure Machine Unlearning 2022 Eisenhofer et al. arXiv - [Code]
VeriFi: Towards Verifiable Federated Unlearning 2022 Gao et al. arXiv VERIFI -
FedRecover: Recovering from Poisoning Attacks in Federated Learning using Historical Information 2022 Cao et al. S&P FedRecover -
Fast Yet Effective Machine Unlearning 2022 Tarun et al. arXiv UNSIR -
Membership Inference via Backdooring 2022 Hu et al. IJCAI MIB [Code]
Forget Unlearning: Towards True Data-Deletion in Machine Learning 2022 Chourasia et al. ICLR - -
Zero-Shot Machine Unlearning 2022 Chundawat et al. arXiv - -
Efficient Attribute Unlearning: Towards Selective Removal of Input Attributes from Feature Representations 2022 Guo et al. arXiv attribute unlearning -
Few-Shot Unlearning 2022 Yoon et al. ICLR - -
Federated Unlearning: How to Efficiently Erase a Client in FL? 2022 Halimi et al. UpML Workshop - -
Machine Unlearning Method Based On Projection Residual 2022 Cao et al. DSAA - -
Hard to Forget: Poisoning Attacks on Certified Machine Unlearning 2022 Marchant et al. AAAI - [Code]
Athena: Probabilistic Verification of Machine Unlearning 2022 Sommer et al. PoPETs ATHENA -
FP2-MIA: A Membership Inference Attack Free of Posterior Probability in Machine Unlearning 2022 Lu et al. ProvSec FP2-MIA -
Deletion Inference, Reconstruction, and Compliance in Machine (Un)Learning 2022 Gao et al. PETS - -
Prompt Certified Machine Unlearning with Randomized Gradient Smoothing and Quantization 2022 Zhang et al. NeurIPS PCMU -
The Right to be Forgotten in Federated Learning: An Efficient Realization with Rapid Retraining 2022 Liu et al. INFOCOM - [Code]
Backdoor Defense with Machine Unlearning 2022 Liu et al. INFOCOM BAERASER -
Markov Chain Monte Carlo-Based Machine Unlearning: Unlearning What Needs to be Forgotten 2022 Nguyen et al. ASIA CCS MCU -
Federated Unlearning for On-Device Recommendation 2022 Yuan et al. arXiv - -
Can Bad Teaching Induce Forgetting? Unlearning in Deep Networks using an Incompetent Teacher 2022 Chundawat et al. arXiv - -
Efficient Two-Stage Model Retraining for Machine Unlearning 2022 Kim and Woo CVPR Workshop - -
Learn to Forget: Machine Unlearning Via Neuron Masking 2021 Ma et al. IEEE Forsaken -
Adaptive Machine Unlearning 2021 Gupta et al. NeurIPS - [Code]
Descent-to-Delete: Gradient-Based Methods for Machine Unlearning 2021 Neel et al. ALT - -
Remember What You Want to Forget: Algorithms for Machine Unlearning 2021 Sekhari et al. NeurIPS - -
FedEraser: Enabling Efficient Client-Level Data Removal from Federated Learning Models 2021 Liu et al. IWQoS FedEraser -
Federated Unlearning 2021 Liu et al. IWQoS FedEraser [Code]
Machine Unlearning via Algorithmic Stability 2021 Ullah et al. COLT TV -
EMA: Auditing Data Removal from Trained Models 2021 Huang et al. MICCAI EMA [Code]
Knowledge-Adaptation Priors 2021 Khan and Swaroop NeurIPS K-prior [Code]
PrIU: A Provenance-Based Approach for Incrementally Updating Regression Models 2020 Wu et al. NeurIPS PrIU -
Eternal Sunshine of the Spotless Net: Selective Forgetting in Deep Networks 2020 Golatkar et al. CVPR - -
Learn to Forget: User-Level Memorization Elimination in Federated Learning 2020 Liu et al. arXiv Forsaken -
Certified Data Removal from Machine Learning Models 2020 Guo et al. ICML - -
Class Clown: Data Redaction in Machine Unlearning at Enterprise Scale 2020 Felps et al. arXiv - -
A Novel Online Incremental and Decremental Learning Algorithm Based on Variable Support Vector Machine 2019 Chen et al. Cluster Computing - -
Making AI Forget You: Data Deletion in Machine Learning 2019 Ginart et al. NeurIPS - -
Lifelong Anomaly Detection Through Unlearning 2019 Du et al. CCS - -
Learning Not to Learn: Training Deep Neural Networks With Biased Data 2019 Kim et al. CVPR - -
Efficient Repair of Polluted Machine Learning Systems via Causal Unlearning 2018 Cao et al. ASIACCS KARMA [Code]
Understanding Black-box Predictions via Influence Functions 2017 Koh et al. ICML - [Code]
Towards Making Systems Forget with Machine Unlearning 2015 Cao and Yang S&P - -
Incremental and decremental training for linear classification 2014 Tsai et al. KDD - [Code]
Multiple Incremental Decremental Learning of Support Vector Machines 2009 Karasuyama et al. NIPS - -
Incremental and Decremental Learning for Linear Support Vector Machines 2007 Romero et al. ICANN - -
Decremental Learning Algorithms for Nonlinear Lagrangian and Least Squares Support Vector Machines 2007 Duan et al. OSB - -
Multicategory Incremental Proximal Support Vector Classifiers 2003 Tveit et al. KES - -
Incremental and Decremental Proximal Support Vector Classification using Decay Coefficients 2003 Tveit et al. DaWak - -
Incremental and Decremental Support Vector Machine Learning 2000 Cauwenberghs and Poggio NIPS - -
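Several of the certified/approximate removal methods above (e.g., Guo et al., 2020; Izzo et al., 2021) share one core idea for linear models: undo a deleted point's influence with a single Newton step on the retained data's objective. Below is a minimal sketch for ridge regression, where the step happens to be exact because the loss is quadratic; for general convex losses it is only an approximation, which is the regime those papers analyze. Function names are ours, not from any of the listed papers.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Exact ridge regression solution: argmin ||Xw - y||^2 + lam*||w||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def newton_unlearn(w, X_keep, y_keep, lam):
    """One Newton step on the retained-data objective, starting from the
    model trained on the full data. For a quadratic (ridge) loss this
    recovers the exactly retrained model without touching the deleted rows."""
    d = X_keep.shape[1]
    grad = X_keep.T @ (X_keep @ w - y_keep) + lam * w
    hess = X_keep.T @ X_keep + lam * np.eye(d)
    return w - np.linalg.solve(hess, grad)

# Train on all data, then "forget" the last 10 rows via one Newton step.
rng = np.random.default_rng(0)
X, y, lam = rng.normal(size=(50, 5)), rng.normal(size=50), 0.1
w_full = ridge_fit(X, y, lam)
w_unlearned = newton_unlearn(w_full, X[:40], y[:40], lam)
w_retrained = ridge_fit(X[:40], y[:40], lam)  # ground truth: retrain from scratch
```

The appeal is cost: the update touches only the retained data once, instead of re-running the full training procedure.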

Model-Intrinsic Approaches

The model-intrinsic approaches include unlearning methods designed for a specific type of model. Although they are model-specific, their applications are not necessarily narrow, since many ML models share the same type.

Paper Title Year Author Venue Model Code
Unrolling SGD: Understanding Factors Influencing Machine Unlearning 2022 Thudi et al. EuroS&P - [Code]
Graph Unlearning 2022 Chen et al. CCS GraphEraser [Code]
Certified Graph Unlearning 2022 Chien et al. GLFrontiers Workshop - [Code]
Skin Deep Unlearning: Artefact and Instrument Debiasing in the Context of Melanoma Classification 2022 Bevan and Atapour-Abarghouei ICML - [Code]
Near-Optimal Task Selection for Meta-Learning with Mutual Information and Online Variational Bayesian Unlearning 2022 Chen et al. AISTATS - -
Unlearning Protected User Attributes in Recommendations with Adversarial Training 2022 Ganhor et al. SIGIR ADV-MULTVAE [Code]
Recommendation Unlearning 2022 Chen et al. TheWebConf RecEraser [Code]
Knowledge Neurons in Pretrained Transformers 2022 Dai et al. ACL - [Code]
Memory-Based Model Editing at Scale 2022 Mitchell et al. MLR SERAC [Code]
Forgetting Fast in Recommender Systems 2022 Liu et al. arXiv AltEraser -
Unlearning Nonlinear Graph Classifiers in the Limited Training Data Regime 2022 Pan et al. arXiv - -
Deep Regression Unlearning 2022 Tarun et al. arXiv Blindspot -
Quark: Controllable Text Generation with Reinforced Unlearning 2022 Lu et al. arXiv Quark [Code]
Forget-SVGD: Particle-Based Bayesian Federated Unlearning 2022 Gong et al. DSL Workshop Forget-SVGD -
Machine Unlearning of Federated Clusters 2022 Pan et al. arXiv SCMA -
Machine Unlearning for Image Retrieval: A Generative Scrubbing Approach 2022 Zhang et al. MM - -
Privacy Matters! Efficient Graph Representation Unlearning with Data Removal Guarantee 2022 Cong and Mahdavi - PROJECTOR -
Machine Unlearning: Linear Filtration for Logit-based Classifiers 2022 Baumhauer et al. Machine Learning normalizing filtration -
Deep Unlearning via Randomized Conditionally Independent Hessians 2022 Mehta et al. CVPR L-CODEC [Code]
Challenges and Pitfalls of Bayesian Unlearning 2022 Rawat et al. UPML Workshop - -
Federated Unlearning via Class-Discriminative Pruning 2022 Wang et al. WWW - -
Active forgetting via influence estimation for neural networks 2022 Meng et al. Int. J. Intel. Systems SCRUBBER -
Variational Bayesian unlearning 2022 Nguyen et al. NeurIPS VI -
Revisiting Machine Learning Training Process for Enhanced Data Privacy 2021 Goyal et al. IC3 - -
Knowledge Removal in Sampling-based Bayesian Inference 2021 Fu et al. ICLR - [Code]
Mixed-Privacy Forgetting in Deep Networks 2021 Golatkar et al. CVPR - -
HedgeCut: Maintaining Randomised Trees for Low-Latency Machine Unlearning 2021 Schelter et al. SIGMOD HedgeCut [Code]
A Unified PAC-Bayesian Framework for Machine Unlearning via Information Risk Minimization 2021 Jose et al. MLSP PAC-Bayesian -
DeepObliviate: A Powerful Charm for Erasing Data Residual Memory in Deep Neural Networks 2021 He et al. arXiv DEEPOBLIVIATE -
Approximate Data Deletion from Machine Learning Models: Algorithms and Evaluations 2021 Izzo et al. AISTATS PRU [Code]
Bayesian Inference Forgetting 2021 Fu et al. arXiv BIF [Code]
Online Forgetting Process for Linear Regression Models 2021 Li et al. AISTATS FIFD-OLS -
RevFRF: Enabling Cross-domain Random Forest Training with Revocable Federated Learning 2021 Liu et al. IEEE RevFRF -
Coded Machine Unlearning 2021 Aldaghri et al. IEEE Access - -
Machine Unlearning for Random Forests 2021 Brophy and Lowd ICML DaRE RF -
Bayesian Variational Federated Learning and Unlearning in Decentralized Networks 2021 Gong et al. SPAWC - -
Forgetting Outside the Box: Scrubbing Deep Networks of Information Accessible from Input-Output Observations 2020 Golatkar et al. ECCV - -
Influence Functions in Deep Learning Are Fragile 2020 Basu et al. arXiv - -
Deep Autoencoding Topic Model With Scalable Hybrid Bayesian Inference 2020 Zhang et al. IEEE DATM -
Eternal Sunshine of the Spotless Net: Selective Forgetting in Deep Networks 2020 Golatkar et al. CVPR - -
Uncertainty in Neural Networks: Approximately Bayesian Ensembling 2020 Pearce et al. AISTATS - [Code]
Certified Data Removal from Machine Learning Models 2020 Guo et al. ICML - -
DeltaGrad: Rapid retraining of machine learning models 2020 Wu et al. ICML DeltaGrad [Code]
Making AI Forget You: Data Deletion in Machine Learning 2019 Ginart et al. NeurIPS - -
“Amnesia” – Towards Machine Learning Models That Can Forget User Data Very Fast 2019 Schelter AIDB Workshop - [Code]
A Novel Online Incremental and Decremental Learning Algorithm Based on Variable Support Vector Machine 2019 Chen et al. Cluster Computing - -
Neural Text Degeneration With Unlikelihood Training 2019 Welleck et al. arXiv unlikelihood training [Code]
Bayesian Neural Networks with Weight Sharing Using Dirichlet Processes 2018 Roth et al. IEEE DP [Code]
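As a concrete model-intrinsic example, the linear filtration approach of Baumhauer et al. (listed above) unlearns an entire class from a logit-based classifier by applying a fixed linear map to the model's outputs, with no retraining. The sketch below shows only the simplest ("normalizing") variant, applied to softmax probabilities; the paper operates on logits and analyzes several filtration matrices, so treat this as an illustration of the idea rather than the authors' implementation.

```python
import numpy as np

def normalizing_filtration(probs, forget_class):
    """Delete one class from a classifier's softmax output by dropping its
    column and renormalizing the remaining class probabilities.

    probs: array of shape (n_samples, n_classes), rows summing to 1.
    Returns an array of shape (n_samples, n_classes - 1)."""
    p = np.delete(probs, forget_class, axis=-1)
    return p / p.sum(axis=-1, keepdims=True)

# A classifier over 3 classes; unlearn class 1.
probs = np.array([[0.5, 0.3, 0.2],
                  [0.1, 0.6, 0.3]])
filtered = normalizing_filtration(probs, forget_class=1)
```

Because the transformation is applied at the output layer, it works for any classifier that exposes class probabilities, which is what makes it "intrinsic" to logit-based models rather than to one architecture.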

Data-Driven Approaches

The approaches in this category use data partitioning, data augmentation, and data influence to speed up the retraining process. Attack methods based on data manipulation (e.g., data poisoning) are also included for reference.

Paper Title Year Author Venue Model Code
Hidden Poison: Machine Unlearning Enables Camouflaged Poisoning Attacks 2022 Di et al. NeurIPS - [Code]
Forget Unlearning: Towards True Data Deletion in Machine Learning 2022 Chourasia et al. ICLR - -
ARCANE: An Efficient Architecture for Exact Machine Unlearning 2022 Yan et al. IJCAI ARCANE -
PUMA: Performance Unchanged Model Augmentation for Training Data Removal 2022 Wu et al. AAAI PUMA -
Certifiable Unlearning Pipelines for Logistic Regression: An Experimental Study 2022 Mahadevan and Mathioudakis MAKE - [Code]
Zero-Shot Machine Unlearning 2022 Chundawat et al. arXiv - -
GRAPHEDITOR: An Efficient Graph Representation Learning and Unlearning Approach 2022 Cong and Mahdavi - GRAPHEDITOR [Code]
Fast Model Update for IoT Traffic Anomaly Detection with Machine Unlearning 2022 Fan et al. IEEE IoT-J ViFLa -
Learning to Refit for Convex Learning Problems 2021 Zeng et al. arXiv OPTLEARN -
Fast Yet Effective Machine Unlearning 2021 Ayush et al. arXiv - -
Learning with Selective Forgetting 2021 Shibata et al. IJCAI - -
SSSE: Efficiently Erasing Samples from Trained Machine Learning Models 2021 Peste et al. NeurIPS SSSE -
How Does Data Augmentation Affect Privacy in Machine Learning? 2021 Yu et al. AAAI - [Code]
Coded Machine Unlearning 2021 Aldaghri et al. IEEE - -
Machine Unlearning 2021 Bourtoule et al. IEEE SISA [Code]
Amnesiac Machine Learning 2021 Graves et al. AAAI AmnesiacML [Code]
Unlearnable Examples: Making Personal Data Unexploitable 2021 Huang et al. ICLR - [Code]
Descent-to-Delete: Gradient-Based Methods for Machine Unlearning 2021 Neel et al. ALT - -
Fawkes: Protecting Privacy against Unauthorized Deep Learning Models 2020 Shan et al. USENIX Sec. Sym. Fawkes [Code]
PrIU: A Provenance-Based Approach for Incrementally Updating Regression Models 2020 Wu et al. SIGMOD PrIU/PrIU-opt -
DeltaGrad: Rapid retraining of machine learning models 2020 Wu et al. ICML DeltaGrad [Code]
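The best-known data-partitioning method above, SISA (Bourtoule et al., 2021), trains one constituent model per disjoint data shard and aggregates predictions by majority vote; unlearning a point then only requires retraining the single shard that contained it. Below is a toy sketch of the sharding idea; the paper's slicing and checkpointing optimizations are omitted, and `fit_fn`/`predict_fn` are placeholders for any learner.

```python
import numpy as np

class SISAEnsemble:
    """Toy sketch of SISA-style sharded training: each constituent model
    sees only its own shard, so deleting a point retrains one shard
    instead of the whole ensemble."""

    def __init__(self, n_shards, fit_fn, predict_fn):
        self.n_shards = n_shards
        self.fit_fn = fit_fn          # fit_fn(X, y) -> model
        self.predict_fn = predict_fn  # predict_fn(model, X) -> int labels

    def fit(self, X, y):
        self.X, self.y = X, y
        self.shard_of = np.arange(len(X)) % self.n_shards  # disjoint shards
        self.models = [self.fit_fn(X[self.shard_of == s], y[self.shard_of == s])
                       for s in range(self.n_shards)]

    def unlearn(self, idx):
        s = self.shard_of[idx]                 # only this shard is affected
        keep = np.ones(len(self.X), dtype=bool)
        keep[idx] = False
        self.X, self.y = self.X[keep], self.y[keep]
        self.shard_of = self.shard_of[keep]
        mask = self.shard_of == s
        self.models[s] = self.fit_fn(self.X[mask], self.y[mask])

    def predict(self, X):
        votes = np.stack([self.predict_fn(m, X) for m in self.models])
        # Majority vote across shard models.
        return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
```

For example, with a trivial majority-class learner, unlearning index 3 retrains only shard `3 % n_shards` while the other shard models are untouched, which is exactly the source of SISA's speedup over full retraining.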

Datasets

Type: Image

Dataset #Items Disk Size Downstream Application #Papers Used
MNIST 70K 11MB Classification 29+ papers
CIFAR 60K 163MB Classification 16+ papers
SVHN 600K 400MB+ Classification 8+ papers
LSUN 69M+ 1TB+ Classification 1 paper
ImageNet 14M+ 166GB Classification 6 papers

Type: Tabular

Dataset #Items Disk Size Downstream Application #Papers Used
Adult 48K+ 10MB Classification 8+ papers
Breast Cancer 569 <1MB Classification 2 papers
Diabetes 442 <1MB Regression 3 papers

Type: Text

Dataset #Items Disk Size Downstream Application #Papers Used
IMDB Review 50k 66MB Sentiment Analysis 1 paper
Reuters 11K+ 73MB Categorization 1 paper
Newsgroup 20K 1GB+ Categorization 1 paper

Type: Sequence

Dataset #Items Disk Size Downstream Application #Papers Used
Epileptic Seizure 11K+ 7MB Timeseries Classification 1 paper
Activity Recognition 10K+ 26MB Timeseries Classification 1 paper
Botnet 72M 3GB+ Clustering 1 paper

Type: Graph

Dataset #Items Disk Size Downstream Application #Papers Used
OGB 100M+ 59MB Classification 2 papers
Cora 2K+ 4.5MB Classification 3 papers
MovieLens 1B+ 3GB+ Recommender Systems 1 paper

Evaluation Metrics

Metrics Formula/Description Usage
Accuracy Accuracy of the unlearned model on the forget set and retain set Evaluating the predictive performance of the unlearned model
Completeness The overlap (e.g., Jaccard distance) of the output space between the retrained and the unlearned model Evaluating the indistinguishability between model outputs
Unlearn Time The time taken to complete an unlearning request Evaluating the unlearning efficiency
Relearn Time The number of epochs required for the unlearned model to reach the accuracy of the source model Evaluating the unlearning efficiency (relearning with some data samples)
Layer-wise Distance The weight difference between the original model and the retrained model Evaluating the indistinguishability between model parameters
Activation Distance The average L2 distance between the unlearned and retrained models' predicted probabilities on the forget set Evaluating the indistinguishability between model outputs
JS-Divergence Jensen-Shannon divergence between the predictions of the unlearned and retrained models Evaluating the indistinguishability between model outputs
Membership Inference Attack Recall (#detected items / #forget items) Verifying the influence of forget data on the unlearned model
ZRF score $\mathcal{ZRF} = 1 - \frac{1}{n_f}\sum\limits_{i=0}^{n_f} \mathcal{JS}(M(x_i), T_d(x_i))$ The unlearned model should not intentionally give wrong output $(\mathcal{ZRF} = 0)$ or random output $(\mathcal{ZRF} = 1)$ on the forget item
Anamnesis Index (AIN) $AIN = \frac{r_t (M_u, M_{orig}, \alpha)}{r_t (M_s, M_{orig}, \alpha)}$ Zero-shot machine unlearning
Epistemic Uncertainty if $i(w;D) > 0$, then $\mathrm{efficacy}(w;D) = \frac{1}{i(w;D)}$; otherwise $\mathrm{efficacy}(w;D) = \infty$ Evaluating how much information the model exposes
Model Inversion Attack Visualization Qualitative verifications and evaluations
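Two of the output-space metrics above are simple to compute once the unlearned and retrained models' predicted probabilities on the forget set are available. A sketch in plain NumPy (the `eps` clipping is our addition to avoid `log(0)`; the metric definitions follow the table):

```python
import numpy as np

def activation_distance(p_unlearned, p_retrained):
    """Mean L2 distance between the two models' predicted probability
    vectors on the forget set (lower = more indistinguishable)."""
    return np.linalg.norm(p_unlearned - p_retrained, axis=1).mean()

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence (natural log), averaged over the forget
    set. Zero iff the two models' output distributions coincide."""
    p, q = np.clip(p, eps, 1.0), np.clip(q, eps, 1.0)
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log(a / b), axis=1)
    return np.mean(0.5 * kl(p, m) + 0.5 * kl(q, m))

# p_u / p_r would be softmax outputs of the unlearned / retrained model.
p_u = np.array([[1.0, 0.0], [0.5, 0.5]])
p_r = np.array([[0.0, 1.0], [0.5, 0.5]])
```

On this toy input, the first row contributes $\ln 2$ (maximally distinguishable one-hot outputs) and the second contributes 0, so the averaged JS-divergence is $\ln 2 / 2$.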

Disclaimer

Feel free to contact us if you have any queries or exciting news about machine unlearning. We also welcome all researchers to contribute to this repository and to the broader knowledge of the machine unlearning field.

If you have other related references, please feel free to create a GitHub issue with the paper information. We will gladly update the repository according to your suggestions. (You can also create pull requests, but it might take some time for us to merge them.)
