
unlearn-sparse's Introduction

Model Sparsity Can Simplify Machine Unlearning

Preprint · License: MIT · Venue: NeurIPS 2023

Figure 1: Schematic overview of our proposal on model-sparsity-driven MU.

This is the official code repository for the NeurIPS 2023 Spotlight paper Model Sparsity Can Simplify Machine Unlearning.

Abstract

In response to recent data regulation requirements, machine unlearning (MU) has emerged as a critical process to remove the influence of specific examples from a given model. Although exact unlearning can be achieved through complete model retraining using the remaining dataset, the associated computational costs have driven the development of efficient, approximate unlearning techniques. Moving beyond data-centric MU approaches, our study introduces a novel model-based perspective: model sparsification via weight pruning, which is capable of reducing the gap between exact unlearning and approximate unlearning. We show in both theory and practice that model sparsity can boost the multi-criteria unlearning performance of an approximate unlearner, closing the approximation gap, while continuing to be efficient. This leads to a new MU paradigm, termed prune first, then unlearn, which infuses a sparse model prior into the unlearning process. Building on this insight, we also develop a sparsity-aware unlearning method that utilizes sparsity regularization to enhance the training process of approximate unlearning. Extensive experiments show that our proposals consistently benefit MU in various unlearning scenarios. A notable highlight is the 77% unlearning efficacy gain of fine-tuning (one of the simplest unlearning methods) when using sparsity-aware unlearning. Furthermore, we demonstrate the practical impact of our proposed MU methods in addressing other machine learning challenges, such as defending against backdoor attacks and enhancing transfer learning.

Requirements

conda env create -f environment.yml

Code Structure

The source code is organized as follows:

evaluation: contains MIA evaluation code.

models: contains the model definitions.

utils.py: contains the utility functions.

main_imp.py: contains the code for training and pruning.

main_forget.py: contains the main executable code for unlearning.

main_backdoor.py: contains the main executable code for backdoor cleanse.

Commands

Pruning

OMP

python -u main_imp.py --data ./data --dataset $data --arch $arch --prune_type rewind_lt --rewind_epoch 8 --save_dir ${save_dir} --rate ${rate} --pruning_times 2 --num_workers 8

IMP

python -u main_imp.py --data ./data --dataset $data --arch $arch --prune_type rewind_lt --rewind_epoch 8 --save_dir ${save_dir} --rate 0.2 --pruning_times ${pruning_times} --num_workers 8
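
The OMP and IMP commands above differ only in how `--rate` and `--pruning_times` are set. As a rough guide to how the two flags interact, the following sketch computes the fraction of weights that remains after a number of pruning rounds. It assumes each round removes a fraction `rate` of the weights that are still left, which is the usual iterative magnitude pruning convention. How many rounds a given `--pruning_times` value actually triggers in main_imp.py is not stated here, and the helper names are hypothetical.

```python
import math


def remaining_ratio(rate: float, rounds: int) -> float:
    """Fraction of weights left after `rounds` pruning rounds.

    Assumption: every round prunes a fraction `rate` of the weights that remain.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if rounds < 0:
        raise ValueError("rounds must be non-negative")
    return (1.0 - rate) ** rounds


def rounds_for_sparsity(target_sparsity: float, rate: float) -> int:
    """Smallest number of rounds whose sparsity is at least `target_sparsity`."""
    if not 0.0 < rate < 1.0 or not 0.0 <= target_sparsity < 1.0:
        raise ValueError("rate must be in (0, 1) and target_sparsity in [0, 1)")
    return math.ceil(math.log(1.0 - target_sparsity) / math.log(1.0 - rate))


if __name__ == "__main__":
    # One-shot style: a single round at a high rate.
    print(f"rate=0.95, 1 round -> {remaining_ratio(0.95, 1):.1%} of weights remain")
    # Iterative style: several rounds at rate 0.2.
    n = rounds_for_sparsity(0.95, 0.2)
    print(f"rate=0.2, {n} rounds -> {remaining_ratio(0.2, n):.1%} of weights remain")
```

Under this assumption, a single round at a high rate and many rounds at rate 0.2 can reach similar sparsity levels, which is the practical difference between the one-shot and iterative settings.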

SynFlow

python -u main_synflow.py --data ./data --dataset cifar10 --prune_type rewind_lt --rewind_epoch 8 --save_dir ${save_dir} --rate ${rate} --pruning_times 1 --num_workers 8

Unlearning

Retrain

python -u main_forget.py --save_dir ${save_dir} --mask ${mask_path} --unlearn retrain --num_indexes_to_replace 4500 --unlearn_epochs 160 --unlearn_lr 0.1

FT

python -u main_forget.py --save_dir ${save_dir} --mask ${mask_path} --unlearn FT --num_indexes_to_replace 4500 --unlearn_lr 0.01 --unlearn_epochs 10

GA

python -u main_forget.py --save_dir ${save_dir} --mask ${mask_path} --unlearn GA --num_indexes_to_replace 4500 --unlearn_lr 0.0001 --unlearn_epochs 5

FF

python -u main_forget.py --save_dir ${save_dir} --mask ${mask_path} --unlearn fisher_new --num_indexes_to_replace 4500 --alpha ${alpha}

IU

python -u main_forget.py --save_dir ${save_dir} --mask ${mask_path} --unlearn wfisher --num_indexes_to_replace 4500 --alpha ${alpha}

$\ell_1$-sparse

python -u main_forget.py --save_dir ${save_dir} --mask ${mask_path} --unlearn FT_prune --num_indexes_to_replace 4500 --alpha ${alpha} --unlearn_lr 0.01 --unlearn_epochs 10
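
The $\ell_1$-sparse method adds a sparsity regularizer to the unlearning objective. The sketch below shows the general shape of such an objective on plain Python lists: the task loss plus a weighted $\ell_1$ norm of the weights. It is an illustration only. Treating `--alpha` as the regularization weight is an assumption, and `l1_penalty` and `sparse_objective` are hypothetical names, not functions from this repository.

```python
from typing import Iterable, Sequence


def l1_penalty(layers: Iterable[Sequence[float]]) -> float:
    """Sum of absolute values over all weights in all layers."""
    return sum(abs(w) for layer in layers for w in layer)


def sparse_objective(task_loss: float, layers, alpha: float) -> float:
    """Task loss plus alpha times the l1 norm (alpha: assumed regularization weight)."""
    return task_loss + alpha * l1_penalty(layers)


if __name__ == "__main__":
    # Toy weights standing in for two layers of a model.
    toy_layers = [[0.5, -1.5, 0.0], [2.0, -0.25]]
    print(sparse_objective(task_loss=0.8, layers=toy_layers, alpha=0.01))
```

A larger `alpha` pushes more weights toward zero at the cost of fitting the remaining data less closely.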

Trojan model cleanse

python -u main_backdoor.py --save_dir ${save_dir} --mask ${mask_path} --unlearn FT --num_indexes_to_replace 4500

BibTeX

If you find this repository or the ideas presented in our paper useful, please consider citing.

@inproceedings{jia2023model,
  title={Model Sparsity Can Simplify Machine Unlearning},
  author={Jia, Jinghan and Liu, Jiancheng and Ram, Parikshit and Yao, Yuguang and Liu, Gaowen and Liu, Yang and Sharma, Pranay and Liu, Sijia},
  booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
  year={2023}
}

unlearn-sparse's People

Contributors

jinghanjia, ljcc0930

unlearn-sparse's Issues

Reproducing the result of cifar100

Hello, first and foremost, I would like to express my gratitude for the work you have done. Thank you.
I was trying to reproduce the cifar100 results in the appendix using retrain and FT, but I could not reproduce the FT unlearning results.

I ran cifar100 in 6 independent trials. As you can see, FT has quite low forgetting efficacy for class-wise forgetting on both the sparse (pruned) and dense (unpruned) models, and for random data forgetting on the sparse model.
image

The hyperparameters of these experiments are the same as for cifar10.

for seed in 1 2 3 4 5 6
do
    python -u main_forget.py --save_dir ${save_dir}/random/${seed}_retrain --mask ${base_dir}/${seed}/base/0model_SA_best.pth.tar --unlearn retrain --unlearn_epochs 160 --unlearn_lr 0.1 --dataset $data --class_to_replace -1 --num_indexes_to_replace 4500 --seed $seed
    python -u main_forget.py --save_dir ${save_dir}/random/${seed}_FT_only --mask ${base_dir}/${seed}/base/0model_SA_best.pth.tar --unlearn FT --unlearn_lr 0.04 --unlearn_epochs 10 --dataset $data --class_to_replace -1 --num_indexes_to_replace 4500 --seed $seed
    
    python -u main_forget.py --save_dir ${save_dir}/class/${seed}_retrain --mask ${base_dir}/${seed}/base/0model_SA_best.pth.tar --unlearn retrain --unlearn_epochs 160 --unlearn_lr 0.1 --dataset $data --seed $seed
    python -u main_forget.py --save_dir ${save_dir}/class/${seed}_FT_only --mask ${base_dir}/${seed}/base/0model_SA_best.pth.tar --unlearn FT --unlearn_lr 0.01 --unlearn_epochs 10 --dataset $data --seed $seed

    python -u main_forget.py --save_dir ${save_dir2}/random/${seed}_retrain --mask ${base_dir}/${seed}/base/1model_SA_best.pth.tar --unlearn retrain --unlearn_epochs 160 --unlearn_lr 0.1 --dataset $data --class_to_replace -1 --num_indexes_to_replace 4500 --seed $seed
    python -u main_forget.py --save_dir ${save_dir2}/random/${seed}_FT_only --mask ${base_dir}/${seed}/base/1model_SA_best.pth.tar --unlearn FT --unlearn_lr 0.04 --unlearn_epochs 10 --dataset $data --class_to_replace -1 --num_indexes_to_replace 4500 --seed $seed
    
    python -u main_forget.py --save_dir ${save_dir2}/class/${seed}_retrain --mask ${base_dir}/${seed}/base/1model_SA_best.pth.tar --unlearn retrain --unlearn_epochs 160 --unlearn_lr 0.1 --dataset $data --seed $seed
    python -u main_forget.py --save_dir ${save_dir2}/class/${seed}_FT_only --mask ${base_dir}/${seed}/base/1model_SA_best.pth.tar --unlearn FT --unlearn_lr 0.01 --unlearn_epochs 10 --dataset $data --seed $seed
done

#4
I'm aware that for cifar10 you used different learning rates for class-wise forgetting (0.01) and random data forgetting (0.04). Are the learning rates for cifar100 different?

Thank you.

Challenges in Replicating Model Pruning and Unlearning Results

Hello,

I've been working on replicating some results from your paper using the provided commands and code modifications in the README. However, I am encountering some discrepancies in the results, particularly with the MIA-Efficacy values. Below, I detail the steps taken and the issues encountered.

Steps and Code Used:

  1. Initial pruning of the model was done using the command:
python -u main_imp.py --data ./data --dataset cifar10 --arch resnet18 --prune_type rewind_lt --rewind_epoch 8 --save_dir omp --rate 0.95 --pruning_times 2 --num_workers 8
  1. I modified arg_parser.py with the following additions:
parser.add_argument(
    "--num_indexes_to_replace",
    type=int,
    default=None,
    help="Number of data to forget",
)
parser.add_argument(
    "--class_to_replace", type=int, default=None, help="Specific class to forget"
)
parser.add_argument(
    "--indexes_to_replace",
    type=list,
    default=None,
    help="Specific index data to forget",
)

When class_to_replace is set to None, a random selection of indexes equal to the number specified by num_indexes_to_replace will be chosen for the unlearning process.

args = arg_parser.parse_args()
if args.seed:
    utils.setup_seed(args.seed)
if args.class_to_replace is None:
    if args.dataset == "cifar10":
        num_indexes_to_replace = args.num_indexes_to_replace
        if args.indexes_to_replace is None:
            args.indexes_to_replace = np.random.choice(
                45000, num_indexes_to_replace, replace=False
            )
  1. I ran unlearning with the following command:
python -u main_forget.py --save_dir omp --mask omp/1model_SA_best.pth.tar --unlearn retrain --num_indexes_to_replace 4500 --unlearn_epochs 160 --unlearn_lr 0.1

As shown in Figure 1, the results obtained under the 95%-sparse model are calculated as follows: UA=6.78, RA=99.99, TA=92.77.
image
Figure 1

I would like to inquire which value in SVC_MIA_forget_efficacy represents MIA-Efficacy. Is it the confidence value that closely matches the one mentioned in the paper, or is it the average of these values?

  1. I used the following command to get the unlearning result of the Dense model.
python -u main_forget.py --save_dir omp_dense --mask omp_dense/0model_SA_best.pth.tar --unlearn retrain --num_indexes_to_replace 4500 --unlearn_epochs 160 --unlearn_lr 0.1

As shown in Figure 2, the results under the Dense model are calculated as follows: UA=4.9, RA=99.52, TA=94.62.
image
Figure 2

  1. The above results are relatively close to those reported in the paper. However, I also tested GA separately on the 95%-sparse model and the Dense Model with the following commands.
    Sparse model command:
python -u main_forget.py --save_dir omp --mask omp/1model_SA_best.pth.tar --unlearn GA --num_indexes_to_replace 4500 --unlearn_lr 0.0001 --unlearn_epochs 5

Dense Model command:

python -u main_forget.py --save_dir omp_dense --mask omp_dense/0model_SA_best.pth.tar --unlearn GA --num_indexes_to_replace 4500 --unlearn_lr 0.0001 --unlearn_epochs 5

As shown in Figure 3, the results for the 95%-sparse model are calculated as follows: UA=0.62, RA=99.39, TA=94.23. The UA value differs significantly from the 5.62±0.46 reported in the paper. Additionally, the MIA-Efficacy, whether it's the average or a specific value, shows a considerable discrepancy from the reported 11.76±0.52.
image
Figure 3 95%-sparse model result

As illustrated in Figure 4, the results for the Dense Model are calculated as follows: UA=0.78, RA=99.52, TA=94.52. The UA value shows a significant difference from the 7.54±0.29 mentioned in the paper. Moreover, the average MIA-Efficacy is 8.5, which slightly deviates from the 10.04±0.31 reported in the paper.
image
Figure 4 Dense Model result

  1. When running FF under the 95%-sparse model, the specific value for alpha was not known, so I set it to 10^(-8) based on the description in the 'Additional training details of MU' section of the paper. The result is shown in Figure 5.
python -u main_forget.py --save_dir omp --mask omp/1model_SA_best.pth.tar --unlearn fisher_new --num_indexes_to_replace 4500 --alpha 0.00000001

image
Figure 5
As shown in Figure 5, the results exhibit some discrepancies compared to the results shown in Figure 6 from the paper.

image
Figure 6
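
The argument definitions and the random index selection quoted earlier in this report can be sketched in a self-contained form. Two things in the sketch are assumptions rather than the repository's code: `--indexes_to_replace` is declared with `nargs="+", type=int` (argparse applies `type` to each command-line string, so `type=list` would split a value such as `12` into the characters `'1'` and `'2'`), and the standard-library `random.Random.sample` stands in for `np.random.choice(..., replace=False)`. The helper names are hypothetical, and the training-set size of 45000 is taken from the snippet quoted above.

```python
import argparse
import random
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument("--seed", type=int, default=None)
    parser.add_argument("--num_indexes_to_replace", type=int, default=None,
                        help="Number of data to forget")
    parser.add_argument("--class_to_replace", type=int, default=None,
                        help="Specific class to forget")
    # nargs="+" collects one or more integers into a list (assumed fix for type=list).
    parser.add_argument("--indexes_to_replace", type=int, nargs="+", default=None,
                        help="Specific index data to forget")
    return parser


def select_forget_indexes(args, train_size: int = 45000) -> Optional[List[int]]:
    """Return explicit indexes if given, else a random sample without replacement."""
    if args.indexes_to_replace is not None:
        return list(args.indexes_to_replace)
    if args.class_to_replace is None and args.num_indexes_to_replace is not None:
        rng = random.Random(args.seed)
        return rng.sample(range(train_size), args.num_indexes_to_replace)
    return None  # class-wise forgetting would be handled elsewhere


if __name__ == "__main__":
    args = build_parser().parse_args(["--num_indexes_to_replace", "4500", "--seed", "2"])
    picked = select_forget_indexes(args)
    # Both counts match because sampling is done without replacement.
    print(len(picked), len(set(picked)))
```

Seeding a local `random.Random` keeps the forget set reproducible for a given `--seed` without touching global random state.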

Questions:

  1. What are the specific parameter settings for each unlearning method?
  2. How is MIA-Efficacy calculated in SVC_MIA_forget_efficacy?
  3. What could be the reasons for the discrepancies in replicating the results?
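
One way to put the discrepancies behind question 3 in perspective is to express each measured value as a distance from the paper's mean, in units of the reported spread. The sketch below uses only the GA numbers quoted in this report. It assumes the ± values are standard deviations over trials, and `deviation_in_stds` is a hypothetical helper.

```python
def deviation_in_stds(measured: float, reported_mean: float, reported_std: float) -> float:
    """Signed distance of a measurement from the reported mean, in standard deviations."""
    if reported_std <= 0:
        raise ValueError("reported_std must be positive")
    return (measured - reported_mean) / reported_std


# (measured, reported mean, reported spread), all taken from the GA results above.
GA_RESULTS = {
    "95%-sparse UA": (0.62, 5.62, 0.46),
    "dense UA": (0.78, 7.54, 0.29),
    "dense MIA-Efficacy (average)": (8.5, 10.04, 0.31),
}

if __name__ == "__main__":
    for name, (measured, mean, std) in GA_RESULTS.items():
        z = deviation_in_stds(measured, mean, std)
        print(f"{name}: {z:+.1f} standard deviations from the reported mean")
```

Values many standard deviations away from the reported mean are unlikely to be explained by seed-to-seed variation alone, which points toward a difference in settings.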

Thank you for your time and help.

Best regards,
David

cannot find the mask model

Hello @ljcc0930, I enjoyed reading your excellent work. Could you tell me which file is the mask model?
When I run retrain after pruning,

python -u main_forget.py --save_dir "./cifar10_results" \
    --mask "./cifar10_results/1model_SA_best.pth.tar" \
    --unlearn retrain --num_indexes_to_replace 4500 --unlearn_epochs 160 \
    --unlearn_lr 0.1 2>&1 | tee -a ./logs/Retrain.log

I get this error:

setup random seed = 2
4500
setup random seed = 2
40500
Pruning with custom mask (all conv layers)
* remain weight ratio =  50.0 %
Traceback (most recent call last):
  File "/home/rram/anaconda3/lib/python3.9/site-packages/torch/serialization.py", line 348, in _check_seekable
    f.seek(f.tell())
AttributeError: 'NoneType' object has no attribute 'seek'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/machine_unlearning/examples/Unlearn-Sparse/main_forget.py", line 248, in <module>
    main()
  File "/machine_unlearning/examples/Unlearn-Sparse/main_forget.py", line 151, in main
    unlearn_method(unlearn_data_loaders, model, criterion, args)
  File "/machine_unlearning/examples/Unlearn-Sparse/unlearn/impl.py", line 63, in _wrapped
    initialization = torch.load(
  File "/anaconda3/lib/python3.9/site-packages/torch/serialization.py", line 771, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "/anaconda3/lib/python3.9/site-packages/torch/serialization.py", line 275, in _open_file_like
    return _open_buffer_reader(name_or_buffer)
  File "/anaconda3/lib/python3.9/site-packages/torch/serialization.py", line 260, in __init__
    _check_seekable(buffer)
  File "/anaconda3/lib/python3.9/site-packages/torch/serialization.py", line 351, in _check_seekable
    raise_err_msg(["seek", "tell"], e)
  File "/anaconda3/lib/python3.9/site-packages/torch/serialization.py", line 344, in raise_err_msg
    raise type(e)(msg)
AttributeError: 'NoneType' object has no attribute 'seek'. You can only torch.load from a file that is seekable. Please pre-load the data into a buffer like io.BytesIO and try to load from it instead.

I have the trained files produced by running your pruning code:
image

Can you help me? I'm looking forward to your reply~
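
The final line of the traceback shows `torch.load` being handed `None` instead of a file path, which suggests that some checkpoint-path argument was left unset. The traceback does not show which one. A small guard like the following sketch would surface that earlier with a clearer message. It is not part of the repository: `require_checkpoint_path` and the flag name passed to it are hypothetical.

```python
import os
from typing import Optional


def require_checkpoint_path(path: Optional[str], flag_name: str) -> str:
    """Validate a checkpoint path before handing it to a loader such as torch.load."""
    if path is None:
        raise ValueError(f"{flag_name} was not provided, so there is no checkpoint to load")
    if not os.path.isfile(path):
        raise FileNotFoundError(f"{flag_name} points to a missing file: {path}")
    return path


if __name__ == "__main__":
    try:
        # Hypothetical flag name, used only to label the error message.
        require_checkpoint_path(None, "--some_checkpoint_flag")
    except ValueError as err:
        print(f"caught: {err}")
```

Calling such a check on every path argument right after parsing makes a missing flag fail immediately rather than after data loading and pruning have already run.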

Difference in commands for running OMP and IMP Pruning

There seems to be no difference between the commands for OMP and IMP pruning given in the README, except that the OMP command leaves rate as a variable and the IMP command leaves pruning_times as a variable:

OMP

python -u main_imp.py --data ./data --dataset $data --arch $arch --prune_type rewind_lt --rewind_epoch 8 --save_dir ${save_dir} --rate ${rate} --pruning_times 2 --num_workers 8

IMP

python -u main_imp.py --data ./data --dataset $data --arch $arch --prune_type rewind_lt --rewind_epoch 8 --save_dir ${save_dir} --rate 0.2 --pruning_times ${pruning_times} --num_workers 8

Is this a mistake? Which method do these commands run by default? I am not able to determine that by looking at the code. I would like to run OMP pruning. What would be the correct command to do so?

Mistake in code?

Hello, is there a problem in line 259 of dataset.py? It appears that indexes_to_replace is treated as an int there, although it is supposed to be a list. Was the or clause supposed to be added to line 256 with num_indexes_to_replace? Thanks!

Reproducing the result of Dense Network with FT

First of all, I like your work. It is impressive.

I'm trying to reproduce your result on the Dense Network with the FT unlearning method. However, I wasn't able to reproduce it with the default batch size. I could reproduce your result using a batch size of 64 for random data forgetting and a batch size of 512 for class-wise forgetting. Are these the batch sizes you used, or did I do something wrong?

Thank You.
