
nad's Introduction

Hi there, I am Yige Li👋

I am a research fellow at the School of Computing and Information Systems at Singapore Management University, supervised by Prof. Jun Sun. I also work closely with Prof. Xingjun Ma at Fudan University. I completed my Ph.D. at Xidian University under the supervision of Prof. Xixiang Lyu. My publications are listed on Google Scholar.

🔭 My research mainly focuses on:

  • Understanding the effectiveness of backdoor attacks
  • Robust training against backdoor attacks
  • Designing and implementing a general defense framework against backdoor attacks

🌱 Publications:

  • Yige Li, Xingjun Ma, et al., “Multi-Trigger Backdoor Attacks: More Triggers, More Threats”, under submission, 2024.
  • Yige Li, Xixiang Lyu, et al., “Reconstructive Neuron Pruning for Backdoor Defense”, ICML 2023.
  • Yige Li, Xixiang Lyu, et al., “Anti-Backdoor Learning: Training Clean Models on Poisoned Data”, NeurIPS 2021.
  • Yige Li, Xixiang Lyu, et al., “Neural Attention Distillation: Erasing Backdoor Triggers from Deep Neural Networks”, ICLR 2021.

⚡ Significance of our works:

  • Neural Attention Distillation (NAD)

    • A simple and universal method that defends against 6 state-of-the-art backdoor attacks via knowledge distillation
    • Requires only a small amount of clean data (5%)
    • Requires only a few epochs of fine-tuning (2-10 epochs)
  • Anti-Backdoor Learning (ABL)

    • Simple, effective, and universal; can defend against 10 state-of-the-art backdoor attacks
    • Requires only 1% isolation data
    • A novel strategy that helps companies, research institutes, and government agencies train backdoor-free machine learning models

📫 How to reach me:

nad's People

Contributors

bboylyg


nad's Issues

What config did you use to have the model return the activations?

Hello,

I'm trying to understand why your model returns 3 activations along with the output when running inference.

line 27 of main.py: activation1_s, activation2_s, activation3_s, output_s = snet(img)

What was the thought process behind returning these last 3 activations?
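
For context, here is a minimal sketch of the pattern I think is being used (a hypothetical toy module, not the repo's WideResNet):

import torch
import torch.nn as nn

class ToyNet(nn.Module):
    # Hypothetical network that, like snet in main.py, returns three
    # intermediate feature maps in addition to the final logits.
    def __init__(self, num_classes=10):
        super().__init__()
        self.block1 = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.block2 = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.block3 = nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU())
        self.head = nn.Linear(64, num_classes)

    def forward(self, x):
        a1 = self.block1(x)                    # activation after block 1
        a2 = self.block2(a1)                   # activation after block 2
        a3 = self.block3(a2)                   # activation after block 3
        out = self.head(a3.mean(dim=(2, 3)))   # global average pool + classifier
        return a1, a2, a3, out

activation1_s, activation2_s, activation3_s, output_s = ToyNet()(torch.randn(2, 3, 32, 32))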

The reproducibility of experiments in the paper

Hi author, I have some questions about this paper and the public code.

The v1 version of your arXiv paper is dated 15 Jan 2021 (https://arxiv.org/abs/2101.05930v1), and your first commit on GitHub is dated 21 Jan 2021 (a5a10f1).

The most important loss that you proposed in your paper is Eq(3), which is controlled by the hyperparameter beta.

However, in your first commit, this part is written as:

cls_loss = criterionCls(output_s, target)  
at3_loss = criterionAT(activation3_s, activation3_t).detach() * opt.beta3  
at2_loss = criterionAT(activation2_s, activation2_t).detach() * opt.beta2  
at1_loss = criterionAT(activation1_s, activation1_t).detach() * opt.beta1  
at_loss = at1_loss + at2_loss + at3_loss + cls_loss

detach() in PyTorch returns a new tensor, detached from the current computation graph (see the docs). So if the model is optimized with this at_loss, then at1_loss, at2_loss, and at3_loss contribute nothing to the model's training. The user @zeabin submitted issue #8 and, fortunately, you fixed it in 6907ea2 on 10 Jan 2022:

at3_loss = criterionAT(activation3_s, activation3_t.detach()) * opt.beta3
at2_loss = criterionAT(activation2_s, activation2_t.detach()) * opt.beta2
at1_loss = criterionAT(activation1_s, activation1_t.detach()) * opt.beta1

Based on the above facts, my question is whether the results in your paper are based on the first commit code or the fixed code.

  1. If the experiments were run with the correct code, why was the first commit wrong?

  2. If the experiments were run with the wrong code, then the idea proposed in the paper does not actually take effect; it is just fine-tuning, and I have valid reasons to doubt the reliability of the results.

  3. For the papers published between 21 Jan 2021 and 10 Jan 2022 that use NAD as a comparison, are their reported results reliable? During this period, the code in this repository was wrong.

I would appreciate it if you could address the questions above.
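
For reference, a standalone toy example (not the repo's code) showing that a detached loss term contributes no gradient:

import torch

w = torch.ones(3, requires_grad=True)
x = torch.tensor([1.0, 2.0, 3.0])

loss_a = (w * x).sum()                   # ordinary loss term
loss_b = (w * x).pow(2).sum().detach()   # detached: excluded from the graph

(loss_a + loss_b).backward()
print(w.grad)                            # tensor([1., 2., 3.]), i.e. the gradient of loss_a only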

Why is normalization not applied in the data preprocessing?

I'd like to thank you for your great work on this project.

I noticed that during data preprocessing, normalization is not applied to the images. This seems unusual, as normalization is a common preprocessing step that helps improve the performance and stability of deep learning models.

Could you please provide an explanation for this decision? Is there a specific reason why normalization was not used in this case? Would it be beneficial to include normalization, or is it intentionally omitted for some reason?
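
For context, the kind of normalization I had in mind looks roughly like this (the channel statistics are the commonly used CIFAR-10 values, not numbers taken from this repo):

from torchvision import transforms

# Convert to tensors, then standardize each channel with CIFAR-10 mean/std.
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.4914, 0.4822, 0.4465),
                         std=(0.2470, 0.2435, 0.2616)),
])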

Thank you in advance for your response.

performance on GTSRB

Hi! Thanks for your great work!

Have you tested the defense effect of NAD against attacks other than Refool on GTSRB, such as BadNets, Blend, and SIG?
If so, could you share the experimental results?
I'd appreciate it very much!

Best!

Trojan trigger (/NAD/trigger/best_square_trigger_cifar10.npz) not effective

I have been using your open source project NAD for defending against Trojan attacks in deep learning models, and I have noticed a potential issue with the trojan trigger provided in the project.

Specifically, I have found that the trojan trigger provided in the project may not be effective against Trojan attacks, as it can be easily detected and eliminated by a simple fine-tuning process. By generating a teacher model and fine-tuning it on the target dataset, I was able to significantly reduce the ASR of the model with the trojan trigger.

Therefore, I suspect that the trojan trigger provided in the project may not be a true Trojan trigger, or at least may not be effective against sophisticated Trojan attacks. I would appreciate it if you could provide more information or guidance on how to improve the effectiveness of the trojan trigger.
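
For reference, a simplified sketch of how I measured the ASR (apply_trigger is a placeholder for whatever stamps the trigger onto a batch, and the model is assumed here to return logits only, unlike the repo's multi-output snet):

import torch

@torch.no_grad()
def attack_success_rate(model, test_loader, apply_trigger, target_label):
    # Fraction of non-target test images that the model classifies as the
    # attacker's target label once the trigger is stamped onto them.
    model.eval()
    hits, total = 0, 0
    for img, label in test_loader:
        keep = label != target_label           # skip images already in the target class
        if keep.sum() == 0:
            continue
        pred = model(apply_trigger(img[keep])).argmax(dim=1)
        hits += (pred == target_label).sum().item()
        total += int(keep.sum())
    return 100.0 * hits / total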

Thank you for your time and attention.

RuntimeError: view size is not compatible with input tensor's size and stride

Hello,

When I deployed and tried to run the code, the following issue came up:

----------- Train Initialization --------------
epoch: 0  lr: 0.1000
Traceback (most recent call last):
  File "main.py", line 204, in <module>
    main()
  File "main.py", line 201, in main
    train(opt)
  File "main.py", line 171, in train
    test(opt, test_clean_loader, test_bad_loader, nets,
  File "main.py", line 72, in test
    prec1, prec5 = accuracy(output_s, target, topk=(1, 5))
  File "/home/longkangli/NAD/utils/util.py", line 63, in accuracy
    correct_k = correct[:k].view(-1).float().sum(0)
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

Then I changed the code according to the error message. In ~/NAD/utils/util.py, line 63, in accuracy, I changed
correct_k = correct[:k].view(-1).float().sum(0)
to
correct_k = correct[:k].contiguous().view(-1).float().sum(0)
and then it works.
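
For completeness, the whole helper after the change looks roughly like this (assuming it follows the standard PyTorch ImageNet-example top-k accuracy implementation):

def accuracy(output, target, topk=(1,)):
    # Standard top-k accuracy; .contiguous() is added before .view(-1) so the
    # reshape also works on non-contiguous tensors.
    maxk = max(topk)
    batch_size = target.size(0)

    _, pred = output.topk(maxk, 1, True, True)
    pred = pred.t()
    correct = pred.eq(target.view(1, -1).expand_as(pred))

    res = []
    for k in topk:
        correct_k = correct[:k].contiguous().view(-1).float().sum(0)
        res.append(correct_k.mul_(100.0 / batch_size))
    return res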

My environment: Python 3.8, PyTorch 1.7, CUDA 10.2.

I'm not sure whether the problem comes from the different environment versions. Anyway~

Regards.

A few questions

Hello. I am looking for possible solutions to backdoor attacks. I've read this interesting and promising research, but I am still confused about a few points.

  1. Why can distillation with a pruned model as the teacher purify the poisoned model? Do you have more detailed insights?
  2. Have you tried bigger models and datasets?
  3. There are attacks against pruning defenses (via pruning during the training period, which is, however, unrealistic in the real world). What do you think of such attackers that are specifically designed to defeat pruning?

Looking forward to your reply.

Is the loss function actually useful in the experiments?

Hello, I'm very interested in this paper. When I look at main.py, the three at_loss terms call .detach(), which takes them out of the PyTorch computation graph, so I deleted at1_loss, at2_loss, and at3_loss from the loss function. But when I run the changed code, the ASR is still very low, so I think the at_loss terms are not doing anything in this code. The training dataset in main.py is the clean dataset, not the backdoored dataset, so NAD's ASR is very low; the training dataset in train_badnets.py uses the backdoored dataset, so the baseline ASR is high. I then changed the training dataset in main.py to the backdoored dataset, and unfortunately NAD does not work on the backdoored dataset.

Is it possible to transfer NAD to other models (for example, ResNet-18)?

I would like to inquire about the possibility of transferring the NAD technique to other models, specifically ResNet-18. Currently, NAD is implemented with WideResNet, but I am interested in exploring its applicability to different architectures.

Could you provide insights or guidance on whether it is feasible to adapt NAD to models other than the original one it was designed for? If so, are there any specific considerations or modifications that need to be taken into account? I would appreciate any information or recommendations regarding this matter.
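
For instance, one way I imagine tapping intermediate activations from ResNet-18 is via forward hooks (a sketch on torchvision's ResNet-18; the choice of layer2/layer3/layer4 is my own assumption, not a recommendation from the authors):

import torch
from torchvision.models import resnet18

model = resnet18(num_classes=10)
activations = {}

def save_to(name):
    # Forward hook that stores the module's output under the given name.
    def hook(module, inputs, output):
        activations[name] = output
    return hook

# Tap three intermediate feature maps, analogous to the activations NAD distills.
model.layer2.register_forward_hook(save_to("act1"))
model.layer3.register_forward_hook(save_to("act2"))
model.layer4.register_forward_hook(save_to("act3"))

logits = model(torch.randn(2, 3, 32, 32))
print({name: act.shape for name, act in activations.items()})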

Thank you!

How does the attention loss work

Hi, thanks for sharing the code.

I notice that detach() is called before backward() for the attention loss in train_step, so backpropagation should not go through the attention loss. How can the attention loss work, then?

NAD/main.py, lines 30 to 34 in d61e4d7:

cls_loss = criterionCls(output_s, target)
at3_loss = criterionAT(activation3_s, activation3_t).detach() * opt.beta3
at2_loss = criterionAT(activation2_s, activation2_t).detach() * opt.beta2
at1_loss = criterionAT(activation1_s, activation1_t).detach() * opt.beta1
at_loss = at1_loss + at2_loss + at3_loss + cls_loss
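
For anyone else wondering, here is a minimal sketch of what an attention-transfer loss like criterionAT typically computes (the standard attention-transfer formulation; not necessarily identical to this repo's implementation):

import torch
import torch.nn.functional as F

def attention_map(fm):
    # Spatial attention map: mean of squared activations over channels,
    # flattened and L2-normalized per sample.
    am = fm.pow(2).mean(dim=1).flatten(1)
    return F.normalize(am, dim=1)

def at_loss(student_fm, teacher_fm):
    # Distance between the student's and the (detached) teacher's attention maps.
    return (attention_map(student_fm) - attention_map(teacher_fm.detach())).pow(2).mean()

# Example: feature maps of shape (batch, channels, H, W)
s = torch.randn(4, 64, 8, 8, requires_grad=True)
t = torch.randn(4, 64, 8, 8)
print(at_loss(s, t))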

How to get the teacher model?

Hello, I have read your paper and learned about NAD, but I have a question: how do you get the teacher model? The paper says "The teacher network can be obtained by an independent finetune process on the same clean data", but I have no idea how to "finetune" here. Whether you fine-tune only the last layer or all layers on clean data, I think it is hard to get a clean network, because the gradient of the loss function on clean data may already be very small for a converged model.
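
My current understanding of that fine-tuning step, as a sketch with placeholder names (load_backdoored_model and clean_loader are placeholders, not functions from the repo, and the model is assumed to output logits only):

import copy
import torch
import torch.nn as nn

# Placeholders: load_backdoored_model() returns the poisoned network,
# clean_loader iterates over the small clean subset (~5% of training data).
student = load_backdoored_model()
teacher = copy.deepcopy(student)

optimizer = torch.optim.SGD(teacher.parameters(), lr=0.01, momentum=0.9)
criterion = nn.CrossEntropyLoss()

teacher.train()
for epoch in range(10):                  # a few epochs of fine-tuning on clean data
    for img, target in clean_loader:
        optimizer.zero_grad()
        loss = criterion(teacher(img), target)
        loss.backward()
        optimizer.step()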

Does the so-called attention distillation mechanism work?

Hi, dear author.

Based on the code you provided, I have done some experiments by changing the hyper-parameters.

However, when I mask out the feature attention loss during training and only use the cls_loss as follows, it still gets good results.
# in train_step function
cls_loss = criterionCls(output_s, target)
# at3_loss = criterionAT(activation3_s, activation3_t.detach()) * opt.beta3
# at2_loss = criterionAT(activation2_s, activation2_t.detach()) * opt.beta2
# at1_loss = criterionAT(activation1_s, activation1_t.detach()) * opt.beta1
# at_loss = at1_loss + at2_loss + at3_loss + cls_loss
at_loss = cls_loss

The results are:
epoch: 19 lr: 0.0100
testing the models......
[clean]Prec@1: 83.41
[bad]Prec@1: 6.18

I wonder whether the attention distillation actually contributes in this process, because simply retraining on a small clean set can already remove the backdoor in your checkpoints.

Besides, when I apply your mechanism to other networks, it doesn't work no matter how I change the hyperparameters.

How to train CL and Refool backdoored models?

Hello,
I'm very interested in this paper, and I am trying to reproduce the work.
When I tried to train the backdoored models on CIFAR-10, I followed the advice in readme.md and successfully trained the BadNets, Trojan, Blend, and SIG backdoored models mentioned in the paper.
However, when I tried to train the Clean-label (CL) and Refool backdoored models, I found that it did not seem possible to do so simply by modifying parameters in configs.py. I then followed the CL and Refool links mentioned in readme.md, but I still could not figure out how to implement these two backdoor attacks.
At this point I have no idea how to train these two backdoored models; if you could give me some advice and help, I would be very grateful.

Results of BadNet and Fine-tuning

Hi,

Thanks for providing the code for us. I tried to rerun the code to replicate the baseline for further improvement. But the results are pretty different. My major changes focus on two aspects:

  • Fix the random seeds in the main functions of train_badnet.py and main.py:
def main():
    seed = 93
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    torch.manual_seed(seed)
    np.random.seed(seed)
    random.seed(seed)
  • Disable the default path of t_model and s_model in config.py so I can retrain the model.

I don't change any hyperparameters, and the script for train_badnet.py is:

OUTPUT=results/nad/backdoor/

python train_badnet.py \
--checkpoint_root $OUTPUT \
--log_root $OUTPUT \

The results in the CSV file are:

epoch,test_clean_acc,test_bad_acc,test_bad_cls_loss
1,57.01111111111111,99.67777777777778,0.00816304203728214
2,62.666666666666664,99.77777777777777,0.006588545432469497
3,70.1,99.4888888888889,0.01506445547990087
4,75.15555555555555,99.91111111111111,0.002418477892476302
5,77.68888888888888,99.9888888888889,0.00043397019659460056
6,75.83333333333333,99.84444444444445,0.0036796125145895833
7,75.61111111111111,99.9888888888889,0.0006442612384966601
8,77.87777777777778,99.9888888888889,0.0006178246608526226
9,76.8,99.6,0.012300759838609438

The BadNet accuracy is 76.80 and ASR is 99.60.

Then I tried the fine-tuning baseline; the script is:

OUTPUT=results/nad/finetune

python main.py \
--s_model results/nad/backdoor/WRN-16-1-S-model_best.pth.tar \
--checkpoint_root $OUTPUT \
--log_root $OUTPUT \
--beta1 0 \
--beta2 0 \
--beta3 0 \

The results in the CSV file are:

epoch,test_clean_acc,test_bad_acc,test_bad_cls_loss,test_bad_at_loss
0,76.8000,99.6000,0.0123,0.0000
1,69.4444,11.0333,7.8091,0.0000
2,62.1333,0.1333,9.8440,0.0000
3,78.8222,4.1444,7.9249,0.0000
4,80.4556,3.9778,8.1341,0.0000
5,79.7333,4.9778,7.5043,0.0000
6,80.9667,3.3667,8.6271,0.0000
7,81.3333,4.1111,8.5744,0.0000
8,81.4778,3.8222,8.4739,0.0000
9,80.8556,4.2667,8.1581,0.0000
10,81.7667,3.0667,9.1183,0.0000

The accuracy is 81.76 and ASR is 3.06.

The results differ from those on GitHub in two aspects:

  • The accuracy of the backdoored model is much lower (76.80 in my run vs. the reported 85.65).
  • The ASR of the fine-tuned model is quite different (3.06 in my run vs. the reported 18.13); my replicated ASR is already low.

I ran the code multiple times and the results are consistent.

Configuration

Hi, I wonder what the configurations of the teacher and student are. Do they have the same architecture, and are they trained on the same training set?

A question when analyzing the code

Hello! I am a student working on backdoor defense for CV models. Thanks for your code, from which I have learned a lot. However, while reading the code to add some logging, I noticed something. In train_badnet.py, line 143, perhaps the maximum rather than the minimum of bad_acc[0] and threshold_bad should be saved, since line 142 indicates that a higher bad_acc[0] is better. Also, in main.py, line 184, I cannot understand why the minimum of bad_acc[0] and threshold_clean is saved; in that case, threshold_clean will drop below 10% after just one epoch, which would cause the model to be saved every epoch.
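
A toy illustration of the change I am suggesting for train_badnet.py (variable names are taken from the issue text, and the values are made up):

threshold_bad = 50.0   # best attack success rate seen so far (made-up value)
bad_acc = [97.3]       # this epoch's attack success rate (made-up value)

# Since a higher bad_acc[0] is better for the backdoored model, keep the maximum.
threshold_bad = max(bad_acc[0], threshold_bad)
print(threshold_bad)   # 97.3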
