yaodongyu / TRADES
TRADES (TRadeoff-inspired Adversarial DEfense via Surrogate-loss minimization)
License: MIT License
I have read your paper many times and think it is well written, but I have a question. Should the inputs be normalized to [0, 1]? Could they be normalized to [-1, 1] instead? Why? Thank you very much.
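Whichever input range is used, the perturbation budget and the clipping bounds have to be expressed in the same units as the inputs. The numpy sketch below assumes the affine map x' = 2x - 1 from [0, 1] to [-1, 1]. The helpers are illustrative and are not part of the repository. It shows that an L∞ budget of epsilon in [0, 1] pixel units corresponds to 2·epsilon after rescaling.

```python
import numpy as np

def to_signed_range(x01):
    """Map inputs from [0, 1] to [-1, 1] via x' = 2x - 1 (illustrative helper)."""
    return 2.0 * x01 - 1.0

def rescale_budget(epsilon01):
    """Express an L-inf budget given in [0, 1] units in [-1, 1] units."""
    return 2.0 * epsilon01

rng = np.random.default_rng(0)
x = rng.random((4, 3, 32, 32))                  # clean images in [0, 1]
eps = 0.031                                     # example budget in [0, 1] units
delta = rng.uniform(-eps, eps, size=x.shape)    # perturbation within that budget
x_adv = np.clip(x + delta, 0.0, 1.0)

# The same perturbation, measured after mapping both images to [-1, 1]
gap = np.max(np.abs(to_signed_range(x_adv) - to_signed_range(x)))
# gap is (up to rounding) twice the [0, 1]-space gap, so eps must be doubled too;
# the clipping bounds likewise become -1.0 and 1.0
```

In other words, the [-1, 1] range by itself does not change the threat model. Robustness numbers are only comparable when epsilon is rescaled accordingly.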
In wideresnet.py, I find there is a self.sub_block1 in WideResNet.__init__(), but it doesn't appear in forward().
For comparability with other evaluations in the literature, it might be useful to first add random noise to the image and then begin PGD:
https://github.com/yaodongyu/TRADES/blob/master/pgd_attack_cifar10.py#L51
In a quick-and-dirty experiment, adding random initialization to the attack decreased the TRADES model's adversarial accuracy by ~1%.
Recently, it has also become apparent that using multiple random restarts in evaluation is important, so adding that functionality might be useful as well.
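The following is a minimal numpy sketch of L∞ PGD with a uniform random start and several restarts. It attacks a toy linear softmax classifier, chosen so that the input gradient has a closed form. It is not the TRADES model or the repository's pgd_attack_cifar10.py. An example counts as robust only if it is classified correctly on the clean input and after every restart. The step size and step count are illustrative.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def input_gradient(W, b, x, y):
    """Gradient of per-example cross-entropy w.r.t. x for logits = x @ W.T + b."""
    p = softmax(x @ W.T + b)
    p[np.arange(len(y)), y] -= 1.0         # dL/dlogits = softmax - one_hot(y)
    return p @ W                           # chain rule through the linear layer

def pgd_linf(W, b, x, y, eps, step, n_steps, rng, random_start=True):
    if random_start:
        x_adv = x + rng.uniform(-eps, eps, size=x.shape)  # random point in the eps-ball
    else:
        x_adv = x.copy()
    x_adv = np.clip(x_adv, 0.0, 1.0)
    for _ in range(n_steps):
        x_adv = x_adv + step * np.sign(input_gradient(W, b, x_adv, y))  # ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)   # project back onto the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)           # keep a valid image range
    return x_adv

def robust_accuracy(W, b, x, y, eps, step=0.003, n_steps=20, n_restarts=5, seed=0):
    rng = np.random.default_rng(seed)
    correct = np.argmax(x @ W.T + b, axis=1) == y   # must be right on the clean input
    for _ in range(n_restarts):
        x_adv = pgd_linf(W, b, x, y, eps, step, n_steps, rng)
        correct &= np.argmax(x_adv @ W.T + b, axis=1) == y   # worst case over restarts
    return correct.mean()

# Usage on random toy data (10 classes, 50 features)
rng = np.random.default_rng(1)
W, b = rng.normal(size=(10, 50)), rng.normal(size=10)
x = rng.random((200, 50))
y = np.argmax(x @ W.T + b, axis=1)
acc_one = robust_accuracy(W, b, x, y, eps=0.05, n_restarts=1)
acc_five = robust_accuracy(W, b, x, y, eps=0.05, n_restarts=5)
```

Because an example must survive every restart, adding restarts can only keep or lower the measured robust accuracy. With the same seed, the first restart is identical in both calls.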
Where can I find the hyperparameters (such as steps and n_restarts) of the methods evaluated in the leaderboard?
I tested DeepFool against TRADES and got an accuracy of 54%, while your leaderboard lists 61.38% for DeepFool (L∞). This seems strange; maybe some of the hyperparameters are different. I would like to use some of your results in my paper, if you can share more information.
Hi @yaodongyu, I'm very interested in your ICML'19 work, and I am attempting to use it in the competition. I tried to train ResNet-50 with trades_loss but got a "CUDA out of memory" error. I wonder whether trades_loss needs more CUDA memory.
I trained ResNet-50 on an NVIDIA 1080 Ti with cross-entropy loss, and the batch size could be set to 128. However, when I trained with trades_loss, it raised "CUDA out of memory" even with batch_size 16.
I'm not sure whether there's a problem with my code, or whether trades_loss needs more CUDA memory.
Thank you!
I've changed loss_robust in trades.py to use cross-entropy loss as follows, but found that in that case training fails to converge when beta=6.0. I suspect this is because dCE(f(x), f(x'))/dw and dKL(f(x), f(x'))/dw are different, since f(x') should also be treated as a variable.
As the paper claims that one should use classification-calibrated CE in TRADES to keep the method from reducing to logit pairing, I wonder whether it is OK to use KL instead of CE for loss_robust. Or have you considered any practices for when loss_robust is implemented with CE, e.g. a different beta?
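import torch.nn.functional as F

def _cross_entropy(input, targets, reduction='mean'):
    # Soft-label cross-entropy: `targets` are logits, turned into a distribution p
    targets_prob = F.softmax(targets, dim=1)
    # Per-example -sum_k p_k * log q_k, where q = softmax(input)
    xent = (-targets_prob * F.log_softmax(input, dim=1)).sum(1)
    if reduction == 'sum':
        return xent.sum()
    elif reduction == 'mean':
        return xent.mean()
    elif reduction == 'none':
        return xent
    else:
        raise NotImplementedError()

# Inside trades_loss, where model, x_adv and x_natural are defined.
# model(x_natural) is not detached, so gradients also flow through the targets.
loss_robust = _cross_entropy(model(x_adv), model(x_natural), reduction='mean')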
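The relationship between the two losses can be checked numerically. Let p = softmax(model(x_natural)) and q = softmax(model(x_adv)). Then the soft cross-entropy computed by _cross_entropy equals H(p) + KL(p ‖ q). The extra entropy term H(p) depends on the weights through f(x) whenever the targets are not detached, so the two losses give different gradients. The numpy sketch below uses random logits as stand-ins for model outputs.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def soft_cross_entropy(p, q):
    """Per-example -sum_k p_k log q_k (what _cross_entropy computes before reduction)."""
    return -(p * np.log(q)).sum(axis=1)

def kl_divergence(p, q):
    """Per-example KL(p || q) = sum_k p_k (log p_k - log q_k)."""
    return (p * (np.log(p) - np.log(q))).sum(axis=1)

def entropy(p):
    """Per-example Shannon entropy H(p) in nats."""
    return -(p * np.log(p)).sum(axis=1)

rng = np.random.default_rng(0)
logits_natural = rng.normal(size=(8, 10))                      # stand-in for model(x_natural)
logits_adv = logits_natural + 0.5 * rng.normal(size=(8, 10))  # stand-in for model(x_adv)
p, q = softmax(logits_natural), softmax(logits_adv)

ce = soft_cross_entropy(p, q)
kl = kl_divergence(p, q)
# CE(p, q) = H(p) + KL(p || q): the losses differ by the entropy of the natural prediction
identity_holds = np.allclose(ce, entropy(p) + kl)
```

Minimizing the CE version therefore also pushes H(p) down, which makes the natural predictions more confident. The KL version does not add this pressure. This is one plausible factor in the convergence difference, not a confirmed diagnosis.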
Hi, I ran train_trades_cifar10.py directly with '--beta 6.0' twice, but failed to achieve the adversarial accuracy shown in your paper. My final result is only about 49%. I wonder whether other details, such as a particular training-set partition, are needed to reach that performance. Thank you!
Hi,
Probably a typo: shuffle=True → shuffle=False
https://github.com/yaodongyu/TRADES/blob/master/train_trades_mnist.py#L64
Hi,
Thank you for providing the code and cheers to the great work!
I am training the model on CIFAR-10 using an NVIDIA Titan RTX (24 GB) GPU. Unfortunately, the code is prohibitively slow: each iteration takes about 4 seconds. Does it run at the same speed on your machine? The WRN model is several times larger than an ordinary CIFAR-10 classifier. I know that models for adversarial training should be large, but is it necessary to use such a huge model?
Regards,
Ali
Hi,
I just tried out the pre-trained models and came across an unused ResNet block in your Wide ResNet: sub_block1. Unfortunately, the pre-trained models include parameters for this block, which makes it impossible to load them with a Wide ResNet implementation that does not have sub_block1.
As a quick fix, I loaded the models using your Wide ResNet implementation, set sub_block1 to a simple dummy identity layer (or any other layer without parameters), and saved them again. Afterwards, they can be loaded by an implementation without sub_block1.
I thought this might be interesting for others, or worth fixing (and re-uploading the models), since the unused block also incurs an unnecessary memory overhead.
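An alternative to re-saving through a dummy layer is to filter the checkpoint's state dict before loading. Below is a minimal sketch. The helper name is hypothetical, and it assumes the checkpoint keys start with "sub_block1." because the block is a top-level attribute of WideResNet.

```python
from collections import OrderedDict

def drop_prefixed_keys(state_dict, prefix="sub_block1."):
    """Return a copy of a state dict without the entries of one submodule.

    `prefix` is the submodule's attribute name plus a dot; a checkpoint saved from a
    DataParallel-wrapped model would instead use "module.sub_block1.".
    """
    return OrderedDict(
        (key, value) for key, value in state_dict.items() if not key.startswith(prefix)
    )
```

With PyTorch, this would be used as something like `model.load_state_dict(drop_prefixed_keys(torch.load(path)))` on a Wide ResNet that has no sub_block1. Since `load_state_dict` is strict by default, any other mismatch would still be reported. The trailing dot in the prefix keeps similarly named blocks such as a hypothetical `sub_block10` intact.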
Thank you for sharing an implementation of TRADES; it really helps in understanding your paper. However, one thing was unclear to me when comparing the paper and the code. According to the paper (and also the GitHub README), in the regularization term the adversarial prediction f(X’) plays the role of the label (i.e. second argument to
Which version is the correct one (i.e. the one used to train the publicly available CIFAR-10 model)?
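PyTorch documents KLDivLoss pointwise as target * (log(target) - input), with input expected to hold log-probabilities. In the implementation, the first argument is F.log_softmax(model(x_adv)). Assuming the second argument is F.softmax(model(x_natural)), the code computes KL(f(x) ‖ f(x')). In that form, the natural prediction plays the role of the label. Because KL is not symmetric, the choice matters. Below is a numpy sketch of the documented formula; the function name is illustrative, not a PyTorch API.

```python
import numpy as np

def kl_div_pointwise_sum(input_log_probs, target_probs):
    """Documented KLDivLoss formula with summed reduction:
    sum(target * (log(target) - input)), i.e. KL(target || exp(input))."""
    return np.sum(target_probs * (np.log(target_probs) - input_log_probs))

p_natural = np.array([0.9, 0.1])   # stand-in for softmax(model(x_natural))
q_adv = np.array([0.5, 0.5])       # stand-in for softmax(model(x_adv))

kl_nat_adv = kl_div_pointwise_sum(np.log(q_adv), p_natural)  # KL(f(x) || f(x'))
kl_adv_nat = kl_div_pointwise_sum(np.log(p_natural), q_adv)  # KL(f(x') || f(x))
print(round(float(kl_nat_adv), 4), round(float(kl_adv_nat), 4))
# 0.3681 0.5108
```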
Hi,
In trades_loss you take an 'optimizer' argument, and on line 77 you call 'optimizer.zero_grad()'.
Was this necessary? In which part of the trades_loss computation are gradients accumulated on the model, so that we need to zero them?
Thanks a lot.
There is a bug somewhere in the TRADES loss with the L2 norm (the L∞ norm is fine): memory consumption grows with the number of iterations (batches) until it finally runs out of memory.
Thank you for your contribution!
I want to apply TRADES to a small network with only 2M parameters and train it on my own dataset, but the results are really bad in both standard accuracy and robust accuracy. Is this normal? What can I do to improve it?
There seems to be a bug in the adjust_learning_rate function in train_trades_cifar10.py; it only decreases the learning rate once at epoch 75 (the code in the elif clauses is never reached).
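The pattern described can be reproduced in a few lines. When the smallest milestone is tested first in an if/elif chain, every later epoch already satisfies it, so the remaining branches are dead code. Below is a sketch of the buggy shape and one way to fix it, which counts every milestone already passed. Only the epoch-75 milestone comes from the report. The later milestones and the 0.1 decay factor are hypothetical placeholders.

```python
def lr_if_elif(base_lr, epoch):
    # Buggy shape: any epoch >= 90 also satisfies epoch >= 75, so the elif never runs
    if epoch >= 75:
        return base_lr * 0.1
    elif epoch >= 90:   # hypothetical second milestone, unreachable
        return base_lr * 0.01
    return base_lr

def lr_step_schedule(base_lr, epoch, milestones=(75, 90, 100), gamma=0.1):
    # Decay once for every milestone already reached (90 and 100 are placeholders)
    n_passed = sum(epoch >= m for m in milestones)
    return base_lr * gamma ** n_passed

for epoch in (10, 80, 95, 105):
    print(epoch, lr_if_elif(0.1, epoch), lr_step_schedule(0.1, epoch))
```

Testing the largest threshold first, or using separate `if` statements that each overwrite the rate, also works. In PyTorch, `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones, gamma)` implements the same step schedule.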
Hi,
Thanks for your great work! I am trying to run your code with DistributedDataParallel (DDP) in PyTorch, but ran into errors when using the trades_loss function. Here's the error:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [512]] is at version 4; expected version 3 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
After setting torch.autograd.set_detect_anomaly(True), I got this:
[W python_anomaly_mode.cpp:104] Warning: Error detected in CudnnBatchNormBackward. Traceback of forward call that caused the error:
I think BatchNorm caused this error. Do you plan to modify the code so that it works with DDP? Training with DP or on a single GPU is too slow 😹
Hi, dear Hongyang Zhang and Yaodong Yu,
This is a reminder that I have submitted our results (CAA) to the TRADES white-box leaderboard, and I have sent an e-mail to your address. This issue can be closed once you have checked the mailbox, if no other problems come up.
Sorry to bother you; I would be grateful for your help. Thanks.
Xiaofeng Mao
I want to test the method with DeepFool and C&W attacks, but I don't know how to set the hyperparameters. Could you please tell me how to set them?
Hi, forgive my ignorance. I'm trying to run pgd_attack_cifar10.py with your WideResNet, and I'm getting the following:
I have a hard time understanding the results: what does each "err pgd" mean?
What do the last two lines mean? I mean not the names, but the values "1508" and "4281".
Maybe I'm just used to top-1 and top-5...
Thanks!
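The following interpretation is an assumption, not something the script documents. The last two lines may be total misclassification counts over the 10,000 CIFAR-10 test images, presumably natural and PGD in that order. If so, converting them to accuracies is simple arithmetic.

```python
def accuracy_from_error_count(n_errors, n_total=10000):
    """Accuracy implied by a count of misclassified examples out of n_total."""
    return 1.0 - n_errors / n_total

# Assumed labels for the two totals; 10000 assumes the full CIFAR-10 test set
for name, count in (("natural", 1508), ("PGD", 4281)):
    print(f"{name}: {accuracy_from_error_count(count):.2%}")
# natural: 84.92%
# PGD: 57.19%
```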
According to eq. (5) of the paper, the loss L should be the same throughout, namely the cross-entropy loss. In trades.py, however, two kinds of loss are used. For the max, torch.nn.KLDivLoss is adopted. For the min, cross-entropy is used for (f(x), y) and torch.nn.KLDivLoss for (f(x), f(x')). Why use two different losses here? Would using cross-entropy in both places work, and how would it affect performance?
Dear Yaodong Yu and Hongyang Zhang,
I was very happy to read such a good paper, and thank you very much for providing the white-box MNIST and CIFAR-10 leaderboards. I recently (2020.8.15) submitted the results of my adversarial attack to you. If you have time, could you check my results and update the MNIST and CIFAR-10 leaderboards?
Thank you very much!
My name is Ye Liu.
We were trying to evaluate our attack against the CIFAR-10 model. This is our script to convert saved images to a .npy file: https://github.com/admk/TRADES/blob/master/convert.py
We are using the same xadv = torch.clamp(xadv - x, -epsilon, epsilon) + x as in https://github.com/yaodongyu/TRADES/blob/master/pgd_attack_cifar10.py#L76 to enforce the bounds, but it didn't work for us because of floating-point rounding errors:
Do you know how we can reliably torch.clamp the ranges so that they pass your checks?
Update: with PyTorch==1.7.0, the CPU and GPU gave rounding errors of different magnitudes.
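One workaround is to project with a slightly smaller budget. Then the rounding in (clamp(x_adv - x) + x) cannot push the stored difference past epsilon. The numpy sketch below assumes the checker compares float32 images against an L∞ bound of epsilon. The 1e-6 margin is an arbitrary choice. It is far larger than float32 rounding error for values in [0, 1], yet far below any perturbation size that matters.

```python
import numpy as np

def project_linf_with_margin(x_adv, x, epsilon, margin=1e-6):
    """Project x_adv onto the L-inf ball of radius (epsilon - margin) around x,
    then onto [0, 1], computing in float32 as the saved images are stored."""
    x = x.astype(np.float32)
    x_adv = x_adv.astype(np.float32)
    eps_safe = np.float32(epsilon - margin)
    delta = np.clip(x_adv - x, -eps_safe, eps_safe)
    # Clipping to [0, 1] can only move a pixel closer to x, since x lies in [0, 1]
    return np.clip(x + delta, 0.0, 1.0).astype(np.float32)

rng = np.random.default_rng(0)
x = rng.random((64, 3, 32, 32), dtype=np.float32)
x_adv = x + rng.uniform(-0.1, 0.1, size=x.shape).astype(np.float32)
projected = project_linf_with_margin(x_adv, x, epsilon=0.031)
```

The same idea carries over to torch.clamp: clamp the difference to (epsilon - margin), add it back, and clamp to [0, 1] last.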
Hey, thanks for this repo. I came across this and wanted to check whether it is an issue. There seems to be a significant difference in run time between the models implemented in your repo and the official torchvision models.
Running train_trades_cifar10.py with model = ResNet18() from the TRADES repo, the time per batch is ~0.8 seconds.
Running train_trades_cifar10.py with model = torchvision.models.resnet18(), the time per batch is ~0.28 seconds.
I checked to make sure that it wasn't due to model size: both models have 11181642 trainable parameters.
Any advice on why this happens would be greatly appreciated.
Thank you very much for releasing the model and associated code along with your paper. I'm very grateful that you've put the effort into making it as easy as possible to get everything up and running, and I sincerely hope others involved in the contest follow your lead.
I'm taking an initial pass at everything and am getting somewhat confusing results. First, the model seems to give very different results at small batch sizes:
import numpy as np
import torch
from models.wideresnet import WideResNet

device = torch.device("cuda")
model = WideResNet().to(device)
model.load_state_dict(torch.load('./checkpoints/model_cifar_wrn.pt'))
# Note: model.eval() is never called, so the model stays in training mode
# and its BatchNorm layers normalize each batch with that batch's own statistics.

X_data = np.load("data_attack/cifar10_X.npy")
Y_data = np.load("data_attack/cifar10_Y.npy")
X_data = np.transpose(X_data, (0, 3, 1, 2))  # NHWC -> NCHW

for bs in [1, 2, 4, 5, 10, 50, 100]:
    predictions = []
    # Classify the first 100 test images in chunks of size bs
    for i in range(0, 100, bs):
        batch = torch.from_numpy(np.array(X_data[i:i+bs], dtype=np.float32)).to(device)
        logits = model(batch).cpu().detach().numpy()
        predictions.extend(np.argmax(logits, axis=1))
    print("mean accuracy with batch size %d: %f" % (bs, np.mean(np.array(predictions) == Y_data[:100])))
will output
mean accuracy with batch size 1: 0.140000
mean accuracy with batch size 2: 0.640000
mean accuracy with batch size 4: 0.760000
mean accuracy with batch size 5: 0.790000
mean accuracy with batch size 10: 0.840000
mean accuracy with batch size 50: 0.840000
mean accuracy with batch size 100: 0.830000
There also seems to be some dependence on the rest of the batch when classifying each input. I have a batch of 10000 examples that I want to process as a [100, 100, 3, 32, 32] array, and if I process them in row-major order I get a different accuracy than in column-major order. I suspect this has the same underlying cause, so I'll give details later if necessary.
As you might imagine, this makes the defense difficult to evaluate: evaluating the network on a batch of [99 clean examples] + [1 adversarial example] gives a different result than on [50 clean examples] + [50 adversarial examples].
Is this intended, am I doing something wrong, or is it something else?
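One plausible explanation, given that the script above never calls model.eval(), is BatchNorm. In training mode, BatchNorm normalizes each channel with the mean and variance of the current batch. In eval mode, it uses stored running statistics. The toy numpy version below is not PyTorch's implementation, and its running statistics are placeholders. It shows that the output for a fixed example depends on its batch-mates only in training mode.

```python
import numpy as np

def batch_norm(x, running_mean, running_var, training, eps=1e-5):
    """Per-feature normalization of x with shape (N, C), without affine parameters."""
    if training:
        mean = x.mean(axis=0)   # statistics of the current batch
        var = x.var(axis=0)     # biased variance, as used for normalization in training
    else:
        mean, var = running_mean, running_var   # fixed, batch-independent statistics
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
example = rng.normal(size=(1, 4))                           # the input we care about
batch_a = np.concatenate([example, rng.normal(size=(7, 4))])
batch_b = np.concatenate([example, rng.normal(loc=3.0, size=(7, 4))])
running_mean, running_var = np.zeros(4), np.ones(4)         # placeholder running statistics

train_a = batch_norm(batch_a, running_mean, running_var, training=True)[0]
train_b = batch_norm(batch_b, running_mean, running_var, training=True)[0]
eval_a = batch_norm(batch_a, running_mean, running_var, training=False)[0]
eval_b = batch_norm(batch_b, running_mean, running_var, training=False)[0]
print(np.allclose(train_a, train_b), np.allclose(eval_a, eval_b))
# False True
```

If this is the cause, the corresponding PyTorch change would be calling model.eval() before evaluating. Wrapping inference in torch.no_grad() additionally saves memory.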
Why is the first argument in loss_robust F.log_softmax(model(x_adv))? I tried generating adversarial samples with normal PGD and setting the loss to the TRADES loss, but I cannot understand why it is written this way, and I would prefer to use criterion_kl(F.softmax(model(x_adv)), F.softmax(model(x_natural))). Could anyone answer my question? I would appreciate it.
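PyTorch's KLDivLoss documents its pointwise loss as target * (log(target) - input) and expects input to be log-probabilities. That is why the adversarial prediction is passed through F.log_softmax. Below is a numpy sketch of that formula; the helper names are illustrative. It shows what happens if probabilities are passed instead. The result becomes -H(p) - Σ p·q, which is strictly negative for softmax outputs and therefore cannot be a divergence.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def log_softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def kl_div_sum(log_q, target):
    """Documented KLDivLoss formula with summed reduction:
    sum(target * (log(target) - log_q)); `log_q` should be log-probabilities."""
    return np.sum(target * (np.log(target) - log_q))

rng = np.random.default_rng(0)
logits_natural = rng.normal(size=(4, 10))   # stand-in for model(x_natural)
logits_adv = rng.normal(size=(4, 10))       # stand-in for model(x_adv)
p = softmax(logits_natural)

correct = kl_div_sum(log_softmax(logits_adv), p)   # KL(f(x) || f(x')), always >= 0
wrong = kl_div_sum(softmax(logits_adv), p)         # probabilities passed as input
print(correct >= 0, wrong < 0)
# True True
```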