Comments (11)

fra31 commented on July 22, 2024

Hi,

Thanks for the submission! I noticed that your model has a stochastic component, but even when using EoT the robust accuracy didn't decrease. I then set deterministic=True in the LWTA layers and got, for the resulting model,

clean accuracy: 73.70%
robust accuracy: 0.00%

for 1000 points. Then I evaluated the obtained points with the original (stochastic) model and got

clean accuracy: 85.80%
robust accuracy: 17.30%
robust accuracy: 17.20%
robust accuracy: 17.80%
robust accuracy: 16.80%
robust accuracy: 18.10%
robust accuracy: 17.40%

with 6 runs to check the standard deviation. Also, consider that the points misclassified by the deterministic model were not attacked, which means that around 12% of the robust accuracy is explained by the difference in clean performance between the deterministic and stochastic models. Attacking those points as well might reduce the robust accuracy further.
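
For reference, the protocol above amounts to something like the following sketch (a PyTorch illustration only; `model`, `x_test`, `y_test` and `set_lwta_deterministic` are placeholders, not code from either repository):

```python
import torch
from autoattack import AutoAttack

def accuracy(model, x, y, n_runs=1):
    # Average accuracy over several forward passes, since the target model is stochastic.
    accs = []
    with torch.no_grad():
        for _ in range(n_runs):
            accs.append((model(x).argmax(dim=1) == y).float().mean().item())
    return accs

# 1) Craft the perturbations on the deterministic variant of the same weights.
set_lwta_deterministic(model, True)   # placeholder helper: sets deterministic=True in all LWTA layers
adversary = AutoAttack(model, norm='Linf', eps=8 / 255, version='standard')
x_adv = adversary.run_standard_evaluation(x_test, y_test, bs=250)

# 2) Classify the transferred points with the original stochastic model,
#    repeating the evaluation a few times to estimate the variability.
set_lwta_deterministic(model, False)
print(accuracy(model, x_adv, y_test, n_runs=6))
```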

Since the stochastic component doesn't significantly affect the adversarial perturbations, I suspect there is some component in the stochastic activation functions which prevents the usual gradient computation (one would need to inspect the gradients carefully).
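
A rough way to do that (again just a sketch, assuming a PyTorch model and cross-entropy loss, with the same placeholder tensors as above) is to compare input-gradient norms with and without EoT averaging:

```python
import torch
import torch.nn.functional as F

def input_grad_norm(model, x, y, eot_iters=1):
    # Mean L2 norm of the loss gradient w.r.t. the input, averaged over
    # eot_iters stochastic forward passes (EoT). Vanishing or erratic norms
    # would hint at blocked/uninformative gradients rather than robustness.
    x = x.clone().requires_grad_(True)
    grad = torch.zeros_like(x)
    for _ in range(eot_iters):
        loss = F.cross_entropy(model(x), y)
        grad += torch.autograd.grad(loss, x)[0]
    return (grad / eot_iters).flatten(1).norm(dim=1).mean().item()

print('single pass:', input_grad_norm(model, x_test, y_test, eot_iters=1))
print('EoT over 20:', input_grad_norm(model, x_test, y_test, eot_iters=20))
```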

konpanousis commented on July 22, 2024

Hello,

I do not understand why you followed such an approach in the evaluation.
The deterministic LWTA function is a different activation compared to the Stochastic LWTA variant that we introduced.
It is like taking a ReLU-based model and changing all the activations to Tanh.
The model has been trained stochastically, with Stochastic LWTA activations. Loading the stochastically trained model and changing the activation function obviously results in a different model; you can see that even in the natural accuracy.
I updated the code and removed these parts.
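
Roughly, the two activations differ as in the following sketch (an illustrative PyTorch snippet based on the general LWTA description; the block size, temperature and hard-sampling choices here are assumptions, not our actual implementation):

```python
import torch
import torch.nn.functional as F

def lwta(h, block_size=2, deterministic=False, tau=0.67):
    # h: (batch, features); units compete within blocks of size `block_size`.
    b, f = h.shape
    h = h.view(b, f // block_size, block_size)
    if deterministic:
        # Deterministic LWTA: the unit with the largest activation wins the block.
        mask = F.one_hot(h.argmax(dim=-1), block_size).to(h.dtype)
    else:
        # Stochastic LWTA: the winner is *sampled* from a relaxed categorical
        # whose logits are the activations (Gumbel-softmax reparameterization).
        mask = F.gumbel_softmax(h, tau=tau, hard=True, dim=-1)
    return (h * mask).view(b, f)
```

Replacing the sampled winner with the argmax changes the function the network computes at every layer, which is exactly the point above.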

fra31 commented on July 22, 2024

Sorry, maybe I didn't explain myself clearly. I used the deterministic version only to generate the adversarial perturbations (first set of results), then classified the adversarial points with the original model with stochastic activation functions. This is a standard transfer attack (I invite you to check whether you obtain the same results with this method), which can be effective when the information (especially the gradients) from the target model is for some reason not useful.

konpanousis commented on July 22, 2024

In the considered approach, we are not investigating the context of a transfer attack or other potential attacks using surrogate models. LWTA approaches are scarcely examined in the literature and a thorough examination is in progress. However, in this case, we focus on AutoAttack and other gradient-based implementations. We are talking specifically about APGD and the other included attacks (apgd-ce, apgd-t, fab-t, apgd-dlr, square, and the rand versions using EoT).
Even if the stochastic activations indeed block the flow of gradient information, this is an inherent property of the activation and not some artifact of the training process. Since even using EoT did not result in a drop in accuracy, the proposed approach yields state-of-the-art performance against the strong attacks that AutoAttack implements.
We'll gladly investigate the context that you are referring to (please do send me an email or a link to your implementation), but please do include these results in the leaderboard.
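
For concreteness, the randomized variant mentioned above is invoked through the standard AutoAttack interface (sketch only; `model`, `x_test`, `y_test` and the threat model are placeholders):

```python
from autoattack import AutoAttack

# version='rand' is the variant intended for defenses with stochastic components:
# the gradient-based attacks average their gradient estimates over EoT iterations.
adversary = AutoAttack(model, norm='Linf', eps=8 / 255, version='rand')
x_adv = adversary.run_standard_evaluation(x_test, y_test, bs=250)
```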

fra31 commented on July 22, 2024

A few comments: first, the leaderboard on the webpage includes only deterministic defenses (note that in the paper we have a separate table for randomized ones which are not included).

Second, there are other ways to make white-box attacks, including those in AutoAttack, (mostly) fail or perform poorly, such as quantization of the input (#44) or adding a further softmax layer on top of the classifier (#41). However, such methods do not improve robustness and are easily bypassed by careful implementation changes (while preserving the weights of the target model). Changing the activation function to generate the perturbations (via AA) seems to me another of the countermeasures one can adopt when the information coming from the target model is of little or no help.
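
To illustrate the softmax case (#41) concretely (a minimal sketch with a hypothetical base_model): the extra layer can saturate the cross-entropy gradients, while an attacker can simply strip it and keep the weights untouched.

```python
import torch.nn as nn

# "Defended" model: an extra softmax on top of the logits. Cross-entropy is then
# computed through two softmaxes, which can saturate and yield near-zero gradients,
# so gradient-based attacks appear to fail.
defended = nn.Sequential(base_model, nn.Softmax(dim=1))

# Adaptive evaluation: craft the perturbations against the bare logits
# (same weights, no extra softmax), then score them on `defended`.
attack_target = base_model
```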

Also, in the white-box scenario, one is interested in the robustness of the model to any attack, including transfer-based ones.

konpanousis commented on July 22, 2024

According to your comments, it seems that AutoAttack/RobustBench is ready to exclude all Bayesian methods. Would you consider a VAE-like model as a leaderboard candidate?
If not, and the same rationale applies in this case, I think it is in the best interest of the community to state very publicly that a whole discipline is excluded from the evaluation.

Your examples are irrelevant at best, if not unfortunate. The described examples are ad-hoc "tricks" to fool the library, or honest mistakes, and not principled design paradigms.

Since these approaches are easily bypassed anyway: if AutoAttack fails to capture the properties of newly proposed methods, one should focus on improving the attack, not on arguing that other approaches "seem" to be countermeasures.
For better or for worse, AutoAttack uses the implemented attacks and that's it. Every other comment is bad etiquette.

I can at least be certain that every entry in the leaderboard has been thoroughly cross-examined with surrogate models, and that the reported results are not artifacts of the training process or of implementation tweaks but reflect true robustness.

fra31 commented on July 22, 2024

RobustBench clearly excludes randomized defenses from the current leaderboards, and accepts adaptive evaluations to improve the AA one. If a leaderboard for randomized defenses is added, we'll be happy to accept your models.

For the models in the Model Zoo, we have also studied transferability (see our paper), if that's what you mean. Obviously, I cannot guarantee that no attack is able to improve the robustness evaluation (that's for certified defenses), but if they had shown suspicious behavior we would have tested them further.

I think it would indeed be great to have an attack which automatically detects when standard methods do not work and finds an alternative. Currently, for some cases, we still have to do it manually.

But, back to the point, I'd like to know if you can confirm that the model is vulnerable to transfer attacks, or whether I missed something when evaluating it.

soteri0s commented on July 22, 2024

As a lead PI of this work, I have watched the thread and remain completely disappointed with the depth of the technical analysis and arguments.

Here, we are talking about having a network learn representations, at some layers inside it, which are latent variables, that is, sampled variables, not simple values from a ReLU or some other silly function. We draw from a TRAINED posterior, which is there for the attacker to learn if they can. A good attack method should be able to capture this posterior, because it is learned; it is NOT something COMPLETELY RANDOM used as a trick. You LEARN POSTERIORS IN A VARIATIONAL WAY AND THEN SAMPLE DURING INFERENCE (DURING TRAINING YOU USE THE REPARAMETERIZATION TRICK / GUMBEL-SOFTMAX, i.e. THE RELAXED CATEGORICAL).

If we were novices to the field, we could have taken the encoder of a variational autoencoder, obtained the representations "z", which are sampled Gaussians rather than deterministic units, and then fed them to further layers with ReLUs and so on. Would you consider the existence of a Gaussian activation layer at any point of that net a randomised defence?! How can this even be said? What happens with the Gaussian unit in a VAE-type model is a better, more ROBUST way to learn representations.
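
To be concrete, the Gaussian layer in question is nothing more than the standard reparameterization (a short sketch; mu and log_var are whatever the encoder outputs):

```python
import torch

def gaussian_sample(mu, log_var):
    # Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I).
    # The posterior parameters (mu, log_var) are learned; only eps is random.
    eps = torch.randn_like(mu)
    return mu + torch.exp(0.5 * log_var) * eps
```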

Thus, that is the kind of activation we are talking about, NOT a randomised defence. How can you even say that a neuron which functions as a random variable (i.e., I train a posterior distribution which is there and can be interrogated, as opposed to point estimates) constitutes a RANDOMISED DEFENCE TRICK? Where is the trick here? We are talking about the holy grail of representation learning.

Of course, we have a whole theory as to why Gaussian layers are not good, while sparse outputs from discrete distributions (this is what blocks of stochastic LWTAs produce), which are biologically inspired (this is how the cortex forms representations: stochastic competition to fire), are the way to go. They are a promising method for learning representations in so many contexts, as you will see in the near future in publications at major venues from us (the paper discussed here is an AISTATS paper, of course, and the work has also been published at ICML 2019).

All in all, this discussion is disheartening, as it shows that fractions of the community lack basic knowledge of anything other than writing networks in a few lines of TensorFlow. They are unfamiliar even with TensorFlow Distributions, which provides the Gaussian layer as yet another one-line command, and of course with the Gumbel-softmax relaxation.

fra31 commented on July 22, 2024

I just reported the results I got, without questioning or mentioning the idea behind your model, and asked (twice) for confirmation of whether or not the model is vulnerable to transfer attacks.

mahetue commented on July 22, 2024

I think Francesco asked a simple question: either your model is robust or it is not. We care only about the correct evaluation of adversarial robustness, not about the motivation behind your model.

konpanousis commented on July 22, 2024

Dear Dr. Hein,

First of all, to answer the posed question, we evaluated the model using a WideResNet-34-5. Similar to Francesco's process (for which I asked for the implementation but got no answer), we produced the adversarial examples with a deterministic LWTA activation (for all 10000 examples), yielding:

initial accuracy: 21.06%
robust accuracy after APGD-CE: 0.30%

and then classified the adversarial examples with the stochastic model, resulting in a 14.23% error rate (85.77% robust accuracy).

The same model under a direct APGD-CE attack yielded 87% robust accuracy.

As already mentioned in a previous post, we are currently investigating further aspects of the proposed architecture, including black-box attacks (which do not rely on gradient information and among which is Square, where our approach once again yields SOTA performance). Nevertheless, it is apparent that in this case the surrogate model fails to produce meaningful adversarial examples for a transfer attack.
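
For reference, the gradient-free Square attack mentioned above can be run on its own through the same library (standard AutoAttack interface; model, x_test and y_test are placeholders):

```python
from autoattack import AutoAttack

# Square is a query-based black-box attack, so it does not use gradients and is
# unaffected by any gradient obfuscation caused by the stochastic activations.
adversary = AutoAttack(model, norm='Linf', eps=8 / 255, version='custom',
                       attacks_to_run=['square'])
x_adv = adversary.run_standard_evaluation(x_test, y_test, bs=250)
```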

Any useful feedback, constructive criticism and suggestions (or even experimental results) are always welcome. Apart from that, guesses, ad-hoc claims and irony are not part of scientific reasoning, and they are simply bad etiquette in the community (basically in all communities, and in life in general).

In this context, while you claim that Francesco just asked a "simple question", this is not the case. If he had stated from the beginning that the AutoAttack leaderboard does not accept randomized defences (even if in many cases this is an ambiguous term, even if it is stated only on the RobustBench page and not on the AutoAttack page, and even if the considered approach fell under this category) and had asked for further experimental results or dismissed the entry on that basis, we wouldn't be having this conversation. When F. compares our paradigm to mistakes, silly approaches or even cunning tricks to elude the adversarial optimization process, it is not just a "simple question" but a direct "attack" on our method. A public critique of a proposed approach is either the work of a reviewing process or the result of a thorough investigation. Otherwise it's just tittle-tattle.
