hsja's Issues
Which true gradient does HSJA estimate? (in the function `approximate_gradient`)
Dear Sir:
I thought the function `def approximate_gradient(model, sample, num_evals, delta, params)` in your code (line 165 in daecd5c) was used to estimate the true gradient, as in the following code:
```python
def get_grad(self, model, x, true_labels, target_labels):
    with torch.enable_grad():
        # x = torch.clamp(x, min=0, max=1.0).cuda()
        x = x.cuda()
        x.requires_grad_()
        logits = torch.softmax(model(x), dim=1)
        if true_labels is not None:
            true_labels = true_labels.cuda()
        if target_labels is not None:
            target_labels = target_labels.cuda()
        loss = self.cw_loss(logits, true_labels, target_labels)
        # gradient of the CW loss w.r.t. the input image
        gradient = torch.autograd.grad(loss, x, retain_graph=True)[0].cpu().detach()
    return gradient
```
where `cw_loss` is defined as follows:
```python
def cw_loss(self, logit, label, target=None):
    if target is not None:
        # targeted CW loss: logit_t - max_{i != t} logit_i
        _, argsort = logit.sort(dim=1, descending=True)
        target_is_max = argsort[:, 0].eq(target).long()
        second_max_index = target_is_max * argsort[:, 1] + (1 - target_is_max) * argsort[:, 0]
        target_logit = logit[torch.arange(logit.shape[0]), target]
        second_max_logit = logit[torch.arange(logit.shape[0]), second_max_index]
        return target_logit - second_max_logit
    else:
        # untargeted CW loss: max_{i != y} logit_i - logit_y
        _, argsort = logit.sort(dim=1, descending=True)
        gt_is_max = argsort[:, 0].eq(label).long()
        second_max_index = gt_is_max * argsort[:, 1] + (1 - gt_is_max) * argsort[:, 0]
        gt_logit = logit[torch.arange(logit.shape[0]), label]
        second_max_logit = logit[torch.arange(logit.shape[0]), second_max_index]
        return second_max_logit - gt_logit
```
But when I compute the cosine similarity between the two gradients, I find it is very low, about 0.02. Which true gradient does `approximate_gradient` approximate? Can you give me some code or an example, please?
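Since the issue asks for an example, here is a minimal sketch of the comparison, under my own assumptions: both gradients have been flattened to 1-D arrays of equal length, and `est_grad` / `true_grad` are hypothetical names, not variables from the repo.

```python
import numpy as np

def cosine_similarity(est_grad, true_grad):
    """Cosine similarity between the HSJA estimate and a white-box gradient."""
    est = np.ravel(est_grad).astype(np.float64)
    true = np.ravel(true_grad).astype(np.float64)
    return est @ true / (np.linalg.norm(est) * np.linalg.norm(true))
```

One plausible reason for a low value: HSJA's `approximate_gradient` estimates the gradient direction of the boundary indicator at a point on (or very near) the decision boundary, so comparing it against a CW-loss gradient taken at a point far from the boundary, or using a large sampling radius `delta`, can be expected to give low cosine similarity.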
Is it possible and easy to enable batch processing?
Hey,
Really nice work, and well-written code as well. I would like to ask whether batch processing (optimizing several images at the same time) is possible to implement on top of this code base, and whether it will be supported in the future. If not, do you have any suggestions for implementing batch mode myself?
Thanks!
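A minimal sketch of the usual first step toward batch mode, under my own assumptions (the `model` here is a hypothetical callable mapping an `(N, H, W, C)` array to `(N, num_classes)` logits; this is not a feature of the repo):

```python
import numpy as np

def batched_decision_function(model, images, original_label, clip_min=0.0, clip_max=1.0):
    """One forward pass for N candidate images; returns a boolean per image
    that is True when the prediction differs from original_label (untargeted)."""
    images = np.clip(images, clip_min, clip_max)
    logits = model(images)
    return np.argmax(logits, axis=1) != original_label
```

The harder part is the per-image control flow: each image's binary search and step-size search converge at different rates, so a full batch mode also needs masking of already-converged images.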
Question about the L-infinity norm attack of HSJA.
I read your code carefully, and the following points about the L-infinity version of HSJA confuse me.

1. At https://github.com/Jianbo-Lab/HSJA/blob/master/hsja.py#L225, why do you set `highs = dists_post_update` rather than `highs = np.ones(len(perturbed_images))` as in the L2 version? And why is `thresholds = np.minimum(dists_post_update * params['theta'], params['theta'])` different from the L2 version?

2. In the function `def project`, why do you use the following code to project with `alphas`? The L2 version is easier to understand: `return (1-alphas) * original_image + alphas * perturbed_images` gives the point on the line segment between `original_image` and `perturbed_images`. How should I understand the following L-inf code? (See the sketch after this list.)

```python
elif params['constraint'] == 'linf':
    out_images = clip_image(
        perturbed_images,
        original_image - alphas,
        original_image + alphas
    )
    return out_images
```

3. Why do you use `update = np.sign(gradf)` as the gradient direction in the L-infinity version? (Line 99 in 6a91456)

4. Furthermore, is the `gradf` returned by `def approximate_gradient` the same vector as the normal vector of the decision hyperplane?
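On points 2 and 3, here is a minimal numpy sketch of the two standard l-inf facts involved (my own illustration, not code from the repo; `x`, `p`, and `g` are hypothetical arrays):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=8)        # original image (flattened)
p = x + rng.normal(0, 0.5, size=8)   # some perturbed image
alpha = 0.1

# Point 2: projecting p onto the l-inf ball of radius alpha around x is exactly
# coordinate-wise clipping to [x - alpha, x + alpha].
proj = np.clip(p, x - alpha, x + alpha)
assert np.max(np.abs(proj - x)) <= alpha + 1e-12

# Point 3: over all steps u with ||u||_inf <= 1, the inner product <g, u> is
# maximized by u = sign(g) (Hoelder's inequality), so sign(gradf) is the
# steepest-ascent direction in the l-inf geometry.
g = rng.normal(size=8)
u = rng.normal(size=8)
u /= np.max(np.abs(u))               # normalize so ||u||_inf = 1
assert g @ np.sign(g) >= g @ u
```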
Why is the perturbed image not clipped to min=0 and max=1 after the binary search step? Bug?
I notice that the statement `perturbed = clip_image(perturbed + epsilon * update, clip_min, clip_max)` is placed before `perturbed, dist_post_update = binary_search_batch(sample, perturbed[None], model, params)` at lines 110-114 (geometric_progression): https://github.com/Jianbo-Lab/HSJA/blob/master/hsja.py#L114. Why is there no clipping operation after `binary_search_batch`? Is it a bug?
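A sketch of my own reasoning about why the clip may be redundant (an assumption, not an authoritative answer): in the l2 case the binary search returns a convex combination of `sample` and `perturbed`, and a convex combination of two in-range images stays in range.

```python
import numpy as np

rng = np.random.default_rng(1)
clip_min, clip_max = 0.0, 1.0
sample = rng.uniform(clip_min, clip_max, size=16)      # in range by definition
perturbed = rng.uniform(clip_min, clip_max, size=16)   # clipped just before the search

for alpha in rng.uniform(0, 1, size=100):
    mid = (1 - alpha) * sample + alpha * perturbed     # the l2 projection step
    assert clip_min <= mid.min() and mid.max() <= clip_max
```

The l-inf projection is itself a clip to `[original_image - alphas, original_image + alphas]`, which likewise cannot move an in-range `perturbed_images` out of `[clip_min, clip_max]`.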
Question about delta
The pseudocode in the paper describes that, for a given delta value, when the classifier outputs are all zero, the delta value is reduced by half, but I do not see this step in the code. Could you please explain this? I still occasionally encounter this problem when using your code to run experiments with different gamma values or different networks.
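A hypothetical sketch of such a fallback (my own construction based on the halving described in the question; the names loosely follow hsja.py, but the function, its signature, and the retry loop are assumptions, not repository code):

```python
import numpy as np

def approximate_gradient_with_fallback(decision_function, sample, num_evals,
                                       delta, max_halvings=5):
    """Monte Carlo gradient-direction estimate that halves delta and retries
    whenever every sampled decision is identical (i.e., no usable signal)."""
    for _ in range(max_halvings):
        rv = np.random.randn(num_evals, *sample.shape)
        rv /= np.sqrt(np.sum(rv ** 2, axis=tuple(range(1, rv.ndim)), keepdims=True))
        decisions = decision_function(sample + delta * rv)   # booleans, shape (num_evals,)
        fval = 2.0 * decisions.reshape(-1, *([1] * (rv.ndim - 1))) - 1.0
        if np.abs(np.mean(fval)) < 1.0:    # decisions disagree: keep this delta
            fval -= np.mean(fval)          # baseline subtraction
            gradf = np.mean(fval * rv, axis=0)
            return gradf / np.linalg.norm(gradf)
        delta /= 2.0                       # all-same decisions: halve delta, retry
    return None                            # give up after max_halvings attempts
```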
Provide parameters and example models for ImageNet
The default parameters do not work well for the Inception v3 model on the ImageNet dataset in my setup, and this repo does not provide ImageNet models or the corresponding HSJA parameters. Could you provide example parameters and example models for the ImageNet dataset? Thanks a lot!
What does baseline subtraction mean in the gradient estimation?
The code starting at https://github.com/Jianbo-Lab/HSJA/blob/master/hsja.py#L182 confuses me. What do these lines mean? I can understand the code above them, which estimates gradients and is essentially the RGF method.
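A minimal sketch of my reading of that step, assuming a toy linear decision boundary (everything here, including the names `w` and `phi`, is illustrative): subtracting the mean of the decisions acts as a control variate, leaving the estimated direction unchanged in expectation while reducing its variance when the +1/-1 decisions are imbalanced.

```python
import numpy as np

rng = np.random.default_rng(0)
d, B = 50, 200
w = rng.normal(size=d)
w /= np.linalg.norm(w)                     # true boundary normal

def phi(u):
    # +1/-1 decision at x + delta*u for a slightly offset linear boundary,
    # so the decisions are imbalanced and the baseline matters
    return np.sign(u @ w + 0.1)

def estimate(baseline):
    rv = rng.normal(size=(B, d))
    rv /= np.linalg.norm(rv, axis=1, keepdims=True)
    f = phi(rv)
    if baseline:
        f = f - f.mean()                   # the baseline subtraction in question
    g = (f[:, None] * rv).mean(axis=0)
    return g / np.linalg.norm(g)

raw = [estimate(False) @ w for _ in range(100)]
sub = [estimate(True)  @ w for _ in range(100)]
print(np.mean(raw), np.std(raw))           # noisier, weaker alignment with w
print(np.mean(sub), np.std(sub))           # typically higher mean, lower spread
```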
MNIST model for attack
Hi,
Thank you for providing the source code of the attack. However, this code only attacks the CIFAR-10 ResNet model. Is there a pretrained model that you used for the MNIST images? If so, could you tell us which model you used?
Any information will be of great help!
Thanks!
How is input norm calculated?
I have a quick question about the input-norm calculation. I think the linf norm is just the max difference in input space, and previous attack works mostly use 0.05 or 8/255. In your paper, the linf norms differ a lot between ImageNet/CIFAR-100 and MNIST/CIFAR-10; is that because you calculate the norm on the [0, 255] scale for ImageNet/CIFAR-100?
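For concreteness, a tiny sketch of the scale factor the question is about (my arithmetic, not a value taken from the paper):

```python
eps_01 = 8 / 255          # a common linf budget on the [0, 1] input scale
eps_255 = eps_01 * 255    # the same budget expressed on the [0, 255] scale
print(eps_01, eps_255)    # 0.0313..., 8.0 -- a factor-of-255 difference
```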
There is a problem when I run this code, but I don't know how to solve it. Please help:

```
construct_original_network grads = tape.gradient(preds[:,c], x)
  File "C:\Users\高婷\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\eager\backprop.py", line 1086, in gradient
    unconnected_gradients=unconnected_gradients)
  File "C:\Users\高婷\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\eager\imperative_grad.py", line 77, in imperative_grad
    compat.as_str(unconnected_gradients.value))
AttributeError: 'KerasTensor' object has no attribute '_id'
```
Why does this line of code exist? Help, please.
Line 333 in daecd5c
Which stepsize_search performs better in terms of queries and distortion?
Your code has two options for `stepsize_search`: `geometric_progression` and `grid_search`. Which one should I use in my experiments for the best performance in terms of queries and distortion?
Unnecessary binary search?
Could you explain the difference between the binary search in the initialize method [1] and the binary search right after it [2]? It seems to me that the second one is unnecessary, since the image has already been projected onto the boundary.
[1] Binary search in the initialize method
Lines 288 to 300 in daecd5c
[2] Binary search right after initialize
Lines 81 to 86 in daecd5c
A problem about convergence
Hello, I've run into a problem: when I use MNIST, CIFAR-10, or ImageNet, the l2 norm stays almost unchanged after the first binary search.
Looking forward to your reply. Thanks!
HSJA is extremely slow when I use the params in this repo
I find that when I set the params to 'max_iter=64, max_eval=10000, init_eval=100' to attack a ResNet on MNIST, it is very time-consuming: for a batch of 200 MNIST images, the algorithm spends more than 12 hours obtaining the adversarial examples. BTW, I use an RTX Titan to test the speed. Is it normal for it to cost this much time, or am I doing something wrong?
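A rough back-of-the-envelope sketch of where the time goes, under my assumption that the per-iteration query budget follows the paper's schedule B_t = B_0 * sqrt(t), capped at max_eval (binary-search and step-size queries come on top of this):

```python
import math

max_iter, init_eval, max_eval = 64, 100, 10000
per_image = sum(min(int(init_eval * math.sqrt(t)), max_eval)
                for t in range(1, max_iter + 1))
print(per_image)          # ~34,000 gradient-estimation queries per image
print(200 * per_image)    # ~6.9 million queries for the 200-image batch
```

At millions of sequential single-image forward passes, a 12-hour wall time is not surprising.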
The approximate_gradient function seems not to follow the equations in the paper
Line 180 in 6a91456
It seems that you use fval as the average of the difference between the binary decision function and a baseline in the estimate. I do not understand this; is it another approximation? On the other hand, referring to eq. 11, it looks to me more like an estimate of an integral. Could you give me some details? Thanks.
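A short derivation sketch of my own reading (not the authors' statement): the baseline-subtracted estimator differs from the plain Monte Carlo average of eq. 11 only by a term whose expectation is zero, since the sampled directions $u_b$ have mean zero:

```latex
\widehat{\nabla S}(x)
  = \frac{1}{B}\sum_{b=1}^{B}\bigl(\phi(x+\delta u_b)-\bar{\phi}\bigr)\,u_b
  = \underbrace{\frac{1}{B}\sum_{b=1}^{B}\phi(x+\delta u_b)\,u_b}_{\text{plain estimator}}
    \;-\; \bar{\phi}\cdot\underbrace{\frac{1}{B}\sum_{b=1}^{B}u_b}_{\mathbb{E}[\,\cdot\,]=0},
\qquad
\bar{\phi} = \frac{1}{B}\sum_{b=1}^{B}\phi(x+\delta u_b).
```

So the baseline does not bias the estimate; it acts as a control variate that reduces variance when the $\phi$ values are imbalanced.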
The value of theta is different from the paper
Hi Jianbo,
Thanks for your nice work; its experimental results are really impressive.
I notice that the values of theta actually differ between the paper and the source code. From equation 17 in the paper, theta should be 0.001 * d^{-3/2} under the l2 constraint. However, in the source code, theta is set to 1 * d^{-3/2} in this newer commit, or 0.01 * d^{-1/2} in this older commit.
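To make the discrepancy concrete, a small sketch for a CIFAR-10-sized input (d = 32*32*3 = 3072); the three formulas are the ones quoted above, which I have not independently verified against the repo:

```python
d = 32 * 32 * 3
paper_theta  = 0.001 * d ** -1.5    # eq. 17 in the paper (l2)
newer_commit = 1.0   * d ** -1.5    # 1000x the paper's value
older_commit = 0.01  * d ** -0.5    # a different exponent entirely
print(paper_theta, newer_commit, older_commit)
```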
Hi, I want to know which variable in hsja.py represents the number of queries.