Athena: A Framework for Defending Machine Learning Systems Against Adversarial Attacks
Home Page: https://softsys4ai.github.io/athena/
License: MIT License
This will serve as a baseline attack; we also compare against a model trained on random noise.
Hi, I'm trying to reproduce your experiment, but while evaluating FGSM attacks under white-box settings, I found that the normalized L2 dissimilarity for FGSM at eps=0.1 is only about 0.007. As in your code, the upper bound for a white-box attack is determined by some pre-generated adversarial examples. I'm wondering how you completed the experiments shown in Fig. 7 of your paper?
So far, we have empirical evidence showing a significant difference between BS (benign samples) and AE (adversarial examples) in inference time on the clean model and many transform models. Check the column "inference Probability" in Inference_Time-T_Test.xlsx for the t-test results on the inference time of BS and AE for every model.
Question: what is the root cause of this difference?
Investigation ideas:
- Which part of the network (lower layers or upper layers) has more impact on this difference?
- Measure the time cost of each layer for both BS and AE.
- Do inferences on BS and AE activate different numbers of neurons?
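As a starting point for the per-layer timing measurement, here is a minimal NumPy sketch; the two stand-in layers and all shapes are placeholders for the real Keras models. Run it once on a BS batch and once on an AE batch, then compare the timings layer by layer.

```python
import time
import numpy as np

# Placeholder two-layer network; in practice these would be the layers of
# the clean/transform Keras models.
rng = np.random.default_rng(0)
W1, W2 = rng.random((32, 64)), rng.random((64, 10))
layers = [lambda a: np.maximum(a @ W1, 0.0),  # "lower" layer (ReLU)
          lambda a: a @ W2]                   # "upper" layer (logits)

def time_per_layer(batch):
    """Return the wall-clock time spent in each layer for one batch."""
    timings, out = [], batch
    for layer in layers:
        start = time.perf_counter()
        out = layer(out)
        timings.append(time.perf_counter() - start)
    return timings

bs_batch = rng.random((256, 32))  # stands in for benign samples
ae_batch = rng.random((256, 32))  # stands in for adversarial examples
bs_times, ae_times = time_per_layer(bs_batch), time_per_layer(ae_batch)
```

Comparing the per-layer totals for the two batches would localize whether the lower or upper layers account for the inference-time gap.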
We need to prepare a Python notebook containing our end-to-end approach, with some specific examples.
white-box
[Error Message]
Traceback (most recent call last):
File "detection_as_defense.py", line 67, in
transformationList)
File "/home/kevinsjh_gmail_com/adversarial_transformers/util.py", line 1087, in predictionForTest
tranSamples = transform_images(curSamples, transformType)
File "/home/kevinsjh_gmail_com/adversarial_transformers/transformation.py", line 748, in transform_images
elif (transformation_type in TRANSFORMATION.NOISES):
File "/home/kevinsjh_gmail_com/adversarial_transformers/transformation.py", line 671, in add_noise
img_noised = skimage.util.random_noise(img, mode=noise_mode)
File "/home/kevinsjh_gmail_com/anaconda3/lib/python3.7/site-packages/skimage/util/noise.py", line 155, in random_noise
out = np.random.poisson(image * vals) / float(vals)
File "mtrand.pyx", line 4005, in mtrand.RandomState.poisson
ValueError: lam value too large.
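One likely cause (an inference from the traceback, not verified against the repo): in Poisson mode, skimage's `random_noise` computes `np.random.poisson(image * vals)`, and NumPy rejects `lam` values above roughly 9.2e18, so extreme pixel intensities overflow the check. A NumPy sketch mimicking that internal computation shows that rescaling the image to [0, 1] first keeps `lam` small:

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((8, 8)) * 1e12  # hypothetical out-of-range pixel values

# Same scaling skimage's random_noise uses for Poisson noise:
vals = 2 ** np.ceil(np.log2(len(np.unique(img))))

# Rescaling to [0, 1] first keeps lam = img01 * vals small enough:
img01 = (img - img.min()) / (img.max() - img.min())
noised = rng.poisson(img01 * vals) / float(vals)
```

If this diagnosis holds, normalizing images before the noise transformation should avoid the crash.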
After investigating the accuracy of all models (the clean model plus 32 transform models) under the FGSM250 AEs, we observed that a few transform models, whose accuracy was previously higher than the clean model's, now score lower than the clean model. Check the table below.
| Model Type | testset | trainset |
|---|---|---|
| Clean | 0.1143 | 0.1663 |
| rotate270 | 0.1034 | 0.1394 |
| erosion | 0.1336 | 0.1182 |
| opening | 0.3172 | 0.1363 |
| shift_bottom_right | 0.1033 | 0.1181 |
We should use t-SNE to look into some properties of the models trained on transformations and provide some intuition about why they work.
Use logging to manage output information, rather than the print function.
Use ensemble models trained on FGSM AEs and test them on BIM AEs.
Use ensemble models trained on BIM AEs and test them on FGSM AEs.
Some transformations cause crashes, while others generate black images.
Transformations to add:
- new filter transformations
- denoising transformations
- geometric transformations
- segmentation transformations
Rename some of the existing transformation types to organize all transformations better.
Add unit tests.
We'd like to provide the one-pixel attack in our toolkit.
Provide CW attacks (L0, L2, and L_inf norms).
We'd like to provide a simple black-box attack.
Use FLAGS to manage configuration.
File "scripts.py", line 308, in <module>
tf.app.run()
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/usr/local/lib/python3.5/dist-packages/absl/app.py", line 300, in run
_run_main(main, args)
File "/usr/local/lib/python3.5/dist-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "scripts.py", line 290, in main
generate_adversarial_examples(DATA.mnist, ATTACK.JSMA)
File "scripts.py", line 177, in generate_adversarial_examples
theta=theta, gamma=gamma)
File "/home/tester/advML/attacks/attacker.py", line 107, in get_adversarial_examples
X_adv, Y = whitebox.generate(model_name, X, Y, attack_method, attack_params)
File "/home/tester/advML/attacks/whitebox.py", line 172, in generate
adv_x = attacker.generate(model.input, **attack_params)
File "/home/tester/.local/lib/python3.5/site-packages/cleverhans/attacks/__init__.py", line 948, in generate
labels, nb_classes = self.get_or_guess_labels(x, kwargs)
File "/home/tester/.local/lib/python3.5/site-packages/cleverhans/attacks/__init__.py", line 281, in get_or_guess_labels
preds = self.model.get_probs(x)
File "/home/tester/.local/lib/python3.5/site-packages/cleverhans/utils_keras.py", line 179, in get_probs
return self.get_layer(x, name)
File "/home/tester/.local/lib/python3.5/site-packages/cleverhans/utils_keras.py", line 227, in get_layer
output = self.fprop(x)
File "/home/tester/.local/lib/python3.5/site-packages/cleverhans/utils_keras.py", line 203, in fprop
self.keras_model = KerasModel(new_input, out_layers)
File "/usr/local/lib/python3.5/dist-packages/keras/legacy/interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/keras/engine/network.py", line 93, in __init__
self._init_graph_network(*args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/keras/engine/network.py", line 231, in _init_graph_network
self.inputs, self.outputs)
File "/usr/local/lib/python3.5/dist-packages/keras/engine/network.py", line 1366, in _map_graph_network
tensor_index=tensor_index)
File "/usr/local/lib/python3.5/dist-packages/keras/engine/network.py", line 1347, in build_map
for i in range(len(node.inbound_layers)):
**TypeError: object of type 'InputLayer' has no len()**
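This `TypeError` is a common symptom of version skew between cleverhans' `KerasModelWrapper` and the installed Keras: the wrapper rebuilds a `keras.Model` from the wrapped layers, and newer Keras graph code no longer stores `node.inbound_layers` as a list. Pinning era-matching versions usually avoids it; the exact pins below are an assumption, not taken from the repo:

```shell
# Assumed compatible pins for the cleverhans 2.x-era Keras wrapper;
# adjust to the versions the repo actually targets.
pip install "tensorflow==1.13.1" "keras==2.2.4" "cleverhans==2.1.0"
```

Also make sure the model is built with standalone `keras` rather than `tensorflow.keras`; mixing the two produces the same kind of graph-traversal failure.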
Presume that attackers have access to all weak defenses but are unaware of the ensemble strategies.
Support the PGD attack.
We should investigate how adversarial attacks change the class activation mapping (CAM) and how our transformations change the map. This may reveal some understanding of why our approach works.
Support DeepFool (L_inf norm).
Approach 1: add randomness (random noise) to AEs and then use them to train ensemble models
Approach 2: use the strongest type of AEs to build ensemble models for defense
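Approach 1 can be sketched in a few lines; the noise scale `sigma` and the batch shape are hypothetical hyperparameters:

```python
import numpy as np

rng = np.random.default_rng(0)
aes = rng.random((100, 28, 28, 1))  # placeholder batch of adversarial examples
sigma = 0.05                        # hypothetical noise scale

# Perturb the AEs with Gaussian noise, clipped back to the valid pixel
# range, before using them as training data for the ensemble models.
noisy_aes = np.clip(aes + rng.normal(0.0, sigma, aes.shape), 0.0, 1.0)
```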
The U of SC computing resource center supports Sylabs.io. With that, we should be able to create a Docker container, which gives us root privilege, and deploy it to the computing resource center to access the GPU nodes.
Try it out after the conference deadlines.
The scripts, which work well on local machines, do not work on the RCI nodes; TensorFlow throws some exceptions. I strongly suspect a version gap: I am using TensorFlow 1.13 on local machines, while the RCI nodes have TensorFlow 1.12.
We need to look into this issue and fix the bugs on RCI.
Support the MIM attack.
Hi, is it possible to implement a reset function for the other transformations?
With some figures and appropriate plots.
The idea is to differentiate BS and AE based on their outputs from distinct transform models.
In Detecing adversarial samples from artifacts.pdf, it is shown that different models make different mistakes when presented with the same AEs. 2018-arXiv-PictureAE-Picture_AE_detection_bimodel.pdf proposes a bi-model approach that concatenates the outputs of an image from two distinct models as its feature representation and then feeds it to a binary classifier. The approach is claimed to reach >90% detection accuracy on MNIST and CIFAR-10.
Similarly, we can concatenate/stack the outputs of the transform models for an input image, use the stacked vector as a representation of the image, and feed it into a binary classifier. This might achieve higher detection accuracy and generalize better across different types of attacks.
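A sketch of the stacking idea with simulated softmax outputs; the number of models, the synthetic data, and the classifier choice are all assumptions, not the repo's actual pipeline:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_images, n_models, n_classes = 200, 5, 10

# Simulated softmax outputs: models agree on benign samples (tiled copies),
# while each model predicts differently on AEs (independent draws).
benign = np.tile(rng.dirichlet(np.full(n_classes, 0.1), size=n_images // 2),
                 (1, n_models))
aes = rng.dirichlet(np.full(n_classes, 0.1),
                    size=(n_images // 2, n_models)).reshape(n_images // 2, -1)

# Concatenated outputs become the image's feature vector; a binary
# classifier then separates benign (0) from adversarial (1).
X = np.vstack([benign, aes])
y = np.concatenate([np.zeros(n_images // 2), np.ones(n_images // 2)])
clf = LogisticRegression(max_iter=1000).fit(X, y)
```

In the real setting, `X` would come from running each input through all transform models and concatenating their softmax vectors.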
Investigation:
- Identify patterns in the prediction outputs of BS and AE.
- For BS and each type of AE, plot a boxplot of the average, min, and max accuracy across all transform models.
Detection approach 1: majority voting
The prediction outputs for AEs are much more diverse than those for BS. That is, the number of transform models that agree with each other on an AE will be much smaller than on a BS. If this number is below some threshold, say 75% x total_number_of_models, the input image is marked as an AE; otherwise, it is considered a benign sample.
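A minimal sketch of the voting rule; the label-array interface is a simplification of the real pipeline:

```python
import numpy as np

def flagged_as_ae(predicted_labels, threshold_ratio=0.75):
    """Mark an input as an AE when too few transform models agree."""
    _, counts = np.unique(predicted_labels, return_counts=True)
    return bool(counts.max() < threshold_ratio * len(predicted_labels))

# Four of five models agree -> benign; all disagree -> flagged as an AE.
assert flagged_as_ae(np.array([3, 3, 3, 3, 7])) is False
assert flagged_as_ae(np.array([3, 1, 4, 1, 5])) is True
```

The 0.75 ratio is the threshold from the note above; it would need tuning per dataset and model pool.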
Detection approach 2: distance matrix
Empirical evidence to collect: the distance between the prediction outputs of a benign sample from two distinct transform models is close to 0, while the distance between the prediction outputs of an AE from two distinct transform models should be much larger than 0.
Try different distance metrics: L2, entropy, KL divergence, cosine, correlation.
Distance matrix: for an image, create a distance matrix by computing the distances between its prediction outputs for each pair of transform models, then investigate any property that differs between AE and BS.
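The distance matrix for one image can be sketched with SciPy; the model outputs here are simulated, and the metric string can be swapped for cosine, correlation, etc.:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(1)
outputs = rng.dirichlet(np.ones(10), size=4)  # softmax of 4 transform models

# Symmetric matrix of pairwise L2 distances between the models' outputs;
# entry (i, j) is the distance between model i's and model j's prediction.
dist_matrix = squareform(pdist(outputs, metric="euclidean"))
```

Statistics of this matrix (e.g. mean off-diagonal distance) could then serve as the AE-vs-BS discriminator.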