spring-epfl / mia
A library for running membership inference attacks against ML models
License: MIT License
Following the TF demo, something goes wrong when attacking a PyTorch model.
Could you please share a demo with PyTorch?
Hi,
In the BaseModelSerializer
definition, the Keras model is passed before the model ID:
Line 29 in d389d30
But in the ShadowModelBundle
class the model ID is passed before the model object:
Line 118 in d389d30
I think the BaseModelSerializer abstract class definition should be corrected by swapping the order of the two arguments.
Best
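A minimal sketch of the suggested fix, with the model ID first to match ShadowModelBundle. The method names (save/load) are assumptions for illustration, not necessarily the library's exact API:

```python
from abc import ABC, abstractmethod

class BaseModelSerializer(ABC):
    """Sketch of the abstract serializer with the argument order swapped so
    that the model ID comes first, matching how ShadowModelBundle calls it."""

    @abstractmethod
    def save(self, model_id, model):
        """Persist `model` under the identifier `model_id`."""

    @abstractmethod
    def load(self, model_id):
        """Restore the model stored under `model_id`."""
```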
Hi! I've been playing a bit with the library. Great job, I really liked it! But while running the CIFAR-10 example on my machine, I found that I get a fairly high false positive rate. Any suggestions on how to reduce it? Am I doing something wrong here? Thanks for the help!
The example code I'm running; the confusion matrix is at the end of the notebook.
In the README.md, the documentation link is broken:
A library for running membership inference attacks (MIA) against machine learning models. Check out the documentation.
Hi! I'm trying to conduct the membership attack on the Fashion MNIST dataset by slightly changing the provided example. Since Fashion MNIST is 70k images (60k for training and 10k for validation), each image is 28x28x1, and there are 10 classes, I only had to adapt the target model to 28x28 images.
When training the attack model I get this error:
ValueError: Empty training data.
I was wondering where this problem comes from; I left every other parameter untouched, like SHADOW_DATASET_ATTACK or ATTACK_TEST_DATASET_SIZE.
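In case it helps others debugging the same error: one plausible cause (my assumption, not confirmed by the maintainers) is that split sizes tuned for the CIFAR-10 example no longer fit the new dataset, leaving one of the in/out splits empty. A quick, hypothetical sanity check before training:

```python
def check_split_sizes(n_available, shadow_dataset_size, attack_test_dataset_size):
    """Hypothetical sanity check: verify that the shadow training data (in/out
    halves) and the attack test set (in/out halves) can both be carved out of
    the available records. The 2x factors reflect the in/out split assumption."""
    needed = shadow_dataset_size * 2 + attack_test_dataset_size * 2
    if needed > n_available:
        raise ValueError(
            f"Need at least {needed} records but only {n_available} are available; "
            "reduce SHADOW_DATASET_SIZE or ATTACK_TEST_DATASET_SIZE."
        )
    return True
```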
I ran the CIFAR10 example and I would like to save the attack model so I can run some tests without having to retrain it every time, and also be able to use it elsewhere, but I can't do it conveniently. Is there a way to do so?
I have tried pickle.dump(), as well as using _get_model() and then saving the result.
Thanks
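For what it's worth, plain pickling works for attack models that are ordinary Python objects (e.g. scikit-learn classifiers); Keras-backed models generally are not picklable and need Keras's own model.save()/load_model() instead. A generic sketch of the pickle route, with hypothetical helper names:

```python
import os
import pickle
import tempfile

def save_attack_model(model, path):
    """Persist a picklable attack model to disk. Keras models should be
    saved with model.save() instead, since they usually do not pickle."""
    with open(path, "wb") as f:
        pickle.dump(model, f)

def load_attack_model(path):
    """Reload a previously saved attack model."""
    with open(path, "rb") as f:
        return pickle.load(f)
```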
I have installed all the dependencies necessary for the application to run. However, when I run the tests located in /mia/tests:
:~/mia/tests# python conftest.py
I receive only
Using TensorFlow backend.
and that is all, nothing else. Do you have any idea why this could be happening?
Hi, you mention in the readme that the package supports PyTorch models, but in ShadowModelBundle._fit you assume the model has a fit method (line 116).
How exactly have you tested the PyTorch models? I was thinking of maybe using pytorch-fitmodule or SuperModule, but if there's a way you already recommend, that would be great. Also, it would be nice to include an example of how to load PyTorch modules in the package! (Maybe I can do a PR once I'm able to do it myself. :-)
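My reading of _fit is that any object exposing a Keras/sklearn-style fit(X, y) plus a way to get prediction vectors (e.g. predict_proba) would work; that interface is an assumption, not documented behavior. A sketch of such a wrapper, with the torch training loop replaced by a trivial class-prior estimator so it runs standalone:

```python
import numpy as np

class TorchLikeWrapper:
    """Sketch of a wrapper giving a PyTorch model the Keras/sklearn-like
    interface that ShadowModelBundle appears to expect. The 'training' here
    is a stand-in class-prior estimator so the sketch runs without torch;
    swap in your optimizer loop and module for real use."""

    def __init__(self, num_classes):
        self.num_classes = num_classes
        self.class_priors_ = None

    def fit(self, X, y, epochs=1, verbose=0):
        # Real version: run the torch training loop over (X, y) here.
        counts = np.bincount(np.asarray(y, dtype=int), minlength=self.num_classes)
        self.class_priors_ = counts / counts.sum()
        return self

    def predict_proba(self, X):
        # Real version: softmax of the torch module's logits on X.
        return np.tile(self.class_priors_, (len(X), 1))
```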
Hey Bogdan, I read your project called "spring-epfl/mia" and tried to reproduce the MIA model you shared on Github.
I found that the attack accuracy is around 55% on the CIFAR-10 dataset, which is not as high as we expected. Could you give me some advice on improving the accuracy?
Here are the Github link and the result of the code: https://github.com/spring-epfl/mia/tree/master/examples
Thank you in advance. Take care:)
This one is hopefully pretty straightforward: the wrapper param enable_cuda
is spelled enable_cude
in the docs. Not a huge deal, but did have me scratching my head for a minute when I saw it first.
Line 109 in d389d30
Also see section 3.3 here: https://buildmedia.readthedocs.org/media/pdf/mia-lib/latest/mia-lib.pdf
Right now it only supports attacks on classification models. Any plans for extending to regression?
Firstly, thanks for your contribution of mia, which is a very well-structured and concise implementation of the membership inference attack.
However, one thing that confuses me: in your cifar10 example, lines 150-152, you include (x_test, y_test), which is used for training the shadow models, in the dataset for testing the attacker. From my perspective, this is inappropriate, since the attack model has already seen these data indirectly through the shadow models, which breaches the assumption that the test data is unseen.
In the official implementation https://github.com/csong27/membership-inference, they seem to avoid this by separating out one more test set.
I do not know whether my understanding is correct.
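The separation suggested above can be sketched as a three-way split: disjoint target data, shadow data, and a held-out set that neither the target nor the shadow models ever see, so the attack model is evaluated on genuinely unseen records. The function name and split sizes below are illustrative:

```python
import numpy as np

def three_way_split(X, y, n_target, n_shadow, seed=0):
    """Split (X, y) into disjoint target-training, shadow-training, and
    held-out partitions. The held-out partition is reserved for evaluating
    the attack model on records no model has seen during training."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    target_idx = idx[:n_target]
    shadow_idx = idx[n_target:n_target + n_shadow]
    holdout_idx = idx[n_target + n_shadow:]
    return ((X[target_idx], y[target_idx]),
            (X[shadow_idx], y[shadow_idx]),
            (X[holdout_idx], y[holdout_idx]))
```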
Hi,
Thank you for implementing the Shokri et al. attack. I have been reading the paper and repeating the experiment described in it. However, I found that the training datasets for the shadow models just use data records disjoint from the target training dataset of the specific dataset (like CIFAR-10), replacing k features or replacing nothing in the code. This could be a little different from the original algorithm in the paper.
I wrote Algorithm 1 (data synthesis using the target model) myself in PyTorch. I generated a random tensor X_tensor of size (1, 3, 32, 32) for the CIFAR-10 dataset and used the two phases, search and sample, from Algorithm 1 in the paper. The code is as follows:
def data_synthesize(net, trainset_size, fix_class, initial_record, k_max,
                    in_channels, img_size, batch_size, num_workers, device):
    """Synthesize a record for fix_class using the target model (Algorithm 1)."""
    # Initialize X_tensor with an initial record of size (1, in_channels, img_size, img_size)
    X_tensor = initial_record
    X_new_tensor = initial_record  # last accepted record (so the first rejection is well-defined)
    # Generate y_tensor with a size matching X_tensor's
    y_tensor = gen_class_tensor(trainset_size, fix_class)
    y_c_current = 0  # target model's probability for the fixed class
    j = 0            # consecutive-rejections counter
    k = k_max        # search radius
    max_iter = 100   # maximum number of iterations
    conf_min = 0.1   # minimum probability cutoff to consider a record a member of the class
    rej_max = 5      # maximum number of consecutive rejections
    k_min = 1        # minimum radius of feature perturbation
    for _ in range(max_iter):
        dataset = TensorDataset(X_tensor, y_tensor)
        dataloader = DataLoader(dataset=dataset, batch_size=batch_size,
                                num_workers=num_workers, shuffle=True)
        y_c = nn_predict_proba(net, dataloader, device, fix_class)
        # Phase 1: search -- accept the proposal if confidence did not decrease
        if y_c >= y_c_current:
            # Phase 2: sample -- keep the record if it is confidently classified as fix_class
            if y_c > conf_min and fix_class == torch.argmax(nn_predict(net, dataloader, device), dim=1):
                return X_tensor
            X_new_tensor = X_tensor  # accept the proposal and renew variables
            y_c_current = y_c
            j = 0
        else:
            j += 1
            if j > rej_max:  # too many consecutive rejections: shrink the search radius
                k = max(k_min, int(np.ceil(k / 2)))
                j = 0
        # Propose a new record by perturbing up to k features of the last accepted one
        X_tensor = rand_tensor(X_new_tensor, k, in_channels, img_size, trainset_size)
    return X_tensor, y_c
However, the prediction probability it generates is very low, around 0.1. Could you please give me some guidance on the data synthesis algorithm, or update the uploaded code? Thanks in advance for your patience!
Best wishes!
Yantong