opacus's Introduction

Opacus



Opacus is a library that enables training PyTorch models with differential privacy. It requires minimal code changes on the client side, has little impact on training performance, and lets the client track, online, the privacy budget expended at any given moment.

Target audience

This code release is aimed at two target audiences:

  1. ML practitioners will find this to be a gentle introduction to training a model with differential privacy as it requires minimal code changes.
  2. Differential Privacy researchers will find this easy to experiment and tinker with, allowing them to focus on what matters.

Installation

The latest release of Opacus can be installed via pip:

pip install opacus

OR, alternatively, via conda:

conda install -c conda-forge opacus

You can also install directly from source to get the latest features (along with their quirks and the occasional bug):

git clone https://github.com/pytorch/opacus.git
cd opacus
pip install -e .

Getting started

To train your model with differential privacy, all you need to do is to instantiate a PrivacyEngine and pass your model, data_loader, and optimizer to the engine's make_private() method to obtain their private counterparts.

import torch
from torch.optim import SGD
from opacus import PrivacyEngine

# define your components as usual (Net and dataset are your own model and dataset)
model = Net()
optimizer = SGD(model.parameters(), lr=0.05)
data_loader = torch.utils.data.DataLoader(dataset, batch_size=1024)

# enter PrivacyEngine
privacy_engine = PrivacyEngine()
model, optimizer, data_loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=data_loader,
    noise_multiplier=1.1,  # ratio of noise std to the clipping norm
    max_grad_norm=1.0,     # per-sample gradient clipping threshold
)
# Now it's business as usual
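Because Opacus tracks the privacy budget as training progresses, you can query the ε spent so far at any point. A minimal sketch (the value δ = 1e-5 is illustrative; δ is commonly chosen smaller than the inverse of the dataset size):

# query the privacy budget spent so far, for a user-chosen delta
epsilon = privacy_engine.get_epsilon(delta=1e-5)
print(f"ε = {epsilon:.2f} at δ = 1e-5")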

The MNIST example shows an end-to-end run using Opacus. The examples folder contains more such examples.

Migrating to 1.0

Opacus 1.0 introduced many improvements to the library, but also some breaking changes. If you've been using Opacus 0.x and want to update to the latest release, please follow the Migration Guide.

Learn more

Interactive tutorials

We've built a series of IPython-based tutorials as a gentle introduction to training models with privacy and using various Opacus features.

Technical report and citation

The technical report introducing Opacus, presenting its design principles, mathematical foundations, and benchmarks can be found here.

Consider citing the report if you use Opacus in your papers, as follows:

@article{opacus,
  title={Opacus: {U}ser-Friendly Differential Privacy Library in {PyTorch}},
  author={Ashkan Yousefpour and Igor Shilov and Alexandre Sablayrolles and Davide Testuggine and Karthik Prasad and Mani Malek and John Nguyen and Sayan Ghosh and Akash Bharadwaj and Jessica Zhao and Graham Cormode and Ilya Mironov},
  journal={arXiv preprint arXiv:2109.12298},
  year={2021}
}

Blogposts and talks

If you want to learn more about DP-SGD and related topics, check out our series of blog posts and talks.

FAQ

Check out the FAQ page for answers to some of the most frequently asked questions about differential privacy and Opacus.

Contributing

See the CONTRIBUTING file for how to help out. Do also check out the README files inside the repo to learn how the code is organized.

License

This code is released under Apache 2.0, as found in the LICENSE file.


opacus's Issues

Support for 3DConv layers

Feature

We would like opacus to support 3D conv layers for differentially-private training of video action recognition models.

Context

Looking at the code in
https://github.com/pytorch/opacus/blob/master/opacus/supported_layers_grad_samplers.py
I originally thought the implementation of 3D conv would be identical to that of 2D conv. Unfortunately, my naive implementation throws RuntimeError: Input Error: Only 4D input Tensors are supported (got 5D) during training. Is there a simple fix, or is there something deeper at play here?
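The error message suggests the failure comes from torch.nn.functional.unfold (im2col), which only accepts 4-D (N, C, H, W) inputs. A minimal repro outside Opacus (an assumption based on the error message, not a confirmed diagnosis):

import torch
import torch.nn.functional as F

x = torch.randn(2, 3, 8, 8, 8)  # a 5-D video-style batch (N, C, D, H, W)
F.unfold(x, kernel_size=3)      # raises: Input Error: Only 4D input Tensors are supported (got 5D)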

ZeroDivisionError when alpha = 1.0

In _compute_rdp(q, sigma, alpha), the denominator of

return _compute_log_a(q, sigma, alpha) / (alpha - 1)

becomes zero when alpha = 1.0, which is a valid value for the alpha order, so the call raises a ZeroDivisionError.
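A minimal sketch of a possible guard (check_rdp_order is a hypothetical helper, not the library's actual fix): reject or special-case alpha = 1 before evaluating the formula, since that order corresponds to KL divergence and the generic expression divides by (alpha - 1).

def check_rdp_order(alpha: float) -> None:
    # Hypothetical guard: alpha = 1 (the KL-divergence order) cannot go through
    # the generic formula, which divides by (alpha - 1).
    if alpha == 1:
        raise ValueError(
            "alpha = 1 must be special-cased: the RDP formula divides by (alpha - 1)"
        )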

Cannot correctly install the package

I tried to install it through:

pip install opacus

but the package does not install correctly. I checked the files in site-packages; the installed entry is named opacus-0.0.dist-info and contains almost nothing.

FileNotFoundError: [Errno 2] No such file or directory: 'requirements.txt'

I'm trying to build a conda-forge package out of this, but when I build the recipe with conda skeleton pypi opacus I get the following error:

FileNotFoundError: [Errno 2] No such file or directory: 'requirements.txt'
$PYTHONPATH = /tmp/tmpm3lk0o9wconda_skeleton_opacus-0.9.0.tar.gz/opacus-0.9.0

Unsupported modules - torchdp.dp_model_inspector.IncompatibleModuleException

When I run pytorch-dp on a network with embedding and linear layers, I get the following exception: torchdp.dp_model_inspector.IncompatibleModuleException. From what I understood from this post, it is thrown when the functionality is not supported. Is there a workaround available to run pytorch-dp with nn.Embedding layers?

Discovering the meaning of ε-DP.


Feature

This tool can guarantee certain values of ε; however, the meaning of this ε depends on the context. Is there a way to understand what level of privacy this ε provides? In the original paper on ε-DP (which is different from the Rényi DP implemented here) it becomes clear that:

Given ε ≥ 0, a mechanism M is ε-differentially private if, for any two neighboring databases D and D′ and for any subset S ⊆ R of outputs:
Pr[M(D) ∈ S] ≤ e^ε ·Pr[M(D′) ∈ S].

As I said, this is different from Rényi DP and also from (ε, δ)-DP, but the tenor is similar. Is it possible to find out what the values of Pr[M(D′) ∈ S] and Pr[M(D) ∈ S] are using PyTorch-DP? Because if e^ε · Pr[M(D′) ∈ S] becomes greater than 1, the ε loses all meaning: any probability is smaller than or equal to a bound that exceeds 1. Alternatively, is there a similar statistic that can be used to say something meaningful about "what real level of privacy" this ε ensures?
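For intuition, a small worked bound (standard ε-DP arithmetic, not a statement about what PyTorch-DP can report):

$$\Pr[M(D) \in S] \le e^{\varepsilon} \cdot \Pr[M(D') \in S], \qquad \varepsilon = 1 \implies e^{\varepsilon} \approx 2.72,$$

so an event with probability 0.1 under D′ can have probability at most 0.272 under D; once $e^{\varepsilon} \cdot \Pr[M(D') \in S] \ge 1$, the inequality becomes vacuous for that particular event.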

error when batch_size is greater than sample_size.

If q is 1, the following line has to compute log(0), which raises ValueError: math domain error:

z0 = sigma ** 2 * math.log(1 / q - 1) + 0.5

In fact, q is the sample_rate in this line:

self.sample_rate = batch_size / sample_size

Is there any theoretical reason that sample_rate cannot be 1? I know it is rare, but is it forbidden at all?

If there might be a situation where one can set sample_rate=1, wouldn't it be better to handle this with an explicit exception, as in the sketch below?
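A minimal sketch of such a guard (validate_sample_rate is a hypothetical helper, not an Opacus function):

def validate_sample_rate(batch_size: int, sample_size: int) -> float:
    # Hypothetical validation: the RDP computation assumes 0 < q < 1, so fail
    # loudly instead of letting math.log(1 / q - 1) raise a math domain error.
    sample_rate = batch_size / sample_size
    if not 0 < sample_rate < 1:
        raise ValueError(
            f"sample_rate={sample_rate} is outside (0, 1); "
            "batch_size must be smaller than sample_size"
        )
    return sample_rate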

Issue with bounding sensitivity

Hi,

I want to address a problem with the RDP calculation in the code. As I went through the paper Renyi Differential Privacy of the Sampled Gaussian Mechanism, the calculations there are for an l2-sensitivity of 1.

In the code, there is a parameter --sigma (noise_multiplier) which is multiplied by self.max_grad_norm when adding noise, while self.max_grad_norm limits the sensitivity of the function. However, self.max_grad_norm plays no part in the calculation of RDP, which is inconsistent with the definitions in the paper.

Assume we increase self.max_grad_norm to a large number. Should we still get the same ε-RDP guarantee?

Also, the σ of the additive noise is self.noise_multiplier * self.max_grad_norm according to this part of the code, but it equals self.noise_multiplier in the RDP calculation as implemented here.

I think we should change the lines of this part, to the following:
def get_renyi_divergence(self):
    rdp = torch.tensor(
        tf_privacy.compute_rdp(
            self.sample_rate,
            self.noise_multiplier * self.max_grad_norm,
            1,
            self.alphas,
        )
    )
    return rdp

Error in computing gradients

I am adapting a non-private word2vec embedding to Opacus. My code worked until today, but now it crashes with the following error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-40-e2a3920634d2> in <module>
      9         target_tensor, context_tensor, negative_tensor = context_tuple_batches[i]
     10         loss = net(target_tensor, context_tensor, negative_tensor)
---> 11         loss.backward()
     12         optimizer.step()
     13         losses.append(loss.data)

/mnt/xarfuse/uid-227560/8f6d9a4c-seed-nspid4026531836-ns-4026531840/torch/tensor.py in backward(self, gradient, retain_graph, create_graph, inputs)
    231                 create_graph=create_graph,
    232                 inputs=inputs)
--> 233         torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
    234 
    235     def register_hook(self, hook):

/mnt/xarfuse/uid-227560/8f6d9a4c-seed-nspid4026531836-ns-4026531840/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    145     Variable._execution_engine.run_backward(
    146         tensors, grad_tensors_, retain_graph, create_graph, inputs,
--> 147         allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
    148 
    149 

../opacus/autograd_grad_sample.py in _capture_backprops(layer, inputs, outputs, loss_reduction, batch_first)
    190 
    191     backprops = outputs[0].detach()
--> 192     _compute_grad_sample(layer, backprops, loss_reduction, batch_first)
    193 
    194 

../opacus/autograd_grad_sample.py in _compute_grad_sample(layer, backprops, loss_reduction, batch_first)
    251         get_layer_type(layer)
    252     )
--> 253     compute_layer_grad_sample(layer, A, B)

../opacus/supported_layers_grad_samplers.py in _compute_embedding_grad_sample(layer, A, B, batch_dim)
    311 
    312     _create_or_extend_grad_sample(
--> 313         layer.weight, torch.einsum("n...ij->nij", gs), batch_dim
    314     )
    315 

/mnt/xarfuse/uid-227560/8f6d9a4c-seed-nspid4026531836-ns-4026531840/torch/functional.py in einsum(equation, *operands)
    369         return einsum(equation, *_operands)
    370 
--> 371     return _VF.einsum(equation, operands)  # type: ignore
    372 
    373 

RuntimeError: einsum() ellipsis (...) covering one or more dimensions was given in the input but not in the output

Here is my network definition and the loss function:

import torch
import torch.nn as nn
import torch.autograd as autograd
import torch.optim as optim
import torch.nn.functional as F


class Word2Vec(nn.Module):

    def __init__(self, embedding_size, vocab_size):
        super(Word2Vec, self).__init__()
        self.embeddings_target = nn.Embedding(vocab_size, embedding_size)
        self.embeddings_context = nn.Embedding(vocab_size, embedding_size)

    def forward(self, target_word, context_word, negative_example):
        emb_target = self.embeddings_target(target_word)
        
        bsz, neg_sample_sz = negative_example.shape[0], negative_example.shape[1]
        combined_word = torch.cat([context_word.unsqueeze(1), negative_example], dim=1)
        emb_combined = self.embeddings_context(combined_word)
        emb_context = emb_combined[:,0]
        emb_negative = emb_combined[:,1:].view(bsz,neg_sample_sz, -1)
        
        emb_product = torch.mul(emb_target, emb_context)
        emb_product = torch.sum(emb_product, dim=1)
        out = torch.sum(F.logsigmoid(emb_product))
        
        
        emb_product = torch.bmm(emb_negative, emb_target.unsqueeze(2))
        emb_product = torch.sum(emb_product, dim=1)
        out += torch.sum(F.logsigmoid(-emb_product))
        return -out


I am also including how I train the network, though I believe there is nothing unusual in it.

import time
from opacus import PrivacyEngine

vocabulary_size = len(vocabulary)
num_epoch = 20

batch_size = 20
noise_multiplier = 0.1
max_grad_norm = 15
delta = 8e-5

loss_function = nn.CrossEntropyLoss()
net = Word2Vec(embedding_size=100, vocab_size=vocabulary_size)
optimizer = optim.Adam(net.parameters(),lr = 1e-3)

privacy_engine = PrivacyEngine(
    net,
    batch_size=batch_size,
    sample_size=len(context_tuple_list),
    alphas=[1 + x / 10.0 for x in range(1, 100)] + list(range(12, 64)),
    noise_multiplier=noise_multiplier,
    max_grad_norm=max_grad_norm,
    batch_first=True,
    loss_reduction = "sum"
)
privacy_engine.attach(optimizer)
for j in range(0,num_epoch):
    losses = []
    context_tuple_batches = get_batches(context_tuple_list, batch_size)
    print (len(context_tuple_batches))
    for i in range(len(context_tuple_batches)):
        net.zero_grad()
        target_tensor, context_tensor, negative_tensor = context_tuple_batches[i]
        loss = net(target_tensor, context_tensor, negative_tensor)
        loss.backward()
        optimizer.step()
        losses.append(loss.data)

Thanks in advance,
Huanyu Zhang

`pip install torchdp` is squatted by a "security package"

I made the mistake of quickly reading through an example showing from torchdp import PrivacyEngine without reading the README first. Naturally, I tried to install pytorch-dp with pip install torchdp.

An error was triggered when I ran my code, and at a closer look I had the bad surprise of seeing that suspicious requests were being sent to dns1.alexbirsan-hacks-paypal.com. The requests contained some data collected from my computer (domain name, user, current directory).

Luckily, this package was harmless and seems to belong to a security researcher, but I think that this is an issue worth mentioning. And also, be careful when you install PyPI packages.

pip install fails with the cloned repo

Hi,
Thanks for your efforts.
Just wanted to report that cloning the repo and installing using pip install . fails with the following error:

ERROR: Command errored out with exit status 1:
 command: /mhelali/anaconda3/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-qh3uxtnj/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-qh3uxtnj/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-req-build-qh3uxtnj/pip-egg-info
     cwd: /tmp/pip-req-build-qh3uxtnj/
Complete output (7 lines):
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/tmp/pip-req-build-qh3uxtnj/setup.py", line 12, in <module>
    long_description = fh.read()
  File "/mhelali/anaconda3/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1783: ordinal not in range(128)
----------------------------------------

ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

The issue is fixed by adding encoding="utf8" to lines 11 and 15 of setup.py

A bug in the tutorial building_image_classifier.ipynb

According to the description of the virtual_step(self) method in the following line

>>> optimizer.zero_grad()

the command optimizer.zero_grad() should be called after a real step and not at each step.

In the current tutorial, the command optimizer.zero_grad() is called at every step (real or virtual), which seems to be wrong.

" optimizer.zero_grad()\n",

This line (optimizer.zero_grad()) should be moved so that it comes right after optimizer.step(), as in the sketch below.
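A minimal sketch of the intended ordering (assuming the optimizer.virtual_step() API referenced in this issue; model, criterion, train_loader and n_virtual_steps stand in for the notebook's own names):

for i, (data, target) in enumerate(train_loader):
    loss = criterion(model(data), target)
    loss.backward()

    if (i + 1) % n_virtual_steps == 0:
        optimizer.step()          # real step: add noise to the accumulated clipped gradients and update weights
        optimizer.zero_grad()     # clear gradients only after the real step
    else:
        optimizer.virtual_step()  # virtual step: accumulate clipped per-sample gradients, no update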

Question: why using backward_hooks on Modules and not on Tensors?

Hi!
I have a quick question: I've seen that you have built backward hooks for many nn.Module classes where you basically compute the per-sample gradient. Do you think it would be possible to do this at the tensor level, or would it be impossible to capture the "per-sample" notion there?

Generate tarball release

Feature

Since there are no tarball files released for this package here on GitHub or on PyPI, is it possible to generate one in either place? I am working on packaging this for conda-forge.

Thank you!


Plays nice with dataparallel?

Hello, is the imagenet example code known to play well with DataParallel?
I got the following error when trying to train with multiple GPUs:

$ CUDA_VISIBLE_DEVICES=0,1,2,3 python imagenet.py --lr 0.1 --sigma 0.5 -c 1.5 --batch-size 256 --epochs 10 . --workers 32

Output is

PRIVACY ENGINE ON
Traceback (most recent call last):
  File "imagenet.py", line 652, in <module>
    main()
  File "imagenet.py", line 275, in main
    main_worker(args.gpu, ngpus_per_node, args)
  File "imagenet.py", line 442, in main_worker
    train(train_loader, model, criterion, optimizer, epoch, args)
  File "imagenet.py", line 509, in train
    optimizer.step()
  File "/home/tflynn/pylocal/lib/python3.7/site-packages/torchdp/privacy_engine.py", line 73, in dp_step
    self.privacy_engine.step()
  File "/home/tflynn/pylocal/lib/python3.7/site-packages/torchdp/privacy_engine.py", line 98, in step
    clip_values = self.clipper.step()
  File "/home/tflynn/pylocal/lib/python3.7/site-packages/torchdp/per_sample_gradient_clip.py", line 206, in step
    autograd_grad_sample.compute_grad_sample(self.module, batch_dim=self.batch_dim)
  File "/home/tflynn/pylocal/lib/python3.7/site-packages/torchdp/autograd_grad_sample.py", line 153, in compute_grad_sample
    _check_layer_sanity(layer)
  File "/home/tflynn/pylocal/lib/python3.7/site-packages/torchdp/autograd_grad_sample.py", line 121, in _check_layer_sanity
    f"No activations detected for {type(layer)},"
ValueError: No activations detected for <class 'torch.nn.modules.conv.Conv2d'>, run forward after add_hooks(model)

However, there is no issue when using 1 GPU.

upgrade to 0.10.0 fails in python 3.8.5 environment

Bug

I tried pip install opacus==0.10.0 in my python 3.8.5 conda env and got the following error:

ERROR: Could not find a version that satisfies the requirement dataclasses==0.7 (from opacus==0.10.0) (from versions: 0.1, 0.2, 0.3, 0.4, 0.5, 0.6)
ERROR: No matching distribution found for dataclasses==0.7 (from opacus==0.10.0)

dataclasses==0.7 seems to be unavailable.

Alternative

I tried to install opacus==0.10.0 in a new and clean environment to be sure, same result.

Additional context

First discussed in issue #88.

Using Opacus in Google Colab: `torchcsprng` issue.

At the moment, using Opacus on Google Colab fails with this error from privacy_engine.py:

---> 10 from torchcsprng._C import *
ImportError: libcudart.so.10.2: cannot open shared object file: No such file or directory

Due to this line:

import torchcsprng as csprng

Silly, but my current hack is to just comment out this line before running the code. For sure Colab is not a production environment :)

I was wondering whether there is a better solution for this or not?

Removing Dataloader as a parameter in PrivacyEngine

Feature

Currently, PrivacyEngine requires a dataloader to be passed in as a parameter in order to calculate the sample rate (see code). This is the only time we use the dataloader (see grep); there is no other need for it.

Alternatives

I suggest we let the user pass in the sample rate as an optional parameter instead and remove the dependency on the dataloader.

Additional context

We only depend on the sample rate, not the dataloader, to calculate the privacy budget. Am I missing something here? A sketch of the proposed usage is below.
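A hypothetical sketch of the proposed usage (illustrative only; this is not the current Opacus API):

# let the caller supply the sample rate directly instead of a dataloader
privacy_engine = PrivacyEngine(
    model,
    sample_rate=batch_size / len(dataset),  # probability of each example being included
    noise_multiplier=1.1,
    max_grad_norm=1.0,
)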

Error for convolutions with non-default stride/padding

Reshape in autograd_grad_sample.compute_grad_sample fails for a convolution layer with non-default parameters.

It's caused by the shapes in A and B not matching up. I think the fix is a case of changing A to

A = torch.nn.functional.unfold(A, layer.kernel_size, padding=layer.padding, stride=layer.stride)

Can't install opacus with: pip install opacus

pip install opacus
ERROR: Could not find a version that satisfies the requirement opacus (from versions: none)
ERROR: No matching distribution found for opacus

My configurations:
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89

python --version
Python 3.6.2 :: Anaconda, Inc.

import torch
print(torch.version)
1.6.0

Problem running the Opacus example

Feature

I tried to run the Opacus example in PyCharm, but it raises this error:
AttributeError: 'Parameter' object has no attribute 'grad_sample'
and I don't know how to resolve it.

Effect of constant batch size on the privacy calculation

It seems the privacy calculation for sampled mechanisms in Opacus is based on https://arxiv.org/abs/1908.10530. In that paper, it is assumed that each element is sampled independently with probability q. That means the batch size can differ from iteration to iteration, although the average batch size is qN, where N is the total number of elements in the dataset. However, in the examples under https://github.com/pytorch/opacus/tree/master/tutorials, a constant batch size is used. This may violate the independence assumption: once qN elements have been chosen, the probability of any other sample being chosen is zero instead of q. I wonder if the analysis in the paper is still valid with a constant batch size. If not, doesn't using a constant batch size underestimate the privacy spent? (A sketch of the sampling the paper assumes is given below.)
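For reference, a minimal sketch of the Poisson sampling assumed by that analysis (plain PyTorch; the numbers are illustrative):

import torch

q = 0.01      # sample rate
n = 60_000    # dataset size

# each example is included independently with probability q, so the batch size
# itself is random with mean q * n
mask = torch.rand(n) < q
batch_indices = mask.nonzero(as_tuple=True)[0]
print(len(batch_indices))  # roughly q * n, but varies from step to step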

DPLSTM for multiclass text classification

Hi, I was trying to use an LSTM for text classification on sequences, referring to char-lstm-classification.py.

class LSTMClassifier(nn.Module):
    # https://github.com/prakashpandey9/Text-Classification-Pytorch/blob/master/load_data.py
    # + Opacus example
    def __init__(
        self,
        batch_size,
        output_size,
        hidden_size,
        vocab_size,
        embedding_length,
        weights,
    ):
        super(LSTMClassifier, self).__init__()

        self.batch_size = batch_size
        self.output_size = output_size
        self.hidden_size = hidden_size
        self.vocab_size = vocab_size
        self.embedding_length = embedding_length

        self.embedding = nn.Embedding(
            vocab_size, embedding_length
        )  # Initializing the look-up table.

        self.lstm = DPLSTM(embedding_length, hidden_size, batch_first=False)
   
        self.out_layer = nn.Linear(hidden_size, output_size)

    def forward(self, input, hidden):
        input_emb = self.embedding(input)
        input_emb = input_emb.permute(1, 0, 2)
        lstm_out, _ = self.lstm(input_emb, hidden)
        # batch dimension = 1 is needed throughout, so we add an additional
        # dimension and subsequently remove it before the softmax
        output = self.out_layer(lstm_out[-1].unsqueeze(0))
        return output[-1]

    def init_hidden(self):
        return (
            torch.zeros(1, self.batch_size, self.hidden_size),
            torch.zeros(1, self.batch_size, self.hidden_size),
        )

This model works with the regular LSTM but fails with DPLSTM; the error appears on loss.backward():

RuntimeError: the size of tensor a (100) must match the size of tensor b (16) at non-singleton dimension 0

Here 16 is the batch size and 100 is the input text sequence length.
Are there any insights into why this error occurs? Thank you!

Hook error when training word2vector

Hi,

I am adapting a non-private word2vec embedding to Opacus (https://rguigoures.github.io/word2vec_pytorch/), and I am met with the following error when backpropagating:

RuntimeError                              Traceback (most recent call last)
<ipython-input-35-02a99ed52705> in <module>
      8         loss = net(target_tensor, context_tensor, negative_tensor)
      9         print (loss)
---> 10         loss.backward()
     11         print ("bingo")
     12         optimizer.step()

/mnt/xarfuse/uid-227560/a55862f8-seed-nspid4026531836-ns-4026531840/torch/tensor.py in backward(self, gradient, retain_graph, create_graph)
    212                 retain_graph=retain_graph,
    213                 create_graph=create_graph)
--> 214         torch.autograd.backward(self, gradient, retain_graph, create_graph)
    215 
    216     def register_hook(self, hook):

/mnt/xarfuse/uid-227560/a55862f8-seed-nspid4026531836-ns-4026531840/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
    125     Variable._execution_engine.run_backward(
    126         tensors, grad_tensors, retain_graph, create_graph,
--> 127         allow_unreachable=True)  # allow_unreachable flag
    128 
    129 

../opacus/autograd_grad_sample.py in _capture_backprops(layer, inputs, outputs, loss_reduction, batch_first)
    190 
    191     backprops = outputs[0].detach()
--> 192     _compute_grad_sample(layer, backprops, loss_reduction, batch_first)
    193 
    194 

../opacus/autograd_grad_sample.py in _compute_grad_sample(layer, backprops, loss_reduction, batch_first)
    251         get_layer_type(layer)
    252     )
--> 253     compute_layer_grad_sample(layer, A, B)

../opacus/supported_layers_grad_samplers.py in _compute_embedding_grad_sample(layer, A, B, batch_dim)
    308     """
    309     one_hot = F.one_hot(A, num_classes=layer.weight.shape[0])
--> 310     gs = torch.einsum("n...i,n...j->n...ij", one_hot, B)
    311 
    312     _create_or_extend_grad_sample(

/mnt/xarfuse/uid-227560/a55862f8-seed-nspid4026531836-ns-4026531840/torch/functional.py in einsum(equation, *operands)
    343         return einsum(equation, *_operands)
    344 
--> 345     return _VF.einsum(equation, operands)  # type: ignore
    346 
    347 
RuntimeError: ellipsis must represent 1 dimensions in all terms

My network is defined as:

import torch
import torch.nn as nn
import torch.autograd as autograd
import torch.optim as optim
import torch.nn.functional as F

class Word2Vec(nn.Module):
    def __init__(self, embedding_size, vocab_size):
        super(Word2Vec, self).__init__()
        self.embeddings_target = nn.Embedding(vocab_size, embedding_size)
        self.embeddings_context = nn.Embedding(vocab_size, embedding_size)
    def forward(self, target_word, context_word, negative_example):
        emb_target = self.embeddings_target(target_word)
        emb_context = self.embeddings_context(context_word)
        emb_product = torch.mul(emb_target, emb_context)
        emb_product = torch.sum(emb_product, dim=1)
        out = torch.sum(F.logsigmoid(emb_product))
        emb_negative = self.embeddings_context(negative_example)
        emb_product = torch.bmm(emb_negative, emb_target.unsqueeze(2))
        emb_product = torch.sum(emb_product, dim=1)
        out += torch.sum(F.logsigmoid(-emb_product))
        return -out


And my training process:

from opacus import PrivacyEngine

vocabulary_size = len(vocabulary)
num_epoch = 20

batch_size = 500
noise_multiplier = 0.5
max_grad_norm = 10
delta = 8e-5

loss_function = nn.CrossEntropyLoss()
net = Word2Vec(embedding_size=100, vocab_size=vocabulary_size)
optimizer = optim.Adam(net.parameters())

privacy_engine = PrivacyEngine(
    net,
    batch_size=batch_size,
    sample_size=len(context_tuple_list),
    alphas=[1 + x / 10.0 for x in range(1, 100)] + list(range(12, 64)),
    noise_multiplier=noise_multiplier,
    max_grad_norm=max_grad_norm,
    batch_first=True,
    loss_reduction = "sum"
)
privacy_engine.attach(optimizer)

for j in range(0,num_epoch):
    losses = []
    context_tuple_batches = get_batches(context_tuple_list, batch_size)
    for i in range(len(context_tuple_batches)):
        net.zero_grad()
        target_tensor, context_tensor, negative_tensor = context_tuple_batches[i]
        loss = net(target_tensor, context_tensor, negative_tensor)
        loss.backward()
        optimizer.step()
        losses.append(loss.data)
    #print("Loss: ", np.mean(losses))
    epsilon, best_alpha = optimizer.privacy_engine.get_privacy_spent(
            delta)
    print (
            f"Epoch={j} / Loss={np.mean(losses):.4f} / "
            f"Ɛ = {epsilon:.2f}, 𝛿 = {delta:.2f}) for α = {best_alpha:.2f}"
         )

Thanks!

Extend opacus.DPLSTM to support PackedSequences

🚀 Feature

Extend opacus.DPLSTM to work with PackedSequences.

This is a good first issue to contribute, and we would very much welcome a PR!

Motivation

The PackedSequence format allows us to minimize padding in a batch by "zipping" sequences together, and keeping track of the lengths. It is a very commonly-used format for torch.nn.LSTM and it is the only feature that our DPLSTM reimplementation does not yet support.

Pitch

A common problem in NLP (or, more generally, when dealing with sequences of anything) is uneven example lengths within a batch. For example, if you want to batch short sentence and very long long long long sentence together in a single tensor, you need to handle the fact that the first sequence has a length of 2 and the second a length of 6. Matrices (and tensors) can't have uneven rows :)

A common approach to this is simply padding: add a special token, e.g. <pad>, to all sequences as needed to reach the length of the longest sequence in the batch (e.g. short sentence <pad> <pad> <pad> <pad>). This works, but leads to wasted memory and computation (the LSTM still runs through the pads). A better approach is a format such as PackedSequence, which instead stacks and zips the sequences and keeps track of the relative lengths (see Additional context for more info on how this is done). This way, we waste neither memory nor compute. In our earlier example, we would pack the two sequences together like this:

[short, very, sentence, long, long, long, long, sentence] with lens=[2, 6].

So we "zip" them, and then we just keep going once the shorter sequence is finished. Why do we zip them? Think about how the LSTM runs: it goes step by step, but you can batch every step! This way, we start by running on both short and very (first items in each seq), then we continue to sentence and long, and then we can return the result for the first sequence but we keep going for the second sequence (we know how to do this because we have lens to tell us).
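For reference, a minimal sketch of this format using plain PyTorch's torch.nn.utils.rnn.pack_padded_sequence (not Opacus-specific; the token ids are made up):

import torch
from torch.nn.utils.rnn import pack_padded_sequence

# two padded sequences of lengths 6 and 2 (0 is the <pad> index)
padded = torch.tensor([
    [1, 2, 3, 4, 5, 6],   # "very long long long long sentence"
    [7, 8, 0, 0, 0, 0],   # "short sentence" plus padding
])
lengths = [6, 2]

packed = pack_padded_sequence(padded, lengths, batch_first=True)
print(packed.data)         # tensor([1, 7, 2, 8, 3, 4, 5, 6]) -- the "zipped" tokens
print(packed.batch_sizes)  # tensor([2, 2, 1, 1, 1, 1]) -- sequences still active at each step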

Additional context

See more tutorials about PackedSequences:

  1. https://gist.github.com/HarshTrivedi/f4e7293e941b17d19058f6fb90ab0fec
  2. https://discuss.pytorch.org/t/understanding-pack-padded-sequence-and-pad-packed-sequence/4099/6
  3. This StackOverflow answer has a good visual explanation.

CUDA 11.0 support

Feature

Does opacus support CUDA 11.0?
If not, are you planning to support CUDA 11.0 in the future? If so when?

Alternatives

Can one build opacus or install a nightly to get CUDA 11.0 support?

Additional context

Our computation cluster now has nodes with CUDA 11.0, so I need to adjust my project to work with the new CUDA.

I followed the instructions of the admins including installing anything torch related with these instructions.

But when I run my code opacus seems to look for CUDA 10.2 libs:

$ cat log.txt 

------------------------ /usr/share/Modules/modulefiles ------------------------
dot         module-git  module-info modules     null        use.own

------------------------------- /etc/modulefiles -------------------------------
cuda/10.0          cuda/11.0(default) cuda/9.0
cuda/10.1          cuda/8.0           cuda/9.1
cuda/10.2          cuda/8.0-ga2       cuda/9.2
Traceback (most recent call last):
  File "/path/to/storage/code/HFL_PyTorch/src/federated-hierarchical_main.py", line 18, in <module>
    import privacy_engine_xl as dp_xl
  File "/path/to/storage/code/HFL_PyTorch/src/privacy_engine_xl.py", line 2, in <module>
    import opacus
  File "/path/to/storage/miniconda3/envs/HFL_opacus/lib/python3.8/site-packages/opacus/__init__.py", line 6, in <module>
    from .privacy_engine import PrivacyEngine
  File "/path/to/storage/miniconda3/envs/HFL_opacus/lib/python3.8/site-packages/opacus/privacy_engine.py", line 10, in <module>
    import torchcsprng as csprng
  File "/path/to/storage/miniconda3/envs/HFL_opacus/lib/python3.8/site-packages/torchcsprng/__init__.py", line 10, in <module>
    from torchcsprng._C import *
ImportError: libcudart.so.10.2: cannot open shared object file: No such file or directory

Also having issues upgrading to opacus==0.10.0, but this might be unrelated:

ERROR: Could not find a version that satisfies the requirement dataclasses==0.7 (from opacus==0.10.0) (from versions: 0.1, 0.2, 0.3, 0.4, 0.5, 0.6)
ERROR: No matching distribution found for dataclasses==0.7 (from opacus==0.10.0)

DPLSTM Memory Footprint

I was recently running a sample network on the IMDB dataset. The network is defined below:


class SampleNet(nn.Module):
    def __init__(self, vocab_size: int, batch_size):
        super(SampleNet, self).__init__()
        # Embedding dimension: vocab_size + <unk>, <pad>, <eos>, <sos>
        self.emb = nn.Embedding(vocab_size + 4, 100)
        self.h_init = torch.randn(1, batch_size, 100).to(device)
        self.c_init = torch.randn(1, batch_size, 100).to(device)
        self.hidden = (self.h_init, self.c_init)
        self.lstm = DPLSTM(100, 100, batch_first = True)
        self.pool = nn.AvgPool1d(256)
        self.fc1 = nn.Linear(100, 2)

    def forward(self, x):
        x = self.emb(x)
        x, _ = self.lstm(x, self.hidden)
        x = x.transpose(1,2)
        x = self.pool(x).squeeze()

        x = self.fc1(x)
        return x

Some notes on the code: vocab_size is 10_000 and the batch size is 100. When I run this code, torch gives me an out-of-memory error:

RuntimeError: CUDA out of memory. Tried to allocate 95.41 GiB (GPU 0; 3.95 GiB total capacity; 2.18 GiB already allocated; 369.06 MiB free; 2.21 GiB reserved in total by PyTorch)

Note that this error does not occur when I use the regular torch.nn.LSTM. I'm curious why the memory footprint is so much larger for the DPLSTM implementation; could it be because I am doing something wrong?

EDIT: This only happens in the backward pass.

BERT LayerNorm isn't supported

Hello, I want to use Opacus to fine-tune a BERT model, but it reports that Opacus does not support this LayerNorm module.

class BertLayerNorm(nn.Module):
    def __init__(self, hidden_size, eps=1e-12):
        """Construct a layernorm module in the TF style (epsilon inside the square root).
        """
        super(BertLayerNorm, self).__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.bias = nn.Parameter(torch.zeros(hidden_size))
        self.variance_epsilon = eps

    def forward(self, x):
        u = x.mean(-1, keepdim=True)
        s = (x - u).pow(2).mean(-1, keepdim=True)
        x = (x - u) / torch.sqrt(s + self.variance_epsilon)
        return self.weight * x + self.bias

May I ask what causes this, and is there any way to solve it?
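One possible workaround (a sketch only, assuming Opacus has a grad sampler for torch.nn.LayerNorm, which computes the same normalization as the custom BertLayerNorm above) is to swap the custom module for the built-in one before attaching the privacy engine; replace_bert_layernorm is a hypothetical helper:

import torch.nn as nn

def replace_bert_layernorm(model: nn.Module) -> None:
    """Recursively swap BertLayerNorm modules for torch.nn.LayerNorm."""
    for name, child in model.named_children():
        if isinstance(child, BertLayerNorm):  # BertLayerNorm as defined above
            ln = nn.LayerNorm(child.weight.shape[0], eps=child.variance_epsilon)
            ln.weight.data.copy_(child.weight.data)
            ln.bias.data.copy_(child.bias.data)
            setattr(model, name, ln)
        else:
            replace_bert_layernorm(child)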

Pytorch-dp issue: Train differentially private LSTM for text generation?

I would like to integrate pytorch-dp in an LSTM to generate differentially private synthetic text. Without DP, I was using word-level language modeling RNN (with multi-layer LSTM) from the pytorch/examples repository (https://github.com/pytorch/examples/tree/master/word_language_model).

I tried a few things to integrate pytorch-dp, but ran into compatibility issues of privacy engine with some of the modules and would like to know whether (and if so, how) the current version could be used in my situation at all.

If there is a chance that pytorch-dp could be used in this case, this is what I tried so far:

  1. I first followed the readme and created the Privacy Engine, attaching it to an optimizer and the current model. This resulted in an error:
    “torchdp.dp_model_inspector.IncompatibleModuleException: Model contains incompatible modules. Some modules are not valid.: ['Main.encoder', 'Main.rnn’]”
    My understanding is that pytorch-dp does not support pytorch RNN and specifically LSTM layers and does not work with the encoder module.

  2. You provide a DPLSTM class which theoretically seems like an alternative to the pytorch LSTM model, but seems to be missing functions and properties that are necessary to integrate it in the text generator (e.g. encoder and decoder, but potentially also the possibility of using a multi-layer LSTM or dropout).
    Disregarding the usefulness of the model for the task, using DPLSTM as model still resulted in an issue, but this time the invalid module was just ‘Main’ in general, which I was not sure how to interpret.

I would love to integrate pytorch-dp in my project, but I am aware that it is still in development and the examples seem to be focussed on classification rather than generative models.

It would be great if you could let me know whether the issue is with pytorch-dp or my understanding of it.

Thanks in advance!

Why is the sensitivity C?

I read Abadi's paper and this code, but I can't understand why the sensitivity is C.
Since the maximum distance between clipped gradients is 2C for two datasets differing in one record, I think the sensitivity should be 2C.
I agree that C is enough for the sensitivity, but I think 2C is the stricter and more appropriate value.
Thanks.
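For context, the factor usually comes down to the choice of neighboring relation (a sketch of the standard argument, assuming the add/remove-one adjacency used in Abadi et al.; not an official answer from the maintainers). If D and D′ differ by the presence of a single example j, the clipped per-sample gradient sums differ by exactly one clipped term:

$$\Bigl\|\sum_{i \in D} \bar{g}_i - \sum_{i \in D'} \bar{g}_i\Bigr\|_2 = \|\bar{g}_j\|_2 \le C,$$

since every clipped gradient satisfies $\|\bar{g}_i\|_2 \le C$. Under the replace-one adjacency, the sums differ by two clipped gradients and the bound would indeed be 2C.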

Minor typo

python mnist.ru ...
should be replaced by
python mnist.py ...
in examples/mnist_README.md

pytorch running error

Hello, when I ran "python mnist.py --device=cpu -n=15 --lr=.25 --sigma=1.3 -c=1.5 -b=250", I got the following problem:

TypeError: normal() received an invalid combination of arguments - got (int, Tensor, torch.Size, device=torch.device, generator=torch._C.Generator), but expected one of:

  • (Tensor mean, Tensor std, torch.Generator generator, Tensor out)
  • (Tensor mean, float std, torch.Generator generator, Tensor out)
  • (float mean, Tensor std, torch.Generator generator, Tensor out)

Thanks for your answer!

Proposal to handle Wasserstein loss (multiple loss.backward()) in pytorch-dp

The current implementation of pytorch-dp does not support Wasserstein loss in GANs (it does not support multiple loss.backward() calls).

issue

We are working on integrating pytorch-dp into a GAN model to generate differentially private synthetic data. Currently, pytorch-dp only supports a single loss.backward() before calling optimizer.step(); this does not work for Wasserstein loss in a GAN.

why important

Wasserstein loss with gradient penalty has been shown to help alleviate the mode-collapse issues introduced by KL divergence, and it is used by many different GAN variants.

possible solutions

One temporary workaround is to update _create_or_extend_grad_sample() in supported_layers_grad_samplers.py. When doing multiple loss.backward() calls and not in virtual-step mode, instead of doing torch.cat((param.grad_sample, grad_sample), batch_dim), make it an accumulating sum such as param.grad_sample = param.grad_sample + grad_sample.

Current implementation:

def _create_or_extend_grad_sample(
    param: torch.Tensor, grad_sample: torch.Tensor, batch_dim: int
) -> None:
    """
    Create a 'grad_sample' attribute in the given parameter, or append to it
    if the 'grad_sample' attribute already exists.
    """

    if hasattr(param, "grad_sample"):
        # pyre-fixme[16]: `Tensor` has no attribute `grad_sample`.
        param.grad_sample = torch.cat((param.grad_sample, grad_sample), batch_dim)
    else:
        param.grad_sample = grad_sample

Suggested implementation when not in virtual-step mode:

def _create_or_extend_grad_sample(
    param: torch.Tensor, grad_sample: torch.Tensor, batch_dim: int
) -> None:
    """
    Create a 'grad_sample' attribute in the given parameter, or append to it
    if the 'grad_sample' attribute already exists.
    """

    if hasattr(param, "grad_sample"):
        # pyre-fixme[16]: `Tensor` has no attribute `grad_sample`.
        param.grad_sample = param.grad_sample + grad_sample
    else:
        param.grad_sample = grad_sample

ModuleNotFoundError: No module named 'ocapus' after successful pip install

Issue

Can't seem to import ocapus in my IDE.

import ocapus
Traceback (most recent call last):

  File "<ipython-input-1-96ce97d71d0c>", line 1, in <module>
    import ocapus

ModuleNotFoundError: No module named 'ocapus'

I think I've pip-installed it correctly, as I got these messages after installing:

Installing collected packages: opacus
Successfully installed opacus-0.10.1

I'm running python 3.6.12 and pytorch 1.7 in my environment.

python                    3.6.12               h5500b2f_2
pytorch                   1.7.0           py3.6_cuda102_cudnn7_0    pytorch
opacus                    0.10.1                   pypi_0    pypi

Any ideas why I can't get ocapus going?

Microbatch Support

Clipping the gradient for each instance brings higher accuracy, but it is very slow. Meanwhile, layer.weight.grad_sample consumes too much memory, which forces me to use a small model and a small batch size. Could you support gradient clipping after averaging micro-batches, like in tensorflow-privacy?

Laplace and exponential noise distributions

Feature

Allowing other distributions for the noise created by the privacy_engine.
Especially Laplace and exponential.

The current implementation of the privacy_engine allows only the use of the normal distribution to create noise.

Alternatives

The privacy_engine uses torch.normal, but there are no such functions as torch.laplace or torch.exponential.
This is why I would suggest using the torch.distributions distributions.
This also appears to be the more current approach, according to this comment.

However I assume the reason you still use torch.normal is because torch.distributions.normal.Normal does not allow a custom generator and therefore does not support urandom.
So maybe this also requires an adjustment of the torch.distributions distributions?

Additional context

I created a sample implementation of a privacy_engine_xl with a modified _generate_noise method for our use case.
However I'd prefer the use of urandom and also a way to pass a device to the distributions.
It seems like Normal gets the device from the loc (aka mean) param if it is a Tensor, however that seems more like a hack than a solution, as the mean used in the privacy_engine is 0.

def _generate_noise(self, max_norm, parameter):
    if self.noise_multiplier > 0:
        mean = 0
        scale = self.noise_multiplier * max_norm

        if self.noise_type == "gaussian":
            dist = torch.distributions.normal.Normal(mean, scale)
        elif self.noise_type == "laplacian":
            dist = torch.distributions.laplace.Laplace(mean, scale)
        elif self.noise_type == "exponential":
            dist = torch.distributions.exponential.Exponential(1 / scale)
        else:
            dist = torch.distributions.normal.Normal(mean, scale)

        noise = dist.sample(parameter.grad.shape)

        return noise
    return 0.0

I'd be happy to submit a pull request later on; however, I don't think this can be the final solution.

Documentation:
https://pytorch.org/docs/master/distributions.html#normal
https://pytorch.org/docs/master/distributions.html#laplace
https://pytorch.org/docs/master/distributions.html#exponential

Problem with zero_grad() and virtual_step() in the tutorials

In the CIFAR tutorial, zero_grad() is called on the optimizer before every step, even for a virtual step:

optimizer.zero_grad()

I think that zero_grad() should only be called after a real step and not after a virtual step: if it is called every time it will erase the accumulated gradients. The same issue was raised (and solved) for another tutorial: #54.

This problem also appears in the text classification notebook:

" model.zero_grad()\n",
.

Example of Privacy Leak on Image datasets

Hello Team Opacus, I would like to understand whether there are any examples anywhere that demonstrate how training with DP could mitigate model inversion attacks, membership inference, and other privacy attacks. I want this for RGB image classifiers. Thank you.

Problem downloading opacus

Feature

I tried to install opacus, but there is a problem that I can't solve. Here is the error:

ERROR: Could not find a version that satisfies the requirement torch==1.6.0 (from opacus) (from versions: 0.1.2, 0.1.2.post1, 0.1.2.post2)
ERROR: No matching distribution found for torch==1.6.0 (from opacus)
I have already installed torch 1.6.0. Is the version mismatched? What versions of torch and torchvision should I change to?

backward hook function

Hi,

In autograd_grad_sample, you modify the arguments given to the hook in-place.

I think this should be avoided as it is related to a known bug in PyTorch and may cause issues.
Is there a reason behind in-place modification instead of returning new values from the hook?

Thanks

MNIST example transform

Hi,

In the mnist example, why do you use the following transform and what's the reason behind those numbers:

transforms.Normalize((0.1307,), (0.3081,))

Thank you
