Current implementation of Pytorch-dp does not support Wasserstein Loss in GAN (not su

Proposal to handle Wasserstein loss (multiple loss.backward()) in pytorch-dp, about pytorch/opacus

Comments (29)

wibrown commented on June 30, 2024 2

Yeah, that makes sense. I can't share the full repo at the moment, but here's a minimal WGAN implementation that overrides the accumulator with summation, configured to run on Gaussian blobs.
main.py.txt


Nanboy-Ronan commented on June 30, 2024 1

I have also encountered the same error. I suppose it is a similar problem; I'd appreciate any advice.

File "/usr/local/lib/python3.7/dist-packages/opacus/optimizers/optimizer.py", line 199, in _get_flat_grad_sample
    "Per sample gradient is not initialized. Not updated in backward pass?"
ValueError: Per sample gradient is not initialized. Not updated in backward pass?


karthikprasad commented on June 30, 2024 1

Hi! Thanks everyone for reaching out, and sincerest apologies for the (frankly, inexcusable) delay in response. This issue is now on my radar and I will take a look at it this week.


Darktex commented on June 30, 2024

I have been musing about making .grad_sample accumulate for a while, as it would also simplify our API (we could remove virtual_step which isn't that aligned with PyTorch...).

I'm in the process of making a large refactor of this codebase to modularize and simplify it. I'll add this to the list and make sure to test it. Do you have a script you can share?

In the meantime, I'm not a GAN expert but I was wondering if it might be possible to express this as a single loss. Loss functions are nn.Modules so you can keep summing them, using one into another etc. This repo seems to be able to do W-GANs with a single backprop. Maybe this unblocks you while I work on this?


AprilXiaoyanLiu commented on June 30, 2024

I have been musing about making .grad_sample accumulate for a while, as it would also simplify our API (we could remove virtual_step which isn't that aligned with PyTorch...).

I'm in the process of making a large refactor of this codebase to modularize and simplify it. I'll add this to the list and make sure to test it. Do you have a script you can share?

In the meantime, I'm not a GAN expert but I was wondering if it might be possible to express this as a single loss. Loss functions are nn.Modules so you can keep summing them, using one into another etc. This repo seems to be able to do W-GANs with a single backprop. Maybe this unblocks you while I work on this?

Hi Darktex, thanks for your quick response. I agree that the two terms can be combined into a single loss, since nn.Modules compose, for example loss_D = -torch.mean(discriminator(real_imgs)) + torch.mean(discriminator(fake_imgs)). This works in plain PyTorch but not in pytorch-dp, because pytorch-dp uses hooks to track grad_sample per layer. When you combine two losses, it tracks the gradient for each loss term, and with torch.cat() it keeps both sets of per-sample gradients side by side and then does the norm clipping, which is not correct. We found a short-term workaround, which is to replace torch.cat() with a sum as shown earlier. A long-term solution would be beneficial.
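
To make the difference concrete, here is a minimal pure-Python sketch of the two orders of operation for one layer. It is not pytorch-dp code: the gradient values and the helpers `l2_norm` and `clip` are made up for illustration, and it assumes that sample i's real and fake terms belong to the same per-sample gradient, as argued above.

```python
import math

def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))

def clip(v, max_norm):
    """Scale v down so that its L2 norm is at most max_norm."""
    norm = l2_norm(v)
    factor = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [x * factor for x in v]

# Hypothetical per-sample gradients of one layer for a batch of 2 samples.
grad_real = [[3.0, 4.0], [0.3, 0.4]]   # from the real-image term
grad_fake = [[-3.0, 0.0], [0.0, 0.6]]  # from the fake-image term
max_norm = 1.0
batch = len(grad_real)

# "sum": combine both terms per sample first, then clip B rows.
summed = [[a + b for a, b in zip(r, f)] for r, f in zip(grad_real, grad_fake)]
clipped_sum = [clip(g, max_norm) for g in summed]

# "cat": treat the terms as 2B independent rows and clip each row.
clipped_cat = [clip(g, max_norm) for g in grad_real + grad_fake]

# Adding the two clipped halves back together per sample can exceed max_norm.
recombined = [
    [a + b for a, b in zip(clipped_cat[i], clipped_cat[i + batch])]
    for i in range(batch)
]

print([round(l2_norm(g), 3) for g in clipped_sum])
print([round(l2_norm(g), 3) for g in recombined])
```

In this toy case every row of `clipped_sum` stays within the bound, while the second row of `recombined` does not, which is one way to see why clipping the concatenated rows is not equivalent to clipping the combined per-sample gradient.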


Darktex commented on June 30, 2024

Got it. Can I use that repo I linked as a representative example for this so I can try it?


AprilXiaoyanLiu commented on June 30, 2024

Yup! I will also come up with a unit test to show some results.


wibrown commented on June 30, 2024

Hi Darktex,
I'm also working on a project involving DP WGANs, and ran into the same issue as above when using torchdp. With no changes to torchdp, our code would fail on optimizer.step() due to a shape mismatch for the gradient. We implemented the change above (cat -> sum) in _create_or_extend_grad_sample, which allows our code to run, but we're now seeing very erratic behavior in training. Our training loop works well when the privacy engine is not attached and results in reasonably high quality samples. When attaching the engine, gradient norms and losses increase quite rapidly, even when using a large clipping bound (1) and almost no noise (which should have negligible effect on gradient quality). We are zeroing gradients before each step. Do you have an idea what might be causing the issue? And do you have any rough estimates for the timeline of implementing the refactor? Thanks!


AprilXiaoyanLiu commented on June 30, 2024

Hi Darktex,
I'm also working on a project involving DP WGANs, and ran into the same issue as above when using torchdp. With no changes to torchdp, our code would fail on optimizer.step() due to a shape mismatch for the gradient. We implemented the change above (cat -> sum) in _create_or_extend_grad_sample, which allows our code to run, but we're now seeing very erratic behavior in training. Our training loop works well when the privacy engine is not attached and results in reasonably high quality samples. When attaching the engine, gradient norms and losses increase quite rapidly, even when using a large clipping bound (1) and almost no noise (which should have negligible effect on gradient quality). We are zeroing gradients before each step. Do you have an idea what might be causing the issue? And do you have any rough estimates for the timeline of implementing the refactor? Thanks!

Hi wibrown, can I ask whether you implement the loss in combined form, like "loss_D = -torch.mean(discriminator(real_imgs)) + torch.mean(discriminator(fake_imgs))", or as separate losses? What I found is that when I implement the losses separately and change torch.cat to sum, it works fine. As a side note, this will not work if you use the Wasserstein gradient penalty. If you have a gradient penalty, you need to remove it.


wibrown commented on June 30, 2024

Our loss function is implemented just like that and we change torch.cat to sum, exactly as in the original post. We're using parameter clipping after each step, rather than gradient penalty. Losses grow rapidly and gradient norms fluctuate drastically, as shown in attached pic. We're using 1 for our max gradient norm and 1e-5 for the noise multiplier. Given the gradient norms we see, I wonder if something is going wrong in the gradient clipping process.
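
For reference, a standard DP-SGD step clips each per-sample gradient to the maximum norm and adds Gaussian noise with standard deviation noise_multiplier * max_grad_norm to the summed result. The following pure-Python sketch uses the two values quoted above; the gradient vectors and the helper `clip_factor` are made up for illustration, and this is not the Opacus implementation.

```python
import math
import random

max_grad_norm = 1.0
noise_multiplier = 1e-5

def clip_factor(grad, max_norm):
    """Per-sample scaling factor min(1, C / ||g||), with a small epsilon."""
    norm = math.sqrt(sum(x * x for x in grad))
    return min(1.0, max_norm / (norm + 1e-6))

# Hypothetical per-sample gradients, two of them far above the bound.
per_sample_grads = [[30.0, 40.0], [0.1, 0.2], [-6.0, 8.0]]

clipped = [
    [x * clip_factor(g, max_grad_norm) for x in g] for g in per_sample_grads
]
clipped_norms = [math.sqrt(sum(x * x for x in g)) for g in clipped]

# Noise is added once to the sum of clipped gradients, then averaged.
noise_std = noise_multiplier * max_grad_norm
rng = random.Random(0)
summed = [sum(g[j] for g in clipped) for j in range(2)]
noisy = [s + rng.gauss(0.0, noise_std) for s in summed]
averaged = [x / len(per_sample_grads) for x in noisy]
averaged_norm = math.sqrt(sum(x * x for x in averaged))
```

Under these assumptions, no clipped per-sample norm exceeds 1 and the averaged gradient norm stays at or below roughly 1, so the noise at this scale should indeed be negligible, and rapidly growing norms would have to come from somewhere else in the pipeline.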


wibrown commented on June 30, 2024

It seems like the issue is still present in the recent opacus release (dimension mismatch errors on Wasserstein loss when not modifying the code as described above). Is this on the roadmap for being addressed? Thanks!


Darktex commented on June 30, 2024

Yes! We were busy with this transition and focused on documentation and stability of what we had before adding new features :) With the launch past us, we will now go back to adding features!

@wibrown do you have a link to your repo? This way I can run your code and start from where you are. In particular, I'd like to learn more about this part:

We're using parameter clipping after each step, rather than gradient penalty.


Darktex commented on June 30, 2024

I spent some more time on this one and I wanted to offer an update. The issue here is that per-sample gradients, maybe unlike "normal" gradients, can be accumulated in two different ways: either by summing or by concatenating. What do I do with two batches, each of size (B, H, D)? I could reduce them to one tensor of size (B, H, D) or I could concat them into one tensor of size (2B, H, D). You normally don't see this problem in PyTorch, because the batch information is already gone so it's clear that you want to always reduce subsequent batches.
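
The two accumulation modes can be illustrated with nested lists standing in for tensors. This is only a shape-level sketch: the helper names `accumulate_sum`, `accumulate_cat`, and `shape` are hypothetical, and the H dimension is dropped for brevity.

```python
B, D = 3, 2  # batch size and parameter dimension

# Two batches of per-sample gradients, each of shape (B, D).
first = [[1.0] * D for _ in range(B)]
second = [[2.0] * D for _ in range(B)]

def accumulate_sum(a, b):
    """Reduce: row i of the result is a[i] + b[i]; shape stays (B, D)."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def accumulate_cat(a, b):
    """Concatenate along the batch dimension; shape becomes (2B, D)."""
    return a + b

def shape(t):
    return (len(t), len(t[0]))

print(shape(accumulate_sum(first, second)))  # (3, 2)
print(shape(accumulate_cat(first, second)))  # (6, 2)
```

Summing is the natural choice when both backward passes concern the same B samples; concatenating is the natural choice when the second pass brings B new samples, as with virtual batches.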

To be honest, supporting both is something that we had not originally anticipated in our design. We are in the process of doing a major refactoring of the various components to make everything more modular and easier for research (I linked the PR to the first of these changes, which will streamline per-sample gradient computation). I will share a design document for the new PrivacyEngine API, and I will make sure this scenario is well supported :)


AprilXiaoyanLiu commented on June 30, 2024

I think we can support both (choose concatenate or sum grads) by adding another parameter to have the option?


AprilXiaoyanLiu commented on June 30, 2024

Our loss function is implemented just like that and we change torch.cat to sum, exactly as in the original post. We're using parameter clipping after each step, rather than gradient penalty. Losses grow rapidly and gradient norms fluctuate drastically, as shown in attached pic. We're using 1 for our max gradient norm and 1e-5 for the noise multiplier. Given the gradient norms we see, I wonder if something is going wrong in the gradient clipping process.

Looks like you've already solved this. What works for me: in addition to summing the gradients, I manually clear grad_sample after each epoch with

    for p in discriminator.parameters():
        if hasattr(p, "grad_sample"):
            del p.grad_sample


greatjeffzhang commented on June 30, 2024

I want to ask whether this issue is solved or not. I plan to do some work on WGAN-GP, thanks.


XiangQiu42 commented on June 30, 2024

I want to ask whether this issue is solved or not. I plan to do some work on WGAN-GP, thanks.

I have the same question. I have tried WGAN-GP with opacus, but it doesn't work.


xierongpytorch commented on June 30, 2024

I want to add opacus to my model, but when I start the second round of training, the following error occurs: (there is no problem with only one training session)

loss.backward()

File "/mnt/DataDisk/conda/envs/syft/lib/python3.7/site-packages/torch/tensor.py", line 245, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/mnt/DataDisk/conda/envs/syft/lib/python3.7/site-packages/torch/autograd/__init__.py", line 147, in backward
    allow_unreachable=True, accumulate_grad=True) # allow_unreachable flag
File "/mnt/DataDisk/xierong/opacus/opacus/grad_sample/grad_sample_module.py", line 199, in capture_backprops_hook
    module, backprops, loss_reduction, batch_first
File "/mnt/DataDisk/xierong/opacus/opacus/grad_sample/grad_sample_module.py", line 237, in rearrange_grad_samples
    A = module.activations.pop()
IndexError: pop from empty list

I have no idea, can you help give some advice?


Flyige commented on June 30, 2024

OMG, please fix this tool for SeqGAN. I'm facing the same issue. I'm using this for my paper, but now I have to switch to TensorFlow.


Flyige commented on June 30, 2024

@Nanboy-Ronan
Dude, I'm facing the same error as you. Have you solved it yet?


karthikprasad commented on June 30, 2024

Alright folks! I took a look at this and I (think and hope I) bring good news. I have used the code from this repo to debug the issue.

For starters, the early discussion on this issue, while useful, is no longer relevant as we no longer have the concept of virtual_steps in opacus 1.* the way we did in pytorch-dp.

That said, the main problem of opacus disallowing multiple forward/backward passes without a call to optimizer.step() still exists, and is primarily a requirement for privacy accounting with Poisson sampling. See this line of code.
Technically, this restriction shouldn't apply to the GAN use case; the two forward/backward passes in this case are not from the same dataset and hence multiple forward/backward passes can be allowed. Practically, however, opacus doesn't know if a module is part of a GAN and if the data input is fake or real, and therefore assumes everything is from the same source and blocks the user from accidentally messing up.

To overcome this issue, the simplest approach would be to set poisson_sampling=False in the call to make_private(), and the code will run without errors. This is not a solution though, it is just a work-around. As mentioned here,

[disabling poisson sampling] doesn't fit the assumptions made by privacy accounting mechanism, but it can be a good approximation when using Poisson sampling is unfeasible.
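
As background on what that flag trades off: with Poisson sampling, as usually defined for DP-SGD accounting, each example is included in a step independently with probability sample_rate, so batch sizes vary from step to step, whereas ordinary shuffled batching yields fixed-size batches. The sketch below illustrates only this difference in plain Python; `poisson_batches` and `fixed_batches` are hypothetical helpers, not Opacus functions.

```python
import random

def poisson_batches(n, sample_rate, steps, seed=0):
    """Each step includes every index independently with prob. sample_rate."""
    rng = random.Random(seed)
    return [
        [i for i in range(n) if rng.random() < sample_rate]
        for _ in range(steps)
    ]

def fixed_batches(n, batch_size, seed=0):
    """Shuffle once, then cut into consecutive fixed-size batches."""
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    return [indices[i:i + batch_size] for i in range(0, n, batch_size)]

n, batch_size = 1000, 100
poisson = poisson_batches(n, batch_size / n, steps=10)
fixed = fixed_batches(n, batch_size)

poisson_sizes = [len(b) for b in poisson]  # varies around 100
fixed_sizes = [len(b) for b in fixed]      # exactly 100 each

print(poisson_sizes)
print(fixed_sizes)
```

The accounting analysis assumes the first scheme, which is why switching to the second is described as an approximation rather than a fix.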

The solution would be to simply disable privacy tracking and per-sample gradient computation when it is not necessary; as discussed in this post, we don't have to care about privacy of the fake dataset.

This colab implements the above suggestion and I have indicated my changes in comments. Hope this helps.


Flyige commented on June 30, 2024

Hi! First, thanks for your response. Sadly, I'm still facing this issue in my code, which is based on https://github.com/ZiJianZhao/SeqGAN-PyTorch. I did what you suggested, but it still doesn't work. It seems I also need to rewrite the loss function to fit the input. To match the sizes in the loss calculation, I packaged the loss in my own module like this:

    class My_loss(nn.Module):
        def __init__(self, hidden_dim, vocabsize):
            super(My_loss, self).__init__()
            self.myloss = nn.NLLLoss(reduction='sum')
            self.softmax = nn.LogSoftmax()
            self.hidden_dim = hidden_dim
            self.lin = nn.Linear(hidden_dim, vocabsize)

        def forward(self, input, target):
            target = target.contiguous().view(-1)
            print(target.shape)
            pred = self.softmax(self.lin(input.contiguous().view(-1, self.hidden_dim)))
            loss = self.myloss(pred, target)
            return loss

Therefore my train_epoch part looks like this:

    def train_epoch(model, dataloader, criterion, optimizer):
        total_loss = 0.
        total_words = 0.
        for data, target in dataloader:  # tqdm(
            # data_iter, mininterval=2, desc=' - Training', leave=False):
            data = Variable(data)
            target = Variable(target)
            if opt.cuda:
                data, target = data.cuda(), target.cuda()
            pred, hidden_dim = model.forward(data)
            print('data', data.shape)
            print('inheretaregt', target.shape)
            loss = criterion(pred, target)
            total_loss += loss.item()
            total_words += data.size(0) * data.size(1)
            # data = data.contiguous().view(-1)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            # print(optimizer.state_dict()['param_groups'][0]['lr'])
        # dataloader.reset()
        return math.exp(total_loss / total_words)

Because the original author of this PyTorch version of SeqGAN doesn't provide a dataloader, I also rewrote his data_iter.py to wrap the data as a torch Dataset, so that I can pass the resulting loader to the make_private function.
    # -*- coding:utf-8 -*-
    class MyGenDataIter(Dataset):
        """ Toy data iter to load digits"""
        def __init__(self, data_file, batch_size):
            super(MyGenDataIter, self).__init__()
            self.batch_size = batch_size
            self.data_lis = self.read_file(data_file)
            self.data_num = len(self.data_lis)
            self.indices = range(self.data_num)
            self.num_batches = int(math.ceil(float(self.data_num) / self.batch_size))
            self.idx = 0

        def __len__(self):
            # print(self.data_lis)
            return len(self.data_lis)

        def __iter__(self):
            return self

        def __next__(self):
            return self.next()

        def reset(self):
            self.idx = 0
            random.shuffle(self.data_lis)

        def __getitem__(self, item):
            d = [self.data_lis[item]]
            d = torch.LongTensor(np.asarray(d, dtype='int64'))
            data = torch.cat((torch.zeros(1, 1).long(), d), dim=1)
            target = torch.cat((d, (torch.zeros(1, 1).long())), dim=1)
            # data=data.tolist()[0]
            # target=target.tolist()[0]
            data = data.ravel()
            target = target.ravel()
            return data, target

        # def next(self):
        #     if self.idx >= self.data_num:
        #         raise StopIteration
        #     index = self.indices[self.idx:self.idx + self.batch_size]
        #     d = [self.data_lis[i] for i in index]
        #     d = torch.LongTensor(np.asarray(d, dtype='int64'))
        #     data = torch.cat([torch.zeros(self.batch_size, 1).long(), d], dim=1)
        #     target = torch.cat([d, torch.zeros(self.batch_size, 1).long()], dim=1)
        #     self.idx += self.batch_size
        #     return data, target

        def read_file(self, data_file):
            with open(data_file, 'r') as f:
                lines = f.readlines()
            lis = []
            for line in lines:
                l = line.strip().split(' ')
                l = [int(s) for s in l]
                lis.append(l)
            return lis

Therefore my make_private call looks like this:

    generator, dpgen_optimizer, gen_data_iter_loader = privacy_engine.make_private(
        module=generator,
        optimizer=gen_optimizer,
        poisson_sampling=False,
        noise_multiplier=1.1,
        data_loader=gen_data_iter_loader,
        max_grad_norm=1.0,
    )

And of course I switched the LSTM to DPLSTM as the docs require.
I'm sorry, I'm a new learner of PyTorch, so maybe I made some mistake in my code (or it's not my mistake, hahaha), but this issue has bothered me a lot. I'd appreciate it if you could check this problem. As you can see, I have tried to satisfy the function's requirements, but the error is still the same.
Hope you can give me some help with my problem. Thanks!


karthikprasad commented on June 30, 2024

Hi @Flyige, the github repo you've shared doesn't seem to use opacus and the code you've pasted above is hard to follow due to formatting issues. Could you share your code (with the privacy engine attached) that throws the error in a colab?


Flyige commented on June 30, 2024

Hi! Thank you again for your quick response. Unfortunately I can't access the Colab website from my location (due to China's policy), lol. All I can tell you is that my code is from this link: https://github.com/ZiJianZhao/SeqGAN-PyTorch, and I'm trying to use the opacus library to make the generator of this SeqGAN private. I'm also fairly new to GitHub, so I apologize for the trouble you had reading my comments. I have already tried my best to show you how I got to the error I mentioned before.


karthikprasad commented on June 30, 2024

I see. In that case, could you share your code WITH your changes in a GitHub gist?


Flyige commented on June 30, 2024

Hi. To be more specific about the details, I've already sent you an e-mail. Thank you for your time and response.


karthikprasad commented on June 30, 2024

Hi! I haven't received any email. Could you share the gist here? Thanks.


Flyige commented on June 30, 2024

Hi! I have created a gist here: https://gist.github.com/Flyige/56d11bcbe8c1a6499a39b282b0cb3a68. This is my first time using it, so it took a while to figure out. The e-mail I sent you went to a wrong address; I have already sent you a new one. Apologies for that.


karthikprasad commented on June 30, 2024

Hi Flyige! Your gist has multiple patches on top of a different library, and I found it tedious to reconstruct a single reproducible code unit. :(
Could you share a gist of the code that I can simply copy-paste and run to see the error?

Also, since this issue is an old one pertaining to the earliest version of opacus (when it was still called pytorch-dp), I am closing it out to avoid any confusion. Please feel free to open a new issue once you have your gist ready. Thanks.
