jhjacobsen / invertible-resnet
Official Code for Invertible Residual Networks
License: MIT License
flake8 testing of https://github.com/jhjacobsen/invertible-resnet on Python 3.7.1
$ flake8 . --count --select=E9,F63,F72,F82 --show-source --statistics
./CIFAR_main.py:311:102: F821 undefined name 'full_fname'
interpolate(model, testloader, testset, start_epoch, use_cuda, best_objective, args.dataset, full_fname)
^
./models/utils_toy_densities.py:250:19: F821 undefined name 'model'
clip_fc_layer(model, coeff, use_cuda)
^
./models/utils_toy_densities.py:250:26: F821 undefined name 'coeff'
clip_fc_layer(model, coeff, use_cuda)
^
./models/utils_toy_densities.py:250:33: F821 undefined name 'use_cuda'
clip_fc_layer(model, coeff, use_cuda)
^
./models/utils_toy_densities.py:257:41: F821 undefined name 'model'
out_bij, p_z_g_y, trace, gt_trace = model(inputs)
^
./models/utils_toy_densities.py:258:31: F821 undefined name 'model'
log_det = compute_log_det(model, inputs, out_bij)
^
./models/utils_toy_densities.py:277:41: F821 undefined name 'model'
out_bij, p_z_g_y, trace, gt_trace = model(inputs)
^
./models/utils_toy_densities.py:278:31: F821 undefined name 'model'
log_det = compute_log_det(model, inputs, out_bij)
^
./models/model_utils.py:226:76: F821 undefined name 'num_units'
'multiple of group_size({})'.format(num_channels, num_units))
^
./models/invertible_layers.py:181:26: F821 undefined name 'Conv2dZeroInit'
self.conv_zero = Conv2dZeroInit(c // 2, c, 3, padding=(3 - 1) // 2)
^
./models/invertible_layers.py:187:16: F821 undefined name 'gaussian_diag'
return gaussian_diag(mean, logs)
^
./models/invertible_layers.py:215:21: F821 undefined name 'NN_actnorm'
self.NN = NN_actnorm(H, W, in_channels=num_features // 2, hidden_channels=width)
^
./models/invertible_layers.py:217:21: F821 undefined name 'NN_layernorm'
self.NN = NN_layernorm(H, W, in_channels=num_features // 2, hidden_channels=width)
^
./models/invertible_layers.py:219:21: F821 undefined name 'NN_batchnorm'
self.NN = NN_batchnorm(H, W, in_channels=num_features // 2, hidden_channels=width)
^
./models/invertible_layers.py:237:21: F821 undefined name 'NN_actnorm'
self.NN = NN_actnorm(H, W, in_channels=num_features // 2, hidden_channels=width, channels_out=num_features)
^
./models/invertible_layers.py:239:21: F821 undefined name 'NN_layernorm'
self.NN = NN_layernorm(H, W, in_channels=num_features // 2, hidden_channels=width, channels_out=num_features)
^
./models/invertible_layers.py:241:21: F821 undefined name 'NN_batchnorm'
self.NN = NN_batchnorm(H, W, in_channels=num_features // 2, hidden_channels=width, channels_out=num_features)
^
./models/invertible_layers.py:250:22: F821 undefined name 'flatten_sum'
objective += flatten_sum(torch.log(scale))
^
./models/invertible_layers.py:261:22: F821 undefined name 'flatten_sum'
objective -= flatten_sum(torch.log(scale))
^
19 F821 undefined name 'full_fname'
19
E901, E999, F821, F822, and F823 are the "showstopper" flake8 issues that can halt the runtime with a SyntaxError, NameError, etc. These five are different from most other flake8 issues, which are merely "style violations": useful for readability, but they do not affect runtime safety.
I tried to run your program using the command script and ran into the following issue.
You build your model based on this argument at several points. However, after carefully checking your code, I found that the argument is merely passed along several times but never actually used in the model.
Hi,
I'm trying to reproduce the phenomenon in Figure 1, but I'm a bit confused. As you demonstrate, the networks in Fig. 1 map the interval [-2, 2] to a noisy x^3. Since [-2, 2] -> x^3 is only one-dimensional while the ResNets take 3-dimensional input, I wonder how that interval is mapped. And if the dynamics shown are the outputs of the residual blocks, those outputs have different sizes due to downsampling. In brief, my question is how you performed that mapping operation.
Many thanks,
Z. L
Could you please provide the code for running the experiments on the toy (2D) datasets? I'm not sure if it's in the repo.
I used your command script to run a classification model and hit these two issues.
Before the model has been trained, I test its inverse: the reconstruction error for a (3x32x32 sized) picture is only about 0.001 when running 20 inverse iterations.
But when I load the model after 1 epoch, the reconstruction error jumps to about 5.
I also loaded the model after 50, 150, and 200 epochs, and none of them match the untrained model's inverse error. After 200 epochs, the smallest error for a (3x32x32 sized) picture is about 0.95.
Moreover, on the trained model the reconstruction error rises as I use more inverse iterations. This is strange, because I would expect more inverse iterations to give a smaller inverse error.
Is this result normal? This problem puzzles me a lot.
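For context on what the inverse iterations do here: an i-ResNet block y = x + g(x) is inverted by the fixed-point iteration x_{k+1} = y - g(x_k), which contracts whenever Lip(g) < 1. A toy numpy sketch (my own linear g standing in for the residual branch, not the repo's model) shows the error shrinking per iteration when the constraint holds:

```python
import numpy as np

# Toy contractive residual branch (assumption: a linear g rescaled to
# have Lipschitz constant 0.5, standing in for the network's block).
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4))
W *= 0.5 / np.linalg.norm(W, 2)      # spectral norm -> 0.5

def g(x):
    return W @ x

x_true = rng.standard_normal(4)
y = x_true + g(x_true)               # forward pass of one block

x = y.copy()                          # fixed-point iteration, start at y
errors = []
for _ in range(20):
    x = y - g(x)
    errors.append(np.linalg.norm(x - x_true))

# Error decays geometrically (factor ~0.5 per iteration) for this g.
print(errors[0] > errors[-1])         # True
```

If the residual branch's true Lipschitz constant creeps above 1, the same iteration can diverge instead of converge, so growing error with more iterations is a symptom worth checking against the actual singular values.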
Thanks for a great repository: the code works very well, is nicely documented, and the overall structure is intuitive.
I found a minor issue which can easily be solved.
The function test(..) computes the loss on the test set without turning gradient computation off:
https://github.com/jhjacobsen/invertible-resnet/blob/master/models/utils_cifar.py#L194
One might think model.eval() turns off gradients, but it does not; see e.g. [1]. Instead, one needs something like
model.eval()
with torch.no_grad():
    # code from before
This does not usually cause OOM, but it does if one is training multiple classifiers at the same time on the same GPU (useful when, e.g., repeating experiments to get error bars).
[1] https://discuss.pytorch.org/t/model-eval-vs-with-torch-no-grad/19615
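A self-contained check of the point in [1], using a throwaway nn.Linear (any nn.Module behaves the same): outputs computed under model.eval() still require grad, while outputs computed under torch.no_grad() do not.

```python
import torch
import torch.nn as nn

# Toy stand-in model; the distinction holds for any nn.Module.
model = nn.Linear(3, 1)
x = torch.randn(2, 3)

model.eval()                       # changes layer behaviour (dropout, BN)...
out_eval = model(x)
print(out_eval.requires_grad)      # True: autograd still records the graph

with torch.no_grad():              # ...while this actually disables autograd
    out_nograd = model(x)
print(out_nograd.requires_grad)    # False: no graph, activations are freed
```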
Dear authors,
I am training the model on SVHN with coeff=0.3; after 200 epochs, many singular values are still larger than 1.
Are there any other parameters that should be adjusted?
Best,
Liang
Hi, thank you so much for providing the code. I want to run it on my own dataset. I set the batch size to 1 and use only 6 layers with 896*448 images, but it still runs out of memory. Do you know how this error happens, and are there ways to solve it? Thank you.
Thank you very much for sharing the code. I have a few questions I'd like to bother you with.
I don't know if my understanding is correct.
Looking forward to your reply!
There is a duplicate clipping argument in the latest version that leads to a TypeError:
https://github.com/jhjacobsen/invertible-resnet/blob/master/models/conv_iResNet.py#L185
https://github.com/jhjacobsen/invertible-resnet/blob/master/models/conv_iResNet.py#L287
https://github.com/jhjacobsen/invertible-resnet/blob/master/models/conv_iResNet.py#L294
The script runs normally after removing the duplicates.
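For readers hitting the same traceback: this is Python's standard duplicate-keyword failure. A minimal reproduction with a made-up signature (not the actual conv_iResNet constructor):

```python
# Hypothetical stand-in: supplying the same keyword both explicitly and
# via **kwargs triggers a TypeError at call time.
def make_block(clipping=0.9, **kwargs):
    return clipping

try:
    make_block(clipping=0.9, **{"clipping": 0.5})
    msg = ""
except TypeError as err:
    msg = str(err)

print(msg)  # ...got multiple values for keyword argument 'clipping'
```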
Is it possible to share the pretrained models?
Is there a pre-trained model that can be downloaded? Thanks
Something I was struggling with in my own implementation of Gouk's spectral norm is that a spectrally normalized layer seems to become stuck once its sigma values reach the coeff.
What I mean by this is:
Take a spectrally normalized FC layer with 2 inputs and 1 output, feed normally distributed random numbers into it, and ask it to maximize the output. This increases the weights until sigma reaches > coeff.
Then take the same layer, feed it the normally distributed random numbers, and ask it to minimize the output. You'd expect this to decrease the weights until sigma reaches 0, but since its sigma starts > coeff, nothing happens! In fact, the weights hardly receive any gradient signal at all.
I think this might be because this line:
sigma = torch.dot(u, torch.mv(weight_mat, v))
happens with grad enabled, meaning that the gradient is propagated along this pathway, forcing the sigma to stay at 1.
I have made a notebook to demonstrate this problem, and my 'fix'
Gouk-jhjacobsen.zip
I'm not sure if this is the expected behaviour; I'd have thought it's analogous to the dying-ReLU problem: as layers' sigmas become saturated, they drop out and stop learning, which might be suboptimal.
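To sketch what such a fix can look like (my own reconstruction under stated assumptions, not the code from the attached notebook): estimate sigma by power iteration, then detach it before scaling, so that the loss gradient reaches the raw weights instead of being cancelled through the normaliser.

```python
import torch

coeff = 0.9
W = torch.randn(1, 2, requires_grad=True)   # FC layer: 2 inputs, 1 output
u = torch.randn(1)

# One power-iteration step; for a 1x2 matrix this already gives the exact
# spectral norm. Done under no_grad so u and v stay plain buffers.
with torch.no_grad():
    v = torch.mv(W.t(), u)
    v = v / v.norm()
    u = torch.mv(W, v)
    u = u / u.norm()

sigma = torch.dot(u, torch.mv(W, v))

# Gouk-style clipping W <- W * min(1, coeff / sigma), with sigma DETACHED
# so autograd treats the scale as a constant.
scale = torch.clamp(coeff / sigma.detach(), max=1.0)
W_clipped = W * scale

W_clipped.sum().backward()
print(W.grad is not None)   # True: gradient flows straight to W
```

With sigma detached, the clipped weight has spectral norm at most coeff, while the backward pass is simply grad * scale rather than the (nearly self-cancelling) gradient of W / sigma(W).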
Dear author of i-ResNet,
I would like to ask: what is the purpose of the ignore_logdet parameter in "CIFAR_main.py" at line 268?
with torch.no_grad():
    model(init_batch, ignore_logdet=True)
Also, is there an email address where I can ask my questions related to i-ResNet?
Regards
Hi! Thank you very much for your code. I have a question about the function bits_per_dim: why do you add an 8?
def bits_per_dim(logpx, inputs):
    return -logpx / float(np.log(2.) * np.prod(inputs.shape[1:])) + 8.
Sincerely looking forward to your reply!
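For context, the division converts a log-density from nats to bits per dimension, and the constant 8 equals log2(256), the standard correction when 8-bit pixel values are rescaled to [0, 1]. A quick numeric sketch with made-up values (not results from the repo):

```python
import numpy as np

def bits_per_dim(logpx, shape):
    # Same formula as above, with the input shape passed explicitly.
    return -logpx / float(np.log(2.0) * np.prod(shape)) + 8.0

logpx = -2000.0                          # hypothetical log-likelihood in nats
bpd = bits_per_dim(logpx, (3, 32, 32))   # CIFAR-10 shape, D = 3072
print(round(bpd, 3))                     # -> 8.939; the +8 term contributes 8.0
```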
Hello, this is great work! But I have a question:
To ensure the network is invertible, the Lipschitz constant must be less than 1, and you divide by the spectral norm of the conv and fc layers. But in the ActNorm layer, the Lipschitz constant is not limited to less than 1.
Should it be limited like the conv and fc layers?
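To make the question quantitative: ActNorm is a per-channel affine map x -> s * x + b, so its Lipschitz constant is just the largest |s_i|, and nothing in training forces that below 1. A trivial check with made-up scales:

```python
import numpy as np

# Made-up per-channel ActNorm scales; the bias does not affect the
# Lipschitz constant of an affine map, only the scales do.
s = np.array([0.7, 1.3, 0.9])
lip = float(np.abs(s).max())
print(lip)   # 1.3: this map can expand distances, so Lip > 1
```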