jhjacobsen / invertible-resnet
Official Code for Invertible Residual Networks
License: MIT License
flake8 testing of https://github.com/jhjacobsen/invertible-resnet on Python 3.7.1
$ flake8 . --count --select=E9,F63,F72,F82 --show-source --statistics
./CIFAR_main.py:311:102: F821 undefined name 'full_fname'
interpolate(model, testloader, testset, start_epoch, use_cuda, best_objective, args.dataset, full_fname)
^
./models/utils_toy_densities.py:250:19: F821 undefined name 'model'
clip_fc_layer(model, coeff, use_cuda)
^
./models/utils_toy_densities.py:250:26: F821 undefined name 'coeff'
clip_fc_layer(model, coeff, use_cuda)
^
./models/utils_toy_densities.py:250:33: F821 undefined name 'use_cuda'
clip_fc_layer(model, coeff, use_cuda)
^
./models/utils_toy_densities.py:257:41: F821 undefined name 'model'
out_bij, p_z_g_y, trace, gt_trace = model(inputs)
^
./models/utils_toy_densities.py:258:31: F821 undefined name 'model'
log_det = compute_log_det(model, inputs, out_bij)
^
./models/utils_toy_densities.py:277:41: F821 undefined name 'model'
out_bij, p_z_g_y, trace, gt_trace = model(inputs)
^
./models/utils_toy_densities.py:278:31: F821 undefined name 'model'
log_det = compute_log_det(model, inputs, out_bij)
^
./models/model_utils.py:226:76: F821 undefined name 'num_units'
'multiple of group_size({})'.format(num_channels, num_units))
^
./models/invertible_layers.py:181:26: F821 undefined name 'Conv2dZeroInit'
self.conv_zero = Conv2dZeroInit(c // 2, c, 3, padding=(3 - 1) // 2)
^
./models/invertible_layers.py:187:16: F821 undefined name 'gaussian_diag'
return gaussian_diag(mean, logs)
^
./models/invertible_layers.py:215:21: F821 undefined name 'NN_actnorm'
self.NN = NN_actnorm(H, W, in_channels=num_features // 2, hidden_channels=width)
^
./models/invertible_layers.py:217:21: F821 undefined name 'NN_layernorm'
self.NN = NN_layernorm(H, W, in_channels=num_features // 2, hidden_channels=width)
^
./models/invertible_layers.py:219:21: F821 undefined name 'NN_batchnorm'
self.NN = NN_batchnorm(H, W, in_channels=num_features // 2, hidden_channels=width)
^
./models/invertible_layers.py:237:21: F821 undefined name 'NN_actnorm'
self.NN = NN_actnorm(H, W, in_channels=num_features // 2, hidden_channels=width, channels_out=num_features)
^
./models/invertible_layers.py:239:21: F821 undefined name 'NN_layernorm'
self.NN = NN_layernorm(H, W, in_channels=num_features // 2, hidden_channels=width, channels_out=num_features)
^
./models/invertible_layers.py:241:21: F821 undefined name 'NN_batchnorm'
self.NN = NN_batchnorm(H, W, in_channels=num_features // 2, hidden_channels=width, channels_out=num_features)
^
./models/invertible_layers.py:250:22: F821 undefined name 'flatten_sum'
objective += flatten_sum(torch.log(scale))
^
./models/invertible_layers.py:261:22: F821 undefined name 'flatten_sum'
objective -= flatten_sum(torch.log(scale))
^
19 F821 undefined name 'full_fname'
19
E901, E999, F821, F822, and F823 are the "showstopper" flake8 issues that can halt the runtime with a SyntaxError, NameError, etc. These five are different from most other flake8 issues, which are merely "style violations": useful for readability, but they do not affect runtime safety.
I tried to run your program using the command script and ran into the following issue.
You build your model based on this argument at several points. However, after carefully checking your code, I found that the argument is merely passed along several times but never actually used in the model.
Hi,
I'm trying to reproduce the phenomenon in Figure 1, but I'm a bit confused. As you demonstrate, the networks in Fig. 1 map the interval [-2, 2] to a noisy x^3. Since [-2, 2] -> x^3 is only one-dimensional while the ResNets take 3-dimensional input, I wonder how that interval is mapped. And if the dynamics shown are the outputs of the residual blocks, those outputs have different sizes due to downsampling. In brief, my question is how you performed that mapping operation.
Many thanks,
Z. L
Could you please provide the code for running the experiments on the toy (2D) datasets? I'm not sure if it's in the repo.
I used your command script to run a classification model and hit these two issues.
Before the model has been trained, I test its inverse: the reconstruction error for a (3x32x32 sized) picture is only about 0.001 when running 20 inverse iterations.
But when I load the model after 1 epoch, the reconstruction error jumps to about 5.
I also loaded the model after 50, 150, and 200 epochs, and none of them match the untrained model's inverse error. After 200 epochs, the smallest error for a (3x32x32 sized) picture is about 0.95.
Moreover, on the trained model the reconstruction error rises as I use more inverse iterations. This is strange, because I would expect more inverse iterations to give a smaller inverse error.
Is this result normal? This problem puzzles me a lot.
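For context on what the inverse iterations do here: an i-ResNet block y = x + g(x) is inverted by the fixed-point iteration x_{k+1} = y - g(x_k), which contracts whenever Lip(g) < 1. A toy numpy sketch (my own linear g standing in for the residual branch, not the repo's model) shows the error shrinking per iteration when the constraint holds:

```python
import numpy as np

# Toy contractive residual branch (assumption: a linear g rescaled to
# have Lipschitz constant 0.5, standing in for the network's block).
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4))
W *= 0.5 / np.linalg.norm(W, 2)      # spectral norm -> 0.5

def g(x):
    return W @ x

x_true = rng.standard_normal(4)
y = x_true + g(x_true)               # forward pass of one block

x = y.copy()                          # fixed-point iteration, start at y
errors = []
for _ in range(20):
    x = y - g(x)
    errors.append(np.linalg.norm(x - x_true))

# Error decays geometrically (factor ~0.5 per iteration) for this g.
print(errors[0] > errors[-1])         # True
```

If the residual branch's true Lipschitz constant creeps above 1, the same iteration can diverge instead of converge, so growing error with more iterations is a symptom worth checking against the actual singular values.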
Thanks for a great repository: the code works very well, is nicely documented, and the overall structure is intuitive.
I found a minor issue which can easily be solved.
The function test(..) computes the loss on the test set without turning gradient computation off:
https://github.com/jhjacobsen/invertible-resnet/blob/master/models/utils_cifar.py#L194
One might think model.eval() turns off gradients, but it does not; see e.g. [1]. Instead, one needs something like
model.eval()
with torch.no_grad():
    # code from before
This does not usually cause OOM, but it does if one is training multiple classifiers at the same time on the same GPU (useful when, e.g., repeating experiments to get error bars).
[1] https://discuss.pytorch.org/t/model-eval-vs-with-torch-no-grad/19615
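A self-contained check of the point in [1], using a throwaway nn.Linear (any nn.Module behaves the same): outputs computed under model.eval() still require grad, while outputs computed under torch.no_grad() do not.

```python
import torch
import torch.nn as nn

# Toy stand-in model; the distinction holds for any nn.Module.
model = nn.Linear(3, 1)
x = torch.randn(2, 3)

model.eval()                       # changes layer behaviour (dropout, BN)...
out_eval = model(x)
print(out_eval.requires_grad)      # True: autograd still records the graph

with torch.no_grad():              # ...while this actually disables autograd
    out_nograd = model(x)
print(out_nograd.requires_grad)    # False: no graph, activations are freed
```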
Dear authors,
I am training the model on SVHN with coeff=0.3; after 200 epochs, many singular values are still larger than 1.
Are there any other parameters that should be adjusted?
Best,
Liang
Hi, thank you so much for providing the code. I want to run it on my own dataset. I set the batch size to 1 and use only 6 layers with 896*448 images, but it still runs out of memory. Do you know how this error happens, and are there ways to solve it? Thank you.
Thank you very much for sharing the code. I have a few questions I'd like to bother you with.
I don't know if my understanding is correct.
Looking forward to your reply!
There is a duplicate clipping argument in the latest version that leads to a TypeError:
https://github.com/jhjacobsen/invertible-resnet/blob/master/models/conv_iResNet.py#L185
https://github.com/jhjacobsen/invertible-resnet/blob/master/models/conv_iResNet.py#L287
https://github.com/jhjacobsen/invertible-resnet/blob/master/models/conv_iResNet.py#L294
The script runs normally after removing the duplicates.
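For readers hitting the same traceback: this is Python's standard duplicate-keyword failure. A minimal reproduction with a made-up signature (not the actual conv_iResNet constructor):

```python
# Hypothetical stand-in: supplying the same keyword both explicitly and
# via **kwargs triggers a TypeError at call time.
def make_block(clipping=0.9, **kwargs):
    return clipping

try:
    make_block(clipping=0.9, **{"clipping": 0.5})
    msg = ""
except TypeError as err:
    msg = str(err)

print(msg)  # ...got multiple values for keyword argument 'clipping'
```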
Is it possible to share the pretrained models?
Is there a pre-trained model that can be downloaded? Thanks
Something I was struggling with in my own implementation of Gouk's spectral norm is that a spectrally normalized layer seems to become stuck once its sigma values reach the coeff.
What I mean by this is:
Take a spectrally normalized FC layer with 2 inputs and 1 output, feed normally distributed random numbers into it, and ask it to maximize the output. This increases the weights until sigma reaches > coeff.
Then take the same layer, feed it the normally distributed random numbers, and ask it to minimize the output. You'd expect this to decrease the weights until sigma reaches 0, but since its sigma starts > coeff, nothing happens! In fact, the weights hardly receive any gradient signal at all.
I think this might be because this line:
sigma = torch.dot(u, torch.mv(weight_mat, v))
happens with grad enabled, meaning that the gradient is propagated along this pathway, forcing the sigma to stay at 1.
I have made a notebook to demonstrate this problem, and my 'fix'
Gouk-jhjacobsen.zip
I'm not sure if this is the expected behaviour; I'd have thought it's analogous to the dying-ReLU problem: as layers' sigmas become saturated, they drop out and stop learning, which might be suboptimal.
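To sketch what such a fix can look like (my own reconstruction under stated assumptions, not the code from the attached notebook): estimate sigma by power iteration, then detach it before scaling, so that the loss gradient reaches the raw weights instead of being cancelled through the normaliser.

```python
import torch

coeff = 0.9
W = torch.randn(1, 2, requires_grad=True)   # FC layer: 2 inputs, 1 output
u = torch.randn(1)

# One power-iteration step; for a 1x2 matrix this already gives the exact
# spectral norm. Done under no_grad so u and v stay plain buffers.
with torch.no_grad():
    v = torch.mv(W.t(), u)
    v = v / v.norm()
    u = torch.mv(W, v)
    u = u / u.norm()

sigma = torch.dot(u, torch.mv(W, v))

# Gouk-style clipping W <- W * min(1, coeff / sigma), with sigma DETACHED
# so autograd treats the scale as a constant.
scale = torch.clamp(coeff / sigma.detach(), max=1.0)
W_clipped = W * scale

W_clipped.sum().backward()
print(W.grad is not None)   # True: gradient flows straight to W
```

With sigma detached, the clipped weight has spectral norm at most coeff, while the backward pass is simply grad * scale rather than the (nearly self-cancelling) gradient of W / sigma(W).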
Dear author of i-ResNet,
I would like to ask: what is the purpose of the ignore_logdet parameter in "CIFAR_main.py" at line 268?
with torch.no_grad():
    model(init_batch, ignore_logdet=True)
Also, is there an email address where I can ask my questions related to i-ResNet?
Regards
Hi! Thank you very much for your code. I have a question about the function bits_per_dim: why do you add an 8?
def bits_per_dim(logpx, inputs):
    return -logpx / float(np.log(2.) * np.prod(inputs.shape[1:])) + 8.
Sincerely looking forward to your reply!
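For context, the division converts a log-density from nats to bits per dimension, and the constant 8 equals log2(256), the standard correction when 8-bit pixel values are rescaled to [0, 1]. A quick numeric sketch with made-up values (not results from the repo):

```python
import numpy as np

def bits_per_dim(logpx, shape):
    # Same formula as above, with the input shape passed explicitly.
    return -logpx / float(np.log(2.0) * np.prod(shape)) + 8.0

logpx = -2000.0                          # hypothetical log-likelihood in nats
bpd = bits_per_dim(logpx, (3, 32, 32))   # CIFAR-10 shape, D = 3072
print(round(bpd, 3))                     # -> 8.939; the +8 term contributes 8.0
```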
Hello, this is great work! But I have a question:
To ensure the network is invertible, the Lipschitz constant must be less than 1, and you divide by the spectral norm of the conv and fc layers. But in the ActNorm layer, the Lipschitz constant is not limited to less than 1.
Should it be limited like the conv and fc layers?
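To make the question quantitative: ActNorm is a per-channel affine map x -> s * x + b, so its Lipschitz constant is just the largest |s_i|, and nothing in training forces that below 1. A trivial check with made-up scales:

```python
import numpy as np

# Made-up per-channel ActNorm scales; the bias does not affect the
# Lipschitz constant of an affine map, only the scales do.
s = np.array([0.7, 1.3, 0.9])
lip = float(np.abs(s).max())
print(lip)   # 1.3: this map can expand distances, so Lip > 1
```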