
parmesan's Introduction

Parmesan

Parmesan is a library adding variational and semi-supervised neural network models to the neural network library Lasagne.

Installation

Parmesan depends heavily on the Lasagne and Theano libraries. Please make sure you have these installed before installing Parmesan.

Install Parmesan

git clone https://github.com/casperkaae/parmesan.git
cd parmesan
python setup.py develop

Documentation

Work in progress. At the moment Parmesan primarily includes

  • Layers for Monte Carlo approximation of integrals used in (importance weighted) variational autoencoders in parmesan/layers/sample.py
  • Layers for constructing Ladder Networks in parmesan/layers/ladderlayers.py
  • Layers for implementing normalizing flows in parmesan/layers/flow.py

Please see the source code and code examples for further details.
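
As a quick illustration, here is a minimal sketch of wiring a sample layer into a Lasagne encoder. It assumes SimpleSampleLayer takes the mean and log-variance layers as inputs; see parmesan/layers/sample.py and examples/vae_vanilla.py for the exact usage:

import lasagne
from parmesan.layers import SimpleSampleLayer

# Hypothetical encoder producing mu and log(sigma^2) for a 2-dimensional latent z.
l_in = lasagne.layers.InputLayer(shape=(None, 784))
l_enc = lasagne.layers.DenseLayer(l_in, num_units=128)
l_mu = lasagne.layers.DenseLayer(l_enc, num_units=2, nonlinearity=lasagne.nonlinearities.identity)
l_log_var = lasagne.layers.DenseLayer(l_enc, num_units=2, nonlinearity=lasagne.nonlinearities.identity)
# Reparameterized sample z ~ q(z|x) = N(mu, exp(log_var)), differentiable w.r.t. mu and log_var.
l_z = SimpleSampleLayer(mean=l_mu, log_var=l_log_var)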

Examples

  • examples/vae_vanilla.py: Variational autoencoder as described in Kingma et al. 2013
  • examples/iw_vae.py: Variational autoencoder using importance sampling as described in Burda et al. 2015
  • examples/iw_vae_normflow.py: Variational autoencoder using normalizing flows and importance sampling as described in Burda et al. 2015 and Rezende et al. 2015
  • examples/mnist_ladder.py: Semi-supervised Ladder Network as described in Rasmus et al. 2015

Usage example: Below is an image of the log-likelihood terms when training an importance weighted autoencoder on MNIST using binomial sampling of the inputs before each epoch. Further, we found it beneficial to add batch normalization to the fully connected layers. The training is done using one Monte Carlo sample to approximate the expectations over q(z|x) and one importance weighted sample. The test performance was evaluated using 5000 importance weighted samples and should be directly comparable to the results in Burda et al. The final test performance is LL=-84.78, which is better than the currently best published result of LL=-86.76 reported in Burda et al., table 1 (compare to the 1st and 4th rows of the column labeled IWAE, since we are training using a single importance weighted sample).

https://raw.githubusercontent.com/casperkaae/parmesan/master/misc/eval_L5000.png

Similar results should be obtained by running

python examples/iw_vae.py -eq_samples 1 -iw_samples 1 -lr 0.001 -nhidden 500 -nlatent 100 -nonlin_dec very_leaky_rectify -nonlin_enc rectify -batch_size 250 -anneal_lr_epoch 2000
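
For reference, a minimal sketch of the per-epoch binomial (Bernoulli) resampling of the inputs mentioned above; the names train_x_real and binarize are illustrative, not the ones used in examples/iw_vae.py:

import numpy as np

def binarize(train_x_real, rng):
    # Treat the real-valued MNIST intensities in [0, 1] as Bernoulli
    # probabilities and draw a fresh 0/1 sample for every pixel.
    return rng.binomial(1, train_x_real).astype('float32')

rng = np.random.RandomState(1234)
train_x_real = rng.rand(100, 784).astype('float32')  # stand-in for real-valued MNIST
# Resample the binarization before each epoch, e.g.:
# for epoch in range(num_epochs):
#     train_x = binarize(train_x_real, rng)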

Development

Parmesan is a work in progress; inputs, contributions, and bug reports are very welcome.

The library is developed by
  • Casper Kaae Sønderby
  • Søren Kaae Sønderby
  • Lars Maaløe

References

  • Kingma, D. P., & Welling, M. (2013). Auto-Encoding Variational Bayes. arXiv preprint arXiv:1312.6114.
  • Burda, Y., Grosse, R., & Salakhutdinov, R. (2015). Importance Weighted Autoencoders. arXiv preprint arXiv:1509.00519.
  • Rezende, D. J., & Mohamed, S. (2015). Variational Inference with Normalizing Flows. arXiv preprint arXiv:1505.05770.
  • Rasmus, A., Valpola, H., Honkala, M., Berglund, M., & Raiko, T. (2015). Semi-Supervised Learning with Ladder Networks. arXiv preprint arXiv:1507.02672.

parmesan's People

Contributors

casperkaae · larsmaaloee · skaae · wuaalb


parmesan's Issues

Various small issues

I ran into a couple of small issues trying to use Parmesan:

  • The recently added datasets pull in a whole set of dependencies (nltk, sklearn, scipy, ...); would it be possible to import those packages locally inside the functions that need them (see the sketch after this list)? As it stands, using Parmesan requires all of that installed even if you never use those datasets.
  • Would it be possible to change _srng in SimpleSampleLayer and SampleLayer to srng? The leading underscore implies it's a private variable (although Python doesn't enforce this). For instance, I find that setting a fixed seed before approximating the log likelihood by importance sampling is useful for exactly reproducing results.
  • The repository contains some binary files that probably shouldn't be there: parmesan/dist/Parmesan-0.1.dev1-py2.7.egg and parmesan/misc/eval_L5000.jpg.
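
A minimal sketch of the local-import pattern suggested in the first point; load_some_dataset is a hypothetical loader, not an actual Parmesan function:

def load_some_dataset(path):
    # Import the heavy optional dependency only when this loader is called,
    # so that "import parmesan" itself does not require scipy/sklearn/nltk.
    import scipy.io
    return scipy.io.loadmat(path)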

arg. error

Hello, I have a problem running the code. I tried this command:
sudo python mnist_ladder.py -lambdas 1,10,0.1,0.1,0.1,0.1,0.1
but it gives me this error:

h6: (200, 10)
y_weights_decoder: (100, 10)
Traceback (most recent call last):
  File "mnist_ladder.py", line 238, in <module>
    h6_dec, name='dec_normalize6'), name='dec_scale6')
  File "/home/homa/parmesan/parmesan/layers/normalize.py", line 61, in __init__
    super(NormalizeLayer, self).__init__(incoming, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/Lasagne-0.1-py2.7.egg/lasagne/layers/base.py", line 38, in __init__
    self.input_shape = incoming.output_shape
  File "/usr/local/lib/python2.7/dist-packages/Lasagne-0.1-py2.7.egg/lasagne/layers/base.py", line 52, in output_shape
    return self.get_output_shape_for(self.input_shape)
  File "/usr/local/lib/python2.7/dist-packages/Lasagne-0.1-py2.7.egg/lasagne/layers/shape.py", line 380, in get_output_shape_for
    range(*self.slice.indices(input_shape[self.axis])))
TypeError: 'NoneType' object cannot be interpreted as an index

Do you have any idea what I am doing wrong when passing the args? I just want to run it for one iteration to see how it works.

Thanks

VAE+NF

Thank you for your great work.
However, when I implement VAE + NF, my KL term, KL(q(z_K) || p(z)), tends to be lower than 0. I wonder if this is a problem caused by sampling only one z_0 per data point (i.e. L=1 in the VAE paper).
I have been confused by this problem for weeks; I would appreciate your help.
@casperkaae
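
(Not an answer from the authors, but for intuition: with L=1 the reported KL term is a single-sample Monte Carlo estimate, which can come out negative even though the true KL divergence is non-negative. A small NumPy/SciPy sketch with arbitrary stand-in distributions:)

import numpy as np
from scipy.stats import norm

rng = np.random.RandomState(0)
q = norm(loc=0.1, scale=1.0)   # stand-in for q(z_K|x)
p = norm(loc=0.0, scale=1.0)   # stand-in for the prior p(z)

z = q.rvs(size=1, random_state=rng)
single_sample_kl = q.logpdf(z) - p.logpdf(z)             # one-sample estimate, often negative
z_many = q.rvs(size=100000, random_state=rng)
mean_kl = np.mean(q.logpdf(z_many) - p.logpdf(z_many))   # close to the true KL >= 0
print(single_sample_kl, mean_kl)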

What happens when batchsize doesn't evenly divide sample set?

I was just wondering whether there is a bug when the batchsize doesn't evenly divide the sample set?

For instance, when the batchsize is 24, the last minibatch of the training set on MNIST will be of size 16.

I think the code returns an array of size (24, eq_samples) for the bound regardless, padded with zeros (I'm not sure how Theano handles it when you input an array with a slice that is out of bounds), so that the bound reported will be slightly smaller than it actually is (being divided by 10008 * eq_samples).

Rather than keep building an array with the bound values, would it not be better to accumulate them in a scalar? See:

https://github.com/casperkaae/parmesan/blob/master/examples/vimco.py#L322
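
A minimal sketch of the scalar accumulation suggested above, independent of the actual vimco.py code:

import numpy as np

def mean_bound(per_batch_costs):
    # per_batch_costs: a list of arrays of shape (batch_size_i, eq_samples),
    # where the last minibatch may be smaller than the others.
    total, count = 0.0, 0
    for costs in per_batch_costs:
        total += costs.sum()   # accumulate a scalar instead of stacking padded arrays
        count += costs.size    # count only the elements actually computed
    return total / count       # unbiased by any zero padding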

class DenoiseLayer(MergeLayer):

Hi,
I'm using Parmesan's ladder network model with convolutional layers: I replaced the dense layers with convolution and pooling layers. I have a problem which I guess happens in the DenoiseLayer(MergeLayer), since it gives me the following error:

ValueError: GpuElemwise. Input dimension mis-match. Input 2 (indices start at 0) has shape[0] == 1100, but the output's size on that axis is 100.
So, to modify the DenoiseLayer, I want to make sure I fully understand its functionality. As I understand it, it merges two input layers (the noisy one from the encoder and the one from the layer above in the decoder), right? So, do you think it would be fine to replace it with one of Lasagne's merge layers?

Thanks,

Bug in VIMCO implementation?

Should the following line,

https://github.com/casperkaae/parmesan/blob/master/examples/vimco.py#L248

actually read

g_lb_inference = T.mean(T.sum(dg(L_corr) * log_qz_given_x, axis=2) + L)

instead of

g_lb_inference = T.mean(T.sum(dg(L_corr) * log_qz_given_x) + L)

?

I think that with the current code the two terms in g_lb_inference have different scaling. The T.sum reduces dg(L_corr) * log_qz_given_x to a single number, which is then broadcast across all the elements of L, which has dimensions batchsize x eq_samples. So the second term is scaled by 1/(batchsize * eq_samples), whereas this factor cancels in the first term because it is summed that many times.
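
A small NumPy sketch of the scaling argument above, with made-up shapes (batchsize=4, eq_samples=3, iw_samples=5) standing in for the actual Theano tensors:

import numpy as np

rng = np.random.RandomState(0)
batchsize, eq_samples, iw_samples = 4, 3, 5
term = rng.rand(batchsize, eq_samples, iw_samples)  # stands in for dg(L_corr) * log_qz_given_x
L = rng.rand(batchsize, eq_samples)                 # stands in for the lower bound L

current = np.mean(np.sum(term) + L)           # sum over everything: equals sum(term) + mean(L)
proposed = np.mean(np.sum(term, axis=2) + L)  # sum only over the importance-sample axis
print(current, proposed)                      # the first term is scaled differently in the two versions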

Add nonlin to sample layers

Can we make the exp nonlinearity optional in the sample layers by adding a nonlinearity keyword? That would allow for softplus etc.

add the argument:

nonlinearity=lambda x: T.exp(0.5*x)
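
A hypothetical sketch (not the actual Parmesan implementation) of how such a keyword could plug into the reparameterized sampling step:

import theano.tensor as T

def sample(mean, log_var, eps, nonlinearity=lambda x: T.exp(0.5 * x)):
    # z = mean + f(log_var) * eps, with f pluggable. The default recovers the
    # usual std = exp(0.5 * log_var); passing e.g. T.nnet.softplus would also work.
    return mean + nonlinearity(log_var) * eps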

Bayes by backprop

Hi again!

I've finished the work on my topping. It seems to be flexible. I opened a PR in Lasagne, but I think this is a better place to contribute it.

Resampling bernoulli in iw_vae

The Bernoulli resampling happens every epoch. This matches the IWAE paper, but it is a bug: since resampling is effectively data augmentation, the result cannot be compared to a VAE trained on a fixed binarization (such as the one provided by Hugo Larochelle here: http://www.dmi.usherb.ca/~larocheh/mlpython/_modules/datasets/binarized_mnist.html).

In particular this line:
https://github.com/casperkaae/parmesan/blob/master/examples/iw_vae.py#L289

They mention this issue at the bottom of the github repo https://github.com/yburda/iwae

Python 3.5 compatibility

Hi,

Can I run parmesan using Python 3.5? Although I am able to run Lasagne, I get this error when I try to run the example

python examples/iw_vae.py -eq_samples 1 -iw_samples 1 -lr 0.001 -nhidden 500 -nlatent 100 -nonlin_dec very_leaky_rectify -nonlin_enc rectify -batch_size 250 -anneal_lr_epoch 2000
  File "examples/iw_vae.py", line 118
    print "Using real valued MNIST dataset to binomial sample dataset after every epoch "
                                                                                     ^
SyntaxError: Missing parentheses in call to 'print'

I realize that I need to add the parentheses to the print calls to make the code compatible with Python 3, but are there many other modifications that need to be made to make it run?
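
For what it's worth, the print statements are the most visible Python 2 idiom in the examples; a common first step when porting is the __future__ import, though other differences (integer division, xrange, pickle protocols) may also need attention:

from __future__ import print_function, division

# Python 2 only:
#   print "Using real valued MNIST dataset to binomial sample dataset after every epoch "
# Works under both Python 2.7 and 3.x:
print("Using real valued MNIST dataset to binomial sample dataset after every epoch")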

System configuration: Ubuntu 16.04, GeForce GTX 960, CUDA 8.1, cuDNN 5.1, Lasagne 0.2.dev1, Theano 0.8.2.

Thanks

Convolutional layers

Are there plans to add convolutional layers to Parmesan? I would like to reproduce the results of Rasmus et al. 2015 and found Parmesan very helpful for getting started.

Is anyone working on this yet? Otherwise, I'd give it a try myself.

None of the examples work

Hi, I was trying to refactor the code to be compatible with Python 3 but got stuck because the examples don't work.
I ran them with Python 2, following the installation instructions and with the other dependencies installed, and got the same error every time:

Exception: Compilation failed (return status=1): clang: error: unknown argument: '-target-feature'. clang: error: unknown argument: '-sse4a'
...

I have no idea what causes it.

BatchNormalization

It would be nice if the BatchNormalizationLayer, rather than supporting a "single_pass" mode, supported a "collect" mode that collects the dataset statistics over minibatches. If the dataset is big enough, "single_pass" could fail. An alternative would be something like this:

if collect:
    # This will collect the dataset statistics on minibatches.
    # However, to do this accurately we will need an extra variable
    # for E[x^2] and use its average instead of the std.
    running_ex2 = theano.shared(np.zeros_like(self.std.get_value()))
    t = theano.shared(1)
    ex2 = input.sqr().mean(self.axes, keepdims=True)
    mean_update = ((t - 1) / t) * running_mean + mean / t
    ex2_update = ((t - 1) / t) * running_ex2 + ex2 / t
    std_update = (ex2_update - mean_update.sqr() + self.epsilon).sqrt()
    # Set the default updates
    running_mean.default_update = mean_update
    running_ex2.default_update = ex2_update
    running_std.default_update = std_update
    t.default_update = t + 1
    # and include them in the graph so their default updates will be
    # applied (although the expressions will be optimized away later)
    mean += 0 * running_mean + 0 * t
    std += 0 * running_std + 0 * running_ex2
else:
    # During training instead we use a geometric moving average
    running_mean.default_update = ((1 - self.alpha) * running_mean +
                                   self.alpha * mean)
    running_std.default_update = ((1 - self.alpha) * running_std +
                                  self.alpha * std)
    # and include them in the graph so their default updates will be
    # applied (although the expressions will be optimized away later)
    mean += 0 * running_mean
    std += 0 * running_std

Large dataset

I would like to know why, in the "mnist_ladder.py" example, you mention it doesn't work for larger datasets. What should I change if I want to use it on a large dataset? Should I pass mini-batches through the network as "sym_x"?
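
Not an answer from the authors, but the usual pattern for large datasets is to compile the Theano function over a symbolic batch and loop over minibatch slices in Python; a generic sketch (names are illustrative):

import numpy as np

def iterate_minibatches(x, batch_size):
    # Yield successive slices of the dataset so only one batch is processed at a time.
    for start in range(0, len(x), batch_size):
        yield x[start:start + batch_size]

# for x_batch in iterate_minibatches(train_x, 100):
#     f_train(x_batch)   # f_train: a compiled theano.function taking the batch for sym_x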

TypeError: 'NoneType' object cannot be interpreted as an index

I have tried to run the mnist_ladder.py example without changing anything, and it gives me the following error. Would you please tell me what I'm doing wrong?

Lambdas: [1000.0, 10.0, 0.1, 0.1, 0.1, 0.1, 0.1]
h6: (200, 10)
y_weights_decoder: (0, 10)
Traceback (most recent call last):
  File "mnist_ladder.py", line 268, in <module>
    h6_dec, name='dec_normalize6'), name='dec_scale6')
  File "/home/usr/parmesan/parmesan/layers/normalize.py", line 61, in __init__
    super(NormalizeLayer, self).__init__(incoming, **kwargs)
  File "/home/usr/venv/local/lib/python2.7/site-packages/Lasagne-0.1-py2.7.egg/lasagne/layers/base.py", line 38, in __init__
    self.input_shape = incoming.output_shape
  File "/home/usr/venv/local/lib/python2.7/site-packages/Lasagne-0.1-py2.7.egg/lasagne/layers/base.py", line 52, in output_shape
    return self.get_output_shape_for(self.input_shape)
  File "/home/usr/venv/local/lib/python2.7/site-packages/Lasagne-0.1-py2.7.egg/lasagne/layers/shape.py", line 380, in get_output_shape_for
    range(*self.slice.indices(input_shape[self.axis])))
TypeError: 'NoneType' object cannot be interpreted as an index

Wrong gradients in NormalizingPlanarFlowLayer

If I understand correctly, equation 11 in the paper is computed here. For a batch of 5 and with 3 features, h'(w^T z + b) should have shape (5,) and w shape (3,), thus psi should be (5, 3) and psi_u (5,). However, in the current implementation psi is (5,) and psi_u is a scalar. So the solution would be to change the dot product to an element-wise product. Is that right, or did I make a mistake?
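
A small NumPy sketch of the shapes described above (batch of 5, 3 latent features), following psi(z) = h'(w^T z + b) w from Rezende & Mohamed 2015; it is independent of the actual NormalizingPlanarFlowLayer code:

import numpy as np

rng = np.random.RandomState(0)
z = rng.randn(5, 3)                     # batch of 5, 3 latent features
u, w, b = rng.randn(3), rng.randn(3), 0.1

h_prime = lambda a: 1.0 - np.tanh(a) ** 2   # derivative of the tanh nonlinearity

a = z.dot(w) + b                        # w^T z + b                 -> shape (5,)
psi = h_prime(a)[:, None] * w[None, :]  # psi(z) = h'(w^T z + b) w  -> shape (5, 3)
psi_u = psi.dot(u)                      # u^T psi(z)                -> shape (5,)
log_det = np.log(np.abs(1.0 + psi_u))   # per-example log |det| of the planar flow
print(psi.shape, psi_u.shape)           # (5, 3) (5,)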

Lasagne Batch Normalization

Now that Lasagne includes support for batch normalization, I was wondering if it could replace Parmesan's BN. As far as I understand both were derived from f0k's initial BN gist, but I haven't looked in detail at the differences. Did you make any significant modifications?

poor performance on example

Running the example command from the README:
python examples/iw_vae.py -eq_samples 1 -iw_samples 1 -lr 0.001 -nhidden 500 -nlatent 100 -nonlin_dec very_leaky_rectify -nonlin_enc rectify
yields substantially worse performance than the plot:

Epoch=9990     Time=2.96       LR=0.00100      E_qsamples=1    IVAEsamples=1   
TRAIN:          Cost=-92.92944  logq(z|x)=-118.74120  logp(z)=-141.49203      logp(x|z)=-70.17860     
EVAL-L1:        Cost=-99.33702  logq(z|x)=-118.58514    logp(z)=-141.30058    logp(x|z)=-76.62159 
EVAL-L5000:     Cost=-92.42599  logq(z|x)=-118.61915    logp(z)=-141.30986   logp(x|z)=-76.66057

Can anyone replicate those results using the current codebase? Or has something changed that would result in such a large performance drop?
