jbornschein / draw
Reimplementation of DRAW
License: MIT License
The current draw code is again incompatible with the latest version of blocks (currently 49a12a3). This seems to be mainly due to the API change in mila-iqia/blocks#725, and I have a fix for this in my blocks_fix branch. Unfortunately, this change also invalidates all previously pickled models, so I'm going to hold off on a merge request until I can train new models for the README update.
Hi Jorg,
I was just wondering how you implement state initialization and update in your code, and how this is done in Blocks in general (I'm still a novice with both Theano and Blocks). I think this is related to how you get different digits every time you sample with the trained decoder?
To be more precise, when you start the main loop, for its first iteration on the first batch of the first epoch, what values do you use for c, h_enc, c_enc, h_dec and c_dec?
I've tried a couple of things. The first was to just set h_enc, c_enc, h_dec and c_dec to zero at the start of each sequence of iterations of an SGD step. I'm guessing this is what your code does, from the Blocks RNN docs. I set c initially to -10, so that its sigmoid is zero, i.e. a blank canvas.
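As a sanity check on that canvas initialisation, a pre-activation of -10 really does give a numerically blank canvas after the sigmoid:

```python
import math

def sigmoid(x):
    # standard logistic function, as applied to the canvas c
    return 1.0 / (1.0 + math.exp(-x))

# c = -10 maps to an essentially blank (all-black) canvas pixel.
print(sigmoid(-10))  # ~4.54e-05
```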
The second was so-called backpropagation through time (BPTT), where at the end of one sequence of iterations, t = 1,...,T, I set h_enc_0 = h_enc_T, h_dec_0 = h_dec_T, etc. That is, I use the final states of the encoder's and decoder's hidden outputs and cell values from the previous SGD step as the initial states for the first iteration of the objective function in the present SGD step. I also tried the same for the canvas matrix c, but that doesn't really make sense - why write on top of an old canvas which already has an image on it?
So how are the initial values of h_enc, c_enc, h_dec and c_dec implemented in the Blocks LSTM module and in your code here? That is, are the initial states for the first iteration of the current SGD step set to the final states calculated by the last iteration of the previous SGD step? Basically I'm still having problems generating convincing-looking digits from my system with all of these variants, and it confuses me which one is theoretically 'right' - I'm not sure I understand how BPTT can be applied to the DRAW system of networks.
And how is this handled in sample.py? I can't see where in your code, or how, Theano/Blocks does this. Basically your first sampled/reconstructed canvas, i.e. samples-000.png, already seems to contain some biases which then "evolve" to form the final images. I seem to have missed something in my implementation, as my initial canvases c_0 are simply blank.
I think this relates to a line in the paper (just above eqn 3): "For each image x presented to the network, c_0, h^enc_0, h^dec_0 are initialised to learned biases." May I ask how you implement this in your code? It's been confusing me for weeks. I've tried adding a simple bias layer that takes c_0, h^enc_0, h^dec_0 as inputs, but it didn't make much difference.
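For what it's worth, one minimal way to realise "initialised to learned biases" is to keep a single trainable vector per state and broadcast it over the batch at t = 0. A numpy sketch of just the idea (class and parameter names are my own, not from this repo; a real implementation would register these vectors as model parameters so gradients update them):

```python
import numpy as np

class LearnedInitialStates:
    """Trainable per-state bias vectors, broadcast over the batch at t = 0.

    Sketch only: in a real model these arrays would be shared parameters
    trained by backprop, like any other weight.
    """

    def __init__(self, canvas_dim, enc_dim, dec_dim, rng=None):
        rng = rng or np.random.RandomState(0)
        self.c0 = np.full(canvas_dim, -10.0)       # blank-canvas pre-activation
        self.h_enc0 = 0.01 * rng.randn(enc_dim)    # small learned biases
        self.h_dec0 = 0.01 * rng.randn(dec_dim)

    def initial_states(self, batch_size):
        # tile each learned bias over the batch dimension
        tile = lambda v: np.repeat(v[None, :], batch_size, axis=0)
        return tile(self.c0), tile(self.h_enc0), tile(self.h_dec0)
```

Note that every example in a batch then starts from the same learned state; what differs between samples is only the noise z drawn at each step.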
Sorry for the long question, but I'm trying to understand your code with a broken Blocks installation :(
Best Regards,
Aj
PS - my experiments are still all with the without-attention version of the DRAW paper, for small numbers of iterations (~4), and also sometimes using the same sample from the prior for all iterations.
When I use the Plot extension in the main loop of train-draw.py, the following errors appear:
ERROR:/usr/local/lib/python2.7/dist-packages/bokeh/validation/check.pyc:W-1001 (NO_GLYPH_RENDERERS): Plot has no glyph renderers: Figure, ViewModel:Plot, ref _id: 2f872f9d-fe0c-43b8-b814-18aace23b976
...
AttributeError: unexpected attribute 'y_axis_label' to Line
With recent versions of blocks we reliably get RuntimeError: maximum recursion depth exceeded, even when setting a very high maximum recursion limit.
Hi, do you have ideas why this can happen?
Running experiment bmnist-r2-w5-t64-enc256-dec256-z100-lr34
dataset: bmnist
subdirectory: 20150625-174801-bmnist
learning rate: 0.0003
attention: 2,5
n_iterations: 64
encoder dimension: 256
z dimension: 100
decoder dimension: 256
batch size: 100
epochs: 100
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
/home/.../draw/train-draw.py in <module>()
280 args = parser.parse_args()
281
--> 282 main(**vars(args))
/home/.../draw/train-draw.py in main(name, dataset, epochs, batch_size, learning_rate, attention, n_iter, enc_dim, dec_dim, z_dim, oldmodel)
155
156 #x_recons = 1. + x
--> 157 x_recons, kl_terms = draw.reconstruct(x)
158 #x_recons, _, _, _, _ = draw.silly(x, n_steps=10, batch_size=100)
159 #x_recons = x_recons[-1,:,:]
/usr/local/lib/python2.7/dist-packages/blocks/bricks/base.pyc in __call__(self, *args, **kwargs)
358
359 def __call__(self, *args, **kwargs):
--> 360 return self.application.apply(self, *args, **kwargs)
361
362
/usr/local/lib/python2.7/dist-packages/blocks/bricks/base.pyc in apply(self, bound_application, *args, **kwargs)
300 self.call_stack.append(brick)
301 try:
--> 302 outputs = self.application_function(brick, *args, **kwargs)
303 outputs = pack(outputs)
304 finally:
/home/.../draw/draw/draw.pyc in reconstruct(self, features)
340
341 c, h_enc, c_enc, z, kl, h_dec, c_dec = \
--> 342 rvals = self.iterate(x=features, u=u)
343
344 x_recons = T.nnet.sigmoid(c[-1,:,:])
/usr/local/lib/python2.7/dist-packages/blocks/bricks/base.pyc in __call__(self, *args, **kwargs)
358
359 def __call__(self, *args, **kwargs):
--> 360 return self.application.apply(self, *args, **kwargs)
361
362
/usr/local/lib/python2.7/dist-packages/blocks/bricks/base.pyc in apply(self, bound_application, *args, **kwargs)
300 self.call_stack.append(brick)
301 try:
--> 302 outputs = self.application_function(brick, *args, **kwargs)
303 outputs = pack(outputs)
304 finally:
/usr/local/lib/python2.7/dist-packages/blocks/bricks/recurrent.pyc in recurrent_apply(brick, application, application_call, *args, **kwargs)
179 # Ensure that all initial states are available.
180 initial_states = brick.initial_states(batch_size, as_dict=True,
--> 181 *args, **kwargs)
182 for state_name in application.states:
183 dim = brick.get_dim(state_name)
/usr/local/lib/python2.7/dist-packages/blocks/bricks/base.pyc in __call__(self, *args, **kwargs)
358
359 def __call__(self, *args, **kwargs):
--> 360 return self.application.apply(self, *args, **kwargs)
361
362
/usr/local/lib/python2.7/dist-packages/blocks/bricks/base.pyc in apply(self, bound_application, *args, **kwargs)
300 self.call_stack.append(brick)
301 try:
--> 302 outputs = self.application_function(brick, *args, **kwargs)
303 outputs = pack(outputs)
304 finally:
/usr/local/lib/python2.7/dist-packages/blocks/bricks/recurrent.pyc in initial_states(self, batch_size, *args, **kwargs)
54 """
55 result = []
---> 56 for state in self.apply.states:
57 dim = self.get_dim(state)
58 if dim == 0:
AttributeError: 'DrawModel' object has no attribute 'apply'
The tfd and mnist datasets return data between 0 and 255.
We need to normalize them to 0..1 before feeding them into the binary crossentropy.
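A minimal sketch of that scaling step (function and variable names assumed, not from the repo):

```python
import numpy as np

def normalize_batch(x):
    """Map uint8 pixel values in [0, 255] to floats in [0, 1] so they are
    valid targets for the binary cross-entropy."""
    return np.asarray(x, dtype=np.float32) / 255.0
```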
ImportError Traceback (most recent call last)
/home/muslimboy/Desktop/data/draw/train-draw.py in <module>()
48 import draw.datasets as datasets
49 from draw.draw import *
---> 50 from draw.samplecheckpoint import SampleCheckpoint
51 from draw.partsonlycheckpoint import PartsOnlyCheckpoint
52
/home/muslimboy/Desktop/data/draw/draw/samplecheckpoint.py in <module>()
8 from blocks.extensions.saveload import Checkpoint
9
---> 10 from sample import generate_samples
11
12
/home/muslimboy/Desktop/data/draw/sample.py in <module>()
13 from PIL import Image
14 from blocks.main_loop import MainLoop
---> 15 from blocks.model import AbstractModel
16 from blocks.config import config
17
ImportError: cannot import name AbstractModel
When running the training for MNIST with Blocks 0.2.0, there's an error:
Blocks will attempt to run on_error extensions, potentially saving data, before exiting and reraising the error. Note that the usual after_training extensions will not be run. The original error will be re-raised and also stored in the training log. Press CTRL + C to halt Blocks immediately.
Traceback (most recent call last):
File "./train-draw.py", line 289, in <module>
main(**vars(args))
File "./train-draw.py", line 257, in main
main_loop.run()
File "/usr/local/lib/python2.7/dist-packages/blocks/main_loop.py", line 197, in run
reraise_as(e)
File "/usr/local/lib/python2.7/dist-packages/blocks/utils/__init__.py", line 258, in reraise_as
six.reraise(type(new_exc), new_exc, orig_exc_traceback)
File "/usr/local/lib/python2.7/dist-packages/blocks/main_loop.py", line 170, in run
self._run_extensions('before_training')
File "/usr/local/lib/python2.7/dist-packages/blocks/main_loop.py", line 263, in _run_extensions
extension.dispatch(CallbackName(method_name), *args)
File "/usr/local/lib/python2.7/dist-packages/blocks/extensions/__init__.py", line 338, in dispatch
self.do(callback_invoked, *(from_main_loop + tuple(arguments)))
File "draw/partsonlycheckpoint.py", line 25, in do
filenames = self.save_separately_filenames(path)
AttributeError: 'PartsOnlyCheckpoint' object has no attribute 'save_separately_filenames'
Original exception:
AttributeError: 'PartsOnlyCheckpoint' object has no attribute 'save_separately_filenames'
I tried to run using the default settings, i.e.
python ./train-draw.py
but during the 4th epoch I got an out-of-memory error.
I'm using a modern GPU with 2 GB of RAM, which is fine for my Torch experiments and all my other LSTM experiments. Is there a way to avoid this?
AFTER ANOTHER EPOCH
Training status:
epochs_done: 4
iterations_done: 2000
Log records from the iteration 2000:
epoch_took: 210.858546019
iteration_took: 0.411077022552
saved_to: ('mnist-full-t10-enc256-dec256-z100-lr13.pkl',)
test_kl_term_0: 2.90826129913
.....
test_nll_bound: 101.928291321
total_took: 1034.53031182
train_kl_term_0: 3.13306331635
......
train_nll_bound: 103.738845825
train_total_gradient_norm: 27.4104881287
train_total_step_norm: 1.72634613514
Epoch 4, step 50 |
Elapsed Time: 0:00:20
Error allocating 7471104 bytes of device memory (out of memory).
Driver report 4771840 bytes free and 1341718528 bytes total
[12:34:07] blocks.main_loop Error occured during training.
MemoryError: Error allocating 7471104 bytes of device memory (out of memory).
Apply node that caused the error: GpuGemm{no_inplace}
After 100 epochs I got a train_nll_bound of 91.1 and a test_nll_bound of 90.5 - is that similar to what you got? I'm asking because the paper reports 80.97 in Table 2.
ValueError Traceback (most recent call last)
/home/val/Desktop/draw/train-draw.py in <module>()
275 args = parser.parse_args()
276
--> 277 main(**vars(args))
/home/val/Desktop/draw/train-draw.py in main(name, dataset, epochs, batch_size, learning_rate, attention, n_iter, enc_dim, dec_dim, z_dim, oldmodel)
158 x = tensor.matrix('features')
159
--> 160 x_recons, kl_terms = draw.reconstruct(x)
161
162 recons_term = BinaryCrossEntropy().apply(x, x_recons)
/usr/local/lib/python2.7/dist-packages/blocks/bricks/base.pyc in __call__(self, *args, **kwargs)
358
359 def __call__(self, *args, **kwargs):
--> 360 return self.application.apply(self, *args, **kwargs)
361
362
/usr/local/lib/python2.7/dist-packages/blocks/bricks/base.pyc in apply(self, bound_application, *args, **kwargs)
300 self.call_stack.append(brick)
301 try:
--> 302 outputs = self.application_function(brick, *args, **kwargs)
303 outputs = pack(outputs)
304 finally:
/home/val/Desktop/draw/draw/draw.pyc in reconstruct(self, features)
341
342 c, h_enc, c_enc, z, kl, h_dec, c_dec = \
--> 343 rvals = self.apply(x=features, u=u)
344
345 x_recons = T.nnet.sigmoid(c[-1,:,:])
/usr/local/lib/python2.7/dist-packages/blocks/bricks/base.pyc in __call__(self, *args, **kwargs)
358
359 def __call__(self, *args, **kwargs):
--> 360 return self.application.apply(self, *args, **kwargs)
361
362
/usr/local/lib/python2.7/dist-packages/blocks/bricks/base.pyc in apply(self, bound_application, *args, **kwargs)
300 self.call_stack.append(brick)
301 try:
--> 302 outputs = self.application_function(brick, *args, **kwargs)
303 outputs = pack(outputs)
304 finally:
/usr/local/lib/python2.7/dist-packages/blocks/bricks/recurrent.pyc in recurrent_apply(brick, application, application_call, *args, **kwargs)
231 go_backwards=reverse,
232 name='{}{}_scan'.format(
--> 233 brick.name, application.application_name))
234 result = pack(result)
235 if return_initial_states:
/usr/local/lib/python2.7/dist-packages/theano/scan_module/scan.pyc in scan(fn, sequences, outputs_info, non_sequences, n_steps, truncate_gradient, go_backwards, mode, name, profile, allow_gc, strict)
1042 pass
1043 scan_inputs += [arg]
-> 1044 scan_outs = local_op(*scan_inputs)
1045 if type(scan_outs) not in (list, tuple):
1046 scan_outs = [scan_outs]
/usr/local/lib/python2.7/dist-packages/theano/gof/op.pyc in __call__(self, *inputs, **kwargs)
598 """
599 return_list = kwargs.pop('return_list', False)
--> 600 node = self.make_node(*inputs, **kwargs)
601
602 if config.compute_test_value != 'off':
/usr/local/lib/python2.7/dist-packages/theano/scan_module/scan_op.pyc in make_node(self, *inputs)
539 argoffset + idx,
540 outer_sitsot.type.dtype,
--> 541 inner_sitsot_out.type.dtype))
542 if inner_sitsot_out.ndim != outer_sitsot.ndim - 1:
543 raise ValueError(err_msg3 %
ValueError: When compiling the inner function of scan the following error has been encountered: The initial state (outputs_info in scan nomenclature) of variable IncSubtensor{Set;:int64:}.0 (argument number 1) has dtype float32, while the result of the inner function (fn) has dtype float64. This can happen if the inner function of scan results in an upcast or downcast.
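The dtype mismatch the trace complains about typically comes from mixing float64 constants or inputs into a float32 graph; numpy's promotion rules show the effect directly (a generic illustration, not the specific line at fault here):

```python
import numpy as np

state32 = np.zeros(4, dtype=np.float32)
noise64 = np.random.randn(4)                   # numpy defaults to float64

upcast = state32 + noise64                     # silently promoted to float64
fixed = state32 + noise64.astype(np.float32)   # keeps everything float32

print(upcast.dtype, fixed.dtype)               # float64 float32
```

In Theano this is usually fixed by casting inputs and constants to theano.config.floatX (typically 'float32' on GPU), so that scan's inner function returns the same dtype as its initial states.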
I'm running debian stretch, I have installed seemingly everything necessary, yet the program still has an error, particularly at line 32:
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
/home/dash/draw/draw/train-draw.py in <module>()
30 from blocks.monitoring import aggregation
31 from blocks.extensions import FinishAfter, Timing, Printing, ProgressBar
---> 32 from blocks.extensions.plot import Plot
33 from blocks.extensions.saveload import Checkpoint, Dump
34 from blocks.extensions.monitoring import DataStreamMonitoring, TrainingDataMonitoring
ImportError: No module named plot
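One likely cause of this ImportError is that the Plot extension was moved out of the core blocks package into the separate blocks-extras project. A guarded import (module paths assumed from the two package layouts, not verified against this repo) keeps the script running either way:

```python
# Try the old core-blocks location first, then the blocks-extras location;
# fall back to training without live plotting if neither is installed.
try:
    from blocks.extensions.plot import Plot            # older blocks releases
except ImportError:
    try:
        from blocks_extras.extensions.plot import Plot  # pip install blocks-extras
    except ImportError:
        Plot = None                                     # no live plotting

live_plotting_available = Plot is not None
```

The training script can then add the Plot extension to the main loop only when `live_plotting_available` is true.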
If you could cover this better in the readme, it would greatly help, thanks!
Hi,
I want to draw the attention rectangles. It seems that I need the return values of ZoomableAttentionWindow.nn2att (center_y, center_x, delta, sigma, gamma). What's the best way to monitor these variables?
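Once those values are available, the rectangle extent follows from the grid parametrisation in the DRAW paper, where filter i sits at center + (i - N/2 - 0.5) * delta. A small sketch (this helper is written for illustration and is not part of this repo):

```python
import numpy as np

def attention_window_bounds(center_y, center_x, delta, N):
    """Corners of the N x N read/write grid, following the DRAW paper's
    parametrisation: grid point i lies at center + (i - N/2 - 0.5) * delta."""
    offsets = (np.arange(1, N + 1) - N / 2.0 - 0.5) * delta
    return (center_y + offsets[0], center_x + offsets[0],    # top-left
            center_y + offsets[-1], center_x + offsets[-1])  # bottom-right
```

To obtain the values at run time, one option in Blocks is to pull the corresponding Theano variables out of the computation graph and add them to a monitoring channel.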
Blocks installation (http://blocks.readthedocs.org/en/latest/setup.html) includes an install of Theano, Fuel and picklable-itertools, which are also needed (see https://github.com/bartvm/blocks/blob/master/requirements.txt), so there is no need to list them again.
On the other hand, you need to prepare the data and set the Fuel path:
http://blocks.readthedocs.org/en/latest/tutorial.html#training-your-model
You also need pip install ipdb, and you need to download binarized MNIST:
https://github.com/lisa-lab/pylearn2/blob/master/pylearn2/scripts/datasets/download_binarized_mnist.py
It seems to be linked to a bad compilation of libmkl_avx2.so. Two ideas:
http://stackoverflow.com/questions/30323971/lapacke-dgesdd-crashed-segmentation-fault-core-dumped
and
http://iswwwup.com/t/7ad0fd892964/import-theano-gets-illegal-instruction.html
I don't know how to fix it.
TypeError Traceback (most recent call last)
/home/muslimboy/Desktop/drawpython/draw/train-draw.py in <module>()
287 args = parser.parse_args()
288
--> 289 main(**vars(args))
/home/muslimboy/Desktop/drawpython/draw/train-draw.py in main(name, dataset, epochs, batch_size, learning_rate, attention, n_iter, enc_dim, dec_dim, z_dim, oldmodel, live_plotting)
179 step_rule=CompositeRule([
180 StepClipping(10.),
--> 181 Adam(learning_rate),
182 ])
183 #step_rule=RMSProp(learning_rate),
/home/muslimboy/anaconda/lib/python2.7/site-packages/blocks/algorithms/__init__.pyc in __init__(self, step_rule, gradients, known_grads, **kwargs)
191 if gradients:
192 kwargs.setdefault("params", gradients.keys())
--> 193 super(GradientDescent, self).__init__(**kwargs)
194
195 self.gradients = gradients
main_loop.model.set_parameter_values(oldmodel.get_param_values())
needs to be
main_loop.model.set_parameter_values(oldmodel.get_parameter_values())
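A backwards-compatible way to express the same fix (a sketch; the method rename is the only fact taken from above) is to fall back to the old name when the new one is missing:

```python
def get_parameter_values_compat(model):
    """Call get_parameter_values() on newer Blocks models, falling back to
    the pre-rename get_param_values() on older ones."""
    getter = getattr(model, "get_parameter_values",
                     getattr(model, "get_param_values", None))
    if getter is None:
        raise AttributeError("model exposes neither parameter getter")
    return getter()
```

With this, `main_loop.model.set_parameter_values(get_parameter_values_compat(oldmodel))` works against either Blocks version.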
Hi, thanks for making this great implementation open source.
I'm working on a similar implementation in Torch (but it's not working yet, either with or without filterbanks/attention), and I'd like to understand your code better, because I'm confused.
To be precise, is there an easy way using Blocks to track the norms of the encoder cell values, say ||c_enc_t||_2, after the end of the forward pass of the SGD step?
What I'm finding with my implementation is that ||c_enc_t||_2 gets really big after about 20 epochs once T gets bigger than about 10 glimpses. Initially I was using T=64, but I reduced that to T=10, and it's still happening.
Just wondered if you saw this with your Blocks/Theano implementation? Thanks for your help.
Best, Aj
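As a generic illustration (not this repo's API), the quantity in question is just the per-glimpse L2 norm of the cell-state matrix, averaged over the batch; in Blocks one would attach such an expression as a monitoring channel, but the computation itself is:

```python
import numpy as np

def cell_norms(c_enc):
    """Mean L2 norm of the encoder cell state at each glimpse.

    c_enc: array of shape (T, batch, dim) collected over the forward pass.
    Returns T values; a norm that grows steadily with t is the symptom
    described above.
    """
    return np.linalg.norm(c_enc, axis=2).mean(axis=1)
```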
Hi,
I've read through your code and some of the Blocks and Fuel code, and am now trying to run it. I installed Blocks, Fuel, etc., and tried to run your repo using
python ./train-draw.py
I got the following error:
JSONDecodeError: Expecting value: line 1 column 1 (char 0)
which doesn't seem to have anything to do with Theano or Blocks; I think it's something to do with Bokeh.
How do you fix it? Or better still:
*** can I run the code without Bokeh? ***
I'd prefer to do this.
The two screen shots show the Bokeh server on the left and the python error stack on the right,
Regards,
Aj