jbornschein / draw
Reimplementation of DRAW
License: MIT License
The current draw code is again incompatible with the latest version of blocks (currently 49a12a3). This seems to be mainly due to the API change in mila-iqia/blocks#725, and I have a fix for this in my blocks_fix branch. Unfortunately, this change also invalidates all previously pickled models, so I'm going to hold off on a merge request until I can train new models for the README update.
Hi Jorg,
I was just wondering how you implement state initialization and update in your code, and how this is done in Blocks in general (I'm still a novice with both Theano and Blocks). I think this is related to how you get different digits every time you sample with the trained decoder?
To be more precise, when you start the main loop, for its first iteration on the first batch of the first epoch, what values do you use for c, h_enc, c_enc, h_dec and c_dec?
I've tried a couple of things. The first was to just set h_enc, c_enc, h_dec and c_dec to zero at the start of each sequence of iterations of an SGD step. I'm guessing this is what your code does, from the Blocks RNN docs. I set c initially to -10, so that its sigmoid is zero, i.e. a blank canvas.
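As a sanity check on that canvas initialisation, a pre-activation of -10 really does give a numerically blank canvas after the sigmoid:

```python
import math

def sigmoid(x):
    # standard logistic function, as applied to the canvas c
    return 1.0 / (1.0 + math.exp(-x))

# c = -10 maps to an essentially blank (all-black) canvas pixel.
print(sigmoid(-10))  # ~4.54e-05
```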
The second was so-called backpropagation through time (BPTT), where at the end of one sequence of iterations, t = 1,...,T, I set h_enc_0 = h_enc_T, h_dec_0 = h_dec_T, etc. That is, I use the final states of the encoder's and decoder's hidden outputs and cell values from the previous SGD step as the initial states for the first iteration of the objective function in the present SGD step. I also tried the same for the canvas matrix c, but that doesn't really make sense - why write on top of an old canvas which already has an image on it?
So how are the initial values of h_enc, c_enc, h_dec and c_dec implemented in the Blocks LSTM module and in your code here? That is, are the initial states for the first iteration of the current SGD step set to the final states calculated by the last iteration of the previous SGD step? Basically I'm still having problems generating convincing-looking digits from my system with all of these variants, and it confuses me which one is theoretically 'right' - I'm not sure I understand how BPTT can be applied to the DRAW system of networks.
And how is this handled in sample.py? I can't see where in your code, or how, Theano/Blocks does this. Basically your first sampled/reconstructed canvas, i.e. samples-000.png, already seems to contain some biases which then "evolve" to form the final images. I seem to have missed something in my implementation, as my initial canvases c_0 are simply blank.
I think this relates to a line in the paper (just above eqn 3): "For each image x presented to the network, c_0, h^enc_0, h^dec_0 are initialised to learned biases." May I ask how you implement this in your code? It's been confusing me for weeks. I've tried adding a simple bias layer that takes c_0, h^enc_0, h^dec_0 as inputs, but it didn't make much difference.
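For what it's worth, one minimal way to realise "initialised to learned biases" is to keep a single trainable vector per state and broadcast it over the batch at t = 0. A numpy sketch of just the idea (class and parameter names are my own, not from this repo; a real implementation would register these vectors as model parameters so gradients update them):

```python
import numpy as np

class LearnedInitialStates:
    """Trainable per-state bias vectors, broadcast over the batch at t = 0.

    Sketch only: in a real model these arrays would be shared parameters
    trained by backprop, like any other weight.
    """

    def __init__(self, canvas_dim, enc_dim, dec_dim, rng=None):
        rng = rng or np.random.RandomState(0)
        self.c0 = np.full(canvas_dim, -10.0)       # blank-canvas pre-activation
        self.h_enc0 = 0.01 * rng.randn(enc_dim)    # small learned biases
        self.h_dec0 = 0.01 * rng.randn(dec_dim)

    def initial_states(self, batch_size):
        # tile each learned bias over the batch dimension
        tile = lambda v: np.repeat(v[None, :], batch_size, axis=0)
        return tile(self.c0), tile(self.h_enc0), tile(self.h_dec0)
```

Note that every example in a batch then starts from the same learned state; what differs between samples is only the noise z drawn at each step.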
Sorry for the long question, but I'm trying to understand your code with a broken Blocks installation :(
Best Regards,
Aj
PS - my experiments are still all with the without-attention version of the DRAW paper, for small numbers of iterations (~4), and also sometimes using the same sample from the prior for all iterations.
When I use the Plot extension in the main loop of train-draw.py, the following errors appear:
ERROR:/usr/local/lib/python2.7/dist-packages/bokeh/validation/check.pyc:W-1001 (NO_GLYPH_RENDERERS): Plot has no glyph renderers: Figure, ViewModel:Plot, ref _id: 2f872f9d-fe0c-43b8-b814-18aace23b976
...
AttributeError: unexpected attribute 'y_axis_label' to Line
With recent versions of blocks we reliably get RuntimeError: maximum recursion depth exceeded, even when setting a very high maximum recursion limit.
Hi, do you have ideas why this can happen?
Running experiment bmnist-r2-w5-t64-enc256-dec256-z100-lr34
dataset: bmnist
subdirectory: 20150625-174801-bmnist
learning rate: 0.0003
attention: 2,5
n_iterations: 64
encoder dimension: 256
z dimension: 100
decoder dimension: 256
batch size: 100
epochs: 100
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
/home/.../draw/train-draw.py in <module>()
280 args = parser.parse_args()
281
--> 282 main(**vars(args))
/home/.../draw/train-draw.py in main(name, dataset, epochs, batch_size, learning_rate, attention, n_iter, enc_dim, dec_dim, z_dim, oldmodel)
155
156 #x_recons = 1. + x
--> 157 x_recons, kl_terms = draw.reconstruct(x)
158 #x_recons, _, _, _, _ = draw.silly(x, n_steps=10, batch_size=100)
159 #x_recons = x_recons[-1,:,:]
/usr/local/lib/python2.7/dist-packages/blocks/bricks/base.pyc in __call__(self, *args, **kwargs)
358
359 def __call__(self, *args, **kwargs):
--> 360 return self.application.apply(self, *args, **kwargs)
361
362
/usr/local/lib/python2.7/dist-packages/blocks/bricks/base.pyc in apply(self, bound_application, *args, **kwargs)
300 self.call_stack.append(brick)
301 try:
--> 302 outputs = self.application_function(brick, *args, **kwargs)
303 outputs = pack(outputs)
304 finally:
/home/.../draw/draw/draw.pyc in reconstruct(self, features)
340
341 c, h_enc, c_enc, z, kl, h_dec, c_dec = \
--> 342 rvals = self.iterate(x=features, u=u)
343
344 x_recons = T.nnet.sigmoid(c[-1,:,:])
/usr/local/lib/python2.7/dist-packages/blocks/bricks/base.pyc in __call__(self, *args, **kwargs)
358
359 def __call__(self, *args, **kwargs):
--> 360 return self.application.apply(self, *args, **kwargs)
361
362
/usr/local/lib/python2.7/dist-packages/blocks/bricks/base.pyc in apply(self, bound_application, *args, **kwargs)
300 self.call_stack.append(brick)
301 try:
--> 302 outputs = self.application_function(brick, *args, **kwargs)
303 outputs = pack(outputs)
304 finally:
/usr/local/lib/python2.7/dist-packages/blocks/bricks/recurrent.pyc in recurrent_apply(brick, application, application_call, *args, **kwargs)
179 # Ensure that all initial states are available.
180 initial_states = brick.initial_states(batch_size, as_dict=True,
--> 181 *args, **kwargs)
182 for state_name in application.states:
183 dim = brick.get_dim(state_name)
/usr/local/lib/python2.7/dist-packages/blocks/bricks/base.pyc in __call__(self, *args, **kwargs)
358
359 def __call__(self, *args, **kwargs):
--> 360 return self.application.apply(self, *args, **kwargs)
361
362
/usr/local/lib/python2.7/dist-packages/blocks/bricks/base.pyc in apply(self, bound_application, *args, **kwargs)
300 self.call_stack.append(brick)
301 try:
--> 302 outputs = self.application_function(brick, *args, **kwargs)
303 outputs = pack(outputs)
304 finally:
/usr/local/lib/python2.7/dist-packages/blocks/bricks/recurrent.pyc in initial_states(self, batch_size, *args, **kwargs)
54 """
55 result = []
---> 56 for state in self.apply.states:
57 dim = self.get_dim(state)
58 if dim == 0:
AttributeError: 'DrawModel' object has no attribute 'apply'
The tfd and mnist datasets return data between 0 and 255.
We need to normalize them to 0..1 before feeding them into the binary crossentropy.
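A minimal sketch of that scaling step (function and variable names assumed, not from the repo):

```python
import numpy as np

def normalize_batch(x):
    """Map uint8 pixel values in [0, 255] to floats in [0, 1] so they are
    valid targets for the binary cross-entropy."""
    return np.asarray(x, dtype=np.float32) / 255.0
```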
ImportError Traceback (most recent call last)
/home/muslimboy/Desktop/data/draw/train-draw.py in <module>()
48 import draw.datasets as datasets
49 from draw.draw import *
---> 50 from draw.samplecheckpoint import SampleCheckpoint
51 from draw.partsonlycheckpoint import PartsOnlyCheckpoint
52
/home/muslimboy/Desktop/data/draw/draw/samplecheckpoint.py in <module>()
8 from blocks.extensions.saveload import Checkpoint
9
---> 10 from sample import generate_samples
11
12
/home/muslimboy/Desktop/data/draw/sample.py in <module>()
13 from PIL import Image
14 from blocks.main_loop import MainLoop
---> 15 from blocks.model import AbstractModel
16 from blocks.config import config
17
ImportError: cannot import name AbstractModel
When running the training for MNIST with Blocks 0.2.0, there's an error:
Blocks will attempt to run on_error extensions, potentially saving data, before exiting and reraising the error. Note that the usual after_training extensions will not be run. The original error will be re-raised and also stored in the training log. Press CTRL + C to halt Blocks immediately.
Traceback (most recent call last):
File "./train-draw.py", line 289, in <module>
main(**vars(args))
File "./train-draw.py", line 257, in main
main_loop.run()
File "/usr/local/lib/python2.7/dist-packages/blocks/main_loop.py", line 197, in run
reraise_as(e)
File "/usr/local/lib/python2.7/dist-packages/blocks/utils/__init__.py", line 258, in reraise_as
six.reraise(type(new_exc), new_exc, orig_exc_traceback)
File "/usr/local/lib/python2.7/dist-packages/blocks/main_loop.py", line 170, in run
self._run_extensions('before_training')
File "/usr/local/lib/python2.7/dist-packages/blocks/main_loop.py", line 263, in _run_extensions
extension.dispatch(CallbackName(method_name), *args)
File "/usr/local/lib/python2.7/dist-packages/blocks/extensions/__init__.py", line 338, in dispatch
self.do(callback_invoked, *(from_main_loop + tuple(arguments)))
File "draw/partsonlycheckpoint.py", line 25, in do
filenames = self.save_separately_filenames(path)
AttributeError: 'PartsOnlyCheckpoint' object has no attribute 'save_separately_filenames'
Original exception:
AttributeError: 'PartsOnlyCheckpoint' object has no attribute 'save_separately_filenames'
I tried to run using the default settings, i.e.
python ./train-draw.py
but during the 4th epoch I got an out-of-memory error.
I'm using a modern GPU with 2 GB of RAM, which is fine for my Torch experiments and all my other LSTM experiments. Is there a way to avoid this?
AFTER ANOTHER EPOCH
Training status:
epochs_done: 4
iterations_done: 2000
Log records from the iteration 2000:
epoch_took: 210.858546019
iteration_took: 0.411077022552
saved_to: ('mnist-full-t10-enc256-dec256-z100-lr13.pkl',)
test_kl_term_0: 2.90826129913
.....
test_nll_bound: 101.928291321
total_took: 1034.53031182
train_kl_term_0: 3.13306331635
......
train_nll_bound: 103.738845825
train_total_gradient_norm: 27.4104881287
train_total_step_norm: 1.72634613514
Epoch 4, step 50 |
Elapsed Time: 0:00:20
Error allocating 7471104 bytes of device memory (out of memory).
Driver report 4771840 bytes free and 1341718528 bytes total
[12:34:07] blocks.main_loop Error occured during training.
MemoryError: Error allocating 7471104 bytes of device memory (out of memory).
Apply node that caused the error: GpuGemm{no_inplace}
After 100 epochs I got a train_nll_bound of 91.1 and a test_nll_bound of 90.5 - is that similar to what you got? I'm asking because the paper reports 80.97 in Table 2.
ValueError Traceback (most recent call last)
/home/val/Desktop/draw/train-draw.py in <module>()
275 args = parser.parse_args()
276
--> 277 main(**vars(args))
/home/val/Desktop/draw/train-draw.py in main(name, dataset, epochs, batch_size, learning_rate, attention, n_iter, enc_dim, dec_dim, z_dim, oldmodel)
158 x = tensor.matrix('features')
159
--> 160 x_recons, kl_terms = draw.reconstruct(x)
161
162 recons_term = BinaryCrossEntropy().apply(x, x_recons)
/usr/local/lib/python2.7/dist-packages/blocks/bricks/base.pyc in __call__(self, *args, **kwargs)
358
359 def __call__(self, *args, **kwargs):
--> 360 return self.application.apply(self, *args, **kwargs)
361
362
/usr/local/lib/python2.7/dist-packages/blocks/bricks/base.pyc in apply(self, bound_application, *args, **kwargs)
300 self.call_stack.append(brick)
301 try:
--> 302 outputs = self.application_function(brick, *args, **kwargs)
303 outputs = pack(outputs)
304 finally:
/home/val/Desktop/draw/draw/draw.pyc in reconstruct(self, features)
341
342 c, h_enc, c_enc, z, kl, h_dec, c_dec = \
--> 343 rvals = self.apply(x=features, u=u)
344
345 x_recons = T.nnet.sigmoid(c[-1,:,:])
/usr/local/lib/python2.7/dist-packages/blocks/bricks/base.pyc in __call__(self, *args, **kwargs)
358
359 def __call__(self, *args, **kwargs):
--> 360 return self.application.apply(self, *args, **kwargs)
361
362
/usr/local/lib/python2.7/dist-packages/blocks/bricks/base.pyc in apply(self, bound_application, *args, **kwargs)
300 self.call_stack.append(brick)
301 try:
--> 302 outputs = self.application_function(brick, *args, **kwargs)
303 outputs = pack(outputs)
304 finally:
/usr/local/lib/python2.7/dist-packages/blocks/bricks/recurrent.pyc in recurrent_apply(brick, application, application_call, *args, **kwargs)
231 go_backwards=reverse,
232 name='{}{}_scan'.format(
--> 233 brick.name, application.application_name))
234 result = pack(result)
235 if return_initial_states:
/usr/local/lib/python2.7/dist-packages/theano/scan_module/scan.pyc in scan(fn, sequences, outputs_info, non_sequences, n_steps, truncate_gradient, go_backwards, mode, name, profile, allow_gc, strict)
1042 pass
1043 scan_inputs += [arg]
-> 1044 scan_outs = local_op(*scan_inputs)
1045 if type(scan_outs) not in (list, tuple):
1046 scan_outs = [scan_outs]
/usr/local/lib/python2.7/dist-packages/theano/gof/op.pyc in __call__(self, *inputs, **kwargs)
598 """
599 return_list = kwargs.pop('return_list', False)
--> 600 node = self.make_node(*inputs, **kwargs)
601
602 if config.compute_test_value != 'off':
/usr/local/lib/python2.7/dist-packages/theano/scan_module/scan_op.pyc in make_node(self, *inputs)
539 argoffset + idx,
540 outer_sitsot.type.dtype,
--> 541 inner_sitsot_out.type.dtype))
542 if inner_sitsot_out.ndim != outer_sitsot.ndim - 1:
543 raise ValueError(err_msg3 %
ValueError: When compiling the inner function of scan the following error has been encountered: The initial state (outputs_info in scan nomenclature) of variable IncSubtensor{Set;:int64:}.0 (argument number 1) has dtype float32, while the result of the inner function (fn) has dtype float64. This can happen if the inner function of scan results in an upcast or downcast.
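The dtype mismatch the trace complains about typically comes from mixing float64 constants or inputs into a float32 graph; numpy's promotion rules show the effect directly (a generic illustration, not the specific line at fault here):

```python
import numpy as np

state32 = np.zeros(4, dtype=np.float32)
noise64 = np.random.randn(4)                   # numpy defaults to float64

upcast = state32 + noise64                     # silently promoted to float64
fixed = state32 + noise64.astype(np.float32)   # keeps everything float32

print(upcast.dtype, fixed.dtype)               # float64 float32
```

In Theano this is usually fixed by casting inputs and constants to theano.config.floatX (typically 'float32' on GPU), so that scan's inner function returns the same dtype as its initial states.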
I'm running debian stretch, I have installed seemingly everything necessary, yet the program still has an error, particularly at line 32:
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
/home/dash/draw/draw/train-draw.py in <module>()
30 from blocks.monitoring import aggregation
31 from blocks.extensions import FinishAfter, Timing, Printing, ProgressBar
---> 32 from blocks.extensions.plot import Plot
33 from blocks.extensions.saveload import Checkpoint, Dump
34 from blocks.extensions.monitoring import DataStreamMonitoring, TrainingDataMonitoring
ImportError: No module named plot
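One likely cause of this ImportError is that the Plot extension was moved out of the core blocks package into the separate blocks-extras project. A guarded import (module paths assumed from the two package layouts, not verified against this repo) keeps the script running either way:

```python
# Try the old core-blocks location first, then the blocks-extras location;
# fall back to training without live plotting if neither is installed.
try:
    from blocks.extensions.plot import Plot            # older blocks releases
except ImportError:
    try:
        from blocks_extras.extensions.plot import Plot  # pip install blocks-extras
    except ImportError:
        Plot = None                                     # no live plotting

live_plotting_available = Plot is not None
```

The training script can then add the Plot extension to the main loop only when `live_plotting_available` is true.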
If you could cover this better in the readme, it would greatly help, thanks!
Hi,
I want to draw the attention rectangles. It seems that I need the return values of ZoomableAttentionWindow.nn2att (center_y, center_x, delta, sigma, gamma). What's the best way to monitor these variables?
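Once those values are available, the rectangle extent follows from the grid parametrisation in the DRAW paper, where filter i sits at center + (i - N/2 - 0.5) * delta. A small sketch (this helper is written for illustration and is not part of this repo):

```python
import numpy as np

def attention_window_bounds(center_y, center_x, delta, N):
    """Corners of the N x N read/write grid, following the DRAW paper's
    parametrisation: grid point i lies at center + (i - N/2 - 0.5) * delta."""
    offsets = (np.arange(1, N + 1) - N / 2.0 - 0.5) * delta
    return (center_y + offsets[0], center_x + offsets[0],    # top-left
            center_y + offsets[-1], center_x + offsets[-1])  # bottom-right
```

To obtain the values at run time, one option in Blocks is to pull the corresponding Theano variables out of the computation graph and add them to a monitoring channel.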
Blocks installation (http://blocks.readthedocs.org/en/latest/setup.html) includes an install of Theano, Fuel and picklable-itertools, which are also needed (see https://github.com/bartvm/blocks/blob/master/requirements.txt), so there is no need to list them again.
On the other hand, you need to prepare the data and set the Fuel path:
http://blocks.readthedocs.org/en/latest/tutorial.html#training-your-model
You also need pip install ipdb, and you need to download binarized MNIST:
https://github.com/lisa-lab/pylearn2/blob/master/pylearn2/scripts/datasets/download_binarized_mnist.py
It seems to be linked to a bad compilation of libmkl_avx2.so. Two ideas:
http://stackoverflow.com/questions/30323971/lapacke-dgesdd-crashed-segmentation-fault-core-dumped
and
http://iswwwup.com/t/7ad0fd892964/import-theano-gets-illegal-instruction.html
I don't know how to fix it.
TypeError Traceback (most recent call last)
/home/muslimboy/Desktop/drawpython/draw/train-draw.py in <module>()
287 args = parser.parse_args()
288
--> 289 main(**vars(args))
/home/muslimboy/Desktop/drawpython/draw/train-draw.py in main(name, dataset, epochs, batch_size, learning_rate, attention, n_iter, enc_dim, dec_dim, z_dim, oldmodel, live_plotting)
179 step_rule=CompositeRule([
180 StepClipping(10.),
--> 181 Adam(learning_rate),
182 ])
183 #step_rule=RMSProp(learning_rate),
/home/muslimboy/anaconda/lib/python2.7/site-packages/blocks/algorithms/__init__.pyc in __init__(self, step_rule, gradients, known_grads, **kwargs)
191 if gradients:
192 kwargs.setdefault("params", gradients.keys())
--> 193 super(GradientDescent, self).__init__(**kwargs)
194
195 self.gradients = gradients
main_loop.model.set_parameter_values(oldmodel.get_param_values())
needs to be
main_loop.model.set_parameter_values(oldmodel.get_parameter_values())
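A backwards-compatible way to express the same fix (a sketch; the method rename is the only fact taken from above) is to fall back to the old name when the new one is missing:

```python
def get_parameter_values_compat(model):
    """Call get_parameter_values() on newer Blocks models, falling back to
    the pre-rename get_param_values() on older ones."""
    getter = getattr(model, "get_parameter_values",
                     getattr(model, "get_param_values", None))
    if getter is None:
        raise AttributeError("model exposes neither parameter getter")
    return getter()
```

With this, `main_loop.model.set_parameter_values(get_parameter_values_compat(oldmodel))` works against either Blocks version.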
Hi, thanks for making this great implementation open source.
I'm working on a similar implementation in Torch (but it's not working yet, either with or without filterbanks/attention), and I'd like to understand your code better, because I'm confused.
To be precise, is there an easy way using Blocks to track the norms of the encoder cell values, say ||c_enc_t||_2, after the end of the forward pass of the SGD step?
What I'm finding with my implementation is that ||c_enc_t||_2 gets really big after about 20 epochs once T gets bigger than about 10 glimpses. Initially I was using T=64, but I reduced that to T=10, and it's still happening.
Just wondered if you saw this with your Blocks/Theano implementation? Thanks for your help.
Best, Aj
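As a generic illustration (not this repo's API), the quantity in question is just the per-glimpse L2 norm of the cell-state matrix, averaged over the batch; in Blocks one would attach such an expression as a monitoring channel, but the computation itself is:

```python
import numpy as np

def cell_norms(c_enc):
    """Mean L2 norm of the encoder cell state at each glimpse.

    c_enc: array of shape (T, batch, dim) collected over the forward pass.
    Returns T values; a norm that grows steadily with t is the symptom
    described above.
    """
    return np.linalg.norm(c_enc, axis=2).mean(axis=1)
```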
Hi,
I've read through your code and some of the Blocks and Fuel code, and am now trying to run it. I installed Blocks, Fuel, etc., and tried to run your repo using
python ./train-draw.py
I got the following error:
JSONDecodeError: Expecting value: line 1 column 1 (char 0)
which doesn't seem to have anything to do with Theano or Blocks; I think it's something to do with Bokeh.
How do you fix it? Or better still:
*** can I run the code without Bokeh? ***
I'd prefer to do this.
The two screen shots show the Bokeh server on the left and the python error stack on the right,
Regards,
Aj