jayparks / tf-seq2seq
Sequence to sequence learning using TensorFlow.
While training, I get a resource exhausted (OOM) error. I decreased the hidden units, the batch size, and num_enc/dec_units, but nothing changed. TensorFlow is able to detect the GPU:
2017-12-15 17:50:40.266983: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: GeForce GTX 980 Ti major: 5 minor: 2 memoryClockRate(GHz): 1.076
pciBusID: 0000:01:00.0
totalMemory: 5.93GiB freeMemory: 5.83GiB
2017-12-15 17:50:40.267067: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 980 Ti, pci bus id: 0000:01:00.0, compute capability: 5.2)
building model..
building encoder..
building decoder and attention..
setting optimizer..
Created new model parameters..
Training..
After this, however, the following error traceback is printed:
2017-12-15 17:45:48.103510: W tensorflow/core/common_runtime/bfc_allocator.cc:277] ****************************************************************************************************
2017-12-15 17:45:48.103914: W tensorflow/core/framework/op_kernel.cc:1192] Resource exhausted: OOM when allocating tensor with shape[500,15000]
2017-12-15 17:45:48.104030: W tensorflow/core/framework/op_kernel.cc:1192] Resource exhausted: OOM when allocating tensor with shape[500,15000]
[[Node: decoder/gradients/decoder/decoder/while/Select_1_grad/zeros_like = ZerosLike[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](decoder/gradients/decoder/decoder/while/Select_1_grad/zeros_like/Enter, ^decoder/gradients/Sub)]]
Traceback (most recent call last):
File "train.py", line 227, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "train.py", line 223, in main
train()
File "train.py", line 149, in train
decoder_inputs=target, decoder_inputs_length=target_len)
File "/DATA/USERS/sai/residual/tf-seq2seq/seq2seq_model.py", line 470, in train
outputs = sess.run(output_feed, input_feed)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 889, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1120, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1317, in _do_run
options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1336, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[500,15000]
[[Node: decoder/gradients/decoder/decoder/while/Select_1_grad/zeros_like = ZerosLike[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](decoder/gradients/decoder/decoder/while/Select_1_grad/zeros_like/Enter, ^decoder/gradients/Sub)]]
[[Node: decoder/gradients/decoder/decoder/while/BasicDecoderStep/TrainingHelperNextInputs/cond/Merge_grad/cond_grad/_191 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1168_decoder/gradients/decoder/decoder/while/BasicDecoderStep/TrainingHelperNextInputs/cond/Merge_grad/cond_grad", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
Caused by op u'decoder/gradients/decoder/decoder/while/Select_1_grad/zeros_like', defined at:
File "train.py", line 227, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "train.py", line 223, in main
train()
File "train.py", line 125, in train
model = create_model(sess, FLAGS)
File "train.py", line 74, in create_model
model = Seq2SeqModel(config, 'train')
File "/DATA/USERS/sai/residual/tf-seq2seq/seq2seq_model.py", line 68, in __init__
self.build_model()
File "/DATA/USERS/sai/residual/tf-seq2seq/seq2seq_model.py", line 77, in build_model
self.build_decoder()
File "/DATA/USERS/sai/residual/tf-seq2seq/seq2seq_model.py", line 235, in build_decoder
self.init_optimizer()
File "/DATA/USERS/sai/residual/tf-seq2seq/seq2seq_model.py", line 412, in init_optimizer
gradients = tf.gradients(self.loss, trainable_params)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py", line 581, in gradients
grad_scope, op, func_call, lambda: grad_fn(op, *out_grads))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py", line 353, in _MaybeCompile
return grad_fn() # Exit early
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py", line 581, in <lambda>
grad_scope, op, func_call, lambda: grad_fn(op, *out_grads))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/math_grad.py", line 907, in _SelectGrad
zeros = array_ops.zeros_like(x)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/array_ops.py", line 1495, in zeros_like
return gen_array_ops._zeros_like(tensor, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 5960, in _zeros_like
"ZerosLike", x=x, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
...which was originally created as op u'decoder/decoder/while/Select_1', defined at:
File "train.py", line 227, in <module>
tf.app.run()
[elided 5 identical lines from previous traceback]
File "/DATA/USERS/sai/residual/tf-seq2seq/seq2seq_model.py", line 77, in build_model
self.build_decoder()
File "/DATA/USERS/sai/residual/tf-seq2seq/seq2seq_model.py", line 210, in build_decoder
maximum_iterations=max_decoder_length))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/seq2seq/python/ops/decoder.py", line 286, in dynamic_decode
swap_memory=swap_memory)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 2816, in while_loop
result = loop_context.BuildLoop(cond, body, loop_vars, shape_invariants)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 2640, in BuildLoop
pred, body, original_loop_vars, loop_vars, shape_invariants)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 2590, in _BuildLoop
body_result = body(*packed_vars_for_body)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/seq2seq/python/ops/decoder.py", line 253, in body
zero_outputs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/nest.py", line 413, in map_structure
structure[0], [func(*x) for x in entries])
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/seq2seq/python/ops/decoder.py", line 251, in <lambda>
lambda out, zero: array_ops.where(finished, zero, out),
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/array_ops.py", line 2441, in where
return gen_math_ops._select(condition=condition, t=x, e=y, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_math_ops.py", line 3988, in _select
"Select", condition=condition, t=t, e=e, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[500,15000]
[[Node: decoder/gradients/decoder/decoder/while/Select_1_grad/zeros_like = ZerosLike[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](decoder/gradients/decoder/decoder/while/Select_1_grad/zeros_like/Enter, ^decoder/gradients/Sub)]]
[[Node: decoder/gradients/decoder/decoder/while/BasicDecoderStep/TrainingHelperNextInputs/cond/Merge_grad/cond_grad/_191 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1168_decoder/gradients/decoder/decoder/while/BasicDecoderStep/TrainingHelperNextInputs/cond/Merge_grad/cond_grad", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
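For a rough sense of scale, the failing tensor has shape [500, 15000] in float32. The sketch below estimates its size and how the total grows if one such tensor is retained per decoder time step for backpropagation through the while loop. Reading 500 as the batch size and 15000 as the target vocabulary size, and the step count of 50, are assumptions for illustration; none of them is confirmed in the report.

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Return the size in bytes of a dense tensor (float32 by default)."""
    n = 1
    for dim in shape:
        n *= dim
    return n * bytes_per_element

# Shape taken from the error message.
per_step = tensor_bytes([500, 15000])

# Hypothetical number of decoder time steps; one tensor of this shape may be
# retained per step so that gradients can be computed through the loop.
assumed_steps = 50
total = per_step * assumed_steps

print("per step: %.1f MiB" % (per_step / 2.0 ** 20))
print("for %d steps: %.2f GiB" % (assumed_steps, total / 2.0 ** 30))
```

Under these assumptions a single tensor is about 28.6 MiB, and the total scales linearly with both the first dimension and the number of steps. The log still shows 500 rows after the reported changes, so it may be worth checking that the reduced batch size actually reaches the model.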
Is this a leftover from a previous version of the code?
https://github.com/JayParks/tf-seq2seq/blob/master/seq2seq_model.py#L359
It gets overwritten at #L393.
If attention were not used, would decoder_initial_state simply be the (tiled) encoder_last_state?
I'm a beginner in deep learning. May I ask whether the following code appends decoder_end_token to all the decoder targets, with the end token being a tensor full of 1s?
When we calculate the sequence loss, could the cross-entropy between the end token and the corresponding y_hat become very large?
decoder_end_token = tf.ones(shape=[self.batch_size, 1], dtype=tf.int32) * data_utils.end_token
self.decoder_targets_train = tf.concat([self.decoder_inputs,decoder_end_token], axis=1)
self.loss = seq2seq.sequence_loss(logits=self.decoder_logits_train, targets=self.decoder_targets_train, weights=masks, average_across_timesteps=True, average_across_batch=True,)
many thanks!
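To make the question concrete: multiplying a tensor of ones by `data_utils.end_token` gives a column filled with the end-token id (it contains 1s only if that id happens to be 1), and the concat appends that id to every target row. In a sparse cross-entropy the target is used as a class index, so the numeric value of the id does not by itself make the loss large. The plain-Python sketch below mimics both steps; `END_TOKEN = 1`, the toy rows, and the helper names are hypothetical and not taken from the repository.

```python
import math

END_TOKEN = 1  # hypothetical id; the real value comes from data_utils.end_token

def append_end_token(batch, end_token=END_TOKEN):
    """Mimic tf.concat([decoder_inputs, ones * end_token], axis=1)."""
    return [row + [end_token] for row in batch]

def sparse_cross_entropy(logits, target_id):
    """Cross-entropy for one time step; target_id is used only as an index."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target_id]

decoder_inputs = [[5, 7, 9], [4, 6, 8]]     # toy token ids
targets = append_end_token(decoder_inputs)  # each row now ends with END_TOKEN

# The loss is small whenever the logit at the target index is the largest one,
# regardless of which id that index is.
confident_logits = [0.0, 10.0, 0.0, 0.0]
loss = sparse_cross_entropy(confident_logits, END_TOKEN)
print(targets, loss)
```

The loss at the end-token position is large only when the model assigns that token a low probability, which is the same behaviour as for any other target token.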
Hi,
I got a missing-file error. Was the file deleted?
No such file or directory: 'data/europarl-v7.1.4M.de'
Thanks.
I'm trying to use pre-embedded data as input, which means I don't want to use the model's embedding layer.
How should I do this? Any help would be appreciated.
Hi, I am also using r1.2 to implement the beam search decoder, but I didn't get correct results, although greedy search works well. Did you get correct results when using the beam search decoder? Thanks.
https://github.com/JayParks/tf-seq2seq/blob/master/seq2seq_model.py#L368
It reports that dimension 0 of inputs and attention do not match (since we are tile_batching it to batch_size * beam_width). Didn't you get any error when running with beam search?
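A mismatch of this kind is what one would expect if some decoder inputs are tiled to batch_size * beam_width while others stay at batch_size. As an illustration of the tiling itself, the plain-Python sketch below repeats each batch entry beam_width times in a row, which is how `tf.contrib.seq2seq.tile_batch` is documented to order its output. The helper name and toy data are hypothetical.

```python
def tile_batch_like(batch, multiplier):
    """Repeat each batch entry `multiplier` times: t[0], t[0], ..., t[1], t[1], ..."""
    tiled = []
    for entry in batch:
        for _ in range(multiplier):
            tiled.append(list(entry))
    return tiled

batch_size, beam_width = 2, 3
encoder_outputs = [[0.1, 0.2], [0.3, 0.4]]  # toy [batch_size, depth] data

tiled = tile_batch_like(encoder_outputs, beam_width)
print(len(tiled))  # batch_size * beam_width
```

Every tensor that feeds the attention mechanism or the decoder's initial state (encoder outputs, encoder last state, input lengths) would need this same leading dimension.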
Hi,
While reviewing this code, I found that the encoder and the decoder must have the same depth,
since "self.depth = config['depth']" is used in constructing both the encoder and the decoder.
When I try to set different depths, I get the following error:
Traceback (most recent call last):
File "train.py", line 301, in <module>
tf.app.run()
File "/home/aldy/.local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "train.py", line 297, in main
train()
File "train.py", line 145, in train
model = create_model(sess, FLAGS)
File "train.py", line 82, in create_model
model = Seq2SeqModel(config, 'train')
File "/home/aldy/work/nmt/tf-seq/tf-seq2seq-master/seq2seq_model.py", line 72, in __init__
self.build_model()
File "/home//aldy/work/nmt/tf-seq/tf-seq2seq-master/seq2seq_model.py", line 81, in build_model
self.build_decoder()
File "/home/aldy/work/nmt/tf-seq/tf-seq2seq-master/seq2seq_model.py", line 287, in build_decoder
maximum_iterations=max_decoder_length))
File "/home//aldy/.local/lib/python2.7/site-packages/tensorflow/contrib/seq2seq/python/ops/decoder.py", line 286, in dynamic_decode
swap_memory=swap_memory)
File "/home/aldy/.local/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2816, in while_loop
result = loop_context.BuildLoop(cond, body, loop_vars, shape_invariants)
File "/home/aldy/.local/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2640, in BuildLoop
pred, body, original_loop_vars, loop_vars, shape_invariants)
File "/home//aldy/.local/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2590, in _BuildLoop
body_result = body(*packed_vars_for_body)
File "/home//aldy/.local/lib/python2.7/site-packages/tensorflow/contrib/seq2seq/python/ops/decoder.py", line 234, in body
decoder_finished) = decoder.step(time, inputs, state)
File "/home/aldy/.local/lib/python2.7/site-packages/tensorflow/contrib/seq2seq/python/ops/basic_decoder.py", line 138, in step
cell_outputs, cell_state = self._cell(inputs, state)
File "/home/aldy/.local/lib/python2.7/site-packages/tensorflow/python/ops/rnn_cell_impl.py", line 183, in call
return super(RNNCell, self).call(inputs, state)
File "/home/aldy/.local/lib/python2.7/site-packages/tensorflow/python/layers/base.py", line 575, in call
outputs = self.call(inputs, *args, **kwargs)
File "/home/aldy/.local/lib/python2.7/site-packages/tensorflow/python/ops/rnn_cell_impl.py", line 1066, in call
cur_inp, new_state = cell(cur_inp, cur_state)
File "/home/aldy/.local/lib/python2.7/site-packages/tensorflow/python/ops/rnn_cell_impl.py", line 951, in call
outputs, new_state = self._cell(inputs, state, scope=scope)
File "/home/aldy/.local/lib/python2.7/site-packages/tensorflow/python/ops/rnn_cell_impl.py", line 891, in call
output, new_state = self._cell(inputs, state, scope)
File "/home/aldy/.local/lib/python2.7/site-packages/tensorflow/python/ops/rnn_cell_impl.py", line 183, in call
return super(RNNCell, self).call(inputs, state)
File "/home/aldy/.local/lib/python2.7/site-packages/tensorflow/python/layers/base.py", line 575, in call
outputs = self.call(inputs, *args, **kwargs)
File "/home/aldy/.local/lib/python2.7/site-packages/tensorflow/python/ops/rnn_cell_impl.py", line 591, in call
(c_prev, m_prev) = state
ValueError: too many values to unpack
Anyway, do you have any suggestions for solving this problem?
Can you clarify how you obtain the validation data and why BPE is used in such cases?
tf.app.flags.DEFINE_string('source_valid_data', 'data/newstest2012.bpe.de', 'Path to source validation data')
tf.app.flags.DEFINE_string('target_valid_data', 'data/newstest2012.bpe.fr', 'Path to target validation data')
Hi, this repo is useful. I have not been able to find a seq2seq implementation that trains and runs successfully while making full use of the newest TensorFlow APIs. Looking at the code, it seems the rnn package has moved back into the core layer instead of contrib. Has this been tested on TensorFlow 1.2? I am reporting this to help keep the repo up to date.
Thank you very much for your contribution.
I am not familiar with NMT.
Could you provide a link for downloading sample.src and sample.trg?
The documentation suggests setting attn_input_feeding=True during decoding.
But in the code, I don't see any place where it is set to True during decoding.
The configuration is read entirely from the dump created during training, and since attn_input_feeding was set to False during training, it remains False during decoding.
Am I missing something?
Preprocessing created a bunch of files. Which of these files are data params, and which data params are required?
--source_vocabulary : Path to source vocabulary
--target_vocabulary : Path to target vocabulary
--source_train_data : Path to source training data
--target_train_data : Path to target training data
--source_valid_data : Path to source validation data
--target_valid_data : Path to target validation data
Can you show how to train a model using train.py with all the data params required for training?
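As a sketch of how the six data flags listed above fit together, the snippet below assembles a `train.py` command line. The flag names come from the list above; every file path is a hypothetical placeholder and should be replaced with the files that the preprocessing step actually produced.

```python
import sys

# Hypothetical paths: substitute the files generated by your preprocessing run.
data_params = [
    ("source_vocabulary", "path/to/source_vocabulary"),
    ("target_vocabulary", "path/to/target_vocabulary"),
    ("source_train_data", "path/to/source_train_data"),
    ("target_train_data", "path/to/target_train_data"),
    ("source_valid_data", "path/to/source_valid_data"),
    ("target_valid_data", "path/to/target_valid_data"),
]

def build_train_command(params, script="train.py"):
    """Return an argv list: [interpreter, script, '--flag', 'value', ...]."""
    cmd = [sys.executable or "python", script]
    for flag, value in params:
        cmd.extend(["--" + flag, value])
    return cmd

command = build_train_command(data_params)
print(" ".join(command))
# To launch training: import subprocess; subprocess.check_call(command)
```

Keeping the flags in one list makes it easy to see at a glance which of the generated files is passed to which parameter.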
When I test this code with the europarl-v7.1.4M.en dataset, the decoder output is incorrect: most lines are . I am confused.
I have solved it.
In the seq2seq_model.py file, I use a bi-directional GRU for the encoder, but I get an error.
More specifically, at line 391 I get the following error:
"TypeError: Tensor objects are not iterable when eager execution is not enabled. To iterate over this tensor use tf.map_fn."
I use TensorFlow 1.7. How can I solve this problem?
In addition, why is the last decoder cell initialized to the zero state rather than to the encoder's last state, as the preceding layers are?
Thanks in advance.
I am getting an error related to a mismatch between the state and output shapes, but I am unable to figure out the cause.
It would be really appreciated if someone could guide me. Thanks in advance.
I am using tensorflow-gpu==1.2.1 with a 1080 Ti graphics card.
The error is as follows:
ValueError: Shapes (8, 522) and (8, 512) are incompatible
The error occurs in the file "attention_wrapper.py", in the method named "call", at line 708:
cell_output, next_cell_state = self._cell(cell_inputs, cell_state)
I was able to figure out that the attention size is being added to the shape, which causes the mismatch,
but I have no idea how to fix it.
The code is below; the hyper-parameters are declared as follows (for test purposes).
```
import tensorflow as tf
from tensorflow.contrib.seq2seq.python.ops import attention_wrapper
from tensorflow.python.layers.core import Dense
from tensorflow.python.ops import array_ops

batch_size = 8
number_of_units_per_layer = 512
number_of_layers = 3
attn_size = 10

# build_single_cell() is defined elsewhere and is not shown in this post.
def build_decoder_cell(enc_output, enc_state, source_sequence_length, attn_size, batch_size):
    encoder_outputs = enc_output
    encoder_last_state = enc_state
    encoder_inputs_length = source_sequence_length

    attention_mechanism = attention_wrapper.LuongAttention(
        num_units=attn_size, memory=encoder_outputs,
        memory_sequence_length=encoder_inputs_length,
        scale=True,
        name='LuongAttention')

    # Building decoder_cell
    decoder_cell_list = [
        build_single_cell() for i in range(number_of_layers)]
    decoder_initial_state = encoder_last_state

    def attn_decoder_input_fn(inputs, attention):
        # if not self.attn_input_feeding:
        #     return inputs
        # Essential when use_residual=True
        _input_layer = Dense(number_of_units_per_layer, dtype=tf.float32,
                             name='attn_input_feeding')
        return _input_layer(array_ops.concat([inputs, attention], -1))

    # AttentionWrapper wraps RNNCell with the attention_mechanism
    # Note: We implement Attention mechanism only on the top decoder layer
    decoder_cell_list[-1] = attention_wrapper.AttentionWrapper(
        cell=decoder_cell_list[-1],
        attention_mechanism=attention_mechanism,
        attention_layer_size=attn_size,
        # cell_input_fn=attn_decoder_input_fn,
        initial_cell_state=encoder_last_state[-1],
        alignment_history=False,
        name='Attention_Wrapper')

    # To be compatible with AttentionWrapper, the encoder last state
    # of the top layer should be converted into the AttentionWrapperState form
    # We can easily do this by calling AttentionWrapper.zero_state
    # Also if beamsearch decoding is used, the batch_size argument in .zero_state
    # should be ${decoder_beam_width} times the original batch_size
    # batch_size = self.batch_size if not self.use_beamsearch_decode \
    #     else self.batch_size * self.beam_width
    initial_state = [state for state in encoder_last_state]
    initial_state[-1] = decoder_cell_list[-1].zero_state(
        batch_size=batch_size, dtype=tf.float32)
    decoder_initial_state = tuple(initial_state)

    return tf.contrib.rnn.MultiRNNCell(decoder_cell_list), decoder_initial_state
```
Thank you once again.
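One possible reading of the numbers in this report: with `cell_input_fn` commented out, `AttentionWrapper` falls back to its default, which concatenates the cell inputs with the previous attention vector, so the wrapped cell receives 512 + 10 = 522 features while it outputs 512. If the wrapped cell includes a residual connection (an assumption, since `build_single_cell` is not shown), its input and output widths must match, which would explain the error. The plain-Python sketch below only reproduces that arithmetic; the function names are hypothetical.

```python
def default_cell_input_width(input_width, attention_width):
    """Width after the default concat([inputs, attention], -1)."""
    return input_width + attention_width

def projected_cell_input_width(input_width, attention_width, projection_units):
    """Width when a Dense(projection_units) layer follows the concat."""
    _ = input_width + attention_width  # concat width, consumed by the Dense layer
    return projection_units

num_units = 512
attn_size = 10

without_projection = default_cell_input_width(num_units, attn_size)
with_projection = projected_cell_input_width(num_units, attn_size, num_units)

# A residual connection adds the cell input to the cell output,
# so both must have the same width.
print(without_projection == num_units)  # mismatch: 522 vs 512
print(with_projection == num_units)
```

Under that reading, re-enabling the commented-out `cell_input_fn=attn_decoder_input_fn` line would project the concatenated input back to 512 before it reaches the cell.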