jayparks / tf-seq2seq

Sequence to sequence learning using TensorFlow.

Python 51.01% Jupyter Notebook 13.55% Perl 18.75% Shell 1.29% Smalltalk 1.24% Emacs Lisp 11.15% JavaScript 0.55% NewLisp 1.04% Ruby 1.08% Slash 0.23% SystemVerilog 0.12%

tensorflow seq2seq sequence-to-sequence neural-machine-translation nmt encoder-decoder machine-learning deep-learning neural-network natural-language-processing

tf-seq2seq's Issues

ResourceExhausted error

Training fails with a ResourceExhausted error. I decreased the hidden units, the batch size, and num_enc/dec_units, but nothing changed. TensorFlow is able to detect the GPU:
2017-12-15 17:50:40.266983: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: GeForce GTX 980 Ti major: 5 minor: 2 memoryClockRate(GHz): 1.076
pciBusID: 0000:01:00.0
totalMemory: 5.93GiB freeMemory: 5.83GiB
2017-12-15 17:50:40.267067: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 980 Ti, pci bus id: 0000:01:00.0, compute capability: 5.2)
building model..
building encoder..
building decoder and attention..
setting optimizer..
Created new model parameters..
Training..

After this, the error traceback is as follows:

2017-12-15 17:45:48.103510: W tensorflow/core/common_runtime/bfc_allocator.cc:277] ****************************************************************************************************
2017-12-15 17:45:48.103914: W tensorflow/core/framework/op_kernel.cc:1192] Resource exhausted: OOM when allocating tensor with shape[500,15000]
2017-12-15 17:45:48.104030: W tensorflow/core/framework/op_kernel.cc:1192] Resource exhausted: OOM when allocating tensor with shape[500,15000]
[[Node: decoder/gradients/decoder/decoder/while/Select_1_grad/zeros_like = ZerosLike[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](decoder/gradients/decoder/decoder/while/Select_1_grad/zeros_like/Enter, ^decoder/gradients/Sub)]]
Traceback (most recent call last):
File "train.py", line 227, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "train.py", line 223, in main
train()
File "train.py", line 149, in train
decoder_inputs=target, decoder_inputs_length=target_len)
File "/DATA/USERS/sai/residual/tf-seq2seq/seq2seq_model.py", line 470, in train
outputs = sess.run(output_feed, input_feed)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 889, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1120, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1317, in _do_run
options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1336, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[500,15000]
[[Node: decoder/gradients/decoder/decoder/while/Select_1_grad/zeros_like = ZerosLike[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](decoder/gradients/decoder/decoder/while/Select_1_grad/zeros_like/Enter, ^decoder/gradients/Sub)]]
[[Node: decoder/gradients/decoder/decoder/while/BasicDecoderStep/TrainingHelperNextInputs/cond/Merge_grad/cond_grad/_191 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1168_decoder/gradients/decoder/decoder/while/BasicDecoderStep/TrainingHelperNextInputs/cond/Merge_grad/cond_grad", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

Caused by op u'decoder/gradients/decoder/decoder/while/Select_1_grad/zeros_like', defined at:
File "train.py", line 227, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "train.py", line 223, in main
train()
File "train.py", line 125, in train
model = create_model(sess, FLAGS)
File "train.py", line 74, in create_model
model = Seq2SeqModel(config, 'train')
File "/DATA/USERS/sai/residual/tf-seq2seq/seq2seq_model.py", line 68, in __init__
self.build_model()
File "/DATA/USERS/sai/residual/tf-seq2seq/seq2seq_model.py", line 77, in build_model
self.build_decoder()
File "/DATA/USERS/sai/residual/tf-seq2seq/seq2seq_model.py", line 235, in build_decoder
self.init_optimizer()
File "/DATA/USERS/sai/residual/tf-seq2seq/seq2seq_model.py", line 412, in init_optimizer
gradients = tf.gradients(self.loss, trainable_params)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py", line 581, in gradients
grad_scope, op, func_call, lambda: grad_fn(op, *out_grads))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py", line 353, in _MaybeCompile
return grad_fn() # Exit early
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py", line 581, in <lambda>
grad_scope, op, func_call, lambda: grad_fn(op, *out_grads))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/math_grad.py", line 907, in _SelectGrad
zeros = array_ops.zeros_like(x)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/array_ops.py", line 1495, in zeros_like
return gen_array_ops._zeros_like(tensor, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 5960, in _zeros_like
"ZerosLike", x=x, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

...which was originally created as op u'decoder/decoder/while/Select_1', defined at:
File "train.py", line 227, in <module>
tf.app.run()
[elided 5 identical lines from previous traceback]
File "/DATA/USERS/sai/residual/tf-seq2seq/seq2seq_model.py", line 77, in build_model
self.build_decoder()
File "/DATA/USERS/sai/residual/tf-seq2seq/seq2seq_model.py", line 210, in build_decoder
maximum_iterations=max_decoder_length))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/seq2seq/python/ops/decoder.py", line 286, in dynamic_decode
swap_memory=swap_memory)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 2816, in while_loop
result = loop_context.BuildLoop(cond, body, loop_vars, shape_invariants)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 2640, in BuildLoop
pred, body, original_loop_vars, loop_vars, shape_invariants)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 2590, in _BuildLoop
body_result = body(*packed_vars_for_body)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/seq2seq/python/ops/decoder.py", line 253, in body
zero_outputs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/nest.py", line 413, in map_structure
structure[0], [func(*x) for x in entries])
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/seq2seq/python/ops/decoder.py", line 251, in <lambda>
lambda out, zero: array_ops.where(finished, zero, out),
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/array_ops.py", line 2441, in where
return gen_math_ops._select(condition=condition, t=x, e=y, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_math_ops.py", line 3988, in _select
"Select", condition=condition, t=t, e=e, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[500,15000]
[[Node: decoder/gradients/decoder/decoder/while/Select_1_grad/zeros_like = ZerosLike[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](decoder/gradients/decoder/decoder/while/Select_1_grad/zeros_like/Enter, ^decoder/gradients/Sub)]]
[[Node: decoder/gradients/decoder/decoder/while/BasicDecoderStep/TrainingHelperNextInputs/cond/Merge_grad/cond_grad/_191 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1168_decoder/gradients/decoder/decoder/while/BasicDecoderStep/TrainingHelperNextInputs/cond/Merge_grad/cond_grad", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
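
For scale, the shape in the error message can be turned into a rough memory estimate. The sketch below assumes float32 elements (the log shows DT_FLOAT) and, purely as an illustration, a hypothetical 50 decoding steps whose intermediate tensors are all kept for backpropagation. Neither `tensor_bytes` nor the step count comes from the repository.

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Return the size in bytes of a dense tensor (4 bytes per float32)."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


# Shape taken from the error message: shape[500,15000], dtype DT_FLOAT.
per_tensor = tensor_bytes([500, 15000])

# Hypothetical: one such tensor retained per decoder step for the backward pass.
assumed_decoder_steps = 50
retained = per_tensor * assumed_decoder_steps

print("one tensor: %.1f MiB" % (per_tensor / 2.0 ** 20))
print("%d steps:   %.2f GiB" % (assumed_decoder_steps, retained / 2.0 ** 30))
```

Under these assumptions a single tensor is about 28.6 MiB, and fifty of them take about 1.4 GiB of the 5.93 GiB the card reports, before counting weights and other activations.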

decoder_initial_state

Is this a leftover from a previous version of the code?
https://github.com/JayParks/tf-seq2seq/blob/master/seq2seq_model.py#L359

It gets overwritten at #L393.

If attention were not used, would decoder_initial_state simply be (a tiled) encoder_last_state?

representation of the end token?

I'm a beginner in deep learning. May I ask whether the following code appends decoder_end_token to all the decoder targets, with the end token built from a tensor full of 1s?
When we calculate the sequence loss, could the cross-entropy between the end_token and the corresponding y_hat be very large?
decoder_end_token = tf.ones(shape=[self.batch_size, 1], dtype=tf.int32) * data_utils.end_token
self.decoder_targets_train = tf.concat([self.decoder_inputs,decoder_end_token], axis=1)
self.loss = seq2seq.sequence_loss(logits=self.decoder_logits_train, targets=self.decoder_targets_train, weights=masks, average_across_timesteps=True, average_across_batch=True,)
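
As a reading aid, the snippet quoted above can be mimicked in plain Python. `tf.ones(...) * data_utils.end_token` yields a column filled with the end token's integer id, not with literal ones, and the cross-entropy depends on the probability the model assigns to that id, not on the id's numeric value. The id `0` below and both helper functions are assumptions for illustration; the real id is whatever `data_utils.end_token` defines.

```python
import math

END_TOKEN = 0  # assumed id; the repository defines the real one in data_utils


def append_end_token(batch, end_token=END_TOKEN):
    """Mimic tf.concat([decoder_inputs, decoder_end_token], axis=1)."""
    return [sequence + [end_token] for sequence in batch]


def cross_entropy(logits, target_id):
    """Softmax cross-entropy for one time step, computed stably."""
    peak = max(logits)
    log_normalizer = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_normalizer - logits[target_id]


targets = append_end_token([[5, 7, 9], [4, 4, 8]])
print(targets)  # every row now ends with END_TOKEN

# The loss is small when the model is confident in the end token...
print(cross_entropy([10.0, 0.0, 0.0], END_TOKEN))
# ...and larger when it is undecided, regardless of the id's numeric value.
print(cross_entropy([0.0, 0.0, 0.0], END_TOKEN))
```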

Can anybody provide guidance on how to run the code?

Many thanks!

Missing file 'data/europarl-v7.1.4M.de'

Hi,

I got a missing-file error. Was the file deleted?
No such file or directory: 'data/europarl-v7.1.4M.de'

Thanks.

How can I use pre-embedded data in this model?

I'm trying to use pre-embedded data as input, which means I don't want to use the model's embedding layer.
How should I do this? Any help is appreciated.

Does the BeamSearchDecoder work well?

Hi, I am also using r1.2 to implement a beam search decoder, but I didn't get correct results. Greedy search works well. Did you get correct results when using the beam search decoder? Thanks.

what's the format of sample_data.src and sample_data.trg?

Beam Search: Error in attn_decoder_input_fn in concat statement

https://github.com/JayParks/tf-seq2seq/blob/master/seq2seq_model.py#L368
It reports that dimension 0 of inputs and attention do not match (because we are tile_batching to batch_size * beam_width). Did you not get any error when running with beam_search?
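
The mismatch described here follows from how tiling changes the leading dimension. The sketch below imitates the row ordering of `tf.contrib.seq2seq.tile_batch` on plain Python lists; `tile_rows` is a hypothetical stand-in written for this illustration, not the library function.

```python
def tile_rows(rows, multiplier):
    """Repeat each batch entry `multiplier` times, keeping copies adjacent."""
    return [row for row in rows for _ in range(multiplier)]


batch_size, beam_width = 2, 3
encoder_outputs = [["a0", "a1"], ["b0", "b1"]]   # leading dimension: batch_size
tiled = tile_rows(encoder_outputs, beam_width)   # leading dimension: batch_size * beam_width

print(len(encoder_outputs), len(tiled))

# Anything concatenated with a tiled tensor along the last axis must share
# its leading dimension, so an untiled tensor of length batch_size will not fit.
assert len(tiled) == batch_size * beam_width
```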

Does it support different depth for encoder and decoder

Hi,
When I reviewed this code, I found that the depth of the encoder and decoder must be the same,
because "self.depth = config['depth']" is used in constructing both the encoder and the decoder.
When I tried to set different depths, I got the following error:

Traceback (most recent call last):
File "train.py", line 301, in <module>
tf.app.run()
File "/home/aldy/.local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "train.py", line 297, in main
train()
File "train.py", line 145, in train
model = create_model(sess, FLAGS)
File "train.py", line 82, in create_model
model = Seq2SeqModel(config, 'train')
File "/home/aldy/work/nmt/tf-seq/tf-seq2seq-master/seq2seq_model.py", line 72, in __init__
self.build_model()
File "/home//aldy/work/nmt/tf-seq/tf-seq2seq-master/seq2seq_model.py", line 81, in build_model
self.build_decoder()
File "/home/aldy/work/nmt/tf-seq/tf-seq2seq-master/seq2seq_model.py", line 287, in build_decoder
maximum_iterations=max_decoder_length))
File "/home//aldy/.local/lib/python2.7/site-packages/tensorflow/contrib/seq2seq/python/ops/decoder.py", line 286, in dynamic_decode
swap_memory=swap_memory)
File "/home/aldy/.local/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2816, in while_loop
result = loop_context.BuildLoop(cond, body, loop_vars, shape_invariants)
File "/home/aldy/.local/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2640, in BuildLoop
pred, body, original_loop_vars, loop_vars, shape_invariants)
File "/home//aldy/.local/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2590, in _BuildLoop
body_result = body(*packed_vars_for_body)
File "/home//aldy/.local/lib/python2.7/site-packages/tensorflow/contrib/seq2seq/python/ops/decoder.py", line 234, in body
decoder_finished) = decoder.step(time, inputs, state)
File "/home/aldy/.local/lib/python2.7/site-packages/tensorflow/contrib/seq2seq/python/ops/basic_decoder.py", line 138, in step
cell_outputs, cell_state = self._cell(inputs, state)
File "/home/aldy/.local/lib/python2.7/site-packages/tensorflow/python/ops/rnn_cell_impl.py", line 183, in call
return super(RNNCell, self).call(inputs, state)
File "/home/aldy/.local/lib/python2.7/site-packages/tensorflow/python/layers/base.py", line 575, in call
outputs = self.call(inputs, *args, **kwargs)
File "/home/aldy/.local/lib/python2.7/site-packages/tensorflow/python/ops/rnn_cell_impl.py", line 1066, in call
cur_inp, new_state = cell(cur_inp, cur_state)
File "/home/aldy/.local/lib/python2.7/site-packages/tensorflow/python/ops/rnn_cell_impl.py", line 951, in call
outputs, new_state = self._cell(inputs, state, scope=scope)
File "/home/aldy/.local/lib/python2.7/site-packages/tensorflow/python/ops/rnn_cell_impl.py", line 891, in call
output, new_state = self._cell(inputs, state, scope)
File "/home/aldy/.local/lib/python2.7/site-packages/tensorflow/python/ops/rnn_cell_impl.py", line 183, in call
return super(RNNCell, self).call(inputs, state)
File "/home/aldy/.local/lib/python2.7/site-packages/tensorflow/python/layers/base.py", line 575, in call
outputs = self.call(inputs, *args, **kwargs)
File "/home/aldy/.local/lib/python2.7/site-packages/tensorflow/python/ops/rnn_cell_impl.py", line 591, in call
(c_prev, m_prev) = state
ValueError: too many values to unpack

Anyway, do you have any suggestions for solving this problem?

BPE and validation data

Can you clarify how you obtain the validation data and why BPE is used in this case?

tf.app.flags.DEFINE_string('source_valid_data', 'data/newstest2012.bpe.de', 'Path to source validation data')
tf.app.flags.DEFINE_string('target_valid_data', 'data/newstest2012.bpe.fr', 'Path to target validation data')

rnn moves back into core layer, should update for tf1.2?

Hi, this repo is useful. I haven't been able to find another seq2seq implementation that trains and runs successfully while making full use of the newest TensorFlow APIs. I checked the code, and it seems the rnn package has moved back into the core layers instead of contrib. Was this tested on TensorFlow 1.2? I'm reporting this so the repo can keep up with the latest release.

Could you provide the sample data?

Thank you very much for your contribution.
I am not familiar with NMT.
Could you provide a link for downloading sample.src and sample.trg?

attn_input_feeding

The documentation suggests setting attn_input_feeding=True during decoding.
But in the code, I don't see any place where it is set to True during decoding.

The configuration is read entirely from the dump created during training, and since attn_input_feeding was set to False during training, it remains False during decoding.

Am I missing something?

what are data params?

Preprocessing created a bunch of files. Which of these files are the data params, and which data params are required?

--source_vocabulary : Path to source vocabulary
--target_vocabulary : Path to target vocabulary
--source_train_data : Path to source training data
--target_train_data : Path to target training data
--source_valid_data : Path to source validation data
--target_valid_data : Path to target validation data

Can you show how to train a model using train.py with all the data params required for training?
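
A minimal sketch of passing all six data parameters to train.py, built in Python so the flags can be checked before launching. The flag names are the ones listed above. Every file path is a hypothetical placeholder to be replaced with the files your preprocessing actually produced, and `build_train_command` is an illustrative helper, not part of the repository.

```python
DATA_FLAGS = [
    "source_vocabulary",
    "target_vocabulary",
    "source_train_data",
    "target_train_data",
    "source_valid_data",
    "target_valid_data",
]


def build_train_command(paths):
    """Return an argv list for train.py, failing early on a missing flag."""
    missing = [name for name in DATA_FLAGS if name not in paths]
    if missing:
        raise ValueError("missing data params: %s" % ", ".join(missing))
    command = ["python", "train.py"]
    for name in DATA_FLAGS:
        command += ["--" + name, paths[name]]
    return command


# Hypothetical file names; substitute the output of your own preprocessing.
example_paths = {
    "source_vocabulary": "data/train.src.vocab",
    "target_vocabulary": "data/train.trg.vocab",
    "source_train_data": "data/train.src",
    "target_train_data": "data/train.trg",
    "source_valid_data": "data/valid.src",
    "target_valid_data": "data/valid.trg",
}

command = build_train_command(example_paths)
print(" ".join(command))
# To launch for real: import subprocess; subprocess.check_call(command)
```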

when i test this code with the dataset of europarl-v7.1.4M.en , the decoder result is not true, most line is <unk>

When I test this code with the europarl-v7.1.4M.en dataset, the decoder output is wrong: most lines are <unk>. I am confused.

I have solved it.

Get error when initializing decoder initial state

In the seq2seq_model.py file,
I use a bi-directional GRU for the encoder, but I got an error.
More specifically, at line 391 I got the following error:

"TypeError: Tensor objects are not iterable when eager execution is not enabled. To iterate over this tensor use tf.map_fn."

I use TensorFlow 1.7. How can I solve this problem?

In addition, why do we have to initialize the last decoder cell to the zero state rather than to the encoder's last state, as is done for the preceding layers?

Thanks in advance

can this code work?

I found this seq2seq project, which is based on the latest TensorFlow 1.2, quite helpful. May I ask whether your source code really works? I noticed that an error is raised in your notebook.

unsupported operand type(s) for -: 'float' and 'Flag'
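
One possible cause, offered as an assumption rather than a diagnosis: in newer TensorFlow 1.x releases the flags module is backed by absl, and reading the internal flag dictionary returns `Flag` objects instead of raw values, so arithmetic on an entry fails until its `.value` is read. The `Flag` class below is a stand-in written for this sketch, and `dropout_rate` is a hypothetical key.

```python
class Flag(object):
    """Stand-in for a flag object that wraps its parsed value."""

    def __init__(self, name, value):
        self.name = name
        self.value = value


def unwrap_flags(flag_dict):
    """Replace Flag-like objects with their plain values; leave others alone."""
    return {key: getattr(item, "value", item) for key, item in flag_dict.items()}


raw = {"dropout_rate": Flag("dropout_rate", 0.3), "depth": 2}

try:
    1.0 - raw["dropout_rate"]
except TypeError as error:
    print(error)  # subtraction is not defined for the wrapper object

config = unwrap_flags(raw)
keep_prob = 1.0 - config["dropout_rate"]
print(keep_prob)
```

Unwrapping once, right after the flags are collected into a config dictionary, keeps the rest of the code working with plain numbers and strings.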

Problem using attention wrapper.

I am getting an error related to a mismatch between state and output, but I am unable to figure out the cause.
It would be really appreciated if someone could guide me. Thanks in advance.
I am using tensorflow-gpu==1.2.1 with a 1080 Ti GPU.

The error is as follows:
ValueError: Shapes (8, 522) and (8, 512) are incompatible

The error occurs in the file "attention_wrapper.py", in the method named "call", at line 708:

cell_output, next_cell_state = self._cell(cell_inputs, cell_state)

I was able to figure out that it is adding the attention_size to the shape and so there is a mismatch.
But I have no idea how to fix it.
The code is below, with the hyper-parameters declared at the top (for test purposes).
`
batch_size= 8
number_of_units_per_layer= 512
number_of_layers = 3
attn_size= 10
def build_decoder_cell(enc_output, enc_state, source_sequence_length, attn_size, batch_size):

encoder_outputs = enc_output
encoder_last_state = enc_state
encoder_inputs_length = source_sequence_length

attention_mechanism = attention_wrapper.LuongAttention(
        num_units=attn_size, memory=encoder_outputs,
        memory_sequence_length=encoder_inputs_length,
        scale=True,
        name='LuongAttention' )

# Building decoder_cell
decoder_cell_list = [
    build_single_cell() for i in range(num_layers)]

decoder_initial_state = encoder_last_state

def attn_decoder_input_fn(inputs, attention):
    #if not self.attn_input_feeding:
    #    return inputs

    # Essential when use_residual=True
    _input_layer = Dense(size, dtype=tf.float32,
                        name='attn_input_feeding')
    return _input_layer(array_ops.concat([inputs, attention], -1))


# AttentionWrapper wraps RNNCell with the attention_mechanism
# Note: We implement Attention mechanism only on the top decoder layer
decoder_cell_list[-1] = attention_wrapper.AttentionWrapper(
    cell=decoder_cell_list[-1],
    attention_mechanism=attention_mechanism,
    attention_layer_size=attn_size,
    #cell_input_fn=attn_decoder_input_fn,
    initial_cell_state=encoder_last_state[-1],
    alignment_history=False,
    name='Attention_Wrapper')

# To be compatible with AttentionWrapper, the encoder last state
# of the top layer should be converted into the AttentionWrapperState form
# We can easily do this by calling AttentionWrapper.zero_state

# Also if beamsearch decoding is used, the batch_size argument in .zero_state
# should be ${decoder_beam_width} times the original batch_size
#batch_size = self.batch_size if not self.use_beamsearch_decode \
#    else self.batch_size * self.beam_width
initial_state = [state for state in encoder_last_state]

initial_state[-1] = decoder_cell_list[-1].zero_state(
    batch_size=batch_size, dtype=tf.float32)
decoder_initial_state = tuple(initial_state)

return tf.contrib.rnn.MultiRNNCell(decoder_cell_list), decoder_initial_state`

Thank you once again.
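
The 522 in the shape error is consistent with 512 + 10: when no `cell_input_fn` is given, `AttentionWrapper` concatenates the decoder input with the previous attention vector before calling the wrapped cell. The sketch below is plain shape arithmetic under the assumption (not shown in the issue) that `build_single_cell` adds a residual connection, which needs the cell's input and output widths to match. Both helper functions are illustrative.

```python
def concat_width(input_width, attention_width):
    """Width after concatenating inputs and attention along the last axis."""
    return input_width + attention_width


def residual_widths_match(cell_input_width, cell_output_width):
    """A residual connection adds input to output, so the widths must be equal."""
    return cell_input_width == cell_output_width


num_units = 512   # number_of_units_per_layer in the issue
attn_size = 10    # attn_size in the issue

# Default behaviour: the top cell receives inputs and attention concatenated.
default_input = concat_width(num_units, attn_size)
print(default_input, residual_widths_match(default_input, num_units))

# With a projection back to num_units (what the commented-out
# attn_decoder_input_fn would do if `size` were num_units), the widths agree.
projected_input = num_units
print(projected_input, residual_widths_match(projected_input, num_units))
```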