jayparks / tf-seq2seq
Stars: 391 · Watchers: 25 · Forks: 109 · Size: 123 KB

Sequence to sequence learning using TensorFlow.

Python 51.01% Jupyter Notebook 13.55% Perl 18.75% Shell 1.29% Smalltalk 1.24% Emacs Lisp 11.15% JavaScript 0.55% NewLisp 1.04% Ruby 1.08% Slash 0.23% SystemVerilog 0.12%
tensorflow seq2seq sequence-to-sequence neural-machine-translation nmt encoder-decoder machine-learning deep-learning neural-network natural-language-processing

tf-seq2seq's People

Contributors

germey, jayparks

tf-seq2seq's Issues

ResourceExhausted error

Training fails with a ResourceExhausted error. I decreased the hidden units, the batch size, and num_enc/dec_units, but nothing changed. TensorFlow does detect the GPU:
2017-12-15 17:50:40.266983: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: GeForce GTX 980 Ti major: 5 minor: 2 memoryClockRate(GHz): 1.076
pciBusID: 0000:01:00.0
totalMemory: 5.93GiB freeMemory: 5.83GiB
2017-12-15 17:50:40.267067: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 980 Ti, pci bus id: 0000:01:00.0, compute capability: 5.2)
building model..
building encoder..
building decoder and attention..
setting optimizer..
Created new model parameters..
Training..

After this, however, the following error traceback appears:

2017-12-15 17:45:48.103510: W tensorflow/core/common_runtime/bfc_allocator.cc:277] ****************************************************************************************************
2017-12-15 17:45:48.103914: W tensorflow/core/framework/op_kernel.cc:1192] Resource exhausted: OOM when allocating tensor with shape[500,15000]
2017-12-15 17:45:48.104030: W tensorflow/core/framework/op_kernel.cc:1192] Resource exhausted: OOM when allocating tensor with shape[500,15000]
[[Node: decoder/gradients/decoder/decoder/while/Select_1_grad/zeros_like = ZerosLike[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](decoder/gradients/decoder/decoder/while/Select_1_grad/zeros_like/Enter, ^decoder/gradients/Sub)]]
Traceback (most recent call last):
File "train.py", line 227, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "train.py", line 223, in main
train()
File "train.py", line 149, in train
decoder_inputs=target, decoder_inputs_length=target_len)
File "/DATA/USERS/sai/residual/tf-seq2seq/seq2seq_model.py", line 470, in train
outputs = sess.run(output_feed, input_feed)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 889, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1120, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1317, in _do_run
options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1336, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[500,15000]
[[Node: decoder/gradients/decoder/decoder/while/Select_1_grad/zeros_like = ZerosLike[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](decoder/gradients/decoder/decoder/while/Select_1_grad/zeros_like/Enter, ^decoder/gradients/Sub)]]
[[Node: decoder/gradients/decoder/decoder/while/BasicDecoderStep/TrainingHelperNextInputs/cond/Merge_grad/cond_grad/_191 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1168_decoder/gradients/decoder/decoder/while/BasicDecoderStep/TrainingHelperNextInputs/cond/Merge_grad/cond_grad", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

Caused by op u'decoder/gradients/decoder/decoder/while/Select_1_grad/zeros_like', defined at:
File "train.py", line 227, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "train.py", line 223, in main
train()
File "train.py", line 125, in train
model = create_model(sess, FLAGS)
File "train.py", line 74, in create_model
model = Seq2SeqModel(config, 'train')
File "/DATA/USERS/sai/residual/tf-seq2seq/seq2seq_model.py", line 68, in __init__
self.build_model()
File "/DATA/USERS/sai/residual/tf-seq2seq/seq2seq_model.py", line 77, in build_model
self.build_decoder()
File "/DATA/USERS/sai/residual/tf-seq2seq/seq2seq_model.py", line 235, in build_decoder
self.init_optimizer()
File "/DATA/USERS/sai/residual/tf-seq2seq/seq2seq_model.py", line 412, in init_optimizer
gradients = tf.gradients(self.loss, trainable_params)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py", line 581, in gradients
grad_scope, op, func_call, lambda: grad_fn(op, *out_grads))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py", line 353, in _MaybeCompile
return grad_fn() # Exit early
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py", line 581, in <lambda>
grad_scope, op, func_call, lambda: grad_fn(op, *out_grads))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/math_grad.py", line 907, in _SelectGrad
zeros = array_ops.zeros_like(x)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/array_ops.py", line 1495, in zeros_like
return gen_array_ops._zeros_like(tensor, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 5960, in _zeros_like
"ZerosLike", x=x, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

...which was originally created as op u'decoder/decoder/while/Select_1', defined at:
File "train.py", line 227, in <module>
tf.app.run()
[elided 5 identical lines from previous traceback]
File "/DATA/USERS/sai/residual/tf-seq2seq/seq2seq_model.py", line 77, in build_model
self.build_decoder()
File "/DATA/USERS/sai/residual/tf-seq2seq/seq2seq_model.py", line 210, in build_decoder
maximum_iterations=max_decoder_length))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/seq2seq/python/ops/decoder.py", line 286, in dynamic_decode
swap_memory=swap_memory)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 2816, in while_loop
result = loop_context.BuildLoop(cond, body, loop_vars, shape_invariants)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 2640, in BuildLoop
pred, body, original_loop_vars, loop_vars, shape_invariants)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 2590, in _BuildLoop
body_result = body(*packed_vars_for_body)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/seq2seq/python/ops/decoder.py", line 253, in body
zero_outputs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/nest.py", line 413, in map_structure
structure[0], [func(*x) for x in entries])
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/seq2seq/python/ops/decoder.py", line 251, in <lambda>
lambda out, zero: array_ops.where(finished, zero, out),
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/array_ops.py", line 2441, in where
return gen_math_ops._select(condition=condition, t=x, e=y, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_math_ops.py", line 3988, in _select
"Select", condition=condition, t=t, e=e, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[500,15000]
[[Node: decoder/gradients/decoder/decoder/while/Select_1_grad/zeros_like = ZerosLike[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](decoder/gradients/decoder/decoder/while/Select_1_grad/zeros_like/Enter, ^decoder/gradients/Sub)]]
[[Node: decoder/gradients/decoder/decoder/while/BasicDecoderStep/TrainingHelperNextInputs/cond/Merge_grad/cond_grad/_191 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1168_decoder/gradients/decoder/decoder/while/BasicDecoderStep/TrainingHelperNextInputs/cond/Merge_grad/cond_grad", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
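
For a rough sense of scale, the failing allocation can be sized by hand. The sketch below assumes float32 elements (the node reports `T=DT_FLOAT`) and assumes that one tensor of this shape is kept per decoder step for backpropagation. The step count of 50 is a hypothetical value, not something taken from the log.

```python
# Back-of-the-envelope size of the tensor named in the OOM message.
# Assumption: float32 elements (4 bytes), matching T=DT_FLOAT in the log.
BYTES_PER_FLOAT32 = 4


def tensor_bytes(shape, bytes_per_element=BYTES_PER_FLOAT32):
    """Return the memory footprint in bytes of a dense tensor of `shape`."""
    total = bytes_per_element
    for dim in shape:
        total *= dim
    return total


def to_mib(num_bytes):
    """Convert bytes to mebibytes."""
    return num_bytes / (1024.0 ** 2)


one_tensor = tensor_bytes([500, 15000])  # shape from the error message
hypothetical_steps = 50                  # assumed decoder length, not from the log
per_sequence = one_tensor * hypothetical_steps

print("one tensor: %.1f MiB" % to_mib(one_tensor))
print("%d steps:   %.1f MiB" % (hypothetical_steps, to_mib(per_sequence)))
```

If the second dimension is the target vocabulary size (an assumption), then shrinking the vocabulary or the maximum decoder length may reduce memory more than shrinking the hidden units.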

representation of the end token?

I'm a beginner in deep learning. May I ask whether the following code appends decoder_end_token to every decoder target, with the end token represented as a column full of 1s?
If so, when the sequence loss is computed, could the cross-entropy between the end token and the corresponding y_hat be very large?
decoder_end_token = tf.ones(shape=[self.batch_size, 1], dtype=tf.int32) * data_utils.end_token
self.decoder_targets_train = tf.concat([self.decoder_inputs,decoder_end_token], axis=1)
self.loss = seq2seq.sequence_loss(logits=self.decoder_logits_train, targets=self.decoder_targets_train, weights=masks, average_across_timesteps=True, average_across_batch=True,)
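
A plain-Python sketch of what the two lines above compute may help here. Multiplying a column of ones by `data_utils.end_token` yields a column filled with the end token's integer id, so the target at that step is a class index rather than a vector of 1s. The id value `1` and the toy logits below are hypothetical, chosen only for illustration.

```python
import math

END_TOKEN_ID = 1  # hypothetical id; the real value comes from data_utils.end_token


def append_end_token(batch, end_token_id=END_TOKEN_ID):
    """Mimic tf.ones([batch_size, 1]) * end_token followed by tf.concat(axis=1)."""
    return [row + [end_token_id] for row in batch]


def sparse_cross_entropy(logits, target_id):
    """Cross-entropy of one time step: -log softmax(logits)[target_id]."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum_exp - logits[target_id]


targets = append_end_token([[5, 7, 9], [4, 6, 8]])
print(targets)  # every row now ends with END_TOKEN_ID

# The loss at the end-token step depends on how much probability the model
# assigns to the end token id, exactly as for any other target id.
confident = [0.0, 6.0, 0.0, 0.0]
uncertain = [0.0, 0.0, 0.0, 0.0]
print(sparse_cross_entropy(confident, END_TOKEN_ID))
print(sparse_cross_entropy(uncertain, END_TOKEN_ID))
```

Under this reading, the end-token step is not inherently large in loss. It is scored like any other position and shrinks as the model learns to predict the end of the sequence.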

Does the BeamSearchDecoder work well?

Hi, I am also using r1.2 to implement a beam search decoder, but I don't get correct results, although greedy search works well. Did you get correct results with the beam search decoder? Thanks.

Does it support different depth for encoder and decoder

Hi,
While reviewing this code, I found that the encoder and decoder must have the same depth,
because " self.depth = config['depth']" is used in constructing both the encoder and the decoder.
When I tried to set different depths, I got the following error:


Traceback (most recent call last):
File "train.py", line 301, in <module>
tf.app.run()
File "/home/aldy/.local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "train.py", line 297, in main
train()
File "train.py", line 145, in train
model = create_model(sess, FLAGS)
File "train.py", line 82, in create_model
model = Seq2SeqModel(config, 'train')
File "/home/aldy/work/nmt/tf-seq/tf-seq2seq-master/seq2seq_model.py", line 72, in __init__
self.build_model()
File "/home//aldy/work/nmt/tf-seq/tf-seq2seq-master/seq2seq_model.py", line 81, in build_model
self.build_decoder()
File "/home/aldy/work/nmt/tf-seq/tf-seq2seq-master/seq2seq_model.py", line 287, in build_decoder
maximum_iterations=max_decoder_length))
File "/home//aldy/.local/lib/python2.7/site-packages/tensorflow/contrib/seq2seq/python/ops/decoder.py", line 286, in dynamic_decode
swap_memory=swap_memory)
File "/home/aldy/.local/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2816, in while_loop
result = loop_context.BuildLoop(cond, body, loop_vars, shape_invariants)
File "/home/aldy/.local/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2640, in BuildLoop
pred, body, original_loop_vars, loop_vars, shape_invariants)
File "/home//aldy/.local/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2590, in _BuildLoop
body_result = body(*packed_vars_for_body)
File "/home//aldy/.local/lib/python2.7/site-packages/tensorflow/contrib/seq2seq/python/ops/decoder.py", line 234, in body
decoder_finished) = decoder.step(time, inputs, state)
File "/home/aldy/.local/lib/python2.7/site-packages/tensorflow/contrib/seq2seq/python/ops/basic_decoder.py", line 138, in step
cell_outputs, cell_state = self._cell(inputs, state)
File "/home/aldy/.local/lib/python2.7/site-packages/tensorflow/python/ops/rnn_cell_impl.py", line 183, in call
return super(RNNCell, self).call(inputs, state)
File "/home/aldy/.local/lib/python2.7/site-packages/tensorflow/python/layers/base.py", line 575, in call
outputs = self.call(inputs, *args, **kwargs)
File "/home/aldy/.local/lib/python2.7/site-packages/tensorflow/python/ops/rnn_cell_impl.py", line 1066, in call
cur_inp, new_state = cell(cur_inp, cur_state)
File "/home/aldy/.local/lib/python2.7/site-packages/tensorflow/python/ops/rnn_cell_impl.py", line 951, in call
outputs, new_state = self._cell(inputs, state, scope=scope)
File "/home/aldy/.local/lib/python2.7/site-packages/tensorflow/python/ops/rnn_cell_impl.py", line 891, in call
output, new_state = self._cell(inputs, state, scope)
File "/home/aldy/.local/lib/python2.7/site-packages/tensorflow/python/ops/rnn_cell_impl.py", line 183, in call
return super(RNNCell, self).call(inputs, state)
File "/home/aldy/.local/lib/python2.7/site-packages/tensorflow/python/layers/base.py", line 575, in call
outputs = self.call(inputs, *args, **kwargs)
File "/home/aldy/.local/lib/python2.7/site-packages/tensorflow/python/ops/rnn_cell_impl.py", line 591, in call
(c_prev, m_prev) = state
ValueError: too many values to unpack


Anyway, do you have any suggestions for solving this problem?
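
One plausible reading of the error is that the decoder's multi-layer cell receives the encoder's final state, which has one entry per encoder layer, so with mismatched depths an LSTM layer is handed something other than a (c, h) pair. A minimal plain-Python sketch of one possible workaround follows. The helper name `adapt_encoder_states` and its padding policy are hypothetical and not part of the repository.

```python
def adapt_encoder_states(encoder_states, decoder_depth, zero_state):
    """Return a tuple with exactly `decoder_depth` per-layer states.

    encoder_states: tuple of per-layer final states from the encoder.
    zero_state:     placeholder for decoder layers with no encoder counterpart.
    """
    states = list(encoder_states)
    if len(states) >= decoder_depth:
        # Deeper (or equal) encoder: keep only the top-most encoder layers.
        states = states[-decoder_depth:]
    else:
        # Shallower encoder: pad the extra upper decoder layers with zero states.
        states = states + [zero_state] * (decoder_depth - len(states))
    return tuple(states)


# Toy (c, h) pairs standing in for LSTM state tuples.
encoder = (("c0", "h0"), ("c1", "h1"), ("c2", "h2"))
zero = ("0", "0")

print(adapt_encoder_states(encoder, 2, zero))
print(adapt_encoder_states(encoder[:1], 3, zero))
```

In a real model the placeholder would be each decoder cell's own zero state, and the encoder and decoder would still need matching hidden sizes per layer.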

BPE and validation data

Can you clarify how you obtain the validation data, and why BPE is used in such cases?

tf.app.flags.DEFINE_string('source_valid_data', 'data/newstest2012.bpe.de', 'Path to source validation data')
tf.app.flags.DEFINE_string('target_valid_data', 'data/newstest2012.bpe.fr', 'Path to target validation data')
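
On the BPE part of the question: byte-pair encoding splits rare words into subword units so that a fixed vocabulary can cover open-vocabulary text, and validation data generally has to be segmented with the same merge table as the training data. The sketch below applies a merge table to a single word. The table `MERGES` is made up for illustration; real tables are learned from the training corpus.

```python
END_OF_WORD = "</w>"

# Hypothetical merge table, highest priority first.
MERGES = [("l", "o"), ("lo", "w"), ("e", "r"), ("er", END_OF_WORD)]


def apply_bpe(word, merges):
    """Segment `word` by repeatedly applying the highest-priority merge."""
    ranks = {pair: rank for rank, pair in enumerate(merges)}
    symbols = list(word) + [END_OF_WORD]
    while len(symbols) > 1:
        pairs = [(symbols[i], symbols[i + 1]) for i in range(len(symbols) - 1)]
        best = min(pairs, key=lambda pair: ranks.get(pair, float("inf")))
        if best not in ranks:
            break  # no learned merge applies any more
        merged = []
        i = 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                merged.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


print(apply_bpe("lower", MERGES))
print(apply_bpe("low", MERGES))
```

Because the model's vocabulary consists of these subword units, feeding it validation text that was not segmented the same way would likely produce many unknown tokens.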

rnn moves back into core layer, should update for tf1.2?

Hi, this repo is useful. I haven't been able to find a seq2seq implementation that trains and runs successfully while making full use of the newest TensorFlow APIs. Looking at the code, it seems the rnn package has moved back into the core layers instead of contrib. Was this tested on TensorFlow 1.2? I'm reporting this so the repo can keep up with the latest release.

Could you provide the sample data?

Thank you very much for your contribution.
I am not familiar with NMT.
Could you provide a download link for sample.src and sample.trg?

attn_input_feeding

The documentation suggests setting attn_input_feeding=True during decoding.
But in the code, I don't see any place where it is set to True during decoding.

The configuration is read entirely from the dump created during training, and since it was set to False during training, attn_input_feeding remains False during decoding.

Am I missing something?
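
To make the question concrete, here is a minimal sketch of how a decode-time override could be layered on top of a configuration dumped at training time. The function `load_decode_config` and the dump contents are hypothetical; the repository's actual loading code may differ.

```python
import json


def load_decode_config(dumped_json, overrides):
    """Merge decode-time overrides into the configuration saved at training time."""
    config = json.loads(dumped_json)
    config.update(overrides)  # decode-time values win over the dump
    return config


# Hypothetical dump contents; the real file is written by the training script.
dumped = json.dumps({"attn_input_feeding": False, "hidden_units": 1024})
config = load_decode_config(dumped, {"attn_input_feeding": True})
print(config["attn_input_feeding"])
```

Whether flipping this flag after training is compatible with the saved checkpoint depends on the model code and is not verified here.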

what are data params?

Preprocessing created a bunch of files. Which of these files are the data params, and which data params are required?

--source_vocabulary : Path to source vocabulary
--target_vocabulary : Path to target vocabulary
--source_train_data : Path to source training data
--target_train_data : Path to target training data
--source_valid_data : Path to source validation data
--target_valid_data : Path to target validation data

Can you show how to train a model using train.py with all the data params required for training?
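
As a starting point, the six flags listed above can be assembled into a train.py invocation as sketched below. Every file name here is a hypothetical placeholder, since the names produced by preprocessing are not given in this thread. Only the flag names come from the list above.

```python
import sys

# Hypothetical file names; substitute whatever the preprocessing step produced.
DATA_PARAMS = [
    ("source_vocabulary", "data/train.src.vocab"),
    ("target_vocabulary", "data/train.trg.vocab"),
    ("source_train_data", "data/train.src"),
    ("target_train_data", "data/train.trg"),
    ("source_valid_data", "data/valid.src"),
    ("target_valid_data", "data/valid.trg"),
]


def build_train_command(params, script="train.py"):
    """Return an argv list of the form [python, train.py, --flag=value, ...]."""
    argv = [sys.executable, script]
    for flag, path in params:
        argv.append("--%s=%s" % (flag, path))
    return argv


command = build_train_command(DATA_PARAMS)
print(" ".join(command))
# To launch training, pass `command` to subprocess.check_call.
```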

Get error when initializing decoder initial state

In seq2seq_model.py, I use a bi-directional GRU for the encoder, but I get an error.
More specifically, at line 391 I get the following error:

"TypeError: Tensor objects are not iterable when eager execution is not enabled. To iterate over this tensor use tf.map_fn."

I use TensorFlow 1.7. How can I solve this problem?

In addition, why does the last decoder cell have to be initialized to the zero state rather than to the encoder's last state, as the preceding layers are?

Thanks in advance

can this code work?

I find this seq2seq project quite helpful, and it is based on the latest TensorFlow 1.2. May I ask whether your source code really works? I noticed that an error is raised in your notebook.

[Screenshot, 2017-06-03 19:49:31]

Problem using attention wrapper.

I am getting an error about a mismatch between state and output shapes, but I am unable to figure out the cause.
I would really appreciate it if someone could guide me. Thanks in advance.
I am using tensorflow-gpu==1.2.1 with a 1080 Ti graphics card.

Error is as below:
ValueError: Shapes (8, 522) and (8, 512) are incompatible

The error occurs in the file "attention_wrapper.py", in the method named "call", at line 708:

cell_output, next_cell_state = self._cell(cell_inputs, cell_state)

I was able to figure out that the attention_size is being added to the shape, which causes the mismatch,
but I have no idea how to fix it.
The code is below; the hyper-parameters are declared as follows (for testing purposes).
`
batch_size = 8
number_of_units_per_layer = 512
number_of_layers = 3
attn_size = 10

def build_decoder_cell(enc_output, enc_state, source_sequence_length, attn_size, batch_size):

    encoder_outputs = enc_output
    encoder_last_state = enc_state
    encoder_inputs_length = source_sequence_length

    attention_mechanism = attention_wrapper.LuongAttention(
        num_units=attn_size, memory=encoder_outputs,
        memory_sequence_length=encoder_inputs_length,
        scale=True,
        name='LuongAttention')

    # Building decoder_cell
    decoder_cell_list = [
        build_single_cell() for i in range(num_layers)]

    decoder_initial_state = encoder_last_state

    def attn_decoder_input_fn(inputs, attention):
        #if not self.attn_input_feeding:
        #    return inputs

        # Essential when use_residual=True
        _input_layer = Dense(size, dtype=tf.float32,
                             name='attn_input_feeding')
        return _input_layer(array_ops.concat([inputs, attention], -1))

    # AttentionWrapper wraps RNNCell with the attention_mechanism
    # Note: We implement Attention mechanism only on the top decoder layer
    decoder_cell_list[-1] = attention_wrapper.AttentionWrapper(
        cell=decoder_cell_list[-1],
        attention_mechanism=attention_mechanism,
        attention_layer_size=attn_size,
        #cell_input_fn=attn_decoder_input_fn,
        initial_cell_state=encoder_last_state[-1],
        alignment_history=False,
        name='Attention_Wrapper')

    # To be compatible with AttentionWrapper, the encoder last state
    # of the top layer should be converted into the AttentionWrapperState form
    # We can easily do this by calling AttentionWrapper.zero_state

    # Also if beamsearch decoding is used, the batch_size argument in .zero_state
    # should be ${decoder_beam_width} times the original batch_size
    #batch_size = self.batch_size if not self.use_beamsearch_decode \
    #    else self.batch_size * self.beam_width
    initial_state = [state for state in encoder_last_state]

    initial_state[-1] = decoder_cell_list[-1].zero_state(
        batch_size=batch_size, dtype=tf.float32)
    decoder_initial_state = tuple(initial_state)

    return tf.contrib.rnn.MultiRNNCell(decoder_cell_list), decoder_initial_state`

Thank you once again.
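
The numbers in the error line up with the hyper-parameters: 522 = 512 + 10. When no cell_input_fn is passed, AttentionWrapper concatenates the inputs with the attention vector before calling the wrapped cell. The shape arithmetic below is a sketch under two assumptions that the snippet does not confirm: that the input reaching the top layer is 512 wide, and that build_single_cell adds a residual connection, which would require the cell's input and output widths to match.

```python
def concat_width(input_width, attention_width):
    """Width of [inputs; attention], i.e. what concatenation feeds to the cell."""
    return input_width + attention_width


def residual_ok(cell_input_width, cell_output_width):
    """A residual connection adds input to output, so the widths must match."""
    return cell_input_width == cell_output_width


num_units = 512  # number_of_units_per_layer in the snippet
attn_size = 10   # attn_size in the snippet

# Without a projecting cell_input_fn, the cell sees the raw concatenation.
raw = concat_width(num_units, attn_size)
print(raw, residual_ok(raw, num_units))

# With the commented-out Dense projection restored and sized to num_units
# (an assumption about what `size` was meant to be), the widths agree again.
projected = num_units
print(projected, residual_ok(projected, num_units))
```

If those assumptions hold, re-enabling cell_input_fn=attn_decoder_input_fn with a Dense layer of 512 units would be one way to remove the mismatch.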
