Comments (7)
Hmm, you would need to know the ID of the tag.
from tensorflow-tutorial.
@alrojo - I tried to implement it. For new data, we get enc_state from dynamic_rnn. Then we take the EOS (#) token from the target_embedding matrix, calculate the new state, multiply it by W_out and add b_out, and make a prediction using argmax. We then use the embedding vector at the argmax position to calculate the next state, and so on, until we reach max_length (if provided) or the EOS token. Could you add an implementation for prediction on unseen data to the IPython notebook?
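The procedure described above can be sketched in plain NumPy. This is only an illustration under stated assumptions, not the tutorial's implementation: `gru_step`, `greedy_decode`, the parameter dictionary `p`, and the random demo weights are hypothetical names introduced here, and the gate equations follow the GRU code posted later in this thread.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gru_step(x, h, p):
    """One GRU step for a single example (hypothetical helper)."""
    z = sigmoid(x @ p["W_z_x"] + h @ p["W_z_h"] + p["b_z"])  # update gate
    r = sigmoid(x @ p["W_r_x"] + h @ p["W_r_h"] + p["b_r"])  # reset gate
    c = np.tanh(x @ p["W_c_x"] + (r * h) @ p["W_c_h"] + p["b_c"])  # proposed state
    return (1 - z) * c + z * h


def greedy_decode(enc_state, embeddings, p, W_out, b_out, eos_id, max_length):
    """Feed the argmax prediction back in until EOS or max_length."""
    state = enc_state
    token = eos_id  # decoding starts from the EOS ('#') embedding
    decoded = []
    for _ in range(max_length):
        state = gru_step(embeddings[token], state, p)
        logits = state @ W_out + b_out
        token = int(np.argmax(logits))
        if token == eos_id:
            break
        decoded.append(token)
    return decoded


# Demo with random (untrained) weights; all sizes are assumptions.
rng = np.random.default_rng(0)
vocab, emb_dim, units, eos_id, max_length = 6, 4, 5, 0, 8
p = {}
for gate in "zrc":
    p[f"W_{gate}_x"] = rng.normal(size=(emb_dim, units))
    p[f"W_{gate}_h"] = rng.normal(size=(units, units))
    p[f"b_{gate}"] = np.zeros(units)
embeddings = rng.normal(size=(vocab, emb_dim))
W_out = rng.normal(size=(units, vocab))
b_out = np.zeros(vocab)
enc_state = rng.normal(size=units)

decoded = greedy_decode(enc_state, embeddings, p, W_out, b_out, eos_id, max_length)
print(decoded)
```

With trained weights, `enc_state` would come from the encoder and `p`, `W_out`, `b_out` from the trained decoder; here they are random, so the printed token ids carry no meaning.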
Hi @s4sarath, we are working on something that would accomplish this.
You can check out this PR for more information.
Ideas and contributions are also very welcome.
@alrojo - Yes, I actually did this by splitting everything in tf_utils.decoder into separate training and validation (testing) functions. It feels amateurish, though, because I am not a hardcore programmer and don't follow proper Python style or PEP 8. Anyway, I will try to modify things or make suggestions based on your PR at tensorflow. Thanks for your contributions, by the way; I am reverse-engineering your code to learn the algorithm better.
I'm glad you can make use of it. I did the exact same thing last year when I started learning deep learning and Python from this developer: github.com/bennane . The only PEP 8 requirements I really think about are: use 4 spaces instead of tabs, and don't write more than 79 characters on one line.
Yes, feel free to bring questions or development ideas to the PR or here.
The architecture/code is still very much under development.
@alrojo - Hi. I was trying to modify your attention decoder to create a new attention mechanism. I have a 2D matrix and a 3D tensor, say 2 x 10 and 2 x 10 x 5. I need the first row of the 2D matrix to be matrix-multiplied with the first batch entry of the 3D tensor, then the second row with the second batch entry, resulting in a 2 x 5 matrix. So I decided to use a while loop inside the loop of your attention decoder, but I am getting an error:
InvalidArgumentError: TensorArray TensorArray: Could not read from TensorArray index 0 because it has not yet been written to.
If I am not supposed to post it here, I am sorry, and I will take back the code and comment.
import tensorflow as tf
from tensorflow.python.ops import tensor_array_ops
from tensorflow.python.framework import ops
from tensorflow.python.ops import nn_ops
from tensorflow.python.ops import math_ops
###
# a custom masking function, takes sequence lengths and makes masks
def mask(sequence_lengths):
    # based on this SO answer: http://stackoverflow.com/a/34138336/118173
    batch_size = tf.shape(sequence_lengths)[0]
    max_len = tf.reduce_max(sequence_lengths)
    lengths_transposed = tf.expand_dims(sequence_lengths, 1)
    rng = tf.range(max_len)
    rng_row = tf.expand_dims(rng, 0)
    return tf.less(rng_row, lengths_transposed)
###
# decoder with attention
def attention_decodercustom_(attention_input, attention_lengths, initial_state, target_input,
                             target_input_lengths, num_units, num_attn_units, embeddings, W_out, b_out,
                             batch_len = 3, name='decoder', swap=False):
    """Decoder with attention.
    Note that the number of units in the attention decoder must always
    be equal to the size of the initial state/attention input.
    Keyword arguments:
        attention_input: the input to put attention on. expected dims: [batch_size, attention_length, attention_dims]
        initial_state: The initial state for the decoder RNN.
        target_input: The target to replicate. Expected: [batch_size, max_target_sequence_len, embedding_dims]
        num_attn_units: Number of units in the alignment layer that produces the context vectors.
    """
    with tf.variable_scope(name):
        target_dims = target_input.get_shape()[2]
        attention_dims = attention_input.get_shape()[2]
        input_max_len = attention_input.get_shape()[1]
        attn_len = tf.shape(attention_input)[1]
        max_sequence_length = tf.reduce_max(target_input_lengths)
        num_units = attention_dims
        weight_initializer = tf.truncated_normal_initializer(stddev=0.1)
        attention_input_mod = tf.transpose(attention_input , [0,2,1])
        # map initial state to num_units
        var = tf.get_variable # for ease of use
        # target_dims + num_units is because we stack embeddings and prev. hidden state to
        # optimize speed
        W_z_x = var('W_z_x', shape=[target_dims, num_units], initializer=weight_initializer)
        W_z_h = var('W_z_h', shape=[num_units, num_units], initializer=weight_initializer)
        b_z = var('b_z', shape=[num_units], initializer=weight_initializer)
        W_r_x = var('W_r_x', shape=[target_dims, num_units], initializer=weight_initializer)
        W_r_h = var('W_r_h', shape=[num_units, num_units], initializer=weight_initializer)
        b_r = var('b_r', shape=[num_units], initializer=weight_initializer)
        W_c_x = var('W_c_x', shape=[target_dims, num_units], initializer=weight_initializer)
        W_c_h = var('W_c_h', shape=[num_units, num_units], initializer = weight_initializer)
        b_c = var('b_c', shape=[num_units], initializer=weight_initializer)
        middle_matrix = var('middle', shape=[num_units, num_units], initializer = weight_initializer)
        # project initial state
        # TODO: don't use convolutions!
        # TODO: fix the bias (b_a)
        # make inputs time-major
        inputs = tf.transpose(target_input, perm=[1, 0, 2])
        inputs_temp = inputs
        # make tensor array for inputs, these are dynamic and used in the while-loop
        # these are not in the api documentation yet, you will have to look at github.com/tensorflow
        input_ta_temp = tensor_array_ops.TensorArray(tf.float32, size=1, dynamic_size=True , name='input_ta_temp')
        input_ta_temp = input_ta_temp.unpack(inputs_temp)
        time = tf.constant(0)
        # calculate the GRU
        x_t = input_ta_temp.read(time)
        z = tf.sigmoid(tf.matmul(x_t, W_z_x) + tf.matmul(initial_state, W_z_h) + b_z) # update gate
        r = tf.sigmoid(tf.matmul(x_t, W_r_x) + tf.matmul(initial_state, W_r_h) + b_r) # reset gate
        c = tf.tanh(tf.matmul(x_t, W_c_x) + tf.matmul(r*initial_state, W_c_h) + b_c) # proposed new state
        new_state = (1-z)*c + z*initial_state # new state
        initial_state = new_state
        input_ta = tensor_array_ops.TensorArray(tf.float32, size=1, dynamic_size=True, name = 'input_ta')
        input_ta = input_ta.unpack(inputs)
        def decoder_cond(time, state, output_ta_t, attention_tracker):
            return tf.less(time, max_sequence_length)
        def decoder_body_builder(feedback=False):
            def decoder_body(time, old_state, output_ta_t, attention_tracker):
                if feedback:
                    def from_previous():
                        prev_1 = tf.matmul(old_state, W_out) + b_out
                        return tf.gather(embeddings, tf.argmax(prev_1, 1))
                    x_t = tf.cond(tf.greater(time, 0), from_previous, lambda: input_ta.read(0))
                else:
                    x_t = input_ta.read(time)
                # calculate the GRU
                def sub_decoder_cond(sub_time,temp_holder_):
                    return tf.less(sub_time, 3)
                def sub_decoder_body_builder():
                    def sub_decoder_body(sub_time ,temp_holder_t):
                        sub_x_t = tf.reshape(sub_initial_.read(sub_time) , [1,-1])
                        sub_i_t = sub_input.read(sub_time)
                        sub_res = tf.matmul(sub_x_t, sub_i_t)
                        temp_holder_t.write(sub_time, sub_x_t)
                        return(sub_time+1, temp_holder_t )
                    return sub_decoder_body
                we_project = tf.tanh(tf.matmul( initial_state , middle_matrix ))
                sub_initial_ = tensor_array_ops.TensorArray(tf.float32, size=1, dynamic_size=True ,tensor_array_name='sub_initial')
                sub_input = tensor_array_ops.TensorArray(tf.float32, size=1, dynamic_size=True, tensor_array_name = 'sub_input')
                temp_holder = tensor_array_ops.TensorArray(tf.float32, size=1, dynamic_size=True , tensor_array_name = 'temp_holder' )
                sub_initial_ = sub_initial_.unpack(we_project)
                sub_input = sub_input.unpack(attention_input_mod)
                sub_time = tf.constant(0)
                sub_loop_vars = [sub_time, temp_holder]
                _, temp_holder = tf.while_loop(sub_decoder_cond,
                                               sub_decoder_body_builder(),
                                               sub_loop_vars,
                                               swap_memory=swap)
                alpha_time = temp_holder.pack()
                # temp_holder.close()
                alpha = tf.to_float(mask(attention_lengths)) * alpha_time
                alpha_softmax = alpha
                # alpha_softmax = tf.nn.softmax(alpha)
                z = tf.sigmoid(tf.matmul(x_t, W_z_x) + tf.matmul(old_state, W_z_h) + b_z) # update gate
                r = tf.sigmoid(tf.matmul(x_t, W_r_x) + tf.matmul(old_state, W_r_h) + b_r) # reset gate
                c = tf.tanh(tf.matmul(x_t, W_c_x) + tf.matmul(r*old_state, W_c_h) + b_c) # proposed new state
                new_state = (1-z)*c + z*old_state # new state
                # writing output
                output_ta_t = output_ta_t.write(time+1, new_state)
                attention_tracker = attention_tracker.write(time, alpha_softmax)
                # context = tf.reduce_sum(tf.expand_dims(alpha_softmax, 2) * attention_input, [1])
                return (time + 1, new_state, output_ta_t, attention_tracker)
            return decoder_body
        output_ta = tensor_array_ops.TensorArray(tf.float32, size=1, dynamic_size=True, infer_shape=False)
        attention_tracker = tensor_array_ops.TensorArray(tf.float32, size=1, dynamic_size=True, infer_shape=False)
        time = tf.constant(0)
        loop_vars = [time, initial_state, output_ta, attention_tracker]
        _, state, output_ta, valid_attention_tracker = tf.while_loop(decoder_cond,
                                                                     decoder_body_builder(),
                                                                     loop_vars,
                                                                     swap_memory=swap)
        # _, valid_state, valid_output_ta, valid_attention_tracker = tf.while_loop(decoder_cond,
        #                                                                          decoder_body_builder(feedback=True),
        #                                                                          loop_vars,
        #                                                                          swap_memory=swap)
        dec_out = tf.transpose(output_ta.pack(), perm=[1, 0, 2])
        # valid_dec_out = tf.transpose(valid_output_ta.pack(), perm=[1, 0, 2])
        valid_attention_tracker = tf.transpose(valid_attention_tracker.pack(), perm=[1, 0, 2])
        # return dec_out, valid_dec_out, valid_attention_tracker
        return dec_out, valid_attention_tracker
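The product described in the question (row b of a 2 x 10 matrix times entry b of a 2 x 10 x 5 tensor, giving 2 x 5) is a batched vector-matrix product and may not need an inner loop at all. The NumPy sketch below shows the contraction and checks it against an explicit per-row loop; the function names and random inputs are hypothetical, chosen only to match the shapes in the question.

```python
import numpy as np


def batched_row_times_matrix(rows, mats):
    """rows: [batch, n], mats: [batch, n, m] -> [batch, m], no Python loop."""
    return np.einsum("bn,bnm->bm", rows, mats)


def batched_row_times_matrix_loop(rows, mats):
    """Reference version: one vector-matrix product per batch entry."""
    return np.stack([rows[b] @ mats[b] for b in range(rows.shape[0])])


# Shapes taken from the question: 2 x 10 and 2 x 10 x 5.
rng = np.random.default_rng(0)
rows = rng.normal(size=(2, 10))
mats = rng.normal(size=(2, 10, 5))

result = batched_row_times_matrix(rows, mats)
reference = batched_row_times_matrix_loop(rows, mats)
print(result.shape)
```

In TensorFlow the same contraction can be written as a batched matrix multiplication after expanding the 2D input to [batch, 1, n] (`tf.matmul` accepts rank-3 tensors in current releases), which would remove the need for the inner `tf.while_loop`. Separately, two details in the posted code are consistent with the reported error, though neither is confirmed in this thread: `sub_decoder_body` calls `temp_holder_t.write(...)` without keeping the returned TensorArray (the outer loop does reassign, as in `output_ta_t = output_ta_t.write(...)`), and `output_ta_t` is written at `time+1`, so its index 0 is never written before `pack()` is called.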
Hi, TensorFlow supports this type of behaviour now in the seq2seq section. However, to keep the tutorial a learning experience we will keep the old way of doing it, as it gives the learner an intuition about how to build custom encoders and decoders from scratch in TensorFlow.