
Comments (7)

alrojo commented on September 26, 2024

Hmm, you would need to know the ID of the tag.

from tensorflow-tutorial.

s4sarath commented on September 26, 2024

@alrojo - I tried to implement it. For new data, we get enc_state from dynamic_rnn. Then we take the embedding of the EOS (#) token from the target_embedding matrix, calculate the new state, apply W_out and b_out to it, and make a prediction using argmax. We then use the embedding at the argmax position to calculate the next state, and so on, until we reach max_length (if provided) or the EOS token. Could you add an implementation to the IPython notebook for prediction on unseen data?
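The loop described above can be sketched in plain NumPy. This is only an illustration under stated assumptions, not the tutorial's implementation: the `gru_step` and `greedy_decode` helpers, the parameter dictionary, and all sizes and random weights are hypothetical, and the gate equations simply mirror the GRU used in the decoder code later in this thread.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gru_step(x_t, h, p):
    """One GRU step with the same gate layout as the decoder in this thread."""
    z = sigmoid(x_t @ p["W_z_x"] + h @ p["W_z_h"] + p["b_z"])        # update gate
    r = sigmoid(x_t @ p["W_r_x"] + h @ p["W_r_h"] + p["b_r"])        # reset gate
    c = np.tanh(x_t @ p["W_c_x"] + (r * h) @ p["W_c_h"] + p["b_c"])  # proposed state
    return (1 - z) * c + z * h


def greedy_decode(enc_state, embeddings, p, W_out, b_out, eos_id, max_length):
    """Feed the previous argmax prediction back in until EOS or max_length."""
    h = enc_state          # [1, num_units], e.g. the final encoder state
    token = eos_id         # EOS doubles as the start symbol, as described above
    output = []
    for _ in range(max_length):
        x_t = embeddings[token][None, :]           # [1, embedding_dims]
        h = gru_step(x_t, h, p)
        logits = h @ W_out + b_out                 # [1, vocab_size]
        token = int(np.argmax(logits, axis=1)[0])  # greedy choice
        if token == eos_id:
            break
        output.append(token)
    return output


# Hypothetical sizes and random weights, only to show the call.
rng = np.random.default_rng(0)
vocab_size, embedding_dims, num_units = 6, 4, 5
params = {}
for gate in "zrc":
    params["W_%s_x" % gate] = rng.normal(scale=0.1, size=(embedding_dims, num_units))
    params["W_%s_h" % gate] = rng.normal(scale=0.1, size=(num_units, num_units))
    params["b_%s" % gate] = np.zeros(num_units)
embeddings = rng.normal(size=(vocab_size, embedding_dims))
W_out = rng.normal(size=(num_units, vocab_size))
b_out = np.zeros(vocab_size)
enc_state = rng.normal(size=(1, num_units))

decoded = greedy_decode(enc_state, embeddings, params, W_out, b_out,
                        eos_id=0, max_length=8)
```

With trained weights in place of the random ones, `decoded` would hold the predicted token IDs up to, but not including, the first EOS.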

alrojo commented on September 26, 2024

Hi @s4sarath, we are working on something that would accomplish this.
You can check out this PR for more information.
Ideas and contributions are also very welcome.

s4sarath commented on September 26, 2024

@alrojo - Yes, I actually did this by splitting tf_utils.decoder into separate functions for training and for validation (testing). It feels amateurish, though, because I am not a hardcore programmer and do not always follow proper Python syntax or PEP 8. Anyway, I will try to modify it or give suggestions based on your PR at TensorFlow. Thanks for your contributions, by the way; I use your code for reverse engineering, to learn the algorithm better.

alrojo commented on September 26, 2024

I'm glad you can make use of it. I did the exact same thing last year when I started learning deep learning and Python from this developer (github.com/bennane). The only PEP 8 requirements I really think about are: use 4 spaces instead of tabs, and don't write more than 79 characters on one line.

Yes, feel free to bring questions or development ideas to the PR or here.
The architecture/code is still very much under development.

s4sarath commented on September 26, 2024

@alrojo - Hi. I was trying to modify your attention decoder to create a new kind of attention. I have a 2D matrix and a 3D tensor, say 2 x 10 and 2 x 10 x 5. I need the first row of the 2D matrix to be matrix-multiplied with the first batch element of the 3D tensor, then the second row of the 2D matrix with the second batch element, resulting in a 2 x 5 matrix. So I decided to use a while loop inside the loop of your attention decoder, but I am getting an error: InvalidArgumentError: TensorArray TensorArray: Could not read from TensorArray index 0 because it has not yet been written to.

If I am not supposed to post it here, I am sorry, and I will take back the code and comment.

import tensorflow as tf
from tensorflow.python.ops import tensor_array_ops
from tensorflow.python.framework import ops
from tensorflow.python.ops import nn_ops
from tensorflow.python.ops import math_ops


###
# a custom masking function, takes sequence lengths and makes masks
def mask(sequence_lengths):
    # based on this SO answer: http://stackoverflow.com/a/34138336/118173
    batch_size = tf.shape(sequence_lengths)[0]
    max_len = tf.reduce_max(sequence_lengths)

    lengths_transposed = tf.expand_dims(sequence_lengths, 1)

    rng = tf.range(max_len)
    rng_row = tf.expand_dims(rng, 0)

    return tf.less(rng_row, lengths_transposed)

###
# decoder with attention

def attention_decodercustom_(attention_input, attention_lengths, initial_state, target_input,
                      target_input_lengths, num_units, num_attn_units, embeddings, W_out, b_out,
                      batch_len = 3, name='decoder', swap=False):
    """Decoder with attention.
    Note that the number of units in the attention decoder must always
    be equal to the size of the initial state/attention input.
    Keyword arguments:
        attention_input:    the input to put attention on. expected dims: [batch_size, attention_length, attention_dims]
        initial_state:      The initial state for the decoder RNN.
        target_input:       The target to replicate. Expected: [batch_size, max_target_sequence_len, embedding_dims]
        num_attn_units:     Number of units in the alignment layer that produces the context vectors.
    """
    with tf.variable_scope(name):
        target_dims = target_input.get_shape()[2]
        attention_dims = attention_input.get_shape()[2]
        input_max_len =  attention_input.get_shape()[1]
        attn_len = tf.shape(attention_input)[1]
        max_sequence_length = tf.reduce_max(target_input_lengths)
        num_units = attention_dims
        weight_initializer = tf.truncated_normal_initializer(stddev=0.1)
        attention_input_mod = tf.transpose(attention_input , [0,2,1])
        # map initial state to num_units
        var = tf.get_variable # for ease of use
        # target_dims + num_units is because we stack embeddings and prev. hidden state to
        # optimize speed
        W_z_x = var('W_z_x', shape=[target_dims, num_units], initializer=weight_initializer)
        W_z_h = var('W_z_h', shape=[num_units, num_units], initializer=weight_initializer)
        b_z = var('b_z', shape=[num_units], initializer=weight_initializer)
        W_r_x = var('W_r_x', shape=[target_dims, num_units], initializer=weight_initializer)
        W_r_h = var('W_r_h', shape=[num_units, num_units], initializer=weight_initializer)
        b_r = var('b_r', shape=[num_units], initializer=weight_initializer)
        W_c_x = var('W_c_x', shape=[target_dims, num_units], initializer=weight_initializer)
        W_c_h = var('W_c_h', shape=[num_units, num_units], initializer = weight_initializer)
        b_c = var('b_c', shape=[num_units], initializer=weight_initializer)
        middle_matrix = var('middle', shape=[num_units, num_units], initializer = weight_initializer)
        # project initial state

        # TODO: don't use convolutions!
        # TODO: fix the bias (b_a)


        # make inputs time-major
        inputs = tf.transpose(target_input, perm=[1, 0, 2])
        inputs_temp = inputs
        # make tensor array for inputs, these are dynamic and used in the while-loop
        # these are not in the api documentation yet, you will have to look at github.com/tensorflow
        input_ta_temp = tensor_array_ops.TensorArray(tf.float32, size=1, dynamic_size=True , name='input_ta_temp')
        input_ta_temp = input_ta_temp.unpack(inputs_temp)
        time = tf.constant(0)

        # calculate the GRU
        x_t = input_ta_temp.read(time)
        z = tf.sigmoid(tf.matmul(x_t, W_z_x) + tf.matmul(initial_state, W_z_h) + b_z) # update gate
        r = tf.sigmoid(tf.matmul(x_t, W_r_x) + tf.matmul(initial_state, W_r_h) + b_r) # reset gate
        c = tf.tanh(tf.matmul(x_t, W_c_x) + tf.matmul(r*initial_state, W_c_h) + b_c) # proposed new state
        new_state = (1-z)*c + z*initial_state # new state
        initial_state = new_state
        input_ta = tensor_array_ops.TensorArray(tf.float32, size=1, dynamic_size=True, name = 'input_ta')
        input_ta = input_ta.unpack(inputs)



        def decoder_cond(time, state, output_ta_t, attention_tracker):
            return tf.less(time, max_sequence_length)


        def decoder_body_builder(feedback=False):
            def decoder_body(time, old_state, output_ta_t, attention_tracker):
                if feedback:
                    def from_previous():
                        prev_1 = tf.matmul(old_state, W_out) + b_out
                        return tf.gather(embeddings, tf.argmax(prev_1, 1))
                    x_t = tf.cond(tf.greater(time, 0), from_previous, lambda: input_ta.read(0))
                else:
                    x_t = input_ta.read(time)

                 # calculate the GRU



                def sub_decoder_cond(sub_time, temp_holder_):
                    return tf.less(sub_time, batch_len)

                def sub_decoder_body_builder():

                    def sub_decoder_body(sub_time ,temp_holder_t):
                        sub_x_t = tf.reshape(sub_initial_.read(sub_time) , [1,-1])
                        sub_i_t = sub_input.read(sub_time)
                        sub_res = tf.matmul(sub_x_t, sub_i_t)
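                        # TensorArray.write returns a new TensorArray, so keep the
                        # returned handle; store the product rather than the input row
                        temp_holder_t = temp_holder_t.write(sub_time, sub_res)

                        return (sub_time + 1, temp_holder_t)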
                    return sub_decoder_body




                we_project = tf.tanh(tf.matmul( initial_state , middle_matrix ))
                sub_initial_ = tensor_array_ops.TensorArray(tf.float32, size=1, dynamic_size=True ,tensor_array_name='sub_initial')
                sub_input = tensor_array_ops.TensorArray(tf.float32, size=1, dynamic_size=True, tensor_array_name = 'sub_input')
                temp_holder = tensor_array_ops.TensorArray(tf.float32, size=1, dynamic_size=True , tensor_array_name = 'temp_holder'  )

                sub_initial_ = sub_initial_.unpack(we_project)
                sub_input = sub_input.unpack(attention_input_mod)
                sub_time = tf.constant(0)

                sub_loop_vars = [sub_time, temp_holder]

                _, temp_holder = tf.while_loop(sub_decoder_cond,
                                               sub_decoder_body_builder(),
                                               sub_loop_vars,
                                               swap_memory=swap)




                alpha_time = temp_holder.pack()
                # temp_holder.close()
                alpha = tf.to_float(mask(attention_lengths)) * alpha_time
                alpha_softmax = alpha
                # alpha_softmax = tf.nn.softmax(alpha)
                z = tf.sigmoid(tf.matmul(x_t, W_z_x) + tf.matmul(old_state, W_z_h) + b_z) # update gate
                r = tf.sigmoid(tf.matmul(x_t, W_r_x) + tf.matmul(old_state, W_r_h) + b_r) # reset gate
                c = tf.tanh(tf.matmul(x_t, W_c_x) + tf.matmul(r*old_state, W_c_h) + b_c) # proposed new state
                new_state = (1-z)*c + z*old_state # new state

                # writing output
                output_ta_t = output_ta_t.write(time+1, new_state)
                attention_tracker = attention_tracker.write(time, alpha_softmax)
                # context = tf.reduce_sum(tf.expand_dims(alpha_softmax, 2) * attention_input, [1])


                return (time + 1, new_state, output_ta_t, attention_tracker)
            return decoder_body


        output_ta = tensor_array_ops.TensorArray(tf.float32, size=1, dynamic_size=True, infer_shape=False)
        attention_tracker = tensor_array_ops.TensorArray(tf.float32, size=1, dynamic_size=True, infer_shape=False)
        time = tf.constant(0)
        loop_vars = [time, initial_state, output_ta, attention_tracker]

        _, state, output_ta, valid_attention_tracker = tf.while_loop(decoder_cond,
                                               decoder_body_builder(),
                                               loop_vars,
                                               swap_memory=swap)

        # _, valid_state, valid_output_ta, valid_attention_tracker = tf.while_loop(decoder_cond,
        #                                                 decoder_body_builder(feedback=True),
        #                                                 loop_vars,
        #                                                 swap_memory=swap)

        dec_out = tf.transpose(output_ta.pack(), perm=[1, 0, 2])
        # valid_dec_out = tf.transpose(valid_output_ta.pack(), perm=[1, 0, 2])
        valid_attention_tracker = tf.transpose(valid_attention_tracker.pack(), perm=[1, 0, 2])

        # return dec_out, valid_dec_out, valid_attention_tracker

        return dec_out,  valid_attention_tracker
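
The row-by-batch product described in the question (2 x 10 times 2 x 10 x 5, giving 2 x 5) does not strictly need an inner loop. The NumPy sketch below shows the same computation as a single batched contraction, followed by a length mask equivalent to the `mask()` helper above. The names `queries`, `keys`, and the example lengths are hypothetical stand-ins (for `we_project`, `attention_input_mod`, and `attention_lengths`). A batched matrix multiplication should express the same thing in TensorFlow, though that is an assumption about the TensorFlow version in use.

```python
import numpy as np

# Shapes from the question: a 2 x 10 matrix and a 2 x 10 x 5 tensor.
batch_size, num_units, attn_len = 2, 10, 5
rng = np.random.default_rng(0)
queries = rng.normal(size=(batch_size, num_units))         # stands in for we_project
keys = rng.normal(size=(batch_size, num_units, attn_len))  # stands in for attention_input_mod

# Row b of `queries` times matrix b of `keys`, for every b at once: shape 2 x 5.
scores = np.einsum("bi,bij->bj", queries, keys)

# The same result with an explicit per-batch loop, for comparison.
looped = np.stack([queries[b] @ keys[b] for b in range(batch_size)])

# Length mask built the same way as mask(): position index < sequence length.
attention_lengths = np.array([5, 3])  # hypothetical lengths
length_mask = np.arange(attention_lengths.max())[None, :] < attention_lengths[:, None]

# Zero out scores at padded positions, like `alpha` in the decoder body.
alpha = length_mask.astype(scores.dtype) * scores
```

Here `scores` and `looped` agree, and the last two entries of `alpha[1]` are zero because the second sequence has length 3.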

alrojo commented on September 26, 2024

Hi, TensorFlow supports this type of behaviour now in the seq2seq section. However, to keep the tutorial a learning experience we will keep the old way of doing it, as it gives the learner an intuition about how to build custom encoders and decoders from scratch in TensorFlow.
