
tf-rnn-attention's Introduction

TensorFlow implementation of an attention mechanism for text classification tasks.
Inspired by "Hierarchical Attention Networks for Document Classification", Zichao Yang et al. (http://www.aclweb.org/anthology/N16-1174).

Requirements

  • Python >= 2.6
  • TensorFlow >= 1.0
  • Keras (for the IMDB dataset)
  • tqdm

To view the visualization example, visit http://htmlpreview.github.io/?https://github.com/ilivans/tf-rnn-attention/blob/master/visualization.html

My bachelor's thesis on sentiment classification of Russian texts using a Bi-RNN with an attention mechanism: https://github.com/ilivans/attention-sentiment

tf-rnn-attention's People

Contributors

hudsonhuang, ilivans, nicolay-r


tf-rnn-attention's Issues

Is there any code for visualizing the attention picture?

Thanks for your code for training an attention model.
However, I couldn't find the visualization code that produces the picture in your README.md.

Could you share the code used to visualize the attention weights?
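
A hedged sketch (not the script that produced visualization.html in the repository) of how attention weights can be rendered as an HTML heat map: each token gets a red background whose opacity is proportional to its attention weight. The words and weights are assumed to come from running the model with return_alphas=True and fetching the alphas tensor in session.run().

```python
def attention_to_html(words, alphas, path="attention_vis.html"):
    """words: list of tokens; alphas: attention weights of the same length."""
    max_alpha = max(alphas)
    spans = []
    for word, alpha in zip(words, alphas):
        opacity = alpha / max_alpha if max_alpha > 0 else 0.0
        spans.append('<span style="background-color: rgba(255,0,0,{:.2f})">{}</span>'
                     .format(opacity, word))
    with open(path, "w") as f:
        f.write("<html><body>" + " ".join(spans) + "</body></html>")

# Hypothetical usage with weights fetched from the model:
attention_to_html(["a", "truly", "wonderful", "movie"], [0.05, 0.15, 0.75, 0.05])
```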

Sentence and word encoder

I can only see the word-level encoder. Am I missing something? There should be two Bi-GRU layers, right?
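
For reference, a minimal sketch (not the repository's code) of the two-level encoder described in Yang et al. (2016): a word-level Bi-GRU with attention produces one vector per sentence, and a sentence-level Bi-GRU with attention turns those into a document vector. The hyperparameters and padding sizes are hypothetical, per-word and per-sentence sequence lengths are omitted for brevity, and the attention function is assumed to behave like the repository's attention.py (accepting the (fw, bw) output tuple and returning the weighted sum).

```python
import tensorflow as tf
from attention import attention  # assumed to behave like the repository's attention.py

# Hypothetical hyperparameters and fixed padding sizes (not from the repository)
SENT_PER_DOC = 15
WORDS_PER_SENT = 40
EMBEDDING_DIM = 200
HIDDEN_SIZE = 100
ATTENTION_SIZE = 50

# Pre-looked-up word embeddings: [batch, sentences, words, embedding_dim]
word_embeddings = tf.placeholder(
    tf.float32, [None, SENT_PER_DOC, WORDS_PER_SENT, EMBEDDING_DIM])

# Word-level encoder: fold sentences into the batch dimension.
words = tf.reshape(word_embeddings, [-1, WORDS_PER_SENT, EMBEDDING_DIM])
with tf.variable_scope("word_encoder"):
    word_outputs, _ = tf.nn.bidirectional_dynamic_rnn(
        tf.contrib.rnn.GRUCell(HIDDEN_SIZE), tf.contrib.rnn.GRUCell(HIDDEN_SIZE),
        words, dtype=tf.float32)
    # attention() is assumed to accept the (fw, bw) tuple and return the weighted sum
    sentence_vectors = attention(word_outputs, ATTENTION_SIZE)

# Sentence-level encoder: unfold back to [batch, sentences, 2 * HIDDEN_SIZE].
sentence_vectors = tf.reshape(sentence_vectors, [-1, SENT_PER_DOC, 2 * HIDDEN_SIZE])
with tf.variable_scope("sentence_encoder"):
    sent_outputs, _ = tf.nn.bidirectional_dynamic_rnn(
        tf.contrib.rnn.GRUCell(HIDDEN_SIZE), tf.contrib.rnn.GRUCell(HIDDEN_SIZE),
        sentence_vectors, dtype=tf.float32)
    document_vector = attention(sent_outputs, ATTENTION_SIZE)  # [batch, 2 * HIDDEN_SIZE]

# document_vector then feeds the final dropout / dense classification layers,
# as the single-level version does in train.py.
```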

Regarding the softmax used in the attention implementation

Thank you very much for sharing your code.

When calculating the attention scores, you take the softmax over the whole sequence. For variable-length sequences, the softmax therefore also takes into account the scores of the padded tokens (which are zeros). Even though multiplying their attention weights by the zero hidden states eventually masks their contribution to the output, the softmax itself is still not correct, because the padded positions can diminish the weights (alphas) assigned to the real, non-padded time steps.

Something like a sparse softmax might help here.

Thank you.
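
One possible fix, sketched below under the assumption that the attention logits are available as a [batch, max_len] tensor and that the true lengths are fed through a seq_len_ph placeholder (neither name is from the repository): replace the logits of padded positions with a large negative value before the softmax, so padded time steps receive essentially zero weight and no longer dilute the alphas of the real tokens.

```python
import tensorflow as tf

def masked_attention_softmax(scores, seq_len_ph):
    """scores: [batch, max_len] attention logits; seq_len_ph: [batch] true lengths."""
    max_len = tf.shape(scores)[1]
    mask = tf.sequence_mask(seq_len_ph, maxlen=max_len)   # [batch, max_len], bool
    neg_inf = tf.fill(tf.shape(scores), -1e9)             # effectively minus infinity
    masked_scores = tf.where(mask, scores, neg_inf)
    return tf.nn.softmax(masked_scores)                   # alphas sum to 1 over real tokens only
```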

What is the meaning of attention_size?

In the attention part, attention_size is a hyperparameter, but when we calculate alpha, the shape of alpha does not involve attention_size. So what is attention_size used for?
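
A simplified sketch of this kind of attention layer (following the shape conventions of the repository's attention.py, but not copied from it) shows where attention_size enters: it is the width of the hidden projection of the RNN outputs and of the context vector, i.e. the capacity of the scoring function, and it is contracted away before the alphas are produced, which is why it never appears in the shape of alphas.

```python
import tensorflow as tf

def attention_sketch(inputs, attention_size):
    """inputs: RNN outputs of shape [batch, max_len, D] (D = 2 * hidden size for a Bi-RNN)."""
    hidden_size = inputs.shape[2].value
    w_omega = tf.Variable(tf.random_normal([hidden_size, attention_size], stddev=0.1))
    b_omega = tf.Variable(tf.random_normal([attention_size], stddev=0.1))
    u_omega = tf.Variable(tf.random_normal([attention_size], stddev=0.1))

    v = tf.tanh(tf.tensordot(inputs, w_omega, axes=1) + b_omega)  # [batch, max_len, attention_size]
    vu = tf.tensordot(v, u_omega, axes=1)                         # [batch, max_len]
    alphas = tf.nn.softmax(vu)                                    # [batch, max_len] -- no attention_size
    return tf.reduce_sum(inputs * tf.expand_dims(alphas, -1), 1)  # [batch, D]
```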

Masking out padded positions for attention

Hi ilivans,

Great implementation! I was just wondering whether the padded positions need to be masked out somehow for the attention layer. I imagine that the logits in rnn_outputs have values up to max_len, so they include the padded positions (which the Bi-LSTM has ignored), and these are then used in the attention calculations. Do we need to set their attention scores to zero, or something like that, or is the network able to learn that the padded positions should not be taken into account?

Thanks in advance.
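
A complementary sketch, related to the masked-softmax idea in the previous issue: assuming alphas is the [batch, max_len] weight tensor produced by the attention layer and seq_len_ph holds the true lengths (both names are hypothetical), the weights of the padded positions can also be zeroed after the softmax and the remainder renormalized, so the network does not have to learn to ignore padding on its own.

```python
mask = tf.cast(tf.sequence_mask(seq_len_ph, maxlen=tf.shape(alphas)[1]), tf.float32)
alphas = alphas * mask                                           # zero out padded positions
alphas = alphas / tf.reduce_sum(alphas, axis=1, keep_dims=True)  # renormalize over real tokens
```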

About padding sentences

Hi Ilivans,

In your implementation, you padded all training sequences to a fixed length of 250:

  • X_train = zero_pad(X_train, SEQUENCE_LENGTH)
  • batch_ph = tf.placeholder(tf.int32, [None, SEQUENCE_LENGTH])

But I want to determine this length for each mini-batch, which is why I have to declare batch_ph with shape [None, None]. Then, in attention.py, sequence_length = inputs_shape[1].value becomes None, and the following error appears at exps = tf.reshape(tf.exp(vu), [-1, sequence_length]):
"Failed to convert object of type <class 'list'> to Tensor. Contents: [-1, None]. Consider casting elements to a supported type."
Do you have any idea how to solve this problem?
Thank you in advance!
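
One way around this, sketched under the assumption that inputs and vu are the tensors from attention.py referenced above: read the time dimension dynamically with tf.shape() instead of from the static shape, so the reshape also works when batch_ph is declared as [None, None].

```python
sequence_length = tf.shape(inputs)[1]                  # scalar tensor instead of None
exps = tf.reshape(tf.exp(vu), [-1, sequence_length])   # tf.reshape accepts tensor dimensions
```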

Question about the accuracy metric in toy_example.py

I found this code in toy_example.py:
accuracy = 1. - tf.reduce_mean(tf.cast(tf.equal(tf.round(y_hat), target_ph), tf.float32))
I am confused. Why not use accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.round(y_hat), target_ph), tf.float32)) instead? Is it a bug?
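
For reference, the two quantities involved (using the same tensors as the quoted line, with targets assumed to be 0/1): the expression without the leading 1. - is the fraction of correct predictions, i.e. the accuracy, so the quoted line actually computes the error rate.

```python
accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.round(y_hat), target_ph), tf.float32))
error_rate = 1. - accuracy   # this is what the quoted line in toy_example.py computes
```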

Finding the actual length of a sequence

Why is there a +1 at line 114 of train.py, seq_len = np.array([list(x).index(0) + 1 for x in x_batch])  # actual lengths of sequences, and again at line 129? Wouldn't list(x).index(0) without the +1 be enough?
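
A hedged alternative (not the repository's code) is to count the non-zero, i.e. non-padding, token ids directly with NumPy. This sidesteps the off-by-one question and also handles sequences that contain no padding at all, where list(x).index(0) would raise a ValueError.

```python
import numpy as np

# Assumes token id 0 is reserved for padding, as in the expression above.
seq_len = np.count_nonzero(x_batch, axis=1)   # x_batch: [batch_size, SEQUENCE_LENGTH]
```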
