ilivans / tf-rnn-attention
Tensorflow implementation of attention mechanism for text classification tasks.
License: MIT License
Hi Ilivans,
In your implementation, you padded all training sequences to a fixed length of 250.
Love this repo, it helps me a lot! Is there any paper of yours I can cite?
I am trying out the code with the Reuters example (46 classes) instead of IMDB. I am getting a highly negative loss on each iteration.
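One possible cause (just a guess, not confirmed from the repo): if the IMDB setup uses a sigmoid cross-entropy loss, it expects 0/1 labels, and feeding integer class indices from a 46-class task drives the loss strongly negative. A quick numerical check of the standard formula:

```python
import numpy as np

# Sigmoid cross-entropy: max(x, 0) - x*z + log(1 + exp(-|x|))
def sigmoid_xent(logits, labels):
    return np.maximum(logits, 0) - logits * labels + np.log1p(np.exp(-np.abs(logits)))

print(sigmoid_xent(5.0, 1.0))   # ~0.007: fine for a 0/1 label
print(sigmoid_xent(5.0, 45.0))  # ~-220: negative when a class index is used as a label
```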
Thank you very much for sharing your code.
When calculating attention scores, you take softmax over the whole sequence. In case the sequence is of variable length, the softmax takes into account the scores (which are zeros) over the padded tokens also. Although eventually multiplying their attention scores with zeros from hidden states masks their contribution, the softmax implementation is still not correct as it can diminish the contribution (alphas) of non-padded time-steps.
To address this, something like a sparse/masked softmax could help (see the sketch below).
Thank you,
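A minimal sketch of such a fix, masking the scores before the softmax. It assumes scores of shape [batch_size, max_len] and a seq_len vector of true lengths; this is not the repo's actual code:

```python
import tensorflow as tf

def masked_softmax(scores, seq_len, max_len):
    """Softmax over attention scores that ignores padded positions."""
    # mask is 1.0 for real tokens, 0.0 for padding
    mask = tf.sequence_mask(seq_len, maxlen=max_len, dtype=tf.float32)
    # Push padded positions towards -inf so they receive ~zero weight
    scores = scores + (1.0 - mask) * (-1e9)
    return tf.nn.softmax(scores)
```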
Thanks for your code for training an attention model!
However, I couldn't find the visualization code that produces the picture in your README.md.
Is it possible to share the code used to visualize the attention weights?
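Not the author's original script, but a minimal sketch of one common way to produce such a picture: shade each token by its attention weight with matplotlib (plot_attention and its arguments are illustrative names):

```python
import matplotlib.pyplot as plt
import numpy as np

def plot_attention(words, alphas):
    """Render tokens shaded by their attention weights (alphas sum to 1)."""
    fig, ax = plt.subplots(figsize=(max(6, 0.6 * len(words)), 1.2))
    ax.imshow(np.array(alphas)[None, :], cmap="Reds", aspect="auto")
    ax.set_xticks(range(len(words)))
    ax.set_xticklabels(words, rotation=45, ha="right")
    ax.set_yticks([])
    plt.tight_layout()
    plt.show()

plot_attention(["a", "truly", "wonderful", "movie"], [0.05, 0.15, 0.7, 0.1])
```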
Hi,
Which version of TensorFlow is used?
Why is there a +1 at line 114 of train.py:
seq_len = np.array([list(x).index(0) + 1 for x in x_batch])  # actual lengths of sequences
and at line 129? Wouldn't list(x).index(0) without the +1 be enough?
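To illustrate the question (assuming 0 is the padding id, as in the Keras IMDB encoding):

```python
x = [5, 12, 7, 0, 0, 0]       # 3 real tokens, zero-padded to length 6

print(list(x).index(0))       # 3 -- the actual sequence length
print(list(x).index(0) + 1)   # 4 -- also counts the first padding step
```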
I can only see the word-level encoder. Am I missing something? There should be two Bi-GRU layers (word-level and sentence-level), right?
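For context, a hypothetical sketch of what a two-level (word, then sentence) Bi-GRU encoder would look like, in the style of Hierarchical Attention Networks; all names here are illustrative, not from this repo:

```python
import tensorflow as tf

def bi_gru(inputs, num_units, scope):
    """Bidirectional GRU; returns forward/backward outputs concatenated."""
    with tf.variable_scope(scope):
        fw = tf.nn.rnn_cell.GRUCell(num_units)
        bw = tf.nn.rnn_cell.GRUCell(num_units)
        outputs, _ = tf.nn.bidirectional_dynamic_rnn(fw, bw, inputs, dtype=tf.float32)
        return tf.concat(outputs, axis=2)

# word_emb: [batch * num_sents, num_words, emb_dim]
# word_encoded = bi_gru(word_emb, 50, "word_encoder")
# ... attention-pool words into sentence vectors, reshape to
# [batch, num_sents, 2 * 50], then:
# sent_encoded = bi_gru(sent_vectors, 50, "sentence_encoder")
```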
https://github.com/pemywei/attention-nmt/blob/master/seq2seq.py#L121
The linked project uses prev_state as one of the input parameters.
@ilivans Thank you!!!
I found this code in toy_example.py:
accuracy = 1. - tf.reduce_mean(tf.cast(tf.equal(tf.round(y_hat), target_ph), tf.float32))
I am confused. Why not use accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.round(y_hat), target_ph), tf.float32))? Is it a bug?
Hi ilivans,
Great implementation! I was just wondering whether the padded positions have to be masked out somehow for the attention layer. I imagine the logits computed from rnn_outputs cover all positions up to max_len, so they include the padded positions (which the Bi-LSTM has ignored), and these are then used in the attention calculation. Do we need to set their attention scores to zero or something like that, or can the network learn that the padded positions should not be taken into account?
Thanks in advance.
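A small numeric illustration of why padded positions still matter even when their hidden states are zero: the softmax assigns each of them exp(0) = 1, diluting the weights of the real tokens (toy values):

```python
import numpy as np

scores = np.array([2.0, 1.0, 0.0, 0.0])  # last two positions are padding
alphas = np.exp(scores) / np.exp(scores).sum()
print(alphas)  # [0.61 0.22 0.08 0.08]: padding keeps ~17% of the total weight
```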
I was referring to this paper.