ilivans / tf-rnn-attention
Tensorflow implementation of attention mechanism for text classification tasks.
License: MIT License
Hi Ilivans,
In your implementation, you padded all training sequences to a fixed length of 250.
Love this repo, it helps me a lot! Is there any paper of yours I can cite?
I am trying out the code with the Reuters example (46 classes) instead of IMDB. I am getting a highly negative loss on each iteration.
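One possible cause (just a guess, not confirmed from the repo): if the IMDB setup uses a sigmoid cross-entropy loss, it expects 0/1 labels, and feeding integer class indices from a 46-class task drives the loss strongly negative. A quick numerical check of the standard formula:

```python
import numpy as np

# Sigmoid cross-entropy: max(x, 0) - x*z + log(1 + exp(-|x|))
def sigmoid_xent(logits, labels):
    return np.maximum(logits, 0) - logits * labels + np.log1p(np.exp(-np.abs(logits)))

print(sigmoid_xent(5.0, 1.0))   # ~0.007: fine for a 0/1 label
print(sigmoid_xent(5.0, 45.0))  # ~-220: negative when a class index is used as a label
```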
Thank you very much for sharing your code.
When calculating attention scores, you take softmax over the whole sequence. In case the sequence is of variable length, the softmax takes into account the scores (which are zeros) over the padded tokens also. Although eventually multiplying their attention scores with zeros from hidden states masks their contribution, the softmax implementation is still not correct as it can diminish the contribution (alphas) of non-padded time-steps.
To address this, something like a sparse/masked softmax could help (see the sketch below).
Thank you,
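A minimal sketch of such a fix, masking the scores before the softmax. It assumes scores of shape [batch_size, max_len] and a seq_len vector of true lengths; this is not the repo's actual code:

```python
import tensorflow as tf

def masked_softmax(scores, seq_len, max_len):
    """Softmax over attention scores that ignores padded positions."""
    # mask is 1.0 for real tokens, 0.0 for padding
    mask = tf.sequence_mask(seq_len, maxlen=max_len, dtype=tf.float32)
    # Push padded positions towards -inf so they receive ~zero weight
    scores = scores + (1.0 - mask) * (-1e9)
    return tf.nn.softmax(scores)
```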
Thanks for your code for training an attention model!
However, I couldn't find the visualization code that produces the picture in your README.md.
Is it possible to share the code used to visualize the attention weights?
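Not the author's original script, but a minimal sketch of one common way to produce such a picture: shade each token by its attention weight with matplotlib (plot_attention and its arguments are illustrative names):

```python
import matplotlib.pyplot as plt
import numpy as np

def plot_attention(words, alphas):
    """Render tokens shaded by their attention weights (alphas sum to 1)."""
    fig, ax = plt.subplots(figsize=(max(6, 0.6 * len(words)), 1.2))
    ax.imshow(np.array(alphas)[None, :], cmap="Reds", aspect="auto")
    ax.set_xticks(range(len(words)))
    ax.set_xticklabels(words, rotation=45, ha="right")
    ax.set_yticks([])
    plt.tight_layout()
    plt.show()

plot_attention(["a", "truly", "wonderful", "movie"], [0.05, 0.15, 0.7, 0.1])
```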
Hi,
Which version of TensorFlow is used?
Why is there a +1 at line 114 of train.py:
seq_len = np.array([list(x).index(0) + 1 for x in x_batch])  # actual lengths of sequences
and at line 129? Wouldn't list(x).index(0) without the +1 be enough?
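To illustrate the question (assuming 0 is the padding id, as in the Keras IMDB encoding):

```python
x = [5, 12, 7, 0, 0, 0]       # 3 real tokens, zero-padded to length 6

print(list(x).index(0))       # 3 -- the actual sequence length
print(list(x).index(0) + 1)   # 4 -- also counts the first padding step
```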
I can only see the word-level encoder. Am I missing something? There should be two Bi-GRU layers (word-level and sentence-level), right?
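For context, a hypothetical sketch of what a two-level (word, then sentence) Bi-GRU encoder would look like, in the style of Hierarchical Attention Networks; all names here are illustrative, not from this repo:

```python
import tensorflow as tf

def bi_gru(inputs, num_units, scope):
    """Bidirectional GRU; returns forward/backward outputs concatenated."""
    with tf.variable_scope(scope):
        fw = tf.nn.rnn_cell.GRUCell(num_units)
        bw = tf.nn.rnn_cell.GRUCell(num_units)
        outputs, _ = tf.nn.bidirectional_dynamic_rnn(fw, bw, inputs, dtype=tf.float32)
        return tf.concat(outputs, axis=2)

# word_emb: [batch * num_sents, num_words, emb_dim]
# word_encoded = bi_gru(word_emb, 50, "word_encoder")
# ... attention-pool words into sentence vectors, reshape to
# [batch, num_sents, 2 * 50], then:
# sent_encoded = bi_gru(sent_vectors, 50, "sentence_encoder")
```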
https://github.com/pemywei/attention-nmt/blob/master/seq2seq.py#L121
The linked project uses prev_state as one of the input parameters.
@ilivans Thank you!!!
I found this code in toy_example.py:
accuracy = 1. - tf.reduce_mean(tf.cast(tf.equal(tf.round(y_hat), target_ph), tf.float32))
I am confused. Why not use accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.round(y_hat), target_ph), tf.float32))? Is it a bug?
Hi ilivans,
Great implementation! I was just wondering whether the padded positions have to be masked out somehow for the attention layer. I imagine the logits computed from rnn_outputs cover all positions up to max_len, so they include the padded positions (which the Bi-LSTM has ignored), and these are then used in the attention calculation. Do we need to set their attention scores to zero or something like that, or can the network learn that the padded positions should not be taken into account?
Thanks in advance.
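A small numeric illustration of why padded positions still matter even when their hidden states are zero: the softmax assigns each of them exp(0) = 1, diluting the weights of the real tokens (toy values):

```python
import numpy as np

scores = np.array([2.0, 1.0, 0.0, 0.0])  # last two positions are padding
alphas = np.exp(scores) / np.exp(scores).sum()
print(alphas)  # [0.61 0.22 0.08 0.08]: padding keeps ~17% of the total weight
```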
I was referring to this paper.