Comments (5)
Every sequence of LSTM outputs is 2D (timesteps x features), and the context vector is 1D. Their product is 1D. The context vector is trained to assign a weight to each timestep of the 2D output, so you can think of the result as a weighted vector; ideally, it gives more weight to the important tokens.
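As an aside, here is a minimal NumPy sketch of that weighting idea. All shapes and values below are made up for illustration; this is not the repo's code.

```python
import numpy as np

# A sequence of 4 token vectors (2D: timesteps x features)
# and a 1D context vector, both with made-up values.
rng = np.random.default_rng(0)
h = rng.standard_normal((4, 5))   # hypothetical GRU outputs, shape (T, D)
u = rng.standard_normal(5)        # hypothetical context vector, shape (D,)

scores = h @ u                                   # one scalar score per token, shape (T,)
weights = np.exp(scores) / np.exp(scores).sum()  # softmax over timesteps

# The sequence collapses into a single weighted vector, shape (D,)
weighted = weights @ h
```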
from textclassifier.
Hi, thanks for your answer.
However, I'm afraid I already understood this concept; my issue is with the tanh activation. In the paper, it is applied to the dense layer's output before the multiplication with the context vector. In your implementation, it is applied to the dot product of these vectors.
According to the code, we actually stack two linear operations on the output of the GRU layer: first the Dense layer, and then the dot product with self.W, with no non-linearity in between. Theoretically, this could be collapsed into a single linear layer (as explained here).
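For reference, a quick NumPy sketch of why two stacked linear maps without a non-linearity compose into a single linear map (all matrices here are made up):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(5)
W1, b1 = rng.standard_normal((5, 4)), rng.standard_normal(4)  # the "Dense" layer
w2 = rng.standard_normal(4)                                   # the "context vector"

stacked = (x @ W1 + b1) @ w2          # Dense, then dot with the context vector
collapsed = x @ (W1 @ w2) + b1 @ w2   # equivalent single linear map
```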
Again, maybe I'm missing something; I'd be glad for an explanation :)
Which equation are you referring to? The tanh activation in my code corresponds to equations (5) and (8). h_it is the GRU output.
I'll try to be as rigorous as possible:
(194) l_lstm_sent = Bidirectional(GRU(100, return_sequences=True))(review_encoder)
(195) l_dense_sent = TimeDistributed(Dense(200))(l_lstm_sent)
(196) l_att_sent = AttLayer()(l_dense_sent)
These are lines 194-196 in the code, referring to the upper hierarchy layer.
(5) u_it = tanh(W_w * h_it + b_w)
(6) a_it = exp(u_it^T * u_w) / sum_t(exp(u_it^T * u_w))
And these are equations 5 and 6 from the paper. The same situation holds for lines 187-189 in the code and equations 8-10, but I'll demonstrate only on this part.
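For concreteness, here is a direct NumPy transcription of equations (5)-(6); the shapes and values are hypothetical, chosen only to make the operations explicit:

```python
import numpy as np

# Made-up dimensions: timesteps, GRU output dim, attention dim
T, D, A = 4, 6, 3
rng = np.random.default_rng(2)
h = rng.standard_normal((T, D))          # h_it: GRU outputs
W_w, b_w = rng.standard_normal((D, A)), rng.standard_normal(A)
u_w = rng.standard_normal(A)             # context vector

u = np.tanh(h @ W_w + b_w)               # eq (5): u_it = tanh(W_w h_it + b_w)
a = np.exp(u @ u_w)
a /= a.sum()                             # eq (6): softmax over timesteps
```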
As you've said, h_it is the GRU output. In line 195, it is passed through a Dense layer, thereby implementing the W_w * h_it + b_w part. My question concerns the next step.
According to the code, this output is now passed through the attention layer. Note that there is no activation in line 195, so we proceed only with the inner linear part of equation 5, rather than with u_it. More specifically, the next operation takes place in the call() method of the layer:
(174) eij = K.tanh(K.dot(x, self.W))
(175)
(176) ai = K.exp(eij)
(177) weights = ai/K.sum(ai, axis=1).dimshuffle(0,'x')
x being the input of the layer, or literally W_w * h_it + b_w. The next thing computed, in the inner parentheses of line 174, is (W_w * h_it + b_w) * u_w, where u_w == self.W is the context vector. However, according to the paper, this product happens only in equation 6; we have skipped the tanh operation of equation 5. Only then, in line 174, do we apply the tanh to the product. Note that in the paper this product is inserted directly into the exp of equation 6, without any non-linearity in between.
To my understanding, this is a different procedure from the one described in the paper. I may be wrong, or perhaps this somehow leads to similar behavior, but I'd just like to hear why :)
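To make the difference concrete, here is a NumPy sketch (all values made up) of the two orders of operations; they generally yield different attention weights:

```python
import numpy as np

rng = np.random.default_rng(3)
h = rng.standard_normal((4, 6))          # hypothetical GRU outputs
W_w, b_w = rng.standard_normal((6, 3)), rng.standard_normal(3)
u_w = rng.standard_normal(3)             # context vector

z = h @ W_w + b_w                        # line 195: Dense, no activation

paper = np.exp(np.tanh(z) @ u_w)         # eq (5)-(6): tanh, then dot product
paper /= paper.sum()

code = np.exp(np.tanh(z @ u_w))          # line 174: dot product, then tanh
code /= code.sum()
```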
Thanks!
Ha, you found a HUGE bug in my code that I hadn't realized. I'm quite sure you are the first one to point it out, even though someone once asked why I use the (deprecated) TimeDistributed dense function.
The bug is that I placed the tanh in the wrong place and the wrong order. The TimeDistributed(Dense(200))(l_lstm_sent) is intended to be a one-layer MLP, and as you said, there should be a tanh activation before the dot product. The solution is either:
- (195) l_dense_sent = TimeDistributed(Dense(200, activation='tanh'))(l_lstm_sent)
  (174) eij = K.dot(x, self.W) (removing the tanh)
or:
- (195) l_dense_sent = TimeDistributed(Dense(200))(l_lstm_sent) stays the same
  (174) eij = K.dot(K.tanh(x), self.W) (changing the order)
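For what it's worth, a NumPy sketch (with made-up values) confirming that the two fixes compute the same attention scores:

```python
import numpy as np

rng = np.random.default_rng(4)
h = rng.standard_normal((4, 6))          # hypothetical GRU outputs
W_w, b_w = rng.standard_normal((6, 3)), rng.standard_normal(3)
u_w = rng.standard_normal(3)             # context vector

# Fix 1: tanh inside the Dense layer, plain dot in the attention layer
fix1 = np.tanh(h @ W_w + b_w) @ u_w

# Fix 2: plain Dense layer, tanh applied to x inside the attention layer
x = h @ W_w + b_w
fix2 = np.tanh(x) @ u_w
```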
It has been so long that I had to reread the paper to bring back the memory. I hope I didn't make a mistake again. Let me know :)