
hierarchical-attention-networks's People

Contributors

pedrocardoso

hierarchical-attention-networks's Issues

One-layer MLP Possibly Missing

The attention layer works directly on the GRU hidden states (denoted h_it in the HAN paper) in the call function of the AttentionLayer. In the paper, h_it should first be fed through a one-layer MLP with a tanh activation to obtain u_it, i.e. u_it = tanh(W . h_it + b), and the attention weights are then computed on u_it. Is this happening somewhere in the code and I have missed it, or has it been (intentionally) left out? Please clarify.
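For reference, a minimal sketch of what the paper's formulation would look like inside the layer's call function, assuming weights self.W and self.b (hypothetical names, not in the repository) and the context vector self.Uw are created in build:

from keras import backend as K

# u_it = tanh(W . h_it + b): one-layer MLP applied to the GRU states h_it
u_it = K.tanh(K.dot(h_it, self.W) + self.b)
# the attention scores are then computed on u_it rather than on h_it directly
scores = K.exp(K.dot(u_it, self.Uw))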

Code to Visualize Attention Weights

I need some help writing code to obtain and visualize the attention weights as a heat map, like in the HAN paper. To obtain the attention weights, I am currently thinking of extracting the hidden representations of the GRUs (h_it) and then manually computing the attention weights from h_it using the equations in the call function of the attention layer.

from keras.models import Model

layer_name = 'GRU'  # name of the GRU layer in the trained model
# sub-model that exposes the GRU hidden states h_it as its output
intermediate_layer_model = Model(input=model.input, output=model.get_layer(layer_name).output)
h_it = intermediate_layer_model.predict(input_variable)
# use h_it from above to compute the attention weights

If there is a more direct way (a direct function call in Keras, or some existing code), that would be helpful.
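One hedged way to finish the computation, assuming h_it has shape (batch, timesteps, hidden) and the layer's context vector Uw has shape (hidden,); attention_layer below is a hypothetical handle on the trained AttentionLayer instance:

import numpy as np

# recompute the attention weights with the same equations as AttentionLayer.call
Uw = attention_layer.get_weights()[0]   # learned context vector, shape (hidden,)
scores = np.exp(h_it.dot(Uw))           # unnormalized weights, shape (batch, timesteps)
alpha = scores / scores.sum(axis=1, keepdims=True)  # attention weights, rows sum to 1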

IndexError problem

When I tried your implementation, I got the following error
(I am using TensorFlow 0.12 as the backend).
Do you have any solution?

(tensorflow_0.12) C:\Users\admin\Google 드라이브\SRC_Code\HierAtt_TextClassification\hierarchical-attention-networks>python imdb_train.py
Using TensorFlow backend.
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cublas64_80.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cudnn64_5.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cufft64_80.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library nvcuda.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library curand64_80.dll locally
Loading data...
25000 train sequences
25000 test sequences
Pad sequences (samples x time)
X_train shape: (25000, 1, 80)
X_test shape: (25000, 1, 80)
Build model...
Traceback (most recent call last):
File "imdb_train.py", line 31, in
model, modelAttEval = createHierarchicalAttentionModel(maxlen, embeddingSize = 200, vocabSize = max_features)
File "C:\Users\admin\Google 드라이브\SRC_Code\HierAtt_TextClassification\hierarchical-attention-networks\model.py", line 84, in createHierarchicalAttentionModel
attention = AttentionLayer()(wordRnn)
File "H:\Anaconda3\envs\tensorflow_0.12\lib\site-packages\keras\engine\topology.py", line 572, in call
self.add_inbound_node(inbound_layers, node_indices, tensor_indices)
File "H:\Anaconda3\envs\tensorflow_0.12\lib\site-packages\keras\engine\topology.py", line 635, in add_inbound_node
Node.create_node(self, inbound_layers, node_indices, tensor_indices)
File "H:\Anaconda3\envs\tensorflow_0.12\lib\site-packages\keras\engine\topology.py", line 166, in create_node
output_tensors = to_list(outbound_layer.call(input_tensors[0], mask=input_masks[0]))
File "C:\Users\admin\Google 드라이브\SRC_Code\HierAtt_TextClassification\hierarchical-attention-networks\model.py", line 33, in call
multData = K.exp(K.dot(x, self.Uw))
File "H:\Anaconda3\envs\tensorflow_0.12\lib\site-packages\keras\backend\tensorflow_backend.py", line 819, in dot
y_permute_dim = [y_permute_dim.pop(-2)] + y_permute_dim
IndexError: pop index out of range
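For context on where this fails: in this Keras version's TensorFlow backend, K.dot(x, self.Uw) with a 3-D x and a 1-D Uw reaches y_permute_dim.pop(-2) on a single-element list, hence the IndexError. A hedged workaround, assuming x has shape (batch, timesteps, hidden) and self.Uw has shape (hidden,), is to replace the dot product with an elementwise multiply and sum:

from keras import backend as K

# equivalent to dotting each h_it with the context vector Uw,
# without calling K.dot on a 1-D tensor
eij = K.sum(x * K.reshape(self.Uw, (1, 1, -1)), axis=-1)  # shape (batch, timesteps)
multData = K.exp(eij)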
