
hierarchical-attention-networks's People

Contributors

pedrocardoso

hierarchical-attention-networks's Issues

One-layer MLP Possibly Missing

The attention layer works directly on the GRU hidden states (denoted h_it in the HAN paper) in the call function of the AttentionLayer. In the paper, h_it should first be fed through a one-layer MLP with a tanh activation to obtain u_it, i.e. u_it = tanh(W . h_it + b), and the attention weights are then computed on u_it. Is this happening somewhere in the code and I have missed it, or has it been (intentionally) left out? Please clarify.
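For reference, a minimal sketch of what the paper's formulation would look like inside the layer's call function, assuming weights self.W and self.b (hypothetical names, not in the repository) and the context vector self.Uw are created in build:

from keras import backend as K

# u_it = tanh(W . h_it + b): one-layer MLP applied to the GRU states h_it
u_it = K.tanh(K.dot(h_it, self.W) + self.b)
# the attention scores are then computed on u_it rather than on h_it directly
scores = K.exp(K.dot(u_it, self.Uw))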

Code to Visualize Attention Weights

I need some help writing code to obtain and visualize the attention weights as a heat map, like in the HAN paper. To obtain the attention weights, I am currently thinking of extracting the hidden representations of the GRUs (h_it) and then manually computing the attention weights from h_it using the equations in the call function of the attention layer.

from keras.models import Model

layer_name = 'GRU'  # name of the GRU layer in the trained model
# sub-model that exposes the GRU hidden states h_it as its output
intermediate_layer_model = Model(input=model.input, output=model.get_layer(layer_name).output)
h_it = intermediate_layer_model.predict(input_variable)
# use h_it from above to compute the attention weights

If there is a more direct way (a direct function call in Keras, or some existing code), that would be helpful.
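One hedged way to finish the computation, assuming h_it has shape (batch, timesteps, hidden) and the layer's context vector Uw has shape (hidden,); attention_layer below is a hypothetical handle on the trained AttentionLayer instance:

import numpy as np

# recompute the attention weights with the same equations as AttentionLayer.call
Uw = attention_layer.get_weights()[0]   # learned context vector, shape (hidden,)
scores = np.exp(h_it.dot(Uw))           # unnormalized weights, shape (batch, timesteps)
alpha = scores / scores.sum(axis=1, keepdims=True)  # attention weights, rows sum to 1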

IndexError problem

When I tried your implementation, I got the following error
(I am using TensorFlow 0.12 as the backend).
Do you have any solution?

(tensorflow_0.12) C:\Users\admin\Google 드라이브\SRC_Code\HierAtt_TextClassification\hierarchical-attention-networks>python imdb_train.py
Using TensorFlow backend.
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cublas64_80.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cudnn64_5.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cufft64_80.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library nvcuda.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library curand64_80.dll locally
Loading data...
25000 train sequences
25000 test sequences
Pad sequences (samples x time)
X_train shape: (25000, 1, 80)
X_test shape: (25000, 1, 80)
Build model...
Traceback (most recent call last):
File "imdb_train.py", line 31, in
model, modelAttEval = createHierarchicalAttentionModel(maxlen, embeddingSize = 200, vocabSize = max_features)
File "C:\Users\admin\Google 드라이브\SRC_Code\HierAtt_TextClassification\hierarchical-attention-networks\model.py", line 84, in createHierarchicalAttentionModel
attention = AttentionLayer()(wordRnn)
File "H:\Anaconda3\envs\tensorflow_0.12\lib\site-packages\keras\engine\topology.py", line 572, in call
self.add_inbound_node(inbound_layers, node_indices, tensor_indices)
File "H:\Anaconda3\envs\tensorflow_0.12\lib\site-packages\keras\engine\topology.py", line 635, in add_inbound_node
Node.create_node(self, inbound_layers, node_indices, tensor_indices)
File "H:\Anaconda3\envs\tensorflow_0.12\lib\site-packages\keras\engine\topology.py", line 166, in create_node
output_tensors = to_list(outbound_layer.call(input_tensors[0], mask=input_masks[0]))
File "C:\Users\admin\Google 드라이브\SRC_Code\HierAtt_TextClassification\hierarchical-attention-networks\model.py", line 33, in call
multData = K.exp(K.dot(x, self.Uw))
File "H:\Anaconda3\envs\tensorflow_0.12\lib\site-packages\keras\backend\tensorflow_backend.py", line 819, in dot
y_permute_dim = [y_permute_dim.pop(-2)] + y_permute_dim
IndexError: pop index out of range
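For context on where this fails: in this Keras version's TensorFlow backend, K.dot(x, self.Uw) with a 3-D x and a 1-D Uw reaches y_permute_dim.pop(-2) on a single-element list, hence the IndexError. A hedged workaround, assuming x has shape (batch, timesteps, hidden) and self.Uw has shape (hidden,), is to replace the dot product with an elementwise multiply and sum:

from keras import backend as K

# equivalent to dotting each h_it with the context vector Uw,
# without calling K.dot on a 1-D tensor
eij = K.sum(x * K.reshape(self.Uw, (1, 1, -1)), axis=-1)  # shape (batch, timesteps)
multData = K.exp(eij)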
