The keras-bucketed-sequence from tbennun

Using this in a Hierarchical Attention Network for NLP

Hi,
Is there a way to use this across multiple LSTMs that feed into one model, in the style of the Hierarchical Attention Network implemented in this blog post? https://richliao.github.io/supervised/classification/2016/12/26/textclassifier-HATN/

This currently works for a single LSTM layer by passing in shape=(None, word_vector_dimension), but how can I make it work for both the word-level LSTM and the sentence-level LSTM? The Hierarchical Attention Network uses one LSTM at the word level to encode the words in a sentence, with attention to determine which words in the sentence are important. It then applies a second attention layer at the document level to determine which sentences are important out of all the sentences in a document. I don't know how to get your code to work for both levels: when I try to use shape=(None, None) for the input of the "review" level (comparing multiple sentences in one document), I get

AsTensorError: ('Cannot convert (-1, None) to TensorType', <class 'tuple'>)

For reference here is my current code:

sentence_input = Input(shape=(None, 300))
l_lstm = Bidirectional(GRU(100, return_sequences=True))(sentence_input)
l_dense = TimeDistributed(Dense(200))(l_lstm)
l_att = AttLayer()(l_dense)
sentEncoder = Model(sentence_input, l_att)
 
#review_input = Input(shape=(MAX_SENTS,MAX_SENT_LENGTH), dtype='int32')
review_input = Input(shape=(7,None), dtype='int32')
review_encoder = TimeDistributed(sentEncoder)(review_input)
l_lstm_sent = Bidirectional(GRU(100, return_sequences=True))(review_encoder)
l_dense_sent = TimeDistributed(Dense(200))(l_lstm_sent)
l_att_sent = AttLayer()(l_dense_sent)
preds = Dense(2, activation='softmax')(l_att_sent)
model = Model(review_input, preds)
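
For illustration only, here is a rough sketch of the data layout that bucketing at both levels would need to produce. It does not address the TimeDistributed error above and is not part of keras-bucketed-sequence: the helper names (pad_document, bucket_documents), the PAD index, and the step sizes are all hypothetical. The idea is to group documents by rounded-up (sentence count, longest sentence) so that every document in a bucket pads to the same rectangular shape.

```python
from collections import defaultdict

PAD = 0  # assumed padding index; must match whatever the model masks or embeds


def pad_document(doc, n_sents, n_words):
    """Pad a document (list of sentences of word ids) to n_sents x n_words."""
    padded = [sent + [PAD] * (n_words - len(sent)) for sent in doc]
    padded += [[PAD] * n_words for _ in range(n_sents - len(doc))]
    return padded


def bucket_documents(docs, sent_step=4, word_step=8):
    """Group documents by rounded-up (sentence count, longest sentence)."""
    buckets = defaultdict(list)
    for doc in docs:
        longest = max((len(s) for s in doc), default=0)
        # Round each dimension up to the next multiple of its step size.
        n_sents = -(-max(len(doc), 1) // sent_step) * sent_step
        n_words = -(-max(longest, 1) // word_step) * word_step
        buckets[(n_sents, n_words)].append(pad_document(doc, n_sents, n_words))
    return dict(buckets)


docs = [
    [[5, 3, 9], [7, 2]],          # 2 sentences, longest has 3 words
    [[4], [8, 1, 6, 2], [3, 3]],  # 3 sentences, longest has 4 words
    [[1] * 10] * 5,               # 5 sentences of 10 words each
]
buckets = bucket_documents(docs)
for (n_sents, n_words), batch in sorted(buckets.items()):
    print(n_sents, n_words, len(batch))
```

Each bucket can then be turned into one dense batch of shape (documents, n_sents, n_words). Whether the model itself accepts two variable dimensions is a separate question that this sketch leaves open.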

Making Predictions

With some slight modifications to your code, I got bucketed-sequence to run. However, when I call model.predict, the number of arrays in the prediction output equals the number of words in my input string. Does this mean the model is being trained on stateful input (one timestep at a time, IIRC), or that my prediction input is automatically being split into multiple inputs to the model? I need the RNN to predict from a sequence of words, not from each word by itself.
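
One possible cause, which is only an assumption because the prediction code is not shown: Keras treats the first axis of the array passed to model.predict as the batch axis, so a single sentence passed without a leading batch axis would be read as one sample per word. The sketch below uses plain nested lists and two hypothetical helpers (shape_of, as_single_batch) to show the difference in shape; for a NumPy array the equivalent step is np.expand_dims(x, axis=0).

```python
def shape_of(nested):
    """Return the shape of a rectangular nested list."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0] if nested else None
    return tuple(shape)


def as_single_batch(sequence):
    """Wrap one (timesteps, features) sequence as a batch of size 1."""
    return [sequence]


# One sentence of 5 words, each a 3-dimensional vector (toy sizes).
sentence = [[0.1, 0.2, 0.3] for _ in range(5)]

print(shape_of(sentence))                   # (5, 3): first axis is words
print(shape_of(as_single_batch(sentence)))  # (1, 5, 3): first axis is batch
```

If the output length drops to 1 once the input has a leading batch axis, the model was receiving each word as a separate sample rather than running statefully.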
