
attention-networks-for-classification's People

Contributors

sandeep42, zeweichu


attention-networks-for-classification's Issues

single lstm for all sentences

As I understand it, the LSTM trained on different sentences should be different, but in your model every sentence shares the same LSTM parameters for wordAttnRNN?
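For reference, this parameter sharing is intentional in the hierarchical attention network design: a single word-level encoder is applied to every sentence, and the sentence-level encoder then runs over the resulting sentence vectors. A minimal sketch with hypothetical sizes (not the repo's exact code):

    import torch
    import torch.nn as nn

    # Hypothetical sizes: 3 sentences per document, batch of 2, 5 tokens per sentence,
    # 200-dim word embeddings, 50-dim GRU hidden state.
    max_sents, batch_size, max_tokens, embed_dim, hidden = 3, 2, 5, 200, 50

    word_rnn = nn.GRU(embed_dim, hidden, bidirectional=True)   # one shared word-level encoder
    embedded = torch.randn(max_sents, max_tokens, batch_size, embed_dim)

    sent_vectors = []
    for i in range(max_sents):
        out, _ = word_rnn(embedded[i])          # the same parameters encode every sentence
        sent_vectors.append(out.mean(dim=0))    # stand-in for the attention-weighted sum
    sent_vectors = torch.stack(sent_vectors)    # (max_sents, batch_size, 2 * hidden)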

Init hidden state for the 2nd sentence onward

Hi,

Thanks for sharing your implementation. This helps me a lot.

I just wonder about the way you initialize the hidden state from the second sentence onward. Precisely, in the "def train_data(mini_batch, targets, word_attn_model, sent_attn_model, word_optimizer, sent_optimizer, criterion):" function (in the "attention_model_validation_experiments" notebook), you currently loop over the sentences: "_s, state_word, _ = word_attn_model(mini_batch[i,:,:].transpose(0,1), state_word)". That means both the forward and backward states of the last word in sentence i are used to initialize the forward and backward states of sentence i+1. I can understand the case for the forward state, since the two sentences are consecutive, but the backward-state initialization does not seem very reasonable.

Can you please explain this in more detail? Thanks.
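For illustration, a sketch of the two alternatives being discussed, written against the loop quoted above (mini_batch, max_sents, word_attn_model, and state_word come from the notebook; init_hidden() is assumed to return a fresh zero state of the right shape):

    # Option A (as in the notebook): the final word-level state of sentence i,
    # in both directions, seeds the word-level RNN for sentence i+1.
    for i in range(max_sents):
        _s, state_word, _ = word_attn_model(mini_batch[i,:,:].transpose(0,1), state_word)

    # Option B (the questioner's suggestion, sketched): re-initialize the state per
    # sentence, so the backward direction of sentence i is not seeded by the end of
    # sentence i-1.
    for i in range(max_sents):
        state_word = word_attn_model.init_hidden()
        _s, state_word, _ = word_attn_model(mini_batch[i,:,:].transpose(0,1), state_word)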

Cannot run the script

I got RuntimeError: input must have 3 dimensions, got 2 at y_pred, state_sent, _ = sent_attn_model(s, state_sent). It seems that s is 2-dimensional, but sent_attn_model needs a 3-dimensional input.
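One possible cause (an assumption, not verified against the notebook): nn.GRU with the default batch_first=False expects input of shape (seq_len, batch, input_size), and s can lose a dimension when the batch collapses to a single sentence. A minimal guard would be:

    # Restore the missing sequence dimension if `s` collapsed to (batch, features).
    if s.dim() == 2:
        s = s.unsqueeze(0)   # -> (1, batch, features)
    y_pred, state_sent, _ = sent_attn_model(s, state_sent)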

Dimensionalities of word minibatch and Embedding layer don't match

I wonder whether there is an error due to what PyTorch expects as input to the nn.Embedding module.

In the function train_data(), it's written:

    for i in xrange(max_sents):
        _s, state_word, _ = word_attn_model(mini_batch[i,:,:].transpose(0,1), state_word)

In this way, after the .transpose(0,1), the resulting mini_batch matrix has size (max_tokens, batch_size).

However, the first function called in forward() is self.lookup(embed), which expects input of shape (batch_size, list_of_indices).

Currently, the lookup function is (wrongly!?) first extracting the word embeddings for the beginning word of each sentence in the minibatch, then the embeddings for the second words, and so on.
To fix it, the call just needs to drop the .transpose(0,1).

If this is correct, all of the following code needs to be fixed accordingly.
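For reference, a small shape check (hypothetical sizes, independent of the repo): nn.Embedding adds an embedding dimension to whatever index shape it receives, and nn.GRU with the default batch_first=False expects (seq_len, batch, input_size), which is what the transposed layout produces.

    import torch
    import torch.nn as nn

    batch_size, max_tokens, embed_dim = 4, 7, 200        # hypothetical sizes
    lookup = nn.Embedding(num_embeddings=1000, embedding_dim=embed_dim)

    indices = torch.randint(0, 1000, (batch_size, max_tokens))
    print(lookup(indices).shape)      # (batch_size, max_tokens, embed_dim)  -> batch-first
    print(lookup(indices.t()).shape)  # (max_tokens, batch_size, embed_dim)  -> seq-first, as nn.GRU expects by default

So whether the transpose is a bug depends on which of the two layouts the downstream GRU is configured for.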

Sentence model bug when the GRU is not bidirectional

A small bug prevents training the network when bidirectional is set to False.

In "model.py", on line 135, Bidirectional should be set to "False": Line 135

From:
self.sent_gru = nn.GRU(word_gru_hidden, sent_gru_hidden, bidirectional= True)
to:
self.sent_gru = nn.GRU(word_gru_hidden, sent_gru_hidden, bidirectional= False)
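A more flexible alternative (a sketch, assuming a bidirectional flag is threaded through the constructor rather than hard-coding either value):

    # Hypothetical constructor snippet; `bidirectional` is assumed to be a constructor argument.
    self.sent_gru = nn.GRU(word_gru_hidden, sent_gru_hidden, bidirectional=bidirectional)

Any weights whose sizes depend on the number of GRU directions (for example sentence-level attention parameters sized with 2 * sent_gru_hidden) would then need to switch between sent_gru_hidden and 2 * sent_gru_hidden accordingly.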

single data prediction

I'm new to Python. In your model, the hidden state depends on the batch_size, and when I want to make a prediction on a single example I run into problems. Could you help me or give me some code?
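One way to do it (a sketch with assumed shapes and names; the exact hidden-state shape depends on how the models were configured): build zero hidden states for a batch of one and run a single padded document through both models.

    import torch

    # Hypothetical sizes; use the values the models were actually built with.
    num_directions, word_hidden, sent_hidden = 2, 50, 50

    # Zero initial states for a batch containing a single document.
    state_word = torch.zeros(num_directions, 1, word_hidden)
    state_sent = torch.zeros(num_directions, 1, sent_hidden)

    # `doc` is assumed to be a LongTensor of shape (max_sents, 1, max_tokens):
    # one document, padded the same way as the training batches.
    sent_vecs = []
    for i in range(doc.size(0)):
        _s, state_word, _ = word_attn_model(doc[i].transpose(0, 1), state_word)
        sent_vecs.append(_s)
    y_pred, state_sent, _ = sent_attn_model(torch.cat(sent_vecs, 0), state_sent)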

An example of how to save this model

Thanks for your code. I've got 40% accuracy in my work, but when I attempt to save this model I'm confused. Please help me and give me an example of how to save this model.
Thank you!
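A minimal sketch using PyTorch's standard state_dict mechanism; both attention modules need to be saved, since the hierarchy is split across two models here:

    import torch

    # Save both halves of the hierarchical model.
    torch.save(word_attn_model.state_dict(), 'word_attn.pt')
    torch.save(sent_attn_model.state_dict(), 'sent_attn.pt')

    # Later: rebuild the models with the same constructor arguments, then load the weights.
    word_attn_model.load_state_dict(torch.load('word_attn.pt'))
    sent_attn_model.load_state_dict(torch.load('sent_attn.pt'))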

RNN mask issue

Hi, author. I can't find where the RNN is masked for padding. How did you handle this in your code?
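For reference, a common way to handle padding in PyTorch RNNs is pack_padded_sequence; this is a general sketch, not something the repository itself appears to do:

    import torch
    import torch.nn as nn
    from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

    gru = nn.GRU(input_size=8, hidden_size=16)
    seqs = torch.randn(10, 3, 8)          # (max_len, batch, features), zero-padded
    lengths = torch.tensor([10, 7, 4])    # true lengths, sorted in decreasing order

    packed = pack_padded_sequence(seqs, lengths)
    packed_out, h_n = gru(packed)
    out, _ = pad_packed_sequence(packed_out)  # back to (max_len, batch, hidden); pads never entered the RNN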

Performance comparison to baseline models

I was wondering whether you could add accuracy numbers to the notebook/README rather than just the loss numbers. Also, do you know the accuracy/loss values for a baseline network? Does the HAN outperform?

Thanks!

transpose?

Why do you need the transpose here:
_s, state_word, _ = word_attn_model(mini_batch[i,:,:].transpose(0,1), state_word)

and here:
torch.from_numpy(main_matrix).transpose(0,1) in def pad_batch

Thanks :)

how could I run this on Python 3

RuntimeError Traceback (most recent call last)
in ()
----> 1 loss_full= train_early_stopping(64, X_train, y_train, X_test, y_test, word_attn, sent_attn, word_optmizer, sent_optimizer, criterion, 5000, 1000, 50)

in train_early_stopping(mini_batch_size, X_train, y_train, X_test, y_test, word_attn_model, sent_attn_model, word_attn_optimiser, sent_attn_optimiser, loss_criterion, num_epoch, print_val_loss_every, print_loss_every)
13 try:
14 tokens, labels = next(g)
---> 15 loss = train_data(tokens, labels, word_attn_model, sent_attn_model, word_attn_optimiser, sent_attn_optimiser, loss_criterion)
16 acc = test_accuracy_mini_batch(tokens, labels, word_attn_model, sent_attn_model)
17 accuracy_full.append(acc)

in train_data(mini_batch, targets, word_attn_model, sent_attn_model, word_optimizer, sent_optimizer, criterion)
12 else:
13 s = torch.cat((s,_s),0)
---> 14 y_pred, state_sent, _ = sent_attn_model(s, state_sent)
15 loss = criterion(y_pred, targets)
16 loss.backward()

C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
222 for hook in self._forward_pre_hooks.values():
223 hook(self, input)
--> 224 result = self.forward(*input, **kwargs)
225 for hook in self._forward_hooks.values():
226 hook_result = hook(self, input, result)

in forward(self, word_attention_vectors, state_sent)
36 print(state_sent.size())
37
---> 38 output_sent, state_sent = self.sent_gru(word_attention_vectors, state_sent)
39 sent_squish = batch_matmul_bias(output_sent, self.weight_W_sent,self.bias_sent, nonlinearity='tanh')
40 sent_attn = batch_matmul(sent_squish, self.weight_proj_sent)

C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
222 for hook in self._forward_pre_hooks.values():
223 hook(self, input)
--> 224 result = self.forward(*input, **kwargs)
225 for hook in self._forward_hooks.values():
226 hook_result = hook(self, input, result)

C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\rnn.py in forward(self, input, hx)
160 flat_weight=flat_weight
161 )
--> 162 output, hidden = func(input, self.all_weights, hx)
163 if is_packed:
164 output = PackedSequence(output, batch_sizes)

C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\_functions\rnn.py in forward(input, *fargs, **fkwargs)
349 else:
350 func = AutogradRNN(*args, **kwargs)
--> 351 return func(input, *fargs, **fkwargs)
352
353 return forward

C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\_functions\rnn.py in forward(input, weight, hidden)
242 input = input.transpose(0, 1)
243
--> 244 nexth, output = func(input, hidden, weight)
245
246 if batch_first and batch_sizes is None:

C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\_functions\rnn.py in forward(input, hidden, weight)
82 l = i * num_directions + j
83
---> 84 hy, output = inner(input, hidden[l], weight[l])
85 next_hidden.append(hy)
86 all_output.append(output)

C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\_functions\rnn.py in forward(input, hidden, weight)
111 steps = range(input.size(0) - 1, -1, -1) if reverse else range(input.size(0))
112 for i in steps:
--> 113 hidden = inner(input[i], hidden, *weight)
114 # hack to handle LSTM
115 output.append(hidden[0] if isinstance(hidden, tuple) else hidden)

C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\_functions\rnn.py in GRUCell(input, hidden, w_ih, w_hh, b_ih, b_hh)
54 gi = F.linear(input, w_ih, b_ih)
55 gh = F.linear(hidden, w_hh, b_hh)
---> 56 i_r, i_i, i_n = gi.chunk(3, 1)
57 h_r, h_i, h_n = gh.chunk(3, 1)
58

C:\ProgramData\Anaconda3\lib\site-packages\torch\autograd\variable.py in chunk(self, num_chunks, dim)
745
746 def chunk(self, num_chunks, dim=0):
--> 747 return Chunk.apply(self, num_chunks, dim)
748
749 def squeeze(self, dim=None):

C:\ProgramData\Anaconda3\lib\site-packages\torch\autograd\_functions\tensor.py in forward(ctx, i, num_chunks, dim)
540 def forward(ctx, i, num_chunks, dim=0):
541 ctx.dim = dim
--> 542 result = i.chunk(num_chunks, dim)
543 ctx.mark_shared_storage(*((i, chunk) for chunk in result))
544 return result

C:\ProgramData\Anaconda3\lib\site-packages\torch\tensor.py in chunk(self, n_chunks, dim)
172 See :func:torch.chunk.
173 """
--> 174 return torch.chunk(self, n_chunks, dim)
175
176 def matmul(self, other):

C:\ProgramData\Anaconda3\lib\site-packages\torch\functional.py in chunk(tensor, chunks, dim)
42 if dim < 0:
43 dim += tensor.dim()
---> 44 split_size = (tensor.size(dim) + chunks - 1) // chunks
45 return split(tensor, split_size, dim)
46

RuntimeError: invalid argument 2: dimension 1 out of range of 1D tensor at d:\projects\pytorch\torch\lib\th\generic/THTensor.c:24


I got this RuntimeError when I tried to run it on Python 3. Could anyone help me with this?

Thanks a lot!
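For what it's worth, the code was written for Python 2, so a few mechanical changes are usually needed before chasing shape errors like the one above (a sketch of the typical substitutions, not a verified port):

    # Typical Python 2 -> Python 3 substitutions in this codebase (assumed, not exhaustive):
    #   xrange(n)       -> range(n)
    #   print x         -> print(x)
    #   d.iteritems()   -> d.items()
    #   import cPickle  -> import pickle

    # For example, the word-level loop in train_data becomes
    # (mini_batch, max_sents, word_attn_model and state_word come from the notebook):
    for i in range(max_sents):
        _s, state_word, _ = word_attn_model(mini_batch[i, :, :].transpose(0, 1), state_word)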

Loss can start as NaN

Any solution for this, or any idea why (it should be a division by zero somewhere)? It happens sometimes.
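If the NaNs come from the attention normalization, one common guard (a sketch of the general technique, not the repository's exact normalization code) is to stabilize the softmax and add a small epsilon to the denominator:

    import torch

    def normalize_attention(scores, eps=1e-8):
        """scores: (seq_len, batch) of unnormalized attention scores."""
        # Subtracting the max keeps exp() from overflowing; eps prevents division by zero
        # when every exponentiated score underflows to 0 (e.g. on an all-padding sequence).
        exps = torch.exp(scores - scores.max(dim=0, keepdim=True).values)
        return exps / (exps.sum(dim=0, keepdim=True) + eps)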

imdb_final.json

Could you share the imdb_final.json file you are using?

Having 2 optimizers

Hi there! Thank you for making this implementation open-source!
I have one question though: although there is only one backward step, you have two optimizers. Shouldn't you combine both models' parameters and use only one optimizer?
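For illustration, a single optimizer over both parameter sets would look like this (a sketch; the optimizer type and learning rate are assumptions). The two-optimizer setup is functionally equivalent as long as both of them call step(), but a single optimizer is simpler to manage:

    import itertools
    import torch

    optimizer = torch.optim.Adam(
        itertools.chain(word_attn_model.parameters(), sent_attn_model.parameters()),
        lr=1e-3,
    )

    # One zero_grad/step pair per training step instead of two.
    optimizer.zero_grad()
    loss = criterion(y_pred, targets)
    loss.backward()
    optimizer.step()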
