
attention-networks-for-classification's People

Contributors

sandeep42, zeweichu


attention-networks-for-classification's Issues

single lstm for all sentences

As I understand it, the LSTM trained on different sentences should be different, but in your model every sentence shares the same LSTM parameters for wordAttnRNN?
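For reference, this parameter sharing is intentional in the hierarchical attention network design: a single word-level encoder is applied to every sentence, and the sentence-level encoder then runs over the resulting sentence vectors. A minimal sketch with hypothetical sizes (not the repo's exact code):

    import torch
    import torch.nn as nn

    # Hypothetical sizes: 3 sentences per document, batch of 2, 5 tokens per sentence,
    # 200-dim word embeddings, 50-dim GRU hidden state.
    max_sents, batch_size, max_tokens, embed_dim, hidden = 3, 2, 5, 200, 50

    word_rnn = nn.GRU(embed_dim, hidden, bidirectional=True)   # one shared word-level encoder
    embedded = torch.randn(max_sents, max_tokens, batch_size, embed_dim)

    sent_vectors = []
    for i in range(max_sents):
        out, _ = word_rnn(embedded[i])          # the same parameters encode every sentence
        sent_vectors.append(out.mean(dim=0))    # stand-in for the attention-weighted sum
    sent_vectors = torch.stack(sent_vectors)    # (max_sents, batch_size, 2 * hidden)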

Init hidden state for the 2nd sentence onward

Hi,

Thanks for sharing your implementation. This helps me a lot.

I just wonder about the way you initialize the hidden state from the second sentence onward. Precisely, in the "def train_data(mini_batch, targets, word_attn_model, sent_attn_model, word_optimizer, sent_optimizer, criterion):" function (in the "attention_model_validation_experiments" notebook), you currently loop over the sentences: "_s, state_word, _ = word_attn_model(mini_batch[i,:,:].transpose(0,1), state_word)". That means both the forward and backward states of the last word in sentence i are used to initialize the forward and backward states of sentence i+1. I can understand the case for the forward state, since the two sentences are consecutive, but the backward-state initialization does not seem very reasonable.

Can you please explain this in more detail? Thanks.
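For illustration, a sketch of the two alternatives being discussed, written against the loop quoted above (mini_batch, max_sents, word_attn_model, and state_word come from the notebook; init_hidden() is assumed to return a fresh zero state of the right shape):

    # Option A (as in the notebook): the final word-level state of sentence i,
    # in both directions, seeds the word-level RNN for sentence i+1.
    for i in range(max_sents):
        _s, state_word, _ = word_attn_model(mini_batch[i,:,:].transpose(0,1), state_word)

    # Option B (the questioner's suggestion, sketched): re-initialize the state per
    # sentence, so the backward direction of sentence i is not seeded by the end of
    # sentence i-1.
    for i in range(max_sents):
        state_word = word_attn_model.init_hidden()
        _s, state_word, _ = word_attn_model(mini_batch[i,:,:].transpose(0,1), state_word)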

Cannot run the script

I got RuntimeError: input must have 3 dimensions, got 2 at y_pred, state_sent, _ = sent_attn_model(s, state_sent). It seems that s is 2-dimensional, but sent_attn_model needs a 3-dimensional input.
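One possible cause (an assumption, not verified against the notebook): nn.GRU with the default batch_first=False expects input of shape (seq_len, batch, input_size), and s can lose a dimension when the batch collapses to a single sentence. A minimal guard would be:

    # Restore the missing sequence dimension if `s` collapsed to (batch, features).
    if s.dim() == 2:
        s = s.unsqueeze(0)   # -> (1, batch, features)
    y_pred, state_sent, _ = sent_attn_model(s, state_sent)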

Dimensionalities of word minibatch and Embedding layer don't match

I wonder whether there is an error due to what PyTorch expects as input to the nn.Embedding module.

In the function train_data(), it's written:

    for i in xrange(max_sents):
        _s, state_word, _ = word_attn_model(mini_batch[i,:,:].transpose(0,1), state_word)

In this way, after the .transpose(0,1), the resulting mini_batch matrix has size (max_tokens, batch_size).

However, the first function called in forward() is self.lookup(embed), which expects input of shape (batch_size, list_of_indices).

Currently, the lookup function is (wrongly!?) first extracting the word embeddings for the beginning word of each sentence in the minibatch, then the embeddings for the second words, and so on.
To fix it, the call just needs to drop the .transpose(0,1).

If this is correct, all of the following code needs to be fixed accordingly.
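For reference, a small shape check (hypothetical sizes, independent of the repo): nn.Embedding adds an embedding dimension to whatever index shape it receives, and nn.GRU with the default batch_first=False expects (seq_len, batch, input_size), which is what the transposed layout produces.

    import torch
    import torch.nn as nn

    batch_size, max_tokens, embed_dim = 4, 7, 200        # hypothetical sizes
    lookup = nn.Embedding(num_embeddings=1000, embedding_dim=embed_dim)

    indices = torch.randint(0, 1000, (batch_size, max_tokens))
    print(lookup(indices).shape)      # (batch_size, max_tokens, embed_dim)  -> batch-first
    print(lookup(indices.t()).shape)  # (max_tokens, batch_size, embed_dim)  -> seq-first, as nn.GRU expects by default

So whether the transpose is a bug depends on which of the two layouts the downstream GRU is configured for.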

Sentence model bug when the GRU is not bidirectional

A small bug prevents training the network when bidirectional is set to False.

In "model.py", on line 135, Bidirectional should be set to "False": Line 135

From:
self.sent_gru = nn.GRU(word_gru_hidden, sent_gru_hidden, bidirectional= True)
to:
self.sent_gru = nn.GRU(word_gru_hidden, sent_gru_hidden, bidirectional= False)
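A more flexible alternative (a sketch, assuming a bidirectional flag is threaded through the constructor rather than hard-coding either value):

    # Hypothetical constructor snippet; `bidirectional` is assumed to be a constructor argument.
    self.sent_gru = nn.GRU(word_gru_hidden, sent_gru_hidden, bidirectional=bidirectional)

Any weights whose sizes depend on the number of GRU directions (for example sentence-level attention parameters sized with 2 * sent_gru_hidden) would then need to switch between sent_gru_hidden and 2 * sent_gru_hidden accordingly.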

single data prediction

I'm new to Python. In your model, the hidden state depends on the batch_size, and when I want to make a prediction on a single example I run into problems. Could you help me or give me some code?
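One way to do it (a sketch with assumed shapes and names; the exact hidden-state shape depends on how the models were configured): build zero hidden states for a batch of one and run a single padded document through both models.

    import torch

    # Hypothetical sizes; use the values the models were actually built with.
    num_directions, word_hidden, sent_hidden = 2, 50, 50

    # Zero initial states for a batch containing a single document.
    state_word = torch.zeros(num_directions, 1, word_hidden)
    state_sent = torch.zeros(num_directions, 1, sent_hidden)

    # `doc` is assumed to be a LongTensor of shape (max_sents, 1, max_tokens):
    # one document, padded the same way as the training batches.
    sent_vecs = []
    for i in range(doc.size(0)):
        _s, state_word, _ = word_attn_model(doc[i].transpose(0, 1), state_word)
        sent_vecs.append(_s)
    y_pred, state_sent, _ = sent_attn_model(torch.cat(sent_vecs, 0), state_sent)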

An example of how to save this model

Thanks for your code. I've got 40% accuracy in my work, but when I attempt to save this model I'm confused. Please help me and give me an example of how to save this model.
Thank you!
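A minimal sketch using PyTorch's standard state_dict mechanism; both attention modules need to be saved, since the hierarchy is split across two models here:

    import torch

    # Save both halves of the hierarchical model.
    torch.save(word_attn_model.state_dict(), 'word_attn.pt')
    torch.save(sent_attn_model.state_dict(), 'sent_attn.pt')

    # Later: rebuild the models with the same constructor arguments, then load the weights.
    word_attn_model.load_state_dict(torch.load('word_attn.pt'))
    sent_attn_model.load_state_dict(torch.load('sent_attn.pt'))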

RNN mask issue

Hi, author. I can't find where the RNN is masked for padding. How did you handle this in your code?
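For reference, a common way to handle padding in PyTorch RNNs is pack_padded_sequence; this is a general sketch, not something the repository itself appears to do:

    import torch
    import torch.nn as nn
    from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

    gru = nn.GRU(input_size=8, hidden_size=16)
    seqs = torch.randn(10, 3, 8)          # (max_len, batch, features), zero-padded
    lengths = torch.tensor([10, 7, 4])    # true lengths, sorted in decreasing order

    packed = pack_padded_sequence(seqs, lengths)
    packed_out, h_n = gru(packed)
    out, _ = pad_packed_sequence(packed_out)  # back to (max_len, batch, hidden); pads never entered the RNN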

Performance comparison to baseline models

I was wondering whether you could add accuracy numbers to the notebook/README rather than just the loss numbers. Also, do you know the accuracy/loss values for a baseline network? Does the HAN outperform?

Thanks!

transpose?

Why do you need the transpose here:
_s, state_word, _ = word_attn_model(mini_batch[i,:,:].transpose(0,1), state_word)

and here:
torch.from_numpy(main_matrix).transpose(0,1) in def pad_batch

Thanks :)

how could I run this on Python 3

RuntimeError Traceback (most recent call last)
in ()
----> 1 loss_full= train_early_stopping(64, X_train, y_train, X_test, y_test, word_attn, sent_attn, word_optmizer, sent_optimizer, criterion, 5000, 1000, 50)

in train_early_stopping(mini_batch_size, X_train, y_train, X_test, y_test, word_attn_model, sent_attn_model, word_attn_optimiser, sent_attn_optimiser, loss_criterion, num_epoch, print_val_loss_every, print_loss_every)
13 try:
14 tokens, labels = next(g)
---> 15 loss = train_data(tokens, labels, word_attn_model, sent_attn_model, word_attn_optimiser, sent_attn_optimiser, loss_criterion)
16 acc = test_accuracy_mini_batch(tokens, labels, word_attn_model, sent_attn_model)
17 accuracy_full.append(acc)

in train_data(mini_batch, targets, word_attn_model, sent_attn_model, word_optimizer, sent_optimizer, criterion)
12 else:
13 s = torch.cat((s,_s),0)
---> 14 y_pred, state_sent, _ = sent_attn_model(s, state_sent)
15 loss = criterion(y_pred, targets)
16 loss.backward()

C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
222 for hook in self._forward_pre_hooks.values():
223 hook(self, input)
--> 224 result = self.forward(*input, **kwargs)
225 for hook in self._forward_hooks.values():
226 hook_result = hook(self, input, result)

in forward(self, word_attention_vectors, state_sent)
36 print(state_sent.size())
37
---> 38 output_sent, state_sent = self.sent_gru(word_attention_vectors, state_sent)
39 sent_squish = batch_matmul_bias(output_sent, self.weight_W_sent,self.bias_sent, nonlinearity='tanh')
40 sent_attn = batch_matmul(sent_squish, self.weight_proj_sent)

C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
222 for hook in self._forward_pre_hooks.values():
223 hook(self, input)
--> 224 result = self.forward(*input, **kwargs)
225 for hook in self._forward_hooks.values():
226 hook_result = hook(self, input, result)

C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\rnn.py in forward(self, input, hx)
160 flat_weight=flat_weight
161 )
--> 162 output, hidden = func(input, self.all_weights, hx)
163 if is_packed:
164 output = PackedSequence(output, batch_sizes)

C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\_functions\rnn.py in forward(input, *fargs, **fkwargs)
349 else:
350 func = AutogradRNN(*args, **kwargs)
--> 351 return func(input, *fargs, **fkwargs)
352
353 return forward

C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\_functions\rnn.py in forward(input, weight, hidden)
242 input = input.transpose(0, 1)
243
--> 244 nexth, output = func(input, hidden, weight)
245
246 if batch_first and batch_sizes is None:

C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\_functions\rnn.py in forward(input, hidden, weight)
82 l = i * num_directions + j
83
---> 84 hy, output = inner(input, hidden[l], weight[l])
85 next_hidden.append(hy)
86 all_output.append(output)

C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\_functions\rnn.py in forward(input, hidden, weight)
111 steps = range(input.size(0) - 1, -1, -1) if reverse else range(input.size(0))
112 for i in steps:
--> 113 hidden = inner(input[i], hidden, *weight)
114 # hack to handle LSTM
115 output.append(hidden[0] if isinstance(hidden, tuple) else hidden)

C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\_functions\rnn.py in GRUCell(input, hidden, w_ih, w_hh, b_ih, b_hh)
54 gi = F.linear(input, w_ih, b_ih)
55 gh = F.linear(hidden, w_hh, b_hh)
---> 56 i_r, i_i, i_n = gi.chunk(3, 1)
57 h_r, h_i, h_n = gh.chunk(3, 1)
58

C:\ProgramData\Anaconda3\lib\site-packages\torch\autograd\variable.py in chunk(self, num_chunks, dim)
745
746 def chunk(self, num_chunks, dim=0):
--> 747 return Chunk.apply(self, num_chunks, dim)
748
749 def squeeze(self, dim=None):

C:\ProgramData\Anaconda3\lib\site-packages\torch\autograd\_functions\tensor.py in forward(ctx, i, num_chunks, dim)
540 def forward(ctx, i, num_chunks, dim=0):
541 ctx.dim = dim
--> 542 result = i.chunk(num_chunks, dim)
543 ctx.mark_shared_storage(*((i, chunk) for chunk in result))
544 return result

C:\ProgramData\Anaconda3\lib\site-packages\torch\tensor.py in chunk(self, n_chunks, dim)
172 See :func:torch.chunk.
173 """
--> 174 return torch.chunk(self, n_chunks, dim)
175
176 def matmul(self, other):

C:\ProgramData\Anaconda3\lib\site-packages\torch\functional.py in chunk(tensor, chunks, dim)
42 if dim < 0:
43 dim += tensor.dim()
---> 44 split_size = (tensor.size(dim) + chunks - 1) // chunks
45 return split(tensor, split_size, dim)
46

RuntimeError: invalid argument 2: dimension 1 out of range of 1D tensor at d:\projects\pytorch\torch\lib\th\generic/THTensor.c:24


I got this RuntimeError when I tried to run it on Python 3. Could anyone help me with this?

Thanks a lot!
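For what it's worth, the code was written for Python 2, so a few mechanical changes are usually needed before chasing shape errors like the one above (a sketch of the typical substitutions, not a verified port):

    # Typical Python 2 -> Python 3 substitutions in this codebase (assumed, not exhaustive):
    #   xrange(n)       -> range(n)
    #   print x         -> print(x)
    #   d.iteritems()   -> d.items()
    #   import cPickle  -> import pickle

    # For example, the word-level loop in train_data becomes
    # (mini_batch, max_sents, word_attn_model and state_word come from the notebook):
    for i in range(max_sents):
        _s, state_word, _ = word_attn_model(mini_batch[i, :, :].transpose(0, 1), state_word)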

Loss can start as NaN

Any solution for this, or any idea why (it should be a division by zero somewhere)? It happens sometimes.
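If the NaNs come from the attention normalization, one common guard (a sketch of the general technique, not the repository's exact normalization code) is to stabilize the softmax and add a small epsilon to the denominator:

    import torch

    def normalize_attention(scores, eps=1e-8):
        """scores: (seq_len, batch) of unnormalized attention scores."""
        # Subtracting the max keeps exp() from overflowing; eps prevents division by zero
        # when every exponentiated score underflows to 0 (e.g. on an all-padding sequence).
        exps = torch.exp(scores - scores.max(dim=0, keepdim=True).values)
        return exps / (exps.sum(dim=0, keepdim=True) + eps)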

imdb_final.json

Could you share the imdb_final.json file you are using?

Having 2 optimizers

Hi there! Thank you for making this implementation open-source!
I have one question though: although there is only one backward step, you have two optimizers. Shouldn't you combine both models' parameters and use only one optimizer?
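For illustration, a single optimizer over both parameter sets would look like this (a sketch; the optimizer type and learning rate are assumptions). The two-optimizer setup is functionally equivalent as long as both of them call step(), but a single optimizer is simpler to manage:

    import itertools
    import torch

    optimizer = torch.optim.Adam(
        itertools.chain(word_attn_model.parameters(), sent_attn_model.parameters()),
        lr=1e-3,
    )

    # One zero_grad/step pair per training step instead of two.
    optimizer.zero_grad()
    loss = criterion(y_pred, targets)
    loss.backward()
    optimizer.step()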
