
practical-pytorch's Introduction

These tutorials have been merged into the official PyTorch tutorials. Please go there for better maintained versions of these tutorials compatible with newer versions of PyTorch.


Practical PyTorch

Learn PyTorch with project-based tutorials. These tutorials demonstrate modern techniques with readable code and use regular data from the internet.

Tutorials

Series 1: RNNs for NLP

Applying recurrent neural networks to natural language tasks, from classification to generation.

Series 2: RNNs for timeseries data

  • WIP Predicting discrete events with an RNN

Get Started

The quickest way to run these on a fresh Linux or Mac machine is to install Anaconda:

curl -LO https://repo.continuum.io/archive/Anaconda3-4.3.0-Linux-x86_64.sh
bash Anaconda3-4.3.0-Linux-x86_64.sh

Then install PyTorch:

conda install pytorch -c soumith

Then clone this repo and start Jupyter Notebook:

git clone http://github.com/spro/practical-pytorch
cd practical-pytorch
jupyter notebook
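
To confirm the install worked, a quick sanity check from a Python prompt might look like this (output will vary by version and hardware):

    import torch

    print(torch.__version__)          # installed PyTorch version
    x = torch.rand(3, 4)              # random 3x4 tensor
    print(x.sum())                    # a simple op to confirm tensors work
    print(torch.cuda.is_available())  # True if a usable GPU was found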

Recommended Reading

PyTorch basics

Recurrent Neural Networks

Machine translation

Attention models

Other RNN uses

Other PyTorch tutorials

Feedback

If you have ideas or find mistakes please leave a note.

practical-pytorch's People

Contributors

alisonswu, ariddell, jemgold, notpratheek, spro, tejaslodaya


practical-pytorch's Issues

Unicode problem

I got this Unicode error:

TypeError: normalize() argument 2 must be unicode, not str

and I solved it as follows.

In data.py:

    # -*- coding: utf-8 -*-
    from io import open
    open(filename, encoding='utf-8')

In train.py, add:

    from __future__ import unicode_literals, print_function, division

A bug in seq2seq translation

When we run the code testing the models, it raises this error:

RuntimeError: Expected argument self to have 1 dimension(s), but has 2 at /Users/soumith/miniconda2/conda-bld/pytorch_1502000696751/work/torch/csrc/generic/TensorMethods.cpp:23020

Details are shown below:

EncoderRNN (
  (embedding): Embedding(10, 10)
  (gru): GRU(10, 10, num_layers=2)
)
AttnDecoderRNN (
  (embedding): Embedding(10, 10)
  (gru): GRU(20, 10, num_layers=2, dropout=0.1)
  (out): Linear (20 -> 10)
  (attn): Attn (
    (attn): Linear (10 -> 10)
  )
)
---------------------------------------------------------
RuntimeError            Traceback (most recent call last)
<ipython-input-14-7c49add1a901> in <module>()
     22 
     23 for i in range(3):
---> 24     decoder_output, decoder_context, decoder_hidden, decoder_attn = decoder_test.forward(word_inputs[i], decoder_context, decoder_hidden, encoder_outputs)
     25     print(decoder_output.size(), decoder_hidden.size(), decoder_attn.size())
     26     decoder_attns[0, i] = decoder_attn.squeeze(0).cpu().data

<ipython-input-13-1e8710146be2> in forward(self, word_input, last_context, last_hidden, encoder_outputs)
     30 
     31         # Calculate attention from current RNN state and all encoder outputs; apply to encoder outputs
---> 32         attn_weights = self.attn(rnn_output.squeeze(0), encoder_outputs)
     33         context = attn_weights.bmm(encoder_outputs.transpose(0, 1)) # B x 1 x N
     34 

~/anaconda/envs/pytorch_nmt3.5/lib/python3.5/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    222         for hook in self._forward_pre_hooks.values():
    223             hook(self, input)
--> 224         result = self.forward(*input, **kwargs)
    225         for hook in self._forward_hooks.values():
    226             hook_result = hook(self, input, result)

<ipython-input-12-f800dd294bc2> in forward(self, hidden, encoder_outputs)
     22         # Calculate energies for each encoder output
     23         for i in range(seq_len):
---> 24             attn_energies[i] = self.score(hidden, encoder_outputs[i])
     25 
     26         # Normalize energies to weights in range 0 to 1, resize to 1 x 1 x seq_len

<ipython-input-12-f800dd294bc2> in score(self, hidden, encoder_output)
     35         elif self.method == 'general':
     36             energy = self.attn(encoder_output)
---> 37             energy = hidden.dot(energy)
     38             return energy
     39 

~/anaconda/envs/pytorch_nmt3.5/lib/python3.5/site-packages/torch/autograd/variable.py in dot(self, other)
    629 
    630     def dot(self, other):
--> 631         return Dot.apply(self, other)
    632 
    633     def _addcop(self, op, args, inplace):

~/anaconda/envs/pytorch_nmt3.5/lib/python3.5/site-packages/torch/autograd/_functions/blas.py in forward(ctx, vector1, vector2)
    209         ctx.save_for_backward(vector1, vector2)
    210         ctx.sizes = (vector1.size(), vector2.size())
--> 211         return vector1.new((vector1.dot(vector2),))
    212 
    213     @staticmethod

RuntimeError: Expected argument self to have 1 dimension(s), but has 2 at /Users/soumith/miniconda2/conda-bld/pytorch_1502000696751/work/torch/csrc/generic/TensorMethods.cpp:23020

Why is ReLU used at the input of the decoder?

Hello, thank you for your codes and notes.

I have a question about "Practical PyTorch: Translation with a Sequence to Sequence Network and Attention", where the "simple decoder" uses a ReLU after the input embedding. Why did you add an activation function between the input embedding and the GRU? Is it a trick? Thank you.

weights update for "char-rnn-classification"

In the "char-ran-classification" tutorial, the weights are updated by the following code.

  for p in rnn.parameters():
    p.data.add_(-learning_rate, p.grad.data)

I was trying to use optimizer.step() to update weights as the following:

optimizer = optim.SGD(rnn.parameters(), lr=0.005)
def train(category_tensor, line_tensor):

  hidden = rnn.initHidden()
  optimizer.zero_grad()
  for i in range(line_tensor.size()[0]):
    output, hidden = rnn(line_tensor[i], hidden)  
  loss.backward()
  optimizer.step()
return output, loss.data[0]

However, the results are very bad: most of the predictions are wrong. I'm new to PyTorch; can anyone explain the difference between these two methods?

Thanks

==================Results with optimizer.step()======================

5000  5% (0m12s) 2.9460 Jigalev / Portuguese ✗ (Russian)
10000  10% (0m24s) 2.9493 Gil / Portuguese ✗ (Korean)
15000  15% (0m36s) 2.9261 Ha / Spanish ✗ (Korean)
20000  20% (0m48s) 2.9530 Rog / English ✗ (Polish)
25000  25% (1m1s) 2.7749 Ki / Japanese ✓
30000  30% (1m13s) 2.8230 Messer / English ✗ (German)
35000  35% (1m25s) 2.9910 Paszek / English ✗ (Polish)
40000  40% (1m38s) 2.9733 Banh / English ✗ (Vietnamese)
45000  45% (1m50s) 2.7955 Serafim / Portuguese ✓
50000  50% (2m2s) 2.9550 Teng / English ✗ (Chinese)
55000  55% (2m15s) 2.7941 Kan / Portuguese ✗ (Chinese)
60000  60% (2m27s) 2.9335 Victors / Scottish ✗ (French)
65000  65% (2m39s) 2.7903 Lobo / Portuguese ✓
70000  70% (2m51s) 2.7830 Soga / Japanese ✓
75000  75% (3m3s) 2.9486 Hong / English ✗ (Chinese)
80000  80% (3m16s) 2.9172 Brisimitzakis / Portuguese ✗ (Greek)
85000  85% (3m27s) 2.9579 Linville / Japanese ✗ (French)
90000  90% (3m40s) 2.9035 Kerr / Japanese ✗ (Scottish)
95000  95% (3m52s) 2.8205 Tsen / Portuguese ✗ (Chinese)
100000  100% (4m4s) 2.8934 Kenmotsu / Greek ✗ (Japanese)

==================Results using the method in the tutorial======================

5000  5% (0m12s) 2.0740 Trapani / Italian ✓
10000  10% (0m26s) 2.1635 Rzehak / Czech ✓
15000  15% (0m39s) 2.7131 Bishara / Japanese ✗ (Arabic)
20000  20% (0m53s) 0.8122 Villamov / Russian ✓
25000  25% (1m7s) 2.0739 Mercier / German ✗ (French)
30000  30% (1m20s) 0.8251 Isozaki / Japanese ✓
35000  35% (1m33s) 2.4339 Cumming / Italian ✗ (English)
40000  40% (1m46s) 0.1408 Mckenzie / Scottish ✓
45000  45% (1m59s) 0.9425 Menendez / Spanish ✓
50000  50% (2m13s) 0.0060 Haritopoulos / Greek ✓
55000  55% (2m26s) 0.6606 Zientek / Polish ✓
60000  60% (2m40s) 2.2221 Desjardins / Greek ✗ (French)
65000  65% (2m53s) 1.3020 Seaghdha / Irish ✓
70000  70% (3m6s) 2.5087 Meier / French ✗ (Czech)
75000  75% (3m19s) 0.8131 Dasios / Greek ✓
80000  80% (3m31s) 0.2416 Poggi / Italian ✓
85000  85% (3m44s) 0.5777 Kim / Korean ✓
90000  90% (3m57s) 2.6680 See  / Dutch ✗ (Chinese)
95000  95% (4m10s) 2.0200 Weisener / German ✗ (Czech)
100000  100% (4m23s) 1.5813 O'Gorman / French ✗ (Irish)
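
For reference: p.data.add_(-learning_rate, p.grad.data) is exactly one step of plain SGD, so optim.SGD with the same learning rate should behave the same, provided the loss is actually computed from the final output and the gradients are zeroed on every iteration. A minimal sketch of an equivalent training step, assuming rnn and criterion (nn.NLLLoss) are defined as in the tutorial (an illustration, not the tutorial's code):

    import torch.optim as optim

    optimizer = optim.SGD(rnn.parameters(), lr=0.005)   # same update rule as the manual loop

    def train(category_tensor, line_tensor):
        hidden = rnn.initHidden()
        optimizer.zero_grad()                           # clear gradients from the previous example
        for i in range(line_tensor.size()[0]):
            output, hidden = rnn(line_tensor[i], hidden)
        loss = criterion(output, category_tensor)       # the snippet in the question never computes this
        loss.backward()
        optimizer.step()                                # applies p <- p - lr * p.grad
        return output, loss.data[0]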

I have a puzzle about how to use cuda correctly

Thank you for your tutorial; it's very helpful to me. However, I'm confused about how to use CUDA correctly.

In my local environment, torch.cuda.is_available() returns True, but when I run the "Translation with a Sequence to Sequence Network and Attention" example, it seems to run mainly on the CPU.

With the top command I see CPU usage at 100%, and nvidia-smi shows GPU usage below 15%.

No definition for max_length in attention

Hi @spro,

I don't know if this is expected, but you don't define the attention size based on the max_length of words. Can you elaborate on this?

Also, this example does not train in batches; how would we do that?

Thank you

Official pytorch tutorials

Hi,

I've added three of your tutorials (albeit with slight modifications) to the official PyTorch tutorials at http://pytorch.org/tutorials/. They are rendered much better there. For example, this is how your char-rnn classification looks.

Would you mind contributing to them? Here's the repo: https://github.com/pytorch/tutorials

It'll be great for the community if all the tutorials are in one place. I see that you added a new Shakespeare char rnn tutorial. I can add that to pytorch.org/tutorials for you. But in the future, can you please make updates/changes to pytorch/tutorials?

Thanks,
Sasank.

Doubling input size by two in decoder

Hi @spro,

I still don't get why you double the input size when initializing the attention, RNN, and output layers of the attention decoder:

self.attn = nn.Linear(self.hidden_size * 2, hidden_size)
self.gru = nn.GRU(hidden_size * 2, hidden_size, n_layers, dropout=dropout_p)
self.out = nn.Linear(hidden_size * 2, output_size)

Can you please elaborate more about this? Thank you!
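
As far as I can tell from the tutorial code, the doubling comes from concatenation: the embedded input token is concatenated with the previous context vector before the GRU, and the GRU output is concatenated with the new context before the output layer, so those layers see vectors of size hidden_size * 2. A rough shape sketch with hypothetical sizes:

    import torch

    hidden_size, batch = 10, 1
    word_embedded = torch.rand(1, batch, hidden_size)      # embedded input token: 1 x B x H
    last_context  = torch.rand(batch, hidden_size)         # context from the previous step: B x H

    # GRU input is [embedding ; context] -> 1 x B x 2H, hence nn.GRU(hidden_size * 2, hidden_size)
    rnn_input = torch.cat((word_embedded, last_context.unsqueeze(0)), dim=2)
    print(rnn_input.size())   # torch.Size([1, 1, 20])

    rnn_output = torch.rand(batch, hidden_size)            # stand-in for the GRU output: B x H
    context    = torch.rand(batch, hidden_size)            # stand-in for the new context: B x H
    # Output layer input is [rnn_output ; context] -> B x 2H, hence nn.Linear(hidden_size * 2, output_size)
    out_input = torch.cat((rnn_output, context), dim=1)
    print(out_input.size())   # torch.Size([1, 20])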

question for the seq2seq

In the seq2seq example, the input sentence is fed into the encoder one word by one word using a for loop:

for ei in range(input_length):
    encoder_output, encoder_hidden = encoder(input_variable[ei], encoder_hidden)
    encoder_outputs[ei] = encoder_output[0][0]

This is different from other seq2seq examples, like this one:
https://github.com/OpenNMT/OpenNMT-py/blob/master/onmt/Models.py#L141
where the sentence is fed into the encoder only once.

My question is: are these two approaches equivalent? Thanks!
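
For a plain (unpadded, unidirectional) nn.GRU the two are numerically equivalent: feeding the whole sequence at once just runs the same recurrence internally, and is mainly faster and easier to combine with packed, padded batches. A small check with made-up sizes:

    import torch
    import torch.nn as nn

    seq_len, batch, input_size, hidden_size = 5, 1, 7, 7
    gru = nn.GRU(input_size, hidden_size)
    inputs = torch.rand(seq_len, batch, input_size)
    h0 = torch.zeros(1, batch, hidden_size)

    # Whole sequence in one call
    out_all, h_all = gru(inputs, h0)

    # One step at a time, as in the tutorial's encoder loop
    h = h0
    steps = []
    for t in range(seq_len):
        out_t, h = gru(inputs[t].unsqueeze(0), h)
        steps.append(out_t)
    out_steps = torch.cat(steps, dim=0)

    print(torch.allclose(out_all, out_steps), torch.allclose(h_all, h))  # True True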

What is Batch Size in RNN?

Hello, I just found something interesting in the RNN name classification tutorial.

To make a word we join a bunch of those into a 2D matrix <line_length x 1 x n_letters>.
That extra 1 dimension is because PyTorch assumes everything is in batches - we're just using a batch size of 1 here.
print(line_to_tensor('Jones').size())
torch.Size([5, 1, 57])

What is the batch size here and what is its effect? Is it just to make processing faster, e.g. with a batch size of 2 there would be 2 examples processed in a single run? Is there any notion of order, as in an HMM (Hidden Markov Model) where a higher order means conditioning on more than one previous input? Is it possible to do that, or does the batch size actually indicate the order?
-Thank you-
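
The batch size here is just the number of independent examples processed in parallel in one forward pass (mainly for speed); it is not related to the order of a Markov model, since the RNN's hidden state already carries the whole history of each sequence. A shape sketch with a hypothetical batch of two names of equal length, using nn.RNN only to show the shapes (the tutorial's hand-written RNN follows the same convention):

    import torch
    import torch.nn as nn

    n_letters, hidden_size = 57, 128
    seq_len, batch_size = 5, 2                            # two length-5 names processed together

    rnn = nn.RNN(n_letters, hidden_size)
    batch = torch.zeros(seq_len, batch_size, n_letters)   # <line_length x batch_size x n_letters>
    output, hidden = rnn(batch)

    print(output.size())   # torch.Size([5, 2, 128]) - one output per time step per example
    print(hidden.size())   # torch.Size([1, 2, 128]) - final hidden state per example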

Running the code without attention mechanism

I am trying to change the decoder from AttnDecoderRNN to DecoderRNN and have the program work after this change. I adjusted the input arguments when creating the model, but I get the following error while training the model in the train() function.

The call that causes the error:

decoder_output, decoder_hidden, decoder_attention = decoder(
                decoder_input, decoder_hidden, encoder_output, encoder_outputs)

The error:

in __call__
result = self.forward(*input, **kwargs)
TypeError: forward() takes exactly 3 arguments (5 given) 
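
The plain DecoderRNN in the tutorial only takes the current input token and the previous hidden state (hence "forward() takes exactly 3 arguments", counting self), so the encoder outputs have to be dropped from the call. A hedged sketch of the adjusted call, assuming the tutorial's DecoderRNN.forward(input, hidden) signature:

    # Plain decoder: no attention, so no encoder_output / encoder_outputs arguments
    decoder_output, decoder_hidden = decoder(decoder_input, decoder_hidden)

Since the plain decoder returns no attention weights, any later code that collects or plots decoder_attention has to be removed as well.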

Cuda transfer

Hi,
thanks for the tutorial. Could you let me know how I can run the "seq2seq-translation" code on the GPU? I tried moving the objects "encoder1" and "attn_decoder1" with ".cuda()", with no success.
TypeError: torch.index_select received an invalid combination of arguments - got (torch.cuda.FloatTensor, int, torch.LongTensor), but expected (torch.cuda.FloatTensor source, int dim, torch.cuda.LongTensor index)
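
Calling .cuda() on the models alone is usually not enough: every tensor fed to them during training and evaluation has to be moved to the GPU as well, otherwise index_select sees a CPU LongTensor, as in the error above. A rough sketch (indexes is a hypothetical list of word ids; the other names follow the tutorial):

    import torch
    from torch.autograd import Variable

    USE_CUDA = torch.cuda.is_available()

    if USE_CUDA:
        encoder1 = encoder1.cuda()            # move model parameters to the GPU
        attn_decoder1 = attn_decoder1.cuda()

    # Wherever input variables are built (e.g. from a sentence), move them too:
    input_variable = Variable(torch.LongTensor(indexes).view(-1, 1))
    if USE_CUDA:
        input_variable = input_variable.cuda()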

Scheduled sampling in batched seq2seq

Hi, in the batched_seq2seq example you mention using scheduled sampling, but there is no implementation of it in the code. I am confused about how to use it in a batched setting. In non-batched mode, when there is no teacher forcing, we did:

topv, topi = decoder_output.data.topk(1)
ni = topi[0][0]
decoder_input = Variable(torch.LongTensor([[ni]]))

But in batched mode, the decoder input at each step is a batch of target tokens, as in the teacher-forcing part:

decoder_input = target_variables[di]

How do I proceed to supply the previous output when my decoder itself is not batched?
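
One way to do it, assuming the batched decoder takes a 1-D LongTensor of size batch_size at each step (as it does in the teacher-forcing branch with target_variables[di]), is to take the argmax of the previous output for every element of the batch:

    # decoder_output: (batch_size, vocab_size) for the current time step
    topv, topi = decoder_output.data.topk(1)    # topi: (batch_size, 1), best word id per example
    decoder_input = Variable(topi.squeeze(1))   # (batch_size,) fed back in as the next input
    if USE_CUDA:
        decoder_input = decoder_input.cuda()

Scheduled sampling would then amount to choosing, per step (or per sequence), between this predicted input and the ground-truth decoder_input = target_variables[di], with a probability that is annealed during training.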

Hello, I have encountered some new questions about the attention mechanism

First, thanks again for your tutorial code.

I read the paper "Neural Machine Translation by Jointly Learning to Align and Translate". The description of the attention model there is different from your code.

In your code, you use the decoder's previous hidden state and the target input to calculate the attention weights.
However, the paper seems to use the decoder's previous hidden state and the encoder hidden states to calculate the weights (see Eq. (6)).
I'm not sure my understanding is correct.

Could you please tell me why?
Thanks!

Error running char-rnn-generation

File "/Users/ave/anaconda/lib/python3.6/site-packages/torch/nn/functional.py", line 492, in nll_loss raise ValueError('Expected 2 or 4 dimensions (got {})'.format(dim)) ValueError: Expected 2 or 4 dimensions (got 1)

This is the error I get when running the char-rnn-generation train.py script. I was able to fix it by changing line 49 from:

loss += criterion(output.view(-1), target[c])

to:

loss += criterion(torch.unsqueeze(output.view(-1), 0), target[c])

I am running python 3.6 and torch 0.1.11+b39a2f2 installed from source. Not sure if it's an isolated issue or not, but figured I'd post what worked for me in case it's a problem for anyone else.

Unicode Problem

Hi, I am really new to PyTorch. On my first run of this step I got the error shown in the attached screenshot. Does anybody have a suggestion for how I can fix it?
-Thank you-
[error screenshot attached]

Attention Visualisation

The attention visualisation diagrams don't show that the network has learned very well. For example, in the first figure the output word "she" attends most strongly to the input word "cinq", which is not correct. In general, the learning looks quite poor based on the attention visualisations. On the other hand, the training graph shows that the loss is very low (around 0.6), so the network must have learned well.

I am confused by these conflicting observations. Could you please help explain them?

let us know when you are ready

this is great!

I went through the first tutorial quickly and it's nicely done.
Let me know when this repo is in a good shape, and I'll publicize it a bit more for visibility.

Edit: decoder --> encoder

Not sure this is actually an issue. It is only a typo. But it also exists here:
http://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html

In Training --> Defining a training iteration --> Next the decoder is given the last hidden state of the decoder as its first hidden state, and the <SOS> token as its first input.

Shouldn't it become the following?

Next the decoder is given the last hidden state of the encoder as its first hidden state, and the <SOS> token as its first input.

Batched seq2seq evaluation issue

In seq2seq-translation-batched.ipynb, in evaluate function you have

# Create starting vectors for decoder
    decoder_input = Variable(torch.LongTensor([SOS_token]), volatile=True) # SOS
    decoder_hidden = encoder_hidden

But shouldn't the decoder hidden state use the last (forward) hidden state from the encoder, as in training?

decoder_hidden = encoder_hidden[:decoder.n_layers]

I had a tensor size mismatch at this step, and after this fix it works.

Seq2seq multiple input features

Is there a way to pass extra features along with the existing word tokens as input to the encoder RNN?

Let's consider the NMT problem, and say I have 2 more feature columns for the corresponding source vocabulary (Feature1 here). For example, consider this below:

Feature1 Feature2 Feature3
word1 x a
word2 y b
word3 y c
.
.

It would be great if anyone could explain how to practically implement this in PyTorch. Thanks in advance.
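
One common approach, sketched here under the assumption of purely categorical extra features (all sizes below are made up), is to give each feature column its own embedding and concatenate the embeddings before the encoder GRU:

    import torch
    import torch.nn as nn

    class FeatureEncoderRNN(nn.Module):
        def __init__(self, vocab_size, feat1_size, feat2_size,
                     word_dim=256, feat_dim=16, hidden_size=256):
            super(FeatureEncoderRNN, self).__init__()
            self.word_emb  = nn.Embedding(vocab_size, word_dim)
            self.feat1_emb = nn.Embedding(feat1_size, feat_dim)
            self.feat2_emb = nn.Embedding(feat2_size, feat_dim)
            # GRU input is the concatenation of all three embeddings
            self.gru = nn.GRU(word_dim + 2 * feat_dim, hidden_size)

        def forward(self, words, feat1, feat2, hidden=None):
            # each input: (seq_len, batch) tensor of indices
            embedded = torch.cat((self.word_emb(words),
                                  self.feat1_emb(feat1),
                                  self.feat2_emb(feat2)), dim=2)
            return self.gru(embedded, hidden)

    enc = FeatureEncoderRNN(vocab_size=1000, feat1_size=5, feat2_size=7)
    words = torch.randint(0, 1000, (6, 2))   # hypothetical batch of 2 sequences of length 6
    f1    = torch.randint(0, 5, (6, 2))
    f2    = torch.randint(0, 7, (6, 2))
    output, hidden = enc(words, f1, f2)
    print(output.size())   # torch.Size([6, 2, 256])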

Best way to save the model in "RNN Classification"

Hi guys, I am a newbie in PyTorch. I find that PyTorch has a simple way to save models. While practicing the RNN classification tutorial, I ran into a problem saving the model. I save the model simply by executing torch.save(rnn, 'char-rnn-classification.pt'), and then, as in predict.py, load it with rnn = torch.load('char-rnn-classification.pt'). This should save the entire model, from the network down to the weights. It saves the model file successfully, but when I predict an input in the testing phase I get the error below. Does anybody know how to save the model correctly?

python predict.py Satoshi

Traceback (most recent call last):
  File "predict.py", line 32, in <module>
    predict(sys.argv[1])
  File "predict.py", line 17, in predict
    output = evaluate(Variable(lineToTensor(line)))
  File "predict.py", line 12, in evaluate
    output, hidden = rnn(line_tensor[i], hidden)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/mspl/ext1/Desktop/Andi/pytorch/practical-pytorch-master/char-rnn-classification/model.py", line 17, in forward
    hidden = self.i2h(combined)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/linear.py", line 54, in forward
    return self._backend.Linear()(input, self.weight, self.bias)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/_functions/linear.py", line 10, in forward
    output.addmm_(0, 1, input, weight.t())
RuntimeError: size mismatch, m1: [1 x 186], m2: [185 x 128] at /b/wheel/pytorch-src/torch/lib/TH/generic/THTensorMath.c:1237

calculating word vector distance or similarity

Hello,

Thank you for the excellent practice codes.

For the closest function in the GloVe tutorial,
I suggest that we can use the inner product as a similarity between two vectors, like this:

def closest2(d, n=10):
    all_dists = [(w, torch.mul(d, get_word(w)).sum()) for w in wv_dict]
    sorted_dists = sorted(all_dists, reverse=True, key=lambda t: t[1])[:n]
    return sorted_dists

It is slightly faster than calculating the distance between two vectors.

Regards,

Adding comments in between the code

Is it a good idea to add some helpful comments on top of functions/code so that it will be useful for a newbie like me? And if I find a place where short comments are necessary, shall I push those changes to you so that you can review/modify them?

Why reuse GRU in seq2seq with attention?

It seems the same GRU module is reused for all layers in both encoder and decoder. What is the design choice behind that? Wouldn't it be more reasonable to use GRU with parameters num_layers (i.e. separate parameters for each layer) or one GRUCell per layer?

Embedding layer dimensionality from seq2seq-batch example

The docs say that the embedding layer has the following input/output dimensionality:
Input: LongTensor (N, W), N = mini-batch, W = number of indices to extract per mini-batch
Output: (N, W, embedding_dim)

Yet the tutorial gives the embedding layer input with dimensionality (W, N), and gets output with dimensionality (W, N, embedding_dim). If I understand this correctly (which I might not), the order of the dimensions differs between your seq2seq batches example and the docs. If so, should the matrices be transposed before the inputs go through the embedding layer?
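
nn.Embedding is applied index-wise, so it accepts an index tensor of any shape and simply appends the embedding dimension; the only thing that has to be consistent is which axis the GRU treats as the batch (batch_first=False by default, which matches the (W, N) layout in the notebook). A quick check:

    import torch
    import torch.nn as nn

    emb = nn.Embedding(100, 8)
    idx = torch.randint(0, 100, (5, 3))    # (W, N): sequence length 5, batch 3
    print(emb(idx).size())                 # torch.Size([5, 3, 8]) - embedding dim is appended
    print(emb(idx.t()).size())             # torch.Size([3, 5, 8]) - same with (N, W); no transpose needed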

Needed to manually install tqdm

FWIW, I needed to install tqdm in order to work with glove-word-vectors.ipynb.
Is tqdm normally installed by Anaconda?
-Ian

Jupyter notebook error

I'm getting this strange error running the reinforce-gridworld notebook,
but another machine with almost the same conda installation runs it without problems!
Any guesses?

Thank you
Marco


IndexError Traceback (most recent call last)
in ()
1 env = Environment()
----> 2 env.reset()
3 print(env.visible_state)
4
5 done = False

in reset(self)
13 self.t = 0
14 self.history = []
---> 15 self.record_step()
16
17 return self.visible_state

in record_step(self)
20 """Add the current state to history for display later"""
21 grid = np.array(self.grid.grid)
---> 22 grid[self.agent.pos] = self.agent.health * 0.5 # Agent marker faded by health
23 visible = np.array(self.grid.visible(self.agent.pos))
24 self.history.append((grid, visible, self.agent.health))

IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices

Translation error in Translation with a Sequence to Sequence Network and Attention tutorial

I hope this doesn't feel too overzealous, I thought you might want to know :)

In the "Loading data file" section: "I am cold" should translate to "J'ai froid.", and not "Je suis froid.". You would only use "Je suis froid" when referring to the fact that you are a cold person, not when referring to the temperature, which I believe was your intention. It is one of the few cases where you would use "to have" instead of "to be" when talking about how you feel. This is actually an interesting case that the network should be able to disambiguate given some context.

char-rnn-classification.pt file

Hi there,
I was just playing around with the code, trying to install and run it. When I reached the running part I got the error "No such file or directory: 'char-rnn-classification.pt'" while trying to run your first example, "python predict.py Hinton".
I'm not sure if this fits here; maybe I'm missing something. I just want to know how to get past this.

[Attention model structure]

Hello,

Thanks for the nice tutorial! I have a question on the structure of the attention encoder you plot in the tutorial.

In the tutorial, the attn_weights are computed from two parts: 1) the previous hidden states of the encoder, and 2) the embedding vector of the input (a word in this case). Then the attn_weights are applied to the encoder outputs, which are a bunch of hidden states.

But according to the paper that proposes the attention model, the attn_weights are computed from 1) previous hidden states of the encoder, and 2) all the outputs of the decoder. Could you explain why you implement the model differently? Thanks!

problem in evaluation

In the evaluation function of seq2seq-translation-batched.ipynb, you have
decoder_hidden = encoder_hidden

Shouldn't it be the same as the train function where you have
decoder_hidden = encoder_hidden[:decoder.n_layers] # Use last (forward) hidden state from encoder ?

Error while code execution in default setting

I am getting the following error while executing the code in the default setting.

`Traceback (most recent call last):
File "/home/sapto/Desktop/Translation/seq2seq_translation_tutorial.py", line 812, in
trainEpochs(encoder1, attn_decoder1, 75000, print_every=5000)
File "/home/sapto/Desktop/Translation/seq2seq_translation_tutorial.py", line 678, in trainEpochs
decoder, encoder_optimizer, decoder_optimizer, criterion)
File "/home/sapto/Desktop/Translation/seq2seq_translation_tutorial.py", line 584, in train
input_variable[ei], encoder_hidden)
File "/home/sapto/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 206, in call
result = self.forward(*input, **kwargs)
File "/home/sapto/Desktop/Translation/seq2seq_translation_tutorial.py", line 355, in forward
output, hidden = self.lstm(output, hidden)
File "/home/sapto/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 206, in call
result = self.forward(*input, **kwargs)
File "/home/sapto/anaconda3/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 91, in forward
output, hidden = func(input, self.all_weights, hx)
File "/home/sapto/anaconda3/lib/python3.6/site-packages/torch/nn/_functions/rnn.py", line 327, in forward
return func(input, *fargs, **fkwargs)
File "/home/sapto/anaconda3/lib/python3.6/site-packages/torch/autograd/function.py", line 202, in _do_forward
flat_output = super(NestedIOFunction, self)._do_forward(*flat_input)
File "/home/sapto/anaconda3/lib/python3.6/site-packages/torch/autograd/function.py", line 224, in forward
result = self.forward_extended(*nested_tensors)
File "/home/sapto/anaconda3/lib/python3.6/site-packages/torch/nn/_functions/rnn.py", line 269, in forward_extended
cudnn.rnn.forward(self, input, hx, weight, output, hy)
File "/home/sapto/anaconda3/lib/python3.6/site-packages/torch/backends/cudnn/rnn.py", line 195, in forward
hx, cx = hx
ValueError: not enough values to unpack (expected 2, got 1)

Process finished with exit code 1`

char-rnn-classification parameters update method confusion

the original parameter update code

learning_rate = 0.005
def train(category_tensor, line_tensor):
    ...
    for p in rnn.parameters():
        p.data.add_(-learning_rate, p.grad.data)
    ...

I tried optimizer.step()

Why not use optimizer.step()? I tried it:

learning_rate = 0.005
optimizer = torch.optim.SGD(rnn.parameters(), lr=learning_rate)

def train(category_tensor, line_tensor):
    ...
    optimizer.step()
    ...

result

I get a different result: the loss is large and fluctuates up and down without decreasing.

Is my optimizer code wrong?

Questions about training and testing rnn-classification model

Thank you for creating these tutorials! I am new to both RNN and pytorch. I found your tutorials to be incredibly helpful for getting started with the topic.

I have a few high level questions about training and testing the classification model in the char-rnn-classification tutorial.

  1. The data size is unbalanced between the language categories. The smallest category is Vietnamese with 73 names, whereas the largest is Russian with 9408 names. Your model handles the unbalanced data by sampling names with equal probability from each language category to train the RNN. My question is:
  • Are there other practical solutions for handling unbalanced classes when training an RNN classifier?
  • Theoretically, would weighting the loss of each line tensor inversely to its category size help? (See the sketch below.)
  2. The entire dataset has 20,074 names across all language categories. The RNN is trained with 100,000 names sampled with replacement from the data, so some names are used to train the RNN multiple times. My question is:
  • Is this done in order to resample from the underrepresented language categories?
  • Would this cause the model to over-fit?

Thanks!
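
On the weighting question: one standard alternative to resampling is exactly that, i.e. passing per-class weights inversely proportional to category size to the loss. A hedged sketch, assuming all_categories and category_lines are loaded as in the tutorial and the model outputs log-probabilities (so nn.NLLLoss applies):

    import torch
    import torch.nn as nn

    # Weight each class inversely to its frequency, normalized so the weights average to 1
    counts = torch.FloatTensor([len(category_lines[c]) for c in all_categories])
    weights = counts.sum() / (len(all_categories) * counts)

    criterion = nn.NLLLoss(weight=weights)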

PEP8 formatting

This will mostly involve changing variable names from camelCase to snake_case (someone on Reddit said it reminds them of Zope (is that a bad thing?))

Should use term "iteration" instead of "epoch" in code

I thought an epoch is a thorough run through every training example in the training set. A single update to the parameters (whether with grads from a batch or just one example) should be called an iteration.

For example, in char-rnn-classification.ipynb the code actually retrieves one training example at a time. However, the loop index is named epoch, which I think is improper.

Error in Masked Cross Entropy

I have built a seq2seq model in batch mode and am facing runtime issues in training.

Code:
from masked_cross_entropy import * (downloaded from here https://github.com/spro/practical-pytorch/blob/master/seq2seq-translation/masked_cross_entropy.py )

Loss calculation and backpropagation

print(all_decoder_outputs.transpose(0, 1).contiguous().size() , target_batch.transpose(0, 1).contiguous().size())
loss = masked_cross_entropy(
    all_decoder_outputs.transpose(0, 1).contiguous(), # -> batch x seq
    target_batch.transpose(0, 1).contiguous(), # -> batch x seq
    target_batch_length
)
loss.backward()

The error is the following (128 is the batch size, 6 is the number of words per sentence in the batch, 42005 is the vocabulary size, and 15 is the maximum number of words allowed in a sentence):

torch.Size([128, 6, 42005]) torch.Size([128, 15])

/home/ubuntu/masked_cross_entropy.py in masked_cross_entropy(logits, target, length)
41 target_flat = target.view(-1, 1)
42 # losses_flat: (batch * max_len, 1)
---> 43 losses_flat = -torch.gather(log_probs_flat, dim=1, index=target_flat)
44 # losses: (batch, max_len)
45 losses = losses_flat.view(*target.size())

/home/ubuntu/anaconda3/envs/tensorflow/lib/python3.6/site-packages/torch/autograd/variable.py in gather(self, dim, index)
621
622 def gather(self, dim, index):
--> 623 return Gather(dim)(self, index)
624
625 def scatter(self, dim, index, source):

/home/ubuntu/anaconda3/envs/tensorflow/lib/python3.6/site-packages/torch/autograd/_functions/tensor.py in forward(self, input, index)
539 self.input_size = input.size()
540 self.save_for_backward(index)
--> 541 return input.gather(self.dim, index)
542
543 def backward(self, grad_output):

RuntimeError: Input tensor must have same size as output tensor apart from the specified dimension at /py/conda-bld/pytorch_1493681908901/work/torch/lib/THC/generic/THCTensorScatterGather.cu:29

Regarding exercises in Character-Level RNN

I was wondering where I can find the datasets for the exercises given in Classifying Names with a Character-Level RNN.
For example:
Any word -> language
First name -> gender
Character name -> writer
Page title -> blog or subreddit

To complete these tasks, do I have to create my own datasets, or is there a repo where I can download them?

Why using multiple optimizers

Hi, thanks! Your seq2seq-translation tutorial is great!

I notice you're using two separate optimizers for the encoder and decoder modules.

# Initialize optimizers and criterion
encoder_optimizer = optim.Adam(encoder.parameters(), lr=learning_rate)
decoder_optimizer = optim.Adam(decoder.parameters(), lr=learning_rate * decoder_learning_ratio)

I'm curious why you do this. Does it mean I should create as many optimizers as there are sub-modules in my model?
Since there is usually a single optimizer in TensorFlow seq2seq code, is there any particular reason to separate the training of these two parts?
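
Two optimizers aren't required; the split just makes it easy to give the decoder its own learning rate (decoder_learning_ratio). The same effect can be had with a single optimizer and parameter groups, roughly:

    import torch.optim as optim

    optimizer = optim.Adam([
        {'params': encoder.parameters()},   # uses the default lr given below
        {'params': decoder.parameters(), 'lr': learning_rate * decoder_learning_ratio},
    ], lr=learning_rate)

    # then a single optimizer.zero_grad() / optimizer.step() per training step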

element-wise assignment in attention weight computing might be slow

    for b in range(this_batch_size):
        # Calculate energy for each encoder output
        for i in range(max_len):
            attn_energies[b, i] = self.score(hidden[:, b], encoder_outputs[i, b].unsqueeze(0))

A better way to handle it is to use tensor operations directly:

    H = hidden.repeat(max_len, 1, 1).transpose(0, 1)
    encoder_outputs = encoder_outputs.transpose(0, 1)  # [B*T*H]
    attn_energies = self.score(H, encoder_outputs)     # compute attention score

along with some modifications to the self.score implementation.
The whole implementation has been posted in my repo.
Thanks!

unicodeToAscii function seems unnecessary

I noticed two other issues posted about a similar problem, but my solution hasn't been listed yet, so I felt it worthwhile to start a new thread. Originally I got the same error as @herleeyandi, but I solved it by removing the unicodeToAscii function altogether. The text in the language name files wasn't in Unicode, so I didn't see a need to convert it from Unicode to ASCII. The readLines function then looks as follows:

def readLines(filename):
    lines = open(filename).read().strip().split('\n')
    return [line for line in lines]

seq2seq-translation: set up the optimizer for the encoder and decoder together rather than separately?

Hi there,

This is more a question than an issue... I was writing the seq2seq model as a single class that combines the encoder and decoder classes. Written this way, I can simply set up one optimizer for the entire seq2seq class: optimizer = optim.Adam(seq2seq_model.parameters(), lr=learning_rate). Is it OK to do this? In the tutorial the optimizers are set up separately for the encoder and the decoder, with different learning rates. Thanks in advance!

Correct use of "epoch" vs "iteration"

One epoch should be one pass over the entire dataset. In many of the tutorials, what is called an epoch is really an iteration over a single training example.

Bahdanau Decoder Implementation

Hi @spro,

Thanks for the really great explanation of the decoders, especially the Bahdanau decoder. But I'm a little bit confused about the code in the __init__ function of the BahdanauAttnDecoderRNN class.

self.attn = GeneralAttn(hidden_size)

I can't find any class that defines GeneralAttn. Is this a built-in class? Can you please elaborate on this? Thanks again!
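
As far as I can tell, GeneralAttn is never defined in the notebook and looks like a leftover name. One workaround, under the assumption that the Attn class defined earlier in the notebook takes a scoring method and a hidden size, is to reuse it with the 'general' score:

    # Hypothetical workaround: reuse the notebook's Attn module with the 'general' scoring method
    self.attn = Attn('general', hidden_size)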

Dimension error in default setting

Hi, I tried to run the code but found a dimension error.
I changed the output dimension in the train function and it worked well:
I modified output.view(-1) to output.view(1, -1).
The full error message is attached below.


ValueError Traceback (most recent call last)
in ()
15
16 for epoch in range(1, n_epochs + 1):
---> 17 loss = train(*random_training_set())
18 loss_avg += loss
19

in train(inp, target)
6 for c in range(chunk_len):
7 output, hidden = decoder(inp[c], hidden)
----> 8 loss += criterion(output.view(-1), target[c])
9
10 loss.backward()

/anaconda/lib/python3.6/site-packages/torch/nn/modules/module.py in call(self, *input, **kwargs)
204
205 def call(self, *input, **kwargs):
--> 206 result = self.forward(*input, **kwargs)
207 for hook in self._forward_hooks.values():
208 hook_result = hook(self, input, result)

/anaconda/lib/python3.6/site-packages/torch/nn/modules/loss.py in forward(self, input, target)
319 _assert_no_grad(target)
320 return F.cross_entropy(input, target,
--> 321 self.weight, self.size_average)
322
323

/anaconda/lib/python3.6/site-packages/torch/nn/functional.py in cross_entropy(input, target, weight, size_average)
531 for each minibatch.
532 """
--> 533 return nll_loss(log_softmax(input), target, weight, size_average)
534
535

/anaconda/lib/python3.6/site-packages/torch/nn/functional.py in nll_loss(input, target, weight, size_average)
498 f = _functions.thnn.NLLLoss2d(size_average, weight=weight)
499 else:
--> 500 raise ValueError('Expected 2 or 4 dimensions (got {})'.format(dim))
501 return f(input, target)
502

ValueError: Expected 2 or 4 dimensions (got 1)
