ibm / pytorch-seq2seq
An open source framework for seq2seq models in PyTorch.
Home Page: https://ibm.github.io/pytorch-seq2seq/public/index.html
License: Apache License 2.0
I'm using pytorch-seq2seq for a chatbot. I used two datasets, Ubuntu and Twitter. I formatted the datasets, modified the data path in "example.py", and tuned some hyper-parameters (e.g. hidden_size, batch_size, epochs).
However, I fail to get meaningful responses after the model finishes training. When I type in sentences like "hello" or "how are you", it often gives me ['EOS'] or ['i', 'i', 'EOS']. Is there any suggestion for handling this issue?
To ensure contributors are following the Google style guides, it is usually advised that the project include a linter like flake8 and/or pylint. It looks like there is no linter included in requirements.txt or anywhere else in the repo.
1.0.0:
Requested Addition:
I may not be setting TopKDecoder up right, but it throws a lot of exceptions involving mismatches between torch.FloatTensor and torch.cuda.FloatTensor. It looks like TopKDecoder is unhappy if the encoder data is on the GPU.
1.0.0
Requested change:
It's unclear from the README which command runs the tests. I found it in the .travis.yml file:
nosetests --with-coverage --cover-erase --cover-package=seq2seq
In addition, requirements.txt does not include development packages like nose. It would be helpful if the README covered how to install the packages required to run the tests.
Traceback (most recent call last):
File "examples/sample.py", line 88, in <module>
t.train(seq2seq, dataset, num_epochs=4, dev_data=dev_set, resume=opt.resume)
File "/root/pytorch-seq2seq/seq2seq/trainer/supervised_trainer.py", line 160, in train
resume=resume, dev_data=dev_data, teacher_forcing_ratio=teacher_forcing_ratio)
File "/root/pytorch-seq2seq/seq2seq/trainer/supervised_trainer.py", line 111, in _train_epoches
loss = self._train_batch(input_variables, target_variables, model, teacher_forcing_ratio)
File "/root/pytorch-seq2seq/seq2seq/trainer/supervised_trainer.py", line 74, in _train_batch
return loss.get_loss()
File "/root/pytorch-seq2seq/seq2seq/loss/loss.py", line 140, in get_loss
nll = super(Perplexity, self).get_loss()
File "/root/pytorch-seq2seq/seq2seq/loss/loss.py", line 109, in get_loss
return self.acc_loss.data[0] / self.norm_term
ZeroDivisionError: float division by zero
Ran into this error when the training data contains unicode characters.
File "sample.py", line 53, in <module>
tgt_vocab=output_vocab)
File "build/bdist.linux-x86_64/egg/seq2seq/dataset/dataset.py", line 36, in __init__
File "build/bdist.linux-x86_64/egg/seq2seq/dataset/utils.py", line 50, in prepare_data
TypeError: coercing to Unicode: need string or buffer, NoneType found
Benchmarked the two implementations using WMT's newstest2013 from German to English. See training logs in the gist. Despite accuracy differences, pytorch-seq2seq is 10 times slower than OpenNMT.py.
Add integration tests in addition to the unit tests.
PyTorch has released 0.2, which changes a few behaviors. pytorch-seq2seq's dependency and code should be updated to make it compatible.
It looks like the tests create a vocab_pickle file in the root directory. For consistency, the file should be created in the tests directory and deleted after the tests finish, e.g. as in the sketch below.
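A minimal sketch of the suggested cleanup, assuming a unittest-style test case (the fixture and path names are illustrative, not the repo's actual test code):
import os
import unittest

class DatasetTest(unittest.TestCase):
    # Keep the pickle next to the tests instead of in the repository root.
    VOCAB_PATH = os.path.join(os.path.dirname(os.path.abspath(__file__)), "vocab_pickle")

    def test_build_vocab(self):
        # ... build the vocabulary and pickle it to self.VOCAB_PATH ...
        pass

    def tearDown(self):
        # Delete the pickle after each test so no artifact is left behind.
        if os.path.exists(self.VOCAB_PATH):
            os.remove(self.VOCAB_PATH)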
Write a generic plotting function for visualizing training parameters. A few helpful links that this could be based on:
In the code of DecoderRNN, the parameter order of the forward function is
def forward(self, inputs=None, encoder_hidden=None, function=F.log_softmax, encoder_outputs=None, teacher_forcing_ratio=0):
but in the documentation the order is
Inputs: inputs, encoder_hidden, encoder_outputs, function, teacher_forcing_ratio
I am encountering a strange problem when I try to change the number of layers in the encoder: when I run this on a CPU it runs without problems, but when I call the exact same script on a GPU it gives me a dimensionality error. The only thing I changed is the construction of the encoder in the sample.py script:
encoder = EncoderRNN(len(src.vocab), max_len, hidden_size, n_layers=2, bidirectional=bidirectional, variable_lengths=True)
Which results in the following error when forward is called:
File "/home/dhupkes/.local/lib/python2.7/site-packages/torch/backends/cudnn/rnn.py", line 266, in forward hidden_size, tuple(hx.size()))) RuntimeError: Expected hidden size (1, 32L, 256), got (2L, 32L, 256L)
I imagined that this would be due to what is passed to the decoder, but when I started to debug on a CPU I discovered to my surprise that the error was not raised there with the exact same script.
Does anyone have an idea what is going on?
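For what it's worth, the error message (expected 1 layer, got 2) suggests the decoder is still built with a single layer while the encoder now has two. A hedged sketch of keeping the two constructions consistent, assuming sample.py's decoder construction:
# Illustrative only: use the same n_layers for encoder and decoder so the encoder's
# final hidden state, shaped (num_layers * num_directions, batch, hidden_size),
# matches what the decoder's RNN expects. cuDNN validates this shape strictly,
# which is why the mismatch only surfaces on the GPU.
n_layers = 2
encoder = EncoderRNN(len(src.vocab), max_len, hidden_size,
                     n_layers=n_layers, bidirectional=bidirectional,
                     variable_lengths=True)
decoder = DecoderRNN(len(tgt.vocab), max_len,
                     hidden_size * 2 if bidirectional else hidden_size,
                     n_layers=n_layers, use_attention=True,
                     bidirectional=bidirectional,
                     eos_id=tgt.eos_id, sos_id=tgt.sos_id)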
Hi, I have seen your implementation of copynet and I have a question. In your code, you directly concatenate vocab_prob with copy_prob, but some of the words in these two probability distributions are not OOV words. If you return such a probability distribution, how do you calculate the NLLLoss in the later step? Looking forward to your reply, thanks!
This paper discusses and evaluates several regularization and optimization methods and gives ablations for each technique. It would be interesting to experiment with some of these techniques in seq2seq.
https://arxiv.org/pdf/1708.02182.pdf
Very nice Seq2Seq implementation :)
I'm using this module for text generation, and I found that the training data order is exactly the same for every epoch. The source code is:
batch_iterator = torchtext.data.BucketIterator(
dataset=data, batch_size=self.batch_size,
sort=True, sort_key=lambda x: len(x.src),
device=device, repeat=False)
Maybe we should set sort=False to shuffle the data and sort_within_batch=True to sort the lengths within each batch in decreasing order, as sketched below.
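A hedged sketch of that change (keyword names as in torchtext's BucketIterator; not the repo's current code):
# sort=False lets the iterator shuffle training batches every epoch, while
# sort_within_batch=True still orders each batch by decreasing source length,
# which is what pack_padded_sequence needs.
batch_iterator = torchtext.data.BucketIterator(
    dataset=data, batch_size=self.batch_size,
    sort=False, sort_within_batch=True,
    sort_key=lambda x: len(x.src),
    device=device, repeat=False, shuffle=True)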
Took me a while to track this down, but there is an error if you run the sample code with the git version of torchtext.
File "torch/nn/utils/rnn.py", line 79, in pack_padded_sequence
raise ValueError("lengths array has to be sorted in decreasing order")
The reason is this commit introduced a month ago in torchtext:
pytorch/text@a5049b9
This conflicts with this line in the supervised trainer:
Simply removing the negative sign fixes the issue; however, this will break the code if the PyPI version of torchtext is used.
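For illustration only (the actual trainer line is not shown above), the change being described is roughly the following, with the sort key written out as an assumption:
# Hypothetical illustration of "removing the negative sign" from the sort key.
# Negating the length gave the decreasing order pack_padded_sequence expects
# under the PyPI torchtext release; with the git version the negation produces
# increasing order and triggers the ValueError above.
sort_key = lambda x: len(x.src)   # instead of: lambda x: -len(x.src)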
A few fixes:
Research has shown that an adversarial loss can be more effective than MLE training; consider developing an adversarial trainer.
https://arxiv.org/abs/1704.06933
https://arxiv.org/abs/1703.04887
I sometimes notice that not using teacher forcing at all gives better results at inference time than using teacher forcing all the time. This paper provides evidence for this behavior and proposes scheduled sampling as a curriculum learning approach for training seq2seq; a sketch of the idea follows.
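A hedged sketch of the scheduled-sampling idea expressed through this library's teacher_forcing_ratio parameter (the decay schedule is illustrative, not the paper's exact curriculum):
import math

def teacher_forcing_ratio(epoch, k=5.0):
    # Inverse-sigmoid decay: close to 1.0 in early epochs, approaching 0 later,
    # so the decoder gradually sees more of its own predictions during training.
    return k / (k + math.exp(epoch / k))

# Illustrative usage (not how the SupervisedTrainer is normally driven):
# for epoch in range(num_epochs):
#     ratio = teacher_forcing_ratio(epoch)
#     trainer.train(model, train_data, num_epochs=1, teacher_forcing_ratio=ratio)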
The trainer logs "Time elapsed" during training, which is redundant: timestamps are already configurable from the logging module.
As configuring an experiment becomes more complicated with more features, it would be easier to read the experiment configuration from a file and build the experiment from it, along the lines of the sketch below.
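A minimal sketch of the idea, assuming a JSON config file; every key and helper name below is illustrative, not an existing option of the library:
import json

def load_experiment_config(path):
    # Hypothetical loader: read hyper-parameters from a file and fall back to
    # defaults for anything the file omits.
    with open(path) as f:
        cfg = json.load(f)
    cfg.setdefault("hidden_size", 128)
    cfg.setdefault("batch_size", 32)
    cfg.setdefault("num_epochs", 6)
    cfg.setdefault("teacher_forcing_ratio", 0.5)
    return cfg

# cfg = load_experiment_config("experiment.json")
# trainer = SupervisedTrainer(batch_size=cfg["batch_size"], ...)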
Currently, the output of TopKDecoder is as below. If I use
decoder_outputs, decoder_hidden, metadata = topkdecoder(...)
sequence = metadata['sequence']
then is this the proper way to get the j-th most likely sequence for the i-th batch element?
j_th_likely_sequence_of_i_th_batch = []
for token in sequence:
    # token: [batch size, beam size]
    j_th_likely_sequence_of_i_th_batch.append(token[i-1, j-1])
The docstring of TopKDecoder is as below.
Outputs: decoder_outputs, decoder_hidden, ret_dict
- **decoder_outputs** (batch): batch-length list of tensors with size (max_length, hidden_size) containing the
outputs of the decoder.
- **decoder_hidden** (num_layers * num_directions, batch, hidden_size): tensor containing the last hidden
state of the decoder.
- **ret_dict**: dictionary containing additional information as follows {*length* : list of integers
representing lengths of output sequences, *sequence* : list of sequences, where each sequence is a list of
predicted token IDs, *inputs* : target outputs if provided for decoding}.
the last part of which is
# Do backtracking to return the optimal values
output, h_t, h_n, s, l, p = self._backtrack(stored_outputs, stored_hidden,
                                             stored_predecessors, stored_emitted_symbols, stored_scores, b, h)
# Build return objects
decoder_outputs = [step[:, 0, :] for step in output]
decoder_hidden = h_n[:, :, 0, :]
metadata = {}
metadata['inputs'] = inputs
metadata['output'] = output
metadata['h_t'] = h_t
metadata['score'] = s
metadata['length'] = l
metadata['sequence'] = p
return decoder_outputs, decoder_hidden, metadata
and _backtrack returns:
Returns:
output [(batch, k, vocab_size)] * sequence_length: A list of the output probabilities (p_n)
from the last layer of the RNN, for every n = [0, ... , seq_len - 1]
h_t [(batch, k, hidden_size)] * sequence_length: A list containing the output features (h_n)
from the last layer of the RNN, for every n = [0, ... , seq_len - 1]
h_n(batch, k, hidden_size): A Tensor containing the last hidden state for all top-k sequences.
score [batch, k]: A list containing the final scores for all top-k sequences
length [batch, k]: A list specifying the length of each sequence in the top-k candidates
p (batch, k, sequence_len): A Tensor containing predicted sequence
When resuming an experiment, the trainer should load the latest checkpoint so that sample.py doesn't have to do it.
Only a 2x speed-up on a Tesla P100 vs. an Intel i7 CPU.
GPU:
Time elapsed: 4m 36s, Progress: 8%, Train Perplexity: 1.1057
CPU:
Time elapsed: 4m 1s, Progress: 3%, Train Perplexity: 1.1451
Running on the SimpleQuestions dataset.
https://github.com/IBM/pytorch-seq2seq/blob/master/seq2seq/util/checkpoint.py#L87 tries to load the directory the checkpoint was saved in, resulting in IOError: [Errno 21] Is a directory:
Command:
python examples/sample.py --train_path $TRAIN_PATH --dev_path $DEV_PATH --resume --load_checkpoint 2017_07_10_19_26_58/
Error:
loading checkpoint...
Loading checkpoints from ./experiment/checkpoints/2017_07_10_19_26_58/
Traceback (most recent call last):
File "examples/sample.py", line 41, in <module>
checkpoint = Checkpoint.load(checkpoint_path)
File "/root/pytorch-seq2seq/seq2seq/util/checkpoint.py", line 87, in load
model = torch.load(path)
File "/usr/local/lib/python2.7/dist-packages/torch/serialization.py", line 227, in load
f = open(f, 'rb')
IOError: [Errno 21] Is a directory: './experiment/checkpoints/2017_07_10_19_26_58/'
https://github.com/IBM/pytorch-seq2seq/blob/master/seq2seq/loss/loss.py#L108
If size_average=False, shouldn't get_loss return without dividing by the normalization term?
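A minimal sketch of the behaviour the question suggests (not the library's actual code; size_average and norm_term follow the names used around that line):
def get_loss(self):
    # Hypothetical variant: only average when size_average is requested.
    loss = self.acc_loss.data[0]
    if self.size_average:
        loss /= self.norm_term
    return loss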
pytorch/text is now available from PyPI; seq2seq should be compatible with the new version and install the dependency from there.
The current Travis script installs PyTorch for Python 2, so it can't test with Python 3 because PyTorch has different wheels for the two versions. We need to programmatically install PyTorch based on the Python version in .travis.yml.
Looks like the license in setup.py is inconsistent with the LICENSE file.
Benchmark with WMT machine translation dataset so that the performance of the library can be evaluated and compared with other implementations.
As a developer,
I want to see logs only through Python's logging framework,
so that we follow coding conventions.
We currently have print statements in several places that have to be replaced by log calls; a sketch of the pattern follows.
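A hedged sketch of the replacement pattern (the logger name is illustrative):
import logging

logger = logging.getLogger("seq2seq.trainer")

# Before: print("Finished epoch %d: Train Perplexity: %.4f" % (epoch, ppl))
# After:
# logger.info("Finished epoch %d: Train Perplexity: %.4f", epoch, ppl)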
The trainer uses the difference between the checkpoint step and the current step to calculate the average loss. But when the checkpoint is created at a multiple of the number of steps per epoch, that difference is zero, which results in the error below.
Traceback (most recent call last):
File "examples/sample.py", line 125, in <module>
resume=opt.resume)
File "build/bdist.linux-x86_64/egg/seq2seq/trainer/supervised_trainer.py", line 187, in train
File "build/bdist.linux-x86_64/egg/seq2seq/trainer/supervised_trainer.py", line 130, in _train_epoches
ZeroDivisionError: division by zero
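A minimal sketch of a guard for this case (variable names are assumptions, not the trainer's actual code):
def average_print_loss(accumulated_loss, step, last_checkpoint_step):
    # Only divide when at least one step has elapsed since the last checkpoint.
    steps_since_checkpoint = step - last_checkpoint_step
    if steps_since_checkpoint == 0:
        return 0.0
    return accumulated_loss / steps_since_checkpoint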
Pointer networks and the models presented in this paper are useful for combinatorial problems, e.g. reversing a sequence.
pytorch-seq2seq/seq2seq/models/DecoderRNN.py, lines 162 to 170 in aa27eda:
Here line 169 wants to slice the attn variable, but if you are using teacher forcing without attention, attn will be None, which throws an error. Simply changing it to the following fixes this:
if attn is not None:
    step_attn = attn[:, di, :]
else:
    step_attn = None
Unit tests needed for the SupervisedTrainer.
inputs should have size (batch_size, seq_len, hidden_size), since the RNN is set to be batch first.
encoder_hidden should be (num_layers * num_directions, batch_size, hidden_size).
decoder_outputs should be (seq_len, batch_size, vocab_size).
The Evaluator does not use evaluation mode for the model:
http://pytorch.org/docs/master/nn.html#torch.nn.Module.eval
Without eval mode, dropout stays enabled during inference; a sketch of the fix follows.
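A hedged sketch of the fix (the wrapper below is illustrative, not the Evaluator's actual interface):
def evaluate(model, run_eval_loop):
    # Switch to eval mode so dropout is disabled during inference,
    # then restore training mode for subsequent epochs.
    model.eval()
    try:
        return run_eval_loop(model)
    finally:
        model.train()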
After updating to the latest version (1.3.0) and running examples/sample.py, a bug appeared. Below is the relevant part of the log; it seems a tensor type conversion needs to be added.
2017-09-05 21:00:28,942 seq2seq.trainer.supervised_trainer INFO Finished epoch 4: Train Perplexity: 5.5457, Dev Perplexity: 1.0269
Type in a source sequence:1 2 3
1 2 3
Traceback (most recent call last):
File "/home/Projects/pytorch-seq2seq/examples/sample.py", line 134, in <module>
print(predictor.predict(seq))
File "/home/Projects/pytorch-seq2seq/seq2seq/evaluator/predictor.py", line 38, in predict
softmax_list, _, other = self.model(src_id_seq, [len(src_seq)], decoder_kick)
File "/home/Projects/pytorch-seq2seq/v-seq2seq-py2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 224, in __call__
result = self.forward(*input, **kwargs)
File "/home/Projects/pytorch-seq2seq/seq2seq/models/seq2seq.py", line 48, in forward
teacher_forcing_ratio=teacher_forcing_ratio)
File "/home/Projects/pytorch-seq2seq/v-seq2seq-py2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 224, in __call__
result = self.forward(*input, **kwargs)
File "/home/Projects/pytorch-seq2seq/seq2seq/models/DecoderRNN.py", line 167, in forward
function=function)
File "/home/Projects/pytorch-seq2seq/seq2seq/models/DecoderRNN.py", line 92, in forward_step
embedded = self.embedding(input_var)
File "/home/Projects/pytorch-seq2seq/v-seq2seq-py2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 224, in __call__
result = self.forward(*input, **kwargs)
File "/home/Projects/pytorch-seq2seq/v-seq2seq-py2/lib/python2.7/site-packages/torch/nn/modules/sparse.py", line 94, in forward
self.scale_grad_by_freq, self.sparse
File "/home/Projects/pytorch-seq2seq/v-seq2seq-py2/lib/python2.7/site-packages/torch/nn/_functions/thnn/sparse.py", line 53, in forward
output = torch.index_select(weight, 0, indices.view(-1))
TypeError: torch.index_select received an invalid combination of arguments - got (torch.cuda.FloatTensor, int, torch.LongTensor), but expected (torch.cuda.FloatTensor source, int dim, torch.cuda.LongTensor index)
Much like PyTorch released pytorch/vision as a utility library for vision projects, there is active work on pytorch/text.
The current IBM repo reimplements some of the utilities provided in pytorch/text in the seq2seq/dataset directory, and the design of the objects defined in seq2seq/dataset does not align with pytorch/text.
To future-proof this repo, I think it will be important to either adopt pytorch/text or to design seq2seq/dataset similarly to pytorch/text.
It's important that when pytorch/text is released to pip, it can be a drop-in replacement for seq2seq/dataset.
A faster loss function, similar to OpenNMT's memory-efficient loss. Instead of looping row by row and evaluating the loss batch_size times, we flatten the targets and outputs from 2D/3D to 1D/2D and evaluate the loss once for the entire batch:
# (seq len, batch size, dictionary size) -> (batch size * seq len, dictionary size)
outputs = outputs.view(-1, outputs.size(2))
# (seq len, batch size) -> (batch size * seq len)
targets = targets.view(-1)
self.criterion(outputs, targets)
The codebase contains a TopKDecoder which can be used to do beam search while generating sentences. According to the docstring, the __init__ method takes a DecoderRNN object as input, but the code accesses attributes like .lang and .SOS_token_id which are not present in the DecoderRNN class.
Also, my understanding is that the TopKDecoder can be used to generate sentences after the DecoderRNN has been trained. Is this understanding correct?
torch.load conventionally loads files with the extension ".pt", as seen here:
http://pytorch.org/docs/master/torch.html?highlight=torch%20save#torch.load
In Checkpoint, can you save files with the appropriate file extension?
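A hedged sketch of what that could look like (path handling and names are illustrative, not the Checkpoint class's actual fields):
import os
import torch

def save_model(model, checkpoint_dir):
    # Save with the conventional ".pt" extension so torch.load targets a file,
    # not the checkpoint directory.
    path = os.path.join(checkpoint_dir, "model.pt")
    torch.save(model, path)
    return path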
Instead of updating the learning rate in Optim.py, there are PyTorch schedulers that could be used:
http://pytorch.org/docs/master/optim.html#how-to-adjust-learning-rate
Using ./examples/sample.py with the optimizer part uncommented:
optimizer = Optimizer(torch.optim.Adam(seq2seq.parameters()), max_grad_norm=5)
scheduler = StepLR(optimizer.optimizer, 1)
optimizer.set_scheduler(scheduler)
First run for a while to collect checkpoints, then run with '--resume'. The error pops up as below:
python examples/sample.py --train_path $TRAIN_PATH --dev_path $DEV_PATH --resume
2017-11-05 14:54:53,118 root INFO Namespace(dev_path='data/toy_reverse/dev/data.txt', expt_dir='./experiment', load_checkpoint=None, log_level='info', resume=True, train_path='data/toy_reverse/train/data.txt')
Loading checkpoints from ~/pytorch-seq2seq-master/./experiment/checkpoints/2017_11_05_14_54_09
Traceback (most recent call last):
File "examples/sample.py", line 129, in <module>
resume=opt.resume)
File "~/miniconda3/envs/ape/lib/python3.6/site-packages/seq2seq-0.1.4-py3.6.egg/seq2seq/trainer/supervised_trainer.py", line 169, in train
TypeError: __init__() got an unexpected keyword argument 'initial_lr'
Revise the current dev guide to cover the following topics: