ibm / pytorch-seq2seq
An open source framework for seq2seq models in PyTorch.
Home Page: https://ibm.github.io/pytorch-seq2seq/public/index.html
License: Apache License 2.0
I'm using pytorch-seq2seq for a chatbot. I used two datasets, Ubuntu and Twitter. I formatted the datasets, modified the data path in "example.py", and tuned some hyper-parameters (e.g. hidden_size, batch_size, epochs).
However, I fail to get meaningful responses after the model finishes training. When I type in sentences like "hello" or "how are you", it often gives me ['EOS'] or ['i', 'i', 'EOS']. Is there any suggestion for handling this issue?
To ensure contributors are following the Google style guides, it is usually advised that the project include a linter like flake8 and/or pylint. It looks like there is no linter included in requirements.txt or anywhere else in the repo.
1.0.0:
Requested Addition:
I may not be setting TopKDecoder up right, but it throws a lot of exceptions involving mismatches between torch.FloatTensor and torch.cuda.FloatTensor. It looks like TopKDecoder is unhappy if the encoder data is on the GPU.
1.0.0
Requested change:
It's unclear from the README which command runs the tests. I found it in the .travis.yml file:
nosetests --with-coverage --cover-erase --cover-package=seq2seq
In addition, requirements.txt does not include development packages like nose. It would be helpful if the README covered how to install the packages required to run the tests.
Traceback (most recent call last):
File "examples/sample.py", line 88, in <module>
t.train(seq2seq, dataset, num_epochs=4, dev_data=dev_set, resume=opt.resume)
File "/root/pytorch-seq2seq/seq2seq/trainer/supervised_trainer.py", line 160, in train
resume=resume, dev_data=dev_data, teacher_forcing_ratio=teacher_forcing_ratio)
File "/root/pytorch-seq2seq/seq2seq/trainer/supervised_trainer.py", line 111, in _train_epoches
loss = self._train_batch(input_variables, target_variables, model, teacher_forcing_ratio)
File "/root/pytorch-seq2seq/seq2seq/trainer/supervised_trainer.py", line 74, in _train_batch
return loss.get_loss()
File "/root/pytorch-seq2seq/seq2seq/loss/loss.py", line 140, in get_loss
nll = super(Perplexity, self).get_loss()
File "/root/pytorch-seq2seq/seq2seq/loss/loss.py", line 109, in get_loss
return self.acc_loss.data[0] / self.norm_term
ZeroDivisionError: float division by zero
Ran into this error when the training data contains unicode characters.
File "sample.py", line 53, in <module>
tgt_vocab=output_vocab)
File "build/bdist.linux-x86_64/egg/seq2seq/dataset/dataset.py", line 36, in __init__
File "build/bdist.linux-x86_64/egg/seq2seq/dataset/utils.py", line 50, in prepare_data
TypeError: coercing to Unicode: need string or buffer, NoneType found
Benchmarked the two implementations using WMT's newstest2013 from German to English. See training logs in the gist. Despite accuracy differences, pytorch-seq2seq is 10 times slower than OpenNMT.py.
Add integration tests in addition to the unit tests.
PyTorch has released 0.2, which changes a few behaviors. pytorch-seq2seq's dependency and code should be updated to make it compatible.
It looks like the tests create a vocab_pickle file in the root directory. For consistency, the file should be created in the tests directory and deleted after the tests finish, e.g. as in the sketch below.
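A minimal sketch of the suggested cleanup, assuming a unittest-style test case (the fixture and path names are illustrative, not the repo's actual test code):
import os
import unittest

class DatasetTest(unittest.TestCase):
    # Keep the pickle next to the tests instead of in the repository root.
    VOCAB_PATH = os.path.join(os.path.dirname(os.path.abspath(__file__)), "vocab_pickle")

    def test_build_vocab(self):
        # ... build the vocabulary and pickle it to self.VOCAB_PATH ...
        pass

    def tearDown(self):
        # Delete the pickle after each test so no artifact is left behind.
        if os.path.exists(self.VOCAB_PATH):
            os.remove(self.VOCAB_PATH)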
Write a generic plotting function for visualizing training parameters. A few helpful links that this could be based on:
In the code of DecoderRNN, the parameter order of the forward function is
def forward(self, inputs=None, encoder_hidden=None, function=F.log_softmax, encoder_outputs=None, teacher_forcing_ratio=0):
but in the documentation the order is
Inputs: inputs, encoder_hidden, encoder_outputs, function, teacher_forcing_ratio
I am encountering a strange problem when I try to change the number of layers in the encoder: when I run this on a CPU it runs without problems, but when I call the exact same script on a GPU it gives me a dimensionality error. The only thing I changed is the construction of the encoder in the sample.py script:
encoder = EncoderRNN(len(src.vocab), max_len, hidden_size, n_layers=2, bidirectional=bidirectional, variable_lengths=True)
Which results in the following error when forward is called:
File "/home/dhupkes/.local/lib/python2.7/site-packages/torch/backends/cudnn/rnn.py", line 266, in forward hidden_size, tuple(hx.size()))) RuntimeError: Expected hidden size (1, 32L, 256), got (2L, 32L, 256L)
I imagined that this would be due to what is passed to the decoder, but when I started to debug on a CPU I discovered to my surprise that the error was not raised there with the exact same script.
Does anyone have an idea what is going on?
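For what it's worth, the error message (expected 1 layer, got 2) suggests the decoder is still built with a single layer while the encoder now has two. A hedged sketch of keeping the two constructions consistent, assuming sample.py's decoder construction:
# Illustrative only: use the same n_layers for encoder and decoder so the encoder's
# final hidden state, shaped (num_layers * num_directions, batch, hidden_size),
# matches what the decoder's RNN expects. cuDNN validates this shape strictly,
# which is why the mismatch only surfaces on the GPU.
n_layers = 2
encoder = EncoderRNN(len(src.vocab), max_len, hidden_size,
                     n_layers=n_layers, bidirectional=bidirectional,
                     variable_lengths=True)
decoder = DecoderRNN(len(tgt.vocab), max_len,
                     hidden_size * 2 if bidirectional else hidden_size,
                     n_layers=n_layers, use_attention=True,
                     bidirectional=bidirectional,
                     eos_id=tgt.eos_id, sos_id=tgt.sos_id)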
Hi, I have seen your implementation of copynet and I have a question. In your code, you directly concatenate vocab_prob with copy_prob, but some of the words in these two probability distributions are not OOV words. If you return such a probability distribution, how do you calculate the NLLLoss in the later step? Looking forward to your reply, thanks!
This paper discusses and evaluates several regularization and optimization methods and gives ablations for each technique. It would be interesting to experiment with some of these techniques in seq2seq.
https://arxiv.org/pdf/1708.02182.pdf
Very nice Seq2Seq implementation :)
I'm using this module for text generation, and I found that the training data order is exactly the same for every epoch. The source code is:
batch_iterator = torchtext.data.BucketIterator(
dataset=data, batch_size=self.batch_size,
sort=True, sort_key=lambda x: len(x.src),
device=device, repeat=False)
Maybe we should set sort=False to shuffle the data and sort_within_batch=True to sort the lengths within each batch in decreasing order, as sketched below.
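A hedged sketch of that change (keyword names as in torchtext's BucketIterator; not the repo's current code):
# sort=False lets the iterator shuffle training batches every epoch, while
# sort_within_batch=True still orders each batch by decreasing source length,
# which is what pack_padded_sequence needs.
batch_iterator = torchtext.data.BucketIterator(
    dataset=data, batch_size=self.batch_size,
    sort=False, sort_within_batch=True,
    sort_key=lambda x: len(x.src),
    device=device, repeat=False, shuffle=True)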
Took me a while to track this down, but there is an error if you run the sample code with the git version of torchtext.
File "torch/nn/utils/rnn.py", line 79, in pack_padded_sequence
raise ValueError("lengths array has to be sorted in decreasing order")
The reason is this commit introduced a month ago in torchtext:
pytorch/text@a5049b9
This conflicts with this line in the supervised trainer:
Simply removing the negative sign fixes the issue; however, this will break the code if the PyPI version of torchtext is used.
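For illustration only (the actual trainer line is not shown above), the change being described is roughly the following, with the sort key written out as an assumption:
# Hypothetical illustration of "removing the negative sign" from the sort key.
# Negating the length gave the decreasing order pack_padded_sequence expects
# under the PyPI torchtext release; with the git version the negation produces
# increasing order and triggers the ValueError above.
sort_key = lambda x: len(x.src)   # instead of: lambda x: -len(x.src)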
A few fixes:
Research has shown that an adversarial loss can be more effective than MLE training; consider developing an adversarial trainer.
https://arxiv.org/abs/1704.06933
https://arxiv.org/abs/1703.04887
I sometimes notice that not using teacher forcing at all gives better results at inference time than using teacher forcing all the time. This paper provides evidence for this behavior and proposes scheduled sampling as a curriculum learning approach for training seq2seq; a sketch of the idea follows.
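A hedged sketch of the scheduled-sampling idea expressed through this library's teacher_forcing_ratio parameter (the decay schedule is illustrative, not the paper's exact curriculum):
import math

def teacher_forcing_ratio(epoch, k=5.0):
    # Inverse-sigmoid decay: close to 1.0 in early epochs, approaching 0 later,
    # so the decoder gradually sees more of its own predictions during training.
    return k / (k + math.exp(epoch / k))

# Illustrative usage (not how the SupervisedTrainer is normally driven):
# for epoch in range(num_epochs):
#     ratio = teacher_forcing_ratio(epoch)
#     trainer.train(model, train_data, num_epochs=1, teacher_forcing_ratio=ratio)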
The trainer logs "Time elapsed" during training, which is redundant: timestamps are already configurable from the logging module.
As configuring an experiment becomes more complicated with more features, it would be easier to read the experiment configuration from a file and build the experiment from it, along the lines of the sketch below.
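A minimal sketch of the idea, assuming a JSON config file; every key and helper name below is illustrative, not an existing option of the library:
import json

def load_experiment_config(path):
    # Hypothetical loader: read hyper-parameters from a file and fall back to
    # defaults for anything the file omits.
    with open(path) as f:
        cfg = json.load(f)
    cfg.setdefault("hidden_size", 128)
    cfg.setdefault("batch_size", 32)
    cfg.setdefault("num_epochs", 6)
    cfg.setdefault("teacher_forcing_ratio", 0.5)
    return cfg

# cfg = load_experiment_config("experiment.json")
# trainer = SupervisedTrainer(batch_size=cfg["batch_size"], ...)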
Currently, the output of TopKDecoder is as below. If I use
decoder_outputs, decoder_hidden, metadata = topkdecoder(...)
sequence = metadata['sequence']
then is this the proper way to get the j-th most likely sequence for the i-th batch element?
j_th_likely_sequence_of_i_th_batch = []
for token in sequence:
    # token: [batch size, beam size]
    j_th_likely_sequence_of_i_th_batch.append(token[i-1, j-1])
The docstring of TopKDecoder is as below.
Outputs: decoder_outputs, decoder_hidden, ret_dict
- **decoder_outputs** (batch): batch-length list of tensors with size (max_length, hidden_size) containing the
outputs of the decoder.
- **decoder_hidden** (num_layers * num_directions, batch, hidden_size): tensor containing the last hidden
state of the decoder.
- **ret_dict**: dictionary containing additional information as follows {*length* : list of integers
representing lengths of output sequences, *sequence* : list of sequences, where each sequence is a list of
predicted token IDs, *inputs* : target outputs if provided for decoding}.
the last part of which is
# Do backtracking to return the optimal values
output, h_t, h_n, s, l, p = self._backtrack(stored_outputs, stored_hidden,
                                             stored_predecessors, stored_emitted_symbols, stored_scores, b, h)
# Build return objects
decoder_outputs = [step[:, 0, :] for step in output]
decoder_hidden = h_n[:, :, 0, :]
metadata = {}
metadata['inputs'] = inputs
metadata['output'] = output
metadata['h_t'] = h_t
metadata['score'] = s
metadata['length'] = l
metadata['sequence'] = p
return decoder_outputs, decoder_hidden, metadata
and _backtrack returns:
Returns:
output [(batch, k, vocab_size)] * sequence_length: A list of the output probabilities (p_n)
from the last layer of the RNN, for every n = [0, ... , seq_len - 1]
h_t [(batch, k, hidden_size)] * sequence_length: A list containing the output features (h_n)
from the last layer of the RNN, for every n = [0, ... , seq_len - 1]
h_n(batch, k, hidden_size): A Tensor containing the last hidden state for all top-k sequences.
score [batch, k]: A list containing the final scores for all top-k sequences
length [batch, k]: A list specifying the length of each sequence in the top-k candidates
p (batch, k, sequence_len): A Tensor containing predicted sequence
When resuming an experiment, the trainer should load the latest checkpoint so that sample.py doesn't have to do it.
Only a 2x speed-up on a Tesla P100 vs. an Intel i7 CPU.
GPU:
Time elapsed: 4m 36s, Progress: 8%, Train Perplexity: 1.1057
CPU:
Time elapsed: 4m 1s, Progress: 3%, Train Perplexity: 1.1451
Running on the SimpleQuestions dataset.
https://github.com/IBM/pytorch-seq2seq/blob/master/seq2seq/util/checkpoint.py#L87 tries to load the directory the checkpoint was saved in, resulting in IOError: [Errno 21] Is a directory:
Command:
python examples/sample.py --train_path $TRAIN_PATH --dev_path $DEV_PATH --resume --load_checkpoint 2017_07_10_19_26_58/
Error:
loading checkpoint...
Loading checkpoints from ./experiment/checkpoints/2017_07_10_19_26_58/
Traceback (most recent call last):
File "examples/sample.py", line 41, in <module>
checkpoint = Checkpoint.load(checkpoint_path)
File "/root/pytorch-seq2seq/seq2seq/util/checkpoint.py", line 87, in load
model = torch.load(path)
File "/usr/local/lib/python2.7/dist-packages/torch/serialization.py", line 227, in load
f = open(f, 'rb')
IOError: [Errno 21] Is a directory: './experiment/checkpoints/2017_07_10_19_26_58/'
https://github.com/IBM/pytorch-seq2seq/blob/master/seq2seq/loss/loss.py#L108
If size_average=False, shouldn't get_loss return without dividing by the normalization term?
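A minimal sketch of the behaviour the question suggests (not the library's actual code; size_average and norm_term follow the names used around that line):
def get_loss(self):
    # Hypothetical variant: only average when size_average is requested.
    loss = self.acc_loss.data[0]
    if self.size_average:
        loss /= self.norm_term
    return loss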
pytorch/text is now available from PyPI; seq2seq should be compatible with the new version and install the dependency from there.
The current Travis script installs PyTorch for Python 2, so it can't test with Python 3 because PyTorch has different wheels for the two versions. We need to programmatically install PyTorch based on the Python version in .travis.yml.
Looks like the license in setup.py is inconsistent with the LICENSE file.
Benchmark with WMT machine translation dataset so that the performance of the library can be evaluated and compared with other implementations.
As a developer,
I want to see logs only through Python's logging framework,
so that we follow coding conventions.
We currently have print statements in several places that have to be replaced by log calls; a sketch of the pattern follows.
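A hedged sketch of the replacement pattern (the logger name is illustrative):
import logging

logger = logging.getLogger("seq2seq.trainer")

# Before: print("Finished epoch %d: Train Perplexity: %.4f" % (epoch, ppl))
# After:
# logger.info("Finished epoch %d: Train Perplexity: %.4f", epoch, ppl)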
The trainer uses the difference between the checkpoint step and the current step to calculate the average loss. But when the checkpoint is created at a multiple of the number of steps per epoch, that difference is zero, which results in the error below.
Traceback (most recent call last):
File "examples/sample.py", line 125, in <module>
resume=opt.resume)
File "build/bdist.linux-x86_64/egg/seq2seq/trainer/supervised_trainer.py", line 187, in train
File "build/bdist.linux-x86_64/egg/seq2seq/trainer/supervised_trainer.py", line 130, in _train_epoches
ZeroDivisionError: division by zero
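A minimal sketch of a guard for this case (variable names are assumptions, not the trainer's actual code):
def average_print_loss(accumulated_loss, step, last_checkpoint_step):
    # Only divide when at least one step has elapsed since the last checkpoint.
    steps_since_checkpoint = step - last_checkpoint_step
    if steps_since_checkpoint == 0:
        return 0.0
    return accumulated_loss / steps_since_checkpoint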
Pointer networks and the models presented in this paper are useful for combinatorial problems, e.g. reversing a sequence.
pytorch-seq2seq/seq2seq/models/DecoderRNN.py, lines 162 to 170 in aa27eda:
Here line 169 wants to slice the attn variable, but if you are using teacher forcing without attention, attn will be None, which throws an error. Simply changing it to the following fixes this:
if attn is not None:
    step_attn = attn[:, di, :]
else:
    step_attn = None
Unit tests needed for the SupervisedTrainer.
inputs should have size (batch_size, seq_len, hidden_size), since the RNN is set to be batch first.
encoder_hidden should be (num_layers * num_directions, batch_size, hidden_size).
decoder_outputs should be (seq_len, batch_size, vocab_size).
The Evaluator does not use evaluation mode for the model:
http://pytorch.org/docs/master/nn.html#torch.nn.Module.eval
Without eval mode, dropout stays enabled during inference; a sketch of the fix follows.
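A hedged sketch of the fix (the wrapper below is illustrative, not the Evaluator's actual interface):
def evaluate(model, run_eval_loop):
    # Switch to eval mode so dropout is disabled during inference,
    # then restore training mode for subsequent epochs.
    model.eval()
    try:
        return run_eval_loop(model)
    finally:
        model.train()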
After updating to the latest version (1.3.0) and running examples/sample.py, a bug appeared. Below is the relevant part of the log; it seems a tensor type conversion needs to be added.
2017-09-05 21:00:28,942 seq2seq.trainer.supervised_trainer INFO Finished epoch 4: Train Perplexity: 5.5457, Dev Perplexity: 1.0269
Type in a source sequence:1 2 3
1 2 3
Traceback (most recent call last):
File "/home/Projects/pytorch-seq2seq/examples/sample.py", line 134, in <module>
print(predictor.predict(seq))
File "/home/Projects/pytorch-seq2seq/seq2seq/evaluator/predictor.py", line 38, in predict
softmax_list, _, other = self.model(src_id_seq, [len(src_seq)], decoder_kick)
File "/home/Projects/pytorch-seq2seq/v-seq2seq-py2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 224, in __call__
result = self.forward(*input, **kwargs)
File "/home/Projects/pytorch-seq2seq/seq2seq/models/seq2seq.py", line 48, in forward
teacher_forcing_ratio=teacher_forcing_ratio)
File "/home/Projects/pytorch-seq2seq/v-seq2seq-py2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 224, in __call__
result = self.forward(*input, **kwargs)
File "/home/Projects/pytorch-seq2seq/seq2seq/models/DecoderRNN.py", line 167, in forward
function=function)
File "/home/Projects/pytorch-seq2seq/seq2seq/models/DecoderRNN.py", line 92, in forward_step
embedded = self.embedding(input_var)
File "/home/Projects/pytorch-seq2seq/v-seq2seq-py2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 224, in __call__
result = self.forward(*input, **kwargs)
File "/home/Projects/pytorch-seq2seq/v-seq2seq-py2/lib/python2.7/site-packages/torch/nn/modules/sparse.py", line 94, in forward
self.scale_grad_by_freq, self.sparse
File "/home/Projects/pytorch-seq2seq/v-seq2seq-py2/lib/python2.7/site-packages/torch/nn/_functions/thnn/sparse.py", line 53, in forward
output = torch.index_select(weight, 0, indices.view(-1))
TypeError: torch.index_select received an invalid combination of arguments - got (torch.cuda.FloatTensor, int, torch.LongTensor), but expected (torch.cuda.FloatTensor source, int dim, torch.cuda.LongTensor index)
Much like PyTorch released pytorch/vision as a utility library for vision projects, there is active work on pytorch/text.
The current IBM repo reimplements some of the utilities provided in pytorch/text in the seq2seq/dataset directory, and the design of the objects defined in seq2seq/dataset does not align with pytorch/text.
To future-proof this repo, I think it will be important to either adopt pytorch/text or to design seq2seq/dataset similarly to pytorch/text.
It's important that when pytorch/text is released to pip, it can be a drop-in replacement for seq2seq/dataset.
A faster loss function, similar to OpenNMT's memory-efficient loss. Instead of looping row by row and evaluating the loss batch_size times, we flatten the targets and outputs from 2D/3D to 1D/2D and evaluate the loss once for the entire batch:
# (seq len, batch size, dictionary size) -> (batch size * seq len, dictionary size)
outputs = outputs.view(-1, outputs.size(2))
# (seq len, batch size) -> (batch size * seq len)
targets = targets.view(-1)
self.criterion(outputs, targets)
The codebase contains a TopKDecoder which can be used to do beam search while generating sentences. According to the docstring, the __init__ method takes a DecoderRNN object as input, but the code accesses attributes like .lang and .SOS_token_id which are not present in the DecoderRNN class.
Also, my understanding is that the TopKDecoder can be used to generate sentences after the DecoderRNN has been trained. Is this understanding correct?
torch.load conventionally loads files with the extension ".pt", as seen here:
http://pytorch.org/docs/master/torch.html?highlight=torch%20save#torch.load
In Checkpoint, can you save files with the appropriate file extension?
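A hedged sketch of what that could look like (path handling and names are illustrative, not the Checkpoint class's actual fields):
import os
import torch

def save_model(model, checkpoint_dir):
    # Save with the conventional ".pt" extension so torch.load targets a file,
    # not the checkpoint directory.
    path = os.path.join(checkpoint_dir, "model.pt")
    torch.save(model, path)
    return path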
Instead of updating the learning rate in Optim.py, there are PyTorch schedulers that could be used:
http://pytorch.org/docs/master/optim.html#how-to-adjust-learning-rate
Using ./examples/sample.py with the optimizer part uncommented:
optimizer = Optimizer(torch.optim.Adam(seq2seq.parameters()), max_grad_norm=5)
scheduler = StepLR(optimizer.optimizer, 1)
optimizer.set_scheduler(scheduler)
First run for a while to collect checkpoints, then run with '--resume'. The error pops up as below:
python examples/sample.py --train_path $TRAIN_PATH --dev_path $DEV_PATH --resume
2017-11-05 14:54:53,118 root INFO Namespace(dev_path='data/toy_reverse/dev/data.txt', expt_dir='./experiment', load_checkpoint=None, log_level='info', resume=True, train_path='data/toy_reverse/train/data.txt')
Loading checkpoints from ~/pytorch-seq2seq-master/./experiment/checkpoints/2017_11_05_14_54_09
Traceback (most recent call last):
File "examples/sample.py", line 129, in <module>
resume=opt.resume)
File "~/miniconda3/envs/ape/lib/python3.6/site-packages/seq2seq-0.1.4-py3.6.egg/seq2seq/trainer/supervised_trainer.py", line 169, in train
TypeError: __init__() got an unexpected keyword argument 'initial_lr'
Revise the current dev guide to cover the following topics: