
sentence-vae's Introduction

Hi there 👋

๐Ÿง‘โ€๐Ÿ”ฌ I am a PhD Student at the Ubiquitous Knowledge Processing Lab at TU Darmstadt. My research is on Natural Language Processing, focussing on scientific Question Answering.
๐Ÿ’ฌ I am one of the main developers of UKP-SQuARE (github), a Question Answering platform that allows you to rapidly make your QA models available to the public and conduct qualitative analysis.
๐Ÿ‘จโ€๐Ÿ’ป I am interested in developing efficient, scalable and easy-to-use Machine Learning applications.

Tim's GitHub stats

sentence-vae's People

Contributors

dependabot[bot], dhanajitb, kaletap, timbmg, topshik


sentence-vae's Issues

maybe a bug in the kl_anneal_function

In the function:

def kl_anneal_function(anneal_function, step, k, x0):
    if anneal_function == 'logistic':
        return float(1/(1+np.exp(-k*(step-x0))))
    elif anneal_function == 'linear':
        return min(1, step/x0)

you will see that k is a str. That is wrong, so I changed it to 1.
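
A small, hedged alternative to hard-coding the value would be to coerce the parameter to a number at the top of the function, so a string (for example one coming straight from a command-line flag) cannot slip through. This is only a sketch, not the repository's fix:

import numpy as np

def kl_anneal_function(anneal_function, step, k, x0):
    # Sketch only: coerce k and x0 to numbers in case they arrive as strings.
    k, x0 = float(k), float(x0)
    if anneal_function == 'logistic':
        return float(1 / (1 + np.exp(-k * (step - x0))))
    elif anneal_function == 'linear':
        return min(1, step / x0)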

Optimiser SGD leads to KL vanishing

Hi,

Thank you so much for your code, it helped me a lot.

I am wondering about the optimiser. I think the paper says they used SGD, but when I changed the optimiser to SGD in your code, I got the KL-vanishing problem. It works fine when I use Adam. I don't know why this happens; do you have any insight into it? Thank you so much!

bi-directional gru

I found a bug in the code: when I use a bi-directional GRU, the dimensions don't match.

Traceback (most recent call last):
  File "inference.py", line 80, in <module>
    main(args)
  File "inference.py", line 44, in main
    samples, z = model.inference(n=args.num_samples)
  File "/home/bli/Binyun/Generation/Sentence-VAE/model.py", line 153, in inference
    output, hidden = self.decoder_rnn(input_embedding, hidden)
  File "/home/bli/.conda/envs/Xihe/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/bli/.conda/envs/Xihe/lib/python3.8/site-packages/torch/nn/modules/rnn.py", line 819, in forward
    self.check_forward_args(input, hx, batch_sizes)
  File "/home/bli/.conda/envs/Xihe/lib/python3.8/site-packages/torch/nn/modules/rnn.py", line 229, in check_forward_args
    self.check_hidden_size(hidden, expected_hidden_size)
  File "/home/bli/.conda/envs/Xihe/lib/python3.8/site-packages/torch/nn/modules/rnn.py", line 223, in check_hidden_size
    raise RuntimeError(msg.format(expected_hidden_size, list(hx.size())))
RuntimeError: Expected hidden size (2, 10, 256), got [1, 2, 10, 256]

The input of decoder seems not right

In your model.py,

        # decoder input
        input_embedding = self.word_dropout(input_embedding)
        packed_input = rnn_utils.pack_padded_sequence(input_embedding, sorted_lengths.data.tolist(), batch_first=True)

the input of the decoder (i.e. input_embedding) is the same as the input of the encoder, which does not seem correct.
According to the cited paper,
the input of the encoder is ['RNNs', 'work']
the input of the decoder is ['<EOS>', 'RNNs', 'work']
the output of the decoder is ['RNNs', 'work', '<EOS>']
So I think the input of the decoder should have one extra token at the front compared to the input of the encoder...
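
For illustration only (this is not the repository's data pipeline), the shift described in the paper can be built like this; the token spelling is an assumption:

# Sketch: encoder input, decoder input, and decoder target for one sentence,
# following the shift in the cited paper. The same <EOS> symbol doubles as
# the decoder's start token, as in the example above.
sentence = ['RNNs', 'work']
eos = '<EOS>'
encoder_input  = sentence             # ['RNNs', 'work']
decoder_input  = [eos] + sentence     # ['<EOS>', 'RNNs', 'work']
decoder_target = sentence + [eos]     # ['RNNs', 'work', '<EOS>']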

maybe a bug in the train.py

I got a bug in the following code:
target = target[:, :torch.max(length).data[0]].contiguous().view(-1)
and it was fixed when I changed the code to:
target = target[:, :torch.max(length).data].contiguous().view(-1)
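
This looks like a PyTorch version difference rather than a logic bug: since PyTorch 0.4, torch.max(length) returns a zero-dimensional tensor that can no longer be indexed with [0], and .item() is the usual way to pull out the scalar. A hedged sketch of the same line:

# Sketch only: extract the maximum length as a Python int before slicing;
# this works on newer PyTorch versions where torch.max(length) is 0-dim.
max_len = int(torch.max(length).item())
target = target[:, :max_len].contiguous().view(-1)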

Confusion about the batch in model.py(I think batch should be the second dimension.)

I read the code very carefully.
I am confused about the 65th line of model.py.
I think the second dimension of "hidden" is the batch, not the first one.
Even though the encoder has been set up with "batch_first=True", only the output has the batch in the first dimension; the hidden state does not.
I have tested this on my own computer.

Of course, the code runs without problems. I am just confused because the two dimensions seem mixed up in the code.
Can anyone help me with this?
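
This matches PyTorch's documented behaviour: batch_first only affects the input and output tensors, not the hidden state. A small stand-alone check (not from the repository):

import torch
import torch.nn as nn

# With batch_first=True the output is (batch, seq, hidden), but the hidden
# state keeps the shape (num_layers * num_directions, batch, hidden).
gru = nn.GRU(input_size=8, hidden_size=16, batch_first=True)
x = torch.randn(4, 5, 8)   # (batch=4, seq=5, features=8)
output, hidden = gru(x)
print(output.shape)        # torch.Size([4, 5, 16])
print(hidden.shape)        # torch.Size([1, 4, 16])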

Zero-dimensional tensor

Hi,

When I run python train.py, I get the error at line 71: zero-dimensional tensor (at position 1) cannot be concatenated. Could you please help me debug this?

Regards
Ramsu

Change in ELBO during training

Hi,

Thanks for sharing the code, this is quite helpful! One thing I noticed while training the model: the mean validation ELBO almost never improves (i.e. the validation ELBO after the 1st epoch is the lowest among all 10 epochs of training), while the training ELBO decreases smoothly (as expected). In such a situation, how do we choose the best model? Is it the one after the 1st epoch (which might not be sufficiently trained), or is it generally found that training SentenceVAE models for more epochs helps, so we pick the last checkpoint? I'm also curious whether others have faced similar issues while training, or whether I'm missing something here.

Thanks,
Soumya

Question about the inference function

In your inference function you have

if self.bidirectional or self.num_layers > 1:
    # unflatten hidden state
    hidden = hidden.view(self.hidden_factor, batch_size, self.hidden_size)

hidden = hidden.unsqueeze(0)

Shouldn't the unsqueeze happen only in an else branch, i.e. when bidirectional is false and the number of layers is one? Otherwise, if the model uses a bidirectional GRU or has num_layers > 1, you end up with a 4-d tensor: (1, hidden_factor, batch_size, hidden_size).
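
A hedged fix, consistent with the if/else form quoted in a later issue below, would be to move the unsqueeze into an else branch:

# Sketch: only add the leading (num_layers * num_directions) dimension when
# the hidden state is still flat, i.e. for a unidirectional, single-layer model.
if self.bidirectional or self.num_layers > 1:
    hidden = hidden.view(self.hidden_factor, batch_size, self.hidden_size)
else:
    hidden = hidden.unsqueeze(0)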

Pre-trained model?

Is there a pre-trained model somewhere? I'd like to play around with this and it would be nice to skip the training step if someone else has already done it.

Error due to invalid data types

Hi Tim,
I guess you forgot to add .cuda() to the model at line no. 40. This is causing a TypeError when trying to execute the inference.py code.

use_gpu = torch.cuda.is_available()
if use_gpu:
    model = model.cuda()
model.eval()

I guess the above snippet will fix it. :)

inference

I trained the model on the Penn Treebank dataset. When I run inference.py, the generated samples all look the same: it almost always generates "the company said that..." or "the company...". It is very rare that a generated sentence contains words other than "the company". Am I missing something?

Repeated content

I used your code and trained a model to generate new sentences. The problem is that there are many repeated tokens in the generated samples.

Any insight on how to deal with this?

For example, one particular token appears many times.

https://pastebin.com/caxz43CQ

imputation

How do I use this model to impute a sentence like 'I love _'? I do not understand this part.

Maybe a bug in constructing hidden states.

(1) hidden = self.latent2hidden(z)

(2) if self.bidirectional or self.num_layers > 1:
(3)     hidden = hidden.view(self.hidden_factor, batch_size, self.hidden_size)
(4) else:
(5)     hidden = hidden.unsqueeze(0)

I think line (3) should be

hidden = hidden.view(batch_size, self.hidden_factor, self.hidden_size).transpose(0, 1)

This snippet of code appears in both forward and inference.
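
A tiny stand-alone check of the reshape order (shapes are illustrative, not the repository's defaults): latent2hidden produces a (batch, hidden_factor * hidden_size) tensor, and viewing it directly as (hidden_factor, batch, hidden_size) assigns chunks of one example to different batch positions, while the view-then-transpose form keeps each example's chunks together.

import torch

batch_size, hidden_factor, hidden_size = 2, 2, 3
flat = torch.arange(batch_size * hidden_factor * hidden_size).view(batch_size, -1)

wrong = flat.view(hidden_factor, batch_size, hidden_size)
right = flat.view(batch_size, hidden_factor, hidden_size).transpose(0, 1)

print(wrong[:, 0])   # mixes values from example 0 and example 1
print(right[:, 0])   # both chunks of example 0, as intended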

question on inputs to encoder and decoder

Lines 59 and 95 in model.py are almost the same, except that the input to the decoder uses embedding dropout. My question is: shouldn't the input to the decoder start with 'eos'?

Wrong usage of Bidirectional GRU

When declaring the encoder cell, the resulting hidden size is multiplied by hidden_factor. I think PyTorch already accounts for the bidirectional case and returns outputs of the corresponding size, so there is no need to apply this scaling by hidden_factor.

Inference edge-case bug.

There is a bug which arises when len(running_seqs) == 1 and len(input_sequence.size()) == 0. It is fairly rare, occurring perhaps 1 in 100 inference runs: it happens when only one sequence is still being generated and it has not yet hit the maximum length, so decoding continues. The problem is that input_sequence = input_sequence[running_seqs] then raises "IndexError: too many indices for tensor of dimension 0." My fairly naive solution (I'm still getting used to working with tensors) is the following:

if len(running_seqs) > 0:
    if len(running_seqs) == 1 and len(input_sequence.size()) == 0:
        pass
    else:
        input_sequence = input_sequence[running_seqs]
        hidden = hidden[:, running_seqs]

This simply skips indexing the input sequence when it is just a scalar. It then also needs to be handled differently at the top of the while loop:

if len(running_seqs) == 1 and len(input_sequence.size()) == 0:
    input_sequence = input_sequence.unsqueeze(0)

input_sequence = input_sequence.unsqueeze(1)

This adds the dimensionality needed to match the shape required to run it through the network.

Word Dropout not working

Using --word_dropout causes an error.
For example, python3 train.py --word_dropout 0.62 gives the following error:

Traceback (most recent call last):
  File "train.py", line 214, in <module>
    main(args)
  File "train.py", line 118, in main
    logp, mean, logv, z = model(batch['input'], batch['length'])
  File "/user1/anaconda3/envs/sentence-vae/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/user1/Sentence-VAE/model.py", line 90, in forward
    prob[(input_sequence.data - self.sos_idx) * (input_sequence.data - self.pad_idx) == 0] = 1
TypeError: Performing basic indexing on a tensor and encountered an error indexing dim 0 with an object of type torch.cuda.ByteTensor. The only supported types are integers, slices, numpy scalars, or if indexing with a torch.LongTensor or torch.ByteTensor only a single Tensor may be passed.
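
One plausible cause (a guess, not verified against this exact setup) is a device mismatch in the old PyTorch version in use: if prob is created on the CPU (the surrounding code is not quoted here) while the mask built from input_sequence is a CUDA ByteTensor, indexing can fail with exactly this kind of message. A hedged sketch assuming a newer PyTorch, where comparisons yield bool masks and the probability tensor is created on the same device as the input:

# Sketch only: build the word-dropout mask with explicit comparisons and keep
# the probability tensor on the same device as the input sequence.
prob = torch.rand(input_sequence.size(), device=input_sequence.device)
prob[(input_sequence == self.sos_idx) | (input_sequence == self.pad_idx)] = 1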

Index Error during Inference

Hi,

When I run inference on my trained vanilla model, I often get an error that there are too many indices for a tensor of dimension 0.

$ python inference.py --data_dir data/dlk_txt/ -c models/dlk_model_1/2020-Aug-07-08\:43\:03/E19.pytorch -n 5
Model loaded from models/dlk_model_1/2020-Aug-07-08:43:03/E19.pytorch
Traceback (most recent call last):
  File "inference.py", line 80, in <module>
    main(args)
  File "inference.py", line 44, in main
    samples, z = model.inference(n=args.num_samples)
  File "/mnt/[...]/vae/Sentence-VAE/model.py", line 172, in inference
    input_sequence = input_sequence[running_seqs]
IndexError: too many indices for tensor of dimension 0

I downgraded my python 3.6 virtual environment according to the requirements.txt, but the error still persists.

Have you seen this before?

Thanks.

Discrete latent variable

Hi,
I am wondering what would have to be changed if we want to have discrete latent variables.

Cheers

Maybe a bug: data leak in decoder when bidirectional == True

Model.py, Line 41:
self.decoder_rnn = rnn(embedding_size, hidden_size, num_layers=num_layers, bidirectional=self.bidirectional, batch_first=True)

If bidirectional == True:
When the decoder is decoding the t-th token, it can obtain information from the whole input sentence.

I think this may be a bug, and I'd appreciate it if you could help double-check this. Many thanks.
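
If this is indeed unintended, a hedged one-line fix would be to construct the decoder unidirectionally regardless of the encoder setting (a sketch against the line quoted above, not a verified patch):

# Sketch: keep the encoder bidirectional if desired, but make the decoder a
# plain left-to-right RNN so it cannot peek at future tokens while decoding.
self.decoder_rnn = rnn(embedding_size, hidden_size, num_layers=num_layers,
                       bidirectional=False, batch_first=True)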

Missing requirements.txt

Thanks for creating this repository!
Can you please add a requirements.txt to the repository or add a dependencies section (w/ version numbers) in the README?

Thanks!
