
sentence-vae's Introduction

Hi there 👋

๐Ÿง‘โ€๐Ÿ”ฌ I am a PhD Student at the Ubiquitous Knowledge Processing Lab at TU Darmstadt. My research is on Natural Language Processing, focussing on scientific Question Answering.
๐Ÿ’ฌ I am one of the main developers of UKP-SQuARE (github), a Question Answering platform that allows you to rapidly make your QA models available to the public and conduct qualitative analysis.
๐Ÿ‘จโ€๐Ÿ’ป I am interested in developing efficient, scalable and easy-to-use Machine Learning applications.

Tim's GitHub stats

sentence-vae's People

Contributors

dependabot[bot], dhanajitb, kaletap, timbmg, topshik


sentence-vae's Issues

maybe a bug in the kl_anneal_function

In the function:

def kl_anneal_function(anneal_function, step, k, x0):
    if anneal_function == 'logistic':
        return float(1/(1+np.exp(-k*(step-x0))))
    elif anneal_function == 'linear':
        return min(1, step/x0)

you will see that k is a str. That is wrong, so I changed it to 1.
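
A small, hedged alternative to hard-coding the value would be to coerce the parameter to a number at the top of the function, so a string (for example one coming straight from a command-line flag) cannot slip through. This is only a sketch, not the repository's fix:

import numpy as np

def kl_anneal_function(anneal_function, step, k, x0):
    # Sketch only: coerce k and x0 to numbers in case they arrive as strings.
    k, x0 = float(k), float(x0)
    if anneal_function == 'logistic':
        return float(1 / (1 + np.exp(-k * (step - x0))))
    elif anneal_function == 'linear':
        return min(1, step / x0)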

Optimiser SGD leads to KL vanishing

Hi,

Thank you so much for your code, it helped me a lot.

I am wondering about the optimiser. I think the paper says they used SGD, but when I changed the optimiser to SGD in your code, I got the KL-vanishing problem. It works fine when I use Adam. I don't know why this happens; do you have any insight into it? Thank you so much!

bi-directional gru

I found a bug in the code: when I use a bi-directional GRU, the dimensions don't match.

Traceback (most recent call last):
  File "inference.py", line 80, in <module>
    main(args)
  File "inference.py", line 44, in main
    samples, z = model.inference(n=args.num_samples)
  File "/home/bli/Binyun/Generation/Sentence-VAE/model.py", line 153, in inference
    output, hidden = self.decoder_rnn(input_embedding, hidden)
  File "/home/bli/.conda/envs/Xihe/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/bli/.conda/envs/Xihe/lib/python3.8/site-packages/torch/nn/modules/rnn.py", line 819, in forward
    self.check_forward_args(input, hx, batch_sizes)
  File "/home/bli/.conda/envs/Xihe/lib/python3.8/site-packages/torch/nn/modules/rnn.py", line 229, in check_forward_args
    self.check_hidden_size(hidden, expected_hidden_size)
  File "/home/bli/.conda/envs/Xihe/lib/python3.8/site-packages/torch/nn/modules/rnn.py", line 223, in check_hidden_size
    raise RuntimeError(msg.format(expected_hidden_size, list(hx.size())))
RuntimeError: Expected hidden size (2, 10, 256), got [1, 2, 10, 256]

The input of decoder seems not right

In your model.py,

        # decoder input
        input_embedding = self.word_dropout(input_embedding)
        packed_input = rnn_utils.pack_padded_sequence(input_embedding, sorted_lengths.data.tolist(), batch_first=True)

the input of the decoder (i.e. input_embedding) is the same as the input of the encoder, which does not seem correct.
According to the cited paper,
the input of the encoder is ['RNNs', 'work']
the input of the decoder is ['<EOS>', 'RNNs', 'work']
the output of the decoder is ['RNNs', 'work', '<EOS>']
So I think the input of the decoder should have one extra token at the front compared to the input of the encoder...
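
For illustration only (this is not the repository's data pipeline), the shift described in the paper can be built like this; the token spelling is an assumption:

# Sketch: encoder input, decoder input, and decoder target for one sentence,
# following the shift in the cited paper. The same <EOS> symbol doubles as
# the decoder's start token, as in the example above.
sentence = ['RNNs', 'work']
eos = '<EOS>'
encoder_input  = sentence             # ['RNNs', 'work']
decoder_input  = [eos] + sentence     # ['<EOS>', 'RNNs', 'work']
decoder_target = sentence + [eos]     # ['RNNs', 'work', '<EOS>']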

maybe a bug in the train.py

I got a bug in the following code:
target = target[:, :torch.max(length).data[0]].contiguous().view(-1)
and it was fixed when I changed the code to:
target = target[:, :torch.max(length).data].contiguous().view(-1)
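
This looks like a PyTorch version difference rather than a logic bug: since PyTorch 0.4, torch.max(length) returns a zero-dimensional tensor that can no longer be indexed with [0], and .item() is the usual way to pull out the scalar. A hedged sketch of the same line:

# Sketch only: extract the maximum length as a Python int before slicing;
# this works on newer PyTorch versions where torch.max(length) is 0-dim.
max_len = int(torch.max(length).item())
target = target[:, :max_len].contiguous().view(-1)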

Confusion about the batch in model.py(I think batch should be the second dimension.)

I read the code very carefully.
I am confused about the 65th line of model.py.
I think the second dimension of "hidden" is the batch, not the first one.
Even though the encoder has been set up with "batch_first=True", only the output has the batch in the first dimension; the hidden state does not.
I have tested this on my own computer.

Of course, the code runs without problems. I am just confused because the two dimensions seem mixed up in the code.
Can anyone help me with this?
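
This matches PyTorch's documented behaviour: batch_first only affects the input and output tensors, not the hidden state. A small stand-alone check (not from the repository):

import torch
import torch.nn as nn

# With batch_first=True the output is (batch, seq, hidden), but the hidden
# state keeps the shape (num_layers * num_directions, batch, hidden).
gru = nn.GRU(input_size=8, hidden_size=16, batch_first=True)
x = torch.randn(4, 5, 8)   # (batch=4, seq=5, features=8)
output, hidden = gru(x)
print(output.shape)        # torch.Size([4, 5, 16])
print(hidden.shape)        # torch.Size([1, 4, 16])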

Zero-dimensional tensor

Hi,

When I run python train.py, I get the error at line 71: zero-dimensional tensor (at position 1) cannot be concatenated. Could you please help me debug this?

Regards
Ramsu

Change in ELBO during training

Hi,

Thanks for sharing the code, this is quite helpful! One thing I noticed while training the model: the mean validation ELBO almost never improves (i.e. the validation ELBO after the 1st epoch is the lowest among all 10 epochs of training), while the training ELBO decreases smoothly (as expected). In such a situation, how do we choose the best model? Is it the one after the 1st epoch (which might not be sufficiently trained), or is it generally found that training SentenceVAE models for more epochs helps, so we pick the last checkpoint? I'm also curious whether others have faced similar issues while training, or whether I'm missing something here.

Thanks,
Soumya

Question about the inference function

In your inference function you have

if self.bidirectional or self.num_layers > 1:
    # unflatten hidden state
    hidden = hidden.view(self.hidden_factor, batch_size, self.hidden_size)

hidden = hidden.unsqueeze(0)

Shouldn't the unsqueeze happen only in an else branch, i.e. when bidirectional is false and the number of layers is one? Otherwise, if the model uses a bidirectional GRU or has num_layers > 1, you end up with a 4-d tensor: (1, hidden_factor, batch_size, hidden_size).
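
A hedged fix, consistent with the if/else form quoted in a later issue below, would be to move the unsqueeze into an else branch:

# Sketch: only add the leading (num_layers * num_directions) dimension when
# the hidden state is still flat, i.e. for a unidirectional, single-layer model.
if self.bidirectional or self.num_layers > 1:
    hidden = hidden.view(self.hidden_factor, batch_size, self.hidden_size)
else:
    hidden = hidden.unsqueeze(0)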

Pre-trained model?

Is there a pre-trained model somewhere? I'd like to play around with this and it would be nice to skip the training step if someone else has already done it.

Error due to invalid data types

Hi Tim,
I guess you forgot to add .cuda() to the model at line no. 40. This is causing a TypeError when trying to execute the inference.py code.

use_gpu = torch.cuda.is_available()
if use_gpu:
    model = model.cuda()
model.eval()

I guess the above snippet will fix it. :)

inference

I trained the model on the Penn Treebank dataset. When I run inference.py, the generated samples all look the same: it almost always generates "the company said that..." or "the company...". It is very rare that a generated sentence contains words other than "the company". Am I missing something?

Repeated content

I used your code and trained a model to generate new sentences. The problem is that there are many repeated tokens in the generated samples.

Any insight on how to deal with this?

For example, one particular token appears many times.

https://pastebin.com/caxz43CQ

imputation

How do I use this model to impute a sentence like 'I love _'? I do not understand this part.

Maybe a bug in constructing hidden states.

(1) hidden = self.latent2hidden(z)

(2) if self.bidirectional or self.num_layers > 1:
(3)     hidden = hidden.view(self.hidden_factor, batch_size, self.hidden_size)
(4) else:
(5)     hidden = hidden.unsqueeze(0)

I think line (3) should be

hidden = hidden.view(batch_size, self.hidden_factor, self.hidden_size).transpose(0, 1)

This snippet of code appears in both forward and inference.
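
A tiny stand-alone check of the reshape order (shapes are illustrative, not the repository's defaults): latent2hidden produces a (batch, hidden_factor * hidden_size) tensor, and viewing it directly as (hidden_factor, batch, hidden_size) assigns chunks of one example to different batch positions, while the view-then-transpose form keeps each example's chunks together.

import torch

batch_size, hidden_factor, hidden_size = 2, 2, 3
flat = torch.arange(batch_size * hidden_factor * hidden_size).view(batch_size, -1)

wrong = flat.view(hidden_factor, batch_size, hidden_size)
right = flat.view(batch_size, hidden_factor, hidden_size).transpose(0, 1)

print(wrong[:, 0])   # mixes values from example 0 and example 1
print(right[:, 0])   # both chunks of example 0, as intended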

question on inputs to encoder and decoder

Lines 59 and 95 in model.py are almost the same, except that the input to the decoder uses embedding dropout. My question is: shouldn't the input to the decoder start with 'eos'?

Wrong usage of Bidirectional GRU

When declaring the encoder cell, the resulting hidden size is multiplied by hidden_factor. I think PyTorch already accounts for the bidirectional case and returns outputs of the corresponding size, so there is no need to apply this scaling by hidden_factor.

Inference edge-case bug.

There is a bug which arises when len(running_seqs) == 1 and len(input_sequence.size()) == 0. It is fairly rare, occurring perhaps 1 in 100 inference runs: it happens when only one sequence is still being generated and it has not yet hit the maximum length, so decoding continues. The problem is that input_sequence = input_sequence[running_seqs] then raises "IndexError: too many indices for tensor of dimension 0." My fairly naive solution (I'm still getting used to working with tensors) is the following:

if len(running_seqs) > 0:
    if len(running_seqs) == 1 and len(input_sequence.size()) == 0:
        pass
    else:
        input_sequence = input_sequence[running_seqs]
        hidden = hidden[:, running_seqs]

This simply skips indexing the input sequence when it is just a scalar. It then also needs to be handled differently at the top of the while loop:

if len(running_seqs) == 1 and len(input_sequence.size()) == 0:
    input_sequence = input_sequence.unsqueeze(0)

input_sequence = input_sequence.unsqueeze(1)

This adds the dimensionality needed to match the shape required to run it through the network.

Word Dropout not working

Using --word_dropout causes an error.
For example, python3 train.py --word_dropout 0.62 gives the following error:

Traceback (most recent call last):
  File "train.py", line 214, in <module>
    main(args)
  File "train.py", line 118, in main
    logp, mean, logv, z = model(batch['input'], batch['length'])
  File "/user1/anaconda3/envs/sentence-vae/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/user1/Sentence-VAE/model.py", line 90, in forward
    prob[(input_sequence.data - self.sos_idx) * (input_sequence.data - self.pad_idx) == 0] = 1
TypeError: Performing basic indexing on a tensor and encountered an error indexing dim 0 with an object of type torch.cuda.ByteTensor. The only supported types are integers, slices, numpy scalars, or if indexing with a torch.LongTensor or torch.ByteTensor only a single Tensor may be passed.
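
One plausible cause (a guess, not verified against this exact setup) is a device mismatch in the old PyTorch version in use: if prob is created on the CPU (the surrounding code is not quoted here) while the mask built from input_sequence is a CUDA ByteTensor, indexing can fail with exactly this kind of message. A hedged sketch assuming a newer PyTorch, where comparisons yield bool masks and the probability tensor is created on the same device as the input:

# Sketch only: build the word-dropout mask with explicit comparisons and keep
# the probability tensor on the same device as the input sequence.
prob = torch.rand(input_sequence.size(), device=input_sequence.device)
prob[(input_sequence == self.sos_idx) | (input_sequence == self.pad_idx)] = 1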

Index Error during Inference

Hi,

When I run inference on my trained vanilla model, I often get an error that there are too many indices for a tensor of dimension 0.

$ python inference.py --data_dir data/dlk_txt/ -c models/dlk_model_1/2020-Aug-07-08\:43\:03/E19.pytorch -n 5
Model loaded from models/dlk_model_1/2020-Aug-07-08:43:03/E19.pytorch
Traceback (most recent call last):
  File "inference.py", line 80, in <module>
    main(args)
  File "inference.py", line 44, in main
    samples, z = model.inference(n=args.num_samples)
  File "/mnt/[...]/vae/Sentence-VAE/model.py", line 172, in inference
    input_sequence = input_sequence[running_seqs]
IndexError: too many indices for tensor of dimension 0

I downgraded my python 3.6 virtual environment according to the requirements.txt, but the error still persists.

Have you seen this before?

Thanks.

Discrete latent variable

Hi,
I am wondering what would have to be changed if we want to have discrete latent variables.

Cheers

Maybe a bug: data leak in decoder when bidirectional == True

Model.py, Line 41:
self.decoder_rnn = rnn(embedding_size, hidden_size, num_layers=num_layers, bidirectional=self.bidirectional, batch_first=True)

If bidirectional == True:
When the decoder is decoding the t-th token, it can obtain information from the whole input sentence.

I think this may be a bug, and I'd appreciate it if you could help double-check this. Many thanks.
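
If this is indeed unintended, a hedged one-line fix would be to construct the decoder unidirectionally regardless of the encoder setting (a sketch against the line quoted above, not a verified patch):

# Sketch: keep the encoder bidirectional if desired, but make the decoder a
# plain left-to-right RNN so it cannot peek at future tokens while decoding.
self.decoder_rnn = rnn(embedding_size, hidden_size, num_layers=num_layers,
                       bidirectional=False, batch_first=True)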

Missing requirements.txt

Thanks for creating this repository!
Can you please add a requirements.txt to the repository or add a dependencies section (w/ version numbers) in the README?

Thanks!
