franxyao / dgm_latent_bow

Implementation of NeurIPS 19 paper: Paraphrase Generation with Latent Bag of Words

License: MIT License

Python 100.00%

natural-language-generation deep-generative-model paraphrase-generation gumbel-softmax latent-variable-models

dgm_latent_bow's Introduction

Hi there 👋

I am a Ph.D. student at the University of Edinburgh, working with Professor Mirella Lapata. Previously I finished my M.S. at Columbia University and my B.S. at Peking University. My email address is [email protected]

I study large-scale generative models for human language.

dgm_latent_bow's People

Forkers

birdhunter22 colinsongf phymucs weifanjiang mars-wei timhuang1 vrublack jieli4970 aolney filco306 dlrandom c00renut mishav78 lukeli97

dgm_latent_bow's Issues

How to finetune this model

I have a task of paraphrasing questions in Vietnamese, but I can't find out how to finetune this model. I tried Googling how to save and load models in TensorFlow, but the code doesn't work. Does the original code already include a way to finetune the model?

question about test and bow

Hi, thanks for the good paper and code. Here are my questions:

Does the controller lack standalone test code for inference?
How is the ground-truth BOW produced for the encoder's BOW prediction task?
Thanks again.
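
As an illustration of the second question, here is a minimal sketch of one plausible way to build a bag-of-words target. It assumes the target is a multi-hot vector over the vocabulary marking the words of the target paraphrase; the names bow_target and word2id are hypothetical and are not taken from the repository, so the actual code may build its labels differently.

```python
def bow_target(target_tokens, word2id, vocab_size):
    """Return a multi-hot vector marking which vocabulary words occur in the target."""
    vec = [0] * vocab_size
    for tok in target_tokens:
        idx = word2id.get(tok)
        if idx is not None:  # out-of-vocabulary words are skipped
            vec[idx] = 1
    return vec


# Hypothetical toy vocabulary, for illustration only.
word2id = {"how": 0, "can": 1, "i": 2, "learn": 3, "python": 4, "quickly": 5}
target = ["how", "can", "i", "learn", "python"]
print(bow_target(target, word2id, len(word2id)))
```

Because the vector ignores word order and counts, repeated words in the target set the same position only once.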

VAE model

Hello,

Does the code to reproduce the VAE baseline run?
I tried to run python main.py --model_name=vae, but the Vae class seems to be empty.

Nice work and thanks for the code 👍

is it possible to use this model for paraphrasing general sentences?

I want to paraphrase some English sentences, so I wondered if you could give a hint or a script that takes a sentence as input and paraphrases it using the model.

Any pre-trained model available?

I kept getting different sorts of errors when trying to run the training code, so I am wondering whether a pre-trained model is available.

which rouge version

I tried installing several rouge versions (0.2.0, 0.2.1, 0.3.1, 0.3.2), and all of them hit this error:

Traceback (most recent call last):
  File "main.py", line 125, in <module>
    main()
  File "main.py", line 118, in main
    controller = Controller(config)
  File "/home/techengin/Documents/dgm_latent_bow/src/controller.py", line 199, in __init__
    self.rouge_evaluator = rouge.Rouge(metrics=['rouge-n', 'rouge-l'], max_n=2)
TypeError: __init__() got an unexpected keyword argument 'max_n'
I found that rouge.Rouge only accepts the arguments metrics and stats; it has no max_n.

Which TensorFlow version works best?

Store dependencies

Let's add a constraints/dependencies file to the repo, perhaps a Pipfile or requirements.txt.
It would make the results easier for people to reproduce.

Pretrained model

Hi @FranxYao,

Nice work! I am trying to reproduce the results from your paper.
Do you have a pretrained model? The training does not seem to be working.

-Ramya

wiki2bio/original_data/word_vocab.txt

Hi,

An error was thrown when I ran main.py:
FileNotFoundError: [Errno 2] No such file or directory: '../wiki2bio/original_data/word_vocab.txt'

It seems that the file wiki2bio/original_data/word_vocab.txt is not included.

Test error

Hi, Thanks for your great work.

I cannot run main_test.py successfully. When I run python main_test.py, the error message says that I don't have the wikibio dataset, so I added the Quora dataset code from main.py, but then I got the following problem:

Traceback (most recent call last):
  File "main_test.py", line 87, in <module>
    batch = dset.next_batch('train', 100)
TypeError: next_batch() missing 1 required positional argument: 'model'

Could you please provide an inference script and update the code?

Thanks and best regards

how to run the trained checkpoint?

Model directory is empty

Do I need to rebuild the model?

Is there a branch that already includes a built model?

Question about setting some hyperparameter values

Hi,
I was wondering whether there is an "intuitive" way to set the following parameters: max_enc_bow, max_dec_bow and sample_size, similar to how max_sent_len is set to the 95th percentile of the training sequence lengths.
Or are these parameters tuned on the validation metric (or chosen in some other way)?
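
For reference, the percentile rule mentioned for max_sent_len can be sketched as follows. This is a minimal illustration using the nearest-rank definition of a percentile; the function name percentile_length and the toy lengths are hypothetical, and whether the same rule suits max_enc_bow or max_dec_bow is an open assumption, not something the repository states.

```python
import math


def percentile_length(lengths, q=0.95):
    """Nearest-rank percentile: the smallest value covering at least a fraction q of the data."""
    if not lengths:
        raise ValueError("lengths must be non-empty")
    ordered = sorted(lengths)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


# Hypothetical token counts of ten training sentences.
train_lengths = [5, 7, 8, 8, 9, 10, 11, 12, 14, 40]
max_sent_len = percentile_length(train_lengths, 0.95)
print(max_sent_len)
```

The same helper could be applied to, say, the number of distinct words per target sentence as a starting point, with the final values still checked against a validation metric.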

what is key2id?

File "main.py", line 91, in main
config.key_size = len(dset.key2id)
AttributeError: 'Dataset' object has no attribute 'key2id'

How can I resolve this?

Does this project need a specific version of rouge?

I installed rouge with "pip install rouge", which automatically installs version 1.0.0.

I got errors when running main.py:
1. rouge.Rouge(metrics=['rouge-n', 'rouge-l'], max_n=2)
max_n is an unexpected parameter, and rouge-n is not found.

I deleted them and ran it again.
2. rouge_scores = self.rouge_evaluator.get_scores(rouge_pred, rouge_ref)
rouge_ref is an invalid argument: get_scores takes the references as a list of str, but here rouge_ref is a list of lists of str.

I then modified rouge_ref and passed get_scores a valid argument.
3. metrics_dict["rouge_1"] = rouge_scores["rouge-1"]["r"]
rouge_scores is a list, not a dict.

So, am I using the correct rouge?
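
The three symptoms above (a max_n keyword, list-of-lists references, and a dict result) suggest the code may have been written against a different package that also installs a module named rouge; py-rouge is one candidate, though that is an assumption and is not confirmed by the repository. If you stay with a package that returns one score dict per sentence, as described in step 3, a minimal sketch of a helper that averages those dicts into a single dict looks like this. The name average_rouge and the sample numbers are hypothetical.

```python
def average_rouge(per_sentence_scores):
    """Average a list of per-sentence score dicts into one dict of the same shape."""
    if not per_sentence_scores:
        raise ValueError("no scores to average")
    totals = {}
    for scores in per_sentence_scores:
        for metric, stats in scores.items():
            bucket = totals.setdefault(metric, {})
            for stat, value in stats.items():
                bucket[stat] = bucket.get(stat, 0.0) + value
    n = len(per_sentence_scores)
    return {m: {s: v / n for s, v in stats.items()} for m, stats in totals.items()}


# Made-up scores in the list-of-dicts shape described in step 3.
rouge_scores = [
    {"rouge-1": {"r": 0.5, "p": 1.0, "f": 0.5}},
    {"rouge-1": {"r": 1.0, "p": 0.5, "f": 0.75}},
]
averaged = average_rouge(rouge_scores)
print(averaged["rouge-1"]["r"])
```

After averaging, an expression such as averaged["rouge-1"]["r"] has the dict shape that the line in step 3 expects.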

Repetition of words in the generation process

Hi...
I have trained and tested this model rigorously on the Quora dataset and on my own dataset. I see a lot of repeated words within a single generated sentence.
Examples: "I want to order to order a phone", "how to clear cookies caches and caches", etc.

Any help in this regard will be appreciated :)
Thanks
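
One common mitigation for this kind of repetition, which is not part of this repository, is to block repeated n-grams during decoding: before emitting a token, check whether it would complete an n-gram that already appears in the output so far. The sketch below shows only that check; the function name creates_repeated_ngram is hypothetical, and wiring it into the decoder is left open.

```python
def creates_repeated_ngram(prefix, candidate, n=2):
    """Return True if appending `candidate` to `prefix` repeats an n-gram already in `prefix`."""
    if len(prefix) < n - 1:
        return False
    # The n-gram that would end at the candidate token.
    new_ngram = tuple(prefix[len(prefix) - (n - 1):]) + (candidate,)
    # All n-grams already present in the generated prefix.
    seen = {tuple(prefix[i:i + n]) for i in range(len(prefix) - n + 1)}
    return new_ngram in seen


prefix = "i want to order to".split()
print(creates_repeated_ngram(prefix, "order"))  # "to order" already occurred
print(creates_repeated_ngram(prefix, "buy"))
```

A decoder using this check would skip or down-weight any candidate for which it returns True, at the cost of occasionally forbidding a legitimate repetition.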

bow_topk_prob are not always less than 1

Hi,
I see that when you calculate bow_topk_prob in bow_seq2seq.py (line 146), you sum the probabilities along dimension 1. This sum can produce values greater than 1, and for x > 1, log x is positive, so the NLL becomes negative.

bow_topk_prob += tf.reduce_sum(bow_prob, 1) should be bow_topk_prob += tf.reduce_mean(bow_prob, 1)

Correct me if I am wrong.
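
The arithmetic behind this concern can be checked with a small pure-Python sketch. It assumes, hypothetically, that bow_prob holds several distributions over the vocabulary that are reduced word by word along dimension 1; the toy numbers and the combine helper are illustrative and do not come from bow_seq2seq.py.

```python
import math

# Three hypothetical distributions over a 4-word vocabulary (each row sums to 1).
bow_prob = [
    [0.7, 0.1, 0.1, 0.1],
    [0.6, 0.2, 0.1, 0.1],
    [0.5, 0.3, 0.1, 0.1],
]


def combine(rows, reducer):
    """Reduce the rows word by word (column-wise) with the given reducer."""
    return [reducer(col) for col in zip(*rows)]


summed = combine(bow_prob, sum)
averaged = combine(bow_prob, lambda col: sum(col) / len(col))

print(summed[0], -math.log(summed[0]))      # value above 1, so its negative log is below 0
print(averaged[0], -math.log(averaged[0]))  # value in [0, 1], so its negative log is at least 0
```

With the sum, the first word's value exceeds 1 and its negative log is negative; with the mean, every value stays in [0, 1] and the averaged vector still sums to 1.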