
dialogwae's Issues

Code explanation

[screenshot of training code showing backward(one) / backward(minus_one) calls]

Hi, I want to ask about your code. What do backward(one) and backward(minus_one) do here?
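For context, this looks like the standard WGAN critic update. Here is a minimal, self-contained sketch of that pattern (the names critic, real, and fake are illustrative stand-ins, not the repo's code):

import torch
import torch.nn as nn

critic = nn.Linear(8, 1)                      # stand-in discriminator
opt_D = torch.optim.RMSprop(critic.parameters(), lr=5e-5)

one = torch.tensor(1.0)
minus_one = -one

real = torch.randn(16, 8)                     # pretend real batch
fake = torch.randn(16, 8)                     # pretend generated batch

opt_D.zero_grad()
d_real = critic(real).mean()
d_real.backward(minus_one)   # accumulates -grad: the step pushes D(real) up
d_fake = critic(fake).mean()
d_fake.backward(one)         # accumulates +grad: the step pushes D(fake) down
opt_D.step()

For a scalar y, y.backward(g) accumulates g * dy/dtheta, so backward(one) and backward(minus_one) simply choose the sign of the gradient (minimize vs. maximize that term) without building a combined loss tensor.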

Confused about the evaluation of inter-dist metrics.

Hi, thanks for your insightful work!
I'm confused about the evaluation of the inter-dist metrics in sample.py, as follows (simplified):

while True:
    batch = test_loader.next_batch()  # batch_size is 1 here
    ...
    intra_dist1, intra_dist2, inter_dist1, inter_dist2 = metrics.div_distinct(sample_words, sample_lens)
    inter_dist1s.append(inter_dist1)
    inter_dist2s.append(inter_dist2)
    ...
inter_dist1 = float(np.mean(inter_dist1s))
inter_dist2 = float(np.mean(inter_dist2s))
print("inter_dist1 %f, inter_dist2 %f" % (inter_dist1, inter_dist2))
print("Done testing")

I understand that inter_dist is computed over the #n_samples predictions inside metrics.div_distinct(). However, shouldn't inter_dist be calculated over the entire test set,
i.e. inter_dist1 = #distinct_unigrams / #total_unigrams across all predictions? From the code, it seems that inter_dist is measured per batch and the results are then averaged. Moreover, batch_size is 1 here, so each inter_dist only covers #n_samples predictions (#n_samples = 5 in the code). If #n_samples == 1, wouldn't inter_dist then be equivalent to intra_dist?
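For reference, what I expected is a single corpus-level computation, roughly like this (my own sketch, not the repo's code):

from collections import Counter

def corpus_inter_dist(all_sample_words, all_sample_lens):
    # Pool n-grams over every prediction in the test set, then compute
    # #distinct n-grams / #total n-grams once, globally.
    unigrams, bigrams = Counter(), Counter()
    for words, length in zip(all_sample_words, all_sample_lens):
        seq = list(words[:length])
        unigrams.update(seq)
        bigrams.update(zip(seq, seq[1:]))
    inter_dist1 = len(unigrams) / max(sum(unigrams.values()), 1)
    inter_dist2 = len(bigrams) / max(sum(bigrams.values()), 1)
    return inter_dist1, inter_dist2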

Hoping for your reply. Thanks in advance!

Code explanation about data preprocessing

Hello, thanks for open-sourcing your code. I am trying to understand it, but the data preprocessing in data.py confuses me.

When building the vocabulary, the code prints:

print("Load corpus with train size %d, valid size %d, "
              "test size %d raw vocab size %d vocab size %d at cut_off %d OOV rate %f"
              % (len(self.train_corpus), len(self.valid_corpus), len(self.test_corpus),
                 raw_vocab_size, len(vocab_count), vocab_count[-1][1], float(discard_wc) / len(all_words)))

What do train size, valid size, and test size mean? They are all 2 here, since each corpus is a tuple of length 2.

Do you mean that the vocabulary comes from the training, validation, and test data? In the code, however, only the training data is used to build the vocabulary.
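For reference, my current understanding of the cut-off-based vocabulary build is roughly the following (an illustrative sketch of what the printed quantities mean, not the actual data.py code):

from collections import Counter

def build_vocab(train_utterances, max_vocab_cnt=10000):
    # Count words in the training split only; keep the most frequent
    # max_vocab_cnt words and treat everything below the cut-off as OOV.
    all_words = [w for utt in train_utterances for w in utt]
    raw_counts = Counter(all_words)
    vocab_count = raw_counts.most_common(max_vocab_cnt)
    raw_vocab_size = len(raw_counts)
    discard_wc = len(all_words) - sum(c for _, c in vocab_count)
    cut_off = vocab_count[-1][1]          # frequency of the rarest kept word
    oov_rate = discard_wc / len(all_words)
    vocab = ["<pad>", "<unk>"] + [w for w, _ in vocab_count]
    return vocab, raw_vocab_size, cut_off, oov_rate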

When formatting a dialogue, is it essential to prepend [<s>,<d>,</s>] at the start of the dialogue? Can I omit this?

Thank you.

Do the losses of both the generator and the discriminator collapse?

Hello, I am running your code on the SWDA dataset, but the losses look like this:

train_loss_AE:2.7721 train_loss_G:298.7391 train_loss_D:-300.4698
DialogWAE_GMP-basic|SWDA@gpu0 epo:[84/100] iter:[1200/1279] step_time:56s elapsed:0:11:33<0:0:46

train_loss_AE:3.0337 train_loss_G:203.4530 train_loss_D:-84.7853

Valid begins with 41 batches with 28 left over samples
Validation valid_loss_AE:3.0522 valid_loss_G:353.3996 valid_loss_D:-353.3996
Valid begins with 5165 batches with 0 left over samples

I did not change any configuration, and I haven't checked the generated data yet. Are the losses supposed to look like this, or has training already collapsed?

Thank you for the help!

Wasserstein distance between prior and posterior

Hi,

I'm trying to find the part of the code that computes the Wasserstein distance between the prior and the posterior (as in Eq. 5 of your ICLR paper), but I couldn't find it. Could you point me to the part of the code that computes this distance?

Moreover, I found that the latent variables are computed directly by the model (e.g., by a fully connected layer) rather than by predicting \mu and \sigma and then sampling from that distribution, as stated in Eq. 3 and Eq. 4. Could you please clarify this?
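For reference, what I expected from Eq. 3 and Eq. 4 is the usual reparameterization, something like this (my own sketch, not the repo's module):

import torch
import torch.nn as nn

class GaussianHead(nn.Module):
    # Predict mu and log(sigma^2) from a hidden state h, then sample
    # z = mu + sigma * eps with eps ~ N(0, I) (reparameterization trick).
    def __init__(self, in_dim, z_dim):
        super().__init__()
        self.mu = nn.Linear(in_dim, z_dim)
        self.logvar = nn.Linear(in_dim, z_dim)

    def forward(self, h):
        mu, logvar = self.mu(h), self.logvar(h)
        eps = torch.randn_like(mu)
        return mu + torch.exp(0.5 * logvar) * eps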

Thanks

SWDA seems to give a lot of repetition in the sampled responses for the test data.

Context 4-1: ('yeah ', 2)
Context 5-0: ("and the people in the city were saying well why should i go do that make the government do that that ' s not my job ", 27)
Context 6-1: ("right they ' ve got a lot of adjustments to make with coming out of what they ' ve been through ", 22)
Context 7-1: ('now and ', 3)
Context 8-1: ("they don ' t understand that to make that work they ' ve got to take some responsibility for themselves it ' s not just the government ' s responsibility anymore ", 32)
Target >> you can't just blame it on the government when they give you the freedom to take care of yourself then that puts some responsibility on you as well

Sample 0 >> it the their their their their their their the she she she she she she she she'she'i she'i she'she'i she'she'i'' she'
Sample 1 >> yeah
Sample 2 >> it is is is but she is
Sample 3 >> the the high school is high high school system
Sample 4 >> and but it is
Sample 5 >> these are just
Sample 6 >> in their of their high of their life is worth of an life life life life life life life life life life life life life life life life life life life life life life life life life life life life
Sample 7 >> it in their their their something is something is something is something something something her life life something something something something something something something something something something something something something something something something something something something something something something something
Sample 8 >> but but but i but i i'i i
Sample 9 >> but and but of their name of their life and their life just never just

Is there some way to avoid this?
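One generic mitigation would be to block repeated n-grams at decode time, along these lines (an illustrative helper, not something from this repo):

import torch

def block_repeats(logits, generated, n=3):
    # Forbid any token that would complete an n-gram already present in
    # `generated` (a list of token ids); `logits` is the 1-D next-token
    # score vector before softmax. Apply at each greedy/beam step.
    if len(generated) >= n - 1:
        prefix = tuple(generated[-(n - 1):])
        for i in range(len(generated) - n + 1):
            if tuple(generated[i:i + n - 1]) == prefix:
                logits[generated[i + n - 1]] = float("-inf")
    return logits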

Unable to achieve published results on DailyDialog

Hi,
I am trying to retrain your model as a baseline. So far, SWDA gave results matching the paper (actually slightly better). But for the DailyDialog dataset, even after multiple runs, the best we got is the following (row 1 is on the validation set, row 2 on the test set):

A, E, G are the sim_bow embedding metrics (Average, Extrema, Greedy).

BLEU-R | BLEU-P | F1    | A     | E     | G
0.305  | 0.170  | 0.218 | 0.940 | 0.609 | 0.857
0.298  | 0.163  | 0.211 | 0.940 | 0.605 | 0.857

The paper, by contrast, reports the following best results:

[screenshot of the published DailyDialog results table]

Were any changes made to the code relative to the configuration in the paper? I couldn't find any discrepancy. Can you point me to what the issue might be?

Warning when running sample.py: RNN module weights are not part of single contiguous chunk of memory

After training for 100 epochs, I tried to test the performance of the models using sample.py.
The first question is: which metric should I use to select the best model? BLEU precision keeps decreasing, while recall reaches its highest point at the 88th epoch. I am new to GANs.

Then I decided to find the best performer by testing the models. Here comes the problem:

/search/odin/hejunqing/DialogWAE/modules.py:86: UserWarning: RNN module weights are not part of single contiguous chunk of memory. This means they need to be compacted at every call, possibly greatly increasing memory usage. To compact weights again call flatten_parameters().
hids, h_n = self.rnn(inputs, init_hidden)
/search/odin/hejunqing/DialogWAE/modules.py:141: UserWarning: RNN module weights are not part of single contiguous chunk of memory. This means they need to be compacted at every call, possibly greatly increasing memory usage. To compact weights again call flatten_parameters().
hids, h_n = self.rnn(utt_floor_encs, init_hidden)
/search/odin/hejunqing/DialogWAE/modules.py:289: UserWarning: RNN module weights are not part of single contiguous chunk of memory. This means they need to be compacted at every call, possibly greatly increasing memory usage. To compact weights again call flatten_parameters().
decoder_output, decoder_hidden = self.rnn(decoder_input, decoder_hidden)

These warnings appear when I run sample.py with PyTorch 0.4.0.
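The warning itself suggests the remedy: call flatten_parameters() before invoking the RNN. Applied to the quoted lines in modules.py, it would look roughly like this (a sketch, not a verified patch):

# e.g. in the forward pass around modules.py:86
self.rnn.flatten_parameters()   # re-compact the weights into one chunk
hids, h_n = self.rnn(inputs, init_hidden)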

Stop training the context encoder during train_G/train_D?

I find it reasonable to stop training the utterance encoder while training the generator and discriminator.

However, why don't you freeze the context encoder as well?

Is there any specific consideration?
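For clarity, by "stop training" I mean freezing the module the usual way, e.g. (a sketch; utt_encoder and context_encoder are stand-in names, not necessarily the attributes used in the repo):

import torch.nn as nn

def set_trainable(module: nn.Module, trainable: bool):
    # Freeze or unfreeze every parameter of a module.
    for p in module.parameters():
        p.requires_grad = trainable

# Hypothetical usage during the G/D updates:
# set_trainable(utt_encoder, False)      # done for the utterance encoder
# set_trainable(context_encoder, False)  # the part I am asking about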

Thanks.
