It seems there is a logical error in synthetic_dataset.py about cone-of-silence (closed)

vivjay30 commented on August 29, 2024

It seems there is a logical error in synthetic_dataset.py


Comments (4)

zhangshengoo commented on August 29, 2024

I'm getting overfitting during training: the training loss keeps decreasing, but the test loss is almost unchanged. I use the same model parameters and the same dataset size as described in the paper. Have you encountered this problem? Sorry to bother you.


hust-cxl commented on August 29, 2024

> I'm getting overfitting during training: the training loss keeps decreasing, but the test loss is almost unchanged. I use the same model parameters and the same dataset size as described in the paper. Have you encountered this problem? Sorry to bother you.

I'm not sure whether your test set is a "real" test set. If you did not change any of this project's code, the loss in log output such as "Done with training, going to testing, Test set: Average Loss: XXXX" is not a true test-set loss; it is the validation loss. How to read it depends on how you split your data:
i) the validation set was generated from your training data (it is part of your training set), or
ii) the validation set contains data that was never seen in the training set.
If your split is like i), you may be facing overfitting; if it is like ii), you should look more closely at the training set itself.
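One way to get a held-out set of type ii) is to split by recording source rather than by individual sample, so no speaker or source clip appears in more than one split. The helper below is a hypothetical sketch, not part of cone-of-silence. The file naming (`spk{N}_{i}.wav`) and the grouping key are assumptions for illustration.

```python
import random


def split_by_group(items, group_of, val_frac=0.1, test_frac=0.1, seed=0):
    """Split items into train/val/test so that no group (e.g. a speaker or a
    source recording) appears in more than one split. Hypothetical helper."""
    groups = sorted({group_of(x) for x in items})
    rng = random.Random(seed)
    rng.shuffle(groups)
    # Reserve whole groups, never individual items, for the held-out splits.
    n_val = max(1, int(len(groups) * val_frac))
    n_test = max(1, int(len(groups) * test_frac))
    val_groups = set(groups[:n_val])
    test_groups = set(groups[n_val:n_val + n_test])

    train, val, test = [], [], []
    for x in items:
        g = group_of(x)
        if g in val_groups:
            val.append(x)
        elif g in test_groups:
            test.append(x)
        else:
            train.append(x)
    return train, val, test


# Usage with an assumed naming scheme: 10 speakers, 5 clips each.
files = [f"spk{s}_{i}.wav" for s in range(10) for i in range(5)]
train, val, held_out = split_by_group(files, lambda f: f.split("_")[0])
```

If the loss on a split built this way stays flat while the training loss falls, that points to genuine overfitting rather than to leakage between splits.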


vivjay30 commented on August 29, 2024

In synthetic_dataset.py, in the get_mixture_and_gt() function (around line 161), the line

```python
shifted_gt, _ = utils.shift_mixture(np.stack(gt_waveforms), target_pos, self.mic_radius, self.sr)
```

should be moved outside of the loop.

Thanks for the fix! Indeed, this should only be calculated once, at the end.
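A minimal sketch of the corrected control flow follows. It assumes the loop in get_mixture_and_gt() collects one ground-truth waveform per microphone. `shift_channels` and `collect_and_shift` are hypothetical names. `shift_channels` stands in for utils.shift_mixture and applies a simple circular integer-sample shift with np.roll; the real function derives its delays from target_pos, mic_radius and sr, which this sketch does not reproduce.

```python
import numpy as np


def shift_channels(multichannel, delays):
    """Hypothetical stand-in for utils.shift_mixture: advance channel i by
    delays[i] samples (circularly) so the target source lines up in time."""
    return np.stack([np.roll(ch, -d) for ch, d in zip(multichannel, delays)])


def collect_and_shift(per_mic_signals, delays):
    """Sketch of the fixed structure: gather every channel, then shift once."""
    gt_waveforms = []
    for sig in per_mic_signals:  # assumed: one iteration per microphone
        gt_waveforms.append(np.asarray(sig, dtype=float))
        # Shifting here would stack only the channels collected so far and
        # recompute (and overwrite) the result on every iteration.
    # Shift once, after all channels are present.
    return shift_channels(np.stack(gt_waveforms), delays)


# Usage: mic i hears the same source d samples late.
base = np.arange(8.0)
per_mic = [np.roll(base, d) for d in (0, 1, 2)]
aligned = collect_and_shift(per_mic, delays=[0, 1, 2])
```

After hoisting, every row of `aligned` equals `base`. The shift now runs exactly once, on the complete multichannel array.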


vivjay30 commented on August 29, 2024

Regarding the overfitting problem: if you are training with purely synthetically rendered data, you can generate arbitrarily many samples. Consider adding more types of noise, or even using other speech corpora such as LibriSpeech. Real data, on the other hand, is time-consuming to collect, so it is very easy to overfit when training on it. However, it is important to train with real data, because a network trained on purely synthetic data will not generalize to real data.
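As an illustration of adding variety to synthetic data, the sketch below mixes a randomly chosen speech clip with a randomly chosen noise clip at a random signal-to-noise ratio. It is single-channel only and does not perform the spatial rendering that synthetic_dataset.py does. The function names, the 0 to 20 dB SNR range and the clip pools are assumptions, not part of the project.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that 10*log10(P_speech / P_noise) == snr_db, then add it.
    Returns (mixture, scaled_noise)."""
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)
    # Loop short noise clips so they cover the whole speech clip.
    if len(noise) < len(speech):
        noise = np.tile(noise, int(np.ceil(len(speech) / len(noise))))
    noise = noise[: len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    if p_noise == 0:
        raise ValueError("noise clip is silent")
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    scaled_noise = scale * noise
    return speech + scaled_noise, scaled_noise


def random_training_mixture(speech_pool, noise_pool, rng, snr_range=(0.0, 20.0)):
    """Draw one (mixture, clean target, SNR in dB) example from hypothetical pools."""
    speech = np.asarray(speech_pool[rng.integers(len(speech_pool))], dtype=float)
    noise = noise_pool[rng.integers(len(noise_pool))]
    snr_db = rng.uniform(*snr_range)
    mixture, _ = mix_at_snr(speech, noise, snr_db)
    return mixture, speech, snr_db
```

Enlarging `speech_pool` (for example with clips from another corpus) or `noise_pool` increases the number of distinct training examples without collecting any new real recordings.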
