Comments (24)
Hi, no problem. This error usually occurs when the duration predictor predicts zero durations; this mostly happens in one of two cases: 1) weird input, e.g. only a single whitespace or similar, or 2) an untrained model (0k steps). With standard settings the repo stores the latest weights under checkpoints/ljspeech_tts.forward/latest_weights.pyt (and latest_optim.pyt), and in the same directory it also saves checkpoints every 10K steps. If you want to resume training from a certain checkpoint, simply copy the respective weights and optimizer files to latest_weights.pyt and latest_optim.pyt. Good luck!
from forwardtacotron.
I am having the same problem, but during the train_forward step.
(forwardtacoenv) ubuntu@prajwal:~/projects/ForwardTacotron$ CUDA_VISIBLE_DEVICES=1,2,3,4 python train_forward.py
/home/ubuntu/.virtualenvs/forwardtacoenv/lib/python3.6/site-packages/librosa/util/decorators.py:9: NumbaDeprecationWarning: An import was requested from a module that has moved location.
Import of 'jit' requested from: 'numba.decorators', please update to use 'numba.core.decorators' or pin to Numba version 0.48.0. This alias will not be present in Numba version 0.50.0.
from numba.decorators import jit as optional_jit
Using device: cuda
Initialising Forward TTS Model...
num params 22759281
Restoring from latest checkpoint...
Loading latest weights: /home/ubuntu/projects/ForwardTacotron/checkpoints/ljspeech_tts.forward/latest_weights.pyt
Loading latest optimizer state: /home/ubuntu/projects/ForwardTacotron/checkpoints/ljspeech_tts.forward/latest_optim.pyt
Loaded model at step: 0
+-----------+------------+---------------+
| Steps | Batch Size | Learning Rate |
+-----------+------------+---------------+
| 10k Steps | 32 | 0.0001 |
+-----------+------------+---------------+
| Epoch: 1/14 (761/762) | Mel Loss: 2.563 | Dur Loss: 5.416 | 2.8 steps/s | Step: 0k |
Traceback (most recent call last):
File "train_forward.py", line 97, in <module>
trainer.train(model, optimizer)
File "/home/ubuntu/projects/ForwardTacotron/trainer/forward_trainer.py", line 38, in train
self.train_session(model, optimizer, session)
File "/home/ubuntu/projects/ForwardTacotron/trainer/forward_trainer.py", line 64, in train_session
m1_hat, m2_hat, dur_hat = model(x, m, dur)
File "/home/ubuntu/.virtualenvs/forwardtacoenv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/ubuntu/projects/ForwardTacotron/models/forward_tacotron.py", line 137, in forward
x, _ = self.lstm(x)
File "/home/ubuntu/.virtualenvs/forwardtacoenv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/ubuntu/.virtualenvs/forwardtacoenv/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 577, in forward
self.dropout, self.training, self.bidirectional, self.batch_first)
RuntimeError: stack expects a non-empty TensorList
Unfortunately rerunning throws the same error at the same step. How do I pinpoint what the problem is?
Hi, did you check whether your Tacotron was trained correctly before extracting durations, i.e. shows a diagonal attention alignment? If the problem occurs in training, I am assuming that the extracted durations are off.
Yes, it could not get attention. Would removing leading and trailing silences help? I'm training on my own 50-hour Hindi dataset. Here are some of the details. I have tried to make it as close as possible to LJSpeech.
Hi, trimming the silences will definitely help. Is the above plot teacher-forced (GTA)? Only the GTA alignment is important. It may also help to play around with the reduction schedule.
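Silence trimming is usually done with something like librosa.effects.trim (its top_db threshold controls how aggressive the cut is). As a minimal, dependency-free sketch of the same idea, here is a simple amplitude-threshold trim; the threshold value is an assumption and would need tuning per dataset:

```python
import numpy as np

def trim_silence(wav: np.ndarray, threshold: float = 0.01) -> np.ndarray:
    """Trim leading and trailing samples whose absolute amplitude is below threshold.

    The default threshold is a made-up starting point, not a value from the repo.
    """
    voiced = np.where(np.abs(wav) > threshold)[0]
    if len(voiced) == 0:
        return wav  # all silence; leave untouched rather than returning empty audio
    return wav[voiced[0]:voiced[-1] + 1]
```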
Also, is the phonemizer doing a good job at Hindi?
The plot above is the generated attention.
Here is the GTA attention.
I haven't actually checked the phonemizer as I'm not very fluent in Hindi myself. Where can I find these?
I just made the changes to
language = 'en-us'
tts_cleaner_name = 'english_cleaners'
to
language = 'hi'
tts_cleaner_name = 'basic_cleaners'
in hparams.py as provided in espeak-ng/docs/languages.md
Hi, that actually looks pretty OK to me. Could you check the numpy files in the /data/alg folder? I suspect there might be a fishy broken file with zero durations or similar - i.e. you could iterate through the numpy files and check if sum(durations) is very low. Does the training break at the same spot every time? I could imagine it's breaking for a specific file id.
EDIT: I can see in the plot that the attention is all over the place at the first steps, which could mess with the duration extraction (there is only a simple correction for some jumps). I assume that cutting the leading silence would remove this.
It looks like there are files which have a lot of zeros, but there are no files with a low sum(duration).
[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 4 4 6 4 6 37 0 0 0 0
568]
[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 685 0]
[13 4 5 8 7 4 3 4 3 2 5 3 5 2 5 4 2 2 5 6 6 4 6 7
4 4 7 40 2 6 5 6 6 6 9 5 4 5 6 4 3 7 6 4 2 3 3 3
3 5 8 7 4 3 5 2 2 5 8 6 4 4 4 3 3 3 5 7 7 4 4 5
6 6 3 5 3 3 4 3 9 4 5 3 4 5 5 5 4 5 2 3 7 82]
[ 3 7 8 4 3 5 7 8 4 5 4 6 7 4 4 4 4 8 4 3 4 6 3 5
5 5 4 3 2 6 3 3 10 6 3 6 6 7 3 4 8 5 5 5 2 0 5 4
3 4 6 6 8 4 7 13 3 7 7 7 3 4 5 5 4 3 5 4 7 6 4 5
3 3 4 6 5 6 3 3 4 4 9 62]
[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 415 0]
[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 473 0]
[16 1 3 5 7 2 3 3 5 5 5 7 6 5 5 4 5 5 7 5 6 11 7 7
7 5 4 3 6 6 6 3 5 5 4 12 23 9 9 6 6 9 4 6 4 3 2 3
5 6 6 8 7 3 4 6 3 5 5 4 5 7 8 5 5 3 7 4 6 8 5 8
7 66 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[52 4 3 4 5 6 3 3 8 11 7 6 6 9 5 3 4 4 4 5 7 5 5 5
4 4 2 4 3 3 5 4 7 3 7 7 6 5 7 8 4 6 6 4 3 4 4 4
6 5 5 4 4 6 86 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0]
[ 31 2 8 7 9 8 5 11 0 13 7 9 10 8 4 7 7 5
4 3 6 35 0 11 8 10 9 15 6 9 14 12 11 1 7 7
8 6 5 4 4 5 6 5 3 5 5 2 4 4 5 5 7 4
6 10 6 8 4 2 4 8 4 6 5 8 5 5 5 6 7 6
3 3 4 5 4 7 39 0 9 6 5 4 6 6 4 5 5 3
4 7 6 7 6 137 0 0]
[17 3 6 5 5 4 5 2 7 4 6 3 5 3 7 6 5 3 5 5 6 4 24 5
3 3 4 3 3 3 4 3 4 4 5 6 7 5 4 7 4 3 2 6 4 5 2 4
6 33 2 8 9 7 7 9 8 6 4 6 5 6 14 13 8 7 4 3 3 4 4 4
6 3 5 4 2 4 3 2 6 5 7 5 3 3 3 3 5 82]
I also tested the phonemizer on a few sentences with someone who is familiar with Hindi and it seems to be doing really well!
The train_forward.py breaks at Epoch: 1/14 (761/762) | Mel Loss: 2.563 | Dur Loss: 5.416 | 2.8 steps/s | Step: 0k | every time
Hi, yes these are messed-up durations and will definitely cause trouble. You could either remove the files whose first entries are zero or redo the duration extraction with leading silences removed. Let me know if it solves the problem!
Thanks @cschaefer26 ! I am doing that now. Will keep you updated.
Is there a way I can run it on multiple GPUs? I have 8 Tesla V100s. I looked into #9.
Also, even on a single GPU it seems to utilize only around 30-40% capacity. I guess I can change the batch size; I hope this does not affect training. Any pointers?
Yeah, increasing the batch size could help. I haven't looked into multi-GPU training yet as I don't have the hardware to benefit from it; imo it should be pretty straightforward to implement.
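The simplest starting point for the multi-GPU experiment would probably be torch.nn.DataParallel, which splits each batch across the visible GPUs. This is only a generic sketch with a stand-in model (the repo's training loop is not written this way, and DataParallel may need care with the model's multiple return values):

```python
import torch
from torch import nn

# Stand-in for the actual model; in the repo this would be ForwardTacotron.
model = nn.Linear(80, 80)

# DataParallel scatters each input batch over all visible GPUs and gathers
# the outputs on the default device; on a CPU/single-GPU box we skip it.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
if torch.cuda.is_available():
    model = model.cuda()

batch = torch.randn(32, 80)
if torch.cuda.is_available():
    batch = batch.cuda()
out = model(batch)
```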
These are the attentions I got after trimming the start and end silences.
The durations are predicted well in this case!
[ 3 5 7 6 5 4 6 4 6 5 7 6 4 8 20 22 5 5 3 4 3 4 5 5
7 5 6 7 7 4 6 14 5 4 5 4 5 4 5 5 5 6 9 8 8 22 8 10
6 5 7 5 5 6 4 6 5 4 6 7 28 9 3 6 6 4 4 4 9 7 4 4
10 5 6 3 3 4 3 5 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0]
517
[0 3 6 7 3 2 4 5 4 7 5 5 3 9 8 4 4 3 4 5 3 4 4 5 4 4 3 5 5 4 3 2 3 6 3 7 5
5 3 3 4 3 5 3 5 5 4 3 4 6 7 5 6 4 3 7 9 4 6 6 8 4 5 6 3 3 4 6 6 5 2 0 0 0
0 0 0 0 0 0]
323
[ 1 1 6 8 4 3 3 3 2 5 3 6 3 4 3 3 3 4 5 6 3 6 8 4
5 6 16 29 4 4 6 7 6 8 5 4 5 6 5 3 6 6 3 4 3 2 4 3
4 8 7 5 4 3 2 3 4 8 6 4 4 5 2 4 3 5 5 9 4 4 5 5
5 4 5 3 3 5 2 9 5 5 3 3 4 6 6 4 3 1 0 0 0 0]
436
[ 2 7 4 3 5 6 8 5 4 5 6 6 6 3 3 5 8 4 3 4 6 3 5 4
5 2 3 4 6 4 3 9 8 2 6 5 6 4 4 9 4 5 5 3 0 5 3 4
4 5 7 7 5 6 13 4 7 6 7 4 5 3 6 3 3 5 5 5 6 5 5 3
3 4 6 6 5 4 2 4 4 0 0 0]
391
[ 2 4 6 8 4 6 3 7 8 7 6 6 6 7 9 12 13 7 5 7 7 11 9 8
7 8 5 6 7 6 10 24 6 8 4 4 6 5 8 6 6 6 2 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
302
[ 0 1 6 4 7 10 4 5 6 6 10 11 9 4 5 10 22 17
4 4 6 4 4 6 10 6 5 6 3 11 5 6 9 3 1 4
3 4 6 5 5 5 4 177 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0]
443
[ 0 2 4 7 3 2 3 5 6 4 7 5 6 3 5 5 6 7 5 6 10 8 6 8
5 4 3 6 5 7 3 5 4 18 12 12 8 7 7 6 9 4 5 4 4 2 3 6
5 6 9 5 4 4 5 4 3 7 4 5 8 7 5 4 4 7 3 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0]
371
[ 0 1 4 5 7 4 1 10 9 8 6 5 10 5 2 4 3 4 6 7 6 4 5 5
4 2 3 3 3 5 4 6 5 6 6 7 5 7 7 6 6 5 4 3 4 4 5 5
5 5 4 4 5 5 0 28 49 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0]
341
[ 0 1 6 10 8 5 13 27 9 6 8 9 10 4 6 7 6 3 4 8 26 11 7 8
8 11 12 6 12 12 13 9 3 8 7 6 6 6 4 5 5 5 6 3 4 5 3 3
5 6 4 6 4 6 9 7 7 5 3 5 6 4 6 5 9 5 3 6 7 5 6 4
3 5 4 6 5 30 10 8 7 4 6 7 4 4 5 5 3 4 6 7 5 7 15 2
0 31 59 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
749
[ 4 3 5 4 5 4 3 6 5 5 4 5 4 6 6 6 2 4 6 17 8 9 6 3
3 2 4 3 3 4 3 5 3 4 7 6 5 6 5 4 5 1 5 4 5 3 4 4
12 25 8 8 8 6 10 8 5 5 6 6 5 13 14 9 7 3 3 2 5 4 4 6
4 5 4 2 3 4 1 6 5 6 5 4 3 3 3 2 0 0 0 0 0 0]
470
Looks better. Does training work now with the forward tacotron? In the sample, though, it looks as if there is still some trailing silence left, which probably produces the large duration entries at the end.
@cschaefer26 Thank you! The training is working. I did remove the trailing silences. I should maybe rerun silence trimming with an even lower threshold. Will keep you updated! For around how many iterations do I train it? The loss seems to be stagnating.
Hi, 500K is plenty. You could take some of the saved models and generate audio files (maybe together with a vocoder), then manually select the best model. It's hard to tell from the losses whether the model is overfitting (the validation loss is kinda meaningless; that's been discussed in one of the other issues). Hope the audio quality is fine - otherwise it probably makes sense to start looking into manipulating the preprocessing and/or vocoder.
Thank you for your feedback @cschaefer26. I ran inference with the model and the MelGAN vocoder. Synthesis is great and natural-sounding, but I seem to be getting kind of a robotic voice. I'm not sure whether it's because MelGAN is undertrained or ForwardTacotron is overfitted. Do you have any pointers regarding this?
Hi, as for robotic - do you mean the prosody or the voice quality (e.g. a metallic sound)? Do you have an example? An undertrained MelGAN usually has some hissing metallic sound that slowly declines after a week of training.
Here is an example of the audio. I have been training for about 4 days now and it has reached around 1275k steps.
sample_synth.zip
Sounds not bad imo, although I do not understand a single word :-). Is that a single sentence? Generally I feel that the model performs best if applied sentence by sentence. I would guess you mean the prosody is a bit robotic? You could try comparing some earlier model versions (e.g. after 200k, 500k steps) to see if it's different - the prosody is mainly given by the duration predictions. Other than that, it could be worth a try to mess around with the duration predictor architecture (there is some discussion in #7). Generally I feel that the forward architecture removes a lot of pauses (i.e. after a comma), which makes the prosody a bit robotic.
Yes, it was a single sentence. Training MelGAN for longer did the trick! Thanks for pointing out #7; I'm lacking hardware resources right now, but I will try to implement it once I get them.
Ah cool. I highly recommend tweaking the MelGAN with larger receptive fields, i.e. more layers for the resnets. I got a pretty good quality boost using 4-7 layers (successively increasing the layer count for each resnet).
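The idea of growing the receptive field via more residual layers can be illustrated with a generic dilated residual stack (this is a hypothetical sketch, not the actual MelGAN repo's module; the dilation schedule of 3**i mirrors the common MelGAN pattern, and `num_layers` is the knob being discussed):

```python
import torch
from torch import nn

class ResStack(nn.Module):
    """Stack of dilated 1-D conv residual blocks.

    Each extra layer triples the dilation, so more layers means a much
    larger receptive field over the waveform.
    """
    def __init__(self, channels: int, num_layers: int = 4):
        super().__init__()
        self.blocks = nn.ModuleList([
            nn.Sequential(
                nn.LeakyReLU(0.2),
                # padding == dilation keeps the sequence length unchanged
                nn.Conv1d(channels, channels, kernel_size=3,
                          dilation=3 ** i, padding=3 ** i),
                nn.LeakyReLU(0.2),
                nn.Conv1d(channels, channels, kernel_size=1),
            )
            for i in range(num_layers)
        ])

    def forward(self, x):
        for block in self.blocks:
            x = x + block(x)  # residual connection around each dilated block
        return x
```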