
parallel-tacotron2's Issues

Why doesn't the LConv block have a stride argument?

Hi, thanks for the implementation.

I think Parallel Tacotron 2 uses the same residual encoder as Parallel Tacotron 1.
In Parallel Tacotron, that encoder uses five 17 × 1 LConv blocks interleaved with strided 3 × 1 convolutions:


But in your implementation, the LConv block doesn't have a stride argument.
How did you handle this part?
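For reference, the strided convolutions are what downsample the sequence between the LConv blocks. A minimal sketch of the output-length arithmetic (pure Python, not the repo's code; the stride-2 assumption and the number of strided convs are illustrative):

```python
def conv_out_len(length, kernel=3, stride=2, pad=1):
    """Standard 1-D convolution output-length formula."""
    return (length + 2 * pad - kernel) // stride + 1

# Five LConv blocks (stride 1, length-preserving with 'same' padding)
# interleaved with strided 3x1 convs shrink a 100-frame input like so,
# assuming (hypothetically) 4 strided convs between the 5 blocks:
length = 100
for _ in range(4):
    length = conv_out_len(length)
print(length)
```

Without a stride argument on the LConv block itself, the downsampling would have to happen in separate strided-conv layers between the blocks.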

Thanks.

Training issue

Thanks for sharing the nice model implementation.


When I start training, the following warning appears. Do you also get the same message? I think it's a fairseq installation problem:

No module named 'lightconv_cuda'

Also, I can only train with a batch size of 5 on an RTX 3090 with 24 GB of memory. Could the above problem be the cause?
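The warning itself is usually harmless: lightconv_cuda is an optional compiled extension of fairseq, and when it is missing fairseq falls back to a slower pure-PyTorch path. A sketch of that import pattern (illustrative, not fairseq's exact code):

```python
try:
    import lightconv_cuda  # optional compiled CUDA kernel shipped with fairseq
    HAS_CUDA_KERNEL = True
except ImportError:
    # Fall back to the pure-PyTorch lightweight-conv implementation,
    # which is slower and can use noticeably more GPU memory.
    HAS_CUDA_KERNEL = False

print(HAS_CUDA_KERNEL)
```

The fallback path's extra memory use may well contribute to only fitting batch size 5.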

Weights required

Can someone share a link to the weights file? I couldn't get synthesis or inference to work. If I'm doing something wrong, please tell me the correct way to use the model. Thanks.

LightWeightConv layer warnings during training

If I just install the specified requirements plus Pillow and fairseq, the following warning appears when training starts:

No module named 'lightconv_cuda'

If I install the lightconv layer from fairseq, the following warning is displayed:

WARNING: Unsupported filter length passed - skipping forward pass

PyTorch 1.7
CUDA 10.2
fairseq 1.0.0a0+19793a7
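If you want the compiled kernel instead of the fallback, the fairseq repository documents building it from a source checkout (commands per the fairseq README; the in-tree path may differ across versions):

```shell
# Build fairseq's lightconv CUDA kernel from source (per the fairseq README)
git clone https://github.com/pytorch/fairseq
cd fairseq/fairseq/modules/lightconv_layer
python cuda_function_gen.py
python setup.py install
```

Even then, the compiled kernel appears to support only a fixed set of filter lengths, which would explain why a 17 × 1 filter triggers "Unsupported filter length" and falls back to the slow path.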

Soft DTW

Hello,
Has anybody been able to train with the Soft-DTW loss? It doesn't converge at all. I think there is a problem with the implementation, but I couldn't spot it. When I train with the real alignments, it works well.
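For debugging, it can help to compare the repo's loss against a tiny reference on toy inputs. A minimal pure-Python Soft-DTW (squared-error cost, soft-min recursion as in Cuturi and Blondel's paper; illustrative only, no batching or gradients):

```python
import math

def softmin(values, gamma):
    """Smoothed minimum: -gamma * log(sum(exp(-v / gamma)))."""
    m = min(values)
    return m - gamma * math.log(sum(math.exp(-(v - m) / gamma) for v in values))

def soft_dtw(x, y, gamma=0.1):
    """Soft-DTW distance between two 1-D sequences."""
    n, m = len(x), len(y)
    INF = float("inf")
    # R[i][j] holds the soft-minimal accumulated cost ending at (i, j).
    R = [[INF] * (m + 1) for _ in range(n + 1)]
    R[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            R[i][j] = cost + softmin(
                [R[i - 1][j], R[i][j - 1], R[i - 1][j - 1]], gamma
            )
    return R[n][m]
```

With gamma near 0 this approaches classic DTW, so identical sequences should score near zero; if the repo's loss disagrees wildly with such a reference on toy inputs, the implementation rather than the optimization is suspect.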

About FVAE

I think your code does not include a network for predicting the latent representation during inference.

Soft DTW with Cython implementation

Hi @keonlee9420, have you tried the Cython version of Soft-DTW from this repo?

https://github.com/mblondel/soft-dtw

Is it applicable to Parallel Tacotron 2? I am trying that repo because the batch size has to be very small when using the CUDA implementation by @Maghoumi.


I also wonder about this: in https://github.com/Maghoumi/pytorch-softdtw-cuda, @Maghoumi reports experiments with much larger batch sizes. But when applying it to Parallel Tacotron 2, the batch size has to be very small. Is there a gap?
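One common workaround when memory forces a tiny per-step batch is gradient accumulation: split the nominal batch into sub-batches, call backward on each, and step the optimizer once. A framework-agnostic sketch of the splitting (hypothetical helper, not from the repo):

```python
def sub_batches(items, max_batch):
    """Yield consecutive chunks of at most max_batch items."""
    for i in range(0, len(items), max_batch):
        yield items[i : i + max_batch]

# e.g. an effective batch of 16 processed as 4 sub-batches of 4,
# accumulating gradients per chunk and stepping once at the end
chunks = list(sub_batches(list(range(16)), 4))
print(len(chunks))
```

This trades wall-clock time for the effective batch size, which matters for losses like Soft-DTW whose memory grows with sequence length squared.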

Training problem

  File "/data1/hjh/pycharm_projects/tts/parallel-tacotron2_try/model/parallel_tacotron2.py", line 68, in forward
    self.learned_upsampling(durations, V, src_lens, src_masks, max_src_len)
  File "/home/huangjiahong.dracu/miniconda2/envs/parallel_tc2/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data1/hjh/pycharm_projects/tts/parallel-tacotron2_try/model/modules.py", line 335, in forward
    mel_mask = get_mask_from_lengths(mel_len, max_mel_len)
  File "/data1/hjh/pycharm_projects/tts/parallel-tacotron2_try/utils/tools.py", line 87, in get_mask_from_lengths
    ids = torch.arange(0, max_len).unsqueeze(0).expand(batch_size, -1).to(device)
RuntimeError: upper bound and larger bound inconsistent with step sign

Thank you for your work. I got the above error when training. I guess it's a duration prediction problem. How can I solve it?
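That error comes from torch.arange(0, max_len) receiving a non-positive max_len, which typically means the predicted durations summed to zero or a negative value early in training. A minimal sketch of guarding the length computation (hypothetical helper, not the repo's code):

```python
def safe_mel_len(durations, min_len=1):
    """Clamp predicted durations to be non-negative and the total length positive."""
    total = sum(max(0.0, float(d)) for d in durations)
    return max(min_len, int(round(total)))

# Degenerate early-training predictions still yield a valid length:
print(safe_mel_len([-1.0, -2.0]))
print(safe_mel_len([1.2, 2.4]))
```

The equivalent in the model would be clamping the duration predictor's output (and the summed mel length) before building masks.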

Why no alignment at all?

I cloned the code, prepared the data according to the README, and only changed two things:

  1. the LJSpeech data path in config/LJSpeech/train.yaml
  2. unzipped generator_LJSpeech.pth.tar.zip to get generator_LJSpeech.pth.tar

With that, the code runs. But no matter how many steps I train, the plots always look like this, and the demo audio sounds like noise:

(screenshot of the training plots from 2022-08-25 omitted)

The code does not seem to run

I followed your commands to run the code, but I get the following error:

  File "train.py", line 87, in main
    output = model(*(batch[2:]))
  File "/home/ydc/anaconda3/envs/CD/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ydc/anaconda3/envs/CD/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 162, in forward
    return self.gather(outputs, self.output_device)
  File "/home/ydc/anaconda3/envs/CD/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 174, in gather
    return gather(outputs, output_device, dim=self.dim)
  File "/home/ydc/anaconda3/envs/CD/lib/python3.8/site-packages/torch/nn/parallel/scatter_gather.py", line 68, in gather
    res = gather_map(outputs)
  File "/home/ydc/anaconda3/envs/CD/lib/python3.8/site-packages/torch/nn/parallel/scatter_gather.py", line 63, in gather_map
    return type(out)(map(gather_map, zip(*outputs)))
  File "/home/ydc/anaconda3/envs/CD/lib/python3.8/site-packages/torch/nn/parallel/scatter_gather.py", line 63, in gather_map
    return type(out)(map(gather_map, zip(*outputs)))
  File "/home/ydc/anaconda3/envs/CD/lib/python3.8/site-packages/torch/nn/parallel/scatter_gather.py", line 55, in gather_map
    return Gather.apply(target_device, dim, *outputs)
  File "/home/ydc/anaconda3/envs/CD/lib/python3.8/site-packages/torch/nn/parallel/_functions.py", line 71, in forward
    return comm.gather(inputs, ctx.dim, ctx.target_device)
  File "/home/ydc/anaconda3/envs/CD/lib/python3.8/site-packages/torch/nn/parallel/comm.py", line 230, in gather
    return torch._C._gather(tensors, dim, destination)
RuntimeError: Input tensor at index 1 has invalid shape [1, 474, 80], but expected [1, 302, 80]
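The gather step of nn.DataParallel requires every GPU replica to return tensors of the same shape, but here each replica predicts a different mel length (474 vs. 302 frames). The simplest fix is to run on a single GPU; alternatively, per-replica outputs can be padded to a common length before gathering. A framework-agnostic sketch of that padding (hypothetical helper):

```python
def pad_batch(seqs, pad_value=0.0):
    """Right-pad variable-length sequences to the length of the longest one."""
    max_len = max(len(s) for s in seqs)
    return [list(s) + [pad_value] * (max_len - len(s)) for s in seqs]

padded = pad_batch([[1.0, 2.0], [3.0]])
print(padded)
```

In the real model the same idea applies along the mel time axis, with the true lengths kept in a mask so padding does not affect the loss.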

Handle audios with long duration

When I load audios whose mel-spectrograms have more frames than the maximum mel length (1000 frames):

  • There is a problem when concatenating pos + speaker + mels, so I tried setting max_seq_len larger (1500),
  • but that leads to a problem with Soft-DTW, whose maximum supported length is 1024.

As a workaround, I tried trimming the mels to fit in 1024 frames, but it seemed complicated, so for now I filter out all audios with more than 1024 frames.

Any suggestions for handling long audios? I also wonder how this works at inference time.
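If filtering feels too aggressive, another option is to split long utterances into chunks that fit the limit at preprocessing time. A sketch of the frame-range computation (hypothetical; in practice chunk boundaries should ideally fall on silences):

```python
MAX_FRAMES = 1024  # assumed Soft-DTW length limit from the error above

def split_frames(n_frames, max_len=MAX_FRAMES):
    """Return (start, end) frame ranges covering n_frames in chunks of max_len."""
    return [(i, min(i + max_len, n_frames)) for i in range(0, n_frames, max_len)]

print(split_frames(2500))
```

At inference there is no such limit from the loss itself, since Soft-DTW is only used for training.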

cannot import name II from omegaconf

Great work! But I encountered a problem when training this model :(
The error message:

ImportError: cannot import name II from omegaconf

The fairseq version is 0.10.2 (the latest release) and omegaconf is 1.4.1. How can I fix it?
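For what it's worth, omegaconf's II interpolation helper only exists in omegaconf 2.x, and 1.4.1 predates it, so upgrading into the range that fairseq 0.10.2 pins should resolve the import (the exact bounds below are my reading of fairseq 0.10.2's requirements; double-check its setup.py):

```shell
# Upgrade omegaconf into the 2.0.x range that provides II
pip install "omegaconf>=2.0,<2.1" "hydra-core<1.1"
```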

Thank you
