keonlee9420 / parallel-tacotron2
PyTorch Implementation of Google's Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling
License: MIT License
Parallel-Tacotron2/model/modules.py
Line 208 in a589311
If I install just the specified requirements plus Pillow and fairseq, the following warning appears when training starts:
No module named 'lightconv_cuda'
If I install the lightconv layer from fairseq, the following warning is displayed:
WARNING: Unsupported filter length passed - skipping forward pass
PyTorch 1.7
CUDA 10.2
fairseq 1.0.0a0+19793a7
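For what it's worth, `lightconv_cuda` is an optional fairseq CUDA extension that is not built by a plain `pip install`; it has to be compiled from a source checkout. A sketch of the usual procedure (following fairseq's LightConv example, assuming a CUDA toolkit that matches your PyTorch build):

```shell
# Build the lightconv CUDA kernel from a fairseq source checkout.
git clone https://github.com/pytorch/fairseq
cd fairseq
pip install --editable .
# Generate and compile the lightconv_cuda extension.
cd fairseq/modules/lightconv_layer
python cuda_function_gen.py
python setup.py install
```

If the kernel still reports "Unsupported filter length", the configured convolution kernel sizes may fall outside the lengths the generated CUDA code supports, in which case fairseq falls back to (or skips) the non-CUDA path.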
Hi. The work is amazing. I noticed that you mentioned some bugs in soft-DTW under "Updates". Have you solved these problems yet?
File "/data1/hjh/pycharm_projects/tts/parallel-tacotron2_try/model/parallel_tacotron2.py", line 68, in forward
self.learned_upsampling(durations, V, src_lens, src_masks, max_src_len)
File "/home/huangjiahong.dracu/miniconda2/envs/parallel_tc2/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/data1/hjh/pycharm_projects/tts/parallel-tacotron2_try/model/modules.py", line 335, in forward
mel_mask = get_mask_from_lengths(mel_len, max_mel_len)
File "/data1/hjh/pycharm_projects/tts/parallel-tacotron2_try/utils/tools.py", line 87, in get_mask_from_lengths
ids = torch.arange(0, max_len).unsqueeze(0).expand(batch_size, -1).to(device)
RuntimeError: upper bound and larger bound inconsistent with step sign
Thank you for your work. I got the above problem when training. I guess it's a duration prediction problem. How can I solve it?
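That `torch.arange` error fires when the computed `max_mel_len` is negative, which can happen early in training if the predicted per-token durations go negative. A minimal sketch of one possible guard (the function name and shapes here are hypothetical, not the repo's actual API):

```python
import torch

def safe_mel_lengths(durations, min_len=1):
    """Hypothetical guard: clamp predicted per-token durations [B, T] to be
    non-negative before summing them into mel lengths, so that
    torch.arange(0, max_len) never sees a negative upper bound."""
    durations = durations.clamp(min=0)
    mel_len = durations.sum(dim=1).round().long().clamp(min=min_len)
    return mel_len
```

With a guard like this, a batch whose durations are all negative still yields a length of at least 1 instead of crashing the mask construction.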
Hi @keonlee9420, have you tried the Cython version of soft-DTW from this repo?
https://github.com/mblondel/soft-dtw
Can it be applied to Parallel Tacotron 2? I am trying that repo because the usable batch size is too small with the CUDA implementation of @Maghoumi.
I just wonder: @Maghoumi reports experiments at large batch sizes in https://github.com/Maghoumi/pytorch-softdtw-cuda,
but when applying it to Parallel Tacotron the batch size ends up too small. Is there a gap?
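For reference, the recurrence both of those repos implement can be sketched in plain NumPy (a minimal, unbatched version with smoothing parameter `gamma`; the Cython/CUDA versions exist precisely because this double loop is slow):

```python
import numpy as np

def softmin(vals, gamma):
    """Smoothed minimum: -gamma * log(sum(exp(-v / gamma))), computed stably."""
    vals = np.asarray(vals) / -gamma
    m = vals.max()
    return -gamma * (m + np.log(np.exp(vals - m).sum()))

def soft_dtw(D, gamma=1.0):
    """Soft-DTW divergence of a pairwise distance matrix D [n, m].
    As gamma -> 0 this approaches the classic DTW alignment cost."""
    n, m = D.shape
    R = np.full((n + 1, m + 1), np.inf)
    R[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            R[i, j] = D[i - 1, j - 1] + softmin(
                [R[i - 1, j], R[i, j - 1], R[i - 1, j - 1]], gamma)
    return R[n, m]
```

Note that soft-DTW can go negative (the soft minimum over-counts paths), which is expected behavior and not a bug in itself.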
I was installing your repo (to see whether I can make it converge) on a Google Cloud T4.
When I load audios with more mel-spectrogram frames than the max mel sequence length (1000 frames):
As a workaround, I tried trimming mels to fit 1024 frames, but it seemed complicated, so for now I filter out all audios with more than 1024 frames.
Any suggestions for handling long audios? I also wonder how this works at inference time.
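The filtering workaround described above can be done once at preprocessing time; a small sketch (the metadata layout here is an assumption, not the repo's actual format):

```python
def filter_metadata(entries, max_frames=1024):
    """Keep only utterances whose mel-spectrogram fits the model's maximum
    sequence length. entries: list of (utt_id, num_mel_frames) pairs."""
    return [(uid, n) for uid, n in entries if n <= max_frames]
```

Filtering at preprocessing keeps the training loader simple; trimming instead would also require trimming the transcript, which is why it tends to get complicated.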
Hi.
Based on the note in the README ("will be added more"),
I would like to suggest adding my open German "Thorsten" dataset.
Thorsten: a single-speaker open German dataset consisting of 22,668 short audio clips of a male speaker, approximately 23 hours in total (LJSpeech file/directory layout).
https://github.com/thorstenMueller/deep-learning-german-tts/
I cloned the code, prepared data according to README, and just updated:
Hi, thanks for your excellent work!
Could you possibly share your audio samples, pretrained models, and loss curves?
Thanks so much for your help!
I followed your commands to run the code, but I get the following error.
File "train.py", line 87, in main
    output = model(*(batch[2:]))
File "/home/ydc/anaconda3/envs/CD/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
File "/home/ydc/anaconda3/envs/CD/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 162, in forward
    return self.gather(outputs, self.output_device)
File "/home/ydc/anaconda3/envs/CD/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 174, in gather
    return gather(outputs, output_device, dim=self.dim)
File "/home/ydc/anaconda3/envs/CD/lib/python3.8/site-packages/torch/nn/parallel/scatter_gather.py", line 68, in gather
    res = gather_map(outputs)
File "/home/ydc/anaconda3/envs/CD/lib/python3.8/site-packages/torch/nn/parallel/scatter_gather.py", line 63, in gather_map
    return type(out)(map(gather_map, zip(*outputs)))
File "/home/ydc/anaconda3/envs/CD/lib/python3.8/site-packages/torch/nn/parallel/scatter_gather.py", line 63, in gather_map
    return type(out)(map(gather_map, zip(*outputs)))
File "/home/ydc/anaconda3/envs/CD/lib/python3.8/site-packages/torch/nn/parallel/scatter_gather.py", line 55, in gather_map
    return Gather.apply(target_device, dim, *outputs)
File "/home/ydc/anaconda3/envs/CD/lib/python3.8/site-packages/torch/nn/parallel/_functions.py", line 71, in forward
    return comm.gather(inputs, ctx.dim, ctx.target_device)
File "/home/ydc/anaconda3/envs/CD/lib/python3.8/site-packages/torch/nn/parallel/comm.py", line 230, in gather
    return torch._C._gather(tensors, dim, destination)
RuntimeError: Input tensor at index 1 has invalid shape [1, 474, 80], but expected [1, 302, 80]
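This gather error is characteristic of `nn.DataParallel`: each GPU replica returns mels of a different time length, and `gather` requires matching shapes across replicas. One possible workaround (a sketch, not the repo's code) is to pad every replica's output to the global maximum mel length before returning it from `forward`:

```python
import torch
import torch.nn.functional as F

def pad_mel_to_max(mel, max_mel_len):
    """Pad mel [B, T, n_mels] along the time axis to max_mel_len so that
    nn.DataParallel can gather outputs of equal shape from all replicas."""
    return F.pad(mel, (0, 0, 0, max_mel_len - mel.size(1)))
```

Alternatively, training on a single GPU (or with DistributedDataParallel, which does not gather outputs) sidesteps the shape constraint entirely.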
Hello,
Has anybody been able to train with the soft-DTW loss? It doesn't converge at all. I think there is a problem with the implementation, but I couldn't spot it. When I train with the real alignments, it works well.
Great work! But I encounter one problem when training this model :(
The error message:
ImportError: cannot import name 'II' from 'omegaconf'
The version of fairseq is 0.10.2 (the latest release) and omegaconf is 1.4.1. How can I fix it?
Thank you
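`II` only exists in omegaconf 2.x, so an omegaconf 1.4.1 install is too old for the fairseq code path being imported. One likely fix (an assumption based on the version mismatch, not a confirmed requirement of this repo) is to upgrade omegaconf together with hydra-core, which fairseq's config system depends on:

```shell
# omegaconf 2.0.x provides II; hydra-core 1.0.x is the matching release line.
pip install --upgrade "omegaconf>=2.0,<2.1" "hydra-core>=1.0,<1.1"
```

If the error persists, installing fairseq from source at the commit pinned in this repo's requirements (1.0.0a0+19793a7, per an earlier comment) should pull in compatible versions.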
Can someone share a link to the weights file? I couldn't synthesize with it or run its inference. If I am doing something wrong, please tell me the correct way to use it. Thanks
Thanks for sharing this nice model implementation.
When I start training, the following warning appears; do you also get the same message?
I think it's a fairseq installation problem.
No module named 'lightconv_cuda'
Also, I can only train with batch size 5 on a 24 GB RTX 3090. Could the above problem be the cause?
Just wondering if we can train on LJSpeech with this implementation. Thanks!
Hi, thanks for the implementation.
I think Parallel Tacotron 2 uses the same residual encoder as Parallel Tacotron 1.
In Parallel Tacotron, that encoder uses five 17×1 LConv blocks interleaved with strided 3×1 convolutions.
But in your implementation, the LConv block doesn't have a stride argument.
How did you handle this part?
Thanks.
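For reference, the structure being asked about can be sketched with plain `Conv1d` stand-ins for the lightweight convolutions (an illustrative assumption, not the repo's actual module): each strided 3×1 convolution halves the time axis, so five blocks downsample time by 32.

```python
import torch
import torch.nn as nn

class ResidualEncoderSketch(nn.Module):
    """Sketch of the Parallel Tacotron residual-encoder layout: five 17x1
    conv blocks (standing in for LConv) interleaved with strided 3x1 convs.
    Hypothetical stand-in, not the paper's exact lightweight convolution."""

    def __init__(self, d=256, n_blocks=5):
        super().__init__()
        layers = []
        for _ in range(n_blocks):
            layers.append(nn.Conv1d(d, d, kernel_size=17, padding=8))  # 17x1, length-preserving
            layers.append(nn.ReLU())
            layers.append(nn.Conv1d(d, d, kernel_size=3, stride=2, padding=1))  # strided 3x1
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        # x: [B, d, T] -> [B, d, ~T/32]
        return self.net(x)
```

If the repo's LConv block has no stride argument, the downsampling step either has to be a separate strided conv between blocks, as above, or it is simply omitted.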
I think your code does not include a network for predicting the latent representation during inference.
Any numbers on inference speed?