Expressive Tacotron (implementation with PyTorch)
Introduction
The expressive Tacotron framework provides several deep learning architectures for building the prosody encoder, including Global Style Tokens (GST), Variational Autoencoder (VAE), Gaussian Mixture Variational Autoencoder (GMVAE), and X-vectors.
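As a rough illustration of the GST idea, the sketch below builds a reference encoder that summarizes a mel spectrogram and attends over a bank of learnable style tokens. The module name, layer sizes, and shapes here are assumptions for illustration, not the exact modules used in this repository.

```python
import torch
import torch.nn as nn

class GSTStyleEncoder(nn.Module):
    """Minimal GST-style prosody encoder sketch (hypothetical sizes)."""
    def __init__(self, mel_dim=80, ref_dim=128, num_tokens=10,
                 token_dim=256, num_heads=4):
        super().__init__()
        # Reference encoder: compresses a mel spectrogram into one vector.
        self.ref_rnn = nn.GRU(mel_dim, ref_dim, batch_first=True)
        # Bank of learnable style tokens.
        self.tokens = nn.Parameter(torch.randn(num_tokens, token_dim))
        self.query_proj = nn.Linear(ref_dim, token_dim)
        # The reference embedding attends over the token bank.
        self.attn = nn.MultiheadAttention(embed_dim=token_dim,
                                          num_heads=num_heads,
                                          batch_first=True)

    def forward(self, mels):                         # mels: (B, T, mel_dim)
        _, h = self.ref_rnn(mels)                    # h: (1, B, ref_dim)
        query = self.query_proj(h.transpose(0, 1))   # (B, 1, token_dim)
        tokens = torch.tanh(self.tokens)             # bound token values
        tokens = tokens.unsqueeze(0).expand(mels.size(0), -1, -1)
        style, _ = self.attn(query, tokens, tokens)  # (B, 1, token_dim)
        return style.squeeze(1)                      # style embedding (B, token_dim)
```

The resulting style embedding is typically broadcast over time and concatenated with (or added to) the text encoder outputs before decoding.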
Available recipes
Expressive Mode
Attention Mode
Differences from Nvidia Tacotron
- More attention modes
- Reduction factor supported (Tacotron1)
- Feeding r-th features for reduction factor in Decoder (Tacotron1)
- Masked loss
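To illustrate the masked loss above, here is a sketch of a mean-squared error that ignores padded mel frames; the function name and the convention that `lengths` holds the number of valid frames per utterance are assumptions, not this repository's exact code.

```python
import torch

def masked_mse_loss(pred, target, lengths):
    """MSE over only the valid (non-padded) frames of each utterance.

    pred, target: (B, T, D) mel batches padded to a common length T
    lengths:      (B,) number of valid frames in each utterance
    """
    B, T, D = pred.shape
    # mask[b, t] is True for valid frames, False for padding.
    mask = torch.arange(T, device=pred.device)[None, :] < lengths[:, None]
    mask = mask.unsqueeze(-1).float()          # (B, T, 1), broadcasts over D
    sq_err = (pred - target) ** 2 * mask       # zero out padded frames
    # Average over valid frames and feature dimensions only.
    return sq_err.sum() / (mask.sum() * D)
```

Without the mask, utterances padded to the batch's maximum length would pull the loss toward predicting silence in the padded region.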
Training
By default, a single Tacotron2 is trained with Forward Attention (r=2). To train in expressive mode, refer to Expressive Tacotron.
- Convert texts to phones, save them to the "phones_path" set in hparams.py, and update the phone dictionary in text.py
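As a sketch of this preprocessing step, the snippet below maps words to phone sequences with a dictionary lookup. The dictionary contents and the fallback behavior here are hypothetical; the real phone set comes from the dictionary configured in text.py.

```python
# Hypothetical word-to-phone dictionary for illustration only.
PHONE_DICT = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}

def text_to_phones(text):
    """Map each word to its phone sequence; unknown words are kept as-is."""
    phones = []
    for word in text.lower().split():
        phones.extend(PHONE_DICT.get(word, [word]))
    return phones
```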
python train.py (single GPU)
python -m multiproc train.py (multiple GPUs)
Inference Demo
python synthesis.py -w checkpoints/checkpoint_200k_steps.pyt -i "hello world" --vocoder gl
The Griffin-Lim vocoder is used by default. For other command line options, please refer to synthesis.py.
Acknowledgements
This implementation uses code from the following repositories: NVIDIA, MozillaTTS, ESPNet, ERISHA, ForwardAttention.