rodrigokrosa / tacotron2-gl-brazillian-portuguese

Repository documenting the results of a Tacotron 2 adaptation for Brazilian Portuguese.

License: MIT License


tacotron2-gl-brazillian-portuguese's Introduction

Tacotron 2 for Brazilian Portuguese Using Griffin-Lim (GL) as a Vocoder and the CommonVoice Dataset

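"Conversão Texto-Fala para o Português Brasileiro Utilizando Tacotron 2 com Vocoder Griffin-Lim" ("Text-to-Speech Conversion for Brazilian Portuguese Using Tacotron 2 with a Griffin-Lim Vocoder"), paper published at SBrT 2021.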

This repository contains pretrained Tacotron 2 models for Brazilian Portuguese, trained with the open-source implementations from Rayhane-Mama and TensorFlowTTS.

Forked Tacotron 2 implementations

To train both models, the original source code was modified to support Brazilian Portuguese. The changes can be seen in the respective forked repositories (the fork of Rayhane-Mama's implementation is the one cloned in the synthesis steps below).

Dataset used

The dataset was derived from the Portuguese portion of Common Voice Corpus 4. Audio and text from the top speaker, with around 6 hours of data, were used and processed with the notebooks in this repository.
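The speaker selection described above can be sketched as follows. This is only an illustration, not the code from the repository's notebooks: it assumes the Common Voice metadata is a tab-separated file with a `client_id` column (as in the `validated.tsv` files shipped with Common Voice), and it ranks speakers by number of clips rather than by total duration. The function names are hypothetical.

```python
import csv
from collections import Counter


def top_speaker_rows(rows):
    """Return the rows belonging to the speaker with the most clips.

    `rows` is an iterable of dicts with at least a `client_id` key,
    such as the rows produced by csv.DictReader on a Common Voice TSV.
    Assumption: "top speaker" means most clips, not longest duration.
    """
    rows = list(rows)
    if not rows:
        return []
    counts = Counter(row["client_id"] for row in rows)
    speaker, _ = counts.most_common(1)[0]
    return [row for row in rows if row["client_id"] == speaker]


def load_top_speaker(tsv_path):
    """Read a Common Voice TSV file and keep only the top speaker's rows."""
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        # QUOTE_NONE keeps quotation marks inside sentences untouched.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return top_speaker_rows(reader)
```

Calling `load_top_speaker("validated.tsv")` would then return the metadata rows (clip path and sentence) for a single speaker, which is the kind of single-speaker subset Tacotron 2 training expects.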

Audio samples

To evaluate the synthesized audio, 200 phonetically balanced sentences from SEARA (1994) were used. The synthesized audio, mel spectrograms, and alignment plots are available at this Google Drive link.

Trained models

The trained models can be found at this Google Drive link.

Evaluation

The spreadsheet containing the evaluation of the 200 audio samples can be found here.

Steps to synthesize sentences using Rayhane-Mama's Tacotron 2 implementation

First, clone the forked repository

git clone https://github.com/kobarion/Tacotron-2

Create conda environment

conda create -y --name tacotron-2 python=3.6.9

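Install the needed system dependencies (these are Debian/Ubuntu package names, so they are installed with apt-get rather than conda)

sudo apt-get install -y libasound-dev portaudio19-dev libportaudio2 libportaudiocpp0 ffmpeg libav-tools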

Install libraries

conda install --force-reinstall -y -q --name tacotron-2 -c conda-forge --file requirements.txt

Enter conda environment

conda activate tacotron-2

The forked version changes the following files. Review them when adapting the code to your own dataset.

Tacotron-2/datasets/preprocessor.py
Tacotron-2/tacotron/synthesize.py
Tacotron-2/tacotron/utils/symbols.py
Tacotron-2/wavenet_vocoder/models/modules.py
Tacotron-2/hparams.py
Tacotron-2/preprocess.py
Tacotron-2/requirements.txt
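To give an idea of the kind of change a file such as `symbols.py` needs for Brazilian Portuguese, here is a minimal sketch of a character inventory extended with accented letters. It is a hypothetical illustration only: the actual symbol set, padding and end-of-sequence markers, and helper names in the fork may differ.

```python
# Hypothetical symbol inventory; the fork's real symbols.py may differ.
_pad = "_"   # assumed padding symbol
_eos = "~"   # assumed end-of-sequence symbol
_letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
_accented_lower = "áàâãéêíóôõúüç"  # accented letters used in Portuguese
_accented = _accented_lower.upper() + _accented_lower
_punctuation = "!'\"(),-.:;? "

symbols = [_pad, _eos] + list(_letters + _accented + _punctuation)
symbol_to_id = {symbol: index for index, symbol in enumerate(symbols)}


def text_to_ids(text):
    """Map text to symbol ids, dropping unknown characters, then append EOS."""
    ids = [
        symbol_to_id[ch]
        for ch in text
        if ch in symbol_to_id and ch not in (_pad, _eos)
    ]
    return ids + [symbol_to_id[_eos]]
```

Without the accented characters in the inventory, a word such as "ação" would lose two of its four letters before reaching the model, which is why the symbol table is one of the first things to check when changing language.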

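Create a new folder containing the trained model files: the checkpoint file plus the .index, .data, and .meta files of tacotron_model.ckpt-70000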

Tacotron2/logs-Tacotron/taco_best/{checkpoint/tacotron_model.ckpt-70000.index/data/meta}
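Before running synthesis, it can help to confirm that the folder really contains every checkpoint file. The sketch below is a hypothetical helper, not part of the fork; it assumes the data file follows the usual TensorFlow 1.x shard naming (`<prefix>.data-XXXXX-of-YYYYY`), so it matches any file starting with `<prefix>.data-`.

```python
from pathlib import Path


def missing_checkpoint_files(model_dir, prefix="tacotron_model.ckpt-70000"):
    """Return the expected checkpoint entries that are missing from model_dir."""
    model_dir = Path(model_dir)
    missing = []
    # The plain-text "checkpoint" file records which checkpoint is the latest.
    if not (model_dir / "checkpoint").is_file():
        missing.append("checkpoint")
    for suffix in (".index", ".meta"):
        if not (model_dir / (prefix + suffix)).is_file():
            missing.append(prefix + suffix)
    # Assumption: data shards are named "<prefix>.data-XXXXX-of-YYYYY".
    if not list(model_dir.glob(prefix + ".data-*")):
        missing.append(prefix + ".data-*")
    return missing
```

An empty list means all four kinds of file are present; otherwise the returned names show what still has to be copied from the downloaded model.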

Create a text file with sentences to be synthesized

sentences.txt
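A small helper like the one below could generate that file. It is a sketch that assumes the synthesis script reads one sentence per line, which is worth confirming against `synthesize.py` in the fork; the function name is hypothetical.

```python
from pathlib import Path


def write_sentences(sentences, path="sentences.txt"):
    """Write non-empty sentences to path, one per line, and return the count."""
    # Collapse internal whitespace so each sentence stays on a single line.
    cleaned = [" ".join(sentence.split()) for sentence in sentences]
    cleaned = [sentence for sentence in cleaned if sentence]
    text = "".join(sentence + "\n" for sentence in cleaned)
    Path(path).write_text(text, encoding="utf-8")
    return len(cleaned)
```

For example, `write_sentences(["Bom dia.", "Como vai você?"])` writes a two-line UTF-8 file named `sentences.txt` in the current directory. Writing the file as UTF-8 matters here because Portuguese text contains accented characters.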

Synthesize new sentences using the following command

python synthesize.py --model="Tacotron" --checkpoint="best" --text_list="sentences.txt"

To train TensorflowTTS's Tacotron 2 using the fork

With this notebook, resample CommonVoice samples to 22.05 kHz

resample_wavs.ipynb
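As a worked example of what resampling to 22.05 kHz involves, the sketch below computes the integer up/down factors that a polyphase resampler needs and the resulting number of samples. It is not the notebook's code, and the 48 kHz source rate used in the example is an assumption about the original clips; check the actual sample rate of your files.

```python
from fractions import Fraction


def resample_factors(source_hz, target_hz=22050):
    """Return the smallest integer (up, down) pair with target = source * up / down."""
    ratio = Fraction(target_hz, source_hz)
    return ratio.numerator, ratio.denominator


def resampled_length(num_samples, source_hz, target_hz=22050):
    """Return the number of samples after resampling, rounded up."""
    up, down = resample_factors(source_hz, target_hz)
    # Ceiling division without floating point: -(-a // b) == ceil(a / b).
    return -(-num_samples * up // down)


# Assumed 48 kHz source: 22050 / 48000 reduces to 147 / 320.
print(resample_factors(48000))
# One second of 48 kHz audio becomes 22050 samples.
print(resampled_length(48000, 48000))
```

The `(up, down)` pair is the form expected by polyphase resamplers, which upsample by the first factor and downsample by the second.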

The following files were modified to support Brazilian Portuguese

TensorFlowTTS/tensorflow_tts/processor/commonvoicebr.py
TensorFlowTTS/tensorflow_tts/configs/tacotron2.py
TensorFlowTTS/ttsexamples/tacotron2/conf/tacotron2.v1.yaml
TensorFlowTTS/tensorflow_tts/inference/auto_processor.py
TensorFlowTTS/preprocess/commonvoicebr_preprocess.yaml
TensorFlowTTS/notebooks/tacotron_synthesis.ipynb

Command to preprocess the dataset

tensorflow-tts-preprocess \
  --rootdir ./commonvoicebr \
  --outdir ./dump_commonvoicebr \
  --config preprocess/commonvoicebr_preprocess.yaml \
  --dataset commonvoicebr

Command to normalize the dataset

tensorflow-tts-normalize \
  --rootdir ./commonvoicebr \
  --outdir ./dump_commonvoicebr \
  --config preprocess/commonvoicebr_preprocess.yaml \
  --dataset commonvoicebr

Command to train TensorFlowTTS's Tacotron 2 model

CUDA_VISIBLE_DEVICES=0 python ttsexamples/tacotron2/train_tacotron2.py \
  --train-dir ./dump_commonvoicebr_16/train/ \
  --dev-dir ./dump_commonvoicebr_16/valid/ \
  --outdir ./ttsexamples/tacotron2/exp/train.tacotron2_finetune_32_r_2_wo_mx.v1/ \
  --config ./ttsexamples/tacotron2/conf/tacotron2.v1.yaml \
  --use-norm 1 \
  --pretrained ./ttsexamples/tacotron2/exp/train.tacotron2_pretrained.v1/model-65000.h5 \
  --resume ""

Command to decode

CUDA_VISIBLE_DEVICES=0 python ttsexamples/tacotron2/decode_tacotron2.py \
  --rootdir ./dump_commonvoicebr_16/valid/ \
  --outdir ./prediction/tacotron2_commonvoicebr-70k/ \
  --checkpoint ./ttsexamples/tacotron2/exp/train.tacotron2_16_mx.v1/checkpoints/model-70000.h5 \
  --config ./ttsexamples/tacotron2/conf/tacotron2.v1.yaml \
  --batch-size 16

For inference, use the following Jupyter notebook

tacotron_synthesis.ipynb

References

SEARA, I. Estudo Estatístico dos Fonemas do Português Brasileiro Falado na Capital de Santa Catarina para Elaboração de Frases Foneticamente Balanceadas. Dissertação de Mestrado, Universidade Federal de Santa Catarina, 1994.

SHEN, J. et al. Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions. 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, 2018, pp. 4779-4783.

BibTeX

@inproceedings{Rosa2021,
  doi = {10.14209/sbrt.2021.1570727280},
  url = {https://doi.org/10.14209/sbrt.2021.1570727280},
  year = {2021},
  publisher = {Sociedade Brasileira de Telecomunica{\c{c}}{\~{o}}es},
  author = {Rodrigo K Rosa and Danilo Silva},
  title = {Convers{\~{a}}o Texto-Fala para o Portugu{\^{e}}s Brasileiro Utilizando Tacotron 2 com Vocoder Griffin-Lim},
  booktitle = {Anais do {XXXIX} Simp{\'{o}}sio Brasileiro de Telecomunica{\c{c}}{\~{o}}es e Processamento de Sinais}
}

tacotron2-gl-brazillian-portuguese's Issues

Runtime error

Hello, I am trying to execute the example and I am getting the following errors:
Could you help me to understand and fix it?
Thank you!

ERROR:tensorflow:==================================
Object was never used (type <class 'tensorflow.python.framework.ops.Operation'>):
<tf.Operation 'Tacotron_model/inference/decoder/assert_greater/Assert/Assert' type=Assert>
If you want to mark it as used call its "mark_used()" method.
It was originally created here:
File "C:\Users\everton.aleixo\AppData\Local\miniconda3\envs\tac\lib\site-packages\tensorflow\python\ops\check_ops.py", line 992, in assert_greater
return _binary_assert('>', 'assert_greater', math_ops.greater, np.greater, x,
File "C:\Users\everton.aleixo\AppData\Local\miniconda3\envs\tac\lib\site-packages\tensorflow\python\ops\check_ops.py", line 463, in _binary_assert
with ops.name_scope(name, opname, [x, y, data]):
File "C:\Users\everton.aleixo\AppData\Local\miniconda3\envs\tac\lib\site-packages\tensorflow\python\util\traceback_utils.py", line 155, in error_handler
del filtered_tb
File "C:\Users\everton.aleixo\AppData\Local\miniconda3\envs\tac\lib\site-packages\tensorflow\python\util\dispatch.py", line 1260, in op_dispatch_handler
return dispatch_target(*args, **kwargs)
File "C:\Users\everton.aleixo\AppData\Local\miniconda3\envs\tac\lib\site-packages\tensorflow\python\util\tf_should_use.py", line 288, in wrapped
return _add_should_use_warning(fn(*args, **kwargs),

Traceback (most recent call last):
File "C:\Users\everton.aleixo\playground\linuxwell\ia-texto-fala\Tacotron-2\synthesize.py", line 102, in <module>
main()
File "C:\Users\everton.aleixo\playground\linuxwell\ia-texto-fala\Tacotron-2\synthesize.py", line 96, in main
synthesize(args, hparams, taco_checkpoint, wave_checkpoint, sentences)
File "C:\Users\everton.aleixo\playground\linuxwell\ia-texto-fala\Tacotron-2\synthesize.py", line 38, in synthesize
wavenet_in_dir = tacotron_synthesize(args, hparams, taco_checkpoint, sentences)
File "C:\Users\everton.aleixo\playground\linuxwell\ia-texto-fala\Tacotron-2\tacotron\synthesize.py", line 135, in tacotron_synthesize
return run_eval(args, checkpoint_path, output_dir, hparams, sentences)
File "C:\Users\everton.aleixo\playground\linuxwell\ia-texto-fala\Tacotron-2\tacotron\synthesize.py", line 57, in run_eval
synth.load(checkpoint_path, hparams)
File "C:\Users\everton.aleixo\playground\linuxwell\ia-texto-fala\Tacotron-2\tacotron\synthesizer.py", line 30, in load
self.model.initialize(inputs, input_lengths, split_infos=split_infos)
File "C:\Users\everton.aleixo\playground\linuxwell\ia-texto-fala\Tacotron-2\tacotron\models\tacotron.py", line 169, in initialize
(frames_prediction, stop_token_prediction, ), final_decoder_state, _ = dynamic_decode(
File "C:\Users\everton.aleixo\AppData\Local\miniconda3\envs\tac\lib\site-packages\typeguard\__init__.py", line 1033, in wrapper
retval = func(*args, **kwargs)
File "C:\Users\everton.aleixo\AppData\Local\miniconda3\envs\tac\lib\site-packages\tensorflow_addons\seq2seq\decoder.py", line 359, in dynamic_decode
zero_outputs = tf.nest.map_structure(
File "C:\Users\everton.aleixo\AppData\Local\miniconda3\envs\tac\lib\site-packages\tensorflow\python\util\nest.py", line 631, in map_structure
return nest_util.map_structure(
File "C:\Users\everton.aleixo\AppData\Local\miniconda3\envs\tac\lib\site-packages\tensorflow\python\util\nest_util.py", line 1066, in map_structure
return _tf_core_map_structure(func, *structure, **kwargs)
File "C:\Users\everton.aleixo\AppData\Local\miniconda3\envs\tac\lib\site-packages\tensorflow\python\util\nest_util.py", line 1106, in _tf_core_map_structure
[func(*x) for x in entries],
File "C:\Users\everton.aleixo\AppData\Local\miniconda3\envs\tac\lib\site-packages\tensorflow\python\util\nest_util.py", line 1106, in <listcomp>
[func(*x) for x in entries],
File "C:\Users\everton.aleixo\AppData\Local\miniconda3\envs\tac\lib\site-packages\tensorflow_addons\seq2seq\decoder.py", line 361, in <lambda>
_prepend_batch(decoder.batch_size, shape), dtype=dtype
File "C:\Users\everton.aleixo\AppData\Local\miniconda3\envs\tac\lib\site-packages\tensorflow_addons\seq2seq\decoder.py", line 570, in _prepend_batch
return tf.concat(([batch_size], shape), axis=0)
File "C:\Users\everton.aleixo\AppData\Local\miniconda3\envs\tac\lib\site-packages\tensorflow\python\util\traceback_utils.py", line 153, in error_handler
raise e.with_traceback(filtered_tb) from None
File "C:\Users\everton.aleixo\AppData\Local\miniconda3\envs\tac\lib\site-packages\tensorflow\python\framework\ops.py", line 1020, in _create_c_op
raise ValueError(e.message)
ValueError: Shape must be rank 1 but is rank 0 for '{{node Tacotron_model/inference/decoder/concat}} = ConcatV2[N=2, T=DT_INT32, Tidx=DT_INT32](Tacotron_model/inference/decoder/concat/values_0, Tacotron_model/inference/decoder/concat/values_1, Tacotron_model/inference/decoder/concat/axis)' with input shapes: [1], [], [].

Problem to load weights

Hello,

I am having a problem loading the pretrained weights.
Is this expected?

It says that the graph has changed. I also noticed that the pretrained weights contain tensors from the Adam optimizer.

Thank you.