ETOS TTS

ETOS TTS aims to build a neural text-to-speech (TTS) system that can synthesize speech in voices sampled in the wild. It is a PyTorch implementation of Tacotron: A Fully End-to-End Text-To-Speech Synthesis Model.

Usage

Requirements

Python 3.6 or later
PyTorch 0.4 (tested)
On Ubuntu: sudo apt install libsndfile1

You can install the remaining requirements with pip:

pip3 install -r requirements.txt

Testing

You can use the pretrained model under models/may22 to run the TTS web server:

python server.py -c server_conf.json

Then go to http://127.0.0.1:8000 and enjoy.

Data

Currently TTS provides data loaders for

LJ Speech

Training the network

To run your own training, define a config.json file (a simple template is given below) and pass it to train.py:

python train.py --config_path config.json

To train on a specific set of GPUs, set CUDA_VISIBLE_DEVICES:

CUDA_VISIBLE_DEVICES="0,1,4" python train.py --config_path config.json
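With this setting the process sees only the listed GPUs, and CUDA renumbers them from 0 in the order given. The sketch below illustrates that mapping in plain Python. It is not part of train.py; `logical_to_physical` is a hypothetical helper, and it assumes the variable holds integer indices rather than device UUIDs.

```python
import os


def logical_to_physical(env=None):
    """Map the device index seen by the process to the index listed in CUDA_VISIBLE_DEVICES."""
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES", "")
    ids = [part.strip() for part in value.split(",") if part.strip()]
    # An empty result means the variable is unset or empty, so no remapping is described.
    return {logical: int(listed) for logical, listed in enumerate(ids)}


mapping = logical_to_physical({"CUDA_VISIBLE_DEVICES": "0,1,4"})
print(mapping)  # {0: 0, 1: 1, 2: 4}
```

In this example, device 2 inside the training process is the GPU listed as 4 on the command line.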

Each run creates an experiment folder, named with the corresponding date and time, under the folder you set in config.json. If that experiment folder contains no checkpoint when you press Ctrl+C, it is removed.
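The folder lifecycle described above can be sketched as follows. This is an illustration of the behavior, not the code in train.py: the function names, the timestamp format, and the `.pth.tar` checkpoint suffix are all assumptions.

```python
import datetime
import os
import shutil
import tempfile


def create_experiment_folder(root):
    """Create a sub-folder of root named after the current date and time."""
    name = datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S")  # assumed format
    path = os.path.join(root, name)
    os.makedirs(path, exist_ok=True)
    return path


def remove_if_no_checkpoint(path, suffix=".pth.tar"):
    """Delete the folder unless it already holds a checkpoint file (assumed suffix)."""
    has_checkpoint = any(name.endswith(suffix) for name in os.listdir(path))
    if not has_checkpoint:
        shutil.rmtree(path)
    return not has_checkpoint


def run(root, train_fn):
    """Run train_fn in a fresh experiment folder and clean up on Ctrl+C."""
    folder = create_experiment_folder(root)
    try:
        train_fn(folder)
    except KeyboardInterrupt:
        remove_if_no_checkpoint(folder)
    return folder


def interrupted_training(folder):
    # Stand-in for a training loop that is interrupted before the first save.
    raise KeyboardInterrupt


root = tempfile.mkdtemp()
folder = run(root, interrupted_training)
print(os.path.exists(folder))  # False
```

A run that was interrupted after its first checkpoint was written would keep its folder, since `remove_if_no_checkpoint` only deletes folders with no checkpoint file.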

You can also follow training in TensorBoard, which shows a number of useful training logs, by pointing --logdir at the experiment folder.

Example config.json:

{
  "num_mels": 80,
  "num_freq": 1025,
  "sample_rate": 22050,
  "frame_length_ms": 50,
  "frame_shift_ms": 12.5,
  "preemphasis": 0.97,
  "min_level_db": -100,
  "ref_level_db": 20,
  "embedding_size": 256,
  "text_cleaner": "english_cleaners",

  "epochs": 200,
  "lr": 0.002,
  "warmup_steps": 4000,
  "batch_size": 32,
  "eval_batch_size": 32,
  "r": 5,
  "mk": 0.0,  // guided attention loss weight; 0 disables it
  "priority_freq": true,  // frequency range emphasis

  "griffin_lim_iters": 60,
  "power": 1.2,

  "dataset": "TWEB",
  "meta_file_train": "transcript_train.txt",
  "meta_file_val": "transcript_val.txt",
  "data_path": "/data/shared/BibleSpeech/",
  "min_seq_len": 0,
  "num_loader_workers": 8,

  "checkpoint": true,  // whether to save a checkpoint every save_step steps
  "save_step": 200,
  "output_path": "/path/to/my_experiment"
}
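Two details of this example are easy to trip over: the `//` comments are not valid JSON, and the frame sizes are given in milliseconds rather than samples. The sketch below shows one way to handle both in Python. It is not taken from train.py: `strip_line_comments` and `stft_params` are hypothetical helpers, and the relation `n_fft = (num_freq - 1) * 2` is an assumption based on the usual STFT convention.

```python
import json


def strip_line_comments(text):
    """Remove // comments that appear outside of double-quoted strings."""
    cleaned = []
    for line in text.splitlines():
        in_string = False
        escaped = False
        cut = len(line)
        for i, ch in enumerate(line):
            if escaped:
                escaped = False
            elif ch == "\\" and in_string:
                escaped = True
            elif ch == '"':
                in_string = not in_string
            elif ch == "/" and not in_string and line[i:i + 2] == "//":
                cut = i
                break
        cleaned.append(line[:cut])
    return "\n".join(cleaned)


def stft_params(config):
    """Convert millisecond frame settings to sample counts (assumed convention)."""
    sr = config["sample_rate"]
    return {
        "n_fft": (config["num_freq"] - 1) * 2,
        "win_length": int(config["frame_length_ms"] / 1000 * sr),
        "hop_length": int(config["frame_shift_ms"] / 1000 * sr),
    }


# A shortened copy of the example config, including one // comment.
sample = """{
  "num_freq": 1025,
  "sample_rate": 22050,
  "frame_length_ms": 50,
  "frame_shift_ms": 12.5,
  "mk": 0.0  // guided attention loss weight
}"""

config = json.loads(strip_line_comments(sample))
params = stft_params(config)
print(params)  # {'n_fft': 2048, 'win_length': 1102, 'hop_length': 275}
```

Under these assumptions, a 50 ms window at 22050 Hz is 1102 samples and a 12.5 ms shift is 275 samples, so each spectrogram frame advances the audio by roughly a quarter of the window.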

TODO

WaveNet vocoder for better quality
IAF or NAF for real-time performance
