
sc-vits's Introduction

(Ongoing) Zero-shot TTS based on VITS

VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

Note

  1. This repository aims to implement a VITS-based zero-shot TTS system that can be configured with a variety of style/speaker conditioning methods.
  2. To exclude secondary factors, we simply extract a style representation with a reference encoder from StyleSpeech that is trained jointly with the rest of the model. Specifically, (1) we do not use pretrained models (e.g., Link1, Link2) as the reference encoder, and (2) we do not apply meta-learning or a speaker verification loss during training.
  3. The LibriTTS dataset (train-clean-100 and train-clean-360) is used for training.
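
| Model                 | Text Encoder    | Flow                | Posterior Encoder   | Vocoder        |
|-----------------------|-----------------|---------------------|---------------------|----------------|
| master (YourTTS)      | Output addition | Global conditioning | Global conditioning | Input addition |
| transfer (TransferTTS)| None            | Global conditioning | None                | None           |
| s1 (Proposed)         | SC-CNN          | Global conditioning | Global conditioning | Input addition |
| s2 (Proposed)         | SC-CNN          | SC-CNN              | SC-CNN              | TBD            |
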
  • master
  • transfer
  • s1
  • s2
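
For readers unfamiliar with the conditioning terms used in the comparison above, the following is a minimal pure-Python sketch of two of them, written in the way they are commonly implemented in VITS-style models. It is an illustration only and not code from this repository: the function names are hypothetical, the style vector `g` is assumed to be already projected to the right number of channels, and features are plain lists of frames rather than tensors.

```python
import math


def input_addition(frames, g):
    """Add the (already projected) style vector g to every frame.

    frames: list of T frames, each a list of C floats.
    g:      list of C floats, broadcast over time.
    """
    return [[v + gc for v, gc in zip(frame, g)] for frame in frames]


def gated_global_conditioning(frames, g):
    """WaveNet-style gated activation with a global condition.

    frames: list of T frames, each with 2*C pre-activation channels.
    g:      list of 2*C floats (projected style vector).
    Returns T frames of C channels: tanh(a + g_a) * sigmoid(b + g_b).
    """
    half = len(g) // 2
    out = []
    for frame in frames:
        pre = [v + gc for v, gc in zip(frame, g)]
        filt, gate = pre[:half], pre[half:]
        out.append([
            math.tanh(a) * (1.0 / (1.0 + math.exp(-b)))
            for a, b in zip(filt, gate)
        ])
    return out


if __name__ == "__main__":
    x = [[1.0, 2.0], [3.0, 4.0]]        # T=2 frames, C=2 channels
    print(input_addition(x, [10.0, 20.0]))
    h = [[0.5, -0.5, 1.0, -1.0]]        # T=1 frame, 2*C=4 channels
    print(gated_global_conditioning(h, [0.0, 0.0, 0.0, 0.0]))
```

In this reading, "input addition" shifts a module's input by the same style vector at every time step, while "global conditioning" injects the style vector inside each gated layer. How the branches here wire these operations into each module is best confirmed in the model code itself.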

Pre-requisites

  1. Python >= 3.6
  2. Clone this repository
  3. Install the Python requirements listed in requirements.txt.
    1. You may need to install espeak first: apt-get install espeak
  4. Download datasets
  5. Build Monotonic Alignment Search and run preprocessing if you use your own datasets.
# Cython-version Monotonic Alignment Search
cd monotonic_align
python setup.py build_ext --inplace
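
For intuition about what the compiled extension computes, here is a slow pure-Python sketch of monotonic alignment search as it is described for VITS. It is not the Cython code built above, and the function name and list-based input format are assumptions made for illustration. Given a matrix of log-likelihoods of each speech frame under each text token, it finds the highest-scoring alignment that starts at the first token, ends at the last token, and at each frame either stays on the current token or advances by one.

```python
def monotonic_alignment_search(log_probs):
    """Return the best monotonic token index for every frame.

    log_probs[y][x] is the log-likelihood of frame y under token x.
    Requires at least as many frames as tokens.
    """
    t_y = len(log_probs)       # number of frames
    t_x = len(log_probs[0])    # number of tokens
    if t_x > t_y:
        raise ValueError("need at least one frame per token")

    neg_inf = float("-inf")
    # q[y][x]: best cumulative score of a path that is at token x on frame y
    q = [[neg_inf] * t_x for _ in range(t_y)]

    for y in range(t_y):
        # Only cells from which the last token is still reachable,
        # and which are reachable from token 0, are valid.
        for x in range(max(0, t_x + y - t_y), min(t_x, y + 1)):
            if y == 0:
                best_prev = 0.0                      # path starts at (0, 0)
            else:
                stay = q[y - 1][x]                   # same token as before
                advance = q[y - 1][x - 1] if x > 0 else neg_inf
                best_prev = max(stay, advance)
            q[y][x] = log_probs[y][x] + best_prev

    # Backtrack from the last frame and last token.
    path = [0] * t_y
    x = t_x - 1
    for y in range(t_y - 1, -1, -1):
        path[y] = x
        if x > 0 and (x == y or q[y - 1][x] < q[y - 1][x - 1]):
            x -= 1
    return path


if __name__ == "__main__":
    scores = [
        [-0.1, -5.0],
        [-0.2, -4.0],
        [-3.0, -0.1],
        [-4.0, -0.2],
    ]
    print(monotonic_alignment_search(scores))  # one token index per frame
```

The Cython build exists because this dynamic program runs for every utterance in every training step, where a Python loop like the one above would be prohibitively slow.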

Training Example

python train_zs.py -c configs/libritts_base.json -m libritts_base

Inference Example

See inference.ipynb

sc-vits's People

Contributors

hcy71o, jaywalnut310, jik876, juheeuu
