
snac's Introduction

SNAC: Speaker-normalized Affine Coupling Layer in Flow-based Architecture for Zero-Shot Multi-Speaker Text-to-Speech

This is an unofficial PyTorch implementation of SNAC. The code is built on VITS.

  1. VCTK dataset is used.
  2. LibriTTS dataset (train-clean-100 and train-clean-360) is also supported.
  3. This is the implementation of Proposed + REF + FLOW in the paper.
  4. The major modifications are in modules.py (SN/SDN transformation) and losses.py (revised log determinant).
  5. We follow the same flow setting as VITS, using a volume-preserving transformation whose Jacobian determinant is one.

Text Encoder    Duration Predictor    Flow    Vocoder
None            Input addition        SNAC    None
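
Items 4 and 5 above can be illustrated with a small sketch. This is a plain-Python illustration, not the code in modules.py or losses.py. It assumes that speaker normalization (SN) subtracts a speaker-dependent mean and divides by a speaker-dependent scale, that speaker denormalization (SDN) is its inverse, and that the VITS-style coupling layer is a shift-only transform. The function names, the `toy_shift` helper, and the exact form of SN/SDN are assumptions made for illustration.

```python
import math

def speaker_normalize(x, mean, log_scale):
    """Assumed SN form: (x - mean) / exp(log_scale), element-wise."""
    return [(xi - m) / math.exp(s) for xi, m, s in zip(x, mean, log_scale)]

def speaker_denormalize(z, mean, log_scale):
    """Assumed SDN form, the inverse of SN: z * exp(log_scale) + mean."""
    return [zi * math.exp(s) + m for zi, m, s in zip(z, mean, log_scale)]

def sn_log_det(log_scale):
    """log|det J| of SN: each element is scaled by exp(-log_scale)."""
    return -sum(log_scale)

def coupling_forward(x, shift_fn):
    """Shift-only coupling on an even-length list: y_a = x_a, y_b = x_b + shift_fn(x_a)."""
    half = len(x) // 2
    x_a, x_b = x[:half], x[half:]
    y_b = [b + t for b, t in zip(x_b, shift_fn(x_a))]
    # The Jacobian is triangular with a unit diagonal, so its log-determinant is 0.
    return x_a + y_b, 0.0

def coupling_inverse(y, shift_fn):
    """Invert the coupling: x_a = y_a, x_b = y_b - shift_fn(y_a)."""
    half = len(y) // 2
    y_a, y_b = y[:half], y[half:]
    x_b = [b - t for b, t in zip(y_b, shift_fn(y_a))]
    return y_a + x_b

def toy_shift(x_a):
    """Hypothetical stand-in for the network that predicts the shift."""
    return [2.0 * a + 1.0 for a in x_a]
```

In this sketch the coupling step contributes nothing to the log-likelihood, whereas any speaker-dependent scaling adds a `-sum(log_scale)` term. A revised log determinant would presumably need to account for a term of that kind.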

Prerequisites

  1. Clone this repository.
  2. Install the Python requirements. Please refer to requirements.txt.
    1. You may need to install espeak first: apt-get install espeak
  3. Download datasets
    1. Download and extract the VCTK dataset, and downsample wav files to 22050 Hz. Then rename or create a link to the dataset folder: ln -s /path/to/VCTK-Corpus/downsampled_wavs DUMMY3
    2. For LibriTTS dataset, downsample wav files to 22050 Hz and link to the dataset folder: ln -s /path/to/LibriTTS DUMMY2
  4. Build Monotonic Alignment Search and run preprocessing if you use your own datasets.
# Cython-version Monotonic Alignment Search
cd monotonic_align
python setup.py build_ext --inplace
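
The downsampling step in item 3 could be scripted along these lines. This is a sketch and not part of the repository: it assumes the third-party packages soundfile and scipy are installed, and the function names and paths are hypothetical. VCTK is distributed at 48 kHz and LibriTTS at 24 kHz, so both need resampling to reach 22050 Hz.

```python
import math

TARGET_SR = 22050  # sampling rate required by the dataset instructions

def resample_ratio(source_sr, target_sr=TARGET_SR):
    """Return the (up, down) integer factors for polyphase resampling."""
    g = math.gcd(source_sr, target_sr)
    return target_sr // g, source_sr // g

def downsample_file(src_path, dst_path, target_sr=TARGET_SR):
    """Resample one wav file (assumes soundfile and scipy are installed)."""
    import soundfile as sf
    from scipy.signal import resample_poly

    audio, source_sr = sf.read(src_path)
    if source_sr != target_sr:
        up, down = resample_ratio(source_sr, target_sr)
        # axis=0 is the time axis for both mono and multi-channel arrays
        audio = resample_poly(audio, up, down, axis=0)
    sf.write(dst_path, audio, target_sr)
```

For a 48 kHz source, `resample_ratio(48000)` gives the factors 147 and 320, since 48000 × 147 / 320 = 22050.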

Training Example

python train.py -c configs/vctk_base.json -m vctk_base

Inference Example

See inference.ipynb

snac's People

Contributors

hcy71o


snac's Issues

Proposed+REF+FLOW ?

Hello, are there any intentions to implement "Proposed+REF+FLOW" along with the already implemented "Baseline + REF + FLOW"?

Thanks in advance.

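KL loss divergence

Hello. I enjoyed reading the SNAC paper!
I am currently building a proposed + pre-trained + flow variant.
For the SNAC-related parts, I took the code from this repository as-is (I have also updated the loss).
However, the KL loss keeps diverging to NaN. Did you make any additional code changes to control this behavior?
Or, if you have run into this yourself, I would appreciate it if you could share how you solved it.
Thank you.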

How to use this repository?

I have reviewed the paper, which asserts that there is no audio or information from unknown speakers during the training process. However, in your code, your validation set includes information from unknown speakers. Doesn't this imply that the model had already assimilated information from unknown speakers during training, and the reference speech used for inference corresponds to the speech encountered during training?

For example, has the model already seen speaker p261 during training?

Please help me!
