hcy71o / snac

Unofficial PyTorch implementation of SNAC: Speaker-normalized affine coupling layer in flow-based architecture for zero-shot multi-speaker text-to-speech

License: MIT License

Python 94.26% Jupyter Notebook 4.83% Cython 0.91%

text-to-speech zero-shot

snac's Introduction

SNAC: Speaker-normalized Affine Coupling Layer in Flow-based Architecture for Zero-Shot Multi-Speaker Text-to-Speech

This is the unofficial PyTorch implementation of SNAC. We built our code on top of VITS.

VCTK dataset is used.
LibriTTS dataset (train-clean-100 and train-clean-360) is also supported.
This is the implementation of Proposed + REF + FLOW in the paper.
The major modifications are in modules.py (SN/SDN transformation) and losses.py (revised log determinant).
We follow the same flow setting as VITS, using a volume-preserving transformation with a Jacobian determinant of one.
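
The SN/SDN idea mentioned above can be illustrated with a small sketch on plain Python lists. This is not the code in modules.py. It assumes that speaker normalization (SN) subtracts a speaker-dependent mean and divides by a speaker-dependent scale, and that speaker de-normalization (SDN) is its exact inverse. The names sn, sdn, toy_shift, coupling_forward, and coupling_inverse are hypothetical, and the mean m and log-scale v are passed in directly instead of being predicted from a speaker embedding.

```python
import math

def sn(x, m, v):
    """Speaker normalization: remove speaker mean m and scale exp(v)."""
    return [(xi - mi) / math.exp(vi) for xi, mi, vi in zip(x, m, v)]

def sdn(x, m, v):
    """Speaker de-normalization: exact inverse of sn."""
    return [xi * math.exp(vi) + mi for xi, mi, vi in zip(x, m, v)]

def toy_shift(h):
    """Hypothetical stand-in for the coupling network (one shift per channel)."""
    return [0.5 * hi for hi in h]

def coupling_forward(xa, xb, ma, va, mb, vb):
    """Mean-only coupling applied to speaker-normalized inputs."""
    shift = toy_shift(sn(xa, ma, va))
    yb = [n + s for n, s in zip(sn(xb, mb, vb), shift)]
    # The additive shift has a unit Jacobian determinant, so only the
    # normalization of xb contributes: log|det| = -sum(vb).
    logdet = -sum(vb)
    return xa, yb, logdet

def coupling_inverse(ya, yb, ma, va, mb, vb):
    """Undo the shift, then de-normalize with the speaker statistics."""
    shift = toy_shift(sn(ya, ma, va))
    xb = sdn([y - s for y, s in zip(yb, shift)], mb, vb)
    return ya, xb

# Round trip with made-up numbers: the inverse recovers the input.
xa, xb = [0.2, -1.0], [1.5, 0.3]
ma, va = [0.1, 0.0], [0.0, 0.2]
mb, vb = [0.5, -0.5], [0.3, -0.1]
ya, yb, logdet = coupling_forward(xa, xb, ma, va, mb, vb)
_, xb_rec = coupling_inverse(ya, yb, ma, va, mb, vb)
assert all(abs(p - q) < 1e-9 for p, q in zip(xb, xb_rec))
```

In this sketch the normalization is the only step that changes volume, which is one plausible reason a log-determinant term would need revising even when the coupling itself is volume-preserving.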

Text Encoder	Duration Predictor	Flow	Vocoder
None	Input addition	SNAC	None

Prerequisites

Clone this repository.
Install the Python requirements. Please refer to requirements.txt
1. You may need to install espeak first: apt-get install espeak
Download datasets
1. Download and extract the VCTK dataset, and downsample wav files to 22050 Hz. Then rename or create a link to the dataset folder: ln -s /path/to/VCTK-Corpus/downsampled_wavs DUMMY3
2. For the LibriTTS dataset, downsample wav files to 22050 Hz and link to the dataset folder: ln -s /path/to/LibriTTS DUMMY2
Build Monotonic Alignment Search and run preprocessing if you use your own datasets.

# Cython-version Monotonic Alignment Search
cd monotonic_align
python setup.py build_ext --inplace
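
Before training, it may help to confirm that the linked wav files really are at 22050 Hz. The following is a minimal sketch using only the Python standard library. The helper name find_wrong_rate is hypothetical and not part of this repository, and the wave module only reads PCM wav files, so other encodings would raise an error.

```python
import wave
from pathlib import Path

TARGET_SR = 22050  # sampling rate requested in the dataset steps above

def find_wrong_rate(root, target_sr=TARGET_SR):
    """Return (path, rate) pairs for .wav files under root whose
    sampling rate differs from target_sr."""
    mismatches = []
    for path in sorted(Path(root).rglob("*.wav")):
        with wave.open(str(path), "rb") as wf:
            rate = wf.getframerate()
        if rate != target_sr:
            mismatches.append((path, rate))
    return mismatches
```

Calling find_wrong_rate("DUMMY3") returns an empty list when every file under the linked VCTK folder is already at the target rate.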

Training Example

python train.py -c configs/vctk_base.json -m vctk_base

Inference Example

See inference.ipynb

snac's Issues

Can this do few-shot cloning of music?

Can this do few-shot cloning of music?

Have you gotten good results?

In my experiments, the generated wav is not similar to the reference audio.

Proposed+REF+FLOW ?

Hello, are there any intentions to implement "Proposed+REF+FLOW" along with the already implemented "Baseline + REF + FLOW"?

Thanks in advance.

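Regarding KL loss divergence

Hello. I enjoyed reading the SNAC paper!
I am currently building the proposed + pre-trained + flow variant.
For the SNAC-related parts, I took the code from this repository as-is (I have also changed the loss).
However, the KL loss keeps diverging to NaN. Did you make any additional code changes to control this?
Or, if you have run into this yourself, I would appreciate it if you could share how you solved it.
Thank you.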

How to use this repository?

I have reviewed the paper, which asserts that there is no audio or information from unknown speakers during the training process. However, in your code, your validation set includes information from unknown speakers. Doesn't this imply that the model had already assimilated information from unknown speakers during training, and the reference speech used for inference corresponds to the speech encountered during training?

For speaker p261, has the model already seen this speaker?

Please help me!
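
One way to examine the concern above is to compare the speaker IDs that appear in the training and validation filelists. The sketch below assumes VITS-style multi-speaker filelist lines of the form wav_path|speaker_id|text. That format, the helper names speakers_in and shared_speakers, and the sample lines are all assumptions for illustration and are not taken from this repository.

```python
def speakers_in(filelist_lines):
    """Collect speaker IDs from lines of the assumed form
    'wav_path|speaker_id|text'; malformed lines are skipped."""
    speakers = set()
    for line in filelist_lines:
        parts = line.strip().split("|")
        if len(parts) >= 3:
            speakers.add(parts[1])
    return speakers

def shared_speakers(train_lines, val_lines):
    """Speakers present in both splits; an empty set means the
    two splits are speaker-disjoint."""
    return speakers_in(train_lines) & speakers_in(val_lines)

# Made-up example lines, for illustration only.
train = ["DUMMY3/p225/p225_001.wav|0|hello", "DUMMY3/p226/p226_001.wav|1|hi"]
val = ["DUMMY3/p261/p261_001.wav|2|hey"]
print(shared_speakers(train, val))
```

An empty result only shows that no validation speaker contributes training utterances. It does not settle how the validation set is used during training.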
