Comments (2)
It will be available by the end of this month. As for ASMR TTS, it probably needs more than the current framework of StyleTTS (or StyleTTS 2), because it is mostly unvoiced whisper (so F0 and energy do not make too much sense here). You may want to look for papers working on whisper speech synthesis and see if you can bring some ideas from there.
from styletts2.
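To illustrate why F0 does not make much sense for whisper: a voiced sound is (quasi-)periodic, so its normalized autocorrelation has a strong secondary peak in the plausible pitch range, while whisper is noise-like and has none. This is my own toy check, not code from the repo:

```python
import numpy as np

def periodicity(x, sr, fmin=60.0, fmax=400.0):
    """Peak of the normalized autocorrelation within the plausible F0 lag range."""
    x = x - x.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]  # lags 0..N-1
    ac /= ac[0]                                        # normalize so lag 0 == 1
    lo, hi = int(sr / fmax), int(sr / fmin)            # lag range for fmin..fmax
    return float(ac[lo:hi].max())

sr = 16000
t = np.arange(sr // 2) / sr                 # half a second of audio
voiced = np.sin(2 * np.pi * 120 * t)        # periodic, like a voiced vowel
rng = np.random.default_rng(0)
whisper = rng.standard_normal(sr // 2)      # broadband noise, whisper-like

print(periodicity(voiced, sr))   # near 1.0: a pitch tracker has something to lock onto
print(periodicity(whisper, sr))  # near 0: "F0" here is essentially meaningless
```

A pitch tracker run on the noise-like signal would return arbitrary values, which is why an F0-conditioned decoder has little to work with on whispered data.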
What I infer is that duration accuracy is crucial for picking the speech style; the model does not need to reproduce the exact whispered speech.
My experiments with finetuning StyleTTS (with PL-BERT) on different datasets show that, with CE loss enabled, the duration loss reaches about 0.2 (starting from 1.2) during second-stage training, while distorting all the other losses. I haven't really run inference on all of those checkpoints, but my goal is to match the speech duration to the ground truth; the voice itself can be changed through a pipeline afterwards, since StyleTTS inference is so fast.
With CE loss off, the best mel loss I got was ~0.23 during the first stage. I keep adjusting the learning rate between 0.00005 and 0.0001 depending on dataset size. I'm sharing all this because I want to reach the ideal setup for the kind of speech I'm looking to generate.
Also, I am keeping the finetuning datasets at roughly 1,000-2,000 clips each.
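For reference, a cross-entropy duration loss of the kind discussed above can be set up as below. This is a minimal sketch, not the repo's actual training code; it assumes the duration predictor emits per-phoneme logits over discrete duration bins (the function name and shapes are my own):

```python
import torch
import torch.nn.functional as F

def duration_ce_loss(dur_logits, dur_targets):
    """dur_logits: (batch, n_phonemes, max_dur) unnormalized scores per duration bin.
    dur_targets: (batch, n_phonemes) integer ground-truth frame counts per phoneme."""
    max_dur = dur_logits.size(-1)
    targets = dur_targets.clamp(max=max_dur - 1)      # clip outliers to the last bin
    return F.cross_entropy(
        dur_logits.reshape(-1, max_dur),              # flatten to (B*N, max_dur)
        targets.reshape(-1),                          # flatten to (B*N,)
    )

# toy usage: for random logits the loss starts near log(max_dur)
logits = torch.randn(2, 8, 50)
targets = torch.randint(0, 50, (2, 8))
loss = duration_ce_loss(logits, targets)
print(loss.item())
```

Under this framing, the drop from ~1.2 to ~0.2 reported above means the predicted duration distribution is concentrating sharply on the ground-truth bins, which is consistent with the goal of matching ground-truth durations even if other losses degrade.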
Related Issues (20)
- Strange Loss Behavior During Stage Two Training - Not Decreasing after Diff Epoch HOT 2
- Finetune on ljspeech or libritts? HOT 1
- Better LJSpeech or LibriTTS for finetuning a single speaker voice? Or training from scratch with not so much data? HOT 3
- SLM Adversarial Training did not start when finetuning HOT 11
- Second stage training with smaller window size HOT 1
- Possible Bug in Style Diffusion Inference Code
- Issue with improper pauses and random bursts of noise
- Cannot Convert float NaN to integer HOT 1
- HELP WANTED!!!!!!!!!!! HOT 3
- asr negative loss
- Resuming finetuning uses second to last epoch
- Help Wanted For Stage-1 HOT 2
- Inference with multilingual PL-BERT Model HOT 4
- During training, the graphics memory has been continuously increasing
- May be a bug? input parameters for model.predictor_encoder and model.style_encoder in train_finetune.py
- S_loss = 0 ... why? HOT 2
- Inference Error: context_features exists but no features provided HOT 1
- Speech conditioning like tortoise TTS HOT 1
- FP8 Fine Tuning Crashes HOT 1
- Error Message After Using a fine tuned ASR Model