Comments (16)
Hi, someone asked here if I would release a local Gradio GUI to run (the comment was later deleted for some reason, but it was still in my inbox).
I am planning to eventually release it and perhaps make a PR to the main repository, but the code quality is currently pretty... low. I'm going to clean it up a bit and then try to release it.
from styletts2.
A few more features that could be added:
- Right now the limit is 300 characters, but the max_length of the BERT encoder is 512, so I think a better way of checking this limit is to phonemize the input first, then use len() on the phonemized text and make sure it is less than 512.
- We can probably add duration control and pitch control as well. To control duration, we can do yl4579/StyleTTS#3. To control the pitch, we can do the same but scale the F0. Another, more natural way is to change the pitch of the reference audio, use that to sample a style, and then combine it with the original style.
- We can add emotion control with style transfer, though it may not be very obvious with the LibriTTS dataset because the data itself is not overly emotional.
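For illustration, the length check from the first suggestion could be sketched roughly as below. This is only a sketch under assumptions: `fits_encoder` and `fake_phonemize` are hypothetical names, and the phonemizer is passed in as a plain callable so the example runs without espeak-ng. In the demo, the real phonemizer would take the place of the stand-in.

```python
from typing import Callable

MAX_TOKENS = 512  # max_length of the BERT encoder, as stated in the comment above


def fits_encoder(
    text: str,
    phonemize: Callable[[str], str],
    max_tokens: int = MAX_TOKENS,
) -> bool:
    """Return True if the phonemized text is shorter than the encoder limit."""
    phonemes = phonemize(text)
    # The suggestion is to check the phonemized text, not the raw input.
    return len(phonemes) < max_tokens


def fake_phonemize(text: str) -> str:
    # Stand-in only (an assumption for this sketch): doubles every letter so
    # the output length differs from the input length. Not a real phonemizer.
    return "".join(ch * 2 if ch.isalpha() else ch for ch in text)


if __name__ == "__main__":
    print(fits_encoder("Hello world", fake_phonemize))
    print(fits_encoder("a" * 300, fake_phonemize))
```

The point of the sketch is that a fixed character limit on the raw input and a limit on the phonemized output can disagree, so the check is done after phonemization.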
Thanks again for your help in making the demo!
Hi, I can try to implement this. 1 and 2 seem doable, but 3 seems a bit harder. I'll look into this later today! Thanks for the suggestions!
OK, I'll either remove the long text feature in a couple of minutes or add a character limit.
Hi, someone asked here if I would release a local Gradio GUI to run (the comment was later deleted for some reason, but it was still in my inbox).
I am planning to eventually release it and perhaps make a PR to the main repository, but the code quality is currently pretty... low. I'm going to clean it up a bit and then try to release it.
@fakerybakery thanks a lot for your reply. I'm looking forward to the local version. I tested the Hugging Face demo and it looks awesome!
Yeah, I'll start doing that. However, I'm using macOS and can't figure out how to install espeak-ng for phonemizer (I tried MacPorts but it didn't work, so maybe I'll develop it on a VM).
I'm not familiar with Gradio. I did try it for StyleTTS but had no success. I will take a look at it later when I get time, but if anyone is interested in making a demo for now, feel free to contribute!
Someone is already working on it: #53, and we are figuring out some details of it.
I will let you know when it is ready.
Hi @AK391. I’ve released a Gradio demo here with voice cloning, multi-speaker support, and LJSpeech support.
@fakerybakery I think for the default voices, it would be great if you could find all the audio samples in the training data, compute the style of each sample, take the average, and then save it as the speaker embedding. This is probably more efficient than computing the style every time the demo is run, and also a more accurate reflection of the speaker.
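A minimal sketch of the averaging step, assuming each style has already been computed and is available as a flat sequence of floats. `average_style` is a hypothetical helper; in the actual demo the styles would be tensors produced by the model, and the element-wise mean would be taken on those instead.

```python
from statistics import fmean
from typing import List, Sequence


def average_style(styles: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of several style vectors for one speaker."""
    if not styles:
        raise ValueError("need at least one style vector")
    dim = len(styles[0])
    if any(len(style) != dim for style in styles):
        raise ValueError("all style vectors must have the same length")
    # zip(*styles) groups the values of each dimension across all samples.
    return [fmean(column) for column in zip(*styles)]


if __name__ == "__main__":
    # Two made-up 2-dimensional styles for the same speaker.
    print(average_style([[1.0, 2.0], [3.0, 4.0]]))
```

The result would be computed once per default voice and stored, so the demo only has to load it at startup.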
Yes, you’re probably right. No wonder starting the demo took so long each time! Thank you, I’ll push a fix tomorrow :)
Thanks to @AK391 for posting this solution on X/Twitter! Just realized you can run any Hugging Face space on Docker.
docker run -it -p 7860:7860 --platform=linux/amd64 --gpus all \
registry.hf.space/styletts2-styletts2:latest python app.py
Can you please remove the "Access code" field in the "Long Text" tab?
It is a problem when the Docker image is run locally.
Hi @yl4579, a couple of things:
- I tried saving the speaker embeddings with pickle, but it had an issue switching between CPU and GPU. Do you have any tips for resolving this?
- For len, do you mean just the length of the phonemes, or of the tokens?
- You can save it to CPU (or even as a numpy array) and then do .to('cuda') after loading.
- Technically you should do it with tokens, but each character is a single token, so len should be fine too.
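The first tip could look roughly like this in PyTorch. It is a sketch rather than the demo's actual code: `save_embedding` and `load_embedding` are hypothetical helpers, and the tensor of ones stands in for a real speaker embedding. The idea is to always store a CPU copy and to pick the device only at load time.

```python
import io

import torch


def save_embedding(embedding: torch.Tensor, target) -> None:
    """Store a CPU copy so the file does not depend on the saving device."""
    torch.save(embedding.detach().cpu(), target)


def load_embedding(source, device: str) -> torch.Tensor:
    """Load onto CPU first, then move to whichever device is in use."""
    return torch.load(source, map_location="cpu").to(device)


# Use the GPU when one is present, otherwise stay on CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# An in-memory buffer stands in for a file on disk in this sketch.
buffer = io.BytesIO()
save_embedding(torch.ones(4, device=device), buffer)
buffer.seek(0)
restored = load_embedding(buffer, device)
```

Because the stored tensor lives on the CPU, the same file loads on machines with or without a GPU.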
I am planning to eventually release it and perhaps make a PR to the main repository, but the code quality is currently pretty... low. I'm going to clean it up a bit and then try to release it.
Would you like to start with making a local copy of the current HF demo and then iterate over it to improve it? @fakerybakery