I'd like to share a Tacotron2-DCA model and a Univnet model I trained on the Nancy corpus.
Here is a sample:
sample.mp4
The link to the models:
https://drive.google.com/drive/folders/1bMNOjjYxcCkgwkcYAlsPR3qM4hZQzAOR?usp=sharing
Thanks again for the great work!
from tts.
Feel free to ask specific questions. I'd be happy to share my experiences of recording a new dataset here.
- Find/create a text corpus to record (one sentence = one recording)
- Spell out numbers as words
- Create a CSV file from the corpus
- Check Mimic-Recording-Studio from Mycroft as a recording environment (https://github.com/MycroftAI/mimic-recording-studio)
- Start recording
  - Keep a constant speed while recording
  - Speak all characters clearly
  - Speak in a neutral voice
  - Use good microphone equipment
  - Find a recording place without random noise
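The first three steps above can be sketched in Python. This is a minimal illustration, not part of Mimic-Recording-Studio or Coqui TTS: it assumes an English corpus, spells out only whole numbers from 0 to 999 (larger numbers and decimals are left for manual review), splits sentences naively on `.`, `!` and `?`, and writes hypothetical `id|sentence` lines. The function names and the output format are assumptions; check which CSV layout your recording tool actually expects.

```python
import re

# Hypothetical helper tables for illustration only: English words, integers 0-999.
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0-999 in English."""
    if not 0 <= n <= 999:
        raise ValueError(f"outside the supported range 0-999: {n}")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, rest = divmod(n, 10)
        return TENS[tens] + ("" if rest == 0 else "-" + ONES[rest])
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words if rest == 0 else words + " " + number_to_words(rest)


def spell_out_numbers(text: str) -> str:
    """Replace standalone 1-3 digit numbers with words; longer numbers stay as-is."""
    return re.sub(r"\b\d{1,3}\b", lambda m: number_to_words(int(m.group())), text)


def build_corpus_csv(text: str, path: str) -> list[tuple[str, str]]:
    """Split text into sentences (one per recording), spell out numbers,
    and write one `id|sentence` line per sentence to `path` (assumed format)."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    rows = [(f"{i:05d}", spell_out_numbers(s)) for i, s in enumerate(sentences, start=1)]
    with open(path, "w", encoding="utf-8") as f:
        for clip_id, sentence in rows:
            f.write(f"{clip_id}|{sentence}\n")
    return rows
```

For example, `build_corpus_csv("I have 3 cats. They are 12 years old!", "corpus.csv")` writes the two lines `00001|I have three cats.` and `00002|They are twelve years old!`. A real corpus usually needs a proper number-normalization library for the target language; the hand-written table here only keeps the sketch self-contained.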
I trained Tacotron 2 for 130K steps with this code, https://github.com/kaiidams/TTS/tree/kaiidams/kokoro, which is forked from the latest main branch.
https://drive.google.com/drive/folders/1-1_HB-ogmvD-qYaHm8D5Xp1pWq9HKhB_?usp=sharing
The included sample.wav was generated with vocoder_models/universal/libri-tts/wavegrad.
The input of the model is Romanized Japanese text. Converting ordinary Japanese text into this form requires some dependencies, such as MeCab.
The dataset is in the public domain, and the reader is aware of the dataset. I think I can provide Python code for the text conversion.
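The conversion code itself is not included in this thread. As a rough, hypothetical illustration of the last step only: a morphological analyzer such as MeCab can supply katakana readings for ordinary Japanese text, and those readings can then be mapped to romaji with a lookup table. The sketch assumes the readings are already available as katakana strings, covers only a small subset of the table (no contracted sounds such as キャ, no ヂ/ヅ), and uses a simple Hepburn-like scheme chosen for illustration; the romanization the Kokoro model actually expects is not stated here.

```python
# Hypothetical, deliberately partial katakana-to-romaji table (an assumption,
# not the table used by the Kokoro model).
KANA = {
    "ア": "a", "イ": "i", "ウ": "u", "エ": "e", "オ": "o",
    "カ": "ka", "キ": "ki", "ク": "ku", "ケ": "ke", "コ": "ko",
    "サ": "sa", "シ": "shi", "ス": "su", "セ": "se", "ソ": "so",
    "タ": "ta", "チ": "chi", "ツ": "tsu", "テ": "te", "ト": "to",
    "ナ": "na", "ニ": "ni", "ヌ": "nu", "ネ": "ne", "ノ": "no",
    "ハ": "ha", "ヒ": "hi", "フ": "fu", "ヘ": "he", "ホ": "ho",
    "マ": "ma", "ミ": "mi", "ム": "mu", "メ": "me", "モ": "mo",
    "ヤ": "ya", "ユ": "yu", "ヨ": "yo",
    "ラ": "ra", "リ": "ri", "ル": "ru", "レ": "re", "ロ": "ro",
    "ワ": "wa", "ヲ": "o", "ン": "n",
    "ガ": "ga", "ギ": "gi", "グ": "gu", "ゲ": "ge", "ゴ": "go",
    "ザ": "za", "ジ": "ji", "ズ": "zu", "ゼ": "ze", "ゾ": "zo",
    "ダ": "da", "デ": "de", "ド": "do",
    "バ": "ba", "ビ": "bi", "ブ": "bu", "ベ": "be", "ボ": "bo",
    "パ": "pa", "ピ": "pi", "プ": "pu", "ペ": "pe", "ポ": "po",
}


def katakana_to_romaji(reading: str) -> str:
    """Convert a katakana reading to romaji using the partial table above."""
    out: list[str] = []
    double_next = False
    for ch in reading:
        if ch == "ッ":  # small tsu: naively double the next consonant
            double_next = True
            continue
        if ch == "ー":  # long-vowel mark: repeat the previous vowel
            if out:
                out.append(out[-1][-1])
            continue
        romaji = KANA.get(ch)
        if romaji is None:
            raise KeyError(f"not in this partial table: {ch!r}")
        if double_next:
            romaji = romaji[0] + romaji
            double_next = False
        out.append(romaji)
    return "".join(out)
```

With this table, `katakana_to_romaji("ニッポン")` returns `"nippon"` and `katakana_to_romaji("コーヒー")` returns `"koohii"`. Raising `KeyError` rather than skipping unknown characters makes gaps in the table visible instead of silently producing wrong model input.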
Hi @erogol, thank you for the amazing work, from Mozilla TTS to coqui-ai. Mozilla TTS seemed perfect to me because of its wide community reach; I just hope this project grows even wider and faster. I am planning to share my models for Spanish and Italian (Tacotron 2 at 600K steps + WaveRNN). The audio quality seems good, but I need to train a bit more and also ask the dataset providers whether it would be okay to make the models public.
Fingers crossed.
Let me know if I can contribute in any way; I have Google Colab Pro resources lying around unused.
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.67 Driver Version: 460.32.03 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... Off | 00000000:00:04.0 Off | 0 |
| N/A 35C P0 24W / 300W | 0MiB / 16160MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Any ELI5 tutorial/doc for creating a dataset for your own language/dialect?
I really hope we can include your models, of course with the right attribution going to you.
I hope they allow me; otherwise I would consider it a waste of my time and effort.
Just waiting for your signal.
I will let you know when I get the confirmation.
If you just like to train models, let me know; we can also find new datasets to attack.
Training models on Colab can be a bit annoying, as sessions often get disconnected even with all the tricks in the book.
Nonetheless, I would love to train models on new datasets (if you have any), especially in languages for which no TTS models have been made public yet.
To proceed, I'd like to know which branch and repo you recommend I use. https://github.com/erogol/TTS_recipes seems a bit old.
Please use this https://github.com/coqui-ai/TTS instead of https://github.com/mozilla/TTS and use the latest main branch. @kaiidams
@thorstenMueller Perfect timing, thank you
You're welcome @zubairahmed-ai :-).
I'm currently finishing some recording work for my emotional dataset and training a Fullband-MelGAN vocoder, so I have no time left to look at other models like Align-TTS. But feel free to train a "Thorsten" model with Align-TTS ;-).
Asking people to share their models could also be added to CONTRIBUTING.md, since it is a request for contributions. I'd be up for doing that if no one has taken it up yet.
I would like to contribute my own model, but I'm stuck partway through. I have created a dataset of my own voice in the LJSpeech format. To train my model I need a config.json file, so could anyone provide a config.json template for a dataset in the LJSpeech format?
Thanks in advance.
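Before filling in the config, it can help to confirm that the recordings really match the LJSpeech layout. The sketch below assumes the three-column layout of the original LJSpeech release: a `metadata.csv` with one pipe-separated `id|transcription|normalized transcription` line per clip, and the audio in `wavs/<id>.wav`. The function name is hypothetical and it is not part of Coqui TTS. It also flags digits left in the normalized text, in line with the advice to spell out numbers.

```python
from pathlib import Path


def check_ljspeech_layout(root: str) -> list[str]:
    """Return human-readable problems found in an LJSpeech-style dataset folder.

    Assumed layout: <root>/metadata.csv with `id|transcription|normalized`
    lines, and the audio files in <root>/wavs/<id>.wav.
    """
    root_path = Path(root)
    meta = root_path / "metadata.csv"
    if not meta.is_file():
        return [f"missing {meta}"]

    problems: list[str] = []
    seen_ids: set[str] = set()
    lines = meta.read_text(encoding="utf-8").splitlines()
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        cols = line.split("|")
        if len(cols) != 3:
            problems.append(f"line {line_no}: expected 3 fields, got {len(cols)}")
            continue
        clip_id, _transcription, normalized = cols
        if clip_id in seen_ids:
            problems.append(f"line {line_no}: duplicate id {clip_id}")
        seen_ids.add(clip_id)
        if not normalized.strip():
            problems.append(f"line {line_no}: empty normalized text")
        elif any(ch.isdigit() for ch in normalized):
            problems.append(f"line {line_no}: digits remain in normalized text")
        if not (root_path / "wavs" / f"{clip_id}.wav").is_file():
            problems.append(f"line {line_no}: missing wavs/{clip_id}.wav")
    return problems
```

Calling `check_ljspeech_layout("path/to/MyVoice")` (a placeholder path) returns an empty list when no problems are found; otherwise each entry names the offending line of `metadata.csv`.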
Not sure if it is ELI5, but there is this link https://github.com/coqui-ai/TTS/wiki/What-makes-a-good-TTS-dataset
Also, @thorstenMueller has created a TTS dataset from scratch, so he might have valuable comments if you have specific questions.
@Sadam1195 thx for the amazing work 🚀🚀.
I really hope we can include your models, of course with the right attribution going to you.
Just waiting for your signal.
For general contribution, this is a nice place to start https://github.com/coqui-ai/TTS/blob/main/CONTRIBUTING.md
If you just like to train models, let me know; we can also find new datasets to attack.
Hello,
I've just started training Tacotron 2 from the latest master of https://github.com/mozilla/TTS on a public-domain Japanese dataset, https://github.com/kaiidams/Kokoro-Speech-Dataset, using the free tier of Google Colab. After 19K steps, I can make out what he says, although the audio sounds metallic.
To proceed, I'd like to know which branch and repo you recommend I use. https://github.com/erogol/TTS_recipes seems a bit old.
@kaiidams if you could send a PR with the model and a text conversion frontend similar to the Chinese API we have, that would be a great contribution.
Feel free to ask specific questions. I'd be happy to share my experiences of recording a new dataset here.
- Find/create a text corpus to record (one sentence = one recording)
- Spell out numbers as words
- Create a CSV file from the corpus
- Check Mimic-Recording-Studio from Mycroft as a recording environment (https://github.com/MycroftAI/mimic-recording-studio)
- Start recording
  - Keep a constant speed while recording
  - Speak all characters clearly
  - Speak in a neutral voice
  - Use good microphone equipment
  - Find a recording place without random noise
Any reason why this and this aren't in the README?
I had to look up training to get here.
Hi @zubairahmed-ai.
Here's a talk I gave on how to record a voice dataset, if that's helpful for you.
Oh, I just realized this talk was part of the recent Google I/O, and I somehow missed it while watching other videos :)
@thorstenMueller Thanks so much for the great video explaining your process in detail, with some tips. I'll make sure to follow it. Do you plan to try other models besides Tacotron 2, like Align-TTS?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our discussion channels.
Yeah, good point. Feel free to take it on.
@ManoBharathi93 you can start from the LJSpeech recipes in the recipes folder and change the config fields to match your dataset. You can find more info at https://tts.readthedocs.io/en/latest/
@erogol thanks a lot sir
Hello folks, how can I add a drop-down menu to the web UI that lists the available (downloaded) models? Also, when I change the server.py file, the web interface does not change. Please tell me which file I need to edit for changes to take effect in the web UI.
@godspirit00 the quality is awesome.