uberduck-ai / uberduck-ml-dev

377.0 377.0 60.0 79.04 MB

ML models for Uberduck

License: Apache License 2.0

Python 19.90% Roff 79.99% Shell 0.02% Jupyter Notebook 0.09%

uberduck-ml-dev's People

justinjohn0306 wtfcoderz alinccc joshuaword2alt a913440 mariusnarvalo darylvjones cardinalsystem256 1zu zurimokato dede2007 alejandrosuarez chris-rw cookieppp davincee funkgeek muhtar1907 raytrac3r sobsz dekirumonka etmyhome skilomlg krompro olusiowiec techthiyanes furansujin railsloes wily-coyote zombiekanapa johnpaulbin remi9martin isneil lcsouzamenezes ro-ol grantmike echallenge gacwr baitphish sabdn stephanabs ifgcguitarclub nonomjn yushi111 l4b4r4b4b4 daebowale pimax1 aborayan2022 sequpl nigelwu95 moriedan fai247 k4ly4s abdellahgoplatform ducktapedevops meshalal kingjmoney patrickultra casevcab

uberduck-ml-dev's Issues

Regarding Character Voice Cloning 3

"We appreciate the support. Unfortunately, sticking our heads out on this didn't make sense, but we are sure someone else will."

I'm sorry, but what do you mean by "didn't make sense"? To ensure that character voice cloning survives, the proposals I have provided need to be taken on board so that all sides are satisfied. This is very important.

I would also propose releasing the source code for character voice cloning in general.

I am also slightly annoyed because, despite my sharing those proposals, nobody is getting on board, as much as I would love to see fictional character voice cloning march on forever. To segue to FakeYou: all I got there were evasive replies from fake accounts on Discord.

I am also annoyed that, rather than looking for compromises so that all sides are satisfied (like my own proposed ideas), the features are being removed without any consideration at all. Besides, as I said, there are people who just use the programs for fun and that's about it, not for projects of any kind. All I want is for the proposals to be considered, to ensure the success of fictional character voice cloning. Thank you.

FileNotFoundError: [Errno 2] No such file or directory: '(name-of-file).wav'

When I run the training script, it seems to go well, but then it says it cannot locate one of the wav files.

I've gone into the filelist and tried removing the entries, but it just keeps reporting a different wav file that cannot be located.

I've made sure my config has the correct paths to everything, and I've verified multiple times that the wav files are there.

When I enter the command, this is what I get:

python -m uberduck_ml_dev.exec.train_tacotron2 --config "tacotron2_config.json"
TTSTrainer start 9218.209915733
Initializing trainer with hparams:
{'attention_dim': 128,
'attention_location_kernel_size': 31,
'attention_location_n_filters': 32,
'attention_rnn_dim': 1024,
'audio_encoder_dim': 192,
'audio_encoder_path': None,
'batch_size': 18,
'checkpoint_name': 'morgan_freeman',
'checkpoint_path': 'checkpoints',
'coarse_n_frames_per_step': None,
'config': 'tacotron2_config.json',
'cudnn_enabled': True,
'dataset_path': '.',
'debug': False,
'decoder_rnn_dim': 1024,
'distributed_run': False,
'encoder_embedding_dim': 512,
'encoder_kernel_size': 5,
'encoder_n_convolutions': 3,
'epochs': 5001,
'epochs_per_checkpoint': 10,
'filter_length': 1024,
'fp16_run': False,
'gate_threshold': 0.5,
'get_gst': None,
'grad_clip_thresh': 1.0,
'gst_dim': 2304,
'gst_type': 'torchmoji',
'has_speaker_embedding': True,
'hop_length': 256,
'ignore_layers': ['speaker_embedding.weight'],
'include_f0': False,
'is_validate': True,
'learning_rate': 0.0005,
'load_f0s': False,
'load_gsts': False,
'log_dir': 'runs',
'lr_decay_min': 1e-05,
'lr_decay_rate': 216000,
'lr_decay_start': 15000,
'mask_padding': True,
'max_decoder_steps': 1000,
'max_wav_value': 32768.0,
'mel_fmax': 8000.0,
'mel_fmin': 0.0,
'n_frames_per_step_initial': 1,
'n_mel_channels': 80,
'n_speakers': 1,
'num_heads': 8,
'num_workers': 1,
'p_arpabet': 0.0,
'p_attention_dropout': 0.1,
'p_decoder_dropout': 0.1,
'p_teacher_forcing': 1.0,
'pin_memory': True,
'pos_weight': None,
'postnet_embedding_dim': 512,
'postnet_kernel_size': 5,
'postnet_n_convolutions': 5,
'prenet_dim': 256,
'ref_enc_filters': [32, 32, 64, 64, 128, 128],
'ref_enc_gru_size': 128,
'ref_enc_pad': [1, 1],
'ref_enc_size': [3, 3],
'ref_enc_strides': [2, 2],
'sample_inference_speaker_ids': [0],
'sample_inference_text': 'That quick beige fox jumped in the air loudly over '
'the thin dog fence.',
'sample_rate': 22050,
'sampling_rate': 22050,
'seed': 123,
'speaker_embedding_dim': 128,
'steps_per_sample': 50,
'symbol_set': 'nvidia_taco2',
'symbols_embedding_dim': 512,
'text_cleaners': ['english_cleaners'],
'torchmoji_model_file': '/home/rage/CodingProjects/uberduck-ml-dev-master/pytorch_model.bin',
'torchmoji_vocabulary_file': '/home/rage/CodingProjects/uberduck-ml-dev-master/vocabulary.json',
'training_audiopaths_and_text': '/home/rage/CodingProjects/uberduck-ml-dev-master/project/wavs/filelist.txt',
'val_audiopaths_and_text': '/home/rage/CodingProjects/uberduck-ml-dev-master/project/wavs/filelist.txt',
'warm_start_name': '/home/rage/CodingProjects/uberduck-ml-dev-master/tacotron2_statedict.pt',
'weight_decay': 1e-06,
'win_length': 1024,
'with_audio_encoding': False,
'with_f0s': False,
'with_gsts': False}
start train 9219.320274948
Initialized Torchmoji GST
Starting warm_start 9220.987589312
WARNING! Attempting to load a model with out the speaker_embedding.weight layer. This could lead to unexpected results during evaluation.
WARNING! Attempting to load a model with out the spkr_lin.weight layer. This could lead to unexpected results during evaluation.
WARNING! Attempting to load a model with out the spkr_lin.bias layer. This could lead to unexpected results during evaluation.
WARNING! Attempting to load a model with out the gst_lin.weight layer. This could lead to unexpected results during evaluation.
WARNING! Attempting to load a model with out the gst_lin.bias layer. This could lead to unexpected results during evaluation.
Ending warm_start 9221.034127661
Error while getting data: index = 43
[Errno 2] No such file or directory: 'mf00-44.wav'
Exception raised while training: [Errno 2] No such file or directory: 'mf00-44.wav'
Traceback (most recent call last):
File "/home/rage/anaconda3/envs/test-env/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/rage/anaconda3/envs/test-env/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/rage/anaconda3/envs/test-env/lib/python3.10/site-packages/uberduck_ml_dev/exec/train_tacotron2.py", line 46, in <module>
run(None, None, hparams)
File "/home/rage/anaconda3/envs/test-env/lib/python3.10/site-packages/uberduck_ml_dev/exec/train_tacotron2.py", line 27, in run
raise e
File "/home/rage/anaconda3/envs/test-env/lib/python3.10/site-packages/uberduck_ml_dev/exec/train_tacotron2.py", line 23, in run
trainer.train()
File "/home/rage/anaconda3/envs/test-env/lib/python3.10/site-packages/uberduck_ml_dev/trainer/tacotron2.py", line 446, in train
for batch_idx, batch in enumerate(train_loader):
File "/home/rage/anaconda3/envs/test-env/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 633, in __next__
data = self._next_data()
File "/home/rage/anaconda3/envs/test-env/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 677, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "/home/rage/anaconda3/envs/test-env/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/rage/anaconda3/envs/test-env/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/rage/anaconda3/envs/test-env/lib/python3.10/site-packages/uberduck_ml_dev/data/data.py", line 303, in __getitem__
data = self._get_data(self.audiopaths_and_text[idx])
File "/home/rage/anaconda3/envs/test-env/lib/python3.10/site-packages/uberduck_ml_dev/data/data.py", line 264, in _get_data
sampling_rate, wav_data = read(audiopath)
File "/home/rage/anaconda3/envs/test-env/lib/python3.10/site-packages/scipy/io/wavfile.py", line 647, in read
fid = open(filename, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: 'mf00-44.wav'

What other potential solutions could I try?
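One thing worth checking: the traceback shows scipy being asked to open the bare name 'mf00-44.wav'. Python resolves a relative path passed to `open()` against the current working directory, not against the folder that holds filelist.txt. So if the filelist contains bare file names, every entry would fail the same way, which would also explain why removing entries only surfaces the next missing wav. One hedged workaround, assuming the filelist uses the usual pipe-separated `path|text` layout, is to rewrite it with absolute paths. `absolutize_filelist` is a hypothetical helper, not part of uberduck_ml_dev.

```python
from pathlib import Path


def absolutize_filelist(filelist_path, output_path, base_dir=None, sep="|"):
    """Rewrite a `path|text[|speaker]` filelist so every audio path is absolute.

    Relative audio paths are resolved against `base_dir` (default: the folder
    that contains the filelist) instead of the current working directory.
    Returns the resolved paths that do not exist on disk, so they can be fixed.
    """
    filelist_path = Path(filelist_path)
    base = Path(base_dir) if base_dir is not None else filelist_path.parent
    missing = []
    out_lines = []
    for line in filelist_path.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        audio, found_sep, rest = line.partition(sep)
        audio_path = Path(audio.strip())
        if not audio_path.is_absolute():
            audio_path = (base / audio_path).resolve()
        if not audio_path.exists():
            missing.append(str(audio_path))
        # Keep the transcript (and any speaker column) exactly as it was.
        out_lines.append(f"{audio_path}{found_sep}{rest}")
    Path(output_path).write_text("\n".join(out_lines) + "\n", encoding="utf-8")
    return missing
```

After running it on project/wavs/filelist.txt, point both `training_audiopaths_and_text` and `val_audiopaths_and_text` at the rewritten file. The paths should then no longer depend on which directory the command is launched from. Any path the function returns as missing really does not exist on disk.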

Tacotron multispeaker training orders speakers unreliably

No idea what could be causing it, but it seems to depend on the speaker count: an 8-speaker model of mine had inference IDs that matched the training IDs, whereas a 20-speaker one is jumbled up. dhama the llama on Discord was (eventually) able to find all the training voices in their 61-speaker model, so it's likely that the IDs are only being shuffled, not discarded.
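One hypothesis worth ruling out (not verified against the repository code): if speaker IDs are read from the filelist as strings and sorted before being mapped to embedding indices, lexicographic order only agrees with numeric order while every ID is a single digit. That would fit an 8-speaker model (IDs 0 to 7) behaving correctly while a 20-speaker one comes out shuffled but complete. The sketch uses a hypothetical `index_map` helper to show the effect.

```python
def index_map(speaker_ids, numeric=True):
    """Map each distinct speaker ID string to an embedding index.

    numeric=True sorts IDs as integers; numeric=False sorts them as strings
    (the suspected failure mode).
    """
    key = int if numeric else str
    ordered = sorted(set(speaker_ids), key=key)
    return {sid: index for index, sid in enumerate(ordered)}


eight = [str(i) for i in range(8)]
twenty = [str(i) for i in range(20)]
```

With 20 speakers, string sorting sends speaker "2" to index 12 and "10" to index 2, yet every speaker still gets exactly one index. That is consistent with "shuffled, not discarded". Sorting with `key=int`, or saving the exact mapping next to the checkpoint, would keep inference IDs aligned with training IDs if this turns out to be the cause.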

"here, here, here . . ."

Getting this while training:

It doesn't seem to harm training, since the loss resumes afterwards, but it happens pretty frequently. Any explanation?

Model inference failing: gibberish or repetition

After multiple contributors on the Uberduck Discord server sent me their Mellotron models, I've noticed that the audio generated at inference seems to repeat itself (and is, coincidentally[?], the same audio as the training audio in TensorBoard).

Example: this was generated with a SpongeBob model (500+ wavs, ~5000 epochs*, batch size 24, with ARPAbet).

Audio

The output never moves away from the repeated sound it gets stuck on, regardless of whether ARPAbet is used or what the input text is. Other runs seem similar, and I was wondering whether this is caused by something in the parameters, the dataset, or simply the training time.

* Gosmokeless28 claimed to train it for the full 5000 epochs

Disabling Validation Inference

There should be a way to disable the validation step that runs between epochs in the Colab notebook. For massive datasets, generating those samples can take up to 20 minutes, so being able to disable it would save a lot of time.
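A minimal sketch of such a switch, assuming a hypothetical `validate_every_n_epochs` setting (not an existing uberduck_ml_dev option). The hparams dump in the FileNotFoundError report does include `is_validate`, but whether it controls this step is not confirmed here.

```python
def should_run_validation(epoch, validate_every_n_epochs):
    """Decide whether to run validation/sample generation after `epoch` (0-based).

    validate_every_n_epochs is a hypothetical setting: 0 or None disables
    validation entirely, N runs it after every N-th epoch.
    """
    if not validate_every_n_epochs:
        return False
    return (epoch + 1) % validate_every_n_epochs == 0


# Hypothetical use inside a training loop:
# for epoch in range(epochs):
#     train_one_epoch(epoch)
#     if should_run_validation(epoch, hparams.get("validate_every_n_epochs", 1)):
#         validate(epoch)
```

Defaulting to 1 keeps the current behaviour (validate every epoch). A value of 0 skips sample generation for large datasets.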

Question about zero-shot TTS

Thank you for your open-source work, but I can't seem to find a complete implementation of zero-shot TTS.

The default dataset for RADTTS in the tutorials does not include the file coqui_resnet_512_emb.pt. Where can I find this file, or is there related code to generate it?
The zero-shot feature in RADTTS seems incomplete.
The zero-shot feature in VITS requires loading a pretrained model, but one doesn't seem to be provided, and the training code for the corresponding encoder is not available either.
Are there any examples or demos related to zero-shot TTS?
How do I run inference with a model trained with RADTTS?

pip dependency errors in "install pipeline" step

Running this step:

#@title Install pipeline
#!pip install -q torch==1.9.0 torchvision==0.10.0 torchaudio==0.9.0
!pip install -q git+https://github.com/uberduck-ai/uberduck-ml-dev.git

I get this error:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torchvision 0.13.1+cu113 requires torch==1.12.1, but you have torch 1.9.0 which is incompatible.
torchtext 0.13.1 requires torch==1.12.1, but you have torch 1.9.0 which is incompatible.
torchaudio 0.12.1+cu113 requires torch==1.12.1, but you have torch 1.9.0 which is incompatible.
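The messages say the Colab runtime ships torchvision, torchtext, and torchaudio built against torch 1.12.1, while torch 1.9.0 is present after the install. The commented-out line in the cell pins torch 1.9.0 together with torchvision 0.10.0 and torchaudio 0.9.0, which are matching releases. A small sketch that reads installed versions with `importlib.metadata` and flags mismatches follows. The `COMPANIONS` table only contains the pairings quoted in this issue and is an assumption to extend, and the helper names are hypothetical.

```python
from importlib import metadata

# Companion versions taken from this issue (assumed table; extend as needed).
COMPANIONS = {
    "1.9.0": {"torchvision": "0.10.0", "torchaudio": "0.9.0"},
    "1.12.1": {"torchvision": "0.13.1", "torchaudio": "0.12.1", "torchtext": "0.13.1"},
}


def base_version(version):
    """Drop a local build tag such as '+cu113'."""
    return version.split("+", 1)[0]


def installed_versions(packages=("torch", "torchvision", "torchaudio", "torchtext")):
    """Return {package: version} for the packages that are installed."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass  # not installed, so nothing to check
    return found


def find_conflicts(versions):
    """List companion packages whose version does not match the installed torch."""
    torch_v = base_version(versions.get("torch", ""))
    expected = COMPANIONS.get(torch_v)
    if expected is None:
        return [f"no companion entry for torch {torch_v or '(missing)'}"]
    conflicts = []
    for pkg, want in expected.items():
        have = versions.get(pkg)
        if have is not None and base_version(have) != want:
            conflicts.append(f"{pkg} {have} does not match torch {torch_v} (expected {want})")
    return conflicts
```

Running `find_conflicts(installed_versions())` after the install cell shows which packages need reinstalling. Uncommenting the pinned line can help, but only if it runs after the uberduck install rather than before it.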

How to train my voice for TTS, then voice-to-voice?

Hi guys, it's a bit hard for me to figure out which steps to follow to achieve something great.

I would like to train a voice first for TTS and then for voice-to-voice, in order to match other singers' samples.

How should I proceed? What do you use for voice-to-voice with RADTTS?

Thank you very much.

Website text-to-speech does not work, neither on Chrome nor on Safari (Mac)

I get a lot of errors in the console:

What uberduck really needs is the voice of Bruce Willis

This is a personal issue close to my heart.
When I was a child, we had the voice of Bruce the fuck Willis in our car's navigation system, and it talked exactly like Die Hard.
Unfortunately, I never found that again.
Not sure how you add voices, but Bruce Willis is surely a must-have.

Regarding Character Voice Cloning

I recently came back to the Uberduck.ai website and its text-to-speech page, and the site's character voice cloning feature is gone entirely. Why is that?

Regarding Character Voice Cloning 2: Test in Courts, and Make Compromises If Possible

I know it might be pointless to start an offshoot of the last conversation about the character voice cloning feature, but I am pleading with you, on behalf of the character voice cloning community, to test all of this in court and to make compromises if feasible.

Character voice cloning is very useful when creating fan-based content, especially Gmod animations. Besides, there are people like myself who want to use it for fun, and that's about it.

What I would propose is some sort of online virtual license for people who use the programs, which would also increase the accountability of those who use them. The important condition, however, is that personal information will NOT be shared with the authorities unless the person is abusing the program. The virtual license should also not expire, and it must be free to obtain, since the program is not a real-life sort of thing.

What I would also want to see in turn is the rule of law within the community, where everybody is equal under the rules, particularly the people running the program.

Datasets used

Are the datasets used available? I am interested in the Transformers and Ratatouille ones.

Pretrained models issue

Training a Mellotron model is pretty difficult if you don't have 10,000 wav files in your dataset... Are any current models being trained to be used for warm starting / transfer learning?

The pretrained models in Nvidia's official Mellotron implementation are adequate, but they haven't been successful in my tests. The LJS (LJSpeech) model is the only one I've gotten good quality from, but training still takes hours (5+ hours for 600 wavs to sound decently good), whereas regular Tacotron2 models take about 1 to 2 hours for the same number of wavs.

Any info on training or available models? If so, what is the dataset, and what is the ETA (if possible)?

Tests passing locally but not in github action

fft_window = pad_center(fft_window, filter_length)
TypeError: pad_center() takes 1 positional argument but 2 were given

I think it could be a librosa versioning issue caused by the recent 0.10.0 release. The tests pass with librosa==0.8.0, so I updated the dependency. Let's see what happens.
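In librosa 0.10, `librosa.util.pad_center` takes `size` as a keyword-only argument, which is why the positional call fails. Calling `pad_center(fft_window, size=filter_length)` works with both 0.8 and 0.10, so the call site could be fixed instead of pinning. If you'd rather not depend on librosa's signature at all, the same centering is easy to reproduce with NumPy. `pad_center_np` is a hypothetical helper that only handles 1-D windows.

```python
import numpy as np


def pad_center_np(data, size):
    """Zero-pad a 1-D array on both sides so it has length `size`.

    Mirrors librosa's centering: the left side gets floor((size - n) / 2)
    zeros and the right side gets the remainder.
    """
    n = data.shape[-1]
    if size < n:
        raise ValueError(f"target size ({size}) must be at least input length ({n})")
    lpad = (size - n) // 2
    return np.pad(data, (lpad, size - n - lpad), mode="constant")


# Equivalent librosa call that works on both 0.8.x and 0.10.x:
#   fft_window = librosa.util.pad_center(fft_window, size=filter_length)
```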

Fix: NoneType for warm start

If the user wants to warm_start() from a checkpoint that contains optimizer state, and their config sets ignore_layers to null, training fails to start, because len(None) raises a TypeError. A fix, in this file:

https://github.com/uberduck-ai/uberduck-ml-dev/blob/master/uberduck_ml_dev/trainer/base.py

Change line 139 from:
if "optimizer" in checkpoint and len(self.ignore_layers) == 0:

to:
if "optimizer" in checkpoint and self.ignore_layers is None:
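One caveat with a pure `None` check: a config that sets `ignore_layers` to an empty list would then skip restoring the optimizer, which the original `len(...) == 0` check allowed. A condition that treats `null` and `[]` the same way avoids both problems. The in-place equivalent is `not self.ignore_layers`, and `should_load_optimizer` is a hypothetical standalone version of that check.

```python
def should_load_optimizer(checkpoint, ignore_layers):
    """Return True when optimizer state can be restored from a warm-start checkpoint.

    Treats ignore_layers=None (null in the JSON config) the same as an empty
    list, so neither case crashes on len(None); any ignored layer still
    disables optimizer loading, as in the original check.
    """
    return "optimizer" in checkpoint and not ignore_layers
```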

Packaging model for release

Hi there,

Will there be an option in the future to package the model along with its training configuration (HParams) inside a zip archive? This could help with organization and make inference easier.
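Until something like that exists, a checkpoint and its hparams can be bundled with the standard library's `zipfile` and `json` modules. This is a minimal sketch. The archive layout (the checkpoint under its own file name plus a `config.json`) and both helper names are assumptions, not an uberduck_ml_dev format.

```python
import json
import zipfile
from pathlib import Path


def package_model(checkpoint_path, hparams, archive_path):
    """Write the checkpoint file and its hparams (as config.json) into one zip."""
    checkpoint_path = Path(checkpoint_path)
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(checkpoint_path, arcname=checkpoint_path.name)
        zf.writestr("config.json", json.dumps(hparams, indent=2, sort_keys=True))


def read_packaged_hparams(archive_path):
    """Return the hparams dict stored in a packaged model archive."""
    with zipfile.ZipFile(archive_path) as zf:
        return json.loads(zf.read("config.json"))
```

The hparams must be JSON-serializable. A plain dict of strings, numbers, lists, and None, like the one the trainer prints at startup, qualifies.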

Inferencing the model

Is there currently any way to run inference with the model and produce output?

Windows compatibility issues: UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 925: character maps to <undefined>

Running: pip install git+https://github.com/uberduck-ai/uberduck-ml-dev.git

fails with: UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 925: character maps to <undefined>
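'charmap' is the codec Python uses for Windows code pages such as cp1252. Byte 0x8f has no mapping in cp1252, but it is an ordinary continuation byte in UTF-8. A likely (unconfirmed) cause is the package's setup script opening a UTF-8 text file without `encoding=`, so on Windows the file is decoded with the locale code page. The sketch shows the failure mode and the explicit-encoding read that avoids it. `decodes_as` and `read_text_utf8` are hypothetical helper names. Setting the environment variable `PYTHONUTF8=1` (Python 3.7+ UTF-8 mode) before running pip may also work around it without changing the package.

```python
def decodes_as(data, encoding):
    """Return True if `data` decodes cleanly with `encoding`."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


def read_text_utf8(path):
    """Read a text file as UTF-8 instead of the platform default (often cp1252 on Windows)."""
    with open(path, encoding="utf-8") as f:
        return f.read()


# "Ï" (U+00CF) is encoded in UTF-8 as b"\xc3\x8f"; 0x8f has no cp1252 mapping.
sample = "Ï".encode("utf-8")
```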

Keyboard interrupt catch

Sometimes things don't go your way, and you're already deep down the rabbit hole. Maybe you've spent 2 hours training a model and only now realize you want to change some config, but you've set your epoch checkpoint interval way too high. Would it be possible to catch a keyboard interrupt during training and do a graceful shutdown (save a checkpoint and such)?
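A minimal sketch of the idea, assuming the trainer's epoch loop can be wrapped: Ctrl+C raises `KeyboardInterrupt` in the main thread, so catching it around the loop gives one last chance to save. `train_one_epoch` and `save_checkpoint` are hypothetical callables standing in for the trainer's own logic. `epochs_per_checkpoint` mirrors the hparam of the same name.

```python
def train_with_graceful_interrupt(n_epochs, train_one_epoch, save_checkpoint,
                                  epochs_per_checkpoint=10):
    """Run training; on Ctrl+C, save once more before returning.

    Returns (last_completed_epoch, interrupted). The checkpoint saved on
    interrupt is tagged with the last fully completed epoch, although the
    model state may already include part of the interrupted epoch.
    """
    last_completed = -1
    try:
        for epoch in range(n_epochs):
            train_one_epoch(epoch)
            last_completed = epoch
            if (epoch + 1) % epochs_per_checkpoint == 0:
                save_checkpoint(epoch)
    except KeyboardInterrupt:
        if last_completed >= 0:
            save_checkpoint(last_completed)  # graceful shutdown: keep the progress
        return last_completed, True
    return last_completed, False
```

With DataLoader worker processes the interrupt can also surface in the workers. Catching it only in the main loop, as here, is the simplest place to start.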