rvc-boss / gpt-sovits
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
License: MIT License
As the title says: GPT_SoVITS/prepare_datasets/3-get-semantic.py", line 59, in vq_model.load_state_dict raises size mismatch for enc_p.text_embedding.weight: copying a param with shape torch.Size([322, 192]) from checkpoint, the shape in current model is torch.Size([151, 192]).
Is this caused by using the wrong pretrained model?
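A size mismatch like the one above means the checkpoint and the model built by the current code disagree about a parameter's shape. The sketch below is a plain-Python helper, not part of the project, that lists such disagreements from two name-to-shape dictionaries. With PyTorch, those dictionaries could be built with `{k: tuple(v.shape) for k, v in state_dict.items()}`. The function name `find_shape_mismatches` is hypothetical. The first dimension of a text embedding is normally the size of the symbol table, so a 322 vs 151 difference likely points to a checkpoint and a code version that use different symbol sets, but that is an inference, not something confirmed in the thread.

```python
def find_shape_mismatches(checkpoint_shapes, model_shapes):
    """Return {name: (checkpoint_shape, model_shape)} for parameters whose shapes differ."""
    mismatches = {}
    for name, ckpt_shape in checkpoint_shapes.items():
        model_shape = model_shapes.get(name)
        # Only compare parameters that exist on both sides.
        if model_shape is not None and tuple(ckpt_shape) != tuple(model_shape):
            mismatches[name] = (tuple(ckpt_shape), tuple(model_shape))
    return mismatches


# Shapes copied from the error message above.
checkpoint_shapes = {"enc_p.text_embedding.weight": (322, 192)}
model_shapes = {"enc_p.text_embedding.weight": (151, 192)}

for name, (ckpt, model) in find_shape_mismatches(checkpoint_shapes, model_shapes).items():
    print(f"{name}: checkpoint {ckpt} vs model {model}")
```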
GPT fine-tuning fails, although 1Ba-SoVITS training works. The GPT training error:
"/data/home/miniconda3/envs/GPTSoVits/bin/python" GPT_SoVITS/s1_train.py --config_file "TEMP/tmp_s1.yaml"
Seed set to 1234
Using 16bit Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
<All keys matched successfully>
ckpt_path: None
[rank: 0] Seed set to 1234
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1
----------------------------------------------------------------------------------------------------
distributed_backend=nccl
All distributed processes registered. Starting with 1 processes
----------------------------------------------------------------------------------------------------
semantic_data_len: 0
phoneme_data_len: 5
Empty DataFrame
Columns: [item_name, semantic_audio]
Index: []
Traceback (most recent call last):
File "/data/home/GPT-SoVITS/GPT_SoVITS/s1_train.py", line 171, in <module>
main(args)
File "/data/home/GPT-SoVITS/GPT_SoVITS/s1_train.py", line 147, in main
trainer.fit(model, data_module, ckpt_path=ckpt_path)
File "/data/home/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 544, in fit
call._call_and_handle_interrupt(
File "/data/home/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/pytorch_lightning/trainer/call.py", line 43, in _call_and_handle_interrupt
return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
File "/data/home/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 102, in launch
return function(*args, **kwargs)
File "/data/home/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 580, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "/data/home/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 950, in _run
call._call_setup_hook(self) # allow user to setup lightning_module in accelerator environment
File "/data/home/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/pytorch_lightning/trainer/call.py", line 92, in _call_setup_hook
_call_lightning_datamodule_hook(trainer, "setup", stage=fn)
File "/data/home/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/pytorch_lightning/trainer/call.py", line 179, in _call_lightning_datamodule_hook
return fn(*args, **kwargs)
File "/data/home/GPT-SoVITS/GPT_SoVITS/AR/data/data_module.py", line 29, in setup
self._train_dataset = Text2SemanticDataset(
File "/data/home/GPT-SoVITS/GPT_SoVITS/AR/data/dataset.py", line 107, in __init__
self.init_batch()
File "/data/home/GPT-SoVITS/GPT_SoVITS/AR/data/dataset.py", line 187, in init_batch
for _ in range(max(2, int(min_num / leng))):
ZeroDivisionError: division by zero
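I suspect the previous step (training-set formatting with SSL extraction enabled) already had a problem, but I don't know whether it is related.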
"/data/home/miniconda3/envs/GPTSoVits/bin/python" GPT_SoVITS/prepare_datasets/2-get-hubert-wav32k.py
"/data/home/miniconda3/envs/GPTSoVits/bin/python" GPT_SoVITS/prepare_datasets/2-get-hubert-wav32k.py
Some weights of the model checkpoint at GPT_SoVITS/pretrained_models/chinese-hubert-base were not used when initializing HubertModel: ['encoder.pos_conv_embed.conv.weight_g', 'encoder.pos_conv_embed.conv.weight_v']
- This IS expected if you are initializing HubertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing HubertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of the model checkpoint at GPT_SoVITS/pretrained_models/chinese-hubert-base were not used when initializing HubertModel: ['encoder.pos_conv_embed.conv.weight_g', 'encoder.pos_conv_embed.conv.weight_v']
- This IS expected if you are initializing HubertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing HubertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of HubertModel were not initialized from the model checkpoint at GPT_SoVITS/pretrained_models/chinese-hubert-base and are newly initialized: ['encoder.pos_conv_embed.conv.parametrizations.weight.original1', 'encoder.pos_conv_embed.conv.parametrizations.weight.original0']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Some weights of HubertModel were not initialized from the model checkpoint at GPT_SoVITS/pretrained_models/chinese-hubert-base and are newly initialized: ['encoder.pos_conv_embed.conv.parametrizations.weight.original1', 'encoder.pos_conv_embed.conv.parametrizations.weight.original0']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
"/data/home/miniconda3/envs/GPTSoVits/bin/python" GPT_SoVITS/prepare_datasets/3-get-semantic.py
"/data/home/miniconda3/envs/GPTSoVits/bin/python" GPT_SoVITS/prepare_datasets/3-get-semantic.py
/data/home/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/torch/nn/utils/weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
/data/home/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/torch/nn/utils/weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
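The GPT training log above prints `semantic_data_len: 0` and an empty DataFrame with columns `item_name` and `semantic_audio` just before the `ZeroDivisionError`, so the division by zero appears to come from an empty semantic-token table rather than from the trainer itself. The sketch below is one way to check this before training. It assumes the semantic tokens are stored in a tab-separated file with a header row; the path is left as a parameter because the file name is not given in the thread. `count_data_rows` and `repeat_factor` are hypothetical helpers, and the value 100 in the usage note is illustrative only.

```python
import csv


def count_data_rows(tsv_path, has_header=True):
    """Count non-empty data rows in a tab-separated file."""
    with open(tsv_path, newline="", encoding="utf-8") as f:
        rows = [row for row in csv.reader(f, delimiter="\t") if row]
    return max(0, len(rows) - 1) if has_header else len(rows)


def repeat_factor(min_num, leng):
    """Same expression as the failing line, with an explicit check for an empty dataset."""
    if leng == 0:
        raise ValueError(
            "no semantic rows found; re-run semantic token extraction before GPT training"
        )
    return max(2, int(min_num / leng))
```

For example, `repeat_factor(100, 5)` returns 20, while `repeat_factor(100, 0)` raises a readable error instead of `ZeroDivisionError`.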
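You know vitsvc, right? It can be trained on text or on ppg. For example, the vc in the original paper is trained on text, while sovits uses something like whisper ppg or hubert+vq.
However, extracting hubert directly from the reference wav and then running vq at inference leaks timbre. So the author uses a GPT model to predict hubert+vq from text, with the reference timbre as the prompt. The hubert+vq generated at inference then leaks less timbre. In other words, a similar scheme that predicts whisper ppg instead would also work.
But since the overall topline is a vitsvc based on pretrained hubert+vq, the video shows that the zero-shot ability is not especially strong, because vitsvc itself was never meant for zero-shot. So overall this is not a large model. Still, because it improves on the vitsvc approach and reduces timbre leakage, few-shot works, and it is a fairly practical model. If vitsvc were turned into a zero-shot vitsvc, it could become a large model. Since semantic based vc can be trained on noisy data, throwing a lot of data at it might well turn it into a large model.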
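The "hubert+vq" units in the explanation above are discrete tokens: each continuous feature frame is replaced by the index of its nearest entry in a learned codebook. The toy sketch below shows only that nearest-neighbour lookup. The codebook, the two-dimensional frames, and the function name `quantize` are made up for illustration and are far smaller than anything a real model uses. In the scheme described above, the GPT model would predict such token indices from text, with the reference audio as a prompt.

```python
def quantize(frames, codebook):
    """Map each feature frame to the index of its nearest codebook vector."""

    def sq_dist(a, b):
        # Squared Euclidean distance between two equal-length vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(codebook)), key=lambda i: sq_dist(frame, codebook[i]))
        for frame in frames
    ]


# Toy 2-D codebook with three entries, and four toy feature frames.
codebook = [(0.0, 0.0), (1.0, 1.0), (0.0, 2.0)]
frames = [(0.1, -0.1), (0.9, 1.2), (0.2, 1.8), (0.6, 0.6)]

tokens = quantize(frames, codebook)
print(tokens)  # [0, 1, 2, 1]
```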
Running ASR raises an error:
KeyError: 'funasr-pipeline is not in the pipelines registry group auto-speech-recognition. Please make sure the correct version of ModelScope library is used.'
How can this be fixed?
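The error message itself asks for "the correct version of ModelScope library", so a reasonable first step is to record which versions are actually installed in the environment that runs the ASR script. The sketch below uses only the standard library. The package names `modelscope` and `funasr` are taken from the error text and are assumed to match the installed distribution names. Which versions work together is not stated in the thread.

```python
from importlib import metadata


def installed_versions(packages=("modelscope", "funasr")):
    """Return {package: version string, or None if the package is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


print(installed_versions())
```

Run it with the same interpreter that launches the WebUI (for example the `runtime\python` shown in other reports on this page), since a different environment may have different packages.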
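Hello, after clicking "Start GPT training" I got the following error. How can I fix it?
Clicking "Start SoVITS training" works fine.
I am running the full package on Windows 10.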
"runtime\python" GPT_SoVITS/s1_train.py --config_file "TEMP/tmp_s1.yaml"
Seed set to 1234
Using 16bit Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
<All keys matched successfully>
ckpt_path: None
[rank: 0] Seed set to 1234
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1
Traceback (most recent call last):
File "I:\GPT-SoVITS\GPT-SoVITS\GPT_SoVITS\s1_train.py", line 138, in <module>
main(args)
File "I:\GPT-SoVITS\GPT-SoVITS\GPT_SoVITS\s1_train.py", line 115, in main
trainer.fit(model, data_module, ckpt_path=ckpt_path)
File "I:\GPT-SoVITS\GPT-SoVITS\runtime\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 544, in fit
call._call_and_handle_interrupt(
File "I:\GPT-SoVITS\GPT-SoVITS\runtime\lib\site-packages\pytorch_lightning\trainer\call.py", line 43, in _call_and_handle_interrupt
return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
File "I:\GPT-SoVITS\GPT-SoVITS\runtime\lib\site-packages\pytorch_lightning\strategies\launchers\subprocess_script.py", line 102, in launch
return function(*args, **kwargs)
File "I:\GPT-SoVITS\GPT-SoVITS\runtime\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 580, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "I:\GPT-SoVITS\GPT-SoVITS\runtime\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 947, in _run
self.strategy.setup_environment()
File "I:\GPT-SoVITS\GPT-SoVITS\runtime\lib\site-packages\pytorch_lightning\strategies\ddp.py", line 148, in setup_environment
self.setup_distributed()
File "I:\GPT-SoVITS\GPT-SoVITS\runtime\lib\site-packages\pytorch_lightning\strategies\ddp.py", line 199, in setup_distributed
_init_dist_connection(self.cluster_environment, self._process_group_backend, timeout=self._timeout)
File "I:\GPT-SoVITS\GPT-SoVITS\runtime\lib\site-packages\lightning_fabric\utilities\distributed.py", line 290, in _init_dist_connection
torch.distributed.init_process_group(torch_distributed_backend, rank=global_rank, world_size=world_size, **kwargs)
File "I:\GPT-SoVITS\GPT-SoVITS\runtime\lib\site-packages\torch\distributed\distributed_c10d.py", line 888, in init_process_group
store, rank, world_size = next(rendezvous_iterator)
File "I:\GPT-SoVITS\GPT-SoVITS\runtime\lib\site-packages\torch\distributed\rendezvous.py", line 245, in _env_rendezvous_handler
store = _create_c10d_store(master_addr, master_port, rank, world_size, timeout)
File "I:\GPT-SoVITS\GPT-SoVITS\runtime\lib\site-packages\torch\distributed\rendezvous.py", line 176, in _create_c10d_store
return TCPStore(
RuntimeError: unmatched '}' in format string
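The sliced audio clips are all under 5 s, but when SSL extraction runs, the target files are not generated under the corresponding 4-cnhubert directory. The reason is that execution reaches: if np.isnan(ssl.detach().numpy()).sum()!= 0:return, and exits early. What kind of data triggers this early return, and what should my audio look like for extraction to succeed?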
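The thread does not say which clips make the quoted NaN guard fire. One thing that can be checked cheaply beforehand is whether a clip's samples are themselves empty, non-finite, or essentially silent. The sketch below is a standard-library screen along those lines. It assumes the samples are already decoded to floats in roughly [-1, 1]. The function name `screen_samples` and the `silence_threshold` value are hypothetical, and a clip passing this screen is not guaranteed to produce NaN-free SSL features.

```python
import math


def screen_samples(samples, silence_threshold=1e-4):
    """Return a list of reasons a clip looks suspicious; an empty list means none found."""
    problems = []
    if not samples:
        problems.append("empty clip")
        return problems
    if any(not math.isfinite(s) for s in samples):
        problems.append("non-finite sample (NaN or inf)")
    finite = [abs(s) for s in samples if math.isfinite(s)]
    # Peak amplitude below the threshold is treated as silence (assumed cutoff).
    if finite and max(finite) < silence_threshold:
        problems.append("near-silent clip")
    return problems


print(screen_samples([0.1, -0.2, 0.05]))
print(screen_samples([0.0] * 10))
```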
colab run webui.py
/content/GPT-SoVITS
Traceback (most recent call last):
File "/content/GPT-SoVITS/webui.py", line 17, in <module>
with open("%s/users.pth"%(site_packages_root),"w")as f:
FileNotFoundError: [Errno 2] No such file or directory: '/content/GPT-SoVITS/runtime/Lib/site-packages/users.pth'
Could you provide a Colab notebook?
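The traceback above fails because `open(..., "w")` does not create missing parent directories, and `/content/GPT-SoVITS/runtime/Lib/site-packages` does not exist on Colab. The sketch below creates the directory before writing. The helper name `write_pth_file` is hypothetical and this is not the project's own fix. It only avoids the `FileNotFoundError`: a `.pth` file takes effect only in a directory that Python actually scans as a site directory, so on Colab the file may be written without changing the import path.

```python
import os


def write_pth_file(site_packages_root, lines, filename="users.pth"):
    """Create the target directory if needed, then write one path per line."""
    os.makedirs(site_packages_root, exist_ok=True)
    path = os.path.join(site_packages_root, filename)
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines))
    return path
```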
Environment: pytorch2.1.0, py310, cu118, ubuntu22.
Text processing and SSL self-supervised feature extraction both run normally, but semantic token extraction fails immediately:
"/opt/conda/bin/python" GPT_SoVITS/prepare_datasets/3-get-semantic.py
"/opt/conda/bin/python" GPT_SoVITS/prepare_datasets/3-get-semantic.py
Traceback (most recent call last):
File "/root/GPT-SoVITS/GPT_SoVITS/prepare_datasets/3-get-semantic.py", line 42, in
hps = utils.get_hparams_from_file(s2config_path)
AttributeError: module 'utils' has no attribute 'get_hparams_from_file'
Traceback (most recent call last):
File "/root/GPT-SoVITS/GPT_SoVITS/prepare_datasets/3-get-semantic.py", line 42, in
hps = utils.get_hparams_from_file(s2config_path)
AttributeError: module 'utils' has no attribute 'get_hparams_from_file'
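An `AttributeError` of this form often means that `import utils` resolved to a different module than the one the script expects, for example an unrelated package named `utils` found earlier on `sys.path`. That cause is an assumption here, since the thread does not confirm it. The sketch below shows where a module name would be loaded from, using only the standard library. The helper name `module_origin` is hypothetical.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module name would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# Run this from the same working directory and interpreter as the failing script.
print(module_origin("utils"))
```

If the printed path is not inside the GPT_SoVITS directory, the wrong `utils` is being picked up.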
My data contains only Chinese (about 1 min of voice). Can few-shot learning still support both Chinese and English?
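After fine-tuning GPT for 15 epochs on English data, the synthesized speech at inference is exactly the same as the reference audio. Switching GPT back to the base model removes the problem. Reducing GPT fine-tuning to 5 epochs makes the output drop some words.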
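Could people post their synthesized audio alongside the original dataset audio, so we can compare results?
Ideally also state the sample duration and the fine-tuning settings, such as the number of epochs.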
I noticed that the author shared an explanation video, but it covers the principles of the cloning part.
I would therefore like to ask whether an explanation and a structure diagram of the TTS part will be shared in the future.
Thank you very much!
As the title says. Thanks.
modelscope - WARNING - ('PIPELINES', 'auto-speech-recognition', 'funasr-pipeline') not found in ast index file
Traceback (most recent call last):
File "D:\ai\GPT-SoVITS\tools\damo_asr\cmd-asr.py", line 9, in <module>
inference_pipeline = pipeline(
File "D:\software\miniconda\envs\GPTSoVits\lib\site-packages\modelscope\pipelines\builder.py", line 170, in pipeline
return build_pipeline(cfg, task_name=task)
File "D:\software\miniconda\envs\GPTSoVits\lib\site-packages\modelscope\pipelines\builder.py", line 65, in build_pipeline
return build_from_cfg(
File "D:\software\miniconda\envs\GPTSoVits\lib\site-packages\modelscope\utils\registry.py", line 198, in build_from_cfg
raise KeyError(
KeyError: 'funasr-pipeline is not in the pipelines registry group auto-speech-recognition. Please make sure the correct version of ModelScope library is used.'
Traceback (most recent call last):
File "/media/dell/work/workspaces/GPT-SoVITS/tools/damo_asr/cmd-asr.py", line 19, in
text = inference_pipeline(audio_in="%s/%s"%(dir,name))["text"]
File "/media/dell/work/workspaces/GPT-SoVITS/modelscope/package/modelscope/pipelines/audio/funasr_pipeline.py", line 73, in call
output = self.model(*args, **kwargs)
File "/media/dell/work/workspaces/GPT-SoVITS/modelscope/package/modelscope/models/base/base_model.py", line 35, in call
return self.postprocess(self.forward(*args, **kwargs))
File "/media/dell/work/workspaces/GPT-SoVITS/modelscope/package/modelscope/models/audio/funasr/model.py", line 61, in forward
output = self.model.generate(*args, **kwargs)
TypeError: generate() missing 1 required positional argument: 'input'
The output is shown above. How can this be resolved?
Cmd output is
D:\GPT-SoVITS-main>runtime\python.exe webui.py
The system cannot find the path specified.
D:\GPT-SoVITS-main>pause
Press any key to continue . . .
The slicer tool resamples the input files to 32000 Hz. Is this the preferred sample rate for fine-tuning the model? My original audio has a sample rate of 44100 Hz. Should I keep it at 44100 Hz or resample it to 32000 Hz?
GPT-SoVITS/tools/slice_audio.py
Lines 17 to 24 in 9619223
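The question above is not answered in this thread, but the SoVITS training config printed further down this page also lists `'sampling_rate': 32000`, which is consistent with 32000 Hz being the rate the pipeline works at. The sketch below only reports whether a file already has that rate. It uses the standard-library `wave` module, so it handles uncompressed PCM WAV files only. The names `wav_sample_rate` and `needs_resampling` are hypothetical.

```python
import wave

TARGET_RATE = 32000  # rate mentioned in the question above


def wav_sample_rate(path):
    """Read the sample rate from a PCM WAV file header."""
    with wave.open(path, "rb") as w:
        return w.getframerate()


def needs_resampling(path, target_rate=TARGET_RATE):
    """True if the file's sample rate differs from the target rate."""
    return wav_sample_rate(path) != target_rate
```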
Running the one-click three-step formatting reports an error:
Some weights of HubertModel were not initialized from the model checkpoint at GPT_SoVITS/pretrained_models/chinese-hubert-base and are newly initialized: ['encoder.pos_conv_embed.conv.parametrizations.weight.original0', 'encoder.pos_conv_embed.conv.parametrizations.weight.original1']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
My models were downloaded in advance.
When synthesizing speech, the progress bar has a total of 1500, but it always ends at around 300 to 400, and then this is displayed:
handle: <Handle _ProactorBasePipeTransport._call_connection_lost(None)> Traceback (most recent call last): File "asyncio\events.py", line 80, in _run File "asyncio\proactor_events.py", line 162, in _call_connection_lost ConnectionResetError: [WinError 10054] 远程主机强迫关闭了一个现有的连接。
The speech synthesis still completes, though. I feel the quality is not great. Is that because the progress bar did not finish?
Thank you for your outstanding work. I am trying it out. In a new environment created by following the README, I can slice audio normally, but I cannot use ASR. Is something wrong with ModelScope?
KeyError: 'funasr-pipeline is not in the pipelines registry group auto-speech-recognition. Please make sure the correct version of ModelScope library is used.'
I would be extremely grateful if you could answer.
Hi! This project seems extremely promising. I was wondering whether you plan to support scripts or instructions for training models in other (non-English) languages. For example, there are multiple BERT versions for my native language, but none of them can output mel tokens based on a trained dataset.
Could you recommend how to accomplish this within the scope of this project? I could also try to work out the code myself and share my findings if I manage it. Do you have a Discord server where I could discuss further how this TTS pipeline works?
Love what you guys have done with this project.
The log suggests that the FFmpeg libraries need to be installed, but I am launching with go-webui.bat on Windows, and ffmpeg.exe and ffprobe.exe are present in the directory:
"runtime\python" GPT_SoVITS/inference_webui.py
DEBUG:torchaudio._extension:Failed to initialize ffmpeg bindings
Traceback (most recent call last):
File "C:\Users\userme\Downloads\GPT-SoVITS\runtime\lib\site-packages\torchaudio\_extension\utils.py", line 85, in _init_ffmpeg
_load_lib("libtorchaudio_ffmpeg")
File "C:\Users\userme\Downloads\GPT-SoVITS\runtime\lib\site-packages\torchaudio\_extension\utils.py", line 61, in _load_lib
torch.ops.load_library(path)
File "C:\Users\userme\Downloads\GPT-SoVITS\runtime\lib\site-packages\torch\_ops.py", line 643, in load_library
ctypes.CDLL(path)
File "ctypes\__init__.py", line 374, in __init__
FileNotFoundError: Could not find module 'C:\Users\userme\Downloads\GPT-SoVITS\runtime\Lib\site-packages\torchaudio\lib\libtorchaudio_ffmpeg.pyd' (or one of its dependencies). Try using the full path with constructor syntax.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\userme\Downloads\GPT-SoVITS\runtime\lib\site-packages\torchaudio\_extension\__init__.py", line 67, in <module>
_init_ffmpeg()
File "C:\Users\userme\Downloads\GPT-SoVITS\runtime\lib\site-packages\torchaudio\_extension\utils.py", line 87, in _init_ffmpeg
raise ImportError("FFmpeg libraries are not found. Please install FFmpeg.") from err
ImportError: FFmpeg libraries are not found. Please install FFmpeg.
_IncompatibleKeys(missing_keys=['enc_q.pre.weight', 'enc_q.pre.bias', 'enc_q.enc.in_layers.0.bias', 'enc_q.enc.in_layers.0.weight_g', 'enc_q.enc.in_layers.0.weight_v', 'enc_q.enc.in_layers.1.bias', 'enc_q.enc.in_layers.1.weight_g', 'enc_q.enc.in_layers.1.weight_v', 'enc_q.enc.in_layers.2.bias', 'enc_q.enc.in_layers.2.weight_g', 'enc_q.enc.in_layers.2.weight_v', 'enc_q.enc.in_layers.3.bias', 'enc_q.enc.in_layers.3.weight_g', 'enc_q.enc.in_layers.3.weight_v', 'enc_q.enc.in_layers.4.bias', 'enc_q.enc.in_layers.4.weight_g', 'enc_q.enc.in_layers.4.weight_v', 'enc_q.enc.in_layers.5.bias', 'enc_q.enc.in_layers.5.weight_g', 'enc_q.enc.in_layers.5.weight_v', 'enc_q.enc.in_layers.6.bias', 'enc_q.enc.in_layers.6.weight_g', 'enc_q.enc.in_layers.6.weight_v', 'enc_q.enc.in_layers.7.bias', 'enc_q.enc.in_layers.7.weight_g', 'enc_q.enc.in_layers.7.weight_v', 'enc_q.enc.in_layers.8.bias', 'enc_q.enc.in_layers.8.weight_g', 'enc_q.enc.in_layers.8.weight_v', 'enc_q.enc.in_layers.9.bias', 'enc_q.enc.in_layers.9.weight_g', 'enc_q.enc.in_layers.9.weight_v', 'enc_q.enc.in_layers.10.bias', 'enc_q.enc.in_layers.10.weight_g', 'enc_q.enc.in_layers.10.weight_v', 'enc_q.enc.in_layers.11.bias', 'enc_q.enc.in_layers.11.weight_g', 'enc_q.enc.in_layers.11.weight_v', 'enc_q.enc.in_layers.12.bias', 'enc_q.enc.in_layers.12.weight_g', 'enc_q.enc.in_layers.12.weight_v', 'enc_q.enc.in_layers.13.bias', 'enc_q.enc.in_layers.13.weight_g', 'enc_q.enc.in_layers.13.weight_v', 'enc_q.enc.in_layers.14.bias', 'enc_q.enc.in_layers.14.weight_g', 'enc_q.enc.in_layers.14.weight_v', 'enc_q.enc.in_layers.15.bias', 'enc_q.enc.in_layers.15.weight_g', 'enc_q.enc.in_layers.15.weight_v', 'enc_q.enc.res_skip_layers.0.bias', 'enc_q.enc.res_skip_layers.0.weight_g', 'enc_q.enc.res_skip_layers.0.weight_v', 'enc_q.enc.res_skip_layers.1.bias', 'enc_q.enc.res_skip_layers.1.weight_g', 'enc_q.enc.res_skip_layers.1.weight_v', 'enc_q.enc.res_skip_layers.2.bias', 'enc_q.enc.res_skip_layers.2.weight_g', 
'enc_q.enc.res_skip_layers.2.weight_v', 'enc_q.enc.res_skip_layers.3.bias', 'enc_q.enc.res_skip_layers.3.weight_g', 'enc_q.enc.res_skip_layers.3.weight_v', 'enc_q.enc.res_skip_layers.4.bias', 'enc_q.enc.res_skip_layers.4.weight_g', 'enc_q.enc.res_skip_layers.4.weight_v', 'enc_q.enc.res_skip_layers.5.bias', 'enc_q.enc.res_skip_layers.5.weight_g', 'enc_q.enc.res_skip_layers.5.weight_v', 'enc_q.enc.res_skip_layers.6.bias', 'enc_q.enc.res_skip_layers.6.weight_g', 'enc_q.enc.res_skip_layers.6.weight_v', 'enc_q.enc.res_skip_layers.7.bias', 'enc_q.enc.res_skip_layers.7.weight_g', 'enc_q.enc.res_skip_layers.7.weight_v', 'enc_q.enc.res_skip_layers.8.bias', 'enc_q.enc.res_skip_layers.8.weight_g', 'enc_q.enc.res_skip_layers.8.weight_v', 'enc_q.enc.res_skip_layers.9.bias', 'enc_q.enc.res_skip_layers.9.weight_g', 'enc_q.enc.res_skip_layers.9.weight_v', 'enc_q.enc.res_skip_layers.10.bias', 'enc_q.enc.res_skip_layers.10.weight_g', 'enc_q.enc.res_skip_layers.10.weight_v', 'enc_q.enc.res_skip_layers.11.bias', 'enc_q.enc.res_skip_layers.11.weight_g', 'enc_q.enc.res_skip_layers.11.weight_v', 'enc_q.enc.res_skip_layers.12.bias', 'enc_q.enc.res_skip_layers.12.weight_g', 'enc_q.enc.res_skip_layers.12.weight_v', 'enc_q.enc.res_skip_layers.13.bias', 'enc_q.enc.res_skip_layers.13.weight_g', 'enc_q.enc.res_skip_layers.13.weight_v', 'enc_q.enc.res_skip_layers.14.bias', 'enc_q.enc.res_skip_layers.14.weight_g', 'enc_q.enc.res_skip_layers.14.weight_v', 'enc_q.enc.res_skip_layers.15.bias', 'enc_q.enc.res_skip_layers.15.weight_g', 'enc_q.enc.res_skip_layers.15.weight_v', 'enc_q.enc.cond_layer.bias', 'enc_q.enc.cond_layer.weight_g', 'enc_q.enc.cond_layer.weight_v', 'enc_q.proj.weight', 'enc_q.proj.bias'], unexpected_keys=[])
Number of parameter: 77.49M
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): checkip.amazonaws.com:443
DEBUG:urllib3.connectionpool:https://checkip.amazonaws.com:443 "GET / HTTP/1.1" 200 16
DEBUG:charset_normalizer:Encoding detection: ascii is most likely the one.
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): api.gradio.app:443
DEBUG:urllib3.connectionpool:https://api.gradio.app:443 "POST /gradio-initiated-analytics/ HTTP/1.1" 200 None
DEBUG:markdown_it.rules_block.code:entering code: StateBlock(line=0,level=0,tokens=0), 0, 1, False
DEBUG:markdown_it.rules_block.fence:entering fence: StateBlock(line=0,level=0,tokens=0), 0, 1, False
DEBUG:markdown_it.rules_block.blockquote:entering blockquote: StateBlock(line=0,level=0,tokens=0), 0, 1, False
DEBUG:markdown_it.rules_block.hr:entering hr: StateBlock(line=0,level=0,tokens=0), 0, 1, False
DEBUG:markdown_it.rules_block.list:entering list: StateBlock(line=0,level=0,tokens=0), 0, 1, False
DEBUG:markdown_it.rules_block.reference:entering reference: StateBlock(line=0,level=0,tokens=0), 0, 1, False
DEBUG:markdown_it.rules_block.html_block:entering html_block: StateBlock(line=0,level=0,tokens=0), 0, 1, False
DEBUG:markdown_it.rules_block.heading:entering heading: StateBlock(line=0,level=0,tokens=0), 0, 1, False
DEBUG:markdown_it.rules_block.lheading:entering lheading: StateBlock(line=0,level=0,tokens=0), 0, 1, False
DEBUG:markdown_it.rules_block.paragraph:entering paragraph: StateBlock(line=0,level=0,tokens=0), 0, 1, False
DEBUG:root:Using proactor: IocpProactor
DEBUG:root:Using proactor: IocpProactor
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): localhost:9872
DEBUG:urllib3.connectionpool:http://localhost:9872 "GET /startup-events HTTP/1.1" 200 4
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): localhost:9872
DEBUG:urllib3.connectionpool:http://localhost:9872 "HEAD / HTTP/1.1" 200 0
Running on local URL: http://0.0.0.0:9872
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): api.gradio.app:443
DEBUG:urllib3.connectionpool:https://api.gradio.app:443 "POST /gradio-launched-analytics/ HTTP/1.1" 200 None
Building prefix dict from the default dictionary ...
DEBUG:jieba:Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\userme\Downloads\GPT-SoVITS\TEMP\jieba.cache
DEBUG:jieba:Loading model from cache C:\Users\userme\Downloads\GPT-SoVITS\TEMP\jieba.cache
Loading model cost 0.717 seconds.
DEBUG:jieba:Loading model cost 0.717 seconds.
Prefix dict has been built successfully.
DEBUG:jieba:Prefix dict has been built successfully.
19%|██████████████▋ | 279/1500 [00:07<00:32, 37.07it/s]T2S Decoding EOS [92 -> 373]
19%|██████████████▊ | 281/1500 [00:07<00:33, 36.41it/s]
C:\Users\userme\Downloads\GPT-SoVITS\runtime\lib\site-packages\torch\functional.py:641: UserWarning: stft with return_complex=False is deprecated. In a future pytorch release, stft will return complex tensors for all inputs, and return_complex=False will raise an error.
Note: you can still call torch.view_as_real on the complex output to recover the old return format. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\SpectralOps.cpp:867.)
return _VF.stft(input, n_fft, hop_length, win_length, window, # type: ignore[attr-defined]
1.941 1.060 7.724 2.086
DEBUG:httpx._client:HTTP Request: POST http://localhost:9872/api/predict "HTTP/1.1 200 OK"
DEBUG:httpx._client:HTTP Request: POST http://localhost:9872/api/predict "HTTP/1.1 200 OK"
DEBUG:httpx._client:HTTP Request: POST http://localhost:9872/reset "HTTP/1.1 200 OK"
ERROR:root:Exception in callback _ProactorBasePipeTransport._call_connection_lost(None)
handle: <Handle _ProactorBasePipeTransport._call_connection_lost(None)>
Traceback (most recent call last):
File "asyncio\events.py", line 80, in _run
File "asyncio\proactor_events.py", line 162, in _call_connection_lost
ConnectionResetError: [WinError 10054] 远程主机强迫关闭了一个现有的连接。
taskkill /t /f /pid 10664
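In the log above, the progress line stops at 279/1500 right where `T2S Decoding EOS` is printed. That is consistent with 1500 being an upper bound on decoding steps, with generation ending as soon as an end-of-sequence token is produced. This reading is inferred from the log, not confirmed by the maintainers. The toy loop below illustrates the pattern. It is not the project's decoder, and `decode`, `next_token`, and the `EOS` value are hypothetical.

```python
def decode(next_token, eos_token, max_steps=1500):
    """Generate tokens until EOS or max_steps; return (tokens, steps_taken)."""
    tokens = []
    for step in range(max_steps):
        token = next_token(step)
        if token == eos_token:
            # Early stop: the progress bar would end here, well short of max_steps.
            return tokens, step
        tokens.append(token)
    return tokens, max_steps


EOS = -1
# Toy token source that emits EOS at step 279, mirroring the log above.
tokens, steps = decode(lambda step: EOS if step == 279 else step % 10, EOS)
print(f"stopped at step {steps} of 1500 with {len(tokens)} tokens")
```

Under this reading, a bar that ends early means the utterance finished, not that synthesis was cut short.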
Is there a simple data-preprocessing script and a guide for kicking off training?
Currently I don't know how to train, or how to use my own voice for cloning.
GPT training works, but SoVITS training fails:
"python" GPT_SoVITS/s2_train.py --config "TEMP/tmp_s2.json"
INFO:meigui:{'train': {'log_interval': 100, 'eval_interval': 500, 'seed': 1234, 'epochs': 10, 'learning_rate': 0.0001, 'betas': [0.8, 0.99], 'eps': 1e-09, 'batch_size': 1, 'fp16_run': True, 'lr_decay': 0.999875, 'segment_size': 20480, 'init_lr_ratio': 1, 'warmup_epochs': 0, 'c_mel': 45, 'c_kl': 1.0, 'text_low_lr_rate': 0.4, 'pretrained_s2G': 'GPT_SoVITS/pretrained_models/s2G488k.pth', 'pretrained_s2D': 'GPT_SoVITS/pretrained_models/s2D488k.pth', 'if_save_latest': True, 'if_save_every_weights': True, 'save_every_epoch': 5, 'gpu_numbers': '0'}, 'data': {'max_wav_value': 32768.0, 'sampling_rate': 32000, 'filter_length': 2048, 'hop_length': 640, 'win_length': 2048, 'n_mel_channels': 128, 'mel_fmin': 0.0, 'mel_fmax': None, 'add_blank': True, 'n_speakers': 300, 'cleaned_text': True, 'exp_dir': 'logs/meigui'}, 'model': {'inter_channels': 192, 'hidden_channels': 192, 'filter_channels': 768, 'n_heads': 2, 'n_layers': 6, 'kernel_size': 3, 'p_dropout': 0.1, 'resblock': '1', 'resblock_kernel_sizes': [3, 7, 11], 'resblock_dilation_sizes': [[1, 3, 5], [1, 3, 5], [1, 3, 5]], 'upsample_rates': [10, 8, 2, 2, 2], 'upsample_initial_channel': 512, 'upsample_kernel_sizes': [16, 16, 8, 2, 2], 'n_layers_q': 3, 'use_spectral_norm': False, 'gin_channels': 512, 'semantic_frame_rate': '25hz', 'freeze_quantizer': True}, 's2_ckpt_dir': 'logs/meigui', 'content_module': 'cnhubert', 'save_weight_dir': 'SoVITS_weights', 'name': 'meigui', 'pretrain': None, 'resume_step': None}
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0
INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes.
phoneme_data_len: 3
wav_data_len: 99
100%|████████████████████████████████████████| 99/99 [00:00<00:00, 24775.42it/s]
skipped_phone: 0 , skipped_dur: 0
total left: 99
ssl_proj.weight not requires_grad
ssl_proj.bias not requires_grad
INFO:meigui:loaded pretrained GPT_SoVITS/pretrained_models/s2G488k.pth
<All keys matched successfully>
INFO:meigui:loaded pretrained GPT_SoVITS/pretrained_models/s2D488k.pth
<All keys matched successfully>
/root/miniconda3/lib/python3.10/site-packages/torch/optim/lr_scheduler.py:139: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. "
Exception ignored in: <function _MultiProcessingDataLoaderIter.__del__ at 0x7f5c840ca710>
Traceback (most recent call last):
File "/root/miniconda3/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1478, in __del__
self._shutdown_workers()
File "/root/miniconda3/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1409, in _shutdown_workers
if not self._shutdown:
AttributeError: '_MultiProcessingDataLoaderIter' object has no attribute '_shutdown'
Traceback (most recent call last):
File "/root/autodl-tmp/workdir/GPT-SoVITS/GPT_SoVITS/s2_train.py", line 402, in <module>
main()
File "/root/autodl-tmp/workdir/GPT-SoVITS/GPT_SoVITS/s2_train.py", line 53, in main
mp.spawn(run, nprocs=n_gpus, args=(n_gpus, hps,))
File "/root/miniconda3/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 239, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/root/miniconda3/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 197, in start_processes
while not context.join():
File "/root/miniconda3/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 160, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/root/miniconda3/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
fn(i, *args)
File "/root/autodl-tmp/workdir/GPT-SoVITS/GPT_SoVITS/s2_train.py", line 172, in run
train_and_evaluate(rank, epoch, hps, [net_g, net_d], [optim_g, optim_d], [scheduler_g, scheduler_d], scaler,
File "/root/autodl-tmp/workdir/GPT-SoVITS/GPT_SoVITS/s2_train.py", line 195, in train_and_evaluate
for batch_idx, (ssl, ssl_lengths, spec, spec_lengths, y, y_lengths, text, text_lengths) in tqdm(enumerate(train_loader)):
File "/root/miniconda3/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 436, in __iter__
self._iterator = self._get_iterator()
File "/root/miniconda3/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 388, in _get_iterator
return _MultiProcessingDataLoaderIter(self)
File "/root/miniconda3/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 994, in __init__
super().__init__(loader)
File "/root/miniconda3/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 603, in __init__
self._sampler_iter = iter(self._index_sampler)
File "/root/autodl-tmp/workdir/GPT-SoVITS/GPT_SoVITS/module/data_utils.py", line 293, in __iter__
ids_bucket = ids_bucket + ids_bucket * (rem // len_bucket) + ids_bucket[:(rem % len_bucket)]
ZeroDivisionError: integer division or modulo by zero
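The failing expression divides by `len_bucket`, so it raises whenever a bucket holds no samples. A minimal sketch of a guard follows. The helper name `pad_bucket` is hypothetical, and the definition of `rem` as the target size minus the bucket length is an assumption, since only line 293 is visible in the traceback.

```python
def pad_bucket(ids_bucket, num_samples_bucket):
    """Repeat indices until the bucket holds num_samples_bucket entries.

    Uses the same padding expression as the line shown in the traceback,
    but returns an empty list for an empty bucket instead of dividing by zero.
    Assumes num_samples_bucket >= len(ids_bucket).
    """
    len_bucket = len(ids_bucket)
    if len_bucket == 0:
        # Nothing to repeat; skipping avoids `rem // 0` and `rem % 0`.
        return []
    rem = num_samples_bucket - len_bucket  # assumed definition of `rem`
    return (
        ids_bucket
        + ids_bucket * (rem // len_bucket)
        + ids_bucket[: rem % len_bucket]
    )
```

An empty bucket usually means that none of the audio clips fell into that length range, so checking the dataset itself is likely more useful than only patching the sampler.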
Trying to train few-shot, but I get errors because it is not creating these files:
self.path2: logs/xxx/2-name2text.txt
self.path4: logs/xxx/4-cnhubert
self.path5: logs/xxx/5-wav32k
Traceback (most recent call last):
File "x:\sovits\GPT_SoVITS\s2_train.py", line 402, in <module>
main()
File "x:\sovits\GPT_SoVITS\s2_train.py", line 53, in main
mp.spawn(run, nprocs=n_gpus, args=(n_gpus, hps,))
File "x:\sovits\runtime\lib\site-packages\torch\multiprocessing\spawn.py", line 239, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "x:\sovits\runtime\lib\site-packages\torch\multiprocessing\spawn.py", line 197, in start_processes
while not context.join():
File "x:\sovits\runtime\lib\site-packages\torch\multiprocessing\spawn.py", line 160, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "x:\sovits\runtime\lib\site-packages\torch\multiprocessing\spawn.py", line 69, in _wrap
fn(i, *args)
File "x:\sovits\GPT_SoVITS\s2_train.py", line 69, in run
train_dataset = TextAudioSpeakerLoader(hps.data)########
File "x:\sovits\GPT_SoVITS\module\data_utils.py", line 37, in __init__
assert os.path.exists(self.path2)
AssertionError
In addition, where exactly do I place the xxx.list file?
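The assertion fails because `logs/xxx/2-name2text.txt` does not exist when training starts. A minimal sketch of a check to run before training, assuming the three names printed above are the outputs the training step expects (the function name is hypothetical):

```python
import os


def missing_preprocessing_outputs(exp_dir):
    """Return the expected preprocessing outputs that are absent from exp_dir."""
    # Names taken from the self.path2 / self.path4 / self.path5 lines above.
    expected = ["2-name2text.txt", "4-cnhubert", "5-wav32k"]
    return [
        name
        for name in expected
        if not os.path.exists(os.path.join(exp_dir, name))
    ]


# Example: missing_preprocessing_outputs("logs/xxx") lists what still has to
# be generated before s2_train.py can load the dataset.
```

If the list is non-empty, the dataset formatting steps probably did not finish, and their console output is the next place to look.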
DEBUG:torchaudio._extension:Failed to initialize ffmpeg bindings
Traceback (most recent call last):
File "E:\GPT-SoVITS\runtime\lib\site-packages\torchaudio\_extension\utils.py", line 85, in _init_ffmpeg
_load_lib("libtorchaudio_ffmpeg")
File "E:\GPT-SoVITS\runtime\lib\site-packages\torchaudio\_extension\utils.py", line 61, in _load_lib
torch.ops.load_library(path)
File "E:\GPT-SoVITS\runtime\lib\site-packages\torch\_ops.py", line 643, in load_library
ctypes.CDLL(path)
File "ctypes\__init__.py", line 374, in __init__
FileNotFoundError: Could not find module 'E:\GPT-SoVITS\runtime\Lib\site-packages\torchaudio\lib\libtorchaudio_ffmpeg.pyd' (or one of its dependencies). Try using the full path with constructor syntax.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "E:\GPT-SoVITS\runtime\lib\site-packages\torchaudio\_extension\__init__.py", line 67, in <module>
_init_ffmpeg()
File "E:\GPT-SoVITS\runtime\lib\site-packages\torchaudio\_extension\utils.py", line 87, in _init_ffmpeg
raise ImportError("FFmpeg libraries are not found. Please install FFmpeg.") from err
ImportError: FFmpeg libraries are not found. Please install FFmpeg.
I have installed all the environment dependencies. Everything works well on Ubuntu, but not on Colab: it cannot find users.pth, whereas on Ubuntu that file is in the Python directory.
This is the error on Colab. How can I fix it?
with open("%s/users.pth"%(site_packages_root),"w")as f:
FileNotFoundError: [Errno 2] No such file or directory: '/content/GPT-SoVITS/runtime/Lib/site-packages/users.pth'
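The path in the error points at `runtime/Lib/site-packages`, which does not exist in the Colab checkout. One possible workaround, as a sketch only: fall back to the site-packages directory of the running interpreter when the bundled one is absent. Both function names and the `extra_paths` argument are hypothetical. What the project actually writes into users.pth is not shown here.

```python
import os
import site


def resolve_site_packages(preferred):
    """Return `preferred` if it exists, else a site-packages dir of this interpreter."""
    if os.path.isdir(preferred):
        return preferred
    for candidate in site.getsitepackages():
        if os.path.isdir(candidate):
            return candidate
    raise FileNotFoundError("no site-packages directory found")


def write_pth(site_packages_root, extra_paths, filename="users.pth"):
    """Write one path per line into a .pth file and return the file's location."""
    pth_path = os.path.join(site_packages_root, filename)
    with open(pth_path, "w") as f:
        f.write("\n".join(extra_paths) + "\n")
    return pth_path
```

Python appends each line of a `.pth` file found in site-packages to `sys.path` at startup, which is why the directory has to be one the interpreter actually scans.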
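Thanks for open-sourcing this. I would like some information about how the current base model was trained: as the title asks, roughly how large was the dataset, on what hardware, and how long did training take? Can we train base models for more languages ourselves?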
Traceback (most recent call last):
File "/root/miniconda3/lib/python3.10/site-packages/torchaudio/_extension/__init__.py", line 67, in <module>
_init_ffmpeg()
File "/root/miniconda3/lib/python3.10/site-packages/torchaudio/_extension/utils.py", line 87, in _init_ffmpeg
raise ImportError("FFmpeg libraries are not found. Please install FFmpeg.") from err
ImportError: FFmpeg libraries are not found. Please install FFmpeg.
However, ffmpeg -h works normally. I also tried:
- sudo apt install ffmpeg
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
ffmpeg is already the newest version (7:4.4.2-0ubuntu0.22.04.1).
0 upgraded, 0 newly installed, 0 to remove and 141 not upgraded.
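A working `ffmpeg` command only shows that the CLI is on PATH. It does not show that the FFmpeg shared libraries can be located by other programs. The sketch below reports both, using only the standard library. The function name and the choice of library names are assumptions for illustration, and a version mismatch between the installed libraries and the torchaudio build is a possible cause that this check cannot confirm.

```python
import ctypes.util
import shutil


def ffmpeg_report(libs=("avcodec", "avformat", "avutil")):
    """Report where the ffmpeg CLI and its shared libraries are found (None if not)."""
    report = {"ffmpeg_cli": shutil.which("ffmpeg")}
    for name in libs:
        # find_library returns a library name or path, or None when not found.
        report["lib" + name] = ctypes.util.find_library(name)
    return report


if __name__ == "__main__":
    for key, value in ffmpeg_report().items():
        print(key, "->", value)
```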
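Port 6006 is already in use.
I manually edited the source to use a different port, which solved this problem (?)
An error occurs during inference: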
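Rather than hard-coding a replacement port, the port can be probed first. This is a minimal sketch with a hypothetical helper name. It is not how the project picks its port.

```python
import socket


def pick_port(preferred=6006, host="127.0.0.1"):
    """Return `preferred` if it can be bound, else a free port chosen by the OS."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, preferred))
            return preferred
        except OSError:
            pass  # port is taken (or not permitted); fall through
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))  # port 0 asks the OS for any free port
        return s.getsockname()[1]
```

The probe socket is closed before the port is returned, so another process could still grab the port in between. For a single-user machine that race is usually acceptable.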
Traceback (most recent call last):
File "/root/miniconda3/lib/python3.10/site-packages/gradio/routes.py", line 442, in run_predict
output = await app.get_blocks().process_api(
File "/root/miniconda3/lib/python3.10/site-packages/gradio/blocks.py", line 1389, in process_api
result = await self.call_function(
File "/root/miniconda3/lib/python3.10/site-packages/gradio/blocks.py", line 1108, in call_function
prediction = await utils.async_iteration(iterator)
File "/root/miniconda3/lib/python3.10/site-packages/gradio/utils.py", line 346, in async_iteration
return await iterator.__anext__()
File "/root/miniconda3/lib/python3.10/site-packages/gradio/utils.py", line 339, in __anext__
return await anyio.to_thread.run_sync(
File "/root/miniconda3/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/root/miniconda3/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "/root/miniconda3/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 807, in run
result = context.run(func, *args)
File "/root/miniconda3/lib/python3.10/site-packages/gradio/utils.py", line 322, in run_sync_iterator_async
return next(iterator)
File "/root/miniconda3/lib/python3.10/site-packages/gradio/utils.py", line 691, in gen_wrapper
yield from f(*args, **kwargs)
File "/root/autodl-tmp/workdir/GPT-SoVITS/GPT_SoVITS/inference_webui.py", line 135, in get_tts_wav
bert = torch.cat([bert1, bert2], 1)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument tensors in method wrapper_CUDA_cat)
DEBUG:httpcore.http11:receive_response_headers.complete return_value=(b'HTTP/1.1', 500, b'Internal Server Error', [(b'date', b'Wed, 17 Jan 2024 10:14:44 GMT'), (b'server', b'uvicorn'), (b'content-length', b'14'), (b'content-type', b'application/json')])
INFO:httpx:HTTP Request: POST http://localhost:7896/api/predict "HTTP/1.1 500 Internal Server Error"
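No URL was shown for this step that I could open directly, so I used the port forwarding provided by AutoDL to forward port 7896 (changed from 6006 in the source, because that port was in use) to my local machine. (The error appears after clicking inference.)
Also, is there currently any way to use training data in languages other than Chinese?
Audio slicing should already work reasonably well for this.
For annotation, I can transcribe locally with fast-whisper.
But are there BERT and SSL models that can be used directly for the later steps?
(Other GPUs remain to be verified. AutoDL has no cards available right now; I will try a 3090 when I get the chance.)
There are a few more questions I have not looked into further, so I will ask first:
Each step should not depend on files from the earlier steps, right? For example, if I provide already-annotated data (.wav and .list) and start from the later steps (leaving folders such as input empty), that should not cause problems, should it?
Also, inference should depend only on the pretrained models plus the two weight files, GPT (.pth) and SoVITS (.ckpt), right? If I only want to run inference, is it enough to copy those two files into the designated folders and start?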
I ran into a problem that probably has to do with CUDA. I'm a MacBook M1 user, so I don't have a CUDA-capable GPU. Normally I would expect to fall back to the CPU as the device, and I did see code for that, but it did not work smoothly on my machine. PyTorch provides the mps backend for Apple Silicon as an alternative to CUDA, and I was wondering when the developer could add support for it.
The following is the error I received while formatting the training set (1-训练集格式化工具, the training set formatting tool). I may be wrong about why this error happens; please help me solve it.
"/Users/improvise/miniconda/envs/GPTSoVits/bin/python" GPT_SoVITS/prepare_datasets/1-get-text.py
"/Users/improvise/miniconda/envs/GPTSoVits/bin/python" GPT_SoVITS/prepare_datasets/1-get-text.py
Traceback (most recent call last):
File "/Users/improvise/Desktop/GPT-SoVITS-main/GPT_SoVITS/prepare_datasets/1-get-text.py", line 53, in <module>
bert_model = bert_model.half().to(device)
File "/Users/improvise/miniconda/envs/GPTSoVits/lib/python3.9/site-packages/transformers/modeling_utils.py", line 2460, in to
return super().to(*args, **kwargs)
File "/Users/improvise/miniconda/envs/GPTSoVits/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1160, in to
return self._apply(convert)
File "/Users/improvise/miniconda/envs/GPTSoVits/lib/python3.9/site-packages/torch/nn/modules/module.py", line 810, in _apply
module._apply(fn)
File "/Users/improvise/miniconda/envs/GPTSoVits/lib/python3.9/site-packages/torch/nn/modules/module.py", line 810, in _apply
module._apply(fn)
File "/Users/improvise/miniconda/envs/GPTSoVits/lib/python3.9/site-packages/torch/nn/modules/module.py", line 810, in _apply
module._apply(fn)
File "/Users/improvise/miniconda/envs/GPTSoVits/lib/python3.9/site-packages/torch/nn/modules/module.py", line 833, in _apply
param_applied = fn(param)
File "/Users/improvise/miniconda/envs/GPTSoVits/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1158, in convert
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
File "/Users/improvise/miniconda/envs/GPTSoVits/lib/python3.9/site-packages/torch/cuda/__init__.py", line 289, in _lazy_init
raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
Traceback (most recent call last):
File "/Users/improvise/Desktop/GPT-SoVITS-main/webui.py", line 529, in open1abc
with open(txt_path, "r",encoding="utf8") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'logs/test01/2-name2text-0.txt'
P.S. The output folder under logs looks like this: /Users/improvise/Desktop/GPT-SoVITS-main/logs/test01/3-bert, and the folder is empty.
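The AssertionError above is raised because `bert_model.half().to(device)` targets CUDA on a PyTorch build without CUDA. A minimal sketch of a fallback order follows. The function is hypothetical and takes plain booleans so that it runs without PyTorch. In a real script the two flags would come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`. Using half precision only on CUDA is a conservative assumption, not a statement about what the project supports.

```python
def pick_device(cuda_available, mps_available):
    """Choose a device string and whether to use half precision.

    Preference order: CUDA, then Apple's MPS backend, then CPU.
    Half precision is enabled only for CUDA in this sketch.
    """
    if cuda_available:
        return "cuda", True
    if mps_available:
        return "mps", False
    return "cpu", False


# Hypothetical use in a preparation script:
#   device, use_half = pick_device(torch.cuda.is_available(),
#                                  torch.backends.mps.is_available())
#   model = model.half() if use_half else model
#   model = model.to(device)
```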
Could Korean support be added for training and inference? Thank you~
semantic_data_len: 0
phoneme_data_len: 2474
Traceback (most recent call last):
File "F:\AI\GPT-SoVITS\GPT_SoVITS\s1_train.py", line 139, in <module>
main(args)
File "F:\AI\GPT-SoVITS\GPT_SoVITS\s1_train.py", line 116, in main
trainer.fit(model, data_module, ckpt_path=ckpt_path)
File "F:\AI\GPT-SoVITS\runtime\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 544, in fit
call._call_and_handle_interrupt(
File "F:\AI\GPT-SoVITS\runtime\lib\site-packages\pytorch_lightning\trainer\call.py", line 43, in _call_and_handle_interrupt
return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
File "F:\AI\GPT-SoVITS\runtime\lib\site-packages\pytorch_lightning\strategies\launchers\subprocess_script.py", line 102, in launch
return function(*args, **kwargs)
File "F:\AI\GPT-SoVITS\runtime\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 580, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "F:\AI\GPT-SoVITS\runtime\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 950, in _run
call._call_setup_hook(self) # allow user to setup lightning_module in accelerator environment
File "F:\AI\GPT-SoVITS\runtime\lib\site-packages\pytorch_lightning\trainer\call.py", line 92, in _call_setup_hook
_call_lightning_datamodule_hook(trainer, "setup", stage=fn)
File "F:\AI\GPT-SoVITS\runtime\lib\site-packages\pytorch_lightning\trainer\call.py", line 179, in _call_lightning_datamodule_hook
return fn(*args, **kwargs)
File "F:\AI\GPT-SoVITS\GPT_SoVITS\AR\data\data_module.py", line 22, in setup
self._train_dataset = Text2SemanticDataset(
File "F:\AI\GPT-SoVITS\GPT_SoVITS\AR\data\dataset.py", line 96, in __init__
self.init_batch()
File "F:\AI\GPT-SoVITS\GPT_SoVITS\AR\data\dataset.py", line 170, in init_batch
for _ in range(max(2,int(min_num/leng))):
ZeroDivisionError: division by zero
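The log shows `semantic_data_len: 0`, and the failing line divides `min_num` by `leng`, so an empty dataset is the likely trigger. A minimal sketch of a guard that fails with a clearer message (the function name is hypothetical, and the value of `min_num` is not shown in the traceback):

```python
def repeat_count(min_num, leng):
    """Number of times to repeat a small dataset, mirroring max(2, int(min_num / leng))."""
    if leng == 0:
        # An empty dataset cannot be repeated; report it instead of dividing by zero.
        raise ValueError(
            "dataset is empty (leng == 0); check that the semantic "
            "preprocessing step produced output"
        )
    return max(2, int(min_num / leng))
```

With this guard the run still stops, but the message points at the missing preprocessing output rather than at the arithmetic.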
Originally the ImportError was for deprecated, but after following tiangolo/fastapi#9808 it became one for doc instead.
Thank you for your open-source contributions.
Could you provide a guide for mixed-language support? For example, a sentence containing both Chinese and English.
Thanks for sharing this as open source. It is a great project!
However, this might not be the correct version, because some critical files are missing.
Traceback (most recent call last):
File "D:\0GitHubtest\GPT-SoVITS-main\GPT_SoVITS\inference_webui.py", line 355, in <module>
app.queue(concurrency_count=511, max_size=1022).launch(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\10029.conda\envs\svits\Lib\site-packages\gradio\blocks.py", line 1715, in queue
raise DeprecationWarning(
DeprecationWarning: concurrency_count has been deprecated. Set the concurrency_limit directly on event listeners e.g. btn.click(fn, ..., concurrency_limit=10) or gr.Interface(concurrency_limit=10). If necessary, the total number of workers can be configured via max_threads in launch().
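But this no longer matters, because I have already removed concurrency_count=511...
Quite often some words are skipped and others are repeated. Am I the only one with this problem?
Hello, I have a question. I am on macOS. After setting up the environment following the installation instructions, running webui.py gives the error below. What went wrong? I would be grateful if you could find time to reply. (Running the all-in-one package you provided on Bilibili gives the same error.)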
By default, 'file' is written in the MIFF image format. To
specify a particular image format, precede the filename with an image
format name and a colon (i.e. ps:image) or specify the image type as
the filename suffix (i.e. image.ps). Specify 'file' as '-' for
standard input or output.
import: delegate library support not built-in '' (X11) @ error/import.c/ImportImageCommand/1302.
: command not found
./webui.py: line 4: syntax error near unexpected token `"ignore"'
'/webui.py: line 4: `warnings.filterwarnings("ignore")
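For beginners, setting up the environment ourselves is too complicated. I would be very grateful if someone could release an all-in-one package.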
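With certain combinations of reference audio and prompt text, t2s_model hits the "bad zero prediction" case and returns idx=0. This causes the following code at lines 211-213 of inference_webui.py:
pred_semantic = pred_semantic[:, -idx:].unsqueeze(
0
) # .unsqueeze(0) # mq needs one extra unsqueeze
to evaluate pred_semantic[:, -0:], which wrongly includes the input sequence in the output.
Should there be an error check here?
Zero-shot output sounds heavily robotic and is basically unusable. Has anyone tried few-shot, and how were the results?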
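The slicing behaviour behind this report can be reproduced with a plain list: in Python `-0` equals `0`, so `seq[-0:]` is the whole sequence rather than an empty tail. A minimal sketch of the kind of check being asked about follows. The helper name is hypothetical, and returning an empty result for idx=0 is only one possible policy; raising an error would be another.

```python
def take_generated(tokens, idx):
    """Return the last `idx` tokens, or an empty list when nothing was generated."""
    if idx <= 0:
        # Without this branch, tokens[-0:] would return the entire sequence,
        # prompt included.
        return []
    return tokens[-idx:]


tokens = [11, 12, 13, 14]
whole = tokens[-0:]                # same as tokens[0:], i.e. every element
tail = take_generated(tokens, 2)   # last two elements
empty = take_generated(tokens, 0)  # guarded case
```

Tensor slicing follows the same rule, so `pred_semantic[:, -0:]` keeps every column.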
D:\GPT-SoVITS>runtime\python.exe webui.py
Running on local URL: http://0.0.0.0:9874
"D:\GPT-SoVITS\runtime\python.exe" GPT_SoVITS/s1_train.py --config_file "TEMP/tmp_s1.yaml"
Seed set to 1234
Using 16bit Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
<All keys matched successfully>
ckpt_path: None
[rank: 0] Seed set to 1234
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1
Traceback (most recent call last):
File "D:\GPT-SoVITS\GPT_SoVITS\s1_train.py", line 171, in <module>
main(args)
File "D:\GPT-SoVITS\GPT_SoVITS\s1_train.py", line 147, in main
trainer.fit(model, data_module, ckpt_path=ckpt_path)
File "D:\GPT-SoVITS\runtime\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 544, in fit
call._call_and_handle_interrupt(
File "D:\GPT-SoVITS\runtime\lib\site-packages\pytorch_lightning\trainer\call.py", line 43, in _call_and_handle_interrupt
return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
File "D:\GPT-SoVITS\runtime\lib\site-packages\pytorch_lightning\strategies\launchers\subprocess_script.py", line 102, in launch
return function(*args, **kwargs)
File "D:\GPT-SoVITS\runtime\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 580, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "D:\GPT-SoVITS\runtime\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 947, in _run
self.strategy.setup_environment()
File "D:\GPT-SoVITS\runtime\lib\site-packages\pytorch_lightning\strategies\ddp.py", line 148, in setup_environment
self.setup_distributed()
File "D:\GPT-SoVITS\runtime\lib\site-packages\pytorch_lightning\strategies\ddp.py", line 199, in setup_distributed
_init_dist_connection(self.cluster_environment, self._process_group_backend, timeout=self._timeout)
File "D:\GPT-SoVITS\runtime\lib\site-packages\lightning_fabric\utilities\distributed.py", line 290, in _init_dist_connection
torch.distributed.init_process_group(torch_distributed_backend, rank=global_rank, world_size=world_size, **kwargs)
File "D:\GPT-SoVITS\runtime\lib\site-packages\torch\distributed\distributed_c10d.py", line 888, in init_process_group
store, rank, world_size = next(rendezvous_iterator)
File "D:\GPT-SoVITS\runtime\lib\site-packages\torch\distributed\rendezvous.py", line 245, in _env_rendezvous_handler
store = _create_c10d_store(master_addr, master_port, rank, world_size, timeout)
File "D:\GPT-SoVITS\runtime\lib\site-packages\torch\distributed\rendezvous.py", line 176, in _create_c10d_store
return TCPStore(
RuntimeError: unmatched '}' in format string
Steps: ran the one-click three-step preprocessing (一键三连), then clicked Start SoVITS training.
"runtime\python" GPT_SoVITS/s2_train.py --config "TEMP/tmp_s2.json"
INFO:shun:{'train': {'log_interval': 100, 'eval_interval': 500, 'seed': 1234, 'epochs': 15, 'learning_rate': 0.0001, 'betas': [0.8, 0.99], 'eps': 1e-09, 'batch_size': 12, 'fp16_run': True, 'lr_decay': 0.999875, 'segment_size': 20480, 'init_lr_ratio': 1, 'warmup_epochs': 0, 'c_mel': 45, 'c_kl': 1.0, 'text_low_lr_rate': 0.4, 'pretrained_s2G': 'GPT_SoVITS/pretrained_models/s2G488k.pth', 'pretrained_s2D': 'GPT_SoVITS/pretrained_models/s2D488k.pth', 'if_save_latest': True, 'if_save_every_weights': True, 'save_every_epoch': 2, 'gpu_numbers': '0'}, 'data': {'max_wav_value': 32768.0, 'sampling_rate': 32000, 'filter_length': 2048, 'hop_length': 640, 'win_length': 2048, 'n_mel_channels': 128, 'mel_fmin': 0.0, 'mel_fmax': None, 'add_blank': True, 'n_speakers': 300, 'cleaned_text': True, 'exp_dir': 'logs/shun'}, 'model': {'inter_channels': 192, 'hidden_channels': 192, 'filter_channels': 768, 'n_heads': 2, 'n_layers': 6, 'kernel_size': 3, 'p_dropout': 0.1, 'resblock': '1', 'resblock_kernel_sizes': [3, 7, 11], 'resblock_dilation_sizes': [[1, 3, 5], [1, 3, 5], [1, 3, 5]], 'upsample_rates': [10, 8, 2, 2, 2], 'upsample_initial_channel': 512, 'upsample_kernel_sizes': [16, 16, 8, 2, 2], 'n_layers_q': 3, 'use_spectral_norm': False, 'gin_channels': 512, 'semantic_frame_rate': '25hz', 'freeze_quantizer': True}, 's2_ckpt_dir': 'logs/shun', 'content_module': 'cnhubert', 'save_weight_dir': 'SoVITS_weights', 'name': 'shun', 'pretrain': None, 'resume_step': None}
[W C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [TopPC4090]:54275 (system error: 10049 - The requested address is not valid in its context.).
[W C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [TopPC4090]:54275 (system error: 10049 - The requested address is not valid in its context.).
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0
INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes.
Traceback (most recent call last):
File "D:\AI workflow\Sound\GPT-SoVITS\GPT-SoVITS\GPT-SoVITS\GPT_SoVITS\s2_train.py", line 402, in <module>
main()
File "D:\AI workflow\Sound\GPT-SoVITS\GPT-SoVITS\GPT-SoVITS\GPT_SoVITS\s2_train.py", line 53, in main
mp.spawn(run, nprocs=n_gpus, args=(n_gpus, hps,))
File "D:\AI workflow\Sound\GPT-SoVITS\GPT-SoVITS\GPT-SoVITS\runtime\lib\site-packages\torch\multiprocessing\spawn.py", line 239, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "D:\AI workflow\Sound\GPT-SoVITS\GPT-SoVITS\GPT-SoVITS\runtime\lib\site-packages\torch\multiprocessing\spawn.py", line 197, in start_processes
while not context.join():
File "D:\AI workflow\Sound\GPT-SoVITS\GPT-SoVITS\GPT-SoVITS\runtime\lib\site-packages\torch\multiprocessing\spawn.py", line 160, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "D:\AI workflow\Sound\GPT-SoVITS\GPT-SoVITS\GPT-SoVITS\runtime\lib\site-packages\torch\multiprocessing\spawn.py", line 69, in _wrap
fn(i, *args)
File "D:\AI workflow\Sound\GPT-SoVITS\GPT-SoVITS\GPT-SoVITS\GPT_SoVITS\s2_train.py", line 69, in run
train_dataset = TextAudioSpeakerLoader(hps.data)########
File "D:\AI workflow\Sound\GPT-SoVITS\GPT-SoVITS\GPT-SoVITS\GPT_SoVITS\module\data_utils.py", line 54, in __init__
for _ in range(max(2, int(min_num / leng))):
ZeroDivisionError: division by zero
Is the RVQ module for the HuBERT features pretrained, or is it trained from scratch together with the SoVITS model?