rvc-boss / gpt-sovits

1 min voice data can also be used to train a good TTS model! (few shot voice cloning)

License: MIT License

Python 97.87% Batchfile 0.01% Shell 0.17% Dockerfile 0.14% PowerShell 0.01% Jupyter Notebook 1.79%

text-to-speech tts vits voice-clone voice-cloneai voice-cloning

gpt-sovits's Issues

Error when loading the pretrained SoVITS-G model s2G488k.pth

As the title says: in GPT_SoVITS/prepare_datasets/3-get-semantic.py, line 59, vq_model.load_state_dict fails with: size mismatch for enc_p.text_embedding.weight: copying a param with shape torch.Size([322, 192]) from checkpoint, the shape in current model is torch.Size([151, 192]).
Is this because the pretrained model is the wrong one?

GPT fine-tuning: ZeroDivisionError: division by zero

GPT fine-tuning fails, although 1Ba-SoVITS training works. The GPT training error:

"/data/home/miniconda3/envs/GPTSoVits/bin/python" GPT_SoVITS/s1_train.py --config_file "TEMP/tmp_s1.yaml" 
Seed set to 1234
Using 16bit Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
<All keys matched successfully>
ckpt_path: None
[rank: 0] Seed set to 1234
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1
----------------------------------------------------------------------------------------------------
distributed_backend=nccl
All distributed processes registered. Starting with 1 processes
----------------------------------------------------------------------------------------------------

semantic_data_len: 0
phoneme_data_len: 5
Empty DataFrame
Columns: [item_name, semantic_audio]
Index: []
Traceback (most recent call last):
  File "/data/home/GPT-SoVITS/GPT_SoVITS/s1_train.py", line 171, in <module>
    main(args)
  File "/data/home/GPT-SoVITS/GPT_SoVITS/s1_train.py", line 147, in main
    trainer.fit(model, data_module, ckpt_path=ckpt_path)
  File "/data/home/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 544, in fit
    call._call_and_handle_interrupt(
  File "/data/home/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/pytorch_lightning/trainer/call.py", line 43, in _call_and_handle_interrupt
    return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
  File "/data/home/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 102, in launch
    return function(*args, **kwargs)
  File "/data/home/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 580, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/data/home/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 950, in _run
    call._call_setup_hook(self)  # allow user to setup lightning_module in accelerator environment
  File "/data/home/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/pytorch_lightning/trainer/call.py", line 92, in _call_setup_hook
    _call_lightning_datamodule_hook(trainer, "setup", stage=fn)
  File "/data/home/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/pytorch_lightning/trainer/call.py", line 179, in _call_lightning_datamodule_hook
    return fn(*args, **kwargs)
  File "/data/home/GPT-SoVITS/GPT_SoVITS/AR/data/data_module.py", line 29, in setup
    self._train_dataset = Text2SemanticDataset(
  File "/data/home/GPT-SoVITS/GPT_SoVITS/AR/data/dataset.py", line 107, in __init__
    self.init_batch()
  File "/data/home/GPT-SoVITS/GPT_SoVITS/AR/data/dataset.py", line 187, in init_batch
    for _ in range(max(2, int(min_num / leng))):
ZeroDivisionError: division by zero

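I suspect the previous step, dataset formatting with SSL extraction enabled, already had a problem, but I don't know whether it is related to this error.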

"/data/home/miniconda3/envs/GPTSoVits/bin/python" GPT_SoVITS/prepare_datasets/2-get-hubert-wav32k.py
"/data/home/miniconda3/envs/GPTSoVits/bin/python" GPT_SoVITS/prepare_datasets/2-get-hubert-wav32k.py
Some weights of the model checkpoint at GPT_SoVITS/pretrained_models/chinese-hubert-base were not used when initializing HubertModel: ['encoder.pos_conv_embed.conv.weight_g', 'encoder.pos_conv_embed.conv.weight_v']
- This IS expected if you are initializing HubertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing HubertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of the model checkpoint at GPT_SoVITS/pretrained_models/chinese-hubert-base were not used when initializing HubertModel: ['encoder.pos_conv_embed.conv.weight_g', 'encoder.pos_conv_embed.conv.weight_v']
- This IS expected if you are initializing HubertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing HubertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of HubertModel were not initialized from the model checkpoint at GPT_SoVITS/pretrained_models/chinese-hubert-base and are newly initialized: ['encoder.pos_conv_embed.conv.parametrizations.weight.original1', 'encoder.pos_conv_embed.conv.parametrizations.weight.original0']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Some weights of HubertModel were not initialized from the model checkpoint at GPT_SoVITS/pretrained_models/chinese-hubert-base and are newly initialized: ['encoder.pos_conv_embed.conv.parametrizations.weight.original1', 'encoder.pos_conv_embed.conv.parametrizations.weight.original0']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
"/data/home/miniconda3/envs/GPTSoVits/bin/python" GPT_SoVITS/prepare_datasets/3-get-semantic.py
"/data/home/miniconda3/envs/GPTSoVits/bin/python" GPT_SoVITS/prepare_datasets/3-get-semantic.py
/data/home/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/torch/nn/utils/weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
  warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
/data/home/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/torch/nn/utils/weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
  warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
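
The GPT training traceback in this issue ends in `int(min_num / leng)`, and the log just before it prints `semantic_data_len: 0`. One plausible reading is that `leng` counts the usable training samples and is zero because semantic-token extraction produced no output. A minimal sketch under that assumption; `repeat_count` is a hypothetical helper written for illustration, not a function from the repository:

```python
def repeat_count(min_num: int, leng: int) -> int:
    """Return how many times to repeat a small dataset to reach min_num items.

    Mirrors the expression max(2, int(min_num / leng)) from the traceback,
    but raises a readable error when no samples were loaded.
    """
    if leng <= 0:
        raise ValueError(
            "no training samples were loaded; check that the semantic-token "
            "extraction step wrote non-empty output before starting GPT training"
        )
    return max(2, int(min_num / leng))
```

A guard like this does not fix the empty dataset, but it points at the preprocessing step instead of failing with a bare division error.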

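The video explanation is a bit messy; here is what this model does, in short

You know vitsvc, right? It can be trained on text or on PPGs. For example, the VC in the original paper is trained on text, while sovits uses something like Whisper PPGs or hubert+vq.

However, if you extract hubert directly from the reference wav and then apply VQ at inference time, timbre leaks through. So the author uses a GPT model to predict hubert+vq from text, with the reference timbre as the prompt. The hubert+vq generated at inference then leaks less timbre. In other words, a similar scheme that predicts Whisper PPGs instead would also work.

But the overall topline is a pretrained hubert+vq based vitsvc, and the video shows that the zero-shot ability is not especially strong, because vitsvc was never designed for zero-shot. So overall this is not a large model. Because it improves on the vitsvc scheme and reduces timbre leakage, few-shot works, and it is a fairly practical model. If vitsvc were turned into a zero-shot vitsvc, it could become a large model. Since semantic-based VC can be trained on dirty data, throwing a lot of data at it might turn it into one.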

ASR fails: funasr-pipeline

Running ASR fails with:
KeyError: 'funasr-pipeline is not in the pipelines registry group auto-speech-recognition. Please make sure the correct version of ModelScope library is used.'
How can I fix this?

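RuntimeError: unmatched '}' in format string

Hello, after clicking "Start GPT training" I got the following error. How can I fix it?
Clicking "Start SoVITS training" works normally.
I am running the full package on Windows 10.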

"runtime\python" GPT_SoVITS/s1_train.py --config_file "TEMP/tmp_s1.yaml"
Seed set to 1234
Using 16bit Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
<All keys matched successfully>
ckpt_path: None
[rank: 0] Seed set to 1234
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1
Traceback (most recent call last):
  File "I:\GPT-SoVITS\GPT-SoVITS\GPT_SoVITS\s1_train.py", line 138, in <module>
    main(args)
  File "I:\GPT-SoVITS\GPT-SoVITS\GPT_SoVITS\s1_train.py", line 115, in main
    trainer.fit(model, data_module, ckpt_path=ckpt_path)
  File "I:\GPT-SoVITS\GPT-SoVITS\runtime\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 544, in fit
    call._call_and_handle_interrupt(
  File "I:\GPT-SoVITS\GPT-SoVITS\runtime\lib\site-packages\pytorch_lightning\trainer\call.py", line 43, in _call_and_handle_interrupt
    return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
  File "I:\GPT-SoVITS\GPT-SoVITS\runtime\lib\site-packages\pytorch_lightning\strategies\launchers\subprocess_script.py", line 102, in launch
    return function(*args, **kwargs)
  File "I:\GPT-SoVITS\GPT-SoVITS\runtime\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 580, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "I:\GPT-SoVITS\GPT-SoVITS\runtime\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 947, in _run
    self.strategy.setup_environment()
  File "I:\GPT-SoVITS\GPT-SoVITS\runtime\lib\site-packages\pytorch_lightning\strategies\ddp.py", line 148, in setup_environment
    self.setup_distributed()
  File "I:\GPT-SoVITS\GPT-SoVITS\runtime\lib\site-packages\pytorch_lightning\strategies\ddp.py", line 199, in setup_distributed
    _init_dist_connection(self.cluster_environment, self._process_group_backend, timeout=self._timeout)
  File "I:\GPT-SoVITS\GPT-SoVITS\runtime\lib\site-packages\lightning_fabric\utilities\distributed.py", line 290, in _init_dist_connection
    torch.distributed.init_process_group(torch_distributed_backend, rank=global_rank, world_size=world_size, **kwargs)
  File "I:\GPT-SoVITS\GPT-SoVITS\runtime\lib\site-packages\torch\distributed\distributed_c10d.py", line 888, in init_process_group
    store, rank, world_size = next(rendezvous_iterator)
  File "I:\GPT-SoVITS\GPT-SoVITS\runtime\lib\site-packages\torch\distributed\rendezvous.py", line 245, in _env_rendezvous_handler
    store = _create_c10d_store(master_addr, master_port, rank, world_size, timeout)
  File "I:\GPT-SoVITS\GPT-SoVITS\runtime\lib\site-packages\torch\distributed\rendezvous.py", line 176, in _create_c10d_store
    return TCPStore(
RuntimeError: unmatched '}' in format string

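During SSL extraction, np.isnan(ssl.detach().numpy()).sum() is non-zero, so later training cannot proceed

The sliced audio clips are all under 5 s, but when SSL extraction runs, the target files are not generated under 4-cnhubert. The reason is that execution reaches: if np.isnan(ssl.detach().numpy()).sum()!= 0:return, and exits early. What kind of data triggers this early return, and what audio should I prepare so that extraction succeeds?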
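
The early return quoted above skips a clip silently, which makes it hard to tell which files were affected. A small sketch of the same NaN test that reports the affected clips instead; plain nested lists stand in for the SSL feature tensor, and `contains_nan` and `find_nan_clips` are hypothetical names used only for illustration:

```python
import math
from typing import Dict, List, Sequence


def contains_nan(features: Sequence[Sequence[float]]) -> bool:
    """Return True if any value in a 2-D feature matrix is NaN."""
    return any(math.isnan(value) for row in features for value in row)


def find_nan_clips(features_by_file: Dict[str, Sequence[Sequence[float]]]) -> List[str]:
    """Return the names of clips whose features contain NaN, sorted by name."""
    return sorted(
        name for name, features in features_by_file.items() if contains_nan(features)
    )
```

Listing the skipped clips makes it possible to inspect them by ear or re-slice them; the issue itself does not say what kind of audio produces the NaN values.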

where is users.pth?

colab run webui.py

/content/GPT-SoVITS
Traceback (most recent call last):
  File "/content/GPT-SoVITS/webui.py", line 17, in <module>
    with open("%s/users.pth"%(site_packages_root),"w")as f:
FileNotFoundError: [Errno 2] No such file or directory: '/content/GPT-SoVITS/runtime/Lib/site-packages/users.pth'

Could you provide a Colab notebook?
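
The traceback shows webui.py writing users.pth into runtime/Lib/site-packages, a directory that does not exist in this Colab checkout. A hedged sketch of one way to choose a directory that does exist before writing the file; `pick_site_packages` is a hypothetical helper, and falling back to `site.getsitepackages()` is an assumption about what the script intends:

```python
import os
import site
from typing import List, Optional


def pick_site_packages(preferred: str, fallbacks: Optional[List[str]] = None) -> Optional[str]:
    """Return the first existing directory among preferred and the fallbacks."""
    if fallbacks is None:
        # Directories the running interpreter treats as site-packages.
        fallbacks = site.getsitepackages()
    for candidate in [preferred, *fallbacks]:
        if os.path.isdir(candidate):
            return candidate
    return None
```

With a check like this the script could write users.pth into the active interpreter's site-packages when the bundled runtime folder is absent.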

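Error during semantic token extraction

Environment: pytorch2.1.0 py310 cu118 ubuntu22
Text processing and SSL self-supervised feature extraction both run fine, but semantic token extraction fails immediately: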

"/opt/conda/bin/python" GPT_SoVITS/prepare_datasets/3-get-semantic.py
"/opt/conda/bin/python" GPT_SoVITS/prepare_datasets/3-get-semantic.py
Traceback (most recent call last):
  File "/root/GPT-SoVITS/GPT_SoVITS/prepare_datasets/3-get-semantic.py", line 42, in <module>
    hps = utils.get_hparams_from_file(s2config_path)
AttributeError: module 'utils' has no attribute 'get_hparams_from_file'
Traceback (most recent call last):
  File "/root/GPT-SoVITS/GPT_SoVITS/prepare_datasets/3-get-semantic.py", line 42, in <module>
    hps = utils.get_hparams_from_file(s2config_path)
AttributeError: module 'utils' has no attribute 'get_hparams_from_file'
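
An AttributeError on a module named `utils` often means a different `utils` module was found first on the import path, although the issue does not confirm this. A quick way to check which file the name resolves to, using only the standard library; `module_origin` is a hypothetical helper:

```python
import importlib.util
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """Return the file a top-level module name would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin
```

Calling `module_origin("utils")` with the same interpreter and working directory as the failing script shows whether the resolved file lives inside the GPT-SoVITS checkout or in an unrelated installed package.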

If my data is Chinese only, can few-shot learning support both Chinese and English?

About 1 min of voice data.

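Training the GPT model on English: inference results are worse than the base model

After fine-tuning GPT on English for 15 epochs, the speech output at inference is identical to the reference audio. Switching GPT back to the base model removes the problem. Reducing GPT training to 5 epochs makes the output drop some words.

Training parameters and results discussion

Could people post their trained speech together with the original dataset audio, so we can see how everyone's results compare?

Ideally also state the sample duration and the fine-tuning settings, such as the number of epochs.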

The model structure of TTS

I noticed that the author shared an explanation video, but it covers the principle of voice cloning.
I would therefore like to ask whether an explanation and a structure diagram of the TTS part will be shared in the future.
Thank you very much!

Can the model be installed on a MacBook with an M1 chip?

As in the title, thanks.

ASR error: KeyError: 'funasr-pipeline is not in the pipelines registry group auto-speech-recognition.

I placed the models as instructed, but running it still fails:

modelscope - WARNING - ('PIPELINES', 'auto-speech-recognition', 'funasr-pipeline') not found in ast index file
Traceback (most recent call last):
  File "D:\ai\GPT-SoVITS\tools\damo_asr\cmd-asr.py", line 9, in <module>
    inference_pipeline = pipeline(
  File "D:\software\miniconda\envs\GPTSoVits\lib\site-packages\modelscope\pipelines\builder.py", line 170, in pipeline
    return build_pipeline(cfg, task_name=task)
  File "D:\software\miniconda\envs\GPTSoVits\lib\site-packages\modelscope\pipelines\builder.py", line 65, in build_pipeline
    return build_from_cfg(
  File "D:\software\miniconda\envs\GPTSoVits\lib\site-packages\modelscope\utils\registry.py", line 198, in build_from_cfg
    raise KeyError(
KeyError: 'funasr-pipeline is not in the pipelines registry group auto-speech-recognition. Please make sure the correct version of ModelScope library is used.'
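
The error message itself asks for "the correct version of ModelScope", so a reasonable first step is to compare the installed package versions with the ones the project's requirements specify. A minimal sketch using the standard library; `installed_versions` is a hypothetical helper, and which versions are compatible is not stated in this issue:

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions
```

For example, `installed_versions(["modelscope", "funasr"])` run inside the GPTSoVits environment reports what is actually installed there.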

"missing 1 required positional argument" error when running the ASR task

Traceback (most recent call last):
  File "/media/dell/work/workspaces/GPT-SoVITS/tools/damo_asr/cmd-asr.py", line 19, in <module>
    text = inference_pipeline(audio_in="%s/%s"%(dir,name))["text"]
  File "/media/dell/work/workspaces/GPT-SoVITS/modelscope/package/modelscope/pipelines/audio/funasr_pipeline.py", line 73, in __call__
    output = self.model(*args, **kwargs)
  File "/media/dell/work/workspaces/GPT-SoVITS/modelscope/package/modelscope/models/base/base_model.py", line 35, in __call__
    return self.postprocess(self.forward(*args, **kwargs))
  File "/media/dell/work/workspaces/GPT-SoVITS/modelscope/package/modelscope/models/audio/funasr/model.py", line 61, in forward
    output = self.model.generate(*args, **kwargs)
TypeError: generate() missing 1 required positional argument: 'input'

The output is shown above. How can I fix this?
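
The traceback shows the call passing `audio_in=...` through several layers that forward `*args, **kwargs` unchanged, until `generate()` demands a parameter named `input`. The mismatch can be reproduced with stand-ins; `FakeModel` and `run_pipeline` are hypothetical, and the value `FakeModel` returns is made up:

```python
class FakeModel:
    """Stand-in whose generate() requires `input`, as in the traceback."""

    def generate(self, input, **kwargs):
        # The return value is invented for this illustration.
        return [{"text": f"transcript of {input}"}]


def run_pipeline(model, *args, **kwargs):
    """Forward all arguments unchanged, like the pipeline frames above."""
    return model.generate(*args, **kwargs)
```

Here `run_pipeline(FakeModel(), audio_in="a.wav")` raises the same kind of TypeError, while `run_pipeline(FakeModel(), input="a.wav")` succeeds. Whether the real pipeline accepts `input=` depends on the installed ModelScope and FunASR versions, so aligning those versions with the project's requirements may be the safer route.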

Windows environment: go-webui.bat does not open when double-clicked

The cmd output is:
D:\GPT-SoVITS-main>runtime\python.exe webui.py
The system cannot find the path specified.

D:\GPT-SoVITS-main>pause
Press any key to continue . . .

Preferred sample rate?

The slicer tool samples the input files at 32000 Hz. Is this the preferred sample rate for fine-tuning the model? My original audio has a sample rate of 44100 Hz; should I keep it at 44100 Hz or resample it to 32000 Hz?

GPT-SoVITS/tools/slice_audio.py

Lines 17 to 24 in 9619223

    
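    slicer = Slicer(
        sr=32000,  # sample rate of the long audio
        threshold=      int(threshold),  # volume below this value counts as silence and is a candidate cut point
        min_length=     int(min_length),  # minimum length of each segment; if the first segment is too short, keep merging it with the following ones until it exceeds this value
        min_interval=   int(min_interval),  # minimum interval between cuts
        hop_size=       int(hop_size),  # how the volume curve is computed; smaller means finer precision and more computation (finer precision does not mean better results)
        max_sil_kept=   int(max_sil_kept),  # maximum silence kept after slicing
    )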

Many thanks!

One-click three-step preprocessing reports: GPT_SoVITS/pretrained_models/chinese-hubert-base and are newly initialized: ['encoder.pos_conv_embed.conv.parametrizations.weight.original0'

The one-click three-step run reports:
Some weights of HubertModel were not initialized from the model checkpoint at GPT_SoVITS/pretrained_models/chinese-hubert-base and are newly initialized: ['encoder.pos_conv_embed.conv.parametrizations.weight.original0', 'encoder.pos_conv_embed.conv.parametrizations.weight.original1']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
My models were downloaded in advance.

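"The remote host forcibly closed an existing connection" during inference

When synthesizing speech, the progress bar total is 1500, but it always ends at around 300 to 400, and then the following is shown (the Chinese message means "the remote host forcibly closed an existing connection"):
handle: <Handle _ProactorBasePipeTransport._call_connection_lost(None)> Traceback (most recent call last): File "asyncio\events.py", line 80, in _run File "asyncio\proactor_events.py", line 162, in _call_connection_lost ConnectionResetError: [WinError 10054] 远程主机强迫关闭了一个现有的连接。
The voice conversion still completes, though. The quality does not seem very good to me; is that because the progress bar did not finish?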

Problem in ASR environment.

Thank you for your outstanding work. I am trying it out: in a new environment created by following the README I can split audio normally, but I cannot use ASR. Is something wrong with ModelScope?

KeyError: 'funasr-pipeline is not in the pipelines registry group auto-speech-recognition. Please make sure the correct version of ModelScope library is used.'

I would be extremely grateful if you could answer.

multilingual training

Hi! This project seems extremely promising. I was wondering whether you will support scripts or instructions for training models in other (non-English) languages in the future. For example, there are multiple BERT versions for my native language, but none of them can output mel tokens based on a trained dataset.

Would you recommend any options for accomplishing this within the scope of this project? I could also try to work out the code myself and share my findings if I manage it. Do you have a Discord server where I could discuss further how this TTS pipeline works?

Love what you guys have done with this project.

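[BUG] The final synthesized audio has no sound; I followed the steps in the video tutorial

The log suggests that the FFmpeg library needs to be installed, but I started it on Windows by running go-webui.bat, and the directory contains ffmpeg.exe and ffprobe.exe: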

"runtime\python" GPT_SoVITS/inference_webui.py
DEBUG:torchaudio._extension:Failed to initialize ffmpeg bindings
Traceback (most recent call last):
  File "C:\Users\userme\Downloads\GPT-SoVITS\runtime\lib\site-packages\torchaudio\_extension\utils.py", line 85, in _init_ffmpeg
    _load_lib("libtorchaudio_ffmpeg")
  File "C:\Users\userme\Downloads\GPT-SoVITS\runtime\lib\site-packages\torchaudio\_extension\utils.py", line 61, in _load_lib
    torch.ops.load_library(path)
  File "C:\Users\userme\Downloads\GPT-SoVITS\runtime\lib\site-packages\torch\_ops.py", line 643, in load_library
    ctypes.CDLL(path)
  File "ctypes\__init__.py", line 374, in __init__
FileNotFoundError: Could not find module 'C:\Users\userme\Downloads\GPT-SoVITS\runtime\Lib\site-packages\torchaudio\lib\libtorchaudio_ffmpeg.pyd' (or one of its dependencies). Try using the full path with constructor syntax.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\userme\Downloads\GPT-SoVITS\runtime\lib\site-packages\torchaudio\_extension\__init__.py", line 67, in <module>
    _init_ffmpeg()
  File "C:\Users\userme\Downloads\GPT-SoVITS\runtime\lib\site-packages\torchaudio\_extension\utils.py", line 87, in _init_ffmpeg
    raise ImportError("FFmpeg libraries are not found. Please install FFmpeg.") from err
ImportError: FFmpeg libraries are not found. Please install FFmpeg.
_IncompatibleKeys(missing_keys=['enc_q.pre.weight', 'enc_q.pre.bias', 'enc_q.enc.in_layers.0.bias', 'enc_q.enc.in_layers.0.weight_g', 'enc_q.enc.in_layers.0.weight_v', 'enc_q.enc.in_layers.1.bias', 'enc_q.enc.in_layers.1.weight_g', 'enc_q.enc.in_layers.1.weight_v', 'enc_q.enc.in_layers.2.bias', 'enc_q.enc.in_layers.2.weight_g', 'enc_q.enc.in_layers.2.weight_v', 'enc_q.enc.in_layers.3.bias', 'enc_q.enc.in_layers.3.weight_g', 'enc_q.enc.in_layers.3.weight_v', 'enc_q.enc.in_layers.4.bias', 'enc_q.enc.in_layers.4.weight_g', 'enc_q.enc.in_layers.4.weight_v', 'enc_q.enc.in_layers.5.bias', 'enc_q.enc.in_layers.5.weight_g', 'enc_q.enc.in_layers.5.weight_v', 'enc_q.enc.in_layers.6.bias', 'enc_q.enc.in_layers.6.weight_g', 'enc_q.enc.in_layers.6.weight_v', 'enc_q.enc.in_layers.7.bias', 'enc_q.enc.in_layers.7.weight_g', 'enc_q.enc.in_layers.7.weight_v', 'enc_q.enc.in_layers.8.bias', 'enc_q.enc.in_layers.8.weight_g', 'enc_q.enc.in_layers.8.weight_v', 'enc_q.enc.in_layers.9.bias', 'enc_q.enc.in_layers.9.weight_g', 'enc_q.enc.in_layers.9.weight_v', 'enc_q.enc.in_layers.10.bias', 'enc_q.enc.in_layers.10.weight_g', 'enc_q.enc.in_layers.10.weight_v', 'enc_q.enc.in_layers.11.bias', 'enc_q.enc.in_layers.11.weight_g', 'enc_q.enc.in_layers.11.weight_v', 'enc_q.enc.in_layers.12.bias', 'enc_q.enc.in_layers.12.weight_g', 'enc_q.enc.in_layers.12.weight_v', 'enc_q.enc.in_layers.13.bias', 'enc_q.enc.in_layers.13.weight_g', 'enc_q.enc.in_layers.13.weight_v', 'enc_q.enc.in_layers.14.bias', 'enc_q.enc.in_layers.14.weight_g', 'enc_q.enc.in_layers.14.weight_v', 'enc_q.enc.in_layers.15.bias', 'enc_q.enc.in_layers.15.weight_g', 'enc_q.enc.in_layers.15.weight_v', 'enc_q.enc.res_skip_layers.0.bias', 'enc_q.enc.res_skip_layers.0.weight_g', 'enc_q.enc.res_skip_layers.0.weight_v', 'enc_q.enc.res_skip_layers.1.bias', 'enc_q.enc.res_skip_layers.1.weight_g', 'enc_q.enc.res_skip_layers.1.weight_v', 'enc_q.enc.res_skip_layers.2.bias', 'enc_q.enc.res_skip_layers.2.weight_g', 
'enc_q.enc.res_skip_layers.2.weight_v', 'enc_q.enc.res_skip_layers.3.bias', 'enc_q.enc.res_skip_layers.3.weight_g', 'enc_q.enc.res_skip_layers.3.weight_v', 'enc_q.enc.res_skip_layers.4.bias', 'enc_q.enc.res_skip_layers.4.weight_g', 'enc_q.enc.res_skip_layers.4.weight_v', 'enc_q.enc.res_skip_layers.5.bias', 'enc_q.enc.res_skip_layers.5.weight_g', 'enc_q.enc.res_skip_layers.5.weight_v', 'enc_q.enc.res_skip_layers.6.bias', 'enc_q.enc.res_skip_layers.6.weight_g', 'enc_q.enc.res_skip_layers.6.weight_v', 'enc_q.enc.res_skip_layers.7.bias', 'enc_q.enc.res_skip_layers.7.weight_g', 'enc_q.enc.res_skip_layers.7.weight_v', 'enc_q.enc.res_skip_layers.8.bias', 'enc_q.enc.res_skip_layers.8.weight_g', 'enc_q.enc.res_skip_layers.8.weight_v', 'enc_q.enc.res_skip_layers.9.bias', 'enc_q.enc.res_skip_layers.9.weight_g', 'enc_q.enc.res_skip_layers.9.weight_v', 'enc_q.enc.res_skip_layers.10.bias', 'enc_q.enc.res_skip_layers.10.weight_g', 'enc_q.enc.res_skip_layers.10.weight_v', 'enc_q.enc.res_skip_layers.11.bias', 'enc_q.enc.res_skip_layers.11.weight_g', 'enc_q.enc.res_skip_layers.11.weight_v', 'enc_q.enc.res_skip_layers.12.bias', 'enc_q.enc.res_skip_layers.12.weight_g', 'enc_q.enc.res_skip_layers.12.weight_v', 'enc_q.enc.res_skip_layers.13.bias', 'enc_q.enc.res_skip_layers.13.weight_g', 'enc_q.enc.res_skip_layers.13.weight_v', 'enc_q.enc.res_skip_layers.14.bias', 'enc_q.enc.res_skip_layers.14.weight_g', 'enc_q.enc.res_skip_layers.14.weight_v', 'enc_q.enc.res_skip_layers.15.bias', 'enc_q.enc.res_skip_layers.15.weight_g', 'enc_q.enc.res_skip_layers.15.weight_v', 'enc_q.enc.cond_layer.bias', 'enc_q.enc.cond_layer.weight_g', 'enc_q.enc.cond_layer.weight_v', 'enc_q.proj.weight', 'enc_q.proj.bias'], unexpected_keys=[])
Number of parameter: 77.49M
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): checkip.amazonaws.com:443
DEBUG:urllib3.connectionpool:https://checkip.amazonaws.com:443 "GET / HTTP/1.1" 200 16
DEBUG:charset_normalizer:Encoding detection: ascii is most likely the one.
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): api.gradio.app:443
DEBUG:urllib3.connectionpool:https://api.gradio.app:443 "POST /gradio-initiated-analytics/ HTTP/1.1" 200 None
DEBUG:markdown_it.rules_block.code:entering code: StateBlock(line=0,level=0,tokens=0), 0, 1, False
DEBUG:markdown_it.rules_block.fence:entering fence: StateBlock(line=0,level=0,tokens=0), 0, 1, False
DEBUG:markdown_it.rules_block.blockquote:entering blockquote: StateBlock(line=0,level=0,tokens=0), 0, 1, False
DEBUG:markdown_it.rules_block.hr:entering hr: StateBlock(line=0,level=0,tokens=0), 0, 1, False
DEBUG:markdown_it.rules_block.list:entering list: StateBlock(line=0,level=0,tokens=0), 0, 1, False
DEBUG:markdown_it.rules_block.reference:entering reference: StateBlock(line=0,level=0,tokens=0), 0, 1, False
DEBUG:markdown_it.rules_block.html_block:entering html_block: StateBlock(line=0,level=0,tokens=0), 0, 1, False
DEBUG:markdown_it.rules_block.heading:entering heading: StateBlock(line=0,level=0,tokens=0), 0, 1, False
DEBUG:markdown_it.rules_block.lheading:entering lheading: StateBlock(line=0,level=0,tokens=0), 0, 1, False
DEBUG:markdown_it.rules_block.paragraph:entering paragraph: StateBlock(line=0,level=0,tokens=0), 0, 1, False
DEBUG:markdown_it.rules_block.code:entering code: StateBlock(line=0,level=0,tokens=0), 0, 1, False
DEBUG:markdown_it.rules_block.fence:entering fence: StateBlock(line=0,level=0,tokens=0), 0, 1, False
DEBUG:markdown_it.rules_block.blockquote:entering blockquote: StateBlock(line=0,level=0,tokens=0), 0, 1, False
DEBUG:markdown_it.rules_block.hr:entering hr: StateBlock(line=0,level=0,tokens=0), 0, 1, False
DEBUG:markdown_it.rules_block.list:entering list: StateBlock(line=0,level=0,tokens=0), 0, 1, False
DEBUG:markdown_it.rules_block.reference:entering reference: StateBlock(line=0,level=0,tokens=0), 0, 1, False
DEBUG:markdown_it.rules_block.html_block:entering html_block: StateBlock(line=0,level=0,tokens=0), 0, 1, False
DEBUG:markdown_it.rules_block.heading:entering heading: StateBlock(line=0,level=0,tokens=0), 0, 1, False
DEBUG:markdown_it.rules_block.lheading:entering lheading: StateBlock(line=0,level=0,tokens=0), 0, 1, False
DEBUG:markdown_it.rules_block.paragraph:entering paragraph: StateBlock(line=0,level=0,tokens=0), 0, 1, False
DEBUG:markdown_it.rules_block.code:entering code: StateBlock(line=0,level=0,tokens=0), 0, 1, False
DEBUG:markdown_it.rules_block.fence:entering fence: StateBlock(line=0,level=0,tokens=0), 0, 1, False
DEBUG:markdown_it.rules_block.blockquote:entering blockquote: StateBlock(line=0,level=0,tokens=0), 0, 1, False
DEBUG:markdown_it.rules_block.hr:entering hr: StateBlock(line=0,level=0,tokens=0), 0, 1, False
DEBUG:markdown_it.rules_block.list:entering list: StateBlock(line=0,level=0,tokens=0), 0, 1, False
DEBUG:markdown_it.rules_block.reference:entering reference: StateBlock(line=0,level=0,tokens=0), 0, 1, False
DEBUG:markdown_it.rules_block.html_block:entering html_block: StateBlock(line=0,level=0,tokens=0), 0, 1, False
DEBUG:markdown_it.rules_block.heading:entering heading: StateBlock(line=0,level=0,tokens=0), 0, 1, False
DEBUG:markdown_it.rules_block.lheading:entering lheading: StateBlock(line=0,level=0,tokens=0), 0, 1, False
DEBUG:markdown_it.rules_block.paragraph:entering paragraph: StateBlock(line=0,level=0,tokens=0), 0, 1, False
DEBUG:markdown_it.rules_block.code:entering code: StateBlock(line=0,level=0,tokens=0), 0, 1, False
DEBUG:markdown_it.rules_block.fence:entering fence: StateBlock(line=0,level=0,tokens=0), 0, 1, False
DEBUG:markdown_it.rules_block.blockquote:entering blockquote: StateBlock(line=0,level=0,tokens=0), 0, 1, False
DEBUG:markdown_it.rules_block.hr:entering hr: StateBlock(line=0,level=0,tokens=0), 0, 1, False
DEBUG:markdown_it.rules_block.list:entering list: StateBlock(line=0,level=0,tokens=0), 0, 1, False
DEBUG:markdown_it.rules_block.reference:entering reference: StateBlock(line=0,level=0,tokens=0), 0, 1, False
DEBUG:markdown_it.rules_block.html_block:entering html_block: StateBlock(line=0,level=0,tokens=0), 0, 1, False
DEBUG:markdown_it.rules_block.heading:entering heading: StateBlock(line=0,level=0,tokens=0), 0, 1, False
DEBUG:markdown_it.rules_block.lheading:entering lheading: StateBlock(line=0,level=0,tokens=0), 0, 1, False
DEBUG:markdown_it.rules_block.paragraph:entering paragraph: StateBlock(line=0,level=0,tokens=0), 0, 1, False
DEBUG:markdown_it.rules_block.code:entering code: StateBlock(line=0,level=0,tokens=0), 0, 1, False
DEBUG:markdown_it.rules_block.fence:entering fence: StateBlock(line=0,level=0,tokens=0), 0, 1, False
DEBUG:markdown_it.rules_block.blockquote:entering blockquote: StateBlock(line=0,level=0,tokens=0), 0, 1, False
DEBUG:markdown_it.rules_block.hr:entering hr: StateBlock(line=0,level=0,tokens=0), 0, 1, False
DEBUG:markdown_it.rules_block.list:entering list: StateBlock(line=0,level=0,tokens=0), 0, 1, False
DEBUG:markdown_it.rules_block.reference:entering reference: StateBlock(line=0,level=0,tokens=0), 0, 1, False
DEBUG:markdown_it.rules_block.html_block:entering html_block: StateBlock(line=0,level=0,tokens=0), 0, 1, False
DEBUG:markdown_it.rules_block.heading:entering heading: StateBlock(line=0,level=0,tokens=0), 0, 1, False
DEBUG:markdown_it.rules_block.lheading:entering lheading: StateBlock(line=0,level=0,tokens=0), 0, 1, False
DEBUG:markdown_it.rules_block.paragraph:entering paragraph: StateBlock(line=0,level=0,tokens=0), 0, 1, False
DEBUG:root:Using proactor: IocpProactor
DEBUG:root:Using proactor: IocpProactor
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): localhost:9872
DEBUG:urllib3.connectionpool:http://localhost:9872 "GET /startup-events HTTP/1.1" 200 4
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): localhost:9872
DEBUG:urllib3.connectionpool:http://localhost:9872 "HEAD / HTTP/1.1" 200 0
Running on local URL:  http://0.0.0.0:9872
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): api.gradio.app:443
DEBUG:urllib3.connectionpool:https://api.gradio.app:443 "POST /gradio-launched-analytics/ HTTP/1.1" 200 None
Building prefix dict from the default dictionary ...
DEBUG:jieba:Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\userme\Downloads\GPT-SoVITS\TEMP\jieba.cache
DEBUG:jieba:Loading model from cache C:\Users\userme\Downloads\GPT-SoVITS\TEMP\jieba.cache
Loading model cost 0.717 seconds.
DEBUG:jieba:Loading model cost 0.717 seconds.
Prefix dict has been built successfully.
DEBUG:jieba:Prefix dict has been built successfully.
 19%|██████████████▋                                                                | 279/1500 [00:07<00:32, 37.07it/s]T2S Decoding EOS [92 -> 373]
 19%|██████████████▊                                                                | 281/1500 [00:07<00:33, 36.41it/s]
C:\Users\userme\Downloads\GPT-SoVITS\runtime\lib\site-packages\torch\functional.py:641: UserWarning: stft with return_complex=False is deprecated. In a future pytorch release, stft will return complex tensors for all inputs, and return_complex=False will raise an error.
Note: you can still call torch.view_as_real on the complex output to recover the old return format. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\SpectralOps.cpp:867.)
  return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
1.941   1.060   7.724   2.086
DEBUG:httpx._client:HTTP Request: POST http://localhost:9872/api/predict "HTTP/1.1 200 OK"
DEBUG:httpx._client:HTTP Request: POST http://localhost:9872/api/predict "HTTP/1.1 200 OK"
DEBUG:httpx._client:HTTP Request: POST http://localhost:9872/reset "HTTP/1.1 200 OK"
ERROR:root:Exception in callback _ProactorBasePipeTransport._call_connection_lost(None)
handle: <Handle _ProactorBasePipeTransport._call_connection_lost(None)>
Traceback (most recent call last):
  File "asyncio\events.py", line 80, in _run
  File "asyncio\proactor_events.py", line 162, in _call_connection_lost
ConnectionResetError: [WinError 10054] 远程主机强迫关闭了一个现有的连接。
taskkill /t /f /pid 10664
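
In the log above, the progress bar stops at 281/1500 right after the line "T2S Decoding EOS [92 -> 373]". That pattern suggests 1500 is an upper bound on decoding steps and that generation ends as soon as an end-of-sequence token is produced, so a bar that stops early is probably normal. A toy sketch of such a loop, under that assumption; `decode_until_eos` and `next_token` are hypothetical and unrelated to the project's actual decoder:

```python
from typing import Callable, List


def decode_until_eos(
    next_token: Callable[[List[int]], int],
    eos_id: int,
    max_steps: int = 1500,
) -> List[int]:
    """Generate tokens one at a time, stopping at eos_id or after max_steps."""
    tokens: List[int] = []
    for _ in range(max_steps):
        token = next_token(tokens)
        if token == eos_id:
            # End of sequence reached before the step limit.
            break
        tokens.append(token)
    return tokens
```

In this sketch a short sentence finishes long before `max_steps`, which is the behaviour the progress bar in the log appears to show.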

User guide please

Is there a simple data preprocessing script and a guide for kicking off training?

At the moment I don't know how to train a model, or how to clone my own voice.

Will an API for TTS be provided later on?

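Can memory usage be optimized?

After a model has been trained, simply opening the page already takes more than 3 GB of GPU memory. In theory, shouldn't this use no GPU memory at all?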

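After training on Chinese, English mixed into the text is dropped

With the model trained here, words such as "ms" and "response" are simply not spoken, even though the training corpus also contains some English.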

SoVITS training fails with ZeroDivisionError

GPT training works fine, but SoVITS training fails:

"python" GPT_SoVITS/s2_train.py --config "TEMP/tmp_s2.json"
INFO:meigui:{'train': {'log_interval': 100, 'eval_interval': 500, 'seed': 1234, 'epochs': 10, 'learning_rate': 0.0001, 'betas': [0.8, 0.99], 'eps': 1e-09, 'batch_size': 1, 'fp16_run': True, 'lr_decay': 0.999875, 'segment_size': 20480, 'init_lr_ratio': 1, 'warmup_epochs': 0, 'c_mel': 45, 'c_kl': 1.0, 'text_low_lr_rate': 0.4, 'pretrained_s2G': 'GPT_SoVITS/pretrained_models/s2G488k.pth', 'pretrained_s2D': 'GPT_SoVITS/pretrained_models/s2D488k.pth', 'if_save_latest': True, 'if_save_every_weights': True, 'save_every_epoch': 5, 'gpu_numbers': '0'}, 'data': {'max_wav_value': 32768.0, 'sampling_rate': 32000, 'filter_length': 2048, 'hop_length': 640, 'win_length': 2048, 'n_mel_channels': 128, 'mel_fmin': 0.0, 'mel_fmax': None, 'add_blank': True, 'n_speakers': 300, 'cleaned_text': True, 'exp_dir': 'logs/meigui'}, 'model': {'inter_channels': 192, 'hidden_channels': 192, 'filter_channels': 768, 'n_heads': 2, 'n_layers': 6, 'kernel_size': 3, 'p_dropout': 0.1, 'resblock': '1', 'resblock_kernel_sizes': [3, 7, 11], 'resblock_dilation_sizes': [[1, 3, 5], [1, 3, 5], [1, 3, 5]], 'upsample_rates': [10, 8, 2, 2, 2], 'upsample_initial_channel': 512, 'upsample_kernel_sizes': [16, 16, 8, 2, 2], 'n_layers_q': 3, 'use_spectral_norm': False, 'gin_channels': 512, 'semantic_frame_rate': '25hz', 'freeze_quantizer': True}, 's2_ckpt_dir': 'logs/meigui', 'content_module': 'cnhubert', 'save_weight_dir': 'SoVITS_weights', 'name': 'meigui', 'pretrain': None, 'resume_step': None}
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0
INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes.
phoneme_data_len: 3
wav_data_len: 99
100%|████████████████████████████████████████| 99/99 [00:00<00:00, 24775.42it/s]
skipped_phone:  0 , skipped_dur:  0
total left:  99
ssl_proj.weight not requires_grad
ssl_proj.bias not requires_grad
INFO:meigui:loaded pretrained GPT_SoVITS/pretrained_models/s2G488k.pth
<All keys matched successfully>
INFO:meigui:loaded pretrained GPT_SoVITS/pretrained_models/s2D488k.pth
<All keys matched successfully>
/root/miniconda3/lib/python3.10/site-packages/torch/optim/lr_scheduler.py:139: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. "
Exception ignored in: <function _MultiProcessingDataLoaderIter.__del__ at 0x7f5c840ca710>
Traceback (most recent call last):
  File "/root/miniconda3/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1478, in __del__
    self._shutdown_workers()
  File "/root/miniconda3/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1409, in _shutdown_workers
    if not self._shutdown:
AttributeError: '_MultiProcessingDataLoaderIter' object has no attribute '_shutdown'
Traceback (most recent call last):
  File "/root/autodl-tmp/workdir/GPT-SoVITS/GPT_SoVITS/s2_train.py", line 402, in <module>
    main()
  File "/root/autodl-tmp/workdir/GPT-SoVITS/GPT_SoVITS/s2_train.py", line 53, in main
    mp.spawn(run, nprocs=n_gpus, args=(n_gpus, hps,))
  File "/root/miniconda3/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 239, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/root/miniconda3/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 197, in start_processes
    while not context.join():
  File "/root/miniconda3/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 160, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException: 

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/root/miniconda3/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/root/autodl-tmp/workdir/GPT-SoVITS/GPT_SoVITS/s2_train.py", line 172, in run
    train_and_evaluate(rank, epoch, hps, [net_g, net_d], [optim_g, optim_d], [scheduler_g, scheduler_d], scaler,
  File "/root/autodl-tmp/workdir/GPT-SoVITS/GPT_SoVITS/s2_train.py", line 195, in train_and_evaluate
    for batch_idx, (ssl, ssl_lengths, spec, spec_lengths, y, y_lengths, text, text_lengths) in tqdm(enumerate(train_loader)):
  File "/root/miniconda3/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 436, in __iter__
    self._iterator = self._get_iterator()
  File "/root/miniconda3/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 388, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 994, in __init__
    super().__init__(loader)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 603, in __init__
    self._sampler_iter = iter(self._index_sampler)
  File "/root/autodl-tmp/workdir/GPT-SoVITS/GPT_SoVITS/module/data_utils.py", line 293, in __iter__
    ids_bucket = ids_bucket + ids_bucket * (rem // len_bucket) + ids_bucket[:(rem % len_bucket)]
ZeroDivisionError: integer division or modulo by zero
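
The last frame of the traceback divides by len_bucket, so the error is raised whenever a length bucket ends up empty. A minimal sketch of a guard, assuming the surrounding sampler pads each bucket up to a target size; pad_bucket and target_len are hypothetical names used only for illustration, not the project's code:

```python
def pad_bucket(ids_bucket, target_len):
    """Repeat a bucket's sample ids until it holds target_len entries.

    Hypothetical helper: it mirrors the failing expression from the
    traceback, but returns early for an empty bucket instead of
    dividing by zero.
    """
    len_bucket = len(ids_bucket)
    if len_bucket == 0:
        # Nothing to repeat; the caller should skip this bucket.
        return []
    rem = target_len - len_bucket
    if rem <= 0:
        # Bucket is already large enough.
        return list(ids_bucket)
    return ids_bucket + ids_bucket * (rem // len_bucket) + ids_bucket[: rem % len_bucket]
```

Whether an empty bucket should be skipped silently or reported as a data problem is a design choice for the maintainers; with only 99 clips and 3 phoneme entries in the log above, reporting it may be the more helpful option.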

Training errors out

Trying to train few-shot, but I get errors because these files are not being created:

self.path2: logs/xxx/2-name2text.txt
self.path4: logs/xxx/4-cnhubert
self.path5: logs/xxx/5-wav32k

Traceback (most recent call last):
  File "x:\sovits\GPT_SoVITS\s2_train.py", line 402, in <module>
    main()
  File "x:\sovits\GPT_SoVITS\s2_train.py", line 53, in main
    mp.spawn(run, nprocs=n_gpus, args=(n_gpus, hps,))
  File "x:\sovits\runtime\lib\site-packages\torch\multiprocessing\spawn.py", line 239, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "x:\sovits\runtime\lib\site-packages\torch\multiprocessing\spawn.py", line 197, in start_processes
    while not context.join():
  File "x:\sovits\runtime\lib\site-packages\torch\multiprocessing\spawn.py", line 160, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "x:\sovits\runtime\lib\site-packages\torch\multiprocessing\spawn.py", line 69, in _wrap
    fn(i, *args)
  File "x:\sovits\GPT_SoVITS\s2_train.py", line 69, in run
    train_dataset = TextAudioSpeakerLoader(hps.data)########
  File "x:\sovits\GPT_SoVITS\module\data_utils.py", line 37, in __init__
    assert os.path.exists(self.path2)
AssertionError

In addition, where exactly do I place the xxx.list file?
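
The assertion in the traceback only says that self.path2 is missing, without naming the file. A small sketch that checks all three paths listed above before training is started; missing_preprocess_outputs is a hypothetical helper, and the assumption is that these outputs come from the dataset preparation steps run before s2_train.py:

```python
import os


def missing_preprocess_outputs(exp_dir):
    """Return the expected preprocessing outputs that do not exist yet."""
    expected = [
        os.path.join(exp_dir, "2-name2text.txt"),
        os.path.join(exp_dir, "4-cnhubert"),
        os.path.join(exp_dir, "5-wav32k"),
    ]
    return [path for path in expected if not os.path.exists(path)]


if __name__ == "__main__":
    # "logs/xxx" is the experiment directory shown in the report above.
    for path in missing_preprocess_outputs("logs/xxx"):
        print("missing:", path)
```

If anything is printed, the preparation steps most likely did not run or failed, and their own console output is the place to look next.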

Error when launching the TTS inference WebUI, please help take a look

DEBUG:torchaudio._extension:Failed to initialize ffmpeg bindings
Traceback (most recent call last):
  File "E:\GPT-SoVITS\runtime\lib\site-packages\torchaudio\_extension\utils.py", line 85, in _init_ffmpeg
    _load_lib("libtorchaudio_ffmpeg")
  File "E:\GPT-SoVITS\runtime\lib\site-packages\torchaudio\_extension\utils.py", line 61, in _load_lib
    torch.ops.load_library(path)
  File "E:\GPT-SoVITS\runtime\lib\site-packages\torch\_ops.py", line 643, in load_library
    ctypes.CDLL(path)
  File "ctypes\__init__.py", line 374, in __init__
FileNotFoundError: Could not find module 'E:\GPT-SoVITS\runtime\Lib\site-packages\torchaudio\lib\libtorchaudio_ffmpeg.pyd' (or one of its dependencies). Try using the full path with constructor syntax.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "E:\GPT-SoVITS\runtime\lib\site-packages\torchaudio\_extension\__init__.py", line 67, in <module>
    _init_ffmpeg()
  File "E:\GPT-SoVITS\runtime\lib\site-packages\torchaudio\_extension\utils.py", line 87, in _init_ffmpeg
    raise ImportError("FFmpeg libraries are not found. Please install FFmpeg.") from err
ImportError: FFmpeg libraries are not found. Please install FFmpeg.

FileNotFoundError on Colab

I have installed all the environment dependencies. Everything works well on Ubuntu, but not on Colab. On Colab it cannot find users.pth, whereas on Ubuntu the file is in the Python directory.

This is the error on Colab. How can I fix it?

    with open("%s/users.pth"%(site_packages_root),"w")as f:
FileNotFoundError: [Errno 2] No such file or directory: '/content/GPT-SoVITS/runtime/Lib/site-packages/users.pth'
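
The path in the error ends in runtime/Lib/site-packages, which appears to be the layout of a bundled runtime and does not exist in a regular Colab or Ubuntu Python install. A hedged sketch of a fallback, assuming the caller only needs some existing site-packages directory to write users.pth into; resolve_site_packages is a hypothetical helper, not the project's code:

```python
import os
import site


def resolve_site_packages(preferred, fallbacks=None):
    """Return preferred if it exists, else the first existing fallback."""
    if os.path.isdir(preferred):
        return preferred
    # site.getsitepackages() lists the interpreter's global site-packages dirs.
    candidates = site.getsitepackages() if fallbacks is None else fallbacks
    for candidate in candidates:
        if os.path.isdir(candidate):
            return candidate
    raise FileNotFoundError("no site-packages directory found")
```

For example, resolve_site_packages("/content/GPT-SoVITS/runtime/Lib/site-packages") would fall back to the interpreter's own site-packages on a machine where the bundled directory is absent.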

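All-NaN output during TTS inference

Printing the audio tensor produced by decode shows that it is all NaN, and the output audio is empty. The same problem occurs when using the pretrained model directly. What needs to be changed?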

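How large was the dataset used to train the current base model?

Thanks for open-sourcing this. I would like to learn more about how the current base model was trained: as the title asks, roughly how large was the dataset, what hardware configuration was used, and how long did training take? Is it possible to train a base model for more languages oneself?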

Several problems when using a 4090 (on AUTODL)

FFmpeg error

Traceback (most recent call last):
  File "/root/miniconda3/lib/python3.10/site-packages/torchaudio/_extension/__init__.py", line 67, in <module>
    _init_ffmpeg()
  File "/root/miniconda3/lib/python3.10/site-packages/torchaudio/_extension/utils.py", line 87, in _init_ffmpeg
    raise ImportError("FFmpeg libraries are not found. Please install FFmpeg.") from err
ImportError: FFmpeg libraries are not found. Please install FFmpeg.

However, ffmpeg -h works normally, and I also tried

- sudo apt install ffmpeg

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
ffmpeg is already the newest version (7:4.4.2-0ubuntu0.22.04.1).
0 upgraded, 0 newly installed, 0 to remove and 141 not upgraded.

Port 6006 is already in use
I manually edited the source code to switch to another port, which solved this (?)
Error during inference

Traceback (most recent call last):
  File "/root/miniconda3/lib/python3.10/site-packages/gradio/routes.py", line 442, in run_predict
    output = await app.get_blocks().process_api(
  File "/root/miniconda3/lib/python3.10/site-packages/gradio/blocks.py", line 1389, in process_api
    result = await self.call_function(
  File "/root/miniconda3/lib/python3.10/site-packages/gradio/blocks.py", line 1108, in call_function
    prediction = await utils.async_iteration(iterator)
  File "/root/miniconda3/lib/python3.10/site-packages/gradio/utils.py", line 346, in async_iteration
    return await iterator.__anext__()
  File "/root/miniconda3/lib/python3.10/site-packages/gradio/utils.py", line 339, in __anext__
    return await anyio.to_thread.run_sync(
  File "/root/miniconda3/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/root/miniconda3/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "/root/miniconda3/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "/root/miniconda3/lib/python3.10/site-packages/gradio/utils.py", line 322, in run_sync_iterator_async
    return next(iterator)
  File "/root/miniconda3/lib/python3.10/site-packages/gradio/utils.py", line 691, in gen_wrapper
    yield from f(*args, **kwargs)
  File "/root/autodl-tmp/workdir/GPT-SoVITS/GPT_SoVITS/inference_webui.py", line 135, in get_tts_wav
    bert = torch.cat([bert1, bert2], 1)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument tensors in method wrapper_CUDA_cat)
DEBUG:httpcore.http11:receive_response_headers.complete return_value=(b'HTTP/1.1', 500, b'Internal Server Error', [(b'date', b'Wed, 17 Jan 2024 10:14:44 GMT'), (b'server', b'uvicorn'), (b'content-length', b'14'), (b'content-type', b'application/json')])
INFO:httpx:HTTP Request: POST http://localhost:7896/api/predict "HTTP/1.1 500 Internal Server Error"

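For this part I did not see a URL pop up that could be opened directly, so I used the port forwarding provided by AUTODL to forward port 7896 (changed from 6006 in the source code, because 6006 was in use) to my local machine. (The error appears after clicking inference.)

Also, is there currently any way to support corpora in languages other than Chinese?
Audio slicing should already work reasonably well.
For labeling, I can transcribe locally with fast-wisper.
But are there BERT and SSL models that can be used directly for the later steps?

(Other cards remain to be verified; AUTODL ran out of cards. I will try a 3090 when I get the chance.)

There are a few more questions I have not looked into further, so I will ask first:
The individual steps should not depend on files from earlier steps, right? For example, if I supply already labeled data (.wav and .list) directly and start from the later steps (leaving folders such as input empty), that should not cause problems?
Also, inference should depend only on the pretrained models plus the two weight files for GPT (.pth) and SOVIT (.ckpt), right? If I only want to run inference, is it enough to copy these two files into the designated folders and start?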

Can we have MPS support?

I ran into a problem that seems to be related to CUDA. I'm on a MacBook M1, so I don't have a CUDA-capable GPU. I expected that setting the device to CPU would work as an alternative, and I did see code for that, but it did not run smoothly on my machine. PyTorch provides the mps backend for Apple Silicon as an alternative to CUDA; I was wondering when support for it could be added.

The following is the error I received while formatting the training set (1-训练集格式化工具, the "training set formatting tool"). I may be wrong about why this error happens; please kindly help solve it.

"/Users/improvise/miniconda/envs/GPTSoVits/bin/python" GPT_SoVITS/prepare_datasets/1-get-text.py
"/Users/improvise/miniconda/envs/GPTSoVits/bin/python" GPT_SoVITS/prepare_datasets/1-get-text.py
Traceback (most recent call last):
  File "/Users/improvise/Desktop/GPT-SoVITS-main/GPT_SoVITS/prepare_datasets/1-get-text.py", line 53, in <module>
    bert_model = bert_model.half().to(device)
  File "/Users/improvise/miniconda/envs/GPTSoVits/lib/python3.9/site-packages/transformers/modeling_utils.py", line 2460, in to
    return super().to(*args, **kwargs)
  File "/Users/improvise/miniconda/envs/GPTSoVits/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1160, in to
Traceback (most recent call last):
  File "/Users/improvise/Desktop/GPT-SoVITS-main/GPT_SoVITS/prepare_datasets/1-get-text.py", line 53, in <module>
    bert_model = bert_model.half().to(device)
  File "/Users/improvise/miniconda/envs/GPTSoVits/lib/python3.9/site-packages/transformers/modeling_utils.py", line 2460, in to
    return super().to(*args, **kwargs)
  File "/Users/improvise/miniconda/envs/GPTSoVits/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1160, in to
    return self._apply(convert)
  File "/Users/improvise/miniconda/envs/GPTSoVits/lib/python3.9/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "/Users/improvise/miniconda/envs/GPTSoVits/lib/python3.9/site-packages/torch/nn/modules/module.py", line 810, in _apply
    return self._apply(convert)
  File "/Users/improvise/miniconda/envs/GPTSoVits/lib/python3.9/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "/Users/improvise/miniconda/envs/GPTSoVits/lib/python3.9/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "/Users/improvise/miniconda/envs/GPTSoVits/lib/python3.9/site-packages/torch/nn/modules/module.py", line 833, in _apply
    module._apply(fn)
  File "/Users/improvise/miniconda/envs/GPTSoVits/lib/python3.9/site-packages/torch/nn/modules/module.py", line 810, in _apply
    param_applied = fn(param)
  File "/Users/improvise/miniconda/envs/GPTSoVits/lib/python3.9/sit    module._apply(fn)
  File "/Users/improvise/miniconda/envs/GPTSoVits/lib/python3.9/site-packages/torch/nn/modules/module.py", line 810, in _apply
e-packages/torch/nn/modules/module.py", line 1158, in convert
    module._apply(fn)
  File "/Users/improvise/miniconda/envs/GPTSoVits/lib/python3.9/site-packages/torch/nn/modules/module.py", line 833, in _apply
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
  File "/Users/improvise/miniconda/envs/GPTSoVits/lib/python3.9/site-packages/torch/cuda/__init__.py", line 289, in _lazy_init
    param_applied = fn(param)
  File "/Users/improvise/miniconda/envs/GPTSoVits/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1158, in convert
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
  File "/Users/improvise/miniconda/envs/GPTSoVits/lib/python3.9/site-packages/torch/cuda/__init__.py", line 289, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
Traceback (most recent call last):
  File "/Users/improvise/Desktop/GPT-SoVITS-main/webui.py", line 529, in open1abc
    with open(txt_path, "r",encoding="utf8") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'logs/test01/2-name2text-0.txt'

P.S. The output in the logs folder looks like this: /Users/improvise/Desktop/GPT-SoVITS-main/logs/test01/3-bert, and that folder is empty.
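
The traceback shows .to(device) reaching torch.cuda._lazy_init, which means the script resolved the device to CUDA on a machine whose PyTorch build has no CUDA. A minimal sketch of a fallback order, assuming the script is free to choose its device string; pick_device is a hypothetical helper, and in a real script the two flags would come from torch.cuda.is_available() and torch.backends.mps.is_available():

```python
def pick_device(cuda_available, mps_available):
    """Choose a device string, preferring CUDA, then MPS, then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


# Example: an Apple Silicon machine without CUDA but with MPS.
print(pick_device(cuda_available=False, mps_available=True))
```

Whether every operation in the pipeline, including the half-precision call in the traceback, behaves well on mps or cpu is a separate question that this sketch does not answer.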

Can you add Korean?

Could Korean support be added for training and inference? Thank you~

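system error: 10049 and ZeroDivisionError when starting GPT training

All the previous steps were fine and the SoVITS model finished training, but a problem occurred when I clicked GPT model training.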
"runtime\python" GPT_SoVITS/s1_train.py --config_file "TEMP/tmp_s1.yaml"
Seed set to 1234
Using 16bit Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs

ckpt_path: None
[rank: 0] Seed set to 1234
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1
[W C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [localhost.sangfor.com.cn]:56564 (system error: 10049 - 在其上下文中，该请求的地址无效。).
[W C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [localhost.sangfor.com.cn]:56564 (system error: 10049 - 在其上下文中，该请求的地址无效。).

distributed_backend=gloo
All distributed processes registered. Starting with 1 processes

semantic_data_len: 0
phoneme_data_len: 2474
Traceback (most recent call last):
  File "F:\AI\GPT-SoVITS\GPT_SoVITS\s1_train.py", line 139, in <module>
    main(args)
  File "F:\AI\GPT-SoVITS\GPT_SoVITS\s1_train.py", line 116, in main
    trainer.fit(model, data_module, ckpt_path=ckpt_path)
  File "F:\AI\GPT-SoVITS\runtime\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 544, in fit
    call._call_and_handle_interrupt(
  File "F:\AI\GPT-SoVITS\runtime\lib\site-packages\pytorch_lightning\trainer\call.py", line 43, in _call_and_handle_interrupt
    return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
  File "F:\AI\GPT-SoVITS\runtime\lib\site-packages\pytorch_lightning\strategies\launchers\subprocess_script.py", line 102, in launch
    return function(*args, **kwargs)
  File "F:\AI\GPT-SoVITS\runtime\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 580, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "F:\AI\GPT-SoVITS\runtime\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 950, in _run
    call._call_setup_hook(self) # allow user to setup lightning_module in accelerator environment
  File "F:\AI\GPT-SoVITS\runtime\lib\site-packages\pytorch_lightning\trainer\call.py", line 92, in _call_setup_hook
    _call_lightning_datamodule_hook(trainer, "setup", stage=fn)
  File "F:\AI\GPT-SoVITS\runtime\lib\site-packages\pytorch_lightning\trainer\call.py", line 179, in _call_lightning_datamodule_hook
    return fn(*args, **kwargs)
  File "F:\AI\GPT-SoVITS\GPT_SoVITS\AR\data\data_module.py", line 22, in setup
    self._train_dataset = Text2SemanticDataset(
  File "F:\AI\GPT-SoVITS\GPT_SoVITS\AR\data\dataset.py", line 96, in __init__
    self.init_batch()
  File "F:\AI\GPT-SoVITS\GPT_SoVITS\AR\data\dataset.py", line 170, in init_batch
    for _ in range(max(2,int(min_num/leng))):
ZeroDivisionError: division by zero
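
The log above reports semantic_data_len: 0 just before the failure, and the failing line divides min_num by leng. A minimal sketch of a clearer failure mode, assuming leng is the number of usable training items; repeat_count is a hypothetical helper, not the project's code:

```python
def repeat_count(min_num, leng):
    """Return how many times a small dataset is repeated.

    Mirrors max(2, int(min_num / leng)) from the traceback, but raises
    a descriptive error when the dataset is empty.
    """
    if leng <= 0:
        raise ValueError(
            "dataset is empty after preprocessing; "
            "check that the semantic data was generated"
        )
    return max(2, int(min_num / leng))
```

With a message like this, an empty semantic file would be reported as a data preparation problem instead of surfacing as a bare ZeroDivisionError deep inside the trainer.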

ImportError: cannot import name 'deprecated' / 'Doc' from 'typing_extensions'

Originally the ImportError was for 'deprecated', but after following tiangolo/fastapi#9808 it became 'Doc' instead.

I wrote a Colab notebook; please take a look and help debug it

https://colab.research.google.com/drive/1_RSwX_fdiw_lst_FntfJ--vD6IrusQaH?usp=sharing
If there are no problems, I will open a PR.

Mixed language

Thank you for your open-source contributions.
Could you write a guide on supporting mixed-language input? For example, a sentence containing both Chinese and English.

No runtime file

Thanks for sharing this as open source. It is a great project!
However, this might not be the correct version, because some critical files are missing.

app.queue(concurrency_count=511, max_size=1022)

Traceback (most recent call last):
  File "D:\0GitHubtest\GPT-SoVITS-main\GPT_SoVITS\inference_webui.py", line 355, in <module>
    app.queue(concurrency_count=511, max_size=1022).launch(
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\10029.conda\envs\svits\Lib\site-packages\gradio\blocks.py", line 1715, in queue
    raise DeprecationWarning(
DeprecationWarning: concurrency_count has been deprecated. Set the concurrency_limit directly on event listeners e.g. btn.click(fn, ..., concurrency_limit=10) or gr.Interface(concurrency_limit=10). If necessary, the total number of workers can be configured via max_threads in launch().
However, this no longer matters, because I have already removed concurrency_count=511......

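After training, words are swallowed quite badly

Often some words are not spoken at all and others are repeated. Am I the only one with this problem?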

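Error when running webui.py on mac

Hello, I have a question. I'm on macOS. After setting up the environment according to the installation instructions, running webui.py produces the error below. What has gone wrong here? I would be very grateful if you could find time to reply. (The integrated package you provided on Bilibili shows the same error when run.)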

By default, 'file' is written in the MIFF image format.  To
specify a particular image format, precede the filename with an image
format name and a colon (i.e. ps:image) or specify the image type as
the filename suffix (i.e. image.ps).  Specify 'file' as '-' for
standard input or output.
import: delegate library support not built-in '' (X11) @ error/import.c/ImportImageCommand/1302.
: command not found 
./webui.py: line 4: syntax error near unexpected token `"ignore"'
'/webui.py: line 4: `warnings.filterwarnings("ignore")

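A beginner hoping for an all-in-one package

For beginners, setting up the environment ourselves is too complicated. If someone could release an all-in-one package, I would be very grateful.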

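[WebUI question] The first phrase is sometimes swallowed during inference

I have made sure the reference audio and its labeling are correct.
The following often happens:

Original text: 反正。。你们也不需要这么做对吧
Audio produced by inference: 你们也不需要这么做对吧

After adding "啊，" in front of the original text:

Original text: 啊，反正。。你们也不需要这么做对吧
Audio produced by inference: 反正。。你们也不需要这么做对吧

I don't know why the first phrase before a pause is sometimes swallowed and sometimes not.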

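In certain cases t2s_model produces a bad zero prediction, so the input sequence is wrongly treated as output and the reference audio leaks

With certain combinations of reference audio and prompt text, t2s_model hits the bad zero prediction case and returns idx=0. As a result, at lines 211-213 of inference_webui.py,

        pred_semantic = pred_semantic[:, -idx:].unsqueeze(
            0
        )  # .unsqueeze(0)  # mq needs one extra unsqueeze

the expression pred_semantic[:, -0:] wrongly includes the input sequence in the output.
Should there be an error check here?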
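
The behaviour described here follows from Python slicing: -0 equals 0, so a slice written as [-0:] selects the whole sequence instead of nothing, and tensor slicing along a dimension follows the same rule. A minimal sketch of a guard on a plain list, assuming an empty result is acceptable when idx is 0; take_generated is a hypothetical helper, not the project's code:

```python
def take_generated(tokens, idx):
    """Return the last idx tokens, or an empty list when idx is 0."""
    if idx <= 0:
        # tokens[-0:] would return every token, including the prompt.
        return []
    return tokens[-idx:]


prompt_and_output = [11, 12, 13, 21, 22]
print(prompt_and_output[-0:])                # whole list, prompt included
print(take_generated(prompt_and_output, 0))  # empty list
print(take_generated(prompt_and_output, 2))  # last two tokens only
```

Whether idx=0 should yield empty audio or raise an error that asks the user to change the reference audio is a decision for the maintainers.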

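Has anyone who tried zero-shot got feedback on how well it works?

Zero-shot sounds heavily robotic and is basically unusable. Has anyone tried few-shot, and how well does it work?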

Error when training GPT with the integrated package: RuntimeError: unmatched '}' in format string

D:\GPT-SoVITS>runtime\python.exe webui.py
Running on local URL:  http://0.0.0.0:9874
"D:\GPT-SoVITS\runtime\python.exe" GPT_SoVITS/s1_train.py --config_file "TEMP/tmp_s1.yaml"
Seed set to 1234
Using 16bit Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
<All keys matched successfully>
ckpt_path: None
[rank: 0] Seed set to 1234
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1
Traceback (most recent call last):
  File "D:\GPT-SoVITS\GPT_SoVITS\s1_train.py", line 171, in <module>
    main(args)
  File "D:\GPT-SoVITS\GPT_SoVITS\s1_train.py", line 147, in main
    trainer.fit(model, data_module, ckpt_path=ckpt_path)
  File "D:\GPT-SoVITS\runtime\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 544, in fit
    call._call_and_handle_interrupt(
  File "D:\GPT-SoVITS\runtime\lib\site-packages\pytorch_lightning\trainer\call.py", line 43, in _call_and_handle_interrupt
    return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
  File "D:\GPT-SoVITS\runtime\lib\site-packages\pytorch_lightning\strategies\launchers\subprocess_script.py", line 102, in launch
    return function(*args, **kwargs)
  File "D:\GPT-SoVITS\runtime\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 580, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "D:\GPT-SoVITS\runtime\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 947, in _run
    self.strategy.setup_environment()
  File "D:\GPT-SoVITS\runtime\lib\site-packages\pytorch_lightning\strategies\ddp.py", line 148, in setup_environment
    self.setup_distributed()
  File "D:\GPT-SoVITS\runtime\lib\site-packages\pytorch_lightning\strategies\ddp.py", line 199, in setup_distributed
    _init_dist_connection(self.cluster_environment, self._process_group_backend, timeout=self._timeout)
  File "D:\GPT-SoVITS\runtime\lib\site-packages\lightning_fabric\utilities\distributed.py", line 290, in _init_dist_connection
    torch.distributed.init_process_group(torch_distributed_backend, rank=global_rank, world_size=world_size, **kwargs)
  File "D:\GPT-SoVITS\runtime\lib\site-packages\torch\distributed\distributed_c10d.py", line 888, in init_process_group
    store, rank, world_size = next(rendezvous_iterator)
  File "D:\GPT-SoVITS\runtime\lib\site-packages\torch\distributed\rendezvous.py", line 245, in _env_rendezvous_handler
    store = _create_c10d_store(master_addr, master_port, rank, world_size, timeout)
  File "D:\GPT-SoVITS\runtime\lib\site-packages\torch\distributed\rendezvous.py", line 176, in _create_c10d_store
    return TCPStore(
RuntimeError: unmatched '}' in format string

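Error when trying to train on Japanese data

Steps: after running the one-click three-step preprocessing (一键三连), I clicked start SoVITS training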

"runtime\python" GPT_SoVITS/s2_train.py --config "TEMP/tmp_s2.json"
INFO:shun:{'train': {'log_interval': 100, 'eval_interval': 500, 'seed': 1234, 'epochs': 15, 'learning_rate': 0.0001, 'betas': [0.8, 0.99], 'eps': 1e-09, 'batch_size': 12, 'fp16_run': True, 'lr_decay': 0.999875, 'segment_size': 20480, 'init_lr_ratio': 1, 'warmup_epochs': 0, 'c_mel': 45, 'c_kl': 1.0, 'text_low_lr_rate': 0.4, 'pretrained_s2G': 'GPT_SoVITS/pretrained_models/s2G488k.pth', 'pretrained_s2D': 'GPT_SoVITS/pretrained_models/s2D488k.pth', 'if_save_latest': True, 'if_save_every_weights': True, 'save_every_epoch': 2, 'gpu_numbers': '0'}, 'data': {'max_wav_value': 32768.0, 'sampling_rate': 32000, 'filter_length': 2048, 'hop_length': 640, 'win_length': 2048, 'n_mel_channels': 128, 'mel_fmin': 0.0, 'mel_fmax': None, 'add_blank': True, 'n_speakers': 300, 'cleaned_text': True, 'exp_dir': 'logs/shun'}, 'model': {'inter_channels': 192, 'hidden_channels': 192, 'filter_channels': 768, 'n_heads': 2, 'n_layers': 6, 'kernel_size': 3, 'p_dropout': 0.1, 'resblock': '1', 'resblock_kernel_sizes': [3, 7, 11], 'resblock_dilation_sizes': [[1, 3, 5], [1, 3, 5], [1, 3, 5]], 'upsample_rates': [10, 8, 2, 2, 2], 'upsample_initial_channel': 512, 'upsample_kernel_sizes': [16, 16, 8, 2, 2], 'n_layers_q': 3, 'use_spectral_norm': False, 'gin_channels': 512, 'semantic_frame_rate': '25hz', 'freeze_quantizer': True}, 's2_ckpt_dir': 'logs/shun', 'content_module': 'cnhubert', 'save_weight_dir': 'SoVITS_weights', 'name': 'shun', 'pretrain': None, 'resume_step': None}
[W C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [TopPC4090]:54275 (system error: 10049 - 在其上下文中，该请求的地址无效。).
[W C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [TopPC4090]:54275 (system error: 10049 - 在其上下文中，该请求的地址无效。).
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0
INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes.
Traceback (most recent call last):
  File "D:\AI workflow\Sound\GPT-SoVITS\GPT-SoVITS\GPT-SoVITS\GPT_SoVITS\s2_train.py", line 402, in <module>
    main()
  File "D:\AI workflow\Sound\GPT-SoVITS\GPT-SoVITS\GPT-SoVITS\GPT_SoVITS\s2_train.py", line 53, in main
    mp.spawn(run, nprocs=n_gpus, args=(n_gpus, hps,))
  File "D:\AI workflow\Sound\GPT-SoVITS\GPT-SoVITS\GPT-SoVITS\runtime\lib\site-packages\torch\multiprocessing\spawn.py", line 239, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "D:\AI workflow\Sound\GPT-SoVITS\GPT-SoVITS\GPT-SoVITS\runtime\lib\site-packages\torch\multiprocessing\spawn.py", line 197, in start_processes
    while not context.join():
  File "D:\AI workflow\Sound\GPT-SoVITS\GPT-SoVITS\GPT-SoVITS\runtime\lib\site-packages\torch\multiprocessing\spawn.py", line 160, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "D:\AI workflow\Sound\GPT-SoVITS\GPT-SoVITS\GPT-SoVITS\runtime\lib\site-packages\torch\multiprocessing\spawn.py", line 69, in _wrap
    fn(i, *args)
  File "D:\AI workflow\Sound\GPT-SoVITS\GPT-SoVITS\GPT-SoVITS\GPT_SoVITS\s2_train.py", line 69, in run
    train_dataset = TextAudioSpeakerLoader(hps.data)########
  File "D:\AI workflow\Sound\GPT-SoVITS\GPT-SoVITS\GPT-SoVITS\GPT_SoVITS\module\data_utils.py", line 54, in __init__
    for _ in range(max(2, int(min_num / leng))):
ZeroDivisionError: division by zero

The labeled data was previously used with bertvits2; I moved it over directly and changed it to absolute paths.

Is the RVQ module pretrained?

Is the RVQ module for the hubert features pretrained, or is it trained from scratch together with the sovits model?

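	slicer = Slicer(
		sr=32000,  # sample rate of the long audio
		threshold=int(threshold),  # volume below this value is treated as a candidate silent cut point
		min_length=int(min_length),  # minimum length of each segment; if the first segment is too short, it keeps being joined with the following segments until it exceeds this value
		min_interval=int(min_interval),  # minimum cut interval
		hop_size=int(hop_size),  # how the volume curve is computed; smaller means higher precision and more computation (higher precision does not mean better results)
		max_sil_kept=int(max_sil_kept),  # maximum length of silence kept after slicing
	)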
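
The comments describe cutting at quiet points on a volume curve that is sampled every hop_size samples. A rough sketch of that idea on a plain list of samples, which is not the project's Slicer; rms_curve and quiet_frames are hypothetical helpers, and the threshold here is a linear amplitude, which may differ from the unit the real parameter uses:

```python
import math


def rms_curve(samples, hop_size):
    """Compute one RMS volume value per hop_size samples."""
    if hop_size <= 0:
        raise ValueError("hop_size must be positive")
    curve = []
    for start in range(0, len(samples), hop_size):
        frame = samples[start:start + hop_size]
        curve.append(math.sqrt(sum(x * x for x in frame) / len(frame)))
    return curve


def quiet_frames(curve, threshold):
    """Return indices of frames whose volume is below threshold."""
    return [i for i, level in enumerate(curve) if level < threshold]
```

A smaller hop_size gives more points on the curve and therefore finer cut positions at a higher cost, which matches the trade-off noted in the comment on hop_size.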
