
retrieval-based-voice-conversion-webui's Introduction

Retrieval-based-Voice-Conversion-WebUI

An easy-to-use voice conversion framework based on VITS


Changelog | FAQ | AutoDL: train an AI singer for 5 cents | Comparative experiment records | Online demo

English | 中文简体 | 日本語 | 한국어 (韓國語) | Français | Türkçe | Português

The base model is trained on nearly 50 hours of the high-quality, open-source VCTK dataset, so there are no copyright concerns; feel free to use it.

Look forward to the RVCv3 base model: more parameters, more training data, better results, roughly the same inference speed, and less training data required.

Since some regions cannot reach Hugging Face directly (and even when it is reachable, downloads can be very slow), we provide a one-click downloader for models/integration packages/tools. Feel free to try it: RVC-Models-Downloader

Training & inference UI (go-web.bat): you can freely choose the operation you want to perform.
Real-time voice conversion UI (go-realtime-gui.bat): we have achieved an end-to-end latency of 170 ms. With ASIO input/output devices, 90 ms end-to-end latency is possible, but it is highly dependent on hardware driver support.

Introduction

This repository has the following features:

  • Top-1 retrieval replaces input source features with training-set features to eliminate timbre leakage (see the sketch after this list)
  • Fast training, even on relatively weak GPUs
  • Good results from small amounts of training data (collecting at least 10 minutes of low-noise speech is recommended)
  • Timbre can be changed through model fusion (via the ckpt-merge option in the ckpt processing tab)
  • A simple, easy-to-use web interface
  • UVR5 models can be invoked to quickly separate vocals and accompaniment
  • The state-of-the-art vocal pitch extraction algorithm InterSpeech2023-RMVPE eliminates muted-sound problems, with better results, faster speed, and lower resource usage
  • AMD/Intel GPU acceleration support
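
As an aside, here is a minimal Python sketch of the top-1 retrieval idea above; the shapes, random data, and blend ratio are illustrative assumptions, not RVC's exact implementation:

    import faiss
    import numpy as np

    # Hypothetical data: N training frames and T input frames of 256-dim features
    train_feats = np.random.rand(10000, 256).astype("float32")
    input_feats = np.random.rand(300, 256).astype("float32")

    index = faiss.IndexFlatL2(256)            # exact L2 nearest-neighbor search
    index.add(train_feats)                    # index the training-set features

    _, top1 = index.search(input_feats, 1)    # top-1 neighbor per input frame
    retrieved = train_feats[top1[:, 0]]

    # Blend retrieved features into the input; a ratio of 1.0 would fully
    # replace the source features (maximum timbre-leakage suppression)
    ratio = 0.75
    mixed = ratio * retrieved + (1 - ratio) * input_feats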

Click here to check our demo video!

Environment setup

Python version requirement

It is recommended to manage your Python environment with conda.

See this bug for the reason behind the version restriction.

python --version # 3.8 <= Python < 3.11
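
For reference, a typical conda setup satisfying this constraint might look like the following (the environment name rvc is illustrative):

conda create -n rvc python=3.10
conda activate rvc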

One-click dependency installation & launch script for Linux/macOS

Run run.sh in the project root to set up a venv virtual environment, automatically install the required dependencies, and start the main program, all in one step.

sh ./run.sh

Installing dependencies manually

  1. Install PyTorch and its core dependencies (skip if already installed). Reference: https://pytorch.org/get-started/locally/
    pip install torch torchvision torchaudio
  2. On Windows with an Nvidia Ampere architecture GPU (RTX 30xx), according to the experience in #21, you need to specify the CUDA version matching PyTorch:
    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117
  3. Install the dependencies matching your GPU:
  • Nvidia
     pip install -r requirements.txt
  • AMD/Intel
     pip install -r requirements-dml.txt
  • AMD ROCm (Linux)
     pip install -r requirements-amd.txt
  • Intel IPEX (Linux)
     pip install -r requirements-ipex.txt

Preparing other resources

1. assets

RVC needs some model resources located in the assets folder for inference and training.

Automatic resource check/download (default)

By default, RVC automatically checks the integrity of the required resources when the main program starts.

Even if the resources are incomplete, the program will still start.

  • To download all resources, add the --update flag
  • To skip the resource integrity check at startup, add the --nocheck flag
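
For example, to download all resources while launching the main program (assuming the flag is passed to the infer-web.py entry point shown under "Getting started"):

python infer-web.py --update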

Downloading resources manually

All resource files are available in the Hugging Face space.

You can find scripts to download them in the tools folder.

You can also use the one-click downloader for models/integration packages/tools: RVC-Models-Downloader

Below is a checklist of the names of all pre-models and other files that RVC requires.

  • ./assets/hubert/hubert_base.pt
     rvcmd assets/hubert # RVC-Models-Downloader command
  • ./assets/pretrained
     rvcmd assets/v1 # RVC-Models-Downloader command
  • ./assets/uvr5_weights
     rvcmd assets/uvr5 # RVC-Models-Downloader command

To use v2 models, you also need to download:

  • ./assets/pretrained_v2
     rvcmd assets/v2 # RVC-Models-Downloader command

2. Install the ffmpeg tool

If ffmpeg and ffprobe are already installed, you can skip this step.

Ubuntu/Debian users

sudo apt install ffmpeg

macOS users

brew install ffmpeg

Windows users

After downloading, place them in the root directory.

rvcmd tools/ffmpeg # RVC-Models-Downloader command

3. Download the files required by the rmvpe vocal pitch extraction algorithm

To use the latest RMVPE vocal pitch extraction algorithm, download the pitch extraction model parameters and place them in assets/rmvpe.

  • Download rmvpe.pt
     rvcmd assets/rmvpe # RVC-Models-Downloader command

Download the DML environment for rmvpe (optional, for AMD/Intel GPU users)

  • Download rmvpe.onnx
     rvcmd assets/rmvpe # RVC-Models-Downloader command

4. AMD GPUs with ROCm (optional, Linux only)

To run RVC on Linux using AMD's ROCm technology, first install the required drivers here.

If you are using Arch Linux, you can install the required drivers with pacman:

pacman -S rocm-hip-sdk rocm-opencl-sdk

For some GPU models (e.g. the RX6700XT), you may additionally need to set the following environment variables:

export ROCM_PATH=/opt/rocm
export HSA_OVERRIDE_GFX_VERSION=10.3.0

Also make sure your current user is in the render and video user groups:

sudo usermod -aG render $USERNAME
sudo usermod -aG video $USERNAME

Getting started

Direct launch

Start the WebUI with the following command:

python infer-web.py

Linux/macOS users

./run.sh

For Intel GPU users who need IPEX (Linux only):

source /opt/intel/oneapi/setvars.sh
./run.sh

Using the integration package (Windows users)

Download and extract RVC-beta.7z, then double-click go-web.bat to launch with one click.

rvcmd packs/general/latest # RVC-Models-Downloader command

Referenced projects

Thanks to all contributors for their efforts.

retrieval-based-voice-conversion-webui's People

Contributors

blaise-tk, cccraim, chenxvb, cnchtu, dependabot[bot], dogayagcizeybek, entropyriser, fumiama, github-actions[bot], hironow, hiwanz, junityzhan, l4ph, mrhan1993, ms903x1, nadare881, naozumi520, narusemioshirakana, natoboram, pengoosedev, ricecakey06, rsxdalv, rvc-boss, sgsavu, shaotz, sonphantrung, spice-z, tarepan, tps-f, yxlllc


retrieval-based-voice-conversion-webui's Issues

On Linux, FileNotFoundError: [Errno 2] No such file or directory: 'weights/[]' keeps being raised when no model is loaded

OS: linux

I run the webui with python and it works fine, but it keeps printing the following error:

loading weights/[]
Traceback (most recent call last):
File "/home/vior/miniconda3/envs/RVCtrain/lib/python3.9/site-packages/gradio/routes.py", line 384, in run_predict
output = await app.get_blocks().process_api(
File "/home/vior/miniconda3/envs/RVCtrain/lib/python3.9/site-packages/gradio/blocks.py", line 1024, in process_api
result = await self.call_function(
File "/home/vior/miniconda3/envs/RVCtrain/lib/python3.9/site-packages/gradio/blocks.py", line 836, in call_function
prediction = await anyio.to_thread.run_sync(
File "/home/vior/miniconda3/envs/RVCtrain/lib/python3.9/site-packages/anyio/to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/home/vior/miniconda3/envs/RVCtrain/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "/home/vior/miniconda3/envs/RVCtrain/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 867, in run
result = context.run(func, *args)
File "/mnt/E/AI/Mytts/Retrieval-based-Voice-Conversion-WebUI/infer-web.py", line 167, in get_vc
cpt = torch.load(person, map_location="cpu")
File "/home/vior/miniconda3/envs/RVCtrain/lib/python3.9/site-packages/torch/serialization.py", line 791, in load
with _open_file_like(f, 'rb') as opened_file:
File "/home/vior/miniconda3/envs/RVCtrain/lib/python3.9/site-packages/torch/serialization.py", line 271, in _open_file_like
return _open_file(name_or_buffer, mode)
File "/home/vior/miniconda3/envs/RVCtrain/lib/python3.9/site-packages/torch/serialization.py", line 252, in init
super().init(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'weights/[]'

It goes away once a model is loaded; I wonder whether this bug can be fixed.

What is torchgen used for?

Hi,

torchgen is in the project's dependencies, but I couldn't find any information on how to use it.
How is this useful?

Question: The details of Pretraining

Hello,

Thank you for providing such great code. I would like to know more about the pretraining process to better utilize this code. Specifically, I would like to know the following information:

・The dataset used for pretraining
・Techniques used during training

For the second point, I would like the information needed to reproduce the training, such as whether the number of speakers was gradually increased.

Thank you.

Accelerating Faiss retrieval using FastScan in Faiss

Thank you for the amazing software. I am particularly interested in applications of vector search. I am still setting things up, but I plan to try running it soon.

While reading the source code, I noticed a point of concern in the faiss part and created an issue.

Currently, IVF512 is used in retrieval.
While I think this is simple and effective as a baseline on the GPU, I believe there are better index factory options when running on the CPU.
https://github.com/liujing04/Retrieval-based-Voice-Conversion-WebUI/blob/6c7c1d933ffe2217edc74afadff7eec0078d6d16/infer/train-index.py#L19

This can be done using the FastScan method, by simply changing the index factory from "IVF512,Flat" to "IVF512PQ128x4fsr,Rflat" (512 is the original IVF parameter; PQ128 indicates half of the 256 dimensions).

Since I haven't been able to run RVC yet, I'm not sure whether this parameter is effective here, but in most cases it works well on both the CPU and GPU.
Once I run it and find it effective, I will report back in this issue.
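
For reference, here is a minimal sketch of the index-factory change proposed above; the feature dimension and data are placeholders, and the refinement stage is written RFlat per faiss's factory syntax:

    import faiss
    import numpy as np

    d = 256                                   # placeholder feature dimension
    feats = np.random.rand(50000, d).astype("float32")

    # Baseline currently used in train-index.py
    baseline = faiss.index_factory(d, "IVF512,Flat")

    # FastScan variant: 4-bit PQ on residuals, re-ranked against raw vectors
    fastscan = faiss.index_factory(d, "IVF512,PQ128x4fsr,RFlat")

    for index in (baseline, fastscan):
        index.train(feats)
        index.add(feats)
        _, neighbors = index.search(feats[:5], 3)   # sanity-check query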

Pitch wobble effect with speech generator

Hi, your software is pretty amazing; I can get good results in just 1000 epochs. However, I have an issue with some output files. I'm converting a man's voice -> a woman's voice (trained model), but the results often show an unsteady pitch, a slight wobble effect. I was wondering if you could add a slider to smooth out the output pitch (or the input). When I use the Voice Generator GUI (svcg), there is no such problem. Maybe because it uses pad-and-chunk? Or maybe because I use the Crepe prediction method?
However, your fork seems very promising; the sound quality is really nice. I just hope there is a way to make the pitch more natural for spoken voice.
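
One possible post-processing approach, sketched below under the assumption of an f0 contour in Hz with unvoiced frames marked as 0 (this is not an existing RVC feature), is to median-filter the pitch curve:

    import numpy as np
    from scipy.signal import medfilt

    def smooth_f0(f0: np.ndarray, kernel_size: int = 5) -> np.ndarray:
        # kernel_size must be odd; a median filter removes single-frame pitch
        # spikes (wobble) while leaving the overall contour intact
        smoothed = f0.copy()
        voiced = f0 > 0                     # keep unvoiced frames untouched
        smoothed[voiced] = medfilt(f0[voiced], kernel_size=kernel_size)
        return smoothed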

pip install -r requirements.txt fails with an error

Was this intentional or accidental? ←_← Why is googleads in the requirements?


Collecting googleads==3.8.0
  Using cached https://mirrors.aliyun.com/pypi/packages/fa/f8/f84ad483afaa29bfc807ab6e8a06b6712ee494a2aad7db545865655bdf99/googleads-3.8.0.tar.gz (23 kB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [1 lines of output]
      error in googleads setup command: use_2to3 is invalid.
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

Document request: Training from scratch

Summary

There seems to be no documentation on training from scratch (base model training).
Can you share the method?
Once I successfully reproduce the result with the shared info, I am glad to write the documentation (make a PR).

Current Status

The current RVC repository contains enough information for fine-tuning.
But there is only a little info about training from scratch (base model training).

The README suggests that we can train the base model with VCTK.
/train_nsf_sim_cache_sid_load_pretrain.py is the training code, but it seems to be for fine-tuning.
/train/ contains some command txt files, but they specify a missing file (train_nsf_sim_cache_sid.py).

As a result, I cannot reproduce the base model.

Request

Can you share method to train base model?

Proposal

If you kindly share the method, I am glad to write documentation for future developers.

can't find my 3060

The webui can't find my RTX 3060 12G.

And I got this error when training; if I add the path myself, it still fails with:
"ValueError: need at least one array to concatenate"

I installed the webui from the 7z package.

In Colab, the epochs progress strangely fast, and an unfinished voice synthesis is generated.

In Colab, when I run training with the current ipynb notebook, the epochs progress very quickly (about 1 epoch per 6 seconds on a Tesla T4 with batch size 14), and even after training for about 300 epochs, the generated weight .pth file produces low-quality, grainy synthesized voice. I have over 1000 training files totaling more than 50 minutes. Is everyone else experiencing the same issue?

Step 0 fails to load the audio and reports File Not Found

Use Language: zh_CN
Running on local URL:  http://0.0.0.0:7865
start preprocess
['trainset_preprocess_pipeline_print.py', 'E:\\BaiduYunDownload\\ayakaVoice', '40000', '12', 'E:\\PycharmProjects\\Retrieval-based-Voice-Conversion-WebUI/logs/test', 'False']
E:\BaiduYunDownload\ayakaVoice/1.wav->Traceback (most recent call last):
  File "E:\PycharmProjects\Retrieval-based-Voice-Conversion-WebUI\my_utils.py", line 14, in load_audio
    ffmpeg.input(file, threads=0)
  File "E:\PycharmProjects\Retrieval-based-Voice-Conversion-WebUI\.venv\lib\site-packages\ffmpeg\_run.py", line 313, in run
    process = run_async(
  File "E:\PycharmProjects\Retrieval-based-Voice-Conversion-WebUI\.venv\lib\site-packages\ffmpeg\_run.py", line 284, in run_async
    return subprocess.Popen(
  File "e:\anaconda3\lib\subprocess.py", line 951, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "e:\anaconda3\lib\subprocess.py", line 1420, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] 系统找不到指定的文件。

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "E:\PycharmProjects\Retrieval-based-Voice-Conversion-WebUI\trainset_preprocess_pipeline_print.py", line 73, in pipeline
    audio = load_audio(path, self.sr)
  File "E:\PycharmProjects\Retrieval-based-Voice-Conversion-WebUI\my_utils.py", line 19, in load_audio
    raise RuntimeError(f"Failed to load audio: {e}")
RuntimeError: Failed to load audio: [WinError 2] 系统找不到指定的文件。

E:\BaiduYunDownload\ayakaVoice/20.wav->Traceback (most recent call last):
  File "E:\PycharmProjects\Retrieval-based-Voice-Conversion-WebUI\my_utils.py", line 14, in load_audio
    ffmpeg.input(file, threads=0)
  File "E:\PycharmProjects\Retrieval-based-Voice-Conversion-WebUI\.venv\lib\site-packages\ffmpeg\_run.py", line 313, in run
    process = run_async(
  File "E:\PycharmProjects\Retrieval-based-Voice-Conversion-WebUI\.venv\lib\site-packages\ffmpeg\_run.py", line 284, in run_async
    return subprocess.Popen(
  File "e:\anaconda3\lib\subprocess.py", line 951, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "e:\anaconda3\lib\subprocess.py", line 1420, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] 系统找不到指定的文件。

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "E:\PycharmProjects\Retrieval-based-Voice-Conversion-WebUI\trainset_preprocess_pipeline_print.py", line 73, in pipeline
    audio = load_audio(path, self.sr)
  File "E:\PycharmProjects\Retrieval-based-Voice-Conversion-WebUI\my_utils.py", line 19, in load_audio
    raise RuntimeError(f"Failed to load audio: {e}")
RuntimeError: Failed to load audio: [WinError 2] 系统找不到指定的文件。

......... (the rest is omitted)

At first I thought it was because my path contained Chinese characters, but renaming it didn't seem to help either...

Error after training finishes

Training finished and the model file was generated, but this error was reported:
Traceback (most recent call last):
File "train_nsf_sim_cache_sid_load_pretrain.py", line 684, in <module>
main()
File "train_nsf_sim_cache_sid_load_pretrain.py", line 50, in main
mp.spawn(
File "E:\User\Voice\VoiceEnv\lib\site-packages\torch\multiprocessing\spawn.py", line 239, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "E:\User\Voice\VoiceEnv\lib\site-packages\torch\multiprocessing\spawn.py", line 197, in start_processes
while not context.join():
File "E:\User\Voice\VoiceEnv\lib\site-packages\torch\multiprocessing\spawn.py", line 149, in join
raise ProcessExitedException(
torch.multiprocessing.spawn.ProcessExitedException: process 0 terminated with exit code 2333333

pitch editor

Do you think you could add a pitch editor to fix imperfections, or support uploading your own pitch?

Process 0 terminated with the following error:

I use an RTX 3060 Ti.

-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "C:\RVC-beta\RVC-beta\runtime\lib\site-packages\torch\serialization.py", line 441, in save
_save(obj, opened_zipfile, pickle_module, pickle_protocol)
File "C:\RVC-beta\RVC-beta\runtime\lib\site-packages\torch\serialization.py", line 668, in _save
zip_file.write_record(name, storage.data_ptr(), num_bytes)
RuntimeError: [enforce fail at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\caffe2\serialize\inline_container.cc:476] . PytorchStreamWriter failed writing file data/2229: file write failed

RuntimeError: [enforce fail at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\caffe2\serialize\inline_container.cc:337] . unexpected pos 256653056 vs 256652948

I tried several times, but every attempt produced an error like this (the numbers, e.g. 256653056 vs 256652948, change in each case).

train speaker id info

What role does the speaker id play in training or model inference? How can I get speaker info?

Artifacting when speech has breaths / quality improvements?

Hi, great work! I'm excited about your future updates. I noticed that the outputs usually don't handle breaths well; they create artifacts most of the time. I was wondering whether breaths should be removed or kept during training to get the best results.
Also, would it be possible to have a high-quality mode, e.g. training and generating at 48 kHz / 24-bit? And could going past 1000 epochs also give better, more natural results?
New generations of GPUs have more and more VRAM, so it would be great to be able to use it at full power (for example, 24 GB of VRAM).
Thanks for the great work!

Flexible GPU selection

At the inference stage, if there are two or more GPUs, the program selects all of them by default. If one of those GPUs is already occupied, an out-of-memory error occurs. Please add an option to choose the GPU at the inference stage.
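
A general CUDA-level workaround (not an RVC option) is to restrict which GPUs the process can see before launching, for example:

CUDA_VISIBLE_DEVICES=0 python infer-web.py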

Error during vocal/accompaniment separation: FileNotFoundError: [Errno 2] No such file or directory: './uvr5_pack/data.json'

Here is the error message:

Traceback (most recent call last):
File "D:\AI\soundtrain\RVC\Retrieval-based-Voice-Conversion-WebUI\infer-web.py", line 224, in uvr
pre_fun = audio_pre(
File "D:\AI\soundtrain\RVC\Retrieval-based-Voice-Conversion-WebUI\infer_uvr5.py", line 47, in init
param_name, model_params_d = _get_name_params(model_path, model_hash)
File "D:\AI\soundtrain\RVC\Retrieval-based-Voice-Conversion-WebUI\uvr5_pack\utils.py", line 102, in _get_name_params
data = load_data()
File "D:\AI\soundtrain\RVC\Retrieval-based-Voice-Conversion-WebUI\uvr5_pack\utils.py", line 8, in load_data
with open(file_name, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: './uvr5_pack/data.json'

Error when running `infer-web.py` locally

Stacktrace:

Use Language: en_US
Running on local URL:  http://0.0.0.0:7865
tcgetpgrp failed: Not a tty
2023-04-19 23:51:45 | INFO | fairseq.tasks.hubert_pretraining | current directory is /......./Retrieval-based-Voice-Conversion-WebUI
2023-04-19 23:51:45 | INFO | fairseq.tasks.hubert_pretraining | HubertPretrainingTask Config {'_name': 'hubert_pretraining', 'data': 'metadata', 'fine_tuning': False, 'labels': ['km'], 'label_dir': 'label', 'label_rate': 50.0, 'sample_rate': 16000, 'normalize': False, 'enable_padding': False, 'max_keep_size': None, 'max_sample_size': 250000, 'min_sample_size': 32000, 'single_target': False, 'random_crop': True, 'pad_audio': False}
2023-04-19 23:51:45 | INFO | fairseq.models.hubert.hubert | HubertModel Config: {'_name': 'hubert', 'label_rate': 50.0, 'extractor_mode': default, 'encoder_layers': 12, 'encoder_embed_dim': 768, 'encoder_ffn_embed_dim': 3072, 'encoder_attention_heads': 12, 'activation_fn': gelu, 'layer_type': transformer, 'dropout': 0.1, 'attention_dropout': 0.1, 'activation_dropout': 0.0, 'encoder_layerdrop': 0.05, 'dropout_input': 0.1, 'dropout_features': 0.1, 'final_dim': 256, 'untie_final_proj': True, 'layer_norm_first': False, 'conv_feature_layers': '[(512,10,5)] + [(512,3,2)] * 4 + [(512,2,2)] * 2', 'conv_bias': False, 'logit_temp': 0.1, 'target_glu': False, 'feature_grad_mult': 0.1, 'mask_length': 10, 'mask_prob': 0.8, 'mask_selection': static, 'mask_other': 0.0, 'no_mask_overlap': False, 'mask_min_space': 1, 'mask_channel_length': 10, 'mask_channel_prob': 0.0, 'mask_channel_selection': static, 'mask_channel_other': 0.0, 'no_mask_channel_overlap': False, 'mask_channel_min_space': 1, 'conv_pos': 128, 'conv_pos_groups': 16, 'latent_temp': [2.0, 0.5, 0.999995], 'skip_masked': False, 'skip_nomask': False, 'checkpoint_activations': False, 'required_seq_len_multiple': 2, 'depthwise_conv_kernel_size': 31, 'attn_type': '', 'pos_enc_type': 'abs', 'fp16': False}
Traceback (most recent call last):
  File "/......./infer-web.py", line 141, in vc_single
    if_f0 = cpt.get("f0", 1)
NameError: name 'cpt' is not defined

I pulled the code from the master branch. Training works locally, but this error occurred when I tried voice conversion after training. Looking at the code, cpt indeed seems not to be defined in that function.

Preparation of tutorials in English

I would like to create a tutorial in /docs in markdown format. In the tutorial, I would like to write:

  • How to tune faiss for developers
  • Explanation of learning and inference parameters for beginners

First, I will write up the former (the faiss tuning method) and create a PR by ~4/19.

Error when checking for past checkpoints during feature extraction

File "./Retrieval-based-Voice-Conversion-WebUI/train_nsf_sim_cache_sid_load_pretrain.py", lin
e 148, in run
utils.latest_checkpoint_path(hps.model_dir, "D_*.pth"), net_d, optim_d
File "./Retrieval-based-Voice-Conversion-WebUI/train/utils.py", line 206, in latest_checkpoin
t_path
x = f_list[-1]
IndexError: list index out of range

issue running, app.py file missing

Hello, it seems that the app.py file was not included in the download, so I am unable to run the program. Could the file go under a different name?

Robotic / metallic noise on S letters

Hi, this is really a wonderful project; so far it is the one with the best quality compared to its alternatives. It's incredible how well it works with small datasets (in my case, 9 minutes of clean singing) and how well it recreates the voice timbre.
I only have one question: is it normal that there is a metallic noise on S sounds or breaths? Is it produced by the vocoder? I trained my model for a longer time thinking the noise would go away, but it sounds exactly the same, whether with 20 minutes of training or more than an hour. Do I need a longer training run? How many epochs do you recommend?
Thanks and congratulations for the project!

No such file or directory: 'RVC-beta/preprocess.log'

I downloaded and unzipped RVC-beta.7z, then installed the latest version from under releases and put it inside. However, when I tried to run step 2a of the training, the following came up.

runtime\python.exe trainset_preprocess_pipeline_print.py C:\Users\xxxxxx\Documents\VC 48000 16 D:\RVC-beta (1)\RVC-beta/logs/xxxxxxFalse
Traceback (most recent call last):
File "D:\RVC-beta (1)\RVC-beta\trainset_preprocess_pipeline_print.py", line 20, in <module>
f = open("%s/preprocess.log" % exp_dir, "a+")
FileNotFoundError: [Errno 2] No such file or directory: 'D:\RVC-beta/preprocess.log'
What should I do? Past versions (the 2023/04/10 version) were working.

Audio does not load properly

When I start step 2a with this version, there should be 1801 files in that folder, but checking the command prompt, only 23 files are actually loaded.
It seems to work fine with other speakers' files; only certain speakers don't seem to work properly.
Is it just taking a long time?

Readme had the wrong command for torch installation on windows

The command written in the current readme:
pip install torch torchvision torchaudio
is for the Linux platform only.
The correct command from the PyTorch website for CUDA 11.7 is:
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117

Doing this fixed my problem with the RVC webui not detecting my RTX 3070.

How was this solved?

Training also fails to run:

start preprocess
['trainset_preprocess_pipeline_print.py', 'I:\VoiceConversionWebUI\traning\input', '40000', '16', 'I:\VoiceConversionWebUI/logs/gemikovoice', 'False']
Fail. Traceback (most recent call last):
File "I:\VoiceConversionWebUI\trainset_preprocess_pipeline_print.py", line 90, in pipeline_mp_inp_dir
p.start()
File "C:\Users\Naught\AppData\Local\Programs\Python\Python310\lib\multiprocessing\process.py", line 121, in start
self._popen = self._Popen(self)
File "C:\Users\Naught\AppData\Local\Programs\Python\Python310\lib\multiprocessing\context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\Users\Naught\AppData\Local\Programs\Python\Python310\lib\multiprocessing\context.py", line 336, in _Popen
return Popen(process_obj)
File "C:\Users\Naught\AppData\Local\Programs\Python\Python310\lib\multiprocessing\popen_spawn_win32.py", line 93, in init
reduction.dump(process_obj, to_child)
File "C:\Users\Naught\AppData\Local\Programs\Python\Python310\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle '_io.TextIOWrapper' object

end preprocess
Traceback (most recent call last):
File "", line 1, in
File "C:\Users\Naught\AppData\Local\Programs\Python\Python310\lib\multiprocessing\spawn.py", line 107, in spawn_main
new_handle = reduction.duplicate(pipe_handle,
File "C:\Users\Naught\AppData\Local\Programs\Python\Python310\lib\multiprocessing\reduction.py", line 79, in duplicate
return _winapi.DuplicateHandle(
OSError: [WinError 6] 句柄无效。
start preprocess
['trainset_preprocess_pipeline_print.py', 'I:\VoiceConversionWebUI\traning\input', '40000', '16', 'I:\VoiceConversionWebUI/logs/gemikovoice', 'False']
Fail. Traceback (most recent call last):
File "I:\VoiceConversionWebUI\trainset_preprocess_pipeline_print.py", line 90, in pipeline_mp_inp_dir
p.start()
File "C:\Users\Naught\AppData\Local\Programs\Python\Python310\lib\multiprocessing\process.py", line 121, in start
self._popen = self._Popen(self)
File "C:\Users\Naught\AppData\Local\Programs\Python\Python310\lib\multiprocessing\context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\Users\Naught\AppData\Local\Programs\Python\Python310\lib\multiprocessing\context.py", line 336, in _Popen
return Popen(process_obj)
File "C:\Users\Naught\AppData\Local\Programs\Python\Python310\lib\multiprocessing\popen_spawn_win32.py", line 93, in init
reduction.dump(process_obj, to_child)
File "C:\Users\Naught\AppData\Local\Programs\Python\Python310\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle '_io.TextIOWrapper' object

end preprocess

['extract_feature_print.py', '1', '0', 'I:\VoiceConversionWebUI/logs/gemikovoice']
I:\VoiceConversionWebUI/logs/gemikovoice
load model(s) from hubert_base.pt
2023-04-06 16:52:05 | INFO | fairseq.tasks.hubert_pretraining | current directory is I:\VoiceConversionWebUI
2023-04-06 16:52:05 | INFO | fairseq.tasks.hubert_pretraining | HubertPretrainingTask Config {'_name': 'hubert_pretraining', 'data': 'metadata', 'fine_tuning': False, 'labels': ['km'], 'label_dir': 'label', 'label_rate': 50.0, 'sample_rate': 16000, 'normalize': False, 'enable_padding': False, 'max_keep_size': None, 'max_sample_size': 250000, 'min_sample_size': 32000, 'single_target': False, 'random_crop': True, 'pad_audio': False}
2023-04-06 16:52:05 | INFO | fairseq.models.hubert.hubert | HubertModel Config: {'_name': 'hubert', 'label_rate': 50.0, 'extractor_mode': default, 'encoder_layers': 12, 'encoder_embed_dim': 768, 'encoder_ffn_embed_dim': 3072, 'encoder_attention_heads': 12, 'activation_fn': gelu, 'layer_type': transformer, 'dropout': 0.1, 'attention_dropout': 0.1, 'activation_dropout': 0.0, 'encoder_layerdrop': 0.05, 'dropout_input': 0.1, 'dropout_features': 0.1, 'final_dim': 256, 'untie_final_proj': True, 'layer_norm_first': False, 'conv_feature_layers': '[(512,10,5)] + [(512,3,2)] * 4 + [(512,2,2)] * 2', 'conv_bias': False, 'logit_temp': 0.1, 'target_glu': False, 'feature_grad_mult': 0.1, 'mask_length': 10, 'mask_prob': 0.8, 'mask_selection': static, 'mask_other': 0.0, 'no_mask_overlap': False, 'mask_min_space': 1, 'mask_channel_length': 10, 'mask_channel_prob': 0.0, 'mask_channel_selection': static, 'mask_channel_other': 0.0, 'no_mask_channel_overlap': False, 'mask_channel_min_space': 1, 'conv_pos': 128, 'conv_pos_groups': 16, 'latent_temp': [2.0, 0.5, 0.999995], 'skip_masked': False, 'skip_nomask': False, 'checkpoint_activations': False, 'required_seq_len_multiple': 2, 'depthwise_conv_kernel_size': 31, 'attn_type': '', 'pos_enc_type': 'abs', 'fp16': False}
no-feature-todo
['extract_feature_print.py', '1', '0', 'I:\VoiceConversionWebUI/logs/gemikovoice']
I:\VoiceConversionWebUI/logs/gemikovoice
load model(s) from hubert_base.pt
no-feature-todo

INFO:gemikovoice:{'train': {'log_interval': 200, 'seed': 1234, 'epochs': 20000, 'learning_rate': 0.0001, 'betas': [0.8, 0.99], 'eps': 1e-09, 'batch_size': 4, 'fp16_run': True, 'lr_decay': 0.999875, 'segment_size': 12800, 'init_lr_ratio': 1, 'warmup_epochs': 0, 'c_mel': 45, 'c_kl': 1.0}, 'data': {'max_wav_value': 32768.0, 'sampling_rate': 40000, 'filter_length': 2048, 'hop_length': 400, 'win_length': 2048, 'n_mel_channels': 125, 'mel_fmin': 0.0, 'mel_fmax': None, 'training_files': './logs\gemikovoice/filelist.txt'}, 'model': {'inter_channels': 192, 'hidden_channels': 192, 'filter_channels': 768, 'n_heads': 2, 'n_layers': 6, 'kernel_size': 3, 'p_dropout': 0, 'resblock': '1', 'resblock_kernel_sizes': [3, 7, 11], 'resblock_dilation_sizes': [[1, 3, 5], [1, 3, 5], [1, 3, 5]], 'upsample_rates': [10, 10, 2, 2], 'upsample_initial_channel': 512, 'upsample_kernel_sizes': [16, 16, 4, 4], 'use_spectral_norm': False, 'gin_channels': 256, 'spk_embed_dim': 109}, 'model_dir': './logs\gemikovoice', 'experiment_dir': './logs\gemikovoice', 'save_every_epoch': 5, 'name': 'gemikovoice', 'total_epoch': 10, 'pretrainG': 'pretrained/G40k.pth', 'pretrainD': 'pretrained/D40k.pth', 'gpus': '0', 'sample_rate': '40k', 'if_f0': 0, 'if_latest': 0, 'if_cache_data_in_gpu': 0}
WARNING:gemikovoice:I:\VoiceConversionWebUI\train is not a git repository, therefore hash value comparison will be ignored.
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0
INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes.
gin_channels: 256 self.spk_embed_dim: 109
Traceback (most recent call last):
File "I:\VoiceConversionWebUI\train_nsf_sim_cache_sid_load_pretrain.py", line 121, in run
_, _, _, epoch_str = utils.load_checkpoint(utils.latest_checkpoint_path(hps.model_dir, "D_*.pth"), net_d, optim_d)  # D多半加载没事
File "I:\VoiceConversionWebUI\train\utils.py", line 163, in latest_checkpoint_path
x = f_list[-1]
IndexError: list index out of range
INFO:gemikovoice:loaded pretrained pretrained/G40k.pth pretrained/D40k.pth


I:\VoiceConversionWebUI\venv\lib\site-packages\torch\cuda\amp\grad_scaler.py:120: UserWarning: torch.cuda.amp.GradScaler is enabled, but CUDA is not available. Disabling.
warnings.warn("torch.cuda.amp.GradScaler is enabled, but CUDA is not available. Disabling.")
INFO:gemikovoice:====> Epoch: 1
I:\VoiceConversionWebUI\venv\lib\site-packages\torch\optim\lr_scheduler.py:139: UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
warnings.warn("Detected call of lr_scheduler.step() before optimizer.step(). "
INFO:gemikovoice:====> Epoch: 2
INFO:gemikovoice:====> Epoch: 3
INFO:gemikovoice:====> Epoch: 4
INFO:gemikovoice:Saving model and optimizer state at iteration 5 to ./logs\gemikovoice\G_0.pth
INFO:gemikovoice:Saving model and optimizer state at iteration 5 to ./logs\gemikovoice\D_0.pth
INFO:gemikovoice:====> Epoch: 5
INFO:gemikovoice:====> Epoch: 6
INFO:gemikovoice:====> Epoch: 7
INFO:gemikovoice:====> Epoch: 8
INFO:gemikovoice:====> Epoch: 9
INFO:gemikovoice:Saving model and optimizer state at iteration 10 to ./logs\gemikovoice\G_0.pth
INFO:gemikovoice:Saving model and optimizer state at iteration 10 to ./logs\gemikovoice\D_0.pth
INFO:gemikovoice:====> Epoch: 10
INFO:gemikovoice:Training is done. The program is closed.
saving final ckpt: Success.
Traceback (most recent call last):
File "I:\VoiceConversionWebUI\train_nsf_sim_cache_sid_load_pretrain.py", line 515, in
main()
File "I:\VoiceConversionWebUI\train_nsf_sim_cache_sid_load_pretrain.py", line 42, in main
mp.spawn(
File "I:\VoiceConversionWebUI\venv\lib\site-packages\torch\multiprocessing\spawn.py", line 239, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "I:\VoiceConversionWebUI\venv\lib\site-packages\torch\multiprocessing\spawn.py", line 197, in start_processes
while not context.join():
File "I:\VoiceConversionWebUI\venv\lib\site-packages\torch\multiprocessing\spawn.py", line 149, in join
raise ProcessExitedException(
torch.multiprocessing.spawn.ProcessExitedException: process 0 terminated with exit code 2333333
Traceback (most recent call last):
File "I:\VoiceConversionWebUI\venv\lib\site-packages\gradio\routes.py", line 393, in run_predict
output = await app.get_blocks().process_api(
File "I:\VoiceConversionWebUI\venv\lib\site-packages\gradio\blocks.py", line 1108, in process_api
result = await self.call_function(
File "I:\VoiceConversionWebUI\venv\lib\site-packages\gradio\blocks.py", line 929, in call_function
prediction = await anyio.to_thread.run_sync(
File "I:\VoiceConversionWebUI\venv\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "I:\VoiceConversionWebUI\venv\lib\site-packages\anyio_backends_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "I:\VoiceConversionWebUI\venv\lib\site-packages\anyio_backends_asyncio.py", line 867, in run
result = context.run(func, *args)
File "I:\VoiceConversionWebUI\venv\lib\site-packages\gradio\utils.py", line 490, in async_iteration
return next(iterator)
File "I:\VoiceConversionWebUI\infer-web.py", line 421, in train1key
big_npy = np.concatenate(npys, 0)
File "<array_function internals>", line 180, in concatenate
ValueError: need at least one array to concatenate

Originally posted by @NaughtDZ in #18 (comment)

RuntimeError: Given groups=1, weight of size [192, 513, 1], expected input[1, 1025, 368] to have 513 channels, but got 1025 channels instead

My GPU is a 1050 Ti. I followed the tutorial video step by step, uploading about 12 minutes of audio in wav format; I tried both 32 kHz and 44100 Hz sample rates and got the error below.

Traceback (most recent call last):
File "E:\vits\RVC-beta\runtime\lib\site-packages\torch\multiprocessing\spawn.py", line 69, in _wrap
fn(i, *args)
File "E:\vits\RVC-beta\train_nsf_sim_cache_sid_load_pretrain.py", line 188, in run
train_and_evaluate(
File "E:\vits\RVC-beta\train_nsf_sim_cache_sid_load_pretrain.py", line 316, in train_and_evaluate
) = net_g(phone, phone_lengths, spec, spec_lengths, sid)
File "E:\vits\RVC-beta\runtime\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "E:\vits\RVC-beta\runtime\lib\site-packages\torch\nn\parallel\distributed.py", line 1156, in forward
output = self._run_ddp_forward(*inputs, **kwargs)
File "E:\vits\RVC-beta\runtime\lib\site-packages\torch\nn\parallel\distributed.py", line 1110, in _run_ddp_forward
return module_to_run(*inputs[0], **kwargs[0]) # type: ignore[index]
File "E:\vits\RVC-beta\runtime\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "E:\vits\RVC-beta\runtime\lib\site-packages\infer_pack\models.py", line 644, in forward
z, m_q, logs_q, y_mask = self.enc_q(y, y_lengths, g=g)
File "E:\vits\RVC-beta\runtime\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "E:\vits\RVC-beta\runtime\lib\site-packages\infer_pack\models.py", line 163, in forward
x = self.pre(x) * x_mask
File "E:\vits\RVC-beta\runtime\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "E:\vits\RVC-beta\runtime\lib\site-packages\torch\nn\modules\conv.py", line 313, in forward
return self._conv_forward(input, self.weight, self.bias)
File "E:\vits\RVC-beta\runtime\lib\site-packages\torch\nn\modules\conv.py", line 309, in _conv_forward
return F.conv1d(input, weight, bias, self.stride,
RuntimeError: Given groups=1, weight of size [192, 513, 1], expected input[1, 1025, 368] to have 513 channels, but got 1025 channels instead

Error when running on Colab

Environment: Google Colab
The dataset consists of audio slices of about 3 seconds each (mp3 format).

I'm not sure whether my dataset is causing the problem; if so, please advise on how to adjust it.

Error message:

Traceback (most recent call last):
  File "/content/Retrieval-based-Voice-Conversion-WebUI/train_nsf_sim_cache_sid_load_pretrain.py", line 514, in <module>
    main()
  File "/content/Retrieval-based-Voice-Conversion-WebUI/train_nsf_sim_cache_sid_load_pretrain.py", line 42, in main
    mp.spawn(
  File "/usr/local/lib/python3.9/dist-packages/torch/multiprocessing/spawn.py", line 239, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/usr/local/lib/python3.9/dist-packages/torch/multiprocessing/spawn.py", line 197, in start_processes
    while not context.join():
  File "/usr/local/lib/python3.9/dist-packages/torch/multiprocessing/spawn.py", line 160, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException: 

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/content/Retrieval-based-Voice-Conversion-WebUI/train_nsf_sim_cache_sid_load_pretrain.py", line 150, in run
    train_and_evaluate(
  File "/content/Retrieval-based-Voice-Conversion-WebUI/train_nsf_sim_cache_sid_load_pretrain.py", line 195, in train_and_evaluate
    for batch_idx, info in enumerate(train_loader):
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/dataloader.py", line 634, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/dataloader.py", line 1326, in _next_data
    return self._process_data(data)
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/dataloader.py", line 1372, in _process_data
    data.reraise()
  File "/usr/local/lib/python3.9/dist-packages/torch/_utils.py", line 644, in reraise
    raise exception
IndexError: Caught IndexError in DataLoader worker process 2.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/content/Retrieval-based-Voice-Conversion-WebUI/train/data_utils.py", line 306, in __getitem__
    return self.get_audio_text_pair(self.audiopaths_and_text[index])
  File "/content/Retrieval-based-Voice-Conversion-WebUI/train/data_utils.py", line 248, in get_audio_text_pair
    phone = self.get_labels(phone)
  File "/content/Retrieval-based-Voice-Conversion-WebUI/train/data_utils.py", line 266, in get_labels
    phone = phone[:n_num, :]
IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed

Log:

/content/Retrieval-based-Voice-Conversion-WebUI
The tensorboard extension is already loaded. To reload it, use:
  %reload_ext tensorboard
Use Language: en_US
2023-04-13 08:55:48.512021: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-04-13 08:55:49.817277: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2023-04-13 08:55:52 | INFO | fairseq.tasks.text_to_speech | Please install tensorboardX: pip install tensorboardX
Running on local URL:  http://127.0.0.1:7860
Running on public URL: https://b2945b7454172d80c1.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades (NEW!), check out Spaces: https://huggingface.co/spaces
start preprocess
['trainset_preprocess_pipeline_print.py', '/content/drive/MyDrive/audiouploads', '40000', '2', '/content/Retrieval-based-Voice-Conversion-WebUI/logs/test', 'False']
/content/drive/MyDrive/audiouploads/dataset-1.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-11.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-14.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-16.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-18.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-2.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-21.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-23.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-25.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-27.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-29.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-30.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-32.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-35.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-37.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-39.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-40.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-42.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-44.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-46.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-48.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-6.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-8.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-10.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-13.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-15.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-17.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-19.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-20.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-22.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-24.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-26.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-28.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-3.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-31.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-34.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-36.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-38.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-4.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-41.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-43.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-45.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-47.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-5.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-7.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-9.mp3->Suc.
end preprocess

/content/drive/MyDrive/audiouploads/dataset-43.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-45.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-47.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-5.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-7.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-9.mp3->Suc.
end preprocess

2023-04-13 08:57:22.578351: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2023-04-13 08:57:24 | INFO | fairseq.tasks.text_to_speech | Please install tensorboardX: pip install tensorboardX
['extract_feature_print.py', 'cuda:0', '1', '0', '0', '/content/Retrieval-based-Voice-Conversion-WebUI/logs/test']
/content/Retrieval-based-Voice-Conversion-WebUI/logs/test
load model(s) from hubert_base.pt
2023-04-13 08:57:24 | INFO | fairseq.tasks.hubert_pretraining | current directory is /content/Retrieval-based-Voice-Conversion-WebUI
2023-04-13 08:57:24 | INFO | fairseq.tasks.hubert_pretraining | HubertPretrainingTask Config {'_name': 'hubert_pretraining', 'data': 'metadata', 'fine_tuning': False, 'labels': ['km'], 'label_dir': 'label', 'label_rate': 50.0, 'sample_rate': 16000, 'normalize': False, 'enable_padding': False, 'max_keep_size': None, 'max_sample_size': 250000, 'min_sample_size': 32000, 'single_target': False, 'random_crop': True, 'pad_audio': False}
2023-04-13 08:57:24 | INFO | fairseq.models.hubert.hubert | HubertModel Config: {'_name': 'hubert', 'label_rate': 50.0, 'extractor_mode': default, 'encoder_layers': 12, 'encoder_embed_dim': 768, 'encoder_ffn_embed_dim': 3072, 'encoder_attention_heads': 12, 'activation_fn': gelu, 'layer_type': transformer, 'dropout': 0.1, 'attention_dropout': 0.1, 'activation_dropout': 0.0, 'encoder_layerdrop': 0.05, 'dropout_input': 0.1, 'dropout_features': 0.1, 'final_dim': 256, 'untie_final_proj': True, 'layer_norm_first': False, 'conv_feature_layers': '[(512,10,5)] + [(512,3,2)] * 4 + [(512,2,2)] * 2', 'conv_bias': False, 'logit_temp': 0.1, 'target_glu': False, 'feature_grad_mult': 0.1, 'mask_length': 10, 'mask_prob': 0.8, 'mask_selection': static, 'mask_other': 0.0, 'no_mask_overlap': False, 'mask_min_space': 1, 'mask_channel_length': 10, 'mask_channel_prob': 0.0, 'mask_channel_selection': static, 'mask_channel_other': 0.0, 'no_mask_channel_overlap': False, 'mask_channel_min_space': 1, 'conv_pos': 128, 'conv_pos_groups': 16, 'latent_temp': [2.0, 0.5, 0.999995], 'skip_masked': False, 'skip_nomask': False, 'checkpoint_activations': False, 'required_seq_len_multiple': 2, 'depthwise_conv_kernel_size': 31, 'attn_type': '', 'pos_enc_type': 'abs', 'fp16': False}
move model to cuda:0
all-feature-47
now-47,all-0,0_0.wav,(136, 256)
now-47,all-4,13_0.wav,(49, 256)
now-47,all-8,16_0.wav,(98, 256)
now-47,all-12,1_0.wav,(66, 256)
now-47,all-16,23_0.wav,(90, 256)
now-47,all-20,27_0.wav,(104, 256)
now-47,all-24,30_0.wav,(154, 256)
now-47,all-28,34_0.wav,(107, 256)
now-47,all-32,38_0.wav,(149, 256)
now-47,all-36,41_0.wav,(89, 256)
now-47,all-40,45_0.wav,(63, 256)
now-47,all-44,7_0.wav,(88, 256)
all-feature-done
['extract_feature_print.py', 'cuda:0', '1', '0', '0', '/content/Retrieval-based-Voice-Conversion-WebUI/logs/test']
/content/Retrieval-based-Voice-Conversion-WebUI/logs/test
load model(s) from hubert_base.pt
move model to cuda:0
all-feature-47
now-47,all-0,0_0.wav,(136, 256)
now-47,all-4,13_0.wav,(49, 256)
now-47,all-8,16_0.wav,(98, 256)
now-47,all-12,1_0.wav,(66, 256)
now-47,all-16,23_0.wav,(90, 256)
now-47,all-20,27_0.wav,(104, 256)
now-47,all-24,30_0.wav,(154, 256)
now-47,all-28,34_0.wav,(107, 256)
now-47,all-32,38_0.wav,(149, 256)
now-47,all-36,41_0.wav,(89, 256)
now-47,all-40,45_0.wav,(63, 256)
now-47,all-44,7_0.wav,(88, 256)
all-feature-done
DEBUG:tensorflow:Falling back to TensorFlow client; we recommended you install the Cloud TPU client directly with pip install cloud-tpu-client.
2023-04-13 08:57:34.028941: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
DEBUG:h5py._conv:Creating converter from 7 to 5
DEBUG:h5py._conv:Creating converter from 5 to 7
DEBUG:h5py._conv:Creating converter from 7 to 5
DEBUG:h5py._conv:Creating converter from 5 to 7
DEBUG:jaxlib.mlir._mlir_libs:Initializing MLIR with module: _site_initialize_0
DEBUG:jaxlib.mlir._mlir_libs:Registering dialects from initializer <module 'jaxlib.mlir._mlir_libs._site_initialize_0' from '/usr/local/lib/python3.9/dist-packages/jaxlib/mlir/_mlir_libs/_site_initialize_0.so'>
DEBUG:jax._src.path:etils.epath found. Using etils.epath for file I/O.
INFO:numexpr.utils:NumExpr defaulting to 2 threads.
DEBUG:tensorflow:Falling back to TensorFlow client; we recommended you install the Cloud TPU client directly with pip install cloud-tpu-client.
2023-04-13 08:57:38.841118: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
DEBUG:h5py._conv:Creating converter from 7 to 5
DEBUG:h5py._conv:Creating converter from 5 to 7
DEBUG:h5py._conv:Creating converter from 7 to 5
DEBUG:h5py._conv:Creating converter from 5 to 7
DEBUG:jaxlib.mlir._mlir_libs:Initializing MLIR with module: _site_initialize_0
DEBUG:jaxlib.mlir._mlir_libs:Registering dialects from initializer <module 'jaxlib.mlir._mlir_libs._site_initialize_0' from '/usr/local/lib/python3.9/dist-packages/jaxlib/mlir/_mlir_libs/_site_initialize_0.so'>
DEBUG:jax._src.path:etils.epath found. Using etils.epath for file I/O.
INFO:numexpr.utils:NumExpr defaulting to 2 threads.
INFO:test:{'train': {'log_interval': 200, 'seed': 1234, 'epochs': 20000, 'learning_rate': 0.0001, 'betas': [0.8, 0.99], 'eps': 1e-09, 'batch_size': 4, 'fp16_run': True, 'lr_decay': 0.999875, 'segment_size': 12800, 'init_lr_ratio': 1, 'warmup_epochs': 0, 'c_mel': 45, 'c_kl': 1.0}, 'data': {'max_wav_value': 32768.0, 'sampling_rate': 40000, 'filter_length': 2048, 'hop_length': 400, 'win_length': 2048, 'n_mel_channels': 125, 'mel_fmin': 0.0, 'mel_fmax': None, 'training_files': './logs/test/filelist.txt'}, 'model': {'inter_channels': 192, 'hidden_channels': 192, 'filter_channels': 768, 'n_heads': 2, 'n_layers': 6, 'kernel_size': 3, 'p_dropout': 0, 'resblock': '1', 'resblock_kernel_sizes': [3, 7, 11], 'resblock_dilation_sizes': [[1, 3, 5], [1, 3, 5], [1, 3, 5]], 'upsample_rates': [10, 10, 2, 2], 'upsample_initial_channel': 512, 'upsample_kernel_sizes': [16, 16, 4, 4], 'use_spectral_norm': False, 'gin_channels': 256, 'spk_embed_dim': 109}, 'model_dir': './logs/test', 'experiment_dir': './logs/test', 'save_every_epoch': 5, 'name': 'test', 'total_epoch': 20, 'pretrainG': 'pretrained/G40k.pth', 'pretrainD': 'pretrained/D40k.pth', 'gpus': '0', 'sample_rate': '40k', 'if_f0': 0, 'if_latest': 0, 'if_cache_data_in_gpu': 0}
WARNING:test:/content/Retrieval-based-Voice-Conversion-WebUI/train is not a git repository, therefore hash value comparison will be ignored.
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0
INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes.
/usr/local/lib/python3.9/dist-packages/torch/utils/data/dataloader.py:561: UserWarning: This DataLoader will create 4 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  warnings.warn(_create_warning_msg(
gin_channels: 256 self.spk_embed_dim: 109
Traceback (most recent call last):
  File "/content/Retrieval-based-Voice-Conversion-WebUI/train_nsf_sim_cache_sid_load_pretrain.py", line 121, in run
    _, _, _, epoch_str = utils.load_checkpoint(utils.latest_checkpoint_path(hps.model_dir, "D_*.pth"), net_d, optim_d)  # D多半加载没事
  File "/content/Retrieval-based-Voice-Conversion-WebUI/train/utils.py", line 163, in latest_checkpoint_path
    x = f_list[-1]
IndexError: list index out of range
INFO:test:loaded pretrained pretrained/G40k.pth pretrained/D40k.pth
<All keys matched successfully>
<All keys matched successfully>
DEBUG:tensorflow:Falling back to TensorFlow client; we recommended you install the Cloud TPU client directly with pip install cloud-tpu-client.
DEBUG:tensorflow:Falling back to TensorFlow client; we recommended you install the Cloud TPU client directly with pip install cloud-tpu-client.
DEBUG:tensorflow:Falling back to TensorFlow client; we recommended you install the Cloud TPU client directly with pip install cloud-tpu-client.
2023-04-13 08:57:54.293283: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2023-04-13 08:57:54.299536: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
DEBUG:tensorflow:Falling back to TensorFlow client; we recommended you install the Cloud TPU client directly with pip install cloud-tpu-client.
2023-04-13 08:57:54.309187: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
DEBUG:h5py._conv:Creating converter from 7 to 5
DEBUG:h5py._conv:Creating converter from 5 to 7
DEBUG:h5py._conv:Creating converter from 7 to 5
DEBUG:h5py._conv:Creating converter from 5 to 7
DEBUG:h5py._conv:Creating converter from 7 to 5
DEBUG:h5py._conv:Creating converter from 5 to 7
DEBUG:h5py._conv:Creating converter from 7 to 5
DEBUG:h5py._conv:Creating converter from 5 to 7
DEBUG:h5py._conv:Creating converter from 7 to 5
DEBUG:h5py._conv:Creating converter from 5 to 7
DEBUG:h5py._conv:Creating converter from 7 to 5
DEBUG:h5py._conv:Creating converter from 5 to 7
2023-04-13 08:57:54.576913: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
DEBUG:h5py._conv:Creating converter from 7 to 5
DEBUG:h5py._conv:Creating converter from 5 to 7
DEBUG:h5py._conv:Creating converter from 7 to 5
DEBUG:h5py._conv:Creating converter from 5 to 7
DEBUG:jaxlib.mlir._mlir_libs:Initializing MLIR with module: _site_initialize_0
DEBUG:jaxlib.mlir._mlir_libs:Registering dialects from initializer <module 'jaxlib.mlir._mlir_libs._site_initialize_0' from '/usr/local/lib/python3.9/dist-packages/jaxlib/mlir/_mlir_libs/_site_initialize_0.so'>
DEBUG:jaxlib.mlir._mlir_libs:Initializing MLIR with module: _site_initialize_0
DEBUG:jaxlib.mlir._mlir_libs:Registering dialects from initializer <module 'jaxlib.mlir._mlir_libs._site_initialize_0' from '/usr/local/lib/python3.9/dist-packages/jaxlib/mlir/_mlir_libs/_site_initialize_0.so'>
DEBUG:jaxlib.mlir._mlir_libs:Initializing MLIR with module: _site_initialize_0
DEBUG:jaxlib.mlir._mlir_libs:Registering dialects from initializer <module 'jaxlib.mlir._mlir_libs._site_initialize_0' from '/usr/local/lib/python3.9/dist-packages/jaxlib/mlir/_mlir_libs/_site_initialize_0.so'>
DEBUG:jax._src.path:etils.epath found. Using etils.epath for file I/O.
DEBUG:jax._src.path:etils.epath found. Using etils.epath for file I/O.
DEBUG:jaxlib.mlir._mlir_libs:Initializing MLIR with module: _site_initialize_0
DEBUG:jaxlib.mlir._mlir_libs:Registering dialects from initializer <module 'jaxlib.mlir._mlir_libs._site_initialize_0' from '/usr/local/lib/python3.9/dist-packages/jaxlib/mlir/_mlir_libs/_site_initialize_0.so'>
DEBUG:jax._src.path:etils.epath found. Using etils.epath for file I/O.
DEBUG:jax._src.path:etils.epath found. Using etils.epath for file I/O.
INFO:numexpr.utils:NumExpr defaulting to 2 threads.
INFO:numexpr.utils:NumExpr defaulting to 2 threads.
INFO:numexpr.utils:NumExpr defaulting to 2 threads.
INFO:numexpr.utils:NumExpr defaulting to 2 threads.
/usr/local/lib/python3.9/dist-packages/torch/functional.py:641: UserWarning: stft with return_complex=False is deprecated. In a future pytorch release, stft will return complex tensors for all inputs, and return_complex=False will raise an error.
Note: you can still call torch.view_as_real on the complex output to recover the old return format. (Triggered internally at ../aten/src/ATen/native/SpectralOps.cpp:862.)
  return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
/content/Retrieval-based-Voice-Conversion-WebUI/train/mel_processing.py:93: FutureWarning: Pass sr=40000, n_fft=2048, n_mels=125, fmin=0.0, fmax=None as keyword args. From version 0.10 passing these as positional arguments will result in an error
  mel = librosa_mel_fn(sampling_rate, n_fft, num_mels, fmin, fmax)
INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration.
/usr/local/lib/python3.9/dist-packages/torch/autograd/__init__.py:200: UserWarning: Grad strides do not match bucket view strides. This may indicate grad was not created according to the gradient layout contract, or that the param's strides changed since DDP was constructed.  This is not an error, but may impair performance.
grad.sizes() = [1, 21, 96], strides() = [43296, 96, 1]
bucket_view.sizes() = [1, 21, 96], strides() = [2016, 96, 1] (Triggered internally at ../torch/csrc/distributed/c10d/reducer.cpp:323.)
  Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
INFO:test:Train Epoch: 1 [0%]
INFO:test:[0, 0.0001]
INFO:test:loss_disc=3.231, loss_gen=2.107, loss_fm=9.420,loss_mel=30.646, loss_kl=5.000
DEBUG:matplotlib:matplotlib data path: /usr/local/lib/python3.9/dist-packages/matplotlib/mpl-data
DEBUG:matplotlib:CONFIGDIR=/root/.config/matplotlib
DEBUG:matplotlib:interactive is False
DEBUG:matplotlib:platform is linux
INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration.
Traceback (most recent call last):
  File "/content/Retrieval-based-Voice-Conversion-WebUI/train_nsf_sim_cache_sid_load_pretrain.py", line 514, in <module>
    main()
  File "/content/Retrieval-based-Voice-Conversion-WebUI/train_nsf_sim_cache_sid_load_pretrain.py", line 42, in main
    mp.spawn(
  File "/usr/local/lib/python3.9/dist-packages/torch/multiprocessing/spawn.py", line 239, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/usr/local/lib/python3.9/dist-packages/torch/multiprocessing/spawn.py", line 197, in start_processes
    while not context.join():
  File "/usr/local/lib/python3.9/dist-packages/torch/multiprocessing/spawn.py", line 160, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException: 

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/content/Retrieval-based-Voice-Conversion-WebUI/train_nsf_sim_cache_sid_load_pretrain.py", line 150, in run
    train_and_evaluate(
  File "/content/Retrieval-based-Voice-Conversion-WebUI/train_nsf_sim_cache_sid_load_pretrain.py", line 195, in train_and_evaluate
    for batch_idx, info in enumerate(train_loader):
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/dataloader.py", line 634, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/dataloader.py", line 1326, in _next_data
    return self._process_data(data)
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/dataloader.py", line 1372, in _process_data
    data.reraise()
  File "/usr/local/lib/python3.9/dist-packages/torch/_utils.py", line 644, in reraise
    raise exception
IndexError: Caught IndexError in DataLoader worker process 2.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/content/Retrieval-based-Voice-Conversion-WebUI/train/data_utils.py", line 306, in __getitem__
    return self.get_audio_text_pair(self.audiopaths_and_text[index])
  File "/content/Retrieval-based-Voice-Conversion-WebUI/train/data_utils.py", line 248, in get_audio_text_pair
    phone = self.get_labels(phone)
  File "/content/Retrieval-based-Voice-Conversion-WebUI/train/data_utils.py", line 266, in get_labels
    phone = phone[:n_num, :]
IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed
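The failing line indexes a 1-D array with two indices; a minimal reproduction of the same error (illustrative only, not repo code):

import numpy as np

phone = np.zeros(10)      # a 1-D array, e.g. a feature file saved with an unexpected shape
n_num = 5
phone = phone[:n_num, :]  # IndexError: too many indices for array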

Bug: `mel_spectrogram_torch` crashes with an argument error

Summary

The mel_spectrogram_torch function raises an argument error.
It is caused by the installed librosa version.
I made a fix in PR #133.

Status

Running the mel_processing.mel_spectrogram_torch function raises an error.
The core part of the error is

--> mel = librosa_mel_fn(sampling_rate, n_fft, num_mels, fmin, fmax)
...
TypeError: mel() takes 0 positional arguments but 5 were given

Env

Google Colaboratory @ 2023-04-23

!pip show librosa
# Version: 0.10.0.post2

Cause

It is because of the librosa version.
Previously, librosa.filters.mel accepted positional arguments, but in the latest version they must be passed as keyword arguments.

How to Fix

Use keyword arguments instead of positional arguments.
I verified that this fix resolves the error with librosa==0.10.0.post2.
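For reference, a minimal sketch of the change (assuming librosa_mel_fn is librosa.filters.mel, as imported in mel_processing.py):

# Before (positional arguments — rejected by librosa >= 0.10):
# mel = librosa_mel_fn(sampling_rate, n_fft, num_mels, fmin, fmax)

# After (keyword arguments):
mel = librosa_mel_fn(sr=sampling_rate, n_fft=n_fft, n_mels=num_mels, fmin=fmin, fmax=fmax)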

Proposal

I made the pull request (#133).
Could you please review it?

The song length after inference differs slightly from the original, with a possible solution attached

I'm using the 0416updated build. The length of the inferred song differs slightly from the original; for example, with YOASOBI's "Idol" (偶像), the original runs 3:33.228 but the inference output is 3:33.200. For a long song with a fast tempo, this accumulated drift can become audible and hurt the quality of the finished track.

For reference, here is a splitting scheme I used previously that matches the original length exactly. The main idea is to split with librosa.util.frame, pad the final chunk, and give that last chunk special handling (SR and the frame parameters were undefined in the original snippet and are filled in below as labeled assumptions):
import librosa
import numpy as np
from scipy.io.wavfile import write

SR = 16000                 # model-side sampling rate (assumed; undefined in the original snippet)
SAMPLE_RATE = 48000        # output sampling rate
frame_length = 30 * SR     # 30-second frames (assumed from the comments)
hop_length = frame_length  # zero overlap

def main(args):
    audio, sr = librosa.load(args.wave, sr=SR)
    audio_length = len(audio)
    # calculate the padding length so the last frame is complete
    pad_length = frame_length - (audio_length - frame_length) % hop_length
    audio = np.pad(audio, (0, pad_length), mode='constant')  # pad the array with zeros
    # split the audio into frames of 30 seconds with zero overlap
    frames = librosa.util.frame(audio, frame_length=frame_length, hop_length=hop_length)
    frames = np.transpose(frames, (1, 0))
    # initialize an empty list to store the processed frames
    bwe_frames = []
    for idx, frame in enumerate(frames):
        bwe_frame = ...  # model inference (upsampling SR -> SAMPLE_RATE) omitted in the original post
        # append the processed frame, trimming the padding off the last one
        if idx == len(frames) - 1:
            bwe_frames.append(bwe_frame[:-pad_length * int(SAMPLE_RATE / SR)])
        else:
            bwe_frames.append(bwe_frame)
    # concatenate the processed frames into a single array
    bwe_audio = np.concatenate(bwe_frames)
    write("svc_out_48k.wav", SAMPLE_RATE, bwe_audio)

probable bug in gui.py

I'm not at all familiar with Python, so it's possible this isn't a bug but rather an issue on my end.
I'm running this in a conda environment on Linux Mint 20. When I launch gui.py, it raises an IndexError on line 220. Looking at the code in gui.py, I found that when it runs default_value=input_devices[sd.default.device[0]], sd.default.device[0] is outside the range of input_devices. I replaced that line and the similar default-output-device line with default_value=input_devices[0] and default_value=output_devices[0] respectively, and it then launched successfully.
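A minimal sketch of a more defensive default (my own illustration; input_devices is assumed to be the filtered name list gui.py builds, and the actual file may differ):

import sounddevice as sd

# Names of devices that can actually record audio
devices = sd.query_devices()
input_devices = [d["name"] for d in devices if d["max_input_channels"] > 0]

# sd.default.device[0] indexes the full device list, so it can be out of
# range for the filtered input_devices list; fall back to the first entry.
default_idx = sd.default.device[0]
default_value = (
    input_devices[default_idx]
    if 0 <= default_idx < len(input_devices)
    else input_devices[0]
)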

Error when running the one-click package

Expecting value: line 1 column 1 (char 0)
The webUI screenshot is attached below:
[screenshot]
The tutorial video says the speaker ID currently does not need to be changed:
[screenshot]

Realtime Voice Conversion for RVC

A really, really great job!!! I'm impressed by how short the training time is and how accurate the results are. I've developed real-time voice conversion software using RVC, so if you don't mind, I'd be really grateful if you could add a link to it in the Readme.

No such file or directory: 'ffmpeg'

Traceback (most recent call last):
  File "/home/iot1/zhongzhilai/Retrieval-based-Voice-Conversion-WebUI/trainset_preprocess_pipeline_print.py", line 73, in pipeline
    audio = load_audio(path, self.sr)
  File "/home/iot1/zhongzhilai/Retrieval-based-Voice-Conversion-WebUI/my_utils.py", line 19, in load_audio
    raise RuntimeError(f"Failed to load audio: {e}")
RuntimeError: Failed to load audio: [Errno 2] No such file or directory: 'ffmpeg'

/home/iot1/zhongzhilai/so-vits-svc/dataset_raw/speaker0/8_86.wav->Traceback (most recent call last):
  File "/home/iot1/zhongzhilai/Retrieval-based-Voice-Conversion-WebUI/my_utils.py", line 14, in load_audio
    ffmpeg.input(file, threads=0)
  File "/home/iot1/anaconda3/envs/retrieval/lib/python3.9/site-packages/ffmpeg/_run.py", line 313, in run
    process = run_async(
  File "/home/iot1/anaconda3/envs/retrieval/lib/python3.9/site-packages/ffmpeg/_run.py", line 284, in run_async
    return subprocess.Popen(
  File "/home/iot1/anaconda3/envs/retrieval/lib/python3.9/subprocess.py", line 951, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/home/iot1/anaconda3/envs/retrieval/lib/python3.9/subprocess.py", line 1821, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'ffmpeg'

I didn't understand the ffmpeg installation instructions in the Readme.
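A quick way to check whether the Python process can see ffmpeg at all (a small diagnostic sketch, not repo code):

import shutil

# Prints the resolved path, or None when ffmpeg is not on PATH for this environment
print(shutil.which("ffmpeg"))

If this prints None inside the activated conda environment, ffmpeg isn't installed there or isn't on PATH.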

Please add TensorBoard to the Colab notebook

%load_ext tensorboard
%tensorboard --logdir /content/Retrieval-based-Voice-Conversion-WebUI/logs

Add this to the webUI cell, before the comments.
