
lora-svc's Introduction

singing voice conversion based on whisper & maxgan


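Built on technology from three major AI players:

Whisper from OpenAI, trained on 680,000 hours of multilingual audio

BigVGAN from Nvidia, anti-aliased speech generation

Adapter from Microsoft, efficient fine-tuning

A plug-in style, open-source singing voice bank model that follows the LoRA principle:

To train a model from scratch on a large dataset, use the branch: lora-svc-for-pretrain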

lora-svc-baker.mp4

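The steps below customize a specific voice on top of the pretrained model.

Training

  • 1 Prepare the data: split the audio into clips shorter than 30 s (about 10 s is recommended; cuts do not have to fall at sentence boundaries), convert the sample rate to 16000 Hz, and put the audio files in ./data_svc/waves

    You probably already know how to do this.

  • 2 Download the speaker encoder: Speaker-Encoder by @mueller91. Unzip it and put best_model.pth.tar in the speaker_pretrain/ directory

    Extract the speaker embedding (timbre) of each audio file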

    python svc_preprocess_speaker.py ./data_svc/waves ./data_svc/speaker

  • 3 Download the Whisper multilingual medium model, make sure the downloaded file is medium.pt, put it in the whisper_pretrain/ folder, and extract the content encoding of each audio file

    sudo apt update && sudo apt install ffmpeg

    python svc_preprocess_ppg.py -w ./data_svc/waves -p ./data_svc/whisper

  • 4 Extract the pitch (F0) and generate the training file filelist/train.txt at the same time; cut the first 5 entries of train to create filelist/eval.txt

    python svc_preprocess_f0.py

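  • 5 Take the average of the speaker embeddings of all audio files as the target speaker's timbre, and run the vocal range analysis

    python svc_preprocess_speaker_lora.py ./data_svc/

    This generates two files: lora_speaker.npy and lora_pitch_statics.npy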

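  • 6 Download the pretrained model maxgan_pretrain_5L.pth from the release page and put it in the model_pretrain folder; the pretrained model contains both the generator and the discriminator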

    python svc_trainer.py -c config/maxgan.yaml -n lora -p model_pretrain/maxgan_pretrain_5L.pth
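Step 1 above comes without a command, so here is a minimal sketch of the splitting part. It assumes the input is an uncompressed PCM WAV file that has already been resampled to 16000 Hz (for example with ffmpeg, which step 3 installs anyway). The function name split_wav and the fixed 10 s cut length are illustrative choices, not part of this project.

```python
import wave
from pathlib import Path


def split_wav(src, out_dir, max_seconds=10.0):
    """Split a PCM WAV file into consecutive clips of at most max_seconds.

    Cuts are made at fixed intervals, not at sentence boundaries.
    Returns the list of clip paths that were written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with wave.open(str(src), "rb") as reader:
        params = reader.getparams()
        frames_per_clip = int(max_seconds * reader.getframerate())
        index = 1
        while True:
            frames = reader.readframes(frames_per_clip)
            if not frames:
                break
            # Zero-padded names such as 000001.wav (hypothetical naming).
            clip_path = out_dir / f"{index:06d}.wav"
            with wave.open(str(clip_path), "wb") as writer:
                # Copy channels, sample width and sample rate; the frame
                # count in the header is corrected when the file is closed.
                writer.setparams(params)
                writer.writeframes(frames)
            written.append(clip_path)
            index += 1
    return written
```

Calling split_wav("song.wav", "./data_svc/waves") would then fill the waves folder with clips of 10 s or less; the last clip is simply shorter.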

Your file directory should look like this:

data_svc/
├── lora_speaker.npy
├── lora_pitch_statics.npy
├── pitch
│     ├── 000001.pit.npy
│     ├── 000002.pit.npy
│     └── 000003.pit.npy
├── speakers
│     ├── 000001.spk.npy
│     ├── 000002.spk.npy
│     └── 000003.spk.npy
├── waves
│     ├── 000001.wav
│     ├── 000002.wav
│     └── 000003.wav
└── whisper
      ├── 000001.ppg.npy
      ├── 000002.ppg.npy
      └── 000003.ppg.npy
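Before starting training it can help to confirm that every clip has all three feature files. The sketch below is a hypothetical helper, not a script from this repository. The folder names and suffixes are copied from the tree above; note that the command in step 2 writes to ./data_svc/speaker, so adjust FEATURE_DIRS to whatever layout you actually have.

```python
from pathlib import Path

# Sub-directory name -> file suffix, as shown in the directory tree.
# Assumption: change "speakers" if your embeddings live in "speaker".
FEATURE_DIRS = {
    "pitch": ".pit.npy",
    "speakers": ".spk.npy",
    "whisper": ".ppg.npy",
}


def find_missing_features(data_root):
    """Return {wav stem: [missing feature paths]} for incomplete clips."""
    root = Path(data_root)
    missing = {}
    for wav in sorted((root / "waves").glob("*.wav")):
        absent = [
            str(Path(sub) / (wav.stem + suffix))
            for sub, suffix in FEATURE_DIRS.items()
            if not (root / sub / (wav.stem + suffix)).is_file()
        ]
        if absent:
            missing[wav.stem] = absent
    return missing


if __name__ == "__main__":
    for stem, paths in find_missing_features("./data_svc").items():
        print(stem, "is missing", ", ".join(paths))
```

An empty result means every file in waves has a matching pitch, speaker and whisper feature.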

Training LoRA

Turn on the switch

https://github.com/PlayVoice/lora-svc/blob/d3a1df57e6019c12513bb34e1bd5c8162d5e5055/config/maxgan.yaml#L16

https://github.com/PlayVoice/lora-svc/blob/d3a1df57e6019c12513bb34e1bd5c8162d5e5055/utils/train.py#L34-L35

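Use cases

  • Extremely low-resource settings, to prevent overfitting

  • Plug-in voice bank development

  • For other scenarios, turning it off is recommended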
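To illustrate why LoRA suits the low-resource case, here is a generic sketch of the LoRA idea in NumPy. It is not the code used in this repository: the class name LoRALinear, the rank and the alpha scaling are illustrative assumptions. The pretrained weight W stays frozen and only two small matrices A and B are trained, so the layer computes x W^T + (alpha / r) x A^T B^T.

```python
import numpy as np


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, weight, rank=4, alpha=4.0, seed=0):
        out_features, in_features = weight.shape
        rng = np.random.default_rng(seed)
        self.weight = weight  # frozen, pretrained
        # Trainable factors: A starts small and random, B starts at zero,
        # so the layer initially behaves exactly like the pretrained one.
        self.A = rng.normal(0.0, 0.01, (rank, in_features))
        self.B = np.zeros((out_features, rank))
        self.scale = alpha / rank

    def __call__(self, x):
        # Base projection plus the low-rank correction.
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def trainable_parameters(self):
        return self.A.size + self.B.size


layer = LoRALinear(np.ones((256, 256)), rank=4)
print(layer.trainable_parameters())  # 2048, versus 65536 in the full weight
```

With few trainable parameters there is less capacity to overfit a handful of utterances, and the small update can be shipped separately from the base model, which is what makes the plug-in use case possible.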

Example: the log from ten minutes of training on 50 utterances by 猫雷 (maolei) is shown below

maolei.mp4

Inference

Export the generator; the discriminator is only used during training

python svc_inference_export.py --config config/maxgan.yaml --checkpoint_path chkpt/lora/lora_0090.pt

The exported model maxgan_g.pth is written to the current folder, with a file size of 54.3 MB; maxgan_lora.pth is the fine-tuning module, with a file size of 0.94 MB

python svc_inference.py --config config/maxgan.yaml --model maxgan_g.pth --spk ./data_svc/lora_speaker.npy --wave test.wav

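The output file svc_out.wav is generated in the current directory; svc_out_pitch.wav is generated as well, so that you can directly check the pitch extraction result.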
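If you convert many files, the two commands above can be wrapped in a small script. The sketch below only assembles the command lines shown in this README (including the optional --statics flag used in the pitch section); the function names are hypothetical, and it assumes you run it from the repository root with the same Python interpreter.

```python
import subprocess
import sys


def export_cmd(checkpoint, config="config/maxgan.yaml"):
    """Command that exports the generator from a training checkpoint."""
    return [sys.executable, "svc_inference_export.py",
            "--config", config, "--checkpoint_path", checkpoint]


def inference_cmd(wave, model="maxgan_g.pth",
                  spk="./data_svc/lora_speaker.npy",
                  config="config/maxgan.yaml", statics=None):
    """Command that converts one input file with the exported generator."""
    cmd = [sys.executable, "svc_inference.py", "--config", config,
           "--model", model, "--spk", spk]
    if statics is not None:
        cmd += ["--statics", statics]
    cmd += ["--wave", wave]
    return cmd


def run_pipeline(checkpoint, wave, statics=None):
    """Export the generator, then convert one file; raises on failure."""
    subprocess.run(export_cmd(checkpoint), check=True)
    subprocess.run(inference_cmd(wave, statics=statics), check=True)
```

For example, run_pipeline("chkpt/lora/lora_0090.pt", "test.wav") performs the export and the conversion in sequence and stops if either step exits with an error.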

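What if the generated voice does not sound similar enough?

  • 1 Speaker vocal range statistics

    Generated in training step 5: lora_pitch_statics.npy

  • 2 Pitch range shift at inference

    Specify the pitch statistics with the --statics parameter: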

    python svc_inference.py --config config/maxgan.yaml --model maxgan_g.pth --spk ./data_svc/lora_speaker.npy --statics ./data_svc/lora_pitch_statics.npy --wave test.wav
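The idea behind a pitch range shift can be sketched as follows. This README does not describe the contents of lora_pitch_statics.npy or how svc_inference.py uses them, so the code is a generic illustration under the assumption that a mean F0 in Hz is known for both the source recording and the target speaker. The function names are hypothetical.

```python
import math


def semitone_shift(source_mean_hz, target_mean_hz):
    """Semitones needed to move the source mean pitch onto the target's."""
    return 12.0 * math.log2(target_mean_hz / source_mean_hz)


def shift_f0(f0_hz, semitones):
    """Transpose an F0 contour; 0 marks unvoiced frames and stays 0."""
    ratio = 2.0 ** (semitones / 12.0)
    return [f * ratio if f > 0 else 0.0 for f in f0_hz]


# A source averaging 220 Hz mapped onto a target averaging 440 Hz
# is shifted up by one octave.
shift = semitone_shift(220.0, 440.0)
print(shift)                                # 12.0
print(shift_f0([220.0, 0.0, 110.0], shift))  # [440.0, 0.0, 220.0]
```

Moving the input into the range the target speaker was recorded in usually matters because the model has seen little or no data outside that range.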

Bandwidth extension: 16K -> 48K

python svc_bandex.py -w svc_out.wav

svc_out_48k.wav is generated in the current directory

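Audio quality enhancement

From the DiffSinger community vocoder project, download the enhancer based on the pretrained vocoder and unzip it into the nsf_hifigan_pretrain/ folder. Note: download the archive whose name contains nsf_hifigan, not nsf_hifigan_finetune

Copy the bandwidth-extended svc_out_48k.wav to path\to\input\wavs, then run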

python svc_val_nsf_hifigan.py

The enhanced file is generated in path\to\output\wavs

Code sources and references

Adapter-Based Extension of Multi-Speaker Text-to-Speech Model for New Speakers

AdaSpeech: Adaptive Text to Speech for Custom Voice

https://github.com/nii-yamagishilab/project-NN-Pytorch-scripts/tree/master/project/01-nsf

https://github.com/mindslab-ai/univnet [paper]

https://github.com/openai/whisper/ [paper]

https://github.com/NVIDIA/BigVGAN [paper]

https://github.com/brentspell/hifi-gan-bwe

https://github.com/openvpi/DiffSinger

https://github.com/chenwj1989/pafx

Contributors

Notes

If you adopt the code or ideas of this project, please credit it in your project; this is a basic principle for keeping the open source spirit alive.


lora-svc's People

Contributors: maxmax2016, kakaruhayate
