The autovc from cyhuang-tw

autovc's Introduction

AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss

This is an unofficial implementation of AutoVC based on the official one. The D-Vector and vocoder come from yistLin/dvector and yistLin/universal-vocoder, respectively.

This implementation supports torch.jit, so the full model can be loaded with a single line:

model = torch.jit.load(model_path)
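
A slightly fuller sketch of the same call follows. It assumes the checkpoint is a TorchScript file and that PyTorch is installed. The helper name load_autovc, the device default, and the existence check are illustrative additions, not part of this repository.

```python
from pathlib import Path


def load_autovc(model_path, device="cpu"):
    """Load a TorchScript checkpoint and switch it to inference mode.

    `load_autovc` and `device` are hypothetical names used for this sketch.
    """
    path = Path(model_path)
    if not path.is_file():
        # Fail early with a clear message instead of a low-level loader error.
        raise FileNotFoundError(f"model file not found: {path}")

    import torch  # imported lazily so the path check works without PyTorch

    # map_location remaps stored tensors, e.g. a GPU checkpoint onto the CPU.
    model = torch.jit.load(str(path), map_location=device)
    model.eval()  # disable dropout and other training-only behaviour
    return model
```

Because the model is serialized with torch.jit, no model class definition needs to be imported before loading.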

Pre-trained models are available here.

Preprocessing

python preprocess.py <data_dir> <save_dir> <encoder_path> [--seg_len seg] [--n_workers workers]

data_dir: The directory of speakers.
save_dir: The directory to save the processed files.
encoder_path: The path to the pre-trained D-Vector model.
seg: The length of segments for training.
workers: The number of workers for preprocessing.
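
Before preprocessing, it can help to check what data_dir contains. The sketch below assumes one subdirectory per speaker holding .wav files. That layout is an assumption and is not stated in this README. The function name summarize_speakers and the default pattern are hypothetical.

```python
from pathlib import Path


def summarize_speakers(data_dir, pattern="*.wav"):
    """Return {speaker_name: number_of_matching_files} for each subdirectory.

    Assumes one subdirectory per speaker; this layout is a guess, not a
    documented requirement of preprocess.py.
    """
    root = Path(data_dir)
    counts = {}
    for speaker_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # rglob also finds files nested below the speaker directory.
        counts[speaker_dir.name] = sum(1 for _ in speaker_dir.rglob(pattern))
    return counts
```

A speaker that reports zero files most likely uses a different audio extension or directory depth than the one assumed here.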

Training

python train.py <config_path> <data_dir> <save_dir> [--n_steps steps] [--save_steps save] [--log_steps log] [--batch_size batch] [--seg_len seg]

config_path: The config file of model hyperparameters.
data_dir: The directory of preprocessed data.
save_dir: The directory to save the model.
steps: The number of training steps.
save: The interval, in steps, between model saves.
log: The interval, in steps, between training-log records.
batch: The batch size.
seg: The length of segments for training.
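
When launching several training runs from a script, the command line above can be assembled programmatically. The following is a minimal sketch that uses only the flags listed in this section. The helper name build_train_command is hypothetical, and any file names or values passed to it are placeholders.

```python
import sys


def build_train_command(config_path, data_dir, save_dir, **options):
    """Build the argument list for train.py from the documented options.

    Keyword names map directly to flags, e.g. n_steps=1000 -> --n_steps 1000.
    """
    allowed = {"n_steps", "save_steps", "log_steps", "batch_size", "seg_len"}
    unknown = set(options) - allowed
    if unknown:
        # Reject anything that is not one of the documented flags.
        raise ValueError(f"unsupported options: {sorted(unknown)}")

    command = [sys.executable, "train.py",
               str(config_path), str(data_dir), str(save_dir)]
    for name, value in options.items():
        command += [f"--{name}", str(value)]
    return command
```

The returned list can be passed to subprocess.run(command, check=True) from the repository root. Using a list instead of a shell string avoids quoting problems with paths that contain spaces.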

Inference

python inference.py <model_path> <vocoder_path> <source> <target> <output>

model_path: The path of the model file.
vocoder_path: The path of the vocoder file.
source: The utterance providing linguistic content.
target: The utterance providing the target speaker's timbre.
output: The converted utterance.

Reference

Please cite the paper if you find it useful.

@InProceedings{pmlr-v97-qian19c,
  title = {{A}uto{VC}: Zero-Shot Voice Style Transfer with Only Autoencoder Loss},
  author = {Qian, Kaizhi and Zhang, Yang and Chang, Shiyu and Yang, Xuesong and Hasegawa-Johnson, Mark},
  pages = {5210--5219},
  year = {2019},
  editor = {Kamalika Chaudhuri and Ruslan Salakhutdinov},
  volume = {97},
  series = {Proceedings of Machine Learning Research},
  address = {Long Beach, California, USA},
  month = {09--15 Jun},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v97/qian19c/qian19c.pdf},
  url = {http://proceedings.mlr.press/v97/qian19c.html}
}

autovc's Issues

Some questions about the pre-trained model

Hi, thanks for your very valuable work.
Can you tell me what data you used to train AutoVC? Was it all of the VCTK data,
or did you split the dataset in some way?

Link to pre-trained models not working

I am getting a 404 Not Found error.

Will this work with the models from this repository: https://github.com/auspicious3000/autovc?tab=readme-ov-file

About source and target files

Hi, I was trying to use the inference.py script with the pre-trained models mentioned in the README. I'm not sure whether source and target are meant to be WAV files. I'm trying this on Windows 10 with Anaconda and get lots of errors like:

TypeError: Invalid file: WindowsPath('source.wav')
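
This message is consistent with an audio loader receiving a pathlib.Path object instead of a plain string, which some older audio-library releases reject. The full traceback is not shown, so this is only a guess. Assuming that is the cause, one possible workaround is to convert each path to a string before it reaches the loader. The helper name to_audio_path is hypothetical.

```python
from pathlib import Path


def to_audio_path(path):
    """Return an absolute path string for an existing audio file."""
    resolved = Path(path).expanduser().resolve()
    if not resolved.is_file():
        raise FileNotFoundError(f"audio file not found: {resolved}")
    # str() turns a WindowsPath/PosixPath into a plain string for the loader.
    return str(resolved)
```

Upgrading the audio libraries in the environment may also remove the need for the conversion, if the installed versions are old.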

The synthesized wav is bad

Hello, thank you very much for sharing this.

I tested it with your pre-trained model and it didn't work very well. Even when the two inputs are the same audio, the result is still poor. Have you encountered this problem?
