
autovc's Introduction

AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss

This is an unofficial implementation of AutoVC based on the official implementation. The D-Vector and vocoder are from yistLin/dvector and yistLin/universal-vocoder, respectively.

This implementation supports torch.jit, so the full model can be loaded with a single line:

model = torch.jit.load(model_path)

Pre-trained models are available here.

Preprocessing

python preprocess.py <data_dir> <save_dir> <encoder_path> [--seg_len seg] [--n_workers workers]
  • data_dir: The directory of speakers.
  • save_dir: The directory to save the processed files.
  • encoder_path: The path to the pre-trained D-Vector model.
  • seg: The length of segments for training.
  • workers: The number of workers for preprocessing.
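The seg option appears in both preprocessing and training. A common use of such a parameter is to train on fixed-length crops of each utterance's mel-spectrogram. The sketch below illustrates that idea in plain Python, assuming an utterance is stored as a sequence of frames and that utterances shorter than seg_len are skipped. random_segment is a hypothetical helper, not a function from this repository, and the actual scripts may handle short utterances differently.

```python
import random
from typing import List, Optional, Sequence, TypeVar

Frame = TypeVar("Frame")


def random_segment(frames: Sequence[Frame], seg_len: int,
                   rng: random.Random) -> Optional[List[Frame]]:
    """Return a random window of exactly seg_len consecutive frames.

    Hypothetical helper: returns None when the utterance is shorter than
    one segment, mirroring the assumption that such utterances are skipped.
    """
    if seg_len <= 0:
        raise ValueError("seg_len must be positive")
    if len(frames) < seg_len:
        # Too short to yield a full segment.
        return None
    # randint's upper bound is inclusive, so the last valid start is len - seg_len.
    start = rng.randint(0, len(frames) - seg_len)
    return list(frames[start:start + seg_len])
```

For example, random_segment(list(range(10)), 4, random.Random(0)) returns four consecutive frame indices starting somewhere between 0 and 6. Passing an explicit random.Random makes the crops reproducible.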

Training

python train.py <config_path> <data_dir> <save_dir> [--n_steps steps] [--save_steps save] [--log_steps log] [--batch_size batch] [--seg_len seg]
  • config_path: The config file of model hyperparameters.
  • data_dir: The directory of preprocessed data.
  • save_dir: The directory to save the model.
  • steps: The number of training steps.
  • save: Save the model every save steps.
  • log: Log training information every log steps.
  • batch: The batch size.
  • seg: The length of segments for training.
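The steps, save, and log options together define a simple schedule. The sketch below shows how such options typically interact, assuming steps are counted from 1 and that saving and logging happen whenever the step number is a multiple of save or log. training_schedule is a hypothetical helper that only records when those events would fire; train.py may count or trigger them differently.

```python
from typing import Dict, List


def training_schedule(n_steps: int, save_steps: int,
                      log_steps: int) -> Dict[str, List[int]]:
    """List the 1-based steps at which a loop would log and save (hypothetical)."""
    if min(n_steps, save_steps, log_steps) <= 0:
        raise ValueError("all step counts must be positive")
    events: Dict[str, List[int]] = {"log": [], "save": []}
    for step in range(1, n_steps + 1):
        # One optimisation step on a batch of seg-length segments would run here.
        if step % log_steps == 0:
            events["log"].append(step)
        if step % save_steps == 0:
            events["save"].append(step)
    return events
```

With n_steps=10, save_steps=5, and log_steps=2, this records logging at steps 2, 4, 6, 8, and 10 and saving at steps 5 and 10, so the final step both logs and saves.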

Inference

python inference.py <model_path> <vocoder_path> <source> <target> <output>
  • model_path: The path of the model file.
  • vocoder_path: The path of the vocoder file.
  • source: The utterance providing linguistic content.
  • target: The utterance providing target speaker timbre.
  • output: The path where the converted utterance is saved.

Reference

Please cite the paper if you find it useful.

@InProceedings{pmlr-v97-qian19c,
  title = {{A}uto{VC}: Zero-Shot Voice Style Transfer with Only Autoencoder Loss},
  author = {Qian, Kaizhi and Zhang, Yang and Chang, Shiyu and Yang, Xuesong and Hasegawa-Johnson, Mark},
  pages = {5210--5219},
  year = {2019},
  editor = {Kamalika Chaudhuri and Ruslan Salakhutdinov},
  volume = {97},
  series = {Proceedings of Machine Learning Research},
  address = {Long Beach, California, USA},
  month = {09--15 Jun},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v97/qian19c/qian19c.pdf},
  url = {http://proceedings.mlr.press/v97/qian19c.html}
}

autovc's People

Contributors

auspicious3000, barbany, cyhuang-tw


autovc's Issues

Some questions about the pre-trained model

Hi, thanks for your very valuable work.
Can you tell me what data you used to train AutoVC? Is it all of VCTK,
or did you split the dataset in some way?

About source and target files

Hi, I was trying to use the inference.py script with the pre-trained models mentioned in the README. Are source and target meant to be WAV files? I'm trying this on Windows 10 with Anaconda and get lots of errors like:

TypeError: Invalid file: WindowsPath('source.wav')
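This error message suggests that the audio loader received a pathlib object it does not accept. A common workaround is to convert the path to a plain string before loading the audio. The helper below is a hypothetical sketch of that workaround, not code from inference.py; whether it resolves this particular report depends on the installed audio library version.

```python
from pathlib import Path
from typing import Union


def to_audio_path(path: Union[str, Path]) -> str:
    """Return the path as a plain string (hypothetical workaround).

    Some audio loaders accept only str paths, so passing a WindowsPath or
    PosixPath can raise "Invalid file: ..."; str() sidesteps that.
    """
    # expanduser() also handles "~" in user-supplied paths.
    return str(Path(path).expanduser())
```

Calling to_audio_path(Path("source.wav")) returns the string "source.wav", which can then be passed to the loader in place of the Path object.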

The synthesized WAV sounds bad

Hello, thank you very much for sharing this.

I tested it with your pre-trained model and it didn't work very well. Even when both inputs are the same audio, the result is still poor. Have you encountered this problem?
