TransferLearning-CLVC

An implementation of "Transfer Learning from Monolingual ASR to Transcription-free Cross-lingual Voice Conversion."

We provide our pretrained monolingual uni-directional acoustic model (in the ppg/ directory) and speaker encoder (in the spk_embedder/ directory) for reproducing our multispeaker VC model. They may not produce the best possible results, but they are good enough.

All the VC data come from the Voice Conversion Challenge 2020, and all the generated speech was submitted to the challenge for listening evaluation, covering both the intra-lingual and cross-lingual VC tasks.

Audio samples of our best model can be found here. For more details, please refer to our paper.

Requirements

  • python 3.6
  • pytorch 1.1
  • librosa
  • h5py
  • scipy
  • tensorboardX
  • apex

Preprocessing

  1. Clone this repository.
  2. Access the data from VCC 2020. The "vcc2020_training" folder should contain 14 speakers, and the "vcc2020_evaluation" folder should contain 4 source speakers.
  3. Prepare training data for the WaveGlow vocoder.
python prepare_h5.py --mode 0 -vcc "path_to_vcc2020_training" 

This generates an h5 file that concatenates all the speech of each speaker.
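As a hedged sketch of what such a file can look like, the snippet below builds and reads a tiny h5 file with one dataset per speaker holding that speaker's concatenated waveform. The speaker key ("SEF1") and the dataset layout are assumptions for illustration; check prepare_h5.py for the actual scheme.

```python
import h5py
import numpy as np

# Assumed layout: one dataset per speaker holding the speaker's
# concatenated waveform (the real keys may differ; see prepare_h5.py).
utterances = [np.zeros(16000, dtype=np.float32),
              np.zeros(8000, dtype=np.float32)]

with h5py.File("demo_speakers.h5", "w") as f:
    f.create_dataset("SEF1", data=np.concatenate(utterances))

with h5py.File("demo_speakers.h5", "r") as f:
    speakers = list(f.keys())
    total_len = f["SEF1"].shape[0]

print(speakers, total_len)  # ['SEF1'] 24000
```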

  4. Prepare training data for the conversion model.
python prepare_h5.py --mode 1

This converts the speech into input features, d-vectors, and mel-spectrograms.

Training the WaveGlow vocoder

  1. (Optional) Modify config_24k.json to adjust hyperparameters.
  2. Run the training script:
python train.py -c config_24k.json

Training takes a few days, so please be patient.

Training the conversion model

  1. Modify common/hparams_spk.py to set your desired checkpoint directory and hyperparameters. Note that "n_symbols" can only be 72 or 514, depending on which input feature you use.

  2. Run the training script:

python train_ppg2mel_spk.py

Training takes a few days; we stopped somewhere between the 30k and 50k checkpoints.
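Since a wrong "n_symbols" in common/hparams_spk.py is an easy mistake, a small guard like the one below can fail fast. The function name and the error message are our own illustrative additions; the two valid sizes (72 and 514) come from the README.

```python
# Valid PPG feature dimensionalities per the README; which input
# feature maps to which size is not restated here.
VALID_N_SYMBOLS = (72, 514)

def check_n_symbols(n_symbols):
    """Hypothetical sanity check before launching training."""
    if n_symbols not in VALID_N_SYMBOLS:
        raise ValueError(
            f"n_symbols must be one of {VALID_N_SYMBOLS}, got {n_symbols}")
    return n_symbols

print(check_n_symbols(72))  # 72
```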

Testing

  1. Run the testing script:
python convert_speech_vcc.py -vcc "path_to_vcc2020_evaluation" -ch "checkpoint_of_conversion_model" -m "ppg_model_you_used" -wg "waveglow_checkpoint" -o "vcc2020_evaluation/output_directory/"

Converted wav files are written to the output directory with names of the form "target_source_wavname.wav".
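For scripted evaluation it can help to split such a file name back into its parts. A minimal sketch, assuming neither the target nor the source speaker ID contains an underscore (the example IDs below are illustrative):

```python
def parse_converted_name(filename):
    """Split 'target_source_wavname.wav' into (target, source, wavname).

    Assumes speaker IDs contain no underscores; the utterance name
    itself may contain them.
    """
    stem = filename[:-len(".wav")]
    target, source, wavname = stem.split("_", 2)
    return target, source, wavname

print(parse_converted_name("TEF1_SEF1_E10001.wav"))
# ('TEF1', 'SEF1', 'E10001')
```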

Reference

  1. guanlongzhao's fac-via-ppg
  2. NVIDIA's WaveGlow and Tacotron 2
  3. PyTorch's audio (torchaudio)
