

Contrastive Voice Conversion (CVC)




This implementation is based on CUT; thanks to Taesung and Junyan for sharing their code.

We provide a PyTorch implementation of non-parallel voice conversion based on patch-wise contrastive learning and adversarial learning. Compared to the baseline CycleGAN-VC, CVC requires only one-way GAN training for non-parallel one-to-one voice conversion, while improving speech quality and reducing training time.
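To make the patch-wise contrastive objective concrete, below is a minimal PyTorch sketch of a PatchNCE-style loss. The shapes, the temperature value, and the function name are illustrative assumptions, not the exact code in this repo.

import torch
import torch.nn.functional as F

def patch_nce_loss(feat_src, feat_tgt, temperature=0.07):
    # feat_src, feat_tgt: (num_patches, dim) L2-normalized features sampled
    # from corresponding positions of the input and converted mel-spectrograms.
    # Each source patch should match the target patch at the same position
    # (positive pair) and mismatch every other position (negatives).
    logits = feat_src @ feat_tgt.t() / temperature   # (N, N) similarity matrix
    labels = torch.arange(feat_src.size(0))          # positives lie on the diagonal
    return F.cross_entropy(logits, labels)

# toy usage with random normalized features
src = F.normalize(torch.randn(64, 256), dim=1)
tgt = F.normalize(torch.randn(64, 256), dim=1)
print(patch_nce_loss(src, tgt))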

Prerequisites

  • Linux or macOS
  • Python 3
  • CPU or NVIDIA GPU + CUDA CuDNN

Kick Start

  • Clone this repo:
git clone https://github.com/Tinglok/CVC
cd CVC
  • Install PyTorch 1.6 and other dependencies.

    For pip users, please type the command pip install -r requirements.txt.

    For Conda users, you can create a new Conda environment using conda env create -f environment.yaml.

  • Download the pre-trained Parallel WaveGAN vocoder to ./checkpoints/vocoder (a loading sketch follows this list).
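Once downloaded, the vocoder converts mel-spectrograms back into waveforms. A minimal loading sketch, assuming the kan-bayashi/ParallelWaveGAN package is installed; the checkpoint filename matches the one referenced in the issues below, but treat the exact paths as assumptions:

import torch
from parallel_wavegan.utils import load_model

# load_model reads config.yml from the checkpoint directory by default;
# the path below is an assumption -- point it at the file you downloaded
vocoder = load_model("./checkpoints/vocoder/checkpoint-1000000steps.pkl")
vocoder.remove_weight_norm()
vocoder.eval()

with torch.no_grad():
    mel = torch.randn(100, 80)      # dummy (frames, mel_bins) mel-spectrogram
    wav = vocoder.inference(mel)    # synthesized waveform tensor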

CVC Training and Test

  • Download the VCTK dataset:
cd dataset
wget http://datashare.is.ed.ac.uk/download/DS_10283_2651.zip
unzip DS_10283_2651.zip
unzip VCTK-Corpus.zip
cp -r ./VCTK-Corpus/wav48/p270 ./voice/trainA
cp -r ./VCTK-Corpus/wav48/p256 ./voice/trainB

where p270 and p256 can be replaced by any two speaker folders (for one-to-one conversion, trainA holds the source speaker and trainB the target speaker).

  • Train the CVC model:
python train.py --dataroot ./datasets/voice --name CVC

The checkpoints will be stored at ./checkpoints/CVC/.

  • Test the CVC model:
python test.py --dataroot ./datasets/voice --validation_A_dir ./datasets/voice/trainA --output_A_dir ./checkpoints/CVC/converted_sound

The converted utterance will be saved at ./checkpoints/CVC/converted_sound.
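To sanity-check a converted file, you can load it with the soundfile package (the filename below is hypothetical; use any file the test script wrote):

import soundfile as sf

wav, sr = sf.read("./checkpoints/CVC/converted_sound/converted.wav")  # hypothetical filename
print(f"{len(wav) / sr:.2f} s at {sr} Hz")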

Baseline CycleGAN-VC Training and Test

  • Train the CycleGAN-VC model:
python train.py --dataroot ./datasets/voice --name CycleGAN --model cycle_gan
  • Test the CycleGAN-VC model:
python test.py --dataroot ./datasets/voice --validation_A_dir ./datasets/voice/trainA --output_A_dir ./checkpoints/CycleGAN/converted_sound --model cycle_gan

The converted utterance will be saved at ./checkpoints/CycleGAN/converted_sound.

Pre-trained CVC Model

Pre-trained models for p270-to-p256 and many-to-p249 conversion are available at this URL.

TensorBoard Visualization

To view loss plots, run tensorboard --logdir=./checkpoints and open http://localhost:6006/ in your browser.

Citation

If you use this code for your research, please cite our paper.

@inproceedings{li2021cvc,
  author={Tingle Li and Yichen Liu and Chenxu Hu and Hang Zhao},
  title={{CVC: Contrastive Learning for Non-Parallel Voice Conversion}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={1324--1328}
}


Issues

Could you upload the CycleGAN-VC3 baseline's code?

The CycleGAN baseline in CVC is the baseline from CUT. Do you apply any feature extraction to the waveforms, such as MCEP or filterbank features? Could you upload the CycleGAN-VC3 baseline's code? Thanks for your help.

Also, when I ran the CycleGAN in CVC on the VCC2020 database, I got these errors:

"UserWarning: Using a target size (torch.Size([1, 80, 135])) that is different to the input size (torch.Size([1, 80, 136])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.

The size of tensor a (136) must match the size of tensor b (135) at non-singleton dimension 2"

I found that the feature tensors of individual waveforms have different sizes. Did you encounter this kind of error when running on VCTK? Thank you so much.
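A generic workaround for this kind of off-by-one frame mismatch is to trim both tensors to the shorter frame count before computing the loss; this is a sketch, not the repo's official fix:

import torch

def match_length(a, b):
    # trim two (batch, mel_bins, frames) tensors to the shorter frame count
    frames = min(a.size(-1), b.size(-1))
    return a[..., :frames], b[..., :frames]

a, b = match_length(torch.randn(1, 80, 136), torch.randn(1, 80, 135))
print(a.shape, b.shape)   # both torch.Size([1, 80, 135])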

error "there should be a subclass of BaseModel with class name that...."

Hey, thank you for the implementation!
I'm getting an error on line 45 of models/__init__.py when starting CVC training as described in the README:
"In models.cvc_model.py, there should be a subclass of BaseModel with class name that matches cvcmodel in lowercase."
Also, could you please specify the Parallel WaveGAN config you used? The one in the parameters is hard-coded to your folder.
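For context, this error comes from the dynamic model lookup used in CUT-style codebases: models/cvc_model.py must define a BaseModel subclass whose lowercased class name is exactly cvcmodel. A minimal sketch of such a file, inferred from the error text (details may differ from the actual repo):

# models/cvc_model.py -- BaseModel is assumed to live in models/base_model.py,
# as in the CUT codebase this repo is based on
from .base_model import BaseModel

class CVCModel(BaseModel):
    # the loader lowercases the class name and expects it to equal "cvcmodel"
    pass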

Testing error

Hi, I tried to run CVC testing and it showed this error message:

FileNotFoundError: [Errno 2] No such file or directory: './checkpoints/vocoder/checkpoint-1000000steps.pkl'

Is there some setting I missed? I can't find the checkpoint-1000000steps.pkl file after training; there is no .pkl file, just .pth files. Thanks for your help.

Evaluation of CycleGAN-VC3 in your paper

Hi,
Thank you for sharing your work!
I wonder whether you trained the ResNet 9-blocks generator for the CycleGAN-VC3 evaluation in your paper, since when I followed your command instructions for training CycleGAN-VC3, it used the same "netG" as the CVC framework.

About Python and CUDA versions

Hi, may I ask which Python and CUDA versions are required for this project? When I tried to install it, I got:

The NVIDIA driver on your system is too old (found version 10010).
Please update your GPU driver by downloading and installing a new
version from the URL: http://www.nvidia.com/Download/index.aspx
Alternatively, go to: https://pytorch.org to install
a PyTorch version that has been compiled with your version
of the CUDA driver.

I'm running an RTX 2080 with CUDA 10.1 on Ubuntu 18.04. Thanks.
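The message means the installed PyTorch wheel was built against a newer CUDA version than the driver supports. A quick way to check what your build expects:

import torch

print("torch:", torch.__version__)            # installed PyTorch version
print("built for CUDA:", torch.version.cuda)  # CUDA version the wheel targets
print("GPU available:", torch.cuda.is_available())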

Many-to-One implementation

Hi, the training code provided seems to be for one-to-one conversion, whereas the associated paper suggests the model should also be capable of many-to-one conversion. Is any change required to train the model for many-to-one conversion on the VCTK dataset?
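One plausible data layout for many-to-one training, assuming the same folder convention as the one-to-one setup above (a hypothetical sketch, not confirmed by the authors), is to place every source speaker in trainA and only the target speaker in trainB:

import shutil
from pathlib import Path

vctk = Path("./VCTK-Corpus/wav48")
# p249 matches the pre-trained many-to-p249 model mentioned above
for spk in vctk.glob("p*"):
    if spk.name != "p249":
        shutil.copytree(spk, Path("./voice/trainA") / spk.name, dirs_exist_ok=True)
shutil.copytree(vctk / "p249", Path("./voice/trainB/p249"), dirs_exist_ok=True)  # Python 3.8+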

About training time

Hi, thanks for implementing this in PyTorch.
In the paper, CVC's training time was 518 minutes (1000 epochs), but when I ran the code, it took an hour per epoch.

I think the size of the dataset is the problem: when I prepared the dataset,
I copied all of the speakers in VCTK to ./voice/trainA and ./voice/trainB.

Is it right to use all speakers in VCTK for training, or should I just sample two speakers A and B?

Thanks!
