
autopst's Introduction

Global Prosody Style Transfer Without Text Transcriptions

This repository provides a PyTorch implementation of AutoPST, which enables unsupervised global prosody conversion without text transcriptions.

A short video explaining the main concepts of our work is available. If you find this work useful and use it in your research, please consider citing our paper.


@InProceedings{pmlr-v139-qian21b,
  title     = {Global Prosody Style Transfer Without Text Transcriptions},
  author    = {Qian, Kaizhi and Zhang, Yang and Chang, Shiyu and Xiong, Jinjun and Gan, Chuang and Cox, David and Hasegawa-Johnson, Mark},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {8650--8660},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  url       = {http://proceedings.mlr.press/v139/qian21b.html}
}

Audio Demo

The audio demo for AutoPST can be found here.

Dependencies

  • Python 3.6
  • NumPy
  • SciPy
  • PyTorch == v1.6.0
  • librosa
  • pysptk
  • soundfile
  • wavenet_vocoder: pip install wavenet_vocoder==0.1.1 (for more information, please refer to https://github.com/r9y9/wavenet_vocoder)

To Run Demo

  1. Download the pre-trained models to assets.

  2. Download the same WaveNet vocoder model as in AutoVC to assets. If you have any problems with the vocoder part, please refer to AutoVC, because the two projects share the same vocoder scripts.

  3. Run demo.ipynb.

A fast, high-quality pre-trained HiFi-GAN v1 model (https://github.com/jik876/hifi-gan) is now also available here.

To Train

Download the training data to assets. The provided training data is very small and intended for code verification only. Please use the scripts to prepare your own data for training.

  1. Prepare training data: python prepare_train_data.py

  2. Train 1st Stage: python main_1.py

  3. Train 2nd Stage: python main_2.py

Final Words

This project is part of ongoing research. We hope this repo is useful for your research. If you need any help or have any suggestions for improving the framework, please raise an issue and we will do our best to get back to you as soon as possible.

autopst's People

Contributors

auspicious3000


autopst's Issues

Missing basic execution steps for a different set of speakers.

Hi there,
I am trying to follow the code with my own dataset, and I was able to run main_1.py and main_2.py to get the xxx-A.ckpt and xxx-B.ckpt files.
Now I don't understand how to run the demo file, in particular how to prepare the dictionary of specific speakers to create and convert. Any help, with a little more direction on the steps to follow, is appreciated.

KeyError when running prepare_train_data.py

Hi, I got the following error when running prepare_train_data.py.
Does spk2emb have a vctk16-train-wav key?

vctk16-train-wav
Traceback (most recent call last):
  File "prepare_train_data.py", line 52, in
    submeta.append(spk2emb[subdir])
KeyError: 'vctk16-train-wav'
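The traceback suggests that prepare_train_data.py treats each immediate subdirectory of its root directory as a speaker and looks that name up in spk2emb, so seeing 'vctk16-train-wav' as the lookup key hints that the root directory may point one level above the speaker folders. Below is a hedged pre-flight check, assuming spk2emb is a dict keyed by speaker directory name; the helper collect_speaker_dirs is hypothetical and not part of the repository.

```python
import os


def collect_speaker_dirs(root_dir, spk2emb):
    """Split the subdirectories of root_dir into names found / not found in spk2emb.

    Assumption: every immediate subdirectory of root_dir is one speaker and
    spk2emb is keyed by that directory name (hypothetical helper).
    """
    known, unknown = [], []
    for name in sorted(os.listdir(root_dir)):
        if not os.path.isdir(os.path.join(root_dir, name)):
            continue  # skip stray files such as README or .txt notes
        (known if name in spk2emb else unknown).append(name)
    return known, unknown
```

If the unknown list contains the dataset folder itself rather than speaker IDs such as p225, pointing the script at that folder is the first thing to try.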

ModuleNotFoundError: No module named 'onmt'

Hi,
I ran into an error about onmt:


ModuleNotFoundError Traceback (most recent call last)
in
5 import torch.nn.functional as F
6 from collections import OrderedDict
----> 7 from onmt.utils.misc import sequence_mask
8 from model_autopst import Generator_2 as Predictor
9 from hparams_autopst import hparams

ModuleNotFoundError: No module named 'onmt'

but I can see that the onmt_modules folder exists.

Then I installed onmt (pip install onmt) and noticed that it installs torch 1.3.0, although the requirements say PyTorch == v1.6.0.

Could you help me with this issue? What is the best approach to solve this?
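In the traceback, line 7 of that cell imports only sequence_mask from onmt. In OpenNMT-py, sequence_mask(lengths, max_len) returns a boolean [batch, max_len] mask that is True at every position smaller than that sequence's length. If installing onmt pulls in an incompatible PyTorch, one possible workaround, assuming nothing else from onmt is needed, is a local helper with the same semantics. The sketch shows the logic in NumPy for clarity; a PyTorch version follows the same pattern, torch.arange(max_len, device=lengths.device)[None, :] < lengths[:, None].

```python
import numpy as np


def sequence_mask(lengths, max_len=None):
    """Boolean mask [batch, max_len]: True where position < length.

    NumPy sketch of the semantics of OpenNMT-py's sequence_mask; not the
    repository's own code.
    """
    lengths = np.asarray(lengths)
    if max_len is None:
        max_len = int(lengths.max())
    positions = np.arange(max_len)
    # Broadcast [1, max_len] against [batch, 1] to compare every position with every length
    return positions[None, :] < lengths[:, None]
```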

How to test AutoPST in another language?

I have not succeeded in testing the final (vocoder) section of the AutoPST code. In a conda environment, with all dependencies installed, an error like "from synthesis cannot import build_model" always appears.
I want to ask: do I need to train on the AutoVC speakers? I want to use my own recordings, in another language, not English.

I just want to clone the voice of one recording onto another, for different speakers.
Do these recordings need to be the same length, with the same sentences spoken, to be compared during training?

BTW, I have an RTX 3060, and this card is not supported by PyTorch 1.6.0. I first installed the onmt Python package, then PyTorch 1.7.0 with CUDA 11.
Thank you

How to train the SEA model

The pretrained model sea.ckpt only fits a dataset with 82 speakers. However, I have a huge dataset with at least 300 speakers. How could I train a corresponding SEA model?

Inference with new input audio

Hi, and thank you for this amazing project!

I was trying to create a Colab notebook that lets me input an audio file, select a speaker, and produce the corresponding output.

Here is the code. It works, but I am missing the part on how to change the speaker timbre.
Do you have any tips on that?

Thanks a lot in advance!

Issue with stop prediction for longer utterances.

Hi @auspicious3000,

First, thanks for releasing this repository! I've been trying to compare AutoPST to some upcoming work, but I'm having an issue with the stop token prediction when converting utterances longer than 1 or 2 seconds. I noticed that you clipped some of the VCTK files for your demo page (and in the test dictionary you provided) so that they're much shorter. How did you use the test utterances in your evaluations? Do you have any recommendations so that I can make as fair a comparison as possible?

Thanks,
Benjamin
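One possible workaround, not the authors' evaluation protocol, is to convert a long utterance in shorter segments and join the outputs along time. The sketch assumes features are a NumPy array of shape [T, D] (frames by feature dimension) and that convert_fn is a hypothetical wrapper around the model's conversion of one short utterance. Fixed-length splitting can cut through words, so splitting at pauses would likely give smoother joins.

```python
import numpy as np


def split_into_chunks(feats, max_frames):
    """Split a [T, D] feature array into consecutive chunks of at most max_frames rows."""
    if max_frames <= 0:
        raise ValueError("max_frames must be positive")
    return [feats[start:start + max_frames] for start in range(0, len(feats), max_frames)]


def convert_in_chunks(feats, convert_fn, max_frames):
    """Apply convert_fn to each chunk and concatenate the results along time.

    convert_fn is hypothetical: it takes a [t, D] array and returns a [t', D]
    array, where t' may differ from t (durations can change during conversion).
    """
    outputs = [convert_fn(chunk) for chunk in split_into_chunks(feats, max_frames)]
    return np.concatenate(outputs, axis=0)


# Example: 10 frames of 80-dim features split into chunks of at most 4 frames
feats = np.zeros((10, 80))
print([c.shape[0] for c in split_into_chunks(feats, 4)])  # [4, 4, 2]
```

The chunk length is a trade-off: shorter chunks keep each segment in the duration range the demo examples use, but every boundary is a potential audible join.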

test_vctk.meta

Hi,

I am wondering how "test_vctk.meta", which is used in the demo file, is created.

Thanks!

How to solve a SEA model problem

Hello.
I have read your paper.
Following your experiment, I conducted an accent conversion experiment using English accent data from different countries.
But the result is very unsatisfactory; I can't even hear the converted voice clearly.

I think there may be a problem in how I trained the SEA model,
but I don't know exactly where the problem is.

The attached images show my code for training SEA.
Could you help me with this issue? What is the best approach to solve this?

Unable to reproduce results

@auspicious3000 We tried to reproduce the results using your codebase and the dataset you use (https://datashare.ed.ac.uk/handle/10283/3443), but unfortunately we were unable to. The outputs we have so far are extremely noisy, even when the source and target speakers are the same. Could you please share working code that might help us reproduce the results? I would greatly appreciate your input!
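One thing worth ruling out when outputs are noisy even for same-speaker conversion is a sample-rate mismatch. The VCTK release at that link is distributed as 48 kHz audio, while the vctk16 directory name in the prepare_train_data.py issue above and the AutoVC WaveNet vocoder both appear to assume 16 kHz; treat that as an assumption to verify against the repository's hparams. A minimal polyphase resampling helper (resample_to is a hypothetical name) using scipy.signal.resample_poly:

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly


def resample_to(wav, orig_sr, target_sr=16000):
    """Resample a 1-D signal from orig_sr to target_sr with a polyphase filter.

    Assumption: the pipeline expects 16 kHz input; check hparams before relying on it.
    """
    if orig_sr == target_sr:
        return wav
    g = gcd(orig_sr, target_sr)
    # e.g. 48000 -> 16000 gives up=1, down=3
    return resample_poly(wav, target_sr // g, orig_sr // g)
```

Audio read with soundfile.read can be passed through this helper before feature extraction, so that every file enters the pipeline at the same rate.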

How to compute the mean and std of MFCC?

The mean and std I computed are different from the values in the mfcc_stats.pkl you provided.

Can you please check whether I am doing something wrong?

I have attached a simple snippet below.

Thanks.


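import pickle

import librosa
import numpy as np
import scipy.fftpack
import soundfile as sf
from tqdm import tqdm

# wav_path: list of training .wav file paths (defined elsewhere)
mfcc_list = list()
for path in tqdm(wav_path):
    wav, sampling_rate = sf.read(path)
    # librosa builds its own log-power mel spectrogram internally before the DCT
    mfcc = librosa.feature.mfcc(y=wav, sr=sampling_rate, n_mfcc=80, n_fft=1024, hop_length=256)  # [80, T]
    mfcc_list.append(mfcc)

# Pool the frames of all files, then take per-coefficient statistics
mfcc_list = np.concatenate(mfcc_list, axis=1)  # [80, T]
mfcc_mean = mfcc_list.mean(axis=1)  # [80]
mfcc_std = mfcc_list.std(axis=1)  # [80]

# Orthonormal DCT-II matrix
dctmx = scipy.fftpack.dct(np.eye(80), type=2, axis=1, norm='ortho')  # [80, 80]

with open('assets/mfcc_stats.pkl', 'wb') as f:
    pickle.dump([mfcc_mean, mfcc_std, dctmx], f, pickle.HIGHEST_PROTOCOL)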
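One likely source of the mismatch: librosa.feature.mfcc first computes its own log-power mel spectrogram (128 mel bands by default, converted with power_to_db) and keeps the first n_mfcc DCT coefficients, so its output need not match cepstra computed from a different mel front end. If, as an assumption here, the repository's cepstra come from multiplying 80-band log-mel frames by dctmx, the statistics should be computed on exactly those features. The sketch below (helper names are hypothetical) shows that the dctmx built in the snippet above is an orthonormal DCT-II matrix and that frames @ dctmx is a row-wise DCT-II. Note that it takes frames as rows, [T, 80], whereas librosa returns [80, T].

```python
import numpy as np
import scipy.fftpack

N_MELS = 80

# Same construction as dctmx above; right-multiplying a row-vector frame applies DCT-II
dctmx = scipy.fftpack.dct(np.eye(N_MELS), type=2, axis=1, norm='ortho')  # [80, 80]


def mfcc_from_log_mel(log_mel_frames):
    """Hypothetical helper: map log-mel frames [T, 80] to cepstra [T, 80] via dctmx."""
    return log_mel_frames @ dctmx


def cepstral_stats(cepstra):
    """Per-coefficient mean and std over all frames of a [T, 80] array."""
    return cepstra.mean(axis=0), cepstra.std(axis=0)
```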

How do I solve the "repeats has to be Long tensor" problem?

Sorry, I'm not familiar with English grammar; please forgive me if anything I write comes across as rude.
I wanted to try running this GitHub project, but it failed.
The only part of the code I changed is the following (because I don't have a GPU):

(prepare_train_data.py)

device = 'cuda:0' (changed to the following line)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

Run: python main_1.py
Problem description: RuntimeError: repeats has to be Long tensor

Could I ask for help?
I will be grateful for any help you can provide.
