
qppwg's People

Contributors

bigpon


qppwg's Issues

Why is the synthesized speech not better than WORLD?

Hello, Yi-Chiao Wu!
I would appreciate it if you could read this issue and give me some feedback.

I used WORLD and your QPPWGaf_20 model (checkpoint 400000) as vocoders to synthesize speech from my own speech files (from the LJSpeech-1.1 corpus).
Following the README, the process is Speech → Extract features → Synthesize → Speech.

However, the output of QPPWGaf_20 is neither better nor worse than the output of WORLD.

Is this simply because I did not use the VCC corpus as input, or are there other reasons?

pysptk setup question / discussion

I tried to set up qppwg in a Python venv.
During installation, pysptk seems to require the Windows C build tools.
Which environment variables did you use for the Windows C build tools?
I have a local copy of those build tools, but I have not set the environment variables, so I would like to use yours as a reference.
Side update: the Java port does not seem to be going too well, but I will try my best.

How to use this model for Speech Synthesis

As I understand from the paper and the provided code, F0 features are extracted from the ground-truth audio file. However, for the task of speech synthesis it is impossible to obtain such features, since the acoustic model only outputs a mel-spectrogram. So this vocoder is only applicable to cases where an input audio file exists, e.g. the voice conversion task. Am I right, or do I misunderstand this completely?

Support PyTorch >=1.7

When using PyTorch >=1.7, the following error occurs at the beginning of training:

RuntimeError: stft input and window must be on the same device but got self on cuda:0 and window on cpu

A solution is kan-bayashi/ParallelWaveGAN#225. Alternatively, this can be avoided simply by using PyTorch <1.7, so this is mostly a note for future users.
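A minimal sketch of the fix, assuming the error comes from a `torch.stft` call whose Hann window is created on the CPU: build the window on the input's device instead.

```python
import torch

def stft_magnitude(x: torch.Tensor, fft_size: int = 1024,
                   hop_size: int = 256, win_length: int = 1024) -> torch.Tensor:
    """STFT magnitude with the window created on the input's device.

    On PyTorch >= 1.7, passing a CPU window for a CUDA input raises
    "stft input and window must be on the same device".
    """
    window = torch.hann_window(win_length, device=x.device)  # the key line
    spec = torch.stft(x, fft_size, hop_size, win_length,
                      window=window, return_complex=True)
    return spec.abs()

x = torch.randn(2, 16000)  # (batch, samples); works the same after .cuda()
mag = stft_magnitude(x)    # (batch, fft_size // 2 + 1, frames)
```

Registering the window as a buffer of the loss module (so `.to(device)` moves it along with the model) is the other common variant of this fix.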

Questions about implementations

First of all, thank you for creating the videos and articles explaining the details.
They are very useful as references.

However, after reading and watching for a while, I still cannot confirm some details.
1) Is the acoustic feature input to the generator 1-D or 2-D?
What is it: a mel-spectrogram extracted from natural speech, or a text-to-mel-spectrogram (from another framework)?
I noticed notes such as "Conditioned on 1×F0".
From what I have seen it looks like a processed mel-spectrogram, but I cannot confirm.

2) How is the pitch-dependent dilation factor calculated?
From the video and paper I see the explanations and derivations.
It comes from the dilated CNN: there is an equation for it, and the definition of d is changed to a variable computed at run time,
d' = d × E_t (here d = 1). How is E_t, the pitch-dependent dilation factor, calculated?
I think you mentioned it has some properties related to the wave frequency and periodicity, but I cannot visualize it.
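For what it's worth, the QPPWG paper defines the pitch-dependent dilated factor as E_t = F_s / (a · F_{0,t}), where F_s is the sampling rate, F_{0,t} the frame-level F0, and a a hand-tuned "dense factor" (roughly, taps per pitch cycle); each layer's dilation then becomes d'_t = E_t × d. A small sketch of that computation (the unvoiced fallback to E_t = 1 is an assumption, a common convention rather than something confirmed here):

```python
import numpy as np

def pitch_dependent_dilation(f0: np.ndarray, fs: int = 16000,
                             dense_factor: float = 4.0,
                             d: int = 1) -> np.ndarray:
    """Per-frame dilation d'_t = E_t * d with E_t = fs / (dense_factor * f0_t).

    Unvoiced frames (f0 == 0) fall back to E_t = 1 (assumed convention).
    Values are rounded to integers because dilations index past samples.
    """
    e_t = np.ones_like(f0, dtype=np.float64)
    voiced = f0 > 0
    e_t[voiced] = fs / (dense_factor * f0[voiced])
    return np.round(e_t * d).astype(int)

f0 = np.array([0.0, 100.0, 200.0, 400.0])  # Hz, 0 = unvoiced
print(pitch_dependent_dilation(f0))        # [ 1 40 20 10]
```

Intuitively, a lower F0 means a longer pitch period in samples, so the dilation grows to keep the receptive field spanning a comparable fraction of the cycle.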

Some details about inherited/reused components.
I know you specifically mentioned most, if not all, of the changes you made.
However, I am uncertain about some details, so I am asking ahead of time to avoid failing.

1) The residual block: from the paper diagram, is it also quasi-periodic, i.e. adaptive/fixed?
Or is it an unmodified copy from Parallel WaveGAN (PWG)?
How does the residual block affect the generator?

2) Are the generated speech and the discriminator exactly the same as in PWG? Just to confirm.
If they are the same, I will find PWG implementations and study them.
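On question 1, a hedged sketch of the WaveNet-style residual block that Parallel WaveGAN uses (dilated conv → gated tanh/sigmoid activation with an auxiliary-feature conditioning term → 1×1 convs for the residual and skip paths). Layer sizes here are illustrative, not the repo's exact hyperparameters:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """WaveNet-style residual block as used in Parallel WaveGAN (sketch).

    QPPWG's "adaptive" blocks replace the fixed dilation below with a
    pitch-dependent one; this fixed version matches the PWG-style block.
    """

    def __init__(self, channels=64, aux_channels=80, kernel_size=3, dilation=1):
        super().__init__()
        pad = (kernel_size - 1) // 2 * dilation
        # 2*channels: one half feeds the tanh gate, the other the sigmoid gate.
        self.conv = nn.Conv1d(channels, 2 * channels, kernel_size,
                              padding=pad, dilation=dilation)
        self.aux = nn.Conv1d(aux_channels, 2 * channels, 1)  # conditioning
        self.res = nn.Conv1d(channels, channels, 1)          # residual path
        self.skip = nn.Conv1d(channels, channels, 1)         # skip path

    def forward(self, x, c):
        h = self.conv(x) + self.aux(c)          # add conditioning features
        a, b = h.chunk(2, dim=1)
        h = torch.tanh(a) * torch.sigmoid(b)    # gated activation
        return self.res(h) + x, self.skip(h)    # residual out, skip out

block = ResidualBlock()
x = torch.randn(1, 64, 100)  # (batch, channels, time)
c = torch.randn(1, 80, 100)  # upsampled auxiliary features
out, skip = block(x, c)
```

The residual path keeps gradients flowing through the deep stack, while the summed skip outputs form the generator's final feature before the output convolutions.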

After all these questions you may be curious why I need to know this instead of just cloning the repo and setting it up.
I decided to make my version self-contained and not dependent on Python libraries, so I am porting it to Java or JavaCPP, which sadly means I need to implement practically everything myself except perhaps FFTW and the matrix calculations.
Any suggestions are greatly appreciated, as is the time taken to reply. I hope you have a nice day.
