
qppwg's People

Contributors

bigpon


qppwg's Issues

Why is the synthesized speech not better than WORLD?

Hello, Yi-Chiao Wu!
I would appreciate it if you could read this issue and give me some feedback.

I used WORLD and your QPPWGaf_20 model (checkpoint 400000) as vocoders to synthesize speech from my own speech files (from the LJSpeech-1.1 corpus).
Following the README, the process is Speech → Extract features → Synthesize → Speech.

However, the output of QPPWGaf_20 is neither better nor worse than the output of WORLD.

Is this simply because I did not use the VCC corpus as input, or are there other reasons?

pysptk setup question / discussion

I tried to set up qppwg in a Python venv.
During installation, pysptk seems to require the Windows C build tools.
Which environment variables did you use for the Windows C build tools?
I have a local copy of those build tools, but I have not set the environment variables, so I would like to use yours as a reference.
Side update: the Java port does not seem to be going too well, but I will try my best.

How to use this model for Speech Synthesis

As I understand from the paper and the provided code, F0 features are extracted from the ground-truth audio file. However, for the task of speech synthesis it is impossible to obtain such features, since the acoustic model only outputs a mel-spectrogram. So this vocoder is only applicable to cases where an input audio file exists, e.g. the voice conversion task. Am I right, or do I misunderstand this completely?

Support PyTorch >=1.7

When using PyTorch >=1.7, the following error occurs at the beginning of training:

RuntimeError: stft input and window must be on the same device but got self on cuda:0 and window on cpu

A solution is kan-bayashi/ParallelWaveGAN#225. Alternatively, this can be avoided simply by using PyTorch <1.7, so this is mostly a note for future users.
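A minimal sketch of the fix, assuming the error comes from a `torch.stft` call whose Hann window is created on the CPU: build the window on the input's device instead.

```python
import torch

def stft_magnitude(x: torch.Tensor, fft_size: int = 1024,
                   hop_size: int = 256, win_length: int = 1024) -> torch.Tensor:
    """STFT magnitude with the window created on the input's device.

    On PyTorch >= 1.7, passing a CPU window for a CUDA input raises
    "stft input and window must be on the same device".
    """
    window = torch.hann_window(win_length, device=x.device)  # the key line
    spec = torch.stft(x, fft_size, hop_size, win_length,
                      window=window, return_complex=True)
    return spec.abs()

x = torch.randn(2, 16000)  # (batch, samples); works the same after .cuda()
mag = stft_magnitude(x)    # (batch, fft_size // 2 + 1, frames)
```

Registering the window as a buffer of the loss module (so `.to(device)` moves it along with the model) is the other common variant of this fix.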

Questions about implementations

First of all, thank you for creating the videos and articles explaining the details.
They are very useful as references.

However, after reading and watching for a while, I still cannot confirm some details.
1) Is the acoustic feature input to the generator 1-D or 2-D?
What is it: a mel-spectrogram extracted from natural speech, or a text-to-mel-spectrogram (from another framework)?
I noticed notes such as "Conditioned on 1×F0".
From what I have seen it looks like a processed mel-spectrogram, but I cannot confirm.

2) How is the pitch-dependent dilation factor calculated?
From the video and paper I see the explanations and derivations.
It comes from the dilated CNN: there is an equation for it, and the definition of d is changed to a variable computed at run time,
d' = d × E_t (here d = 1). How is E_t, the pitch-dependent dilation factor, calculated?
I think you mentioned it has some properties related to the wave frequency and periodicity, but I cannot visualize it.
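For what it's worth, the QPPWG paper defines the pitch-dependent dilated factor as E_t = F_s / (a · F_{0,t}), where F_s is the sampling rate, F_{0,t} the frame-level F0, and a a hand-tuned "dense factor" (roughly, taps per pitch cycle); each layer's dilation then becomes d'_t = E_t × d. A small sketch of that computation (the unvoiced fallback to E_t = 1 is an assumption, a common convention rather than something confirmed here):

```python
import numpy as np

def pitch_dependent_dilation(f0: np.ndarray, fs: int = 16000,
                             dense_factor: float = 4.0,
                             d: int = 1) -> np.ndarray:
    """Per-frame dilation d'_t = E_t * d with E_t = fs / (dense_factor * f0_t).

    Unvoiced frames (f0 == 0) fall back to E_t = 1 (assumed convention).
    Values are rounded to integers because dilations index past samples.
    """
    e_t = np.ones_like(f0, dtype=np.float64)
    voiced = f0 > 0
    e_t[voiced] = fs / (dense_factor * f0[voiced])
    return np.round(e_t * d).astype(int)

f0 = np.array([0.0, 100.0, 200.0, 400.0])  # Hz, 0 = unvoiced
print(pitch_dependent_dilation(f0))        # [ 1 40 20 10]
```

Intuitively, a lower F0 means a longer pitch period in samples, so the dilation grows to keep the receptive field spanning a comparable fraction of the cycle.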

Some details about inherited/reused components.
I know you specifically mentioned most, if not all, of the changes you made.
However, I am uncertain about some details, so I am asking ahead of time to avoid failing.

1) The residual block: from the paper diagram, is it also quasi-periodic, i.e. adaptive/fixed?
Or is it an unmodified copy from Parallel WaveGAN (PWG)?
How does the residual block affect the generator?

2) Are the generated speech and the discriminator exactly the same as in PWG? Just to confirm.
If they are the same, I will find PWG implementations and study them.
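On question 1, a hedged sketch of the WaveNet-style residual block that Parallel WaveGAN uses (dilated conv → gated tanh/sigmoid activation with an auxiliary-feature conditioning term → 1×1 convs for the residual and skip paths). Layer sizes here are illustrative, not the repo's exact hyperparameters:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """WaveNet-style residual block as used in Parallel WaveGAN (sketch).

    QPPWG's "adaptive" blocks replace the fixed dilation below with a
    pitch-dependent one; this fixed version matches the PWG-style block.
    """

    def __init__(self, channels=64, aux_channels=80, kernel_size=3, dilation=1):
        super().__init__()
        pad = (kernel_size - 1) // 2 * dilation
        # 2*channels: one half feeds the tanh gate, the other the sigmoid gate.
        self.conv = nn.Conv1d(channels, 2 * channels, kernel_size,
                              padding=pad, dilation=dilation)
        self.aux = nn.Conv1d(aux_channels, 2 * channels, 1)  # conditioning
        self.res = nn.Conv1d(channels, channels, 1)          # residual path
        self.skip = nn.Conv1d(channels, channels, 1)         # skip path

    def forward(self, x, c):
        h = self.conv(x) + self.aux(c)          # add conditioning features
        a, b = h.chunk(2, dim=1)
        h = torch.tanh(a) * torch.sigmoid(b)    # gated activation
        return self.res(h) + x, self.skip(h)    # residual out, skip out

block = ResidualBlock()
x = torch.randn(1, 64, 100)  # (batch, channels, time)
c = torch.randn(1, 80, 100)  # upsampled auxiliary features
out, skip = block(x, c)
```

The residual path keeps gradients flowing through the deep stack, while the summed skip outputs form the generator's final feature before the output convolutions.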

After all these questions you may be curious why I need to know this instead of just cloning the repo and setting it up.
I decided to make my version self-contained and not dependent on Python libraries, so I am porting it to Java or JavaCPP, which sadly means I need to implement practically everything myself except perhaps FFTW and the matrix calculations.
Any suggestions are greatly appreciated, as is the time taken to reply. I hope you have a nice day.
