Comments (7)
Another thing I find curious: when I quantify the Pearson correlation between the spectrograms of the original and reconstructed waveforms, the correlation coefficients fall within a very narrow range. Why is the model so stable at reconstructing the waveform and its corresponding spectrogram?
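For reference, here is a minimal sketch of one way to compute such a correlation, using a simple framed-FFT magnitude spectrogram. The function names and parameters are illustrative assumptions, not DDSP's API:

```python
# Sketch: Pearson correlation between two spectrograms (names are illustrative).
import numpy as np

def spectrogram(audio, frame_size=512, hop_size=256):
    """Magnitude spectrogram via a simple framed, windowed FFT."""
    frames = []
    for start in range(0, len(audio) - frame_size + 1, hop_size):
        frame = audio[start:start + frame_size] * np.hanning(frame_size)
        frames.append(np.abs(np.fft.rfft(frame)))
    return np.stack(frames)

def pearson_corr(spec_a, spec_b):
    """Pearson correlation coefficient between flattened spectrograms."""
    a, b = spec_a.ravel(), spec_b.ravel()
    a = a - a.mean()
    b = b - b.mean()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Example: compare a signal with a slightly perturbed copy of itself.
rng = np.random.default_rng(0)
original = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000.0)
reconstruction = original + 0.01 * rng.standard_normal(16000)
r = pearson_corr(spectrogram(original), spectrogram(reconstruction))
```

With a near-perfect reconstruction like this, `r` sits very close to 1, which is consistent with coefficients clustering in a narrow range.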
from ddsp.
Another question I am really curious about: if we'd like to do human-voice reconstruction from multiple sources (different people), should we consider timbre and include z in the model?
Also, since the model is doing such a good job on waveform reconstruction, have you considered using it for a TTS task? Could we use an encoder to generate features like f0 and loudness from text or some other signal to generate the waveform?
Hi, glad it's working for you. I'd be happy to hear an example reconstruction if you want to share. My guess is that the model is probably overfitting quite a lot to a small dataset. In that case, a given segment of loudness and f0 corresponds to a specific phoneme because the dataset doesn't have enough variation. For a large dataset, there will be one-to-many mappings that the model can't handle without more conditioning (latent or labels). We don't use the latent "z" variables in the models in the timbre_transfer and train_autoencoder colabs, but the encoders and decoders are in the code base and used in models/nsynth_ae.gin as an example.
My intuition is that the model should work well for TTS (the sinusoidal model it's based on is used in audio codecs, so we know it should be able to fit the signal), but you just need to add grapheme or phoneme conditioning.
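The one-to-many point can be illustrated with a toy example: if two distinct target spectra share identical (f0, loudness) conditioning, a model trained with squared error can only predict their average. Below is a minimal synthetic sketch; all data and shapes are illustrative, not from the DDSP code base:

```python
# Toy illustration of the one-to-many problem: two different "phoneme spectra"
# share the exact same (f0, loudness) input, so a least-squares model can only
# predict their average.
import numpy as np

x = np.array([[220.0, -20.0],
              [220.0, -20.0]])   # identical conditioning for both frames
y = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])  # two distinct target "spectra"

w, *_ = np.linalg.lstsq(x, y, rcond=None)
pred = x @ w
# Both predictions collapse to the mean of the two targets: [0.5, 0.0, 0.5]
```

Extra conditioning (a latent z or phoneme labels) disambiguates the two frames so the model no longer has to average them.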
Thanks a lot for your reply!
- I put the reconstruction result analysis here: https://drive.google.com/file/d/1DgjxlMLd-hYtYq4_O99oclqgfliL3Cqx/view
- For the overfitting issue, I use the SHTOOKA dataset, which contains around 1 hour and 30 minutes of audio; is that really small enough for the model to overfit? I am still amazed that the model can handle the data so well, since I have tried the Parrotron model for spectrogram reconstruction on the SHTOOKA dataset and it could not converge…
- I am not sure I understood "more conditioning (latent or labels)" here:
"For a large dataset, there will be one-to-many mappings that the model can't handle without more conditioning (latent or labels)."
Do you mean we can add conditioning besides z, f0, and loudness? You also mentioned that I could add grapheme or phoneme conditioning for a TTS task. Do you mean using an encoder to extract phoneme, grapheme, or other conditioning, concatenating it with z, f0, and loudness (do we even have f0 and loudness in a TTS task?), and then feeding them to the decoder?
- I am also curious whether I can further improve the result by adding z conditioning and using a ResNet instead of the CREPE model, or will that make it harder to train? Have you tried more complicated models like a VAE or GAN with DDSP?
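One way to picture the extra conditioning being discussed is as additional channels concatenated with f0 and loudness along the feature axis before the decoder. The sketch below is a minimal, hypothetical illustration; the array names, phoneme inventory size, and embedding dimension are all assumptions, not the actual DDSP API:

```python
# Sketch: concatenating a learned phoneme embedding with f0 and loudness
# features as per-frame decoder conditioning. Shapes/names are illustrative.
import numpy as np

n_frames = 250       # time steps
phoneme_vocab = 40   # assumed phoneme inventory size
embed_dim = 64       # assumed embedding dimension

rng = np.random.default_rng(0)
f0_hz = rng.uniform(80.0, 300.0, size=(n_frames, 1))       # e.g. from a pitch tracker
loudness_db = rng.uniform(-60.0, 0.0, size=(n_frames, 1))  # e.g. A-weighted loudness
phoneme_ids = rng.integers(0, phoneme_vocab, size=n_frames)

# Simple embedding lookup table (this would be trained in practice).
embedding_table = rng.standard_normal((phoneme_vocab, embed_dim))
phoneme_embed = embedding_table[phoneme_ids]               # (n_frames, embed_dim)

# Decoder input: [f0, loudness, phoneme embedding] per frame.
conditioning = np.concatenate([f0_hz, loudness_db, phoneme_embed], axis=-1)
print(conditioning.shape)  # (250, 66)
```

For a TTS setting, f0 and loudness would not come from an analysis of existing audio; they would themselves have to be predicted from text, which is where the one-to-many difficulty reappears.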
There are a lot of options to try; we only have results for our published work. If you want control over the output, you need to condition on variables that you know how to control. For instance, most TTS systems use only phonemes or text as conditioning and then let the network figure out what to do with them. You can try to figure out how to interpret Z, but it is not trained to be interpretable as is.
Thanks for your reply! By conditioning, do you mean the features after the encoder part? If we want more conditioning, do you mean we could try to use some network to encode phonemes or graphemes as conditioning? Should I try to make the conditioning similar for similar words? Is there a rule to follow to find proper conditioning?
The Tacotron papers (https://google.github.io/tacotron/) have extensively investigated different types of TTS conditioning. I suggest you check out some of their work.