I really enjoy tinkering with ddsp. It would be a bit more approachable if we could ex

Idea: sample_rate agnostic demo/tutorial about ddsp HOT 13 OPEN

magenta commented on August 16, 2024

Idea: sample_rate agnostic demo/tutorial

from ddsp.

Comments (13)

voodoohop commented on August 16, 2024 2

I just noticed that compute_loudness in spectral_ops.py outputs significantly lower loudness values when the sample rate is 48kHz. I haven't had time to figure out what is causing this, but increasing the FFT size didn't seem to help much.
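The cause is not identified in this thread. As a purely illustrative sketch (not a diagnosis, and not a description of what compute_loudness does internally), the snippet below shows one way a loudness figure can drop at higher sample rates: if per-bin dB values are averaged across all FFT bins, a signal whose energy sits below 8 kHz fills every bin at 16 kHz but only about a third of the bins at 48 kHz. The function name, the 8 kHz band limit, and the 0 dB / -80 dB levels are all assumptions made up for the illustration.

```python
import numpy as np


def mean_db_over_bins(sample_rate, n_fft=2048, band_limit_hz=8000.0,
                      active_db=0.0, floor_db=-80.0):
    """Average a toy per-bin dB spectrum over all FFT bins.

    Hypothetical model: every bin at or below `band_limit_hz` sits at
    `active_db`, every bin above it sits at `floor_db`.
    """
    # Centre frequency of each bin of a real FFT of size n_fft.
    freqs = np.arange(n_fft // 2 + 1) * sample_rate / n_fft
    per_bin_db = np.where(freqs <= band_limit_hz, active_db, floor_db)
    return float(per_bin_db.mean())


for sr in (16000, 48000):
    print(sr, round(mean_db_over_bins(sr), 1))
```

In this toy model the FFT size barely changes the ratio of "empty" to "occupied" bins, so changing it alone would not move the average much.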

jesseengel commented on August 16, 2024 1

Just FYI, all the sample_rate agnostic preprocessing code should now be in (you can check whether it works for you), but we don't have a working 44kHz model up as a demo yet.

jesseengel commented on August 16, 2024 1

Good catches! Yah, we definitely haven't explored training many model configs at different rates yet. They seem like pretty straightforward fixes; we'll try to get to them when we can.

jesseengel commented on August 16, 2024

Hi Andras,

Good recommendation! This is actually something that Hanoi is currently looking into, so we're on the case :). The tie to 16kHz is actually more for CREPE f0 detection than anything else, but it's not a hard constraint. We'll just need to change the data processing pipeline and tweak some model parameters (number of harmonics, FFT sizes).
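To make the "number of harmonics, FFT sizes" remark concrete, here is a minimal sketch of how those two quantities scale with sample rate. Both helper names are hypothetical, and the FFT rule (keep roughly the same window duration as a 2048-point FFT at 16kHz, rounded up to a power of two) is just one possible heuristic, not something the thread prescribes.

```python
import math


def max_harmonics(f0_hz, sample_rate):
    """Number of harmonics k with k * f0 at or below the Nyquist frequency."""
    nyquist = sample_rate / 2
    return int(nyquist // f0_hz)


def scaled_fft_size(sample_rate, base_fft_size=2048, base_sample_rate=16000):
    """Heuristic (assumption): keep the window duration roughly constant,
    rounding up to the next power of two."""
    target = math.ceil(base_fft_size * sample_rate / base_sample_rate)
    return 1 << (target - 1).bit_length()


for sr in (16000, 44100, 48000):
    print(sr, max_harmonics(440.0, sr), scaled_fft_size(sr))
```

For a 440 Hz fundamental, the number of harmonics that fit below Nyquist roughly triples between 16kHz and 48kHz, which is why a harmonic count tuned for 16kHz leaves the upper spectrum unused at higher rates.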

Forevian commented on August 16, 2024

Hi, I just wonder if there is any progress on this? I've seen some related pull requests failing tests...

jesseengel commented on August 16, 2024

Thanks for the follow up. We just merged #44 which creates a data pipeline for creating datasets at arbitrary sample rates (with f0 CREPE detection still at 16kHz). We're working now on hammering out some details of training configs for higher sample rates (48kHz), and will add some details and configs to the colab notebooks when we get that figured out.

Forevian commented on August 16, 2024

Just wanted to add that I have extensively tested 48 kHz training on the test branch that is waiting to be merged, and it works well (with some hyper-parameter tuning).

jesseengel commented on August 16, 2024

That's great! Yah, sorry for all the delays. There's been some COVID-related bureaucracy slowing down our efforts in that direction, so it's cool to hear that it's working for you.

Do you have an example gin config / example you could share of it working? It could be helpful for us and others I think.

In terms of the branch PR, @lamtharnhantrakul is back on the case just now actually. The old branch (#57) had gotten pretty stale, so he's splitting it up into two PRs, the first of which is now #102. So hopefully we should have the code in master soon.

Forevian commented on August 16, 2024

Sorry for the slow answer. I don't have an example to show at this time; I am working on a different problem than what your demo is doing, mostly percussive sound resynthesis with plenty of inharmonicity. I am trying an approach that generates a lot of harmonically unrelated, tunable sine components plus noise to reproduce single-shot acoustic samples. I will let you know if I have anything cool to show.

jesseengel commented on August 16, 2024

Okay, great, no worries. For what it's worth, I've also been developing sinusoidal + noise models (still focusing on harmonic-ish type instruments) but for self-supervised transcription.

I think we're going to do a code refactor to expose a lot of that internal code in the next week or two, so feel free to take a look :).

samuel-clarke commented on August 16, 2024

It seems like the assumption of working with a 16kHz signal is still inextricably baked into this code in some places. A couple of examples I've noticed:

MfccTimeDistributedRnnEncoder.z_time_steps is constrained to be chosen from a set of values that reflect the assumption that the input signal will be a 4 second clip with 16kHz sample rate.
spectral_ops.compute_mel(), a backbone to the other foundational functions in spectral_ops.py, is hardcoded to compute tf.signal.linear_to_mel_weight_matrix() with a 16kHz sample rate, putting the Nyquist frequency well below the upper bound of human hearing.
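As a rough sketch of the second point, the mel filterbank arguments could be derived from the actual sample rate instead of a fixed 16kHz. The helper below is hypothetical and is not the ddsp implementation; it only builds keyword arguments whose names match the parameters of tf.signal.linear_to_mel_weight_matrix, and clamps the upper edge to the Nyquist frequency. The default of 64 mel bins and the 2048-point FFT are assumptions for illustration.

```python
def mel_matrix_kwargs(sample_rate, n_fft=2048, num_mel_bins=64,
                      lower_edge_hertz=0.0, upper_edge_hertz=None):
    """Build arguments for a mel weight matrix tied to `sample_rate`.

    Hypothetical helper: the returned dict is meant to be passed as
    tf.signal.linear_to_mel_weight_matrix(**kwargs).
    """
    nyquist = sample_rate / 2.0
    # Default to the full band; never exceed Nyquist.
    upper = nyquist if upper_edge_hertz is None else min(upper_edge_hertz, nyquist)
    return {
        "num_mel_bins": num_mel_bins,
        "num_spectrogram_bins": n_fft // 2 + 1,
        "sample_rate": sample_rate,
        "lower_edge_hertz": lower_edge_hertz,
        "upper_edge_hertz": upper,
    }


print(mel_matrix_kwargs(16000)["upper_edge_hertz"])  # 8000.0
print(mel_matrix_kwargs(48000)["upper_edge_hertz"])  # 24000.0
```

With a 16kHz rate the filterbank tops out at 8 kHz; passing the real sample rate lets it extend to 22.05 kHz or 24 kHz for 44.1kHz and 48kHz audio.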

Please correct me if I'm wrong on these examples, since I'm still very much in the learning process. I'll put more examples here if I find them. And thank you for how helpful you've been, @jesseengel.

PratikStar commented on August 16, 2024

@voodoohop I found the same problem: the loudness is too low for 44.1kHz audio! I am not sure of the status of the code for higher sample rates, but I am trying to train on a custom guitar dataset at 44.1kHz and the results are quite poor!

Did you get a solution to this?

PratikStar commented on August 16, 2024

@jesseengel In my case, I am training the model on a custom monophonic guitar dataset (44.1kHz) to learn the timbre embeddings. I have tried frame_rate=210 and frame_rate=252 and got poor results, so I am going to try training with higher frame rates!

But I am not sure if the root cause of the problem is in the low frame rate I used or in other model hyperparameters like fft size, #harmonics, etc.

Below are my commands.

Data prep:
ddsp_prepare_tfrecord \
  --input_audio_filepatterns='~/buckets/pratik-ddsp-data/monophonic/*wav' \
  --output_tfrecord_path=~/tfrecord_441sr_700fr/train.tfrecord \
  --chunk_secs=0.0 \
  --num_shards=10 \
  --frame_rate=700 \
  --sample_rate=44100 \
  --alsologtostderr

Training:

ddsp_run \
  --mode=train \
  --gin_file=~/ddsp/ddsp/training/gin/models/ae_mfccRnnEncoder_last.gin \
  --gin_file=~/ddsp/ddsp/training/gin/datasets/tfrecord.gin \
  --gin_file=~/ddsp/ddsp/training/gin/eval/basic_f0_ld.gin \
  --gin_param="TFRecordProvider.file_pattern='~/tfrecord_441sr_252fr/train.tfrecord*'" \
  --gin_param="batch_size=16" \
  --alsologtostderr \
  --gin_param="TFRecordProvider.sample_rate=44100" \
  --gin_param="Harmonic.sample_rate=44100" \
  --gin_param="FilteredNoise.n_samples=176400" \
  --gin_param="Harmonic.n_samples=176400" \
  --gin_param="Reverb.reverb_length=176400" \
  --gin_param='F0LoudnessPreprocessor.time_steps=2800' \
  --gin_param='F0LoudnessPreprocessor.frame_rate=700' \
  --gin_param='F0LoudnessPreprocessor.sample_rate=44100' \
  --gin_param="TFRecordProvider.frame_rate=700"
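The numeric gin parameters in the command above appear to follow from three inputs: the sample rate, the frame rate, and a clip length of 4 seconds (176400 / 44100 = 4 and 2800 / 700 = 4). The sketch below recomputes them as a consistency check. The helper name is hypothetical, the 4-second clip length is inferred rather than stated, and the requirement that the sample rate be divisible by the frame rate (so the hop size is a whole number of samples) is an assumption.

```python
def derived_params(sample_rate, frame_rate, clip_seconds=4):
    """Recompute the size parameters that depend on sample and frame rate.

    Hypothetical helper; assumes an integer hop size is wanted.
    """
    if sample_rate % frame_rate != 0:
        raise ValueError(
            f"sample_rate {sample_rate} is not divisible by frame_rate {frame_rate}")
    return {
        "hop_size": sample_rate // frame_rate,   # samples per frame
        "n_samples": sample_rate * clip_seconds,  # audio samples per clip
        "time_steps": frame_rate * clip_seconds,  # frames per clip
    }


print(derived_params(44100, 700))
# {'hop_size': 63, 'n_samples': 176400, 'time_steps': 2800}
```

The frame rates mentioned in this comment (210, 252, 700) all divide 44100 evenly. It may also be worth checking the paths: the data-prep command writes to ~/tfrecord_441sr_700fr, while the training command reads from ~/tfrecord_441sr_252fr with frame_rate=700.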

Am I missing something?
