
Comments (13)

voodoohop commented on August 16, 2024

I just noticed that compute_loudness in spectral_ops.py outputs significantly lower loudness values when the sample rate is 48 kHz. I haven't had time to figure out what is causing this, but increasing the FFT size didn't seem to help much.

from ddsp.

jesseengel commented on August 16, 2024

Just FYI, all the sample_rate-agnostic preprocessing code should now be in (you can check if it works for you), but we don't have a working 44kHz model up as a demo yet.


jesseengel commented on August 16, 2024

Good catches! Yah, we definitely haven't explored training many model configs at different rates yet. These seem like pretty straightforward fixes; we'll try to get to them when we can.


jesseengel commented on August 16, 2024

Hi Andras,

Good recommendation! This is actually something that Hanoi is currently looking into, so we're on the case :). The tie to 16kHz is actually more for CREPE f0 detection than anything else, and it's not a hard constraint. We'll just need to change the data processing pipeline and tweak some model parameters (# of harmonics, FFT sizes).
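
As a rough illustration of why the number of harmonics is tied to the sample rate, here is a minimal sketch that counts how many harmonics of a given f0 fit below the Nyquist frequency. The helper `max_harmonics` is hypothetical and not part of ddsp; it only shows the arithmetic, not how the library configures its synthesizer.

```python
import math


def max_harmonics(f0_hz, sample_rate):
    """Count harmonics of f0_hz that lie strictly below the Nyquist frequency.

    Hypothetical helper for illustration only; not a ddsp function.
    """
    nyquist = sample_rate / 2.0
    # Harmonic k sits at k * f0_hz; keep those with k * f0_hz < nyquist.
    return max(0, math.ceil(nyquist / f0_hz) - 1)


# Compare the harmonic budget for A4 (440 Hz) at a few sample rates.
for rate in (16000, 44100, 48000):
    print(rate, max_harmonics(440.0, rate))
```

Under these assumptions, a model sized for 16kHz leaves most of the available harmonics unused at 44.1kHz or 48kHz, which is one reason the harmonic count would need revisiting.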


Forevian commented on August 16, 2024

Hi, I was just wondering whether there has been any progress on this. I've seen some related pull requests failing tests...


jesseengel commented on August 16, 2024

Thanks for the follow-up. We just merged #44, which adds a data pipeline for creating datasets at arbitrary sample rates (with CREPE f0 detection still at 16kHz). We're now hammering out some details of the training configs for higher sample rates (48kHz), and will add details and configs to the colab notebooks when we get that figured out.


Forevian commented on August 16, 2024

Just wanted to add that I have extensively tested 48 kHz training on the test branch that is waiting to be merged, and it works well (with some hyper-parameter tuning).


jesseengel commented on August 16, 2024

That's great! Yah, sorry for all the delays. There's been some COVID-related bureaucracy slowing down our efforts in that direction, so it's cool to hear that it's working for you.

Do you have an example gin config / example you could share of it working? It could be helpful for us and others I think.

In terms of the branch PR, @lamtharnhantrakul is back on the case just now actually. The old branch (#57) had gotten pretty stale so he's splitting it up into two PRs, the first of which is now (#102). So hopefully we should have the code in master soon.


Forevian commented on August 16, 2024

Sorry for the slow answer. I don't have an example to show at this time. I am working on a different problem from what your demo does: mostly percussive sound resynthesis with plenty of inharmonicity. My approach is to generate a lot of harmonically unrelated, tunable sine components + noise and reproduce single-shot acoustic samples. I will let you know if I have anything cool to show.


jesseengel commented on August 16, 2024

Okay, great, no worries. For what it's worth, I've also been developing sinusoidal + noise models (still focusing on harmonic-ish type instruments) but for self-supervised transcription.

I think we're going to do a code refactor to expose a lot of that internal code in the next week or two, so feel free to take a look :).


samuel-clarke commented on August 16, 2024

It seems like the assumption of working with a 16kHz signal is still baked into this code in some places. A couple of examples I've noticed:

  • MfccTimeDistributedRnnEncoder.z_time_steps is constrained to be chosen from a set of values that reflect the assumption that the input signal will be a 4 second clip with 16kHz sample rate.
  • spectral_ops.compute_mel(), a backbone to the other foundational functions in spectral_ops.py, is hardcoded to compute tf.signal.linear_to_mel_weight_matrix() with a 16kHz sample rate, putting the Nyquist frequency well below the upper bound of human hearing.

Please correct me if I'm wrong on these examples, since I'm still very much in the learning process. I'll put more examples here if I find them. And thank you for how helpful you've been @jesseengel
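
To make the Nyquist point concrete, here is a minimal sketch of the arithmetic, assuming the mel filterbank is built for a fixed 16kHz rate as described above. The names `ASSUMED_RATE_HZ`, `nyquist_hz`, and `uncovered_band_hz` are hypothetical and do not exist in ddsp.

```python
# Hypothetical illustration, not ddsp code: how much of the spectrum a
# filterbank built for a fixed 16 kHz rate would leave uncovered.
ASSUMED_RATE_HZ = 16000  # the hardcoded rate described in the thread


def nyquist_hz(sample_rate):
    """Highest frequency representable at the given sample rate."""
    return sample_rate / 2.0


def uncovered_band_hz(actual_rate, assumed_rate=ASSUMED_RATE_HZ):
    """Width of the band between the assumed and the actual Nyquist limits."""
    return max(0.0, nyquist_hz(actual_rate) - nyquist_hz(assumed_rate))


for rate in (16000, 44100, 48000):
    print(rate, nyquist_hz(rate), uncovered_band_hz(rate))
```

If the rate is threaded through instead, `tf.signal.linear_to_mel_weight_matrix` does accept a `sample_rate` argument, so the fix is likely a matter of passing the real rate rather than changing the mel computation itself.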


PratikStar commented on August 16, 2024

@voodoohop I found the same problem: the loudness is too low for 44.1kHz audio! I am not sure of the status of the code for higher sample rates, but I am trying to train on a custom guitar dataset at 44.1kHz and the results are quite poor!

Did you get a solution to this?


PratikStar commented on August 16, 2024

@jesseengel In my case, I am training the model on a custom monophonic guitar dataset (44.1kHz) to learn the timbre embeddings. I tried frame_rate=210 and frame_rate=252 and got poor results, so I am going to try training with higher frame rates!

But I am not sure if the root cause of the problem is in the low frame rate I used or in other model hyperparameters like fft size, #harmonics, etc.

Below are my commands.

Data prep:
ddsp_prepare_tfrecord \
  --input_audio_filepatterns='~/buckets/pratik-ddsp-data/monophonic/*wav' \
  --output_tfrecord_path=~/tfrecord_441sr_700fr/train.tfrecord \
  --chunk_secs=0.0 \
  --num_shards=10 \
  --frame_rate=700 \
  --sample_rate=44100 \
  --alsologtostderr

Training:

ddsp_run \
  --mode=train \
  --gin_file=~/ddsp/ddsp/training/gin/models/ae_mfccRnnEncoder_last.gin \
  --gin_file=~/ddsp/ddsp/training/gin/datasets/tfrecord.gin \
  --gin_file=~/ddsp/ddsp/training/gin/eval/basic_f0_ld.gin \
  --gin_param="TFRecordProvider.file_pattern='~/tfrecord_441sr_252fr/train.tfrecord*'" \
  --gin_param="batch_size=16" \
  --alsologtostderr \
  --gin_param="TFRecordProvider.sample_rate=44100" \
  --gin_param="Harmonic.sample_rate=44100" \
  --gin_param="FilteredNoise.n_samples=176400" \
  --gin_param="Harmonic.n_samples=176400" \
  --gin_param="Reverb.reverb_length=176400" \
  --gin_param='F0LoudnessPreprocessor.time_steps=2800' \
  --gin_param='F0LoudnessPreprocessor.frame_rate=700' \
  --gin_param='F0LoudnessPreprocessor.sample_rate=44100' \
  --gin_param="TFRecordProvider.frame_rate=700"

Am I missing something?
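
One way to sanity-check the numbers in commands like these is to derive the sample and frame counts from the sample rate, frame rate, and clip length, then compare them with the gin parameters. The sketch below assumes 4 second clips (176400 / 44100) and uses a hypothetical helper, `derived_sizes`, that is not part of ddsp.

```python
def derived_sizes(sample_rate, frame_rate, clip_secs):
    """Derive per-clip sizes from the rates (hypothetical helper, not ddsp).

    Returns the number of audio samples, the number of control frames, the
    hop size in samples, and the remainder left when the sample rate is not
    an integer multiple of the frame rate.
    """
    hop_size, remainder = divmod(sample_rate, frame_rate)
    return {
        "n_samples": round(sample_rate * clip_secs),
        "time_steps": round(frame_rate * clip_secs),
        "hop_size": hop_size,
        "hop_remainder": remainder,
    }


# Values taken from the commands above; the 4 s clip length is an assumption.
sizes = derived_sizes(sample_rate=44100, frame_rate=700, clip_secs=4)
print(sizes)
```

A non-zero `hop_remainder` would mean the frame rate does not divide the sample rate evenly, which is worth ruling out before tuning other hyperparameters.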

