
Comments (3)

jugoodma avatar jugoodma commented on June 12, 2024 6

@tripathiarpan20 -- I found your comment interesting, so I took a short dive into the literature.

There's a niche and interesting sub-sub-field of Music Information Retrieval (MIR) called Automatic Drum Transcription (ADT). Here's a literature review of ADT. The authors of that review describe different "drum transcription tasks" -- drum-only transcription (DTD) and drum-plus-accompaniment transcription (DTM) seem particularly relevant.

If you want to "solve" drum encoding, you could look at some of the methods in the more recently referenced papers in the mentioned literature review and give them a try! Ref 80 appeared to have high scoring metrics, but might not work for drum kits with more than a kick, snare, and hi-hat. The authors (of ref 80) also have a GitHub repo, and a demo site linked!

For another approach, you might find https://github.com/magenta/mt3 interesting/useful. Unfortunately, the related paper doesn't focus too heavily on drums, so you might find the mt3 model doesn't work that well for drum transcription.

Finally, perhaps we could make use of Facebook's demucs. This model is seemingly SOTA for demixing audio tracks, so we can use it to separate out the drums stem of a track. This turns a DTM task into a DTD task quite effectively (and thus, in my opinion, makes solving ADT easier). Unfortunately, this somewhat disregards the call-to-action in the NMP/basic-pitch paper -- to encourage low-resource models in future research. Maybe we can trim down the demucs model? Regardless, perhaps we could then train the NMP model on a drum-specific dataset, like E-GMD. We could then compose the architectures like so:

                demucs                   NMP(E-GMD)
original track -------> drum-only track -----------> drum-only MIDI
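For concreteness, here's a minimal sketch of that composition as two CLI invocations driven from Python (assuming the demucs and basic-pitch command-line tools are installed; the `htdemucs` output folder name is demucs' default model directory, and all paths here are illustrative):

```python
import subprocess
from pathlib import Path

def separate_drums(track: str, out_dir: str = "separated") -> list[str]:
    # demucs' --two-stems=drums mode writes drums.wav + no_drums.wav,
    # turning the DTM problem into a DTD one
    return ["demucs", "--two-stems", "drums", "-o", out_dir, track]

def transcribe_drums(drum_stem: str, midi_dir: str = "midi") -> list[str]:
    # basic-pitch CLI signature: basic-pitch <output-dir> <input-audio>
    return ["basic-pitch", midi_dir, drum_stem]

def run_pipeline(track: str) -> None:
    # Stage 1: demix the original track into a drum-only stem
    subprocess.run(separate_drums(track), check=True)
    stem = Path("separated") / "htdemucs" / Path(track).stem / "drums.wav"
    # Stage 2: transcribe the drum-only stem to MIDI
    subprocess.run(transcribe_drums(str(stem)), check=True)
```

The second stage would ideally use an NMP model retrained on a drum dataset such as E-GMD, rather than the stock basic-pitch weights.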

I'll give this a try, and post the results. Luckily, since NMP is so light, it probably trains much faster than huge models. And who knows, maybe demucs isn't even needed. Or, maybe this entire approach won't work! It's all part of the scientific method 😄

from basic-pitch.

rabitt avatar rabitt commented on June 12, 2024 2

are there any future plans to add support for percussion instruments?

@tripathiarpan20 no plans at the moment, but will let you know if that changes. @jugoodma 's comment is great, and points to some open source drum transcription options. Here are two more open source systems I'm aware of:
(1) "Increasing Drum Transcription Vocabulary Using Data Synthesis" by Cartwright et al. [paper] [code]
(2) "Towards Multi-Instrument Drum Transcription" by Vogl et al. [paper] [code]


tripathiarpan20 avatar tripathiarpan20 commented on June 12, 2024

Hi @jugoodma and @rabitt ,
Thank you for the amazing feedback!

To be frank, I am not familiar with how the instrument class is predicted in the NMP pipeline, but if retraining Basic Pitch's architecture on a drum dataset for DTD (along with devising suitable posteriorgram post-processing) works, I believe it would make the range of instruments covered by this project truly whole (afaik).

Good luck on the process and keep us updated :D.
The DTD task seems to be the relevant one in the context of Basic Pitch (which deals with polyphonic recordings of a single instrument class). demucs shouldn't be required, given its high inference time and the availability of the E-GMD dataset, together with conversion to drum audio tracks using suitable soundfonts and label-preserving data augmentation.
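To illustrate what "label-preserving" means here, below is a toy sketch of one such augmentation, assuming a hypothetical representation of drum notes as (pitch, start, end, velocity) tuples (not E-GMD's actual format): timing and velocity can be jittered, but the pitch, which encodes the drum class, must stay fixed so the transcription labels remain valid.

```python
import random

# (MIDI pitch = drum class label, onset time, offset time, velocity)
Note = tuple[int, float, float, int]

def augment(notes: list[Note], time_jitter: float = 0.005,
            vel_jitter: int = 8, seed: int = 0) -> list[Note]:
    """Label-preserving augmentation: perturb timing and velocity slightly,
    but never touch the pitch, so the drum-class labels stay correct."""
    rng = random.Random(seed)
    out: list[Note] = []
    for pitch, start, end, vel in notes:
        dt = rng.uniform(-time_jitter, time_jitter)  # shift onset and offset
        dv = rng.randint(-vel_jitter, vel_jitter)    # vary hit strength
        out.append((pitch,                           # label left untouched
                    max(0.0, start + dt),
                    max(0.0, end + dt),
                    min(127, max(1, vel + dv))))
    return out
```

In a real setup this would operate on the E-GMD MIDI annotations before rendering each augmented take to audio with a soundfont.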

Elsewhere, I also tried demucs on Psychosocial (Slipknot) and then basic-pitch on the demixed drum track, and that's how I eventually raised this issue/question. Although demucs has amazing performance, its inference times are relatively high (typically minutes per track).

Meanwhile, perhaps Spotify could in the future develop a lightweight demixing model, one that might benefit from end-to-end deep learning using the CQT for preprocessing (rather than the Mel spectrograms used in past demixing methods)?
It might be a bit of a stretch, as my understanding of spectrograms, past demixing models, and NMP has missing pieces.
I would especially like to hear @rabitt's thoughts on the feasibility of such a lightweight demixing model, and whether there would be any benefit in formulating it as an end-to-end (demixing + transcription) task.

Any feedback from anyone else is welcome too!

