
Comments (3)

jugoodma avatar jugoodma commented on June 12, 2024 6

@tripathiarpan20 -- I found your comment interesting, so I took a short dive into the literature.

There's a niche and interesting sub-sub-field of Music Information Retrieval (MIR) called Automatic Drum Transcription (ADT). Here's a literature review of ADT. The authors of that review describe different "drum transcription tasks" -- drum-only transcription (DTD) and drum-plus-accompaniment transcription (DTM) seem particularly relevant.

If you want to "solve" drum encoding, you could look at some of the methods in the more recently referenced papers in the mentioned literature review and give them a try! Ref 80 appeared to have high scoring metrics, but might not work for drum kits with more than a kick, snare, and hi-hat. The authors (of ref 80) also have a GitHub repo, and a demo site linked!

For another approach, you might find https://github.com/magenta/mt3 interesting/useful. Unfortunately, the related paper doesn't focus too heavily on drums, so you might find the mt3 model doesn't work that well for drum transcription.

Finally, perhaps we could make use of Facebook's demucs. This model is seemingly SOTA for demixing audio tracks, so we can use it to separate out the drums stem of a track. This turns a DTM task into a DTD task quite effectively (and thus, in my opinion, makes solving ADT easier). Unfortunately, this somewhat disregards the call-to-action in the NMP/basic-pitch paper -- to encourage low-resource models in future research. Maybe we can trim down the demucs model? Regardless, perhaps we could then train the NMP model on a drum-specific dataset, like E-GMD. We could then compose the architectures like so:

                demucs                   NMP(E-GMD)
original track -------> drum-only track -----------> drum-only MIDI
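For concreteness, here's a minimal sketch of that composition as two CLI invocations driven from Python (assuming the demucs and basic-pitch command-line tools are installed; the `htdemucs` output folder name is demucs' default model directory, and all paths here are illustrative):

```python
import subprocess
from pathlib import Path

def separate_drums(track: str, out_dir: str = "separated") -> list[str]:
    # demucs' --two-stems=drums mode writes drums.wav + no_drums.wav,
    # turning the DTM problem into a DTD one
    return ["demucs", "--two-stems", "drums", "-o", out_dir, track]

def transcribe_drums(drum_stem: str, midi_dir: str = "midi") -> list[str]:
    # basic-pitch CLI signature: basic-pitch <output-dir> <input-audio>
    return ["basic-pitch", midi_dir, drum_stem]

def run_pipeline(track: str) -> None:
    # Stage 1: demix the original track into a drum-only stem
    subprocess.run(separate_drums(track), check=True)
    stem = Path("separated") / "htdemucs" / Path(track).stem / "drums.wav"
    # Stage 2: transcribe the drum-only stem to MIDI
    subprocess.run(transcribe_drums(str(stem)), check=True)
```

The second stage would ideally use an NMP model retrained on a drum dataset such as E-GMD, rather than the stock basic-pitch weights.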

I'll give this a try, and post the results. Luckily, since NMP is so light, it probably trains much faster than huge models. And who knows, maybe demucs isn't even needed. Or, maybe this entire approach won't work! It's all part of the scientific method 😄

from basic-pitch.

rabitt avatar rabitt commented on June 12, 2024 2

are there any future plans to add support for percussion instruments?

@tripathiarpan20 no plans at the moment, but will let you know if that changes. @jugoodma 's comment is great, and points to some open source drum transcription options. Here are two more open source systems I'm aware of:
(1) "Increasing Drum Transcription Vocabulary Using Data Synthesis" by Cartwright et al. [paper] [code]
(2) "Towards Multi-Instrument Drum Transcription" by Vogl et al. [paper] [code]


tripathiarpan20 avatar tripathiarpan20 commented on June 12, 2024

Hi @jugoodma and @rabitt ,
Thank you for the amazing feedback!

To be frank, I am not familiar with how the instrument class is predicted in the NMP pipeline, but if retraining Basic Pitch's architecture on a drum dataset for DTD (along with devising suitable posteriorgram post-processing) works, I believe it would make the range of instruments covered by this project truly whole (afaik).

Good luck on the process and keep us updated :D.
The DTD task seems to be the relevant one in the context of Basic Pitch (which deals with polyphonic recordings of a single instrument class). demucs shouldn't be required, given its high inference time and the availability of the E-GMD dataset, together with conversion to drum audio tracks using suitable soundfonts and label-preserving data augmentation.
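To illustrate what "label-preserving" means here, below is a toy sketch of one such augmentation, assuming a hypothetical representation of drum notes as (pitch, start, end, velocity) tuples (not E-GMD's actual format): timing and velocity can be jittered, but the pitch, which encodes the drum class, must stay fixed so the transcription labels remain valid.

```python
import random

# (MIDI pitch = drum class label, onset time, offset time, velocity)
Note = tuple[int, float, float, int]

def augment(notes: list[Note], time_jitter: float = 0.005,
            vel_jitter: int = 8, seed: int = 0) -> list[Note]:
    """Label-preserving augmentation: perturb timing and velocity slightly,
    but never touch the pitch, so the drum-class labels stay correct."""
    rng = random.Random(seed)
    out: list[Note] = []
    for pitch, start, end, vel in notes:
        dt = rng.uniform(-time_jitter, time_jitter)  # shift onset and offset
        dv = rng.randint(-vel_jitter, vel_jitter)    # vary hit strength
        out.append((pitch,                           # label left untouched
                    max(0.0, start + dt),
                    max(0.0, end + dt),
                    min(127, max(1, vel + dv))))
    return out
```

In a real setup this would operate on the E-GMD MIDI annotations before rendering each augmented take to audio with a soundfont.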

Elsewhere, I also tried demucs on Psychosocial (Slipknot) and then basic-pitch on the demixed drum track, and that's how I eventually raised this issue/question. Although demucs has amazing performance, its inference times are relatively high (typically minutes per track).

Meanwhile, perhaps Spotify could in the future develop a lightweight demixing model, one that might benefit from end-to-end deep learning using the CQT for preprocessing (rather than the Mel spectrograms used in past demixing methods)?
It might be a bit of a stretch, as my understanding of spectrograms, past demixing models, and NMP has missing pieces.
I would especially like to hear @rabitt's thoughts on the feasibility of such a lightweight demixing model, and whether there would be any benefit in formulating it as an end-to-end (demixing + transcription) task.

Any feedback from anyone else is welcome too!

