Comments (3)
@tripathiarpan20 -- I found your comment interesting, so I took a short dive into the literature.
There's a niche, and interesting, sub-sub-field of Music Information Retrieval (MIR) called Automatic Drum Transcription (ADT). Here's a literature review of ADT. The authors of that review describe different "drum transcription tasks" -- drum-only transcription (DTD) and drum-plus-accompaniment transcription (DTM) seem particularly relevant.
If you want to "solve" drum encoding, you could look at the methods in some of the more recent papers cited in that literature review and give them a try! Ref 80 appeared to have high-scoring metrics, but might not work for drum kits with more than a kick, snare, and hi-hat. The authors (of ref 80) also have a GitHub repo, and a demo site linked!
For another approach, you might find https://github.com/magenta/mt3 interesting/useful. Unfortunately, the related paper doesn't focus too heavily on drums, so you might find the mt3 model doesn't work that well for drum transcription.
Finally, perhaps we could make use of Facebook's demucs. This model is seemingly SOTA for demixing audio tracks, so we can use it to separate out the drums stem of a track. This turns a DTM task into a DTD task quite effectively (and thus, in my opinion, makes solving ADT easier). Unfortunately, this somewhat disregards the call-to-action in the NMP/basic-pitch paper -- to encourage low-resource models in future research. Maybe we can trim down the demucs model? Regardless, perhaps we could then train the NMP model on a drum-specific dataset, like E-GMD. We could then compose the architectures like so:
                demucs                   NMP(E-GMD)
original track -------> drum-only track -----------> drum-only MIDI
I'll give this a try, and post the results. Luckily, since NMP is so light, it probably trains much faster than huge models. And who knows, maybe demucs isn't even needed. Or, maybe this entire approach won't work! It's all part of the scientific method 😄
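A minimal sketch of the composition above, in Python. The names `demucs_command` and `transcribe_drums` are hypothetical, made up for illustration. The only thing taken from a real tool is the demucs command line (`--two-stems=drums` and `-o`). The transcription step is passed in as a callable because a drum-trained NMP model does not exist yet; it is an assumption of this sketch.

```python
from pathlib import Path
from typing import Callable, List


def demucs_command(track: Path, out_dir: Path = Path("separated")) -> List[str]:
    """Build a demucs CLI call that splits `track` into a drums stem and the rest."""
    return ["demucs", "--two-stems=drums", "-o", str(out_dir), str(track)]


def transcribe_drums(
    track: Path,
    demix: Callable[[Path], Path],
    transcribe: Callable[[Path], Path],
) -> Path:
    """Compose the two stages: original track -> drum-only track -> drum-only MIDI.

    `demix` stands in for demucs and returns the path of the drum stem.
    `transcribe` stands in for a (hypothetical) NMP model trained on E-GMD
    and returns the path of the MIDI file it wrote.
    """
    drum_stem = demix(track)      # DTM input becomes a DTD input
    return transcribe(drum_stem)  # DTD: drum-only audio -> MIDI
```

In practice `demix` could run `subprocess.run(demucs_command(track), check=True)` and return the `drums.wav` file that demucs writes under the output directory. The exact sub-folder depends on the demucs model in use, so it is not hard-coded here.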
from basic-pitch.
are there any future plans to add support for percussion instruments?
@tripathiarpan20 no plans at the moment, but will let you know if that changes. @jugoodma's comment is great, and points to some open source drum transcription options. Here are two more open source systems I'm aware of:
(1) "Increasing Drum Transcription Vocabulary Using Data Synthesis" by Cartwright et al. [paper] [code]
(2) "Towards Multi-Instrument Drum Transcription" by Vogl et al. [paper] [code]
Hi @jugoodma and @rabitt,
Thank you for the amazing feedback!
To be frank, I am not familiar with how the instrument class is predicted in the NMP pipeline. But if retraining Basic Pitch's architecture on a drum dataset for DTD works, along with devising suitable posteriorgram post-processing, I believe it would make the range of instruments covered by this project truly complete (afaik).
Good luck with the process and keep us updated :D.
The DTD task seems to be the relevant one in the context of Basic Pitch (which deals with polyphonic recordings of a single instrument class). demucs shouldn't be required, given its high inference time and the availability of the E-GMD dataset, together with conversion to drum audio tracks using suitable soundfonts and label-preserving data augmentation.
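To make "label-preserving" concrete, here is a minimal sketch, assuming the training audio is a list of float samples and the labels are drum onset times in seconds. The function name `augment_gain_noise` and its parameters are hypothetical. Random gain and additive noise change how the audio sounds but not when the hits occur, so the onset labels can be reused unchanged.

```python
import random
from typing import List, Sequence, Tuple


def augment_gain_noise(
    samples: Sequence[float],
    onsets: Sequence[float],
    rng: random.Random,
    max_gain_db: float = 6.0,
    noise_std: float = 0.001,
) -> Tuple[List[float], List[float]]:
    """Apply a random gain and Gaussian noise; return new audio and the same labels."""
    # Draw one gain (in dB) for the whole clip and convert it to a linear factor.
    gain = 10 ** (rng.uniform(-max_gain_db, max_gain_db) / 20)
    # Scale every sample and add independent low-level noise.
    augmented = [gain * s + rng.gauss(0.0, noise_std) for s in samples]
    # Timing is untouched, so the onset labels are copied as-is.
    return augmented, list(onsets)
```

Augmentations that move events in time (time-stretching, cropping) would need the onset labels transformed accordingly, which this sketch does not cover.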
Elsewhere, I also tried demucs on Psychosocial (Slipknot) and tried to use basic-pitch on the demixed drum track, and that's how I eventually raised the issue/question. Although demucs has amazing performance, its inference time is relatively high (it typically takes minutes).
Meanwhile, perhaps Spotify could develop a lightweight demixing model which might benefit from end-to-end deep learning that uses CQT for preprocessing (rather than Mel spectrograms as in past demixing methods) in the future?
It might be a bit of a stretch, as my understanding of how spectrograms, past demixing models, and NMP work has missing pieces.
I would especially like to hear @rabitt's thoughts on the feasibility of such a lightweight demixing model, and on whether there would be any benefit to formulating it as an end-to-end (demixing + transcription) task.
Any feedback from anyone else is welcome too!