multipitch_architectures

This is a PyTorch code repository accompanying the following paper:

Christof Weiß and Geoffroy Peeters
Comparing Deep Models and Evaluation Strategies for Multi-Pitch Estimation in Music Recordings
IEEE/ACM Transactions on Audio, Speech & Language Processing, 2022
https://ieeexplore.ieee.org/document/9865174

This repository contains only exemplary code and pre-trained models for most of the paper's experiments, as well as some individual examples. All datasets used in the paper are publicly available (at least partially).

For details and references, please see the paper.

In addition, we provide information on version duplicates in MusicNet (MusicNet_stats.md) and detailed information on the different training-test splits used in our experiments (as JSON and Markdown files in folder dataset_splits).
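As a minimal sketch of how the JSON split files could be inspected, the following helper reads every `*.json` file in a folder. The helper name `load_splits` is hypothetical, and nothing is assumed about the schema inside the files; only the folder name `dataset_splits` comes from the text above.

```python
import json
from pathlib import Path


def load_splits(folder="dataset_splits"):
    """Return {file stem: parsed JSON content} for every *.json file in folder.

    Hypothetical helper: it does not assume any particular schema, it only
    parses whatever JSON documents it finds.
    """
    splits = {}
    for path in sorted(Path(folder).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            splits[path.stem] = json.load(handle)
    return splits


def describe(content):
    """Summarize the top-level structure of one parsed split file."""
    if isinstance(content, dict):
        return sorted(content)  # top-level keys
    return type(content).__name__
```

Calling `load_splits()` from the repository top folder and passing each value to `describe` should give a quick overview of what each split file contains before relying on its structure.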

Feature extraction and prediction (Jupyter notebooks)

In the top folder, two Jupyter notebooks (01_precompute_features and 02_predict_with_pretrained_model) demonstrate how to preprocess audio files for our models and how to load a pre-trained model to predict pitches.

Experiments from the paper (Python scripts)

The experiments folder contains all experimental scripts as well as the log files (subfolder logs) and the filewise results (subfolder results_filewise). The folder models_pretrained contains pre-trained models for the main experiments. The subfolder predictions contains exemplary model predictions for two of the experiments. Please note that re-training requires a GPU as well as the pre-processed training data (see the notebook 01_precompute_features for an example). Every script must be started from the repository's top folder so that the relative paths resolve correctly.
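Because the scripts rely on relative paths, it can help to check the working directory before starting anything. The following is a sketch under the assumption that the top folder can be recognized by its `experiments` subfolder; the function names `find_repo_root` and `enter_repo_root` are hypothetical and not part of the repository.

```python
import os
from pathlib import Path


def find_repo_root(start=".", marker="experiments"):
    """Walk upwards from start until a directory containing `marker` is found.

    Assumption of this sketch: the repository top folder is the nearest
    directory that has an `experiments` subfolder.
    """
    current = Path(start).resolve()
    for candidate in (current, *current.parents):
        if (candidate / marker).is_dir():
            return candidate
    raise FileNotFoundError(f"No parent of {current} contains a '{marker}' folder")


def enter_repo_root(start="."):
    """Change the working directory to the repository top folder and return it."""
    root = find_repo_root(start)
    os.chdir(root)
    return root
```

Calling `enter_repo_root()` at the top of a notebook or wrapper script would then make paths such as `experiments/...` or `models_pretrained/...` resolve regardless of where the session was started.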

The experiment files' names relate to the paper's results in the following way:

Exp1_SectionIV-B

Experiments from Section IV.B (Table II / Fig. 4) - Model Architectures and Sizes. The suffix _rerun denotes additional training/test runs of a model.

(a) CNN (simple)

  • CNN:XS  exp126a_musicnet_cnn_basic
  • CNN:S  exp126b_musicnet_cnn_wide
  • CNN:M  exp126c_musicnet_cnn_verywide
  • CNN:L  exp126d_musicnet_cnn_extremelywide

(b) DCNN (deep)

  • DCNN:S  exp127a_musicnet_cnn_deepbasic
  • DCNN:M  exp127b_musicnet_cnn_deepwide
  • DCNN:L  exp127c_musicnet_cnn_deepverywide

(c) DRCNN (deep residual)

  • DRCNN:S  exp128a_musicnet_cnn_deepresnetbasic
  • DRCNN:M  exp128b_musicnet_cnn_deepresnetwide
  • DRCNN:L  exp128c_musicnet_cnn_deepresnetverywide
  •   —  exp128c_musicnet_cnn_deepresnetverywide_rerun1
  •   —  exp128c_musicnet_cnn_deepresnetverywide_rerun2

(d) Unet

  • Unet:S  exp160d2_musicnet_unet_large_bugfix
  • Unet:M  exp160g_musicnet_unet_medium_bugfix
  •   —  exp160g_musicnet_unet_medium_bugfix_rerun1
  •   —  exp160g_musicnet_unet_medium_bugfix_rerun2
  • Unet:L  exp160e3_musicnet_unet_verylarge_bugfix_scaled
  •   —  exp160e3_musicnet_unet_verylarge_bugfix_scaled_rerun1
  •   —  exp160e3_musicnet_unet_verylarge_bugfix_scaled_rerun2
  • Unet:XL  exp160f_musicnet_unet_veryverylarge
  •   —  exp160f_musicnet_unet_veryverylarge_rerun1
  •   —  exp160f_musicnet_unet_veryverylarge_rerun2

(e) SAUnet (self-attention at bottleneck)

  • SAUnet:M  exp180b_musicnet_unet_verylarge_doubleselfattn
  • SAUnet:L  exp180d_musicnet_unet_extremelylarge_doubleselfattn
  •   —  exp180d_musicnet_unet_extremelylarge_doubleselfattn_rerun1
  •   —  exp180d_musicnet_unet_extremelylarge_doubleselfattn_rerun2
  •   —  exp180d_musicnet_unet_extremelylarge_doubleselfattn_rerun3
  •   —  exp180d_musicnet_unet_extremelylarge_doubleselfattn_rerun4
  • SAUnet:XL  exp180e_musicnet_unet_insanelylarge_doubleselfattn
  •   —  exp180e_musicnet_unet_insanelylarge_doubleselfattn_rerun1
  •   —  exp180e_musicnet_unet_insanelylarge_doubleselfattn_rerun2
  • SAUnet:XXL  exp180f_musicnet_unet_intermedlarge_doubleselfattn
  •   —  exp180f_musicnet_unet_intermedlarge_doubleselfattn_rerun

(f) SAUSnet (self-attention also at lowest skip connection)

  • SAUSnet:M  exp181b_musicnet_unet_verylarge_doubleselfattn_twolayers
  • SAUSnet:L  exp181d_musicnet_unet_verylarge_doubleselfattn_twolayers
  • SAUSnet:XL  exp181f_musicnet_unet_intermedlarge_doubleselfattn_twolayers
  •   —  exp181f_musicnet_unet_intermedlarge_doubleselfattn_twolayers_rerun1
  •   —  exp181f_musicnet_unet_intermedlarge_doubleselfattn_twolayers_rerun2
  • SAUSnet:XXL  exp181e_musicnet_unet_insanelylarge_doubleselfattn_twolayers

(g) BLUnet (BiLSTM at bottleneck)

  • BLUnet:M  exp186b_musicnet_unet_verylarge_blstm
  • BLUnet:L  exp186d_musicnet_unet_extremelylarge_blstm
  • BLUnet:XXL  exp186e_musicnet_unet_insanelylarge_blstm

(h) PUnet (multi-task with degree-of-polyphony estimation)

  • PUnet:M  exp195g_musicnet_unet_extremelylarge_polyphony_softmax
  • PUnet:L  exp195e3_musicnet_unet_extremelylarge_polyphony_softmax
  • PUnet:XL  exp195f_musicnet_unet_extremelylarge_polyphony_softmax
  •   —  exp195f_musicnet_unet_extremelylarge_polyphony_softmax_rerun1
  •   —  exp195f_musicnet_unet_extremelylarge_polyphony_softmax_rerun2
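The _rerun naming convention used in the lists above lends itself to simple grouping, for instance when averaging results over several runs of one model. The following sketch splits a script name into its base name and rerun index. The helper names are hypothetical, and treating a bare `_rerun` (without a number) as rerun 1 is an assumption of this sketch, not something stated by the authors.

```python
import re
from collections import defaultdict

# Matches a trailing "_rerun" with an optional run number, e.g. "_rerun" or "_rerun2".
RERUN_PATTERN = re.compile(r"^(?P<base>.+?)_rerun(?P<index>\d*)$")


def split_rerun(name):
    """Return (base_name, rerun_index); rerun_index is None for the original run."""
    match = RERUN_PATTERN.match(name)
    if match is None:
        return name, None
    index = match.group("index")
    # Assumption: a bare "_rerun" without a number counts as rerun 1.
    return match.group("base"), int(index) if index else 1


def group_runs(names):
    """Map each base experiment name to the sorted list of all its runs."""
    groups = defaultdict(list)
    for name in names:
        base, _ = split_rerun(name)
        groups[base].append(name)
    return {base: sorted(runs) for base, runs in groups.items()}
```

For example, `split_rerun("exp128c_musicnet_cnn_deepresnetverywide_rerun2")` yields the base name `exp128c_musicnet_cnn_deepresnetverywide` together with the index 2, so all runs of one model can be collected under a single key.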

Exp2_SectionIV-C

Experiments from Section IV.C (Table IV) - Model Generalization (more training samples, other test sets). The suffix _rerun denotes additional training/test runs of a model.

(a) Test set MuN-10a (more training samples)

  • Unet:XL  exp160f_musicnet_unet_veryverylarge_moresamples
  •   —  exp160f_musicnet_unet_veryverylarge_moresamples_rerun1
  •   —  exp160f_musicnet_unet_veryverylarge_moresamples_rerun2
  • SAUnet:L  exp180d_musicnet_unet_extremelylarge_doubleselfattn_moresamples
  •   —  exp180d_musicnet_unet_extremelylarge_doubleselfattn_moresamples_rerun1
  •   —  exp180d_musicnet_unet_extremelylarge_doubleselfattn_moresamples_rerun2
  • SAUSnet:XL  exp181f_musicnet_unet_intermedlarge_doubleselfattn_twolayers_moresamples
  • PUnet:XL  exp195f_musicnet_unet_extremelylarge_polyphony_softmax_moresamples

(b) Test set MuN-10 (original)

  • Unet:XL  RETRAIN_exp160f_musicnet_unet_veryverylarge_moresamples
  •   —  RETRAIN_exp160f_musicnet_unet_veryverylarge_moresamples_rerun1
  •   —  RETRAIN_exp160f_musicnet_unet_veryverylarge_moresamples_rerun2
  • SAUnet:L  RETRAIN_exp180d_musicnet_unet_extremelylarge_doubleselfattn_moresamples
  •   —  RETRAIN_exp180d_musicnet_unet_extremelylarge_doubleselfattn_moresamples_rerun1
  •   —  RETRAIN_exp180d_musicnet_unet_extremelylarge_doubleselfattn_moresamples_rerun2
  • SAUSnet:XL  RETRAIN_exp181f_musicnet_unet_intermedlarge_doubleselfattn_twolayers_moresamples
  •   —  RETRAIN_exp181f_musicnet_unet_intermedlarge_doubleselfattn_twolayers_moresamples_rerun1
  •   —  RETRAIN_exp181f_musicnet_unet_intermedlarge_doubleselfattn_twolayers_moresamples_rerun2
  • PUnet:XL  RETRAIN_exp195f_musicnet_unet_extremelylarge_polyphony_softmax

(c) Test set MuN-3 (90s)

  • see models from (a) Test set MuN-10a

(d) Test set MuN-10b (slow movements)

  • SAUnet:L  RETRAIN2_exp180d_musicnet_unet_extremelylarge_doubleselfattn_moresamples

(e) Test set MuN-10c (fast movements)

  • SAUnet:L  RETRAIN3_exp180d_musicnet_unet_extremelylarge_doubleselfattn_moresamples

(f) Test set MuN-10full (all movements of the ten work cycles)

  • CNN:M  RETRAIN4_exp127c_musicnet_cnn_verywide_moresamples
  • DRCNN:L  RETRAIN4_exp128c_musicnet_cnn_deepresnetwide_moresamples
  •   —  RETRAIN4_exp128c_musicnet_cnn_deepresnetwide_moresamples_rerun1
  •   —  RETRAIN4_exp128c_musicnet_cnn_deepresnetwide_moresamples_rerun2
  • Unet:M  RETRAIN4_exp160f_musicnet_unet_veryverylarge_moresamples
  • Unet:XL  RETRAIN4_exp160g_musicnet_unet_medium_moresamples
  • SAUnet:L  RETRAIN4_exp180d_musicnet_unet_extremelylarge_doubleselfattn_moresamples
  •   —  RETRAIN4_exp180d_musicnet_unet_extremelylarge_doubleselfattn_moresamples_rerun1
  •   —  RETRAIN4_exp180d_musicnet_unet_extremelylarge_doubleselfattn_moresamples_rerun2
  • SAUSnet:XL  RETRAIN4_exp181f_musicnet_unet_intermedlarge_doubleselfattn_twolayers_moresamples
  • BLUnet:L  RETRAIN4_exp186d_musicnet_unet_extremelylarge_blstm_moresamples
  • PUnet:XL  RETRAIN4_exp195f_musicnet_unet_extremelylarge_polyphony_softmax
  •   —  RETRAIN4_exp195f_musicnet_unet_extremelylarge_polyphony_softmax_rerun1
  •   —  RETRAIN4_exp195f_musicnet_unet_extremelylarge_polyphony_softmax_rerun2

Exp3_SectionIV-D

Experiments from Section IV.D (Fig. 6) - Cross-Version Study on Schubert Winterreise.

CNN:M

  • Version split: exp200a_schubert_versionsplit_cnn_verywide
  • Song split: exp200b_schubert_songsplit_cnn_verywide
  • Neither split: exp200c_schubert_neithersplit_cnn_verywide

SAUnet:L

  • Version split: exp201a_schubert_versionsplit_unet_extremelylarge_doubleselfattn
  • Song split: exp201b_schubert_songsplit_unet_extremelylarge_doubleselfattn
  • Neither split: exp201c_schubert_neithersplit_unet_extremelylarge_doubleselfattn

Exp4_SectionIV-E

Experiments from Section IV.E (Fig. 7) - Cross-Dataset Study on Big Mix Dataset, compiled from all source datasets. The suffix _rerun denotes additional training/test runs of a model.

  • CNN:M  exp216c_bigmix_cnn_verywide
  •   —  exp216c_bigmix_cnn_verywide_rerun1
  •   —  exp216c_bigmix_cnn_verywide_rerun2
  • DRCNN:L  exp214c_bigmix_cnn_deepresnetwide
  •   —  exp214c_bigmix_cnn_deepresnetwide_rerun1
  •   —  exp214c_bigmix_cnn_deepresnetwide_rerun2
  • Unet:M  exp213g_bigmix_unet_medium
  •   —  exp213g_bigmix_unet_medium_rerun1
  •   —  exp213g_bigmix_unet_medium_rerun2
  • Unet:XL  exp212f_bigmix_unet_veryverylarge
  •   —  exp212f_bigmix_unet_veryverylarge_rerun1
  •   —  exp212f_bigmix_unet_veryverylarge_rerun2
  • SAUnet:L  exp210d_bigmix_unet_extremelylarge_doubleselfattn
  •   —  exp210d_bigmix_unet_extremelylarge_doubleselfattn_rerun1
  •   —  exp210d_bigmix_unet_extremelylarge_doubleselfattn_rerun2
  • SAUSnet:XL  exp211f_bigmix_unet_intermedlarge_doubleselfattn_twolayers
  •   —  exp211f_bigmix_unet_intermedlarge_doubleselfattn_twolayers_rerun1
  •   —  exp211f_bigmix_unet_intermedlarge_doubleselfattn_twolayers_rerun2
  • BLUnet:L  exp217d_bigmix_unet_extremelylarge_blstm
  •   —  exp217d_bigmix_unet_extremelylarge_blstm_rerun1
  •   —  exp217d_bigmix_unet_extremelylarge_blstm_rerun2
  • PUnet:XL  exp215f_bigmix_unet_extremelylarge_polyphony_softmax
  •   —  exp215f_bigmix_unet_extremelylarge_polyphony_softmax_rerun1
  •   —  exp215f_bigmix_unet_extremelylarge_polyphony_softmax_rerun2

Run scripts using e.g. the following commands:
conda activate multipitch_architectures
export CUDA_VISIBLE_DEVICES=1
python experiments/Exp1_SectionIV-B/exp126a_musicnet_cnn_basic.py
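When several scripts are to be launched in a row, the commands above can be wrapped in Python. The following is only a sketch: `build_run` and `run_experiment` are hypothetical helper names, and the sketch assumes it is itself executed inside the activated conda environment, so that `sys.executable` points to the right interpreter.

```python
import os
import subprocess
import sys
from pathlib import Path


def build_run(script, gpu="1", repo_root="."):
    """Return (command, env, cwd) for launching one experiment script.

    `script` is given relative to the repository top folder, e.g.
    "experiments/Exp1_SectionIV-B/exp126a_musicnet_cnn_basic.py".
    """
    root = Path(repo_root).resolve()
    script_path = Path(script)
    if not (root / script_path).is_file():
        raise FileNotFoundError(f"{script_path} not found below {root}")
    # Same effect as `export CUDA_VISIBLE_DEVICES=...` for the child process only.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
    # The script path stays relative because the process starts in the top folder.
    command = [sys.executable, script_path.as_posix()]
    return command, env, root


def run_experiment(script, gpu="1", repo_root="."):
    """Run the script from the repository top folder and return its exit code."""
    command, env, cwd = build_run(script, gpu, repo_root)
    return subprocess.run(command, env=env, cwd=cwd).returncode
```

Setting the working directory through `cwd` keeps the relative paths inside the scripts valid, and passing the GPU selection through `env` avoids changing the environment of the calling process.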
