This repository accompanies the publication: Francesco Foscarin*, Katharina Hoedt*, Verena Praher*, Arthur Flexer, and Gerhard Widmer, "Concept-Based Techniques for 'Musicologist-friendly' Explanations in a Deep Music Classifier", published in the Proceedings of the 23rd International Society for Music Information Retrieval Conference (ISMIR), 2022 (pdf).
(*) equal contribution.
This README describes how the results can be reproduced in the following steps:
- Setup
- Composer Classifier [1]
- Supervised Explanations
- Unsupervised Explanations
- Results of the Paper (check this out if you're simply here for reproduction purposes)
The packages we use in this work are defined (with versions where necessary) in `requirements.txt`.
If you have conda installed, the easiest way to set up an environment for this project is to run the following:
conda create -n py36_cc python=3.6.9
conda activate py36_cc
pip install -r requirements.txt
Before running any code, make sure to adapt all necessary paths in `config.py`. Most importantly, make sure that the paths to the MAESTRO and ASAP datasets are set correctly; more information on these datasets follows in the next section.
The data path will contain the preprocessed MIDI files; the results path will contain all results produced by our scripts; the concepts path points to the concepts; and the splits root defines where the data-split files are (or will be) stored. These paths do not need to be changed, but they can be.
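For orientation, the kinds of paths defined in `config.py` might look roughly as follows (the variable names below are illustrative; the actual names are defined by the repository):

```python
# Sketch of the path configuration -- variable names are illustrative,
# the actual names are defined in the repository's config.py.
from pathlib import Path

MAESTRO_PATH = Path("/data/maestro-v2.0.0")  # MAESTRO v2.0.0 MIDI files
ASAP_PATH = Path("/data/asap-dataset")       # ASAP dataset (unsupervised part)
DATA_PATH = Path("data")                     # preprocessed MIDI files (.npy)
RESULTS_PATH = Path("results")               # outputs of all scripts
CONCEPTS_PATH = Path("concepts")             # concept datasets
SPLITS_ROOT = Path("classifier/meta")        # train/validation split files
```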
The data we use in this work is the MAESTRO dataset v2.0.0 (MIDI only) [3], which you can download here. For the unsupervised part, we use the subset of MAESTRO contained in the ASAP dataset, because it allows more control over selecting unique versions of pieces.
The composer classifier we use in our work is a reproduction of Kim et al. [1], meaning that our code in `classifier` is strongly based on the original implementation.
Note that if you want to reproduce our work without training a new classifier, you can simply run the generator (see below), refer to this part, and skip the rest of this section!
The proposed approach uses only pieces attributed to a single composer and only composers with at least 16 pieces. We manually removed the excluded pieces from the original `maestro-v2.0.0.csv` file and stored the result in `meta/maestro-v2.0.0-reduced.csv`.
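For illustration, a filtering equivalent to this manual reduction could be written roughly as follows (assuming the `canonical_composer` column of the MAESTRO CSV, where multi-composer pieces are separated by "/"; the repository already ships the reduced file, so this is only a sketch):

```python
import pandas as pd

meta = pd.read_csv("maestro-v2.0.0.csv")

# Keep only pieces attributed to a single composer
# (multi-composer entries in MAESTRO contain a "/").
single = meta[~meta["canonical_composer"].str.contains("/")]

# Keep only composers with at least 16 pieces.
counts = single["canonical_composer"].value_counts()
reduced = single[single["canonical_composer"].isin(counts[counts >= 16].index)]

reduced.to_csv("meta/maestro-v2.0.0-reduced.csv", index=False)
```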
If the classifier should be re-trained (or new train/validation splits generated), the first step is to run the generator, i.e.,
python -m classifier.generator
This transforms MIDI files to `.npy` files (in our repository with partitura). Next, the data split can be generated by calling the splitter, e.g.,
python -m classifier.splitter
This automatically creates a random split with 70% of the data used for training, and the remaining 30% for validation. Finally, we need to run the converter:
python -m classifier.converter
Then we're all set for training the model!
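If you prefer, the three preprocessing steps can also be chained from a small helper script; this is merely a convenience sketch, the individual module calls above are what the repository actually documents:

```python
import subprocess

# Run the three preprocessing steps in order:
# MIDI -> .npy, create the 70/30 train/validation split, convert the data for training.
for module in ("classifier.generator", "classifier.splitter", "classifier.converter"):
    subprocess.run(["python", "-m", module], check=True)
```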
We provide the already trained model, which we used to perform our experiments in the paper, here. If you want to train your own model, you can follow the description below or the one in the original repository.
To train a ResNet-50 model with the default parameters for, e.g., 10 epochs, run
python -m classifier.train_model --mode basetrain --model_name resnet50 --epochs 10 --onset False
and adapt the number of epochs as desired. To change the default arguments, modify the corresponding argument in `arg_parser`. The configuration that was used is stored in a `.txt` file in the model's directory.
To compute the performance of our model (or confusion matrices, with `--cm`), we provide a script that requires the path to a trained model, information on whether the onset is omitted or not, and the path to the data-split files, e.g.,
python -m classifier.compute_performance <modelfile> --omit_onset --split-root classifier/meta
We provide the concept dataset presented in our work, consisting of three concepts: Alberti bass, difficult-to-play pieces, and contrapuntal texture.
If you want to define your own concept datasets, first think of a concept and generate/obtain MIDI files. If you have `.mid` files available, they first need to be converted to `.npy` files, e.g., by running
python -m data_handling.convert_concept_midis_to_npy --concept_name <concept_name>
This script runs the same pre-processing as for the MIDI data used to train the model. Running this shell script pre-processes all of our provided concepts and random datasets correctly.
Note that we fix the mapping of concept names to IDs in `concept_mapping.json` to ensure that captum does not assign different IDs to a concept in different runs (as CAVs are not automatically recomputed).
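For illustration, the mapping simply pairs each concept name with a stable integer ID; loading and using it could look like this (the concrete names and IDs are those defined in the shipped `concept_mapping.json`, the ones below are hypothetical):

```python
import json

# Load the fixed concept-name -> ID mapping so that captum sees the same
# ID for a given concept in every run (CAVs are cached per concept ID).
with open("concept_mapping.json") as f:
    concept_mapping = json.load(f)

# Hypothetical entries, e.g. {"alberti_bass": 1, "contrapuntal_texture": 7, "random_0": 90}
concept_id = concept_mapping["alberti_bass"]
```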
After we have pre-processed files of a concept, we can start to experiment with TCAV [2]. To run one experiment, you can run `test_with_cavs.py`:
python -m supervised.test_with_cavs <modelfile> --omit-onset --layers <layer(s)> --composer <composername> --save-path <savepath> --concepts <concept(s)> --random-concepts <randomconcept(s)>
Note that this computes TCAV scores for a given model and composer, for the defined network layers, and for a list of concepts (each of which is confronted with each of the defined random datasets). The results are stored in the given save path. A more concrete example:
python -m supervised.test_with_cavs classifier/meta/2202180921/model/resnet50.pt --omit-onset --layers layer4 --composer mozart --save-path results/mozart_17 --concepts 1 7 --random-concepts 90
This performs testing with CAVs for pieces by Mozart, at the penultimate layer of the given network, for the 'Alberti bass' and 'contrapuntal texture' concepts, and with the first random dataset.
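Conceptually, a TCAV score [2] is the fraction of a composer's examples whose directional derivative along the concept's CAV is positive. The repository relies on captum for this computation; the following is only a minimal, self-contained sketch of the underlying idea, with illustrative names and a simple logistic-regression CAV:

```python
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression

def tcav_score(model, layer, concept_x, random_x, class_x, target):
    """Illustrative TCAV score for one layer and one target class."""
    captured = {}

    def hook(_module, _inputs, output):
        if output.requires_grad:
            output.retain_grad()  # keep gradients of this intermediate tensor
        captured["act"] = output

    handle = layer.register_forward_hook(hook)

    # 1) CAV: normal vector of a linear boundary separating concept activations
    #    from random-dataset activations at the chosen layer.
    with torch.no_grad():
        model(concept_x)
        act_c = captured["act"].flatten(1).cpu().numpy()
        model(random_x)
        act_r = captured["act"].flatten(1).cpu().numpy()
    clf = LogisticRegression(max_iter=1000).fit(
        np.concatenate([act_c, act_r]),
        np.array([1] * len(act_c) + [0] * len(act_r)),
    )
    cav = torch.tensor(clf.coef_[0], dtype=torch.float32)

    # 2) Directional derivatives: gradient of the target logit w.r.t. the layer
    #    activations, projected onto the CAV. The TCAV score is the fraction of
    #    the composer's examples with a positive projection.
    logits = model(class_x)
    logits[:, target].sum().backward()
    grads = captured["act"].grad.flatten(1).cpu()
    handle.remove()
    return float(((grads @ cav) > 0).float().mean())
```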
Similar functionality is provided in `test_tcav_significance.py`; here, however, a single concept is defined for which a statistical significance test is run (against all available random datasets). It is called very similarly, e.g.,
python -m supervised.test_tcav_significance <modelfile> --concept <concept> --omit-onset --composer <composername> --save-dir <savedirectory>
Or, as a more concrete example, we can run the significance test for the Alberti bass concept and composer Mozart (for the penultimate layer):
python -m supervised.test_tcav_significance classifier/meta/2202180921/model/resnet50.pt --concept 1 --omit-onset --composer mozart --save-dir sign_mozart_1 --layers layer4
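The idea behind the significance test follows the TCAV paper [2]: the TCAV scores obtained for the concept (one per random counterpart dataset) are compared against scores obtained when a random dataset plays the role of the concept, and concepts whose scores are indistinguishable from random are discarded. A rough sketch of that comparison, with a plain two-sided t-test and made-up scores (the exact test and any corrections applied by captum or the repository may differ):

```python
from scipy import stats

# Hypothetical TCAV scores, one per random counterpart dataset.
concept_scores = [0.81, 0.77, 0.84, 0.79, 0.83]  # concept vs. random_i
random_scores = [0.52, 0.47, 0.55, 0.49, 0.51]   # random_j vs. random_i

t_stat, p_value = stats.ttest_ind(concept_scores, random_scores)
if p_value < 0.05:
    mean_score = sum(concept_scores) / len(concept_scores)
    print(f"Concept is significant (p = {p_value:.3g}), mean TCAV score = {mean_score:.2f}")
else:
    print("Concept scores are indistinguishable from random; discard the concept.")
```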
The code for the unsupervised explanations is heavily based on this code from Zhang et al. [4]. We extended and adapted it to handle MIDI files and the non-negative Tucker decomposition.
To generate an unsupervised explanation for two composers, you can run `generate_uns_explanation.py`:
python -m unsupervised.generate_uns_explanation
The main parameters to specify are:
- reducer: a string, either "NTD" (non-negative Tucker decomposition) or "NMF" (non-negative matrix factorization)
- dimension: an integer, either 3 or 4, to select the 3D or 4D Tucker decomposition, respectively; this is not used if the reducer is set to NMF
- rank: a string, either containing an integer (e.g., "5", for NMF) or a list of integers (e.g., "[10, 13, 3, 375]" or "[10, 10, 375]", one for each NTD dimension). Be sure not to select ranks higher than the original matrix dimensions when running NTD, as this is extremely computationally costly
- layer: a string, e.g., "layer4" or "layer3"
- device: a string specifying the device to use for the computation, e.g., cuda or cpu
- targets: a string containing the indices of the two composers to explain; the indices can be retrieved from the following dictionary; for example, the explanation between Beethoven and Rachmaninoff will have targets "[9,11]"
0 : "Scriabin",
1 : "Debussy",
2 : "Scarlatti",
3 : "Liszt",
4 : "Schubert",
5 : "Chopin",
6 : "Bach",
7 : "Brahms",
8 : "Haydn",
9 : "Beethoven",
10 : "Schumann",
11 : "Rachmaninoff",
12 : "Mozart",
For example, the command
python -m unsupervised.generate_uns_explanation --reducer "NMF" --targets "[5,6]" --layer "layer4" --rank "3" --device cpu
produces 3 concepts explaining the last layer of the classifier, focusing on Chopin and Bach, using non-negative matrix factorization.
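For intuition, the NMF reducer factorizes the (non-negative, post-ReLU) activations of the chosen layer, collected over the two composers' pieces, so that each component acts as one unsupervised concept. A minimal sketch of that idea, not the repository's implementation, with placeholder data:

```python
import numpy as np
from sklearn.decomposition import NMF

# Placeholder activation matrix: one row per piece excerpt, one column per
# flattened unit of the chosen layer (ReLU outputs, hence non-negative).
activations = np.random.rand(200, 2048)

nmf = NMF(n_components=3, init="nndsvda", max_iter=500)
weights = nmf.fit_transform(activations)  # (200, 3): concept presence per excerpt
concepts = nmf.components_                # (3, 2048): concept directions in activation space

# Excerpts where concept 0 is maximally activated -- these are the pieces one
# would inspect via feature_imgs / feature_midis in the results folder.
top_excerpts = np.argsort(weights[:, 0])[::-1][:5]
print(top_excerpts)
```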
Other parameters and further information can be displayed with:
python -m unsupervised.generate_uns_explanation --help
The results are interpreted as follows:
- inspect the `summary.json` file inside the generated folder in `results/`
- check the `fidelity` of your explanation to make sure it is high enough; the fidelity is computed for the two composers separately, according to the order specified in the parameter "targets", and reported in the field `classes`
- check the `concept_sensitivity`; this is computed for each concept and each composer; ideally you want to find a concept that has a high positive number for one composer (i.e., pushing the classifier to classify as this composer) and a negative number for the other (i.e., pushing the classifier not to classify as this composer); this search is simplified by the following field
- check the `suggested_CAVs`; this field contains the concepts that are positive for one composer and negative for the other, sorted by the absolute value of the difference between the two values
- once you have selected an unsupervised concept to inspect, you can find a representation of the 5 piece excerpts where it is maximally activated in the folders `feature_imgs` (interactive piano-roll plots, open them with a browser such as Chrome) and `feature_midis` (MIDI files, open them with a MIDI player, e.g., Media Player on Windows); a representation of the 5 pieces where the concept is minimally activated is also available in the folders `feature_contrast_imgs` and `feature_contrast_midis`
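As an illustration, reading the relevant fields of `summary.json` could look like this (the result folder name below is hypothetical, and the exact structure of the file is defined by the repository):

```python
import json

# Hypothetical path to a generated result folder.
with open("results/layer4_r3[Chopin_Bach]/summary.json") as f:
    summary = json.load(f)

# Fidelity per composer, in the order given by the "targets" parameter.
print("classes:", summary["classes"])
print("fidelity:", summary["fidelity"])

# Sensitivity of each concept for each composer, and the concepts suggested as
# good CAV candidates (positive for one composer, negative for the other).
print("concept_sensitivity:", summary["concept_sensitivity"])
print("suggested_CAVs:", summary["suggested_CAVs"])
```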
These figures are used to explain some notions of our work but do not show results, which is why their reproduction is omitted here :)
To reproduce this table, the significance tests have to be run and saved for all concepts and all composers (for the penultimate layer). To simplify this process, you can use the script we provide, which runs all tests automatically:
sh scripts/run_significance.sh
Make sure that all corresponding results are available, because the script that reproduces this table strongly relies on them; then you can simply run
python -m supervised.tcav_visualise
and you should be good to go! (Note that there can be slight variations in the final results due to randomness in the process of creating CAVs; this is not ideal, and shows some of the limitations of this approach.)
Figure 3 can be obtained by running:
python -m unsupervised.generate_uns_explanation --reducer "NTD" --dimension 4 --targets "[5,6]" --layer "layer4" --rank "[4, 13, 3, 375]"
Double-check that the first suggested CAV is number 3, open the figure located at `results/layer4_r[4, 13, 3, 375][Chopin_Bach]/feature_imgs/3plotly.html`, and move the threshold cursor to 60%. The figure in the paper only contains the first 3 pieces, but there you can inspect the 5 pieces where concept 3 is maximally activated.
If you use this approach in any research, please cite the relevant paper:
@inproceedings{concept_music2022,
title={Concept-Based Techniques for “Musicologist-friendly” Explanations in a Deep Music Classifier},
author={Foscarin, Francesco and Hoedt, Katharina and Praher, Verena and Flexer, Arthur and Widmer, Gerhard},
booktitle={International Society for Music Information Retrieval Conference {(ISMIR)}},
year={2022},
}
Licensed under the MIT License.
[1] S. Kim, H. Lee, S. Park, J. Lee, and K. Choi, “Deep Composer Classification Using Symbolic Representation,” ISMIR Late Breaking and Demo Papers, 2020.
[2] B. Kim, M. Wattenberg, J. Gilmer, C. J. Cai, J. Wexler, F. B. Viégas, and R. Sayres, “Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV),” in Proceedings of the 35th International Conference on Machine Learning, ICML. PMLR, 2018, pp. 2673–2682.
[3] C. Hawthorne, A. Stasyuk, A. Roberts, I. Simon, C. A. Huang, S. Dieleman, E. Elsen, J. H. Engel, and D. Eck, “Enabling Factorized Piano Music Modeling and Generation with the MAESTRO Dataset,” in Proceedings of the 7th International Conference on Learning Representations, ICLR, 2019.
[4] R. Zhang, P. Madumal, T. Miller, K. A. Ehinger, and B. I. P. Rubinstein, “Invertible Concept-based Explanations for CNN Models with Non-negative Concept Activation Vectors,” in Proceedings of the 35th AAAI Conference on Artificial Intelligence, AAAI, 2021, pp. 11682–11690.