
clmr's Issues

Weights for million song dataset?

Hi,

Thanks for releasing this nice repository!

Do you plan on releasing the weights for the linear classifier trained on the Million Song Dataset as well? I would be very happy to use them in my work, because I care about the more abstract classes such as "happy" or "sad".

On another note: I may simply be missing it, but I could not find an easy way to map the predictions of the linear classifier trained on the MagnaTagATune dataset to their corresponding labels (a sketch of what I mean is below). In the paper, you say that you chose the top 50 most common labels. Is there a list for this somewhere in the repository, or can I look it up on the dataset site? I do not want to mess up the order of the labels, for obvious reasons.
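To make the question concrete, this is the mapping I have in mind; only a sketch, where the tag file name is hypothetical and the tags would have to be in the exact order used during training:

    import torch

    # hypothetical file: one tag per line, in the exact training order
    with open("magnatagatune_top50_tags.txt") as f:
        tags = [line.strip() for line in f]

    logits = torch.randn(50)       # stand-in for the linear classifier's output
    probs = torch.sigmoid(logits)  # multi-label task, so per-tag probabilities
    top5 = sorted(zip(tags, probs.tolist()), key=lambda t: t[1], reverse=True)[:5]
    print(top5)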

Thanks again and kind regards,
Anton

modules.linear_evaluation is missing parameters when calling Accuracy and AveragePrecision

When running the LinearEvaluation module I get the following error:

TypeError: Accuracy.__new__() missing 1 required positional argument: 'task'

and similar when calling torchmetrics.AveragePrecision without the task argument

Proposed solution

        # the task-based API for multilabel metrics takes num_labels;
        # the old pos_label argument no longer exists in recent torchmetrics
        self.accuracy = torchmetrics.Accuracy(
            task="multilabel",
            num_labels=output_dim,
        )
        self.average_precision = torchmetrics.AveragePrecision(
            task="multilabel",
            num_labels=output_dim,
        )

Import Error in colab

Hello, I'm trying to run the Colab version of your source code, but I hit an error when I run this cell:
when I run this code:

!git clone https://github.com/spijkervet/clmr
%cd /content/clmr
!pip install . -q

output:

Cloning into 'clmr'...
remote: Enumerating objects: 3460, done.
remote: Counting objects: 100% (817/817), done.
remote: Compressing objects: 100% (546/546), done.
remote: Total 3460 (delta 414), reused 579 (delta 245), pack-reused 2643
Receiving objects: 100% (3460/3460), 45.50 MiB | 29.01 MiB/s, done.
Resolving deltas: 100% (2047/2047), done.
/content/clmr
DEPRECATION: A future pip version will change local packages to be built in-place without first copying to a temporary directory. We recommend you use --use-feature=in-tree-build to test your packages with this new behavior before it becomes the default. pip 21.3 will remove support for this functionality. You can find discussion regarding this at https://github.com/pypa/pip/issues/7555.
[download progress bars and a tcmalloc large-alloc warning omitted]
Building wheel for clmr (setup.py) ... done
Building wheel for future (setup.py) ... done
Building wheel for julius (setup.py) ... done
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torchtext 0.10.0 requires torch==1.9.0, but you have torch 1.10.0 which is incompatible.

t-sne visualization

Can you share more details about the t-SNE visualization? I reproduced your program on the MagnaTagATune dataset and got a PR-AUC of 0.3496 and a ROC-AUC of 0.886. I noticed that you used a t-SNE visualization in your paper, but when I tried to visualize the extracted features, the result was poor. I converted the representations returned by extract_representations to a list and used that as the input to t-SNE. The resulting feature map shows that the features are not well separated; when I printed the representations, I found that they were all nearly identical (see the log below). What exactly is your input when visualizing with t-SNE? I would like to get a good feature visualization.

def extract_representations(self, dataloader: DataLoader) -> Dataset:

    # collect encoder outputs (and labels) for every batch in the dataloader
    representations = []
    ys = []
    for x, y in tqdm(dataloader):
        with torch.no_grad():
            h0 = self.encoder(x)
            representations.append(h0)
            ys.append(y)

    # concatenate batches into single tensors (one batch needs no concatenation)
    if len(representations) > 1:
        representations = torch.cat(representations, dim=0)
        ys = torch.cat(ys, dim=0)
    else:
        representations = representations[0]
        ys = ys[0]

    tensor_dataset = TensorDataset(representations, ys)
    return tensor_dataset

1%| | 1/147 [00:08<21:49, 8.97s/it]
[tensor([[ 0.0147,  0.0385,  0.0067,  ..., -0.0021, -0.0179, -0.0170],
         [ 0.0146,  0.0385,  0.0067,  ..., -0.0021, -0.0179, -0.0170],
         [ 0.0146,  0.0385,  0.0067,  ..., -0.0021, -0.0179, -0.0170],
         ...,
         [ 0.0147,  0.0385,  0.0067,  ..., -0.0021, -0.0179, -0.0170],
         [ 0.0146,  0.0385,  0.0067,  ..., -0.0021, -0.0179, -0.0170],
         [ 0.0146,  0.0385,  0.0067,  ..., -0.0021, -0.0179, -0.0170]])]

(the following batches print nearly identical values)
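For reference, the t-SNE pipeline I am using looks like this; a minimal sketch, assuming scikit-learn and the TensorDataset returned by extract_representations above:

    from sklearn.manifold import TSNE

    # `dataset` is the TensorDataset returned by extract_representations above
    representations, ys = dataset.tensors
    embedding = TSNE(n_components=2, perplexity=30).fit_transform(representations.numpy())
    # embedding has shape (n_clips, 2); color the points by `ys` to inspect clusters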

config for reproducing the provided model clmr_checkpoint_10000.pt

I trained and evaluated with the default config, which gave a ROC-AUC of 86.3.

Then I found that the config is inconsistent with the paper's reference settings:

                  config.yaml     paper
    batch size         48           96
    max epochs        200         1000
    sample rate     16000        22050

I changed these parameters and retrained, which gave a ROC-AUC of 87.3, a lot better, but still far from the 88.5 reported in the paper.

Meanwhile, fine-tuning the linear classifier from clmr_checkpoint_10000.pt did work.

So I wonder if you could provide the config used to train the provided pretrained model.
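For concreteness, the overrides I used look roughly like this; a sketch, assuming the key names match the repo's config.yaml schema:

    # assumed key names, to be checked against config.yaml
    batch_size: 96        # default was 48
    max_epochs: 1000      # default was 200
    sample_rate: 22050    # default was 16000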

Extracting embeddings?

Hey, I want to use the trained model to extract audio embeddings. Could you help with an example of how to do it? I understand that the repo is no longer maintained and probably abandoned, but maybe someone will see this. =)
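In case it helps anyone landing here later, this is the kind of thing I had in mind; it is only a sketch, and the SampleCNN constructor arguments, segment length, and checkpoint key layout are assumptions that should be checked against the repo:

    import torch
    import torchaudio
    from clmr.models import SampleCNN  # the encoder used in this repo

    # hypothetical constructor call; see clmr/models/sample_cnn.py for the real one
    encoder = SampleCNN(strides=[3, 3, 3, 3, 3, 3, 3, 3, 3], supervised=False, out_dim=50)
    state = torch.load("clmr_checkpoint_10000.pt", map_location="cpu")
    encoder.load_state_dict(state, strict=False)  # checkpoint keys may need re-mapping
    encoder.eval()

    audio, sr = torchaudio.load("track.wav")   # mono audio at the pretraining sample rate
    segment = audio[:, :59049].unsqueeze(0)    # one fixed-length input segment (assumed size)
    with torch.no_grad():
        embedding = encoder(segment)           # the representation for this segment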

PyTorch Lightning version

Hi,

When I followed these steps to test the pre-trained model, I got this error:

(clmr) yunusemre.ozkose@server:/CLMR$ python main.py --dataset audio --dataset_dir /path/to/test_audio_dataset_dir
Global seed set to 42
Traceback (most recent call last):
  File "main.py", line 129, in <module>
    module = module.load_from_checkpoint(
  File "/miniconda3/envs/cmlr/lib/python3.8/site-packages/pytorch_lightning/core/saving.py", line 137, in load_from_checkpoint
    return _load_from_checkpoint(
  File "/miniconda3/envs/cmlr/lib/python3.8/site-packages/pytorch_lightning/core/saving.py", line 205, in _load_from_checkpoint
    return _load_state(cls, checkpoint, strict=strict, **kwargs)
  File "/miniconda3/envs/cmlr/lib/python3.8/site-packages/pytorch_lightning/core/saving.py", line 250, in _load_state
    obj = cls(**_cls_kwargs)
TypeError: __init__() missing 1 required positional argument: 'args'

There was a related bug in Lightning that seems to have been solved. Could you state your PyTorch Lightning version?
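In the meantime, since load_from_checkpoint forwards extra keyword arguments to the module's __init__, a possible workaround (untested, and checkpoint_path is a placeholder) is to pass the missing argument explicitly:

    # sketch: supply the init argument that is not stored in the checkpoint
    module = module.load_from_checkpoint(
        checkpoint_path,
        args=args,  # the same parsed args used to construct the module
    )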

colab installation broken (dependencies)

Hi! It seems there is some dependency issue going on. I've been troubleshooting locally with no luck.

When running the installation bit of the colab notebook I'm getting the following:

  Preparing metadata (setup.py) ... done
ERROR: Could not find a version that satisfies the requirement torch==1.9.0 (from clmr) (from versions: 1.11.0, 1.12.0, 1.12.1, 1.13.0, 1.13.1, 2.0.0, 2.0.1)
ERROR: No matching distribution found for torch==1.9.0

I'm currently investigating a good dependency configuration for this repo. Any info is welcome.
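One direction I am trying, as a rough and untested sketch (the unpinned versions are a guess): install a torch build that actually exists for the current runtime, then install clmr without letting pip re-resolve the stale torch==1.9.0 pin:

    # sketch: sidestep the torch==1.9.0 pin, which no longer has wheels
    pip install torch torchaudio
    pip install --no-deps .
    pip install pytorch-lightning  # plus any remaining deps listed in setup.py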

Is there any length limit of the music with this method?

It's nice to see contrastive learning used in the music domain. Is there any limit on the length of the music? Is it possible to get a meaningful representation (for example, a vector of a few hundred dimensions) of a whole song a few minutes long with this method? Looking forward to your reply, thanks a lot.
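A common workaround (a sketch of my own, not something from this repo) is to split a long track into the fixed-length segments the encoder expects and average the segment embeddings; the segment length of 59049 samples is an assumption:

    import torch

    def song_embedding(encoder, audio, segment_len=59049):
        # audio: (1, n_samples) mono tensor; segment_len: the encoder's input size
        n = audio.shape[1] // segment_len
        segments = audio[:, : n * segment_len].reshape(n, 1, segment_len)
        with torch.no_grad():
            embs = encoder(segments)  # (n, d) per-segment representations
        return embs.mean(dim=0)       # (d,) song-level vector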

the pretrained model on GTZAN

Hello, I appreciate your work on music representation, but I have a question: for how many epochs do you train the model on the GTZAN dataset? And can you release the pretrained model for GTZAN?

Possible major bug in evaluation

Hi @Spijkervet !

I played around with the pretrained MagnaTagATune weights and noticed something: the predicted values were relatively low, and I was wondering why.

Now I have discovered that in evaluation.py, line 29, there is output = torch.nn.functional.softmax(output, dim=1). But as far as I understand, MagnaTagATune (and the MSD dataset) are multi-label tasks; their loss function in the code is also binary cross-entropy. Hence, I suppose that torch.sigmoid should be used there instead of the softmax.

Please let me know if I'm wrong, but as I see it now this could change the results of CLMR quite significantly (for the better), given that this code was used to generate the results.
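Concretely, the change I am proposing in evaluation.py is (sketch):

    # before: treats the 50 tags as mutually exclusive classes
    output = torch.nn.functional.softmax(output, dim=1)

    # proposed: independent per-tag probabilities, matching the BCE training loss
    output = torch.sigmoid(output)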

ValueError: The `target` has to be an integer tensor.

Hi,
When I run the linear evaluation, I get the following error:

Traceback (most recent call last):
  File "linear_evaluation.py", line 149, in <module>
    trainer.fit(module, train_loader, valid_loader)
  File "/data1/anaconda3/envs/CLMR/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 741, in fit
    self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
  File "/data1/anaconda3/envs/CLMR/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 685, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/data1/anaconda3/envs/CLMR/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 777, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/data1/anaconda3/envs/CLMR/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 1199, in _run
    self._dispatch()
  File "/data1/anaconda3/envs/CLMR/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 1279, in _dispatch
    self.training_type_plugin.start_training(self)
  File "/data1/anaconda3/envs/CLMR/lib/python3.6/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 202, in start_training
    self._results = trainer.run_stage()
  File "/data1/anaconda3/envs/CLMR/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 1289, in run_stage
    return self._run_train()
  File "/data1/anaconda3/envs/CLMR/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 1311, in _run_train
    self._run_sanity_check(self.lightning_module)
  File "/data1/anaconda3/envs/CLMR/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 1375, in _run_sanity_check
    self._evaluation_loop.run()
  File "/data1/anaconda3/envs/CLMR/lib/python3.6/site-packages/pytorch_lightning/loops/base.py", line 145, in run
    self.advance(*args, **kwargs)
  File "/data1/anaconda3/envs/CLMR/lib/python3.6/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 110, in advance
    dl_outputs = self.epoch_loop.run(dataloader, dataloader_idx, dl_max_batches, self.num_dataloaders)
  File "/data1/anaconda3/envs/CLMR/lib/python3.6/site-packages/pytorch_lightning/loops/base.py", line 145, in run
    self.advance(*args, **kwargs)
  File "/data1/anaconda3/envs/CLMR/lib/python3.6/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 122, in advance
    output = self._evaluation_step(batch, batch_idx, dataloader_idx)
  File "/data1/anaconda3/envs/CLMR/lib/python3.6/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 217, in _evaluation_step
    output = self.trainer.accelerator.validation_step(step_kwargs)
  File "/data1/anaconda3/envs/CLMR/lib/python3.6/site-packages/pytorch_lightning/accelerators/accelerator.py", line 239, in validation_step
    return self.training_type_plugin.validation_step(*step_kwargs.values())
  File "/data1/anaconda3/envs/CLMR/lib/python3.6/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 219, in validation_step
    return self.model.validation_step(*args, **kwargs)
  File "/data1/simon/projects/musicrepresentations/clmr/modules/linear_evaluation.py", line 64, in validation_step
    self.log("Valid/accuracy", self.accuracy(preds, y))
  File "/data1/anaconda3/envs/CLMR/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data1/anaconda3/envs/CLMR/lib/python3.6/site-packages/torchmetrics/metric.py", line 205, in forward
    self.update(*args, **kwargs)
  File "/data1/anaconda3/envs/CLMR/lib/python3.6/site-packages/torchmetrics/metric.py", line 263, in wrapped_func
    return update(*args, **kwargs)
  File "/data1/anaconda3/envs/CLMR/lib/python3.6/site-packages/torchmetrics/classification/accuracy.py", line 228, in update
    mode = _mode(preds, target, self.threshold, self.top_k, self.num_classes, self.multiclass)
  File "/data1/anaconda3/envs/CLMR/lib/python3.6/site-packages/torchmetrics/functional/classification/accuracy.py", line 59, in _mode
    preds, target, threshold=threshold, top_k=top_k, num_classes=num_classes, multiclass=multiclass
  File "/data1/anaconda3/envs/CLMR/lib/python3.6/site-packages/torchmetrics/utilities/checks.py", line 251, in _check_classification_inputs
    _basic_input_validation(preds, target, threshold, multiclass)
  File "/data1/anaconda3/envs/CLMR/lib/python3.6/site-packages/torchmetrics/utilities/checks.py", line 33, in _basic_input_validation
    raise ValueError("The `target` has to be an integer tensor.")
ValueError: The `target` has to be an integer tensor.
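For anyone else hitting this: torchmetrics' Accuracy rejects floating-point targets, and the multi-label annotations here presumably arrive as floats (as the BCE loss expects). A possible workaround, sketched rather than tested, is to cast the labels to integers just for the metric call in linear_evaluation.py:

    # in validation_step (sketch): cast multi-label targets to int for the metric
    self.log("Valid/accuracy", self.accuracy(preds, y.int()))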

Where is 'processed_annotations' for Million Song Dataset?

Hello, thank you very much for your wonderful GitHub repository. 😊

I tried to run the training code in this repository on the Million Song Dataset, but it failed with FileNotFoundError: [Errno 2] No such file or directory: 'data/million_song_dataset/processed_annotations/output_labels_msd.txt'. 😢

Where can I download the files in the processed_annotations folder, such as 'train_gt_msd.tsv', 'output_labels_msd.txt', 'index_msd.tsv', etc.?

I couldn't find any way to download these files from the official website of the Million Song Dataset, and the MSD data I downloaded did not contain them. 😭

Thank you for your time and assistance. 😊

Train / Validation / Test splits for million song dataset

Hi!
Thank you for releasing this repo :)

I was wondering where I can find the train/validation/test splits you used for MSD? My team and I are trying to reproduce this study, but unfortunately we can't find the 201 680 / 11 774 / 28 435 splits and the corresponding tags from Last.fm. Any assistance on this would be very helpful!

Kind regards,
Cody

train by raw dataset

Hi, I want to use this method to implement a feature for my graduation project, but how can I train a model on a raw audio dataset such as Universal Music? Looking forward to your reply.
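Judging from the commands used elsewhere in these issues, training on your own audio appears to go through the generic audio dataset; a sketch, with the flags to be double-checked against the repo's README:

    # sketch: self-supervised pretraining on a directory of your own audio files
    python main.py --dataset audio --dataset_dir /path/to/your/audio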
