
music-transcription-with-semantic-segmentation's Introduction

Music Transcription with Semantic Model

Notice: A new project that also contains this work has been launched. Please visit omnizart.

This is an Automatic Music Transcription (AMT) project that tackles the Multi-Pitch Estimation (MPE) problem, a long-standing and still challenging task. For transcription, we leverage a state-of-the-art image semantic segmentation network combined with an attention mechanism to transcribe both solo piano and multi-instrument performances.

The datasets used are MAPS and MusicNet: the first is a collection of solo piano performances, and the second a collection of multi-instrument performances. On both datasets we achieve state-of-the-art frame-wise results for MPE (Multi-Pitch Estimation), with an F-score of 86.73% on MAPS and 73.70% on MusicNet.

This work builds on our prior work in repo1 and repo2. For more about our work, please visit our website.

For those interested in more technical details, the original paper is here.

Quick Start

The most straightforward way to try our project is to use our Colab notebook. Just run the cells one by one, and you will get the final output MIDI file for the given piano clip.

A more technical way is to clone this repository with git clone https://github.com/BreezeWhite/Music-Transcription-with-Semantic-Segmentation.git, enter the scripts folder, modify transcribe_audio.sh, and then run the script.
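For reference, a rough sketch of that flow (the script is expected to be edited to point at your own audio file and a downloaded checkpoint):

git clone https://github.com/BreezeWhite/Music-Transcription-with-Semantic-Segmentation.git
cd Music-Transcription-with-Semantic-Segmentation/scripts
# Edit transcribe_audio.sh to point at your audio file and checkpoint, then run it.
bash transcribe_audio.sh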


Overview

One of the main topics in AMT is transcribing a given raw audio file into symbolic form, that is, transforming wav into midi. Our work is an intermediate stage of this final goal: we first transcribe the audio into what we call the "frame level" domain. This means we split time into frames, each of length 88 corresponding to the 88 keys of a piano roll, and then predict which keys are active in each frame.

Here is an example output:

[Example transcription result on MAPS]

The top row is the predicted piano roll, and the bottom row is the original label. Colors blue, green, and red represent true-positive, false-positive, and false-negative respectively.

We use a semantic segmentation model for transcription, a type of model widely used in image processing. Our model is adapted from DeepLabV3+ and further combined with a U-net architecture and focal loss, as illustrated below:

[Model architecture diagram]

Installation

To install the requirements, enter the following command:

pip install -r requirements.txt

Download the checkpoint weights:

git lfs fetch
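Note that git lfs fetch only downloads the LFS objects into .git/lfs; if the checkpoint files in your working tree still look like small text pointer files afterwards, a general git-lfs remedy (not specific to this repo) is to check them out explicitly:

git lfs install   # one-time setup, if git-lfs has never been initialized
git lfs pull      # fetch the LFS objects and check them out into the working tree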

Usage

For quick example usage, enter the scripts folder and check the scripts there to see how to invoke the Python code.

Pre-processing

  1. Download the datasets from the official websites of MAPS and MusicNet.

  2. cd scripts

  3. Modify the content of generate_feature.sh

  4. Run the generate_feature.sh script (see the sketch below)
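A minimal sketch of this flow, assuming the dataset paths are configured inside the script:

cd scripts
# Edit generate_feature.sh so the dataset paths point at your extracted MAPS/MusicNet copies.
bash generate_feature.sh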

Training

There are several training configurations, defined by different input feature types and output cases. For a quick start, please refer to scripts/train_model.sh.

For input, you can choose either the HCFP or the CFP representation, depending on how you pre-processed the features.

For output, if you are training on MusicNet you can choose between MPE mode and multi-instrument mode. If you are training on MAPS, only MPE mode is available.

To train the model on MusicNet, run the command:

python3 TrainModel.py MusicNet \
    <output/model/name> \
    --dataset-path <path/to/extracted/feature>

The default case trains on MPE using CFP features. You can switch to multi-instrument mode by adding the --multi-instruments flag, or use HCFP features by adding the --use-harmonic flag.

There are also some options you can specify to speed up training. Specify --use-ram to load all features into RAM if you have enough memory (at least 64 GB, more than 100 GB suggested).

To quickly validate that the training command runs, you can also reduce the number of epochs and steps by adding -e 1 -s 500.

To continue training from a pre-trained model, add --input-model <path/to/pre-trained/model>.
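Putting these options together, a training invocation might look like the following sketch; the output model name here is a placeholder, and the feature path must point at your own pre-processed features:

python3 TrainModel.py MusicNet my_musicnet_model \
    --dataset-path <path/to/extracted/feature> \
    --multi-instruments \
    --use-ram \
    -e 1 -s 500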

Evaluation

NOTICE: For the complete evaluation process, please check the version 1 code in the v1 folder.

To predict and evaluate the scores with label, run the command:

python3 Evaluation.py frame \
    --feature-path <path/to/generated/feature> \
    --model-path <path/to/trained/model> \
    --pred-save-path <path/to/store/predictions>

You can check out scripts/evaluate_with_pred.sh and scripts/pred_and_evaluate.sh for example use.

Single Song Transcription

To transcribe a single song, run the command:

python3 SingleSongTest.py \
    --input-audio <input/audio> \
    --model-path <path/to/pre-trained/model>

An output file named pred.hdf, containing the prediction for the given audio, will be written to the same path.

To get the predicted MIDI, add the --to-midi <path/to/save/midi> flag. The MIDI file will be stored at the given path.
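Putting it together, a full transcription run might look like the following sketch; the audio file name and MIDI path are placeholders, and the checkpoint path assumes one of the bundled checkpoints (e.g. CheckPoint/MAPS_CFP_MPE):

python3 SingleSongTest.py \
    --input-audio my_piano_clip.wav \
    --model-path CheckPoint/MAPS_CFP_MPE \
    --to-midi my_piano_clip.mid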

There is also an example script in the scripts folder called transcribe_audio.sh.

music-transcription-with-semantic-segmentation's People

Contributors

BreezeWhite, dependabot[bot]


music-transcription-with-semantic-segmentation's Issues

I found Miss 2

README.md > Usage > Training currently shows:

python3 TrainModel.py MusicNet \
    --dataset-path <path/to/extracted/feature> \
    -o <output/model/name>

but "-o" is not defined in TrainModel.py (line 164). The correct invocation is:

python3 TrainModel.py MusicNet \
    <output/model/name> \
    --dataset-path <path/to/extracted/feature>

unable to open file after running transcribe_audio.sh

I ran transcribe_audio.sh and got the error below:

bash transcribe_audio.sh
Using TensorFlow backend.
Processing features of input audio: sample_audio.wav
Sample: 12193
Traceback (most recent call last):
  File "SingleSongTest.py", line 78, in <module>
    main(args)
  File "SingleSongTest.py", line 44, in main
    model = load_model(args.model_path)
  File "/home/kr/Downloads/Music-Transcription-with-Semantic-Segmentation/project/utils.py", line 101, in load_model
    with h5py.File(full_path, "r") as w:
  File "/home/kr/.local/lib/python3.6/site-packages/h5py/_hl/files.py", line 408, in __init__
    swmr=swmr)
  File "/home/kr/.local/lib/python3.6/site-packages/h5py/_hl/files.py", line 173, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 88, in h5py.h5f.open
OSError: Unable to open file (file signature not found)

Am I doing it correctly?

The result of the model you provided

I fixed the bugs in your code and got the output MIDI files. Then I converted the MIDI files to txt files containing onset, offset, and pitch, and used mir_eval to evaluate your results at the note level. What I got is below:

note : f-0.130963044896 p-0.133633039813 r-0.131702754674

So I doubt the frame-level results reported in your paper.

I found Miss

I am Japanese and my English is not good.
In GenFeature.py, line 50, "dataset-path" is not correct.
It should be "dataset_path", with an underscore.

No module named 'project.test'

I tried to run "SingleSongTest.py", but apparently a module is missing:

ModuleNotFoundError: No module named 'project.test'

This module is supposed to contain the definition of the inference method.
Can you help me?

pred index 2 is out of bounds

When running SingleSongTest.py, I got this error:

D:\Music-Transcription-with-Semantic-Segmentation-master\project\utils.py:304: ResourceWarning: unclosed file <_io.TextIOWrapper name='.\CheckPoint\MAPS_CFP_MPE\arch.yaml' mode='r' encoding='cp936'>
  model = model_from_yaml(open(os.path.join(model_path, "arch.yaml")).read(), custom_objects=custom_layers)
2019-08-14 21:59:51.698793: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
model .\CheckPoint\MAPS_CFP_MPE\ loaded
Predicting...
pred.shape
(4096, 352, 2)
Traceback (most recent call last):
  File "SingleSongTest.py", line 108, in <module>
    main(args)
  File "SingleSongTest.py", line 86, in main
    notes, midi = PostProcess(pred)
  File "D:\Music-Transcription-with-Semantic-Segmentation-master\project\postprocess.py", line 267, in PostProcess
    ch = pred[:,:,2]
IndexError: index 2 is out of bounds for axis 2 with size 2

The last dimension of pred has size 2, so index 2 is out of bounds.

How can I solve this problem?

Thanks.

Assertion error when running transcribe_audio.sh

I used one of my .wav files as input and ran transcribe_audio.sh with the checkpoint model from CheckPoint/MAPS_CFP_MPE/, but I got this error.

Model CheckPoint/MAPS_CFP_MPE loaded
Predicting...
Traceback (most recent call last):
  File "SingleSongTest.py", line 78, in <module>
    main(args)
  File "SingleSongTest.py", line 53, in main
    midi = MultiPostProcess(pred, mode="note", onset_th=args.onset_th, dura_th=0.5, frm_th=3, inst_th=1.1, t_unit=0.02)
  File "/media/tenvinc/New Volume/Music-Transcription-with-Semantic-Segmentation/project/postprocess.py", line 326, in MultiPostProcess
    assert((pred.shape[-1]-1)%ch_per_inst == 0)
AssertionError

I tried printing out pred.shape and realized that its last dimension was 1 instead of the 2 that was expected. Please help, thanks. :)

A lot of bugs in your code

In SingleSongTest.py [images omitted]: your code cannot run at all. The main function in SingleSongTest.py cannot get the timesteps variable.

module missing in v1/predict.py

The inference function imported at line 14 of v1/Predict.py is missing:

from project.test import inference

I can't find project.test in the repo. Can you please tell me where project.test, imported at line 14 of v1/Predict.py, is located?
