# A TensorFlow implementation of "Sample-level Deep Convolutional Neural Networks for Music Auto-tagging Using Raw Waveforms"

This is a TensorFlow implementation of "Sample-level Deep Convolutional Neural Networks for Music Auto-tagging Using Raw Waveforms" using Keras. This repository implements only the best model of the paper (the model described in Table 1 with m=3, n=9).
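As context for the 3^9 (m=3, n=9) model: if I read the paper's Table 1 correctly, the input is 59049 raw samples, and each of the 10 downsampling stages (one strided convolution with filter length 3 and stride 3, followed by nine convolution + max-pooling blocks with pool size 3) shrinks the time axis by a factor of 3, leaving exactly one frame. A small sketch of that arithmetic (my own illustration, not code from this repository):

```python
# Sketch (my own, not repository code): assumes the 3^9 model
# downsamples the time axis by a factor of 3 at each of its
# 10 stages (1 strided conv + 9 max-pooled conv blocks).
def frames_after(n_samples, n_stages=10, factor=3):
    """Time-axis length after repeated integer /factor downsampling."""
    length = n_samples
    for _ in range(n_stages):
        length //= factor
    return length

# 59049 == 3 ** 10, so the model ends with exactly one frame.
print(frames_after(59049))  # 1
```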
- Prerequisites
- Preparing MagnaTagATune (MTT) Dataset
- Preprocessing the MTT dataset
- Training a model from scratch
- Evaluating a model
## Prerequisites

- Python 3.5 and the required packages
- `ffmpeg` (required for `madmom`)

```sh
pip install -r requirements.txt
pip install madmom
```

The `madmom` package has an install-time dependency, so it should be
installed after installing the packages in `requirements.txt`.
This will install the required packages:
- tensorflow 1.0.1 (has an issue on 1.1.0)
- keras
- pandas
- scikit-learn
- madmom
- numpy
- scipy
- cython
- h5py
`ffmpeg` is required for `madmom`.

```sh
# macOS
brew install ffmpeg

# Ubuntu
add-apt-repository ppa:mc3man/trusty-media
apt-get update
apt-get dist-upgrade
apt-get install ffmpeg

# CentOS
yum install epel-release
rpm --import http://li.nux.ro/download/nux/RPM-GPG-KEY-nux.ro
rpm -Uvh http://li.nux.ro/download/nux/dextop/el ... noarch.rpm
yum install ffmpeg
```
## Preparing MagnaTagATune (MTT) Dataset

Download the audio data and tag annotations from here. Then you should
see 3 `.zip` files and 1 `.csv` file:

```
mp3.zip.001
mp3.zip.002
mp3.zip.003
annotations_final.csv
```

To unzip the `.zip` files, merge and unzip them (referenced here):

```sh
cat mp3.zip.* > mp3_all.zip
unzip mp3_all.zip
```
You should see 16 directories named `0` to `f`. Typically, `0` to `b`
are used for training, `c` for validation, and `d` to `f` for testing.
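The conventional split above can be expressed as a tiny helper (an illustrative sketch of my own; `split_of` is a hypothetical name, not a function from this repository):

```python
# Hypothetical helper: maps an MTT top-level directory name ("0"-"f")
# to the conventional train/validation/test split described above.
def split_of(directory_name):
    if directory_name in "0123456789ab":
        return "train"
    elif directory_name == "c":
        return "val"
    else:  # "d", "e", "f"
        return "test"

print([split_of(d) for d in "0bcf"])  # ['train', 'train', 'val', 'test']
```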
To make your life easier, place them in a directory as below:

```
├── annotations_final.csv
└── raw
    ├── 0
    ├── 1
    ├── ...
    └── f
```

We will call this directory `BASE_DIR`. Preparing the MTT dataset is done!
## Preprocessing the MTT dataset

This section describes a required preprocessing task for the MTT
dataset. Note that this requires 57G of storage space.

The preprocessing does the following:

- Select the top 50 tags in `annotations_final.csv`
- Split the dataset into training, validation, and test sets
- Segment the raw audio files into 59049-sample lengths
- Convert the segments to TFRecord format
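The segmentation step can be sketched as follows (an illustration under my own assumptions, not the repository's actual preprocessing code; it assumes non-overlapping segments and that a trailing remainder shorter than one segment is dropped):

```python
SEGMENT_LENGTH = 59049  # samples per segment, as in the paper

def segment(waveform, segment_length=SEGMENT_LENGTH):
    """Cut a 1-D waveform into non-overlapping fixed-length segments."""
    n_segments = len(waveform) // segment_length
    return [waveform[i * segment_length:(i + 1) * segment_length]
            for i in range(n_segments)]

# Fake clip: two full segments plus a 100-sample remainder.
clip = [0.0] * (SEGMENT_LENGTH * 2 + 100)
segments = segment(clip)
print(len(segments), len(segments[0]))  # 2 59049
```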
To run the preprocessing, copy a shell script template and edit the copy:

```sh
cp scripts/build_mtt.sh.template scripts/build_mtt.sh
vi scripts/build_mtt.sh
```

You should fill in the environment variables:

- `BASE_DIR`: the directory that contains the `annotations_final.csv` file and the `raw` directory
- `N_PROCESSES`: the number of processes to use; the preprocessing uses multiprocessing
- `ENV_NAME`: (optional) if you use `virtualenv` or `conda` to create a separate environment, write your environment name

Below is an example:

```sh
BASE_DIR="/path/to/mtt/basedir"
N_PROCESSES=4
ENV_NAME="sample_cnn"
```

And run it:

```sh
./scripts/build_mtt.sh
```

The script will automatically run a process in the background and tail
the output that the process prints. This will take from a few minutes
to an hour, depending on your device.
The converted TFRecord files will be located in `${BASE_DIR}/tfrecord`.
Now, your `BASE_DIR`'s structure should look like this:

```
├── annotations_final.csv
├── build_mtt.log
├── labels.txt
├── raw
│   ├── 0
│   ├── ...
│   └── f
└── tfrecord
    ├── test-000-of-036.seq.tfrecords
    ├── ...
    ├── test-035-of-036.seq.tfrecords
    ├── train-000-of-128.tfrecords
    ├── ...
    ├── train-127-of-128.tfrecords
    ├── val-000-of-012.seq.tfrecords
    ├── ...
    └── val-011-of-012.seq.tfrecords
```
## Training a model from scratch

To train a model from scratch, copy a shell script template and edit the copy as above:

```sh
cp scripts/train.sh.template scripts/train.sh
vi scripts/train.sh
```

And fill in the environment variables:

- `BASE_DIR`: the directory that contains the `tfrecord` directory
- `TRAIN_DIR`: where to save your trained model and the summaries for visualizing your training using TensorBoard
- `ENV_NAME`: (optional) if you use `virtualenv` or `conda` to create a separate environment, write your environment name

Below is an example:

```sh
BASE_DIR="/path/to/mtt/basedir"
TRAIN_DIR="/path/to/save/outputs"
ENV_NAME="sample_cnn"
```

Let's kick off the training!

```sh
./scripts/train.sh
```

The script will automatically run a process in the background and tail
the output that the process prints.
## Evaluating a model

Copy the evaluation shell script template and edit the copy:

```sh
cp scripts/evaluate.sh.template scripts/evaluate.sh
vi scripts/evaluate.sh
```

Fill in the environment variables:

- `BASE_DIR`: the directory that contains the `tfrecord` directory
- `CHECKPOINT_DIR`: where you saved your model (`TRAIN_DIR` when training)
- `ENV_NAME`: (optional) if you use `virtualenv` or `conda` to create a separate environment, write your environment name

The script evaluates the best model, not the latest one. If you want to
evaluate the latest model, pass `--best=False` as an option.