
deepconvsep's Introduction

DeepConvSep

Deep Convolutional Neural Networks for Musical Source Separation

This repository contains classes for data generation, preprocessing, and feature computation, useful for training neural networks with large datasets that do not fit into memory. Additionally, you can find classes to query samples of instrument sounds from the RWC instrument sound dataset.

In the 'examples' folder you can find use cases for the classes above, applied to music source separation. We provide code for feature computation (STFT) and for training convolutional neural networks for music source separation: singing voice separation with the iKala dataset; voice, bass, and drums separation with the DSD100 dataset; and bassoon, clarinet, saxophone, and violin separation with the Bach10 dataset. The latter is a good example of training a neural network with instrument samples from the RWC instrument sound dataset when the original score is available.

In the 'evaluation' folder you can find MATLAB code to evaluate the quality of the separation, based on BSS Eval.

For training neural networks we use Lasagne and Theano.

We provide code for separation using already trained models for different tasks.

Separate music into vocals, bass, drums, and accompaniment with examples/dsd100/separate_dsd.py:

python separate_dsd.py -i <inputfile> -o <outputdir> -m <path_to_model.pkl>

where:

  • <inputfile> is the wav file to separate
  • <outputdir> is the output directory where to write the separation
  • <path_to_model.pkl> is the local path to the .pkl file you can download from this address
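For example, using hypothetical file and directory names (the model file name is illustrative):

python separate_dsd.py -i mixture.wav -o ./separated/ -m ./model_dsd_fft_1024.pkl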

Singing voice source separation with examples/ikala/separate_ikala.py:

python separate_ikala.py -i <inputfile> -o <outputdir> -m <path_to_model.pkl>

where:

  • <inputfile> is the wav file to separate
  • <outputdir> is the output directory where to write the separation
  • <path_to_model.pkl> is the local path to the .pkl file you can download from this address

Separate Bach chorales from the Bach10 dataset into bassoon, clarinet, saxophone, and violin with examples/bach10/separate_bach10.py:

python separate_bach10.py -i <inputfile> -o <outputdir> -m <path_to_model.pkl>

where:

  • <inputfile> is the wav file to separate
  • <outputdir> is the output directory where to write the separation
  • <path_to_model.pkl> is the local path to the .pkl file you can download from this address

Score-informed separation of Bach chorales from the Bach10 dataset into bassoon, clarinet, saxophone, violin in examples/bach10_scoreinformed/separate_bach10.py:

python separate_bach10.py -i <inputfile> -o <outputdir> -m <path_to_model.pkl>

where:

  • <inputfile> is the wav file to separate
  • <outputdir> is the output directory where to write the separation
  • <path_to_model.pkl> is the local path to the .pkl file you can download from Zenodo

The folder with the <inputfile> must contain the scores: 'bassoon_b.txt', 'clarinet_b.txt', 'saxophone_b.txt', 'violin_b.txt'. The score file has one note per line, with the format: note_onset_time,note_offset_time,note_name.
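For example, a score file could contain lines like these (the times and notes are illustrative):

0.0000,0.4500,G3
0.4500,1.2000,A3
1.2000,1.8000,B3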

Feature computation

Compute the features for a given set of audio signals by extending the "Transform" class in transform.py.

For instance, the transformFFT class computes the STFT of an audio signal and saves the magnitude spectrogram as a binary file.

Examples

### 1. Computing the STFT of a matrix of signals "audio" and writing the STFT data in "path" (except the phase)
tt1 = transformFFT(frameSize=2048, hopSize=512, sampleRate=44100)
tt1.compute_transform(audio, out_path=path, phase=False)

### 2. Computing the STFT of a single signal "audio" and returning the magnitude and phase
tt1 = transformFFT(frameSize=2048, hopSize=512, sampleRate=44100)
mag, ph = tt1.compute_file(audio, phase=True)

### 3. Computing the inverse STFT using the magnitude and phase and returning the audio data
#we use the tt1 from example 2
audio = tt1.compute_inverse(mag, ph)

Data preprocessing

Load the features which have been computed with transform.py and yield the batches needed for training neural networks. These classes are useful when the data does not fit into memory, because the batches are loaded in chunks.

Example

### Load binary training data from the out_path folder
train = LargeDataset(path_transform_in=out_path, batch_size=32, batch_memory=200, time_context=30, overlap=20, nprocs=7)
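The idea behind LargeDataset can be sketched as follows; this is a simplified, hypothetical illustration of chunked loading, not the actual implementation in dataset.py (which also handles time contexts, overlap, and multiprocessing):

import os
import numpy as np

def iterate_batches(folder, files_per_chunk, batch_size):
    #load a memory-sized chunk of feature files, then serve batches from it
    files = sorted(f for f in os.listdir(folder) if f.endswith('.data'))
    for start in range(0, len(files), files_per_chunk):
        chunk = [np.fromfile(os.path.join(folder, f), dtype=np.float32)
                 for f in files[start:start + files_per_chunk]]
        data = np.concatenate(chunk)
        #yield fixed-size mini-batches from the in-memory chunk
        for b in range(0, len(data) - batch_size + 1, batch_size):
            yield data[b:b + batch_size]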

Audio sample querying using RWC database

The RWC instrument sound dataset contains samples of different instruments, played by various musicians in various styles and dynamics. You can obtain a sample for a given MIDI note, instrument, style, dynamics, and musician (1, 2, or 3) by using the classes in 'rwc.py'.

Example

### construct lists for the desired dynamics, styles, musician and instrument codes
allowed_styles = ['NO']
allowed_dynamics = ['F','M','P']
allowed_case = [1,2,3]
instrument_nums = [30,31,27,15] #bassoon,clarinet,saxophone,violin
instruments = []
for ins in range(len(instrument_nums)):
    #for each instrument construct an Instrument object
    instruments.append(rwc.Instrument(rwc_path, instrument_nums[ins], allowed_styles, allowed_case, allowed_dynamics))

#then, for a given instrument 'i' and midi note 'm', dynamics 'd', style 's', musician 'n'
note = instruments[i].getNote(melNotes[m], d, s, n)
#get the audio vector for the note
audio = note.getAudio()

Data generation

The Bach10 experiments offer examples of data generation (or augmentation). Starting from the score or from existing pieces, we can augment the existing data or generate new data with some desired factors. For instance, if you have four factors, time_shifts, intensity_shifts, style_shifts, and timbre_shifts, you can generate the possible combinations between them for a set of pieces and instruments (sources), as in the code below.

import itertools as it
import numpy as np

#create the product of these factors
cc = [(time_shifts[i], intensity_shifts[j], style_shifts[l], timbre_shifts[k])
    for i in range(len(time_shifts)) for j in range(len(intensity_shifts))
    for l in range(len(style_shifts)) for k in range(len(timbre_shifts))]

#create combinations for each of the instruments (sources)
if len(cc) < len(sources):
    combo1 = list(it.product(cc, repeat=len(sources)))
    combo = []
    for i in range(len(combo1)):
        c = np.array(combo1[i])
        #keep a combination only if the sources differ in the factor that varies
        if (len(intensity_shifts)==1 and not all(x == c[0,0] for x in c[:,0])) \
            or (len(time_shifts)==1 and not all(x == c[0,1] for x in c[:,1])):
            combo.append(c)
    combo = np.array(combo)
else:
    combo = np.array(list(it.permutations(cc, len(sources))))
if len(combo)==0:
    #fall back to a single combination with the first value of each factor
    combo = np.array([[[time_shifts[0], intensity_shifts[0], style_shifts[0], timbre_shifts[0]] for s in sources]])

#if there are too many combinations, you can just randomly sample
if sample_size<len(combo):
    sampled_combo = combo[np.random.choice(len(combo),size=sample_size, replace=False)]
else:
    sampled_combo = combo
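For illustration, the inputs consumed by the snippet above could look like this (the values are hypothetical; the style, dynamics, and musician codes follow the RWC example earlier):

time_shifts = [0.9, 1.0, 1.1] #local timing deviations
intensity_shifts = ['P', 'M', 'F'] #dynamics
style_shifts = ['NO'] #normal playing style
timbre_shifts = [1, 2, 3] #musician 1, 2 or 3
sources = ['bassoon', 'clarinet', 'saxophone', 'violin']
sample_size = 100 #maximum number of sampled combinations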

References

More details on the separation methods can be found in the following articles:

P. Chandna, M. Miron, J. Janer, and E. Gomez, "Monoaural audio source separation using deep convolutional neural networks," International Conference on Latent Variable Analysis and Signal Separation, 2017.

M. Miron, J. Janer, and E. Gomez, "Generating data to train convolutional neural networks for low latency classical music source separation," Sound and Music Computing Conference, 2017.

M. Miron, J. Janer, and E. Gomez, "Monaural score-informed source separation for classical music using convolutional neural networks," ISMIR Conference, 2017.

Dependencies

Python 2.7

climate, numpy, scipy, theano, lasagne (cPickle ships with the Python 2.7 standard library)

The dependencies can be installed with pip:

pip install numpy scipy climate theano
pip install https://github.com/Lasagne/Lasagne/archive/master.zip

Separating classical music mixtures with Bach10 dataset

We separate bassoon, clarinet, saxophone, and violin using the Bach10 dataset, which comprises 10 Bach chorales. Our approach consists of synthesizing the original scores with different timbres, dynamics, playing styles, and local timing deviations, in order to train a more robust model for classical music separation.

We have three experiments:

-Oracle: train with the original pieces (obviously overfitting, hence the "Oracle");

-Sibelius: train with the pieces synthesized with the Sibelius software;

-RWC: train with the pieces synthesized using the samples in the RWC instrument sound dataset.

The code for feature computation and training the network can be found in "examples/bach10" folder.

Score-informed separation of classical music mixtures with Bach10 dataset

We separate bassoon, clarinet, saxophone, and violin using the Bach10 dataset, which comprises 10 Bach chorales and the associated scores.

We generate training data with the approach mentioned above, using the RWC database. Consequently, we train with the pieces synthesized using the samples in the RWC instrument sound dataset.

The score is given in .txt files named after the instrument plus a suffix, e.g. 'bassoon_g.txt'. The format for a note in the text file is: onset, offset, midinotename, as in the following example: 6.1600,6.7000,F4#
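A minimal sketch of reading such a score file (a hypothetical helper, not code from the repository):

def parse_score_line(line):
    #e.g. "6.1600,6.7000,F4#" -> (6.16, 6.7, 'F4#')
    onset, offset, note = line.strip().split(',')
    return float(onset), float(offset), note

with open('bassoon_g.txt') as f:
    notes = [parse_score_line(l) for l in f if l.strip()]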

The code for feature computation and training the network can be found in the "examples/bach10_scoreinformed" folder.

Separating Professionally Produced Music

We separate voice, bass, drums and accompaniment using DSD100 dataset comprising professionally produced music. For more details about the challenge, please refer to SiSEC MUS challenge and DSD100 dataset.

The code for feature computation and training the network can be found in "examples/dsd100" folder.

iKala - Singing voice separation

We separate voice and accompaniment using the iKala dataset. For more details about the challenge, please refer to MIREX Singing voice separation 2016 and iKala dataset.

The code for feature computation and training the network can be found in "examples/ikala" folder.

Training models

For the Bach10 dataset:

#compute features for the original dataset
python -m examples.bach10.compute_features_bach10 --db '/path/to/Bach10/'
#compute features for the synthetic dataset generated with Sibelius
python -m examples.bach10.compute_features_bach10sibelius --db '/path/to/Bach10Sibelius/'
#compute features for the dataset synthesized from RWC samples
python -m examples.bach10.compute_features_bach10rwc --db '/path/to/Bach10Sibelius/' --rwc '/path/to/rwc/'
### Replace gpu0 with cpu,gpu,cuda,gpu0 etc. depending on your system configuration
THEANO_FLAGS=mode=FAST_RUN,device=gpu0,floatX=float32,lib.cnmem=0.95 python -m examples.bach10.trainCNNrwc --db '/path/to/Bach10/' --dbs '/path/to/Bach10Sibelius/' --output '/output/path/'
THEANO_FLAGS=mode=FAST_RUN,device=gpu0,floatX=float32,lib.cnmem=0.95 python -m examples.bach10.trainCNNSibelius --db '/path/to/Bach10/' --dbs '/path/to/Bach10Sibelius/' --output '/output/path/'

For iKala:

python -m examples.ikala.compute_features --db '/path/to/iKala/'
### Replace gpu0 with cpu,gpu,cuda,gpu0 etc. depending on your system configuration
THEANO_FLAGS=mode=FAST_RUN,device=gpu0,floatX=float32,lib.cnmem=0.95 python -m examples.ikala.trainCNN --db '/path/to/iKala/'

For SiSEC MUS using the DSD100 dataset:

python -m examples.dsd100.compute_features --db '/path/to/DSD100/'
### Replace gpu0 with cpu,gpu,cuda,gpu0 etc. depending on your system configuration
THEANO_FLAGS=mode=FAST_RUN,device=gpu0,floatX=float32,lib.cnmem=0.95 python -m examples.dsd100.trainCNN --db '/path/to/DSD100/'

Evaluation

The metrics are computed with BSS Eval images v3.0, as described here.

The evaluation scripts can be found in the subfolder "evaluation". The subfolder "script_cluster" contains scripts to run the evaluation script in parallel on a HPC cluster system.

For Bach10, you need to run the script Bach10_eval_only.m for each method in the 'base_estimates_directory' folder and for each of the 10 pieces. To evaluate the separation of the Bach10 Sibelius dataset, use the 'Bach10_eval_only_original.m' script. Be careful not to mix the estimation directories for the two datasets.

For iKala, you need to run the script evaluate_SS_iKala.m for each of the 252 files in the dataset. The script takes as parameters the id of the file, the path to the dataset, and the separation method, which needs to be a directory containing the separation results, stored in the 'output' folder.

for id=1:252
    evaluate_SS_iKala(id,'/homedtic/mmiron/data/iKala/','fft_1024');
end

For SiSEC-MUS/DSD100, use the scripts provided on the web page.

If you have access to an HPC cluster, you can use the .sh scripts in the script_cluster folder, which call the corresponding .m files.

Research reproducibility

For DSD100 and iKala, the framework was tested as part of a public evaluation campaign and the results were published online (see the sections above).

For Bach10, we provide the synthetic Bach10 Sibelius dataset and the Bach10 Separation SMC2017 dataset, which contains the separation for each method as .wav files and the evaluation results as .mat files.

If you want to compute the features and re-train the models, check the 'examples/bach10' folder and the instructions above. Alternatively, you can download an already trained model and perform separation with 'separate_bach10.py'.

If you want to evaluate the methods in Bach10 Separation SMC2017 dataset, then you can use the scripts in evaluation directory, which we explained above in the 'Evaluation' section.

If you want to replicate the plots in the SMC2017 paper, you need to have 'pandas' and 'seaborn' installed (pip install pandas seaborn) and then run the script in the plots subfolder:

python bach10_smc_stats.py --db 'path-to-results-dir'

where 'path-to-results-dir' is the path to the folder where you have stored the results for each method (e.g. if you downloaded the Bach10 Separation SMC2017 dataset, it would be the 'results' subfolder).

Acknowledgments

The TITANX used for this research was donated by the NVIDIA Corporation.

License

Copyright (c) 2014-2017
Marius Miron <miron.marius at gmail dot com>,
Pritish Chandna <pc2752 at gmail dot com>,
Gerard Erruz, and Hector Martel
Music Technology Group, Universitat Pompeu Fabra, Barcelona <mtg.upf.edu>

This program is free software: you can redistribute it and/or modify
it under the terms of the Affero GPL license published by
the Free Software Foundation, either version 3 of the License, or (at your
option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
Affero GPL license for more details.

You should have received a copy of the Affero GPL license
along with this program.  If not, see <http://www.gnu.org/licenses/>.


deepconvsep's Issues

Cannot convert model_dsd_fft_1024.pkl Model from PKL to JSON

I'm trying to convert the above PKL file to JSON, but it gives an error before converting. The goal is to import the model into MATLAB as a Keras model.

using this code in Python:

#convert a pkl file into a json file
import pickle as pkl
import json

def convert_dict_to_json(file_path):
    with open(file_path, 'rb') as fpkl, open('%s.json' % file_path, 'w') as fjson:
        data = pkl.load(fpkl)
        json.dump(data, fjson, ensure_ascii=False, sort_keys=True, indent=4)

And the error is:

File "C:\Python27\lib\json\encoder.py", line 184, in default
raise TypeError(repr(o) + " is not JSON serializable")
TypeError: array([[[[ 1.4630563e+00, 1.2371855e+00, 9.0325326e-01, ...,
2.0356092e-03, -9.0812740e-04, -8.2676094e-03]]],

Can you provide a converted .JSON file?
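A common workaround for this error (a general numpy/json recipe, not a fix confirmed by the authors) is to convert the numpy arrays to lists during serialization, for instance with a custom encoder:

import json
import numpy as np

class NumpyEncoder(json.JSONEncoder):
    def default(self, obj):
        #convert numpy arrays and scalars to plain Python types
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        if isinstance(obj, np.generic):
            return obj.item()
        return json.JSONEncoder.default(self, obj)

#then: json.dump(data, fjson, cls=NumpyEncoder, indent=4)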

another way of generating the pkl models?

Since I am still having trouble with my issue here:
#1

I was wondering if, instead of that, there is another way to generate the required pkl model after a successful compute_features run, for the dsd100 separation?

I am just trying to avoid the trainCNN problem, since no one has a fix for my problem yet.
If not, I will just wait until someone figures out my initial problem.

thanks!

how to improve separation between sources?

No matter what mixture I try to separate, the separation gives me nice results, but I was wondering what the main parameters and tweaks in the framework are that could improve the separation further, so that the vocals track really contains only the vocals, and the bass, drums, and other tracks have fewer vocal artifacts in them.

Is this maybe just a matter of how much training material is used? And/or is there a way to tweak the framework further to improve the results?

Thanks!

OSError: [Errno 22] using examples/dsd100/trainCNN.py (Win 10 64bit)

When I run trainCNN.py with the following "db" path:
db = "D:\\DSD100\\"
I get this error:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\gabri\Anaconda3\envs\CNN3.6Tesi\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\Users\gabri\Anaconda3\envs\CNN3.6Tesi\lib\multiprocessing\spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
MemoryError
Traceback (most recent call last):
  File "C:/Users/gabri/Desktop/Tesi Segnali Audio/PyCharmTestingArea/CNNAdapted/examples/dsd100/trainCNN.py", line 332, in <module>
    mult_factor_out=scale_factor)
  File "C:\Users\gabri\Desktop\Tesi Segnali Audio\PyCharmTestingArea\CNNAdapted\dataset.py", line 171, in __init__
    self.updatePath(self.path_transform_in,self.path_transform_out)
  File "C:\Users\gabri\Desktop\Tesi Segnali Audio\PyCharmTestingArea\CNNAdapted\dataset.py", line 639, in updatePath
    self.initBatches()
  File "C:\Users\gabri\Desktop\Tesi Segnali Audio\PyCharmTestingArea\CNNAdapted\dataset.py", line 676, in initBatches
    self.loadBatches()
  File "C:\Users\gabri\Desktop\Tesi Segnali Audio\PyCharmTestingArea\CNNAdapted\dataset.py", line 255, in loadBatches
    self.genBatches()
  File "C:\Users\gabri\Desktop\Tesi Segnali Audio\PyCharmTestingArea\CNNAdapted\dataset.py", line 294, in genBatches
    xall = parmap(self.loadFile, list(range(self.findex+1,self.nindex)),nprocs=self.nprocs)
  File "C:\Users\gabri\Desktop\Tesi Segnali Audio\PyCharmTestingArea\CNNAdapted\dataset.py", line 58, in parmap
    p.start()
  File "C:\Users\gabri\Anaconda3\envs\CNN3.6Tesi\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "C:\Users\gabri\Anaconda3\envs\CNN3.6Tesi\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\gabri\Anaconda3\envs\CNN3.6Tesi\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Users\gabri\Anaconda3\envs\CNN3.6Tesi\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\gabri\Anaconda3\envs\CNN3.6Tesi\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
OSError: [Errno 22] Invalid argument

I'm using Python 3.6.

Versions of relevant libraries:
numpy (1.16.4)
theano (1.0.4+unknown)
lasagne 0.2.dev1
tqdm (4.32.1)
scipy (1.2.1)
m2w64-toolchain (5.3.0)
mkl (2019.4)

DSD Compute Features - ImportError transform

Which pip install do I need to overcome this error?

from transform import transformFFT

gives the error:

python -m compute_features --db 'D:\DSD100\DSD100'
Traceback (most recent call last):
  File "C:\Program Files\Python36\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Program Files\Python36\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "D:\DeepConvSep-master\DeepConvSep-master\examples\dsd100\compute_features.py", line 21, in <module>
    import transform
ModuleNotFoundError: No module named 'transform'

Using examples/dsd100/separate_dsd.py

Hello,

I am getting an error when running the above file. I used the fft_1024.pkl file and a mixture.wav file from the DSD100 dataset.

The error is:

  File "separate_dsd.py", line 336, in <module>
    main(sys.argv[1:])
  File "separate_dsd.py", line 333, in main
    train_auto(inputfile,outdir,model,0.3,30,25,32,513)
  File "separate_dsd.py", line 251, in train_auto
    lasagne.layers.set_all_param_values(network2,params)
  File "C:\Users\path\lasagne\layers\helper.py", line 516, in set_all_param_values
    (len(values), len(params)))
ValueError: mismatch: got 13 values to set 15 parameters

another cool idea

Implementing the following artificial intelligence model in TensorFlow:

  • a deep convolutional recursive swarm of hybrid BDI and artificial neural networks

Trying to get a version of DeepConvSep in Python3

Trying to make DeepConvSep work with Python 3 on my Mac seemed like a pretty simple task at the beginning. However, I reached an impasse at the point described below.

When I run the program with this command...

python3 separate_dsd.py -i ./../../Ricotti\ \&\ Alburquerque\ -\ Dont\ You\ Believe\ Me.mp3 -o ./ -m ./../../model1.pkl

...I get the error NameError: name 'file' is not defined. file has been replaced with open in Python 3.

Then I changed my code to:

def load_model(filename):
    with open(filename, 'rb') as f:
        return pickle.load(f)

However, I got the error:

Traceback (most recent call last):
  File "separate_dsd.py", line 336, in <module>
    main(sys.argv[1:])
  File "separate_dsd.py", line 333, in main
    train_auto(inputfile,outdir,model,0.3,30,25,32,513)
  File "separate_dsd.py", line 250, in train_auto
    params=load_model(model)
  File "separate_dsd.py", line 19, in load_model
    params=pickle.load(f)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xbb in position 2: ordinal not in range(128)

Is there something up with your pickler? Can you please take a look at this? I want to help make a Python 3 version of this code and would be glad to help you along with this task.
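One workaround that often works for pickles written by Python 2 and loaded under Python 3 (untested on this particular model file) is to pass an explicit encoding:

import pickle

def load_model(filename):
    with open(filename, 'rb') as f:
        #latin1 maps bytes 0-255 one-to-one, so numpy arrays survive the decode
        return pickle.load(f, encoding='latin1')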

The DSD model, where is it from?

Hello, and thanks for posting this project, fascinating!

Can you share where the DSD100 model comes from? Who (or what team) was the author of that model? In this project, it is simply a link to a Google Drive download.

Thanks!

iKala is amplifying my results to the point of distortion

Hello. I'm running the iKala script on recordings and I like the results. However, it is amplifying my results considerably when they're done processing! They clip very badly. I've tried normalizing to -4 dB via Audacity prior to processing and that doesn't fix it. It also complains about Theano but seems to work regardless.

lasagne.layers.Conv2DLayer default adds bias

So there is no need to use lasagne.layers.BiasLayer.

I found this when I printed the shapes of the arrays in the file fft_1024.pkl (iKala dataset).

These are the shapes; the bias array (30,) occurs twice for each convolutional layer.

(30, 1, 1, 30)
(30,)
(30,)
(30, 30, 10, 20)
(30,)
(30,)
(13230, 256)
(256,)
(256, 13230)
(13230,)
(256, 13230)
(13230,)
(2,)

I also checked the code of lasagne.layers.Conv2DLayer, it confirms my assumption.
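For reference, a sketch of how the built-in bias can be disabled in Lasagne (the input shape here is a placeholder):

from lasagne.layers import InputLayer, Conv2DLayer

l_in = InputLayer((None, 1, 30, 513))  #placeholder input shape
#by default the layer learns W plus a bias b of shape (num_filters,)
conv = Conv2DLayer(l_in, num_filters=30, filter_size=(10, 20))
#passing b=None removes the built-in bias, e.g. if a separate BiasLayer is wanted
conv_nobias = Conv2DLayer(l_in, num_filters=30, filter_size=(10, 20), b=None)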

other separation tasks with this framework? force stereo?

Now that I have the training and separation finally working, I was wondering about the limits of this framework. For example, can it be modified to separate speech (dialogue) from background music, or is it only built for singing voice?

Also, the training material is in stereo, but the input can be stereo or mono; so why is the output mono if the input was stereo? Is there a way to force stereo output with this framework, or is that a project for the future?

Thanks!

a cool idea

idea 1

Feeding a WaveNet implementation in TensorFlow simultaneously with the following things, to do advanced music gestural recognition:

  • ElasticFusion dense SLAM
  • audio data + advanced audio gestural recognition / spectral classifiers

plus:

  • doing training in real time

idea 2

Implementing the following artificial intelligence model in TensorFlow:

  • a deep convolutional recursive swarm of hybrid BDI and artificial neural networks

idea 3

Feeding and implementing the following artificial intelligence model in TensorFlow:

  • a deep convolutional recursive swarm of hybrid BDI and artificial neural networks, with the following things, to do advanced music gestural recognition:
    • ElasticFusion dense SLAM
    • audio data + advanced audio gestural recognition / spectral classifiers

plus:

  • doing training in real time

idea 4

Creating a C++ framework for live electronics and algorithmic composition using some of these:

- use next-generation, state-of-the-art machine learning algorithms implemented in TensorFlow, such as a deep convolutional recursive swarm of hybrid BDI and ANN;
- use ElasticFusion / ORB-SLAM2 as an input for gesture recognition using computer vision;
- use GPGPU-driven FFT for audio digital signal processing, gesture recognition, and audio feature extraction;
- compute audio in non-real time using complex GPGPU transformations;
- use raya as a sound spatialization GPGPU engine

TypeError when running trainCNN, please help

OK, I have Windows 7 Ultimate 64-bit with Service Pack 1 installed.
I have Visual Studio 2013 Community with Update 5 installed.
I have every requirement you list in your readme (even though you didn't specify the exact version of each requirement, I assumed at least Theano 0.8.2 and Lasagne 0.2.dev1); numpy, scipy, climate, etc. are all standard installs.

My Theano installation works (I tried it by itself), so the problem is not there; the nvcc compiler also works, so everything is linked and working.

In terms of your framework, I can separate a mixture using the pre-trained pkl you provided, without any errors. I can also run the compute_features step for dsd100 (I'm using the 120 MB dsd100subset package, not the full DSD100).

compute_features generates a warning about non-data chunks in the wav files (I'm not sure how the wav files were generated, so I mention this in case it turns out to be a problem), but it works: I get .data and .shape files in the transform folder.

However, the only thing I can't get to work is the dsd100 trainCNN. I get the following error:
Using gpu device 0: GeForce GTX 770 (CNMeM is enabled with initial size: 70.0% of memory, cuDNN 5005)
I 2017-02-21 21:49:32 trainer:433 Maximum: 0.634328
I 2017-02-21 21:49:32 trainer:434 Mean: 0.003356
I 2017-02-21 21:49:32 trainer:435 Standard dev: 0.013143
I 2017-02-21 21:49:32 trainer:163 Building Autoencoder
Traceback (most recent call last):
  File "C:\Python27\lib\runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "C:\Python27\lib\runpy.py", line 72, in _run_code
    exec code in run_globals
  File "C:\DeepConvSep\examples\dsd100\trainCNN.py", line 444, in <module>
    train_errs=train_auto(train=ld1,fun=build_ca,transform=tt,outdir=db+'output/'+model+"/",testdir=db+'Mixtures/',model=db+"models/"+"model_"+model+".pkl",num_epochs=nepochs,scale_factor=scale_factor)
  File "C:\DeepConvSep\examples\dsd100\trainCNN.py", line 173, in train_auto
    network2 = fun(input_var=input_var2,batch_size=train.batch_size,time_context=train.time_context,feat_size=train.input_size)
  File "C:\DeepConvSep\examples\dsd100\trainCNN.py", line 93, in build_ca
    l_conv1 = lasagne.layers.Conv2DLayer(l_in_1, num_filters=50, filter_size=(1,feat_size),stride=(1,1), pad='valid', nonlinearity=None)
  File "C:\Python27\lib\site-packages\lasagne\layers\conv.py", line 599, in __init__
    **kwargs)
  File "C:\Python27\lib\site-packages\lasagne\layers\conv.py", line 282, in __init__
    self.filter_size = as_tuple(filter_size, n, int)
  File "C:\Python27\lib\site-packages\lasagne\utils.py", line 196, in as_tuple
    "of {0}, got {1} instead".format(t.name, x))
TypeError: expected a single value or an iterable of int, got (1, 513L) instead

I am really not sure what that means; it seems to be either a problem in your code or something else on my end, but what could it be? Thanks a lot for the amazing source code, I hope you can help me with my problem :)
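The trailing L in (1, 513L) suggests feat_size arrives as a Python 2 long rather than an int; a plausible workaround (an assumption, not a confirmed fix) is to cast it before building the layer:

#in build_ca, cast feat_size to a plain int before handing it to Lasagne
l_conv1 = lasagne.layers.Conv2DLayer(l_in_1, num_filters=50,
    filter_size=(1, int(feat_size)), stride=(1, 1), pad='valid', nonlinearity=None)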

Theano 0.9.0 API change

Traceback (most recent call last):
  File "/homedtic/rgong/DeepConvSep/trainCNN.py", line 42, in <module>
    import lasagne
  File "/homedtic/rgong/keras_env/lib/python2.7/site-packages/lasagne/__init__.py", line 24, in <module>
    from . import layers
  File "/homedtic/rgong/keras_env/lib/python2.7/site-packages/lasagne/layers/__init__.py", line 7, in <module>
    from .pool import *
  File "/homedtic/rgong/keras_env/lib/python2.7/site-packages/lasagne/layers/pool.py", line 6, in <module>
    from theano.tensor.signal import downsample
ImportError: cannot import name downsample

Theano changed its API in version 0.9.0, and Lasagne hasn't been updated yet.

The max_pool_2d method doesn't exist anymore in "downsample".

I found the solution for this issue in Theano/Theano#4337.
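The fix referenced there amounts to pointing Lasagne's pool.py at Theano's renamed module, roughly:

#old import in lasagne/layers/pool.py (Theano < 0.9):
#from theano.tensor.signal import downsample
#downsample.max_pool_2d(...)

#replacement for Theano 0.9.0:
from theano.tensor.signal import pool
#pool.pool_2d(...)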

dear MTG

I have a problem when running your code. My command line is: "python separate_dsd.py -i /home/hjz/test/1.wav -o /home/hjz/test/ -m /home/hjz/test/model_dsd_fft_1024.pkl". The 1.wav is a music file converted from an .mp3 file. (Screenshot of the error: 2018-03-23 21-38-41.)
The version of Theano installed on my computer is 0.8.2 and Lasagne is 0.1. I run this code on Linux 17.04.
Could you tell me why?

parameters in a file?

Maybe I am missing something here, but if I want to, say, use a different FFT size or hop size, I have to update the entire framework with these values.

I was thinking of a separate script or file that could update the other scripts that use these parameters; that script would only contain a couple of main parameters:

FFT = x
hopsize = y
scale factor = z
etc

Thanks!
