dr-costas / mad-twinnet

The code for the MaD TwinNet. Demo page:

Home Page: http://arg.cs.tut.fi/demo/mad-twinnet/

License: Other

Python 100.00%

source-separation deep-learning audio-signal-processing music-source-separation recurrent-neural-networks singing-voice music-information-retrieval audio audio-processing twin-networks

mad-twinnet's Introduction

MaD TwinNet Repository

Welcome to the repository of the MaD TwinNet.

If you want to reproduce the results of the paper and know what you are doing, then jump ahead, get the pre-trained weights from , get the paper code version from here, and start using the MaD TwinNet.

If you just need the results, you can get them from .

If you want to re-train the MaD TwinNet, then you can use the master branch, as it has the code based on the most up-to-date version of PyTorch.

There is also an on-line demo of the MaD TwinNet at the website of the MaD TwinNet.

The paper of the MaD TwinNet is presented at the IEEE World Congress on Computational Intelligence (WCCI)/International Joint Conference on Neural Networks, 2018 and can be found online at the corresponding arXiv entry. If you find MaD TwinNet useful, please consider citing our paper.

If you need some help on using MaD TwinNet, please read the following instructions.

Also, if you use any of the things existing in this repository or the associated binary files from Zenodo, please consider citing the MaD TwinNet paper, available from here.

Previous work
Extensions
How do I use it with no manual
What is the MaD TwinNet
How do I use the MaD TwinNet
Acknowledgements

Previous work

A previous usage of the MaD architecture (including the vanilla MaD) can be found at: https://github.com/Js-Mim/mss_pytorch

Extensions

Joining forces in research (apart from beers) with our colleague P. Magron, we enhanced the performance of the MaD TwinNet by applying phase recovery algorithms on top of it.

We tried two use cases. One regards the singing voice separation. You can see the results for singing voice separation at the corresponding online demo. The results are also presented at INTERSPEECH 2018 and the paper can be found at the paper entry on HAL (French equivalent of arXiv).

The second case is for harmonic/percussive separation. The corresponding online demo is here, the paper is presented at the 16th International Workshop on Acoustic Signal Enhancement (IWAENC), 2018, and an online version of the paper at arXiv is due to appear. You can see, though, the HAL entry of the HPSS paper.

How do I use it with no manual

You can:

re-train the MaD TwinNet, by running the script training.py,
re-test it, by running the script testing.py, or
use it, by running the script use_me.py.

The settings for all the above processes are controlled by the settings.py file. For the use_me.py script, you can find directions in it or go to the How do I use it (RTFM version)? section.

If you want to re-test or use the MaD TwinNet as it is on the paper, you will need the pre-trained weights of the MaD TwinNet and the version of the code that can be used with the pre-trained weights. You can get the pre-trained weights from and the version of the code that works with the pre-trained weights from here.

If you want to re-train the MaD TwinNet and use it for your own research/goals, then you can use the most recent version of the code.

What is the MaD TwinNet

MaD TwinNet stands for the "Masker-Denoiser with Twin Networks architecture/method" for monaural music sound source separation. An illustration of the MaD TwinNet can be seen in the following figure:

You can read more at our paper on arXiv.

For the implementation of our method, we used the PyTorch framework.

How do I use the MaD TwinNet

Setting up the environment

Before you start using the code of this repository, you will have to install some packages in your Python environment.

Note that our code is based on Python 3.6, so we recommend using Python 3.6 for a better experience.

To install the dependencies, you can use either the pip package manager or anaconda/conda.

If you want to use pip, then you have to:

clone our repository (e.g. git clone ),
navigate with your terminal inside the directory of the cloned repo (e.g. cd mad-twinnet), and then
issue at your terminal the command pip install -r requirements.txt

If you want to use anaconda/conda, then you have to:

clone our repository (e.g. git clone ),
navigate with your terminal inside the directory of the cloned repo (e.g. cd mad-twinnet), and then
issue at your terminal the command conda install --yes --file conda_requirements.txt
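
After installing, a quick sanity check of the environment can save time later. The following is a minimal sketch, not part of this repository: the helper names are hypothetical, and the package names passed in at the bottom ("torch", "numpy") are assumptions, since the authoritative list is the one in requirements.txt.

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the subset of top-level package names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_is_recent_enough(minimum=(3, 6)):
    """True if the running interpreter is at least the given (major, minor)."""
    return sys.version_info[:2] >= tuple(minimum)


if __name__ == "__main__":
    # "torch" and "numpy" are assumed here; adjust to match requirements.txt.
    print("Python >= 3.6:", python_is_recent_enough())
    print("Missing packages:", missing_packages(["torch", "numpy"]))
```

An empty list of missing packages only tells you that the packages can be found, not that their versions match the ones the code was written for.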

Dataset set-up

To train or test the MaD TwinNet, you will first have to obtain your dataset. Your dataset should be in the dataset directory. By default, the training set should be under a directory named Dev and the testing set under a directory named Test. This means that the directories for the training and testing sets must be dataset/Dev and dataset/Test, respectively.

Also, by default, you will need numbered file names (e.g. 001.wav), and each file name should carry an identifier indicating whether the file contains the mixture, the voice, the bass, or other. Please check the Demixing Secret Dataset (DSD) for the exact file naming conventions.

If you want to use the DSD, then you will most probably want to extract it in the dataset directory, which gives you the above-mentioned directory structure and proper file names.
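
The default layout described above can be checked before starting a long training run. This is a minimal sketch under the stated defaults (dataset/Dev and dataset/Test); check_dataset_layout is a hypothetical helper and does not verify the file naming inside the directories.

```python
from pathlib import Path


def check_dataset_layout(root="dataset", subsets=("Dev", "Test")):
    """Return the expected subset directories that are missing under root."""
    root = Path(root)
    return [str(root / name) for name in subsets if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = check_dataset_layout()
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("Dataset layout looks as expected.")
```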

If you want to use a different dataset, then you have two options:

either you format your file names and directory structure to match the one from the DSD, or
you modify the file reading function to suit your needs.

For the second option, you will have to at least modify the _get_files_lists function, in the helpers directory/package.

Using the pre-trained weights

To use the pre-trained weights of the MaD TwinNet, first you have to obtain them from , and then you must get the version of the code that is tagged as "Paper-code" and available here.

Then, you have to unzip the obtained .zip file and move the resulting files into the outputs/states/ directory. These files will be the following:

rnn_enc.pt
rnn_dec.pt
fnn.pt
denoiser.pt

You must not alter the names of these files. Also, the files cannot be used if you alter any members of the classes used in the modules/ directory.
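
To confirm that the weights ended up in the right place, one could check for the four file names listed above. The sketch below assumes the outputs/states/ directory mentioned in the text; missing_weight_files is a hypothetical helper, not a function from this repository.

```python
from pathlib import Path

# File names as listed above.
EXPECTED_WEIGHT_FILES = ("rnn_enc.pt", "rnn_dec.pt", "fnn.pt", "denoiser.pt")


def missing_weight_files(states_dir="outputs/states"):
    """Return the expected weight files that are not present in states_dir."""
    states_dir = Path(states_dir)
    return [
        name for name in EXPECTED_WEIGHT_FILES
        if not (states_dir / name).is_file()
    ]


if __name__ == "__main__":
    print("Missing weight files:", missing_weight_files())
```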

Re-training MaD TwinNet

You can re-train the MaD TwinNet using either the most recent version or the paper version of the code. For example, you might want to look for better hyper-parameters, see how the MaD TwinNet performs on a different training dataset, or try any other wonderful idea :)

If you have set up the dataset correctly, then you just need to run the scripts/training.py file. You have several options for running this file. For example, you can run it through your favorite IDE or through a terminal.

If you run it through a terminal, please do not forget to set up the PYTHONPATH environment variable correctly. E.g., if you are in the project root directory, you can issue the command export PYTHONPATH=$PYTHONPATH:../ and then you can issue the command python scripts/training.py.
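
The scripts import from the helpers package, so the project root has to be on the Python path. As an alternative to exporting PYTHONPATH, the path can be extended from inside Python. This is a minimal sketch that assumes the script lives in a scripts/ directory directly under the project root; both helper names are hypothetical and not part of the repository.

```python
import sys
from pathlib import Path


def project_root_for(script_path):
    """Return the directory one level above the folder containing the script."""
    return Path(script_path).resolve().parent.parent


def add_project_root_to_path(script_path):
    """Prepend the project root to sys.path if it is not already there."""
    root = str(project_root_for(script_path))
    if root not in sys.path:
        sys.path.insert(0, root)
    return root
```

Calling add_project_root_to_path(__file__) at the top of a script in scripts/, before any from helpers... import, is one way to make the helpers package importable without touching the environment.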

Altering the hyper-parameters

All the hyper-parameters are in the helpers/settings.py file.

You can alter any hyper-parameter you want, but make sure that the values that you will use are correct and can actually be used.

Re-testing MaD TwinNet

You can re-test the MaD TwinNet. To do so, you need again the proper set-up of the dataset and the weights of the MaD TwinNet.

When the above are OK, then you simply run the scripts/testing.py file.

If you run the testing file through a terminal, please do not forget to set up the PYTHONPATH environment variable correctly. E.g., if you are in the project root directory, you can issue the command export PYTHONPATH=$PYTHONPATH:../ and then you can issue the command python scripts/testing.py.

Use MaD TwinNet

To use the MaD TwinNet you need to have set up the pre-trained weights. If these weights are properly set up, then you need to call the script scripts/use_me.py and provide as an argument:

either a single file, or
a text file (i.e. with ending .txt) which will have the path to a single wav file in each line.

The script will extract the voice and the background music from the provided input (i.e. either the single wav file or all the wav files listed in the .txt file) and will save them as .wav files in the same directory as the corresponding input wav file.

Note: All wav files must have a 44.1 kHz sampling frequency and a 16-bit sample width (a.k.a. standard CD quality).
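
The format requirement can be checked up front with the standard library. The following is a minimal sketch: is_cd_quality is a hypothetical helper, and it only inspects the header of PCM wav files (the wave module raises wave.Error for formats it cannot read).

```python
import wave


def is_cd_quality(wav_path, sampling_rate=44100, sample_width_bytes=2):
    """Check a PCM wav header for a 44.1 kHz rate and 16-bit (2-byte) samples."""
    with wave.open(str(wav_path), "rb") as wav_file:
        return (
            wav_file.getframerate() == sampling_rate
            and wav_file.getsampwidth() == sample_width_bytes
        )
```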

Example of using the MaD TwinNet:

python scripts/use_me.py -w my_wav_file.wav

python scripts/use_me.py -l a_txt_file_with_wavs.txt

Please remember to set the Python path properly (e.g. export PYTHONPATH=$PYTHONPATH:../)!
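
For the -l option, the text file with one wav path per line can be generated rather than written by hand. This is a minimal sketch with a hypothetical helper, write_wav_list; the directory and file names in the usage lines are placeholders.

```python
from pathlib import Path


def write_wav_list(wav_dir, list_path):
    """Write one wav path per line to list_path and return the paths written."""
    wav_paths = sorted(str(p) for p in Path(wav_dir).glob("*.wav"))
    Path(list_path).write_text("\n".join(wav_paths))
    return wav_paths


if __name__ == "__main__":
    # Placeholder names; replace with your own directory and list file.
    for path in write_wav_list("my_wavs", "a_txt_file_with_wavs.txt"):
        print(path)
```

Since the script saves its outputs next to the inputs, regenerating the list after a run would also pick up the separated files, so it may be worth building the list once or keeping the inputs in a directory of their own.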

Acknowledgements

Part of the computations leading to these results was performed on a TITAN-X GPU donated by NVIDIA to K. Drossos.
K. Drossos and T. Virtanen wish to acknowledge CSC-IT Center for Science, Finland, for computational resources.
D. Serdyuk would like to acknowledge the support of the following agencies for research funding and computing support: Samsung, NSERC, Calcul Quebec, Compute Canada, the Canada Research Chairs, and CIFAR.
S.-I. Mimilakis is supported by the European Union’s H2020 Framework Programme (H2020-MSCA-ITN-2014) under grant agreement no 642685 MacSeNet (yes, that is him in the video).
The authors would like to thank P. Magron and G. Naithani (TUT, Finland) for their valuable comments and feedback during the writing process.

mad-twinnet's Issues

'outputs/states/mad.pt' file not found error while using the use_me.py script

Hi, I am trying to use the MaD TwinNet pre-trained model on an audio sample with the use_me.py script. I have downloaded the pretrained weights from here (https://zenodo.org/record/1164592#.X-CFxNbhXeQ) as suggested in the ReadMe file and placed them in the outputs/states directory. However there doesn't seem to be a mad.pt file present which seems to be causing the error. Please let me know how this can be fixed?

Thanks

pre_train modules do not correspond to the code

In the code, I find these:
in "use_me.py" the way to load is as follows
mad.load_state_dict(torch.load(output_states_path['mad']))
in "helpers/settings.py" it says:
output_states_path = {'mad': os.path.join(_states_path, 'mad{}.pt'.format(_debug_suffix))}
but the modules you offer are:
rnn_enc.pt
rnn_dec.pt
fnn.pt
denoiser.pt
they are part of "mad" so I can't easily load them.
Readme.md says "you must get the version of the code that is tagged as "Paper-code" and available here". So I think I downloaded the correct code. Did I do something wrong?

in case of single background instrument

Hello!
I have been trying to train with my dataset having voice and single instrument (e.g. drums).

Starting at line 397 of helpers/data_feeder.py, the following code appears to require changes. Is it OK to just delete 'bass' and 'others' without adjusting other parts? Your suggestion would be appreciated.

if not usage_case:
    bass = wav_read(os.path.join(sources_parent_path, 'bass.wav'), mono=False)[0]
    drums = wav_read(os.path.join(sources_parent_path, 'drums.wav'), mono=False)[0]
    others = wav_read(os.path.join(sources_parent_path, 'other.wav'), mono=False)[0]
    voice = wav_read(os.path.join(sources_parent_path, 'vocals.wav'), mono=False)[0]

    bg_true = np.sum(bass + drums + others, axis=-1) * 0.5
    voice_true = np.sum(voice, axis=-1) * 0.5
    mix = np.sum(bass + drums + others + voice, axis=-1) * 0.5

Unable to reproduce same error metrics as claimed in the paper

Hi,
I downloaded the latest code from this repo and tested the model on the DSD100 dataset using the pre-trained weights. I am running this model on an NVIDIA GTX 1080 with Python 3.5.4. I am getting the following error metrics:

Median SDR: 4.04 dB | Median SIR: 7.14 dB

In the paper, authors have claimed the following:-

Median SDR: 4.57 dB | Median SIR: 8.17 dB

My results are significantly different from the ideal values. Am I missing something?

Furthermore, I tried to retrain the model on the DSD100 dataset. I used the weights from the 100th epoch for testing and I got the following results:

Median SDR: 4.13 dB | Median SIR: 7.32 dB

It's a bit better but not up to the mark. What might be going wrong?

Could not find a version that satisfies the requirement torch==0.4.1

I downloaded the source code and the pre-trained weights. I was installing from the requirements.txt file, but an error keeps occurring when it reaches the installation of PyTorch version 0.4.1. I tried using conda and a similar error occurs. Hence I can't move forward to use_me.py.

Error When I run scripts/training.py

Hello !
I'm getting an error when I run training.py and I can't figure out why.

-- Starting training process. Debug mode: False
-- Setting up modules... done.
-- Setting up optimizes and losses... done.
-- Training starts

Traceback (most recent call last):
  File "scripts/training.py", line 199, in <module>
    main()
  File "scripts/training.py", line 195, in main
    training_process()
  File "scripts/training.py", line 176, in training_process
    l_m=torch.mean(torch.FloatTensor(epoch_l_m)),
RuntimeError: invalid argument 1: empty Tensor at /pytorch/torch/lib/TH/generic/THTensorMath.c:3311

Thank You!

Memory error when use use_me.py

Hi, I have trained with my own data and when I'm going to use the use_me.py script I get the following error:

Traceback (most recent call last):
  File "scripts/use_me.py", line 200, in <module>
    main()
  File "scripts/use_me.py", line 195, in main
    output_file_names=_make_target_file_names(input_list)
  File "scripts/use_me.py", line 113, in use_me_process
    output_file_name=output_file_names[index]
  File "/home/david/ASR/mad-twinnet/helpers/data_feeder.py", line 191, in data_process_results_testing
    bg_hat = mix[:min_len] - voice_hat[:min_len]
MemoryError

Does anyone know what the problem is?

PS: Excellent project! It's amazing.

RuntimeError: invalid argument 1: empty Tensor at /pytorch/torch/lib/TH/generic/THTensorMath.c:3381

Reported from @djpg

Original message follows:

Now I get the following error when training with mono 8 kHz audio files, after changing helpers/settings.py to wav_quality = {'sampling_rate': 8000, 'nb_bits': 8}:

Traceback (most recent call last):
  File "scripts/training.py", line 199, in <module>
    main()
  File "scripts/training.py", line 195, in main
    training_process()
  File "scripts/training.py", line 176, in training_process
    l_m=torch.mean(torch.FloatTensor(epoch_l_m)),
RuntimeError: invalid argument 1: empty Tensor at /pytorch/torch/lib/TH/generic/THTensorMath.c:3381

Thank you!

Justification of skip-filtering connection for general speech enhancement

I am excited to find this work. Thanks for sharing your paper and code with us 👍

For designing the masker and denoiser, MaD TwinNet assumes a skip-filtering connection to produce the masked output.

Could you point to any relevant material showing that the mask can be estimated as a function of the mixture (formula (8))?
Also, would the skip-filtering connection be valid for general speech enhancement, which can include reverberation and channel distortion?

I appreciate your comments.

No module named helpers

I downloaded the repository as a zip and am trying to execute use_me.py.
It gives the following output:

Traceback (most recent call last):
  File "D:\mad-twinnet-master\mad-twinnet-master\scripts\use_me.py", line 17, in <module>
    from helpers.data_feeder import data_feeder_testing, data_process_results_testing
ModuleNotFoundError: No module named 'helpers'

Missing variable from model file

I got errors when loading RNNEnc model:
RuntimeError: Error(s) in loading state_dict for RNNEnc:
Missing key(s) in state_dict: "gru_enc.weight_ih_l0", "gru_enc.weight_hh_l0", "gru_enc.bias_ih_l0", "gru_enc.bias_hh_l0", "gru_enc.weight_ih_l0_reverse", "gru_enc.weight_hh_l0_reverse", "gru_enc.bias_ih_l0_reverse", "gru_enc.bias_hh_l0_reverse".
Unexpected key(s) in state_dict: "gru_enc_f.weight_ih", "gru_enc_f.weight_hh", "gru_enc_f.bias_ih", "gru_enc_f.bias_hh", "gru_enc_b.weight_ih", "gru_enc_b.weight_hh", "gru_enc_b.bias_ih", "gru_enc_b.bias_hh".

Got errors when loading RNNDec model:
RuntimeError: Error(s) in loading state_dict for RNNDec:
Missing key(s) in state_dict: "gru_dec.weight_ih_l0", "gru_dec.weight_hh_l0", "gru_dec.bias_ih_l0", "gru_dec.bias_hh_l0".
Unexpected key(s) in state_dict: "gru_dec.weight_ih", "gru_dec.weight_hh", "gru_dec.bias_ih", "gru_dec.bias_hh".

But no errors occurred when loading the FNNMasker and FNNDenoiser models.

about overlapping frames

Hi,
Thanks for your inspiring work. After reading your paper and the code in data_feeder.py, I have some questions.
Let's say, for example, N = 2049, the sequence length (T) is 60, the overlap of the subsequences (L) is 10, and the hop is 384. So for a 16 kHz wav file with a duration of 4 seconds, after the STFT we get a transformed array with shape [16000 * 4 / 384, 2049], i.e. [166, 2049]. If the current set size is 4, the output of the STFT is [166 * 4, 2049], i.e. [664, 2049], and then we make the overlapping subsequences from this array.
According to your code in _make_overlap_sequences, more precisely these lines:

 mixture = stride_tricks.as_strided(
        mixture,
        shape=(int(mixture.shape[0] / (l_size - o_lap)), l_size, mixture.shape[1]),
        strides=(mixture.strides[0] * (l_size - o_lap), mixture.strides[0], mixture.strides[1])
    )
 mixture = voice[:-1, :, :]

Here l_size == 60 and o_lap == 20, so we get the mixture with shape [16, 60, 2049].
I mean that mixture[1] overlaps by 20 frames with mixture[0] and by 20 frames with mixture[2], mixture[2] overlaps by 20 frames with mixture[1] and by 20 frames with mixture[3], and so on.
Consequently, in the code of def epoch_it():

                mix_batch = mix[b_start:b_end, :, :]
                voice_true_batch = voice_true[b_start:b_end, context_length:-context_length, :]

For voice_true_batch, the leading 10 frames and the trailing 10 frames are stripped, leaving 40 frames, and in these 40 frames there are still 10 overlapped frames at the start and 10 at the end. Only 20 frames are not overlapped.
Why do you pass 2*L as the overlap length to _make_overlap_sequences?
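
The numbers in the question above can be worked through with a short sketch. It assumes only what the quoted code shows (consecutive subsequences start l_size - o_lap frames apart) and takes context_length = 10 from the post; the helper names are hypothetical. Under these assumptions, the target slices that remain after stripping the context frames sit back-to-back without overlapping, which may be the reason an overlap of 2*L is passed.

```python
def sequence_spans(n_frames, l_size=60, o_lap=20):
    """Frame ranges [start, end) of each subsequence, mirroring the as_strided call."""
    hop = l_size - o_lap
    n_sequences = int(n_frames / hop)
    return [(i * hop, i * hop + l_size) for i in range(n_sequences)]


def target_spans(spans, context_length=10):
    """Frame ranges left after dropping context_length frames from both ends."""
    return [(start + context_length, end - context_length) for start, end in spans]


if __name__ == "__main__":
    n_frames = (16000 * 4 // 384) * 4  # 166 frames per file, 4 files -> 664
    spans = sequence_spans(n_frames)
    targets = target_spans(spans)
    print(len(spans))    # 16
    print(spans[:3])     # [(0, 60), (40, 100), (80, 140)]
    print(targets[:3])   # [(10, 50), (50, 90), (90, 130)]
```

Each input subsequence shares 20 frames with its neighbour, but each stripped target ends exactly where the next one begins.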

AttributeError: 'NoneType' object has no attribute 'to' when use use_me.py

Hi, I wanted to use the pre-trained model to process music, but I got the error below when I ran use_me.py:

Traceback (most recent call last):
  File "D:/Workspace/MusicVoiceSeparation/OpenSourcs/mad-twinnet-master/scripts/use_me.py", line 192, in <module>
    main()
  File "D:/Workspace/MusicVoiceSeparation/OpenSourcs/mad-twinnet-master/scripts/use_me.py", line 187, in main
    output_file_names=_make_target_file_names(input_list)
  File "D:/Workspace/MusicVoiceSeparation/OpenSourcs/mad-twinnet-master/scripts/use_me.py", line 56, in use_me_process
    rnn_enc.load_state_dict(torch.load(output_states_path['rnn_enc'])).to(device)
AttributeError: 'NoneType' object has no attribute 'to'

Do you have any idea about that? BTW, I have no GPU in my computer.

output size not equal to input size

My input time-domain mono mixture signal has a certain length. The output voice and background music files have the same length as each other, but not the same length as the input file. Is there an easy fix for this?

helper module is not working

%matplotlib inline
import torch
import torch.nn.functional as F

import helper

images, labels = next(iter(trainloader))

img = images[0].view(1, 784)
# Turn off gradients to speed up this part
with torch.no_grad():
    logits = model.forward(img)

ps = F.softmax(logits, dim=1)
helper.view_classify(img.view(1, 28, 28), ps)