
dr-costas / mad-twinnet


The code for the MaD TwinNet.

Home page (demo page): http://arg.cs.tut.fi/demo/mad-twinnet/

License: Other

Python 100.00%
audio audio-processing audio-signal-processing autoencoders deep-learning deep-neural-networks deeplearning denoising-autoencoders mad-twinnet music-information-retrieval music-signal-processing music-source-separation pytorch recurrent-neural-networks singing-voice source-separation twin-networks voice wav

mad-twinnet's Issues

Missing variable from model file

I got errors when loading the RNNEnc model:
RuntimeError: Error(s) in loading state_dict for RNNEnc:
Missing key(s) in state_dict: "gru_enc.weight_ih_l0", "gru_enc.weight_hh_l0", "gru_enc.bias_ih_l0", "gru_enc.bias_hh_l0", "gru_enc.weight_ih_l0_reverse", "gru_enc.weight_hh_l0_reverse", "gru_enc.bias_ih_l0_reverse", "gru_enc.bias_hh_l0_reverse".
Unexpected key(s) in state_dict: "gru_enc_f.weight_ih", "gru_enc_f.weight_hh", "gru_enc_f.bias_ih", "gru_enc_f.bias_hh", "gru_enc_b.weight_ih", "gru_enc_b.weight_hh", "gru_enc_b.bias_ih", "gru_enc_b.bias_hh".

I also got errors when loading the RNNDec model:
RuntimeError: Error(s) in loading state_dict for RNNDec:
Missing key(s) in state_dict: "gru_dec.weight_ih_l0", "gru_dec.weight_hh_l0", "gru_dec.bias_ih_l0", "gru_dec.bias_hh_l0".
Unexpected key(s) in state_dict: "gru_dec.weight_ih", "gru_dec.weight_hh", "gru_dec.bias_ih", "gru_dec.bias_hh".

However, no errors occurred when loading the FNNMasker and FNNDenoiser models.
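The missing and unexpected names above differ only in naming convention: the saved files use GRUCell-style names (weight_ih, with separate _f/_b cells for the two encoder directions), while the current RNNEnc and RNNDec appear to use nn.GRU, which names single-layer parameters weight_ih_l0 and the backward direction weight_ih_l0_reverse. Below is a minimal sketch of a key-renaming workaround. It assumes the tensor shapes match (PyTorch documents weight_ih as (3*hidden_size, input_size) for both GRUCell and the first nn.GRU layer) and that the old backward cell processed the time-reversed sequence the same way a bidirectional nn.GRU does. The function name cell_keys_to_gru_keys is hypothetical.

```python
import re
from collections import OrderedDict

# GRUCell-style names such as "gru_enc_f.weight_ih" or "gru_dec.bias_hh".
# The optional "_f"/"_b" suffix marks the forward/backward encoder cell.
_CELL_KEY = re.compile(
    r"^(?P<module>\w+?)(?P<direction>_f|_b)?\."
    r"(?P<param>weight_ih|weight_hh|bias_ih|bias_hh)$"
)


def cell_keys_to_gru_keys(state_dict):
    """Rename GRUCell-style keys to the names a single-layer nn.GRU expects.

    "gru_enc_f.weight_ih" -> "gru_enc.weight_ih_l0"
    "gru_enc_b.weight_ih" -> "gru_enc.weight_ih_l0_reverse"
    "gru_dec.weight_ih"   -> "gru_dec.weight_ih_l0"
    Keys that do not look like GRU cell parameters are copied unchanged.
    """
    renamed = OrderedDict()
    for key, value in state_dict.items():
        match = _CELL_KEY.match(key)
        if match is None:
            renamed[key] = value
            continue
        suffix = "_l0_reverse" if match.group("direction") == "_b" else "_l0"
        new_key = f"{match.group('module')}.{match.group('param')}{suffix}"
        renamed[new_key] = value
    return renamed
```

With PyTorch, this would be applied as rnn_enc.load_state_dict(cell_keys_to_gru_keys(torch.load(path, map_location="cpu"))), where path points at the saved encoder weights.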

Memory error when using use_me.py

Hi, I trained the model on my own data, and when I run the use_me.py script I get the following error:

Traceback (most recent call last):
  File "scripts/use_me.py", line 200, in <module>
    main()
  File "scripts/use_me.py", line 195, in main
    output_file_names=_make_target_file_names(input_list)
  File "scripts/use_me.py", line 113, in use_me_process
    output_file_name=output_file_names[index]
  File "/home/david/ASR/mad-twinnet/helpers/data_feeder.py", line 191, in data_process_results_testing
    bg_hat = mix[:min_len] - voice_hat[:min_len]
MemoryError

Does anyone know what the problem is?

P.S. Excellent project! It's amazing.

'outputs/states/mad.pt' file not found error while using the use_me.py script

Hi, I am trying to use the MaD TwinNet pretrained model on an audio sample with the use_me.py script. I downloaded the pretrained weights from here (https://zenodo.org/record/1164592#.X-CFxNbhXeQ), as suggested in the README, and placed them in the outputs/states directory. However, there does not seem to be a mad.pt file among them, which appears to cause the error. How can this be fixed?

Thanks

Unable to reproduce same error metrics as claimed in the paper

Hi,
I downloaded the latest code from this repo and tested the model on the DSD100 dataset using the pretrained weights. I am running it on an NVIDIA GTX 1080 with Python 3.5.4. I get the following error metrics:

Median SDR: 4.04 dB | Median SIR: 7.14 dB

In the paper, the authors report the following:

Median SDR: 4.57 dB | Median SIR: 8.17 dB

My results are significantly different from the reported values. Am I missing something?

Furthermore, I retrained the model on the DSD100 dataset. Testing with the weights from the 100th epoch, I got the following results:

Median SDR: 4.13 dB | Median SIR: 7.32 dB

The results are a bit better, but still not up to the mark. What might be going wrong?

Justification of skip-filtering connection for general speech enhancement

I am excited to find this work. Thanks for sharing your paper and code with us 👍

In designing the masker and the denoiser, MaD TwinNet relies on a skip-filtering connection to produce the masked output.

Could you point me to any relevant material showing that the mask can be estimated as a function of the mixture (formula (8))?
Also, is the skip-filtering connection valid for general speech enhancement, which can include reverberation and channel distortion?

I appreciate your comments.
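As the term is generally used, a skip-filtering connection means the network outputs a mask computed from the mixture, and that mask is multiplied element-wise with the same mixture, which is carried around the network by a skip connection. The NumPy sketch below shows only that structure. The one-layer sigmoid masker, its random weights, and the name skip_filter are stand-ins chosen for illustration, not the trained FNNMasker from the repository.

```python
import numpy as np


def skip_filter(mixture_mag, weight, bias):
    """Apply a mask computed from the mixture back onto the mixture.

    mixture_mag: non-negative magnitude spectrogram, shape (frames, bins).
    weight: (bins, bins) and bias: (bins,) for a hypothetical one-layer masker.
    """
    # Hypothetical masker: a sigmoid keeps every mask value in [0, 1].
    mask = 1.0 / (1.0 + np.exp(-(mixture_mag @ weight + bias)))
    # Skip connection: the unprocessed mixture is filtered by its own mask.
    return mask * mixture_mag


rng = np.random.default_rng(0)
mix = np.abs(rng.standard_normal((5, 8)))
voice_estimate = skip_filter(mix, 0.1 * rng.standard_normal((8, 8)), np.zeros(8))
```

Because the output is the mixture scaled by values in [0, 1], the estimate can only attenuate time-frequency bins that are already present in the mixture.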

about overlapping frames

Hi,
Thanks for your inspiring work. After reading your paper and the code in data_feeder.py, I have some questions.
Let's say, for example, N = 2049, the sequence length (T) is 60, the overlap length of the subsequences (L) is 10, and the hop is 384. For a 16 kHz wav file that is 4 seconds long, the STFT gives an array of shape [16000 * 4 / 384, 2049], i.e. [166, 2049]. If the current set size is 4, the STFT output is [166 * 4, 2049], i.e. [664, 2049], and the overlapping subsequences are then made from this array.
Consider the code in _make_overlap_sequences, more precisely these lines:

    # Cut `mixture` into windows of l_size frames that start every (l_size - o_lap) frames
    mixture = stride_tricks.as_strided(
        mixture,
        shape=(int(mixture.shape[0] / (l_size - o_lap)), l_size, mixture.shape[1]),
        strides=(mixture.strides[0] * (l_size - o_lap), mixture.strides[0], mixture.strides[1])
    )
    # Drop the last window
    mixture = mixture[:-1, :, :]

Here l_size == 60 and o_lap == 20, so we get a mixture with shape [16, 60, 2049].
That is, mixture[1] overlaps mixture[0] by 20 frames and mixture[2] by 20 frames; mixture[2] overlaps mixture[1] by 20 frames and mixture[3] by 20 frames; and so on.

Consequently, in epoch_it():

    mix_batch = mix[b_start:b_end, :, :]
    # Targets drop context_length frames from each end of every subsequence
    voice_true_batch = voice_true[b_start:b_end, context_length:-context_length, :]

For voice_true_batch, the leading 10 frames and the trailing 10 frames are stripped, leaving 40 frames. Of these 40 frames, 10 at the start and 10 at the end still overlap with the neighboring subsequences, so only 20 frames are not overlapped.
Why do you pass 2*L as the overlap length to _make_overlap_sequences?
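The frame bookkeeping in this question can be checked with plain index arithmetic. The sketch below assumes, as in the as_strided call quoted above, that subsequence i starts at frame i * (l_size - o_lap), and, as in epoch_it(), that context_length = L frames are stripped from each end of the target. The helper name is hypothetical. With l_size = 60 and o_lap = 2 * L = 20, consecutive raw windows share 20 frames, while the stripped target spans meet end to end.

```python
def window_and_target_spans(n_windows, l_size, context_length):
    """Return half-open frame ranges [start, end) for each raw window and its target.

    Assumes o_lap = 2 * context_length, i.e. windows start every
    l_size - 2 * context_length frames.
    """
    hop = l_size - 2 * context_length
    windows, targets = [], []
    for i in range(n_windows):
        start = i * hop
        windows.append((start, start + l_size))
        # The target drops context_length frames from each end of the window.
        targets.append((start + context_length, start + l_size - context_length))
    return windows, targets


windows, targets = window_and_target_spans(n_windows=3, l_size=60, context_length=10)
# windows: [(0, 60), (40, 100), (80, 140)]   -> neighbors share 20 frames
# targets: [(10, 50), (50, 90), (90, 130)]   -> each target starts where the previous ends
```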

output size not equal to input size

My input is a time-domain mono mixture signal of a certain length. The output voice and background music files have the same length as each other, but not the same length as the input file. Is there an easy fix for this?
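One simple workaround is to zero-pad or trim each output to the input length before writing it. This assumes the outputs are shorter because trailing frames were discarded (the int(...) truncation and the [:-1] slice in the _make_overlap_sequences excerpt quoted in another issue on this page would have that effect), so the padded samples at the end are silence rather than separated audio. The helper name match_length is hypothetical and the sketch uses NumPy.

```python
import numpy as np


def match_length(signal, target_len):
    """Trim or zero-pad a 1-D signal at the end so it has exactly target_len samples."""
    if signal.shape[0] >= target_len:
        return signal[:target_len]
    # np.pad with a (before, after) tuple adds zeros only after the last sample.
    return np.pad(signal, (0, target_len - signal.shape[0]))
```

For example, voice_out = match_length(voice_hat, len(mix)) would give a voice signal with exactly as many samples as the input mixture.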

helper module is not working

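%matplotlib inline
import torch
import torch.nn.functional as F
import helper

# trainloader and model are assumed to be defined earlier in the notebook
images, labels = next(iter(trainloader))

img = images[0].view(1, 784)
# Turn off gradient tracking to speed up this part
with torch.no_grad():
    logits = model(img)

# Convert the logits to class probabilities
ps = F.softmax(logits, dim=1)
helper.view_classify(img.view(1, 28, 28), ps)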

AttributeError: 'NoneType' object has no attribute 'to' when use use_me.py

Hi, I wanted to use the pretrained model to process music, but I got the error below when I ran use_me.py:

Traceback (most recent call last):
  File "D:/Workspace/MusicVoiceSeparation/OpenSourcs/mad-twinnet-master/scripts/use_me.py", line 192, in <module>
    main()
  File "D:/Workspace/MusicVoiceSeparation/OpenSourcs/mad-twinnet-master/scripts/use_me.py", line 187, in main
    output_file_names=_make_target_file_names(input_list)
  File "D:/Workspace/MusicVoiceSeparation/OpenSourcs/mad-twinnet-master/scripts/use_me.py", line 56, in use_me_process
    rnn_enc.load_state_dict(torch.load(output_states_path['rnn_enc'])).to(device)
AttributeError: 'NoneType' object has no attribute 'to'

Do you have any idea what causes this? By the way, I have no GPU in my computer.
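The traceback points at the chained call on line 56: load_state_dict updates the module in place and does not return the module (it returned None in the PyTorch version used here; newer releases return a named tuple of missing and unexpected keys), so .to(device) is called on the wrong object. A minimal sketch of the usual pattern follows, with map_location so that weights saved on a GPU also load on a CPU-only machine. The helper name load_module is hypothetical.

```python
import torch
from torch import nn


def load_module(module: nn.Module, state_path: str, device: torch.device) -> nn.Module:
    """Load saved weights into `module` and move it to `device`."""
    # map_location remaps GPU-saved tensors, so this also works without a GPU.
    state_dict = torch.load(state_path, map_location=device)
    # load_state_dict modifies `module` in place; its return value is not the module.
    module.load_state_dict(state_dict)
    return module.to(device)
```

Line 56 would then become something like rnn_enc = load_module(rnn_enc, output_states_path['rnn_enc'], torch.device('cpu')).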

in case of single background instrument

Hello!
I have been trying to train with my own dataset, which contains voice and a single instrument (e.g. drums).

In lines 397- of helpers/data_feeder.py, the following code appears to require changes. Is it OK to just delete 'bass' and 'others' without adjusting other parts? Your suggestions would be appreciated.

if not usage_case:
    bass = wav_read(os.path.join(sources_parent_path, 'bass.wav'), mono=False)[0]
    drums = wav_read(os.path.join(sources_parent_path, 'drums.wav'), mono=False)[0]
    others = wav_read(os.path.join(sources_parent_path, 'other.wav'), mono=False)[0]
    voice = wav_read(os.path.join(sources_parent_path, 'vocals.wav'), mono=False)[0]

    # Sum over the last axis (the stereo channels) and halve: a mono downmix
    bg_true = np.sum(bass + drums + others, axis=-1) * 0.5
    voice_true = np.sum(voice, axis=-1) * 0.5
    mix = np.sum(bass + drums + others + voice, axis=-1) * 0.5
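The snippet builds mono signals by summing the two stereo channels and halving the result, and the mixture is the sum of all stems. Below is a sketch of the same arithmetic for an arbitrary list of accompaniment stems, so a single instrument is just a one-element list. It assumes the stems are already loaded as equal-length arrays of shape (n_samples, 2), standing in for the wav_read calls; the function name is hypothetical, and the sketch does not show whether other parts of data_feeder.py also need changes.

```python
import numpy as np


def downmix_sources(voice, accompaniment):
    """Return (mix, voice_true, bg_true) as mono signals.

    voice: stereo array (n_samples, 2); accompaniment: list of stereo arrays
    of the same shape, e.g. [drums] for a single background instrument.
    """
    # Add the accompaniment stems sample by sample, keeping both channels.
    bg_stereo = np.sum(accompaniment, axis=0)
    # Summing the channels and multiplying by 0.5 averages them (mono downmix).
    bg_true = np.sum(bg_stereo, axis=-1) * 0.5
    voice_true = np.sum(voice, axis=-1) * 0.5
    mix = np.sum(bg_stereo + voice, axis=-1) * 0.5
    return mix, voice_true, bg_true
```

Since every step is linear, the mono mixture always equals voice_true + bg_true, however many accompaniment stems are passed.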

Error When I run scripts/training.py

Hello!
I get an error when I run training.py, and I can't figure out why.

-- Starting training process. Debug mode: False
-- Setting up modules... done.
-- Setting up optimizes and losses... done.
-- Training starts

Traceback (most recent call last):
  File "scripts/training.py", line 199, in <module>
    main()
  File "scripts/training.py", line 195, in main
    training_process()
  File "scripts/training.py", line 176, in training_process
    l_m=torch.mean(torch.FloatTensor(epoch_l_m)),
RuntimeError: invalid argument 1: empty Tensor at /pytorch/torch/lib/TH/generic/THTensorMath.c:3311

Thank You!
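The failing line averages the per-batch losses collected in epoch_l_m. Older PyTorch releases, such as the one in this traceback, raise "empty Tensor" for the mean of an empty tensor, so the error suggests that no batch was processed during the epoch. Below is a small plain-Python guard that fails with a more explicit message, assuming epoch_l_m holds Python floats; the helper name epoch_mean is hypothetical.

```python
import math


def epoch_mean(losses):
    """Average per-batch losses, failing clearly if the epoch had no batches."""
    if not losses:
        raise RuntimeError(
            "No batches were processed in this epoch; check that the training "
            "data produces at least one full batch."
        )
    return math.fsum(losses) / len(losses)
```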

No module named helpers

I downloaded the repository as a zip file and am trying to run use_me.py.
It gives the following output:

Traceback (most recent call last):
  File "D:\mad-twinnet-master\mad-twinnet-master\scripts\use_me.py", line 17, in <module>
    from helpers.data_feeder import data_feeder_testing, data_process_results_testing
ModuleNotFoundError: No module named 'helpers'
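When a script is run directly, for example as python scripts/use_me.py, Python puts the script's own directory (scripts/) at the start of sys.path, not the repository root, so the top-level helpers package is not importable. Setting PYTHONPATH to the repository root before running the script is one fix. Another is a few lines at the top of use_me.py, sketched below under the assumption that the script sits one directory below the repository root; the function name is hypothetical.

```python
import os
import sys
from pathlib import Path


def add_repo_root_to_path(script_path):
    """Put the parent of the script's directory first on sys.path."""
    # For <repo>/scripts/use_me.py this is <repo>, where helpers/ lives.
    repo_root = Path(os.path.abspath(script_path)).parent.parent
    if str(repo_root) not in sys.path:
        sys.path.insert(0, str(repo_root))
    return repo_root
```

In use_me.py, add_repo_root_to_path(__file__) would need to run before the from helpers.data_feeder import ... line.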

pre_train modules do not correspond to the code

In the code, I found the following.
In use_me.py, the model is loaded as follows:
mad.load_state_dict(torch.load(output_states_path['mad']))
In helpers/settings.py:
output_states_path = {'mad': os.path.join(_states_path, 'mad{}.pt'.format(_debug_suffix))}
But the pretrained modules you provide are:
rnn_enc.pt
rnn_dec.pt
fnn.pt
denoiser.pt
They are parts of "mad", so I can't easily load them.
Readme.md says "you must get the version of the code that is tagged as "Paper-code" and available here". So I think I downloaded the correct code. Did I do something wrong?
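In PyTorch, a parent module's state dict stores each child's parameters under "<child_attribute_name>.<parameter_name>", so separately saved sub-module files can in principle be combined into one dict for mad.load_state_dict. The sketch below does only that key prefixing. The prefixes in the usage comment are hypothetical: they simply reuse the file names, and the real ones should be read from mad.state_dict().keys(). The saved keys may also need renaming if the sub-modules changed between code versions, as in the "Missing variable from model file" issue on this page.

```python
from collections import OrderedDict


def merge_state_dicts(parts):
    """Merge per-sub-module state dicts into one state dict for the parent module.

    `parts` maps each sub-module's attribute name in the parent to that
    sub-module's own state dict.
    """
    merged = OrderedDict()
    for prefix, state_dict in parts.items():
        for key, value in state_dict.items():
            merged[f"{prefix}.{key}"] = value
    return merged


# Hypothetical usage with PyTorch (prefixes and paths are assumptions):
# parts = {name: torch.load(f"outputs/states/{name}.pt", map_location="cpu")
#          for name in ("rnn_enc", "rnn_dec", "fnn", "denoiser")}
# mad.load_state_dict(merge_state_dicts(parts))
```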

RuntimeError: invalid argument 1: empty Tensor at /pytorch/torch/lib/TH/generic/THTensorMath.c:3381

Reported by @djpg

Original message follows:

Now, when I train with 8 kHz mono audio files and set wav_quality = {'sampling_rate': 8000, 'nb_bits': 8} in helpers/settings.py, I get the following error:

Traceback (most recent call last):
  File "scripts/training.py", line 199, in <module>
    main()
  File "scripts/training.py", line 195, in main
    training_process()
  File "scripts/training.py", line 176, in training_process
    l_m=torch.mean(torch.FloatTensor(epoch_l_m)),
RuntimeError: invalid argument 1: empty Tensor at /pytorch/torch/lib/TH/generic/THTensorMath.c:3381

Thank you!
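Lowering the sampling rate reduces the number of STFT frames per file (for a fixed hop in samples, halving the rate halves the frames), and with them the number of training subsequences. The arithmetic below is a rough sketch for checking whether a data set can fill even one batch. It assumes the windowing of the _make_overlap_sequences excerpt quoted in another issue on this page (windows every l_size - o_lap frames, last window dropped) and that only full batches are used. The hop of 384 samples and the batch sizes are illustrative values, not taken from helpers/settings.py, and the function names are hypothetical.

```python
def stft_frames(n_samples, hop_size):
    """Approximate number of STFT frames for a signal of n_samples."""
    return n_samples // hop_size


def full_batches(n_frames, l_size, o_lap, batch_size):
    """Full batches available, assuming windows every (l_size - o_lap) frames
    with the last window dropped, as in the quoted as_strided excerpt."""
    n_windows = int(n_frames / (l_size - o_lap)) - 1
    return max(n_windows, 0) // batch_size


# Four 4-second files with a hop of 384 samples (values from the
# "about overlapping frames" issue), at 16 kHz and at 8 kHz.
frames_16k = 4 * stft_frames(4 * 16000, 384)
frames_8k = 4 * stft_frames(4 * 8000, 384)
```

With these numbers, 664 frames give 15 windows, which is less than one batch of 16, so an epoch would end without a single loss value.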
