sigsep / norbert

Painless Wiener filters for audio separation

Home Page: https://sigsep.github.io/norbert

License: MIT License

norbert's Introduction

Norbert

Norbert is an implementation of the multichannel Wiener filter, a very popular way of filtering multichannel audio for several applications, notably speech enhancement and source separation.

This filtering method assumes you have some way of estimating non-negative power or magnitude spectrograms for all the audio sources composing a mixture. If you only have a model for some target sources, and not for the rest, you may use norbert.residual_model to let Norbert create a residual model for you.
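
To make the idea of a residual model concrete, here is a minimal NumPy sketch. It assumes the residual is simply the part of the mixture spectrogram that the provided source models do not explain, clipped at zero. The helper name residual_sketch is hypothetical and this is not Norbert's own code; norbert.residual_model may differ in its details.

```python
import numpy as np


def residual_sketch(v, x, alpha=1):
    """Append a residual source to v (hypothetical helper, not norbert's code).

    v: (nb_frames, nb_bins, nb_channels, nb_sources), non-negative spectrograms
    x: (nb_frames, nb_bins, nb_channels), complex mixture STFT
    """
    # Mixture spectrogram in the same domain as v (alpha=1: magnitude, alpha=2: power)
    vx = np.abs(x) ** alpha
    # Whatever the modelled sources leave unexplained, clipped at zero
    residual = np.maximum(0.0, vx - v.sum(axis=-1))
    # The residual becomes one extra source on the last axis
    return np.concatenate([v, residual[..., None]], axis=-1)


rng = np.random.default_rng(0)
x = rng.standard_normal((10, 5, 2)) + 1j * rng.standard_normal((10, 5, 2))
v = 0.5 * np.abs(x)[..., None]  # one target model explaining half the magnitude
v_all = residual_sketch(v, x)
print(v_all.shape)
```

In this toy case the appended source holds the other half of the mixture magnitude, so the models sum back to the mixture spectrogram.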

Given all source spectrograms and the mixture time-frequency representation, this repository can build and apply the filter that is appropriate for separation, by optimally exploiting multichannel information (as in stereo signals). This is done with an iterative procedure called Expectation-Maximization (EM), in which filtering and re-estimation of the parameters alternate.

From a beginner's perspective, all you usually need to do is call norbert.wiener with the mix and your spectrogram estimates. This should handle the rest.

From a more expert perspective, you will find the different ingredients of the EM algorithm as functions in the module, as described in the API documentation.

Installation

pip install norbert

Usage

Assuming a complex spectrogram X, and a (magnitude) estimate V of a target to be extracted from the spectrogram, applying the multichannel Wiener filter is as simple as this:

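# X: complex mixture STFT, shape (nb_frames, nb_bins, nb_channels)
X = stft(audio)
# V: non-negative estimates, shape (nb_frames, nb_bins, {1, nb_channels}, nb_sources)
V = model(X)
# Y: complex STFT estimates of the sources
Y = norbert.wiener(V, X)
estimate = istft(Y)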
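
For intuition about what the filter does in the simplest setting, the sketch below implements a single-channel ratio mask in plain NumPy: each source receives the share of the mixture that its spectrogram takes in the sum of all spectrograms. The function name softmask_sketch and the eps guard are assumptions made for illustration. This is not how norbert.wiener is implemented, and it ignores the multichannel EM refinement entirely.

```python
import numpy as np


def softmask_sketch(v, x, eps=1e-10):
    """Single-channel Wiener-style ratio mask (illustrative only).

    v: (nb_frames, nb_bins, nb_sources), non-negative spectrograms
    x: (nb_frames, nb_bins), complex mixture STFT
    returns: (nb_frames, nb_bins, nb_sources), complex source estimates
    """
    # Total model energy per time-frequency bin; eps avoids division by zero
    total = v.sum(axis=-1, keepdims=True) + eps
    # Each source keeps the fraction of the mixture that its model accounts for
    mask = v / total
    return mask * x[..., None]


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4)) + 1j * rng.standard_normal((8, 4))
v = rng.random((8, 4, 3))
y = softmask_sketch(v, x)
print(y.shape)
```

Because the masks sum to one in every bin (up to eps), the source estimates add back up to the mixture.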

How to contribute

norbert is a community-focused project. We therefore encourage the community to submit bug fixes and requests for technical support through GitHub issues. For more details on how to contribute, please follow our CONTRIBUTING.md.

Authors

Antoine Liutkus, Fabian-Robert Stöter

Citation

If you want to cite the Norbert software package, please use the DOI from Zenodo:

License

MIT

norbert's Issues

Enable automatic unit testing

add travis scripts

Problem using the residual model

I think that I followed the documentation closely, but I'm failing to get decent results with residual_model + wiener.
Basically, I took a background music track and added another source using sox -m. Then I feed the mix and the background track to my script and expect to extract the added source.
This is the script I'm using for separation, but it doesn't seem to work well.
Please advise...

#!/usr/bin/env python3
import sys
import librosa
import norbert
import numpy as np
from scipy.io import wavfile

sr = 48000
n_fft = 4096
hop = 512
mono = True


def to_short(y):
    norm = np.abs(y).max()
    y /= norm
    return (y * 32767).astype(np.int16)


def separate(mix, src):
    X = librosa.stft(mix, n_fft=n_fft, hop_length=hop, center=False).T
    V = librosa.stft(src, n_fft=n_fft, hop_length=hop, center=False).T

    # 1 channel dimension
    X = np.expand_dims(X, -1)
    # 1 channel dimension
    V = np.expand_dims(V, -1)
    # 1 source dimension
    V = np.expand_dims(V, -1)

    print('X shape', X.shape)
    print('V shape', V.shape)

    # v: (nb_frames, nb_bins, {1, nb_channels}, nb_sources)
    # x: (nb_frames, nb_bins, nb_channels)
    V = norbert.residual_model(V, X, alpha=1)
    print('V shape (w/residual)', V.shape)
    Y = norbert.wiener(V, X.astype(np.complex128),
                       iterations=1, use_softmask=False)
    print('Y shape', Y.shape)

    Y = np.squeeze(Y[..., 1]).T
    print('Y shape', Y.shape)
    y = librosa.istft(Y, hop_length=hop, win_length=n_fft)

    return y


if __name__ == '__main__':
    mix, _ = librosa.load(sys.argv[1], sr=sr, mono=mono, duration=30)
    src, _ = librosa.load(sys.argv[2], sr=sr, mono=mono, duration=30)
    print('Input shape:', mix.shape)
    output = separate(mix, src)
    print('Output shape:', output.shape)
    wavfile.write('out.wav', sr, to_short(output))

Thank you in advance.
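
One thing that may be worth checking in a script like this: the README describes the source estimates as non-negative magnitude or power spectrograms, whereas the script passes the complex STFT of the background track as V. The sketch below only shows how the two arrays could be prepared, with random data standing in for the transposed STFTs. Whether this alone explains the poor results is an assumption, not something confirmed in the issue.

```python
import numpy as np

rng = np.random.default_rng(0)
nb_frames, nb_bins = 20, 9

# Random stand-ins for the transposed STFTs of the mix and of the background track
X = rng.standard_normal((nb_frames, nb_bins)) + 1j * rng.standard_normal((nb_frames, nb_bins))
S = rng.standard_normal((nb_frames, nb_bins)) + 1j * rng.standard_normal((nb_frames, nb_bins))

# Mixture stays complex: (nb_frames, nb_bins, nb_channels=1)
x = X[..., None]
# Source model is a real, non-negative magnitude: (nb_frames, nb_bins, 1, nb_sources=1)
v = np.abs(S)[..., None, None]

print(x.shape, v.shape)
```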

MATLAB/Octave equivalent

Hello - do you know if any implementations of norbert (or multichannel Wiener filters) exist for MATLAB/Octave? Or perhaps similar libraries?

Thanks.

JPG quality tests for single and multichannel

Running the JPG experiment on MUSDB now to find the best JPG quality parameters for both single-channel and multichannel.

Is there a way to implement Wiener filter in real-time?

Thank you for this great project. I was wondering if there was a way to implement Wiener filters in real-time? In this case, nb_frames=1. I am using the Open-Unmix model, but I don't get a meaningful output when I set wiener window length to 1. Thank you.

add binary masking

just for the sake of completeness.

@aliutkus should we add this?
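
As a rough illustration of what binary masking could look like, the sketch below assigns each time-frequency bin entirely to the source with the largest spectrogram value. The function name binary_mask_sketch is hypothetical and nothing here reflects an existing norbert API.

```python
import numpy as np


def binary_mask_sketch(v):
    """One-hot mask over the last (source) axis of a non-negative array v."""
    # Index of the dominant source in every bin
    winner = np.argmax(v, axis=-1)
    # Compare each source index with the winner to build a 0/1 mask
    return (np.arange(v.shape[-1]) == winner[..., None]).astype(v.dtype)


# 1 frame, 2 bins, 2 sources
v = np.array([[[0.9, 0.1], [0.2, 0.7]]])
mask = binary_mask_sketch(v)
print(mask)
```

Multiplying such a mask with the mixture STFT (with a trailing source axis added) gives one estimate per source, with every bin attributed to exactly one of them.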

include script to reproduce the musmag dataset

it's currently part of open-unmix but I think it should go into norbert...

Is there a reference to this implementation?

Is this the same MWF technique as described here?

support for more than 2 channels

just in case someone wants to put 3 or 4 (RGBA) signals into an image

Softmask always returns multichannel output

norbert.softmask allows computing the ratio mask for mono mixtures without giving a channel axis. Why, in this case, is the channel axis required for the mixture?

I would propose not allowing the channel axis to be omitted at all. This makes things easier to handle.

Publishing Conda package on conda-forge

Hello, I am opening this issue to inform you that I created and published a Conda package for norbert on conda-forge.

The associated publishing PR is available at conda-forge/staged-recipes#9996 and a dedicated feedstock repository has been created at https://github.com/conda-forge/norbert-feedstock.

If you want to be added as maintainer please tell me :).

Regards

Store peak amplitude value in EXIF part of image?

This is just an idea, but it would be great if the absolute peak value of an audio signal were not lost when it is processed with norbert.

I just did an experiment where I ran encode-decode.py, but now I cannot directly compute the difference (in order to see the compression artefacts) because the scale is different.

Maybe we could store this information in the EXIF part of the JPG? There is e.g. a UserComment field.

@faroit What do you think?
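
The scale problem can be illustrated with a small NumPy sketch, assuming the encoder normalises by the peak before quantizing to 8 bits. The helpers quantize and dequantize are hypothetical and skip the log scaling and JPG steps. How the peak would actually be written to and read from the EXIF UserComment field is not shown.

```python
import numpy as np


def quantize(mag, nb_bits=8):
    """Scale a non-negative array to integer levels; return it with its peak."""
    peak = float(mag.max())
    levels = 2 ** nb_bits - 1
    # uint8 storage assumes nb_bits <= 8
    q = np.round(mag / peak * levels).astype(np.uint8)
    return q, peak


def dequantize(q, peak, nb_bits=8):
    """Undo the scaling; this is only possible if the peak was kept somewhere."""
    levels = 2 ** nb_bits - 1
    return q.astype(np.float64) / levels * peak


rng = np.random.default_rng(0)
mag = 3.7 * rng.random((16, 8))
q, peak = quantize(mag)
restored = dequantize(q, peak)
max_error = np.abs(restored - mag).max()
print(max_error <= peak / (2 * 255) + 1e-12)
```

With the peak available, the decoded array is on the original scale and the remaining error is bounded by half a quantization step, so a direct difference with the input becomes meaningful.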

Numerical Stability

have we tried to fix the numerical instability by just doing an elementwise minimum like...

filter = np.minimum(filter, V)

add wiener filters

as the baseline is evolving quickly now, I think it's time to add the filtering modules.

what would go into norbert: IRM, IBM, and MWF (=default)?

@aliutkus Should we just copy this over from the oracles or do you want to make some adjustments so that the code is better suited for educational purposes?

Add License

unit tests

Residual doesn't take single channel inputs

Doesn't residual support fewer channels than the mix?
That's how I read the docs, see here.
probably some broadcasting bug.

add graphs/models

combine or chain existing modules into a graph so that we can offer it for users:

basically this

# complex spectrogram
Xc = tf.transform(audio)
# limit spectrogram to 16 kHz
Xl = bw.downsample(Xc)
# log scale
Xs = ls.scale(Xl)
# quantize to 8bit
Xq = qt.quantize(Xs)
# write as jpg image
im.encode(Xq, "quantized_image.jpg", user_comment_dict={'max': ls.max})

should be replaced by

m = norbert.Mag()
Xq = m.compress(audio)

Should softmask have optional power?

Librosa computes

M = X**power / (X**power + X_ref**power)

I think that makes sense but of course, it may interfere with the EM algorithm
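
To see what the exponent does, here is a small NumPy sketch of the formula quoted above. The function name power_softmask and the eps guard are illustrative assumptions; this is neither librosa's nor norbert's implementation.

```python
import numpy as np


def power_softmask(X, X_ref, power=1.0, eps=1e-10):
    """M = X**power / (X**power + X_ref**power) for non-negative X and X_ref."""
    Xp = X ** power
    Rp = X_ref ** power
    # eps guards against 0 / 0 where both inputs vanish
    return Xp / (Xp + Rp + eps)


X = np.array([1.0, 2.0, 4.0])
X_ref = np.array([1.0, 1.0, 1.0])
m1 = power_softmask(X, X_ref, power=1)  # about [0.5, 0.667, 0.8]
m2 = power_softmask(X, X_ref, power=2)  # about [0.5, 0.8, 0.941]
print(m1, m2)
```

A larger power pushes the mask towards 0 or 1 wherever one input dominates, while bins where both inputs are equal stay at 0.5.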

Finalize Signal-Flow

As Norbert is getting more mature and I want to add the unit tests soon, it would be great if we can finalize the actual user handling.

Also maybe we ask some users that are inexperienced with audio to try out the package...

Support multichannel data

Currently, using examples/encode-decode.py on a stereo file throws an exception:

$ python encode-decode.py ../../../../9_GoodMaterialForDemos/JenniferWarnes_RockYouGently.wav 
Traceback (most recent call last):
  File "encode-decode.py", line 34, in <module>
    im.encode(Xq, "quantized_image.jpg", user_comment_dict={'max': ls.max})
  File "build/bdist.linux-x86_64/egg/norbert/image.py", line 37, in encode
  File "/speech/misc/com/software/AnacondaPython2_5.1_PyTorch0.4/lib/python2.7/site-packages/PIL/Image.py", line 2436, in fromarray
    raise ValueError("Too many dimensions: %d > %d." % (ndim, ndmax))
ValueError: Too many dimensions: 3 > 2.

I will add an assert that checks for single-channel only in an MR. It would be great if norbert could deal with multichannel data. The data layout could be channel id x feature id x frame id.

wiener filter memory consumption

niter > 0 uses a lot of RAM to filter the signal when the mixture is long. Can we provide a processing mode that computes and applies the filter in chunks?
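
One possible shape for such chunked processing is sketched below. It splits the frame axis into blocks and applies an arbitrary filter function to each block. The names filter_in_chunks and ratio_filter are hypothetical, and ratio_filter is only a stand-in for the real filter. With EM iterations, parameters estimated per chunk would presumably differ from those estimated on the whole signal, so the result would not be identical to unchunked filtering.

```python
import numpy as np


def filter_in_chunks(v, x, filter_fn, chunk_size=100):
    """Apply filter_fn to consecutive blocks of frames and concatenate the results.

    v: (nb_frames, nb_bins, nb_channels, nb_sources), non-negative spectrograms
    x: (nb_frames, nb_bins, nb_channels), complex mixture STFT
    """
    out = []
    for start in range(0, x.shape[0], chunk_size):
        block = slice(start, start + chunk_size)
        # Only one block of intermediate data is alive at a time
        out.append(filter_fn(v[block], x[block]))
    return np.concatenate(out, axis=0)


def ratio_filter(v, x, eps=1e-10):
    # Stand-in for the real filter: a plain ratio mask per channel
    return v / (v.sum(axis=-1, keepdims=True) + eps) * x[..., None]


rng = np.random.default_rng(0)
x = rng.standard_normal((250, 6, 2)) + 1j * rng.standard_normal((250, 6, 2))
v = rng.random((250, 6, 2, 3))
full = ratio_filter(v, x)
chunked = filter_in_chunks(v, x, ratio_filter, chunk_size=100)
print(np.allclose(full, chunked))
```

For a filter that treats every frame independently, as the stand-in does, chunking changes nothing in the output and only limits how much intermediate data is held in memory at once.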

Add proper documentation and docstrings

add sphinx requirements
add proper docstrings to functions
generate outputs and demo plots

Redundant sum?

Was it supposed to be v_g instead of v? Otherwise the sum operation is redundant.

https://github.com/sigsep/norbert/blob/master/norbert/contrib.py#L73