norbert's Introduction

Norbert

Norbert is an implementation of the multichannel Wiener filter, a very popular way of filtering multichannel audio for several applications, notably speech enhancement and source separation.

This filtering method assumes you have some way of estimating (non-negative) power or magnitude spectrograms for all the audio sources composing a mixture. If you only have a model for some target sources, and not for the rest, you may use norbert.residual_model to let Norbert create a residual model for you.
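As a rough illustration of the residual idea (a sketch under stated assumptions, not Norbert's actual implementation), one can subtract the summed source models from the mixture spectrogram, clip the result at zero so it stays non-negative, and append it as an extra source. The name `residual_sketch` and the exact array shapes are assumptions made for this example.

```python
import numpy as np


def residual_sketch(v, x, alpha=1.0):
    """Append a clipped residual spectrogram to the source models.

    v: non-negative array, shape (nb_frames, nb_bins, nb_channels, nb_sources)
    x: complex mixture, shape (nb_frames, nb_bins, nb_channels)
    alpha: exponent of the spectrograms (1 for magnitude, 2 for power)
    """
    # Mixture spectrogram in the same domain as the source models
    mix_spec = np.abs(x) ** alpha
    # Energy already explained by the modelled sources
    modelled = v.sum(axis=-1)
    # Whatever is left over, kept non-negative
    residual = np.maximum(0.0, mix_spec - modelled)
    # Residual becomes the last entry along the source axis
    return np.concatenate([v, residual[..., None]], axis=-1)
```

Note that `v` is expected to be non-negative here; passing a complex STFT instead of its magnitude would break the subtraction.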

Given all source spectrograms and the time-frequency representation of the mixture, this repository can build and apply the filter that is appropriate for separation, by optimally exploiting multichannel information (as in stereo signals). This is done with an iterative procedure called Expectation Maximization (EM), which alternates between filtering and re-estimating the parameters.

From a beginner's perspective, all you usually need to do is call norbert.wiener with the mix and your spectrogram estimates. This should handle the rest.

From a more expert perspective, you will find the different ingredients of the EM algorithm as functions in the module, as described in the API documentation.

Installation

pip install norbert

Usage

Assuming a complex spectrogram X and a (magnitude) estimate V of a target to be extracted from the spectrogram, applying the multichannel Wiener filter is as simple as this:

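import norbert

# stft, istft and model are placeholders for your own transform and estimator
X = stft(audio)           # complex mixture, (nb_frames, nb_bins, nb_channels)
V = model(X)              # non-negative, (nb_frames, nb_bins, {1, nb_channels}, nb_sources)
Y = norbert.wiener(V, X)  # complex source estimates
estimate = istft(Y)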
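To give a feel for what the simplest form of such filtering does, here is a minimal NumPy sketch of a per-channel ratio mask: each source receives the fraction of the mixture given by its share of the total spectrogram energy. This is only the single-channel soft-mask idea, not the multichannel EM procedure, and the function name `ratio_mask_sketch` and the `eps` default are assumptions for this example rather than Norbert's API.

```python
import numpy as np


def ratio_mask_sketch(v, x, eps=1e-10):
    """Split a mixture among sources in proportion to their spectrograms.

    v: non-negative array, shape (nb_frames, nb_bins, nb_channels, nb_sources)
    x: complex mixture, shape (nb_frames, nb_bins, nb_channels)
    returns: complex array, shape (nb_frames, nb_bins, nb_channels, nb_sources)
    """
    # Total modelled energy per time-frequency bin; eps avoids division by zero
    total = v.sum(axis=-1, keepdims=True) + eps
    # Each source keeps the mixture phase and a share of its amplitude
    return (v / total) * x[..., None]
```

Because the masks sum to (almost exactly) one, the source estimates add back up to the mixture.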

How to contribute

norbert is a community-focused project. We therefore encourage the community to submit bug fixes and requests for technical support through GitHub issues. For more details on how to contribute, please follow our CONTRIBUTING.md.

Authors

Antoine Liutkus, Fabian-Robert Stöter

Citation

If you want to cite the Norbert software package, please use the DOI from Zenodo:

DOI

License

MIT

norbert's People

Contributors

aliutkus, faroit, stefanuhlich-sony

norbert's Issues

Problem using the residual model

I think that I followed the documentation closely, but I'm failing to get some decent results with the residual_model + wiener.
Basically I took a background music track, and added another source using sox -m. Then I'm feeding the mix and the background track to my script and expect to extract the added source.
This is a script I'm using for separation, but it doesn't seem to work well.
Please advise...

#!/usr/bin/env python3
import sys
import librosa
import norbert
import numpy as np
from scipy.io import wavfile

sr = 48000
n_fft = 4096
hop = 512
mono = True


def to_short(y):
    norm = np.abs(y).max()
    y /= norm
    return (y * 32767).astype(np.int16)


def separate(mix, src):
    X = librosa.stft(mix, n_fft=n_fft, hop_length=hop, center=False).T
    V = librosa.stft(src, n_fft=n_fft, hop_length=hop, center=False).T

    # 1 channel dimension
    X = np.expand_dims(X, -1)
    # 1 channel dimension
    V = np.expand_dims(V, -1)
    # 1 source dimension
    V = np.expand_dims(V, -1)

    print('X shape', X.shape)
    print('V shape', V.shape)

    # v: (nb_frames, nb_bins, {1, nb_channels}, nb_sources)
    # x: (nb_frames, nb_bins, nb_channels)
    V = norbert.residual_model(V, X, alpha=1)
    print('V shape (w/residual)', V.shape)
    Y = norbert.wiener(V, X.astype(np.complex128),
                       iterations=1, use_softmask=False)
    print('Y shape', Y.shape)

    Y = np.squeeze(Y[..., 1]).T
    print('Y shape', Y.shape)
    y = librosa.istft(Y, hop_length=hop, win_length=n_fft)

    return y


if __name__ == '__main__':
    mix, _ = librosa.load(sys.argv[1], sr=sr, mono=mono, duration=30)
    src, _ = librosa.load(sys.argv[2], sr=sr, mono=mono, duration=30)
    print('Input shape:', mix.shape)
    output = separate(mix, src)
    print('Output shape:', output.shape)
    wavfile.write('out.wav', sr, to_short(output))

Thank you in advance.

MATLAB/Octave equivalent

Hello - do you know if any implementations of norbert (or multichannel Wiener filters) exist for MATLAB/Octave? Or perhaps similar libraries?

Thanks.

Is there a way to implement Wiener filter in real-time?

Thank you for this great project. I was wondering if there was a way to implement Wiener filters in real-time? In this case, nb_frames=1. I am using the Open-Unmix model, but I don't get a meaningful output when I set wiener window length to 1. Thank you.

Softmask always returns multichannel output

norbert.softmask allows computing the ratio mask for mono mixtures without providing a channel axis. Why, in this case, is the channel axis still required for the mixture?

I would propose not allowing the channel axis to be omitted at all. This makes things easier to handle.

Store peak amplitude value in EXIF part of image?

This is just an idea, but it would be great if the absolute peak value of an audio signal were not lost when it is processed with norbert.

I just did an experiment where I ran encode-decode.py, but now I cannot directly compute the difference (in order to see the compression artefacts) because the scale is different.

Maybe we could store this information in the EXIF part of the JPG? There is e.g. a UserComment field.

@faroit What do you think?

Numerical Stability

Have we tried to fix the numerical instability by just doing an elementwise minimum, like...

filter = np.minimum(filter, V)

?

add wiener filters

As the baseline is evolving quickly now, I think it's time to add the filtering modules.

what would go into norbert: IRM, IBM, and MWF (=default)?

@aliutkus Should we just copy this over from the oracles or do you want to make some adjustments so that the code is better suited for educational purposes?

unit tests

  • transform import TF
  • BandwidthLimiter
  • LogScaler
  • Quantizer
  • ImageEncoder

add graphs/models

combine or chain existing modules into a graph so that we can offer it for users:

basically this

# complex spectrogram
Xc = tf.transform(audio)
# limit spectrogram to 16 kHz
Xl = bw.downsample(Xc)
# log scale
Xs = ls.scale(Xl)
# quantize to 8bit
Xq = qt.quantize(Xs)
# write as jpg image
im.encode(Xq, "quantized_image.jpg", user_comment_dict={'max': ls.max})

should be replaced by

m = norbert.Mag()
Xq = m.compress(audio)

Finalize Signal-Flow

As Norbert is getting more mature and I want to add the unit tests soon, it would be great if we could finalize the actual user handling.

Maybe we could also ask some users who are inexperienced with audio to try out the package...

Support multichannel data

Currently, using examples/encode-decode.py on a stereo file throws an exception:

$ python encode-decode.py ../../../../9_GoodMaterialForDemos/JenniferWarnes_RockYouGently.wav 
Traceback (most recent call last):
  File "encode-decode.py", line 34, in <module>
    im.encode(Xq, "quantized_image.jpg", user_comment_dict={'max': ls.max})
  File "build/bdist.linux-x86_64/egg/norbert/image.py", line 37, in encode
  File "/speech/misc/com/software/AnacondaPython2_5.1_PyTorch0.4/lib/python2.7/site-packages/PIL/Image.py", line 2436, in fromarray
    raise ValueError("Too many dimensions: %d > %d." % (ndim, ndmax))
ValueError: Too many dimensions: 3 > 2.

I will add an assert that checks for single-channel input only in an MR. It would be great if norbert could deal with multichannel data. The data layout could be channel id x feature id x frame id.

wiener filter memory consumption

niter > 0 uses a lot of RAM to filter the signal when the mixture is of long duration. Can we provide a way to compute and apply the filter in chunks?
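One possible shape for such chunked processing is sketched below: the frame axis is cut into consecutive blocks and a filtering function is applied to each block independently. The helper name `filter_in_chunks`, the `chunk_size` default, and the assumption that the first axis of both arrays is the frame axis are all hypothetical choices for this illustration, not an existing Norbert feature.

```python
import numpy as np


def filter_in_chunks(filter_fn, v, x, chunk_size=100):
    """Apply filter_fn(v_chunk, x_chunk) to consecutive blocks of frames.

    v: source spectrograms, frames along axis 0
    x: complex mixture, frames along axis 0
    filter_fn: any callable taking (v, x) and returning filtered estimates
               with frames along axis 0
    """
    outputs = []
    for start in range(0, x.shape[0], chunk_size):
        stop = start + chunk_size
        # Only one block of intermediate filter data is alive at a time
        outputs.append(filter_fn(v[start:stop], x[start:stop]))
    # Stitch the per-chunk estimates back together along the frame axis
    return np.concatenate(outputs, axis=0)
```

With a filter that treats every frame independently, such as a plain ratio mask, the chunked result matches the full one. With an iterative EM filter, parameters would be re-estimated per chunk, so the output would generally differ from filtering the whole signal at once. The full output is still held in memory; only the intermediate working set is bounded by the chunk size.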
