neural-speech-dereverberation's Introduction

Neural-Speech-Dereverberation

Machine and Deep Learning models for speech dereverberation

Data

Models

  • MLP and LSTM with "Context Window"
  • Late Reverberation Suppression LSTM [5]
  • FD-NDLP (WPE + frequency domain) [6]. Implementation taken from https://github.com/helianvine/fdndlp
  • U-net for speech dereverberation [7]. The U-net architecture is adapted from an image-segmentation implementation, available at https://github.com/milesial/Pytorch-UNet
  • Late Reverberation Suppression U-net (proposed method, based on ideas from [5, 7])
  • GAN training with U-net generator [7]
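
The "Context Window" used by the MLP and LSTM models above usually means feeding each spectrogram frame together with its neighbouring frames. The following is a minimal sketch of that idea, not the repository's own code: the function name `add_context_window`, the window size, and the edge-padding strategy are all assumptions made for illustration.

```python
import numpy as np

def add_context_window(spec, context=5):
    """Stack each frame with `context` frames on either side.

    spec: array of shape (n_frames, n_bins).
    Returns an array of shape (n_frames, (2 * context + 1) * n_bins).
    NOTE: hypothetical helper; window size and padding are assumptions.
    """
    n_frames, _ = spec.shape
    # Repeat the first and last frames so edge frames still get a full window.
    padded = np.pad(spec, ((context, context), (0, 0)), mode="edge")
    # Window i covers padded[i : i + 2 * context + 1], centred on original frame i.
    windows = [padded[i:i + 2 * context + 1].reshape(-1) for i in range(n_frames)]
    return np.stack(windows)

spec = np.arange(12, dtype=np.float32).reshape(4, 3)  # 4 frames, 3 frequency bins
stacked = add_context_window(spec, context=1)
print(stacked.shape)
```

With `context=1`, each output row holds the previous, current, and next frame concatenated, so the middle third of every row is the original frame.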

Speech Enhancement Example with U-net generator:

Metrics

  • Perceptual Evaluation of Speech Quality (PESQ)
  • Cepstral Distortion (CD)
  • Log Likelihood Ratio (LLR)
  • Frequency-Weighted Segmental Signal to Noise Ratio (fwSNRseg)
  • Speech to Reverberation Modulation Energy Ratio (SRMR)

The Python implementation of these metrics is taken from: https://github.com/schmiph2/pysepm

Citing

If you use code or any ideas from here, please cite our publication at arXiv

References

[1] Vassil Panayotov, Guoguo Chen, Daniel Povey and Sanjeev Khudanpur, "LibriSpeech: an ASR corpus based on public domain audio books", ICASSP 2015.

[2] R. Stewart and M. Sandler, "Database of omnidirectional and B-format room impulse responses," 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, Dallas, TX, USA, 2010, pp. 165-168, doi: 10.1109/ICASSP.2010.5496083.

[3] J. Y. C. Wen, N. D. Gaubitch, E. A. P. Habets, T. Myatt, and P. A. Naylor, "Evaluation of Speech Dereverberation Algorithms using the MARDY Database," Proc. Intl. Workshop Acoust. Echo Noise Control (IWAENC), pp. 12-15, 2006.

[4] I. Szöke, M. Skácel, L. Mošner, J. Paliesek and J. Černocký, "Building and evaluation of a real room impulse response dataset," in IEEE Journal of Selected Topics in Signal Processing, vol. 13, no. 4, pp. 863-876, Aug. 2019, doi: 10.1109/JSTSP.2019.2917582.

[5] Yan Zhao, DeLiang Wang, Buye Xu and Tao Zhang, "Late Reverberation Suppression using Recurrent Neural Networks with Long Short-Term Memory". IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018.

[6] T. Nakatani, T. Yoshioka, K. Kinoshita, M. Miyoshi and B. Juang, "Speech Dereverberation Based on Variance-Normalized Delayed Linear Prediction," in IEEE Transactions on Audio, Speech, and Language Processing, vol. 18, no. 7, pp. 1717-1731, Sept. 2010, doi: 10.1109/TASL.2010.2052251.

[7] Ori Ernst, Shlomo E. Chazan, Sharon Gannot and Jacob Goldberger, "Speech Dereverberation Using Fully Convolutional Networks". Faculty of Engineering, Bar-Ilan University, 3 Apr, 2019.

neural-speech-dereverberation's People

Contributors

diegoleon96

neural-speech-dereverberation's Issues

MARDY no longer available at provided link

I guess I should raise this issue with the original authors of MARDY rather than here,
but the download link http://cspserver5.ee.ic.ac.uk/~sap/uploads/data/MARDY.rar,
provided at https://www.commsp.ee.ic.ac.uk/~sap/resources/mardy-multichannel-acoustic-reverberation-database-at-york-database/ and mentioned here in the README, no longer works:

$ wget --no-check-certificate http://cspserver5.ee.ic.ac.uk/~sap/uploads/data/MARDY.rar
...
--2021-08-17 09:21:39--  http://cspserver5.ee.ic.ac.uk/~sap/uploads/data/MARDY.rar
Resolving cspserver5.ee.ic.ac.uk (cspserver5.ee.ic.ac.uk)... 146.179.44.193
Connecting to cspserver5.ee.ic.ac.uk (cspserver5.ee.ic.ac.uk)|146.179.44.193|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://cspserver5.ee.ic.ac.uk/~sap/uploads/data/MARDY.rar [following]
--2021-08-17 09:21:39--  https://cspserver5.ee.ic.ac.uk/~sap/uploads/data/MARDY.rar
Connecting to cspserver5.ee.ic.ac.uk (cspserver5.ee.ic.ac.uk)|146.179.44.193|:443... connected.
WARNING: no certificate subject alternative name matches
        requested host name ‘cspserver5.ee.ic.ac.uk’.
HTTP request sent, awaiting response... 404 Not Found
2021-08-17 09:21:39 ERROR 404: Not Found.

The WAV audio has artifacts and clipping after waveform reconstruction

Hello, I'm a student and a beginner in neural speech dereverberation. Thank you very much for the code; it has really helped me a lot.

I'm running the LSTM model and have found that the waveform reconstructed from the log-mel spectrogram is not perfect. The reconstructed audio has artifacts and its sound quality is poor, so the PESQ and other evaluation metrics of the dereverberated audio are even worse than those of the reverberant speech.

Do I need to change some arguments of the mel spectrogram to solve this problem, or is there some other reason?

It would be really helpful if somebody could help me with this question. My email address is [email protected]; I look forward to hearing from you.

Thanks and Regards
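
One possible contributor to the clipping described above is that the reconstructed waveform exceeds the [-1, 1] range before it is written as 16-bit audio. The sketch below shows peak normalization as one way to rule that out. It is an assumption about the cause, not a confirmed diagnosis of this repository, and the helper names `peak_normalize` and `to_int16` are hypothetical. It does not address artifacts that come from the mel-spectrogram inversion itself, which discards phase and is therefore lossy.

```python
import numpy as np

def peak_normalize(wave, peak=0.99):
    """Scale a float waveform so its largest absolute sample equals `peak`."""
    max_abs = np.max(np.abs(wave))
    if max_abs == 0:
        return wave.copy()  # silent signal: nothing to scale
    return wave * (peak / max_abs)

def to_int16(wave):
    """Convert a float waveform in [-1, 1] to 16-bit PCM, clipping as a last resort."""
    return (np.clip(wave, -1.0, 1.0) * 32767).astype(np.int16)

# Hypothetical reconstructed signal whose peaks exceed the [-1, 1] range.
t = np.linspace(0, 1, 16000, endpoint=False)
reconstructed = 1.8 * np.sin(2 * np.pi * 440 * t)

safe = peak_normalize(reconstructed)  # rescaled, no sample is clipped
pcm = to_int16(safe)                  # ready to be written as 16-bit WAV
```

Because rescaling changes the signal level, it may be worth applying the same scaling policy to the reference and the processed signals before computing level-sensitive metrics.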

Data Preparation Notebook doesn't download MARDY data

I noticed that when I run the Generate_spectrograms notebook it fails because there are no files in the MARDY folder.

To fix this I added the following to the top of the notebook:

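# Install unrar without prompting for confirmation
!sudo apt-get install -y unrar
!wget http://cspserver5.ee.ic.ac.uk/~sap/uploads/data/MARDY.rar
# -p creates parent directories and does not fail if they already exist
!mkdir -p data_espec/MARDY
!unrar e -y MARDY.rar data_espec/MARDY/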

Regarding reconstructing waveforms from normalized predicted spectrograms

Hello, I went through your code for speech dereverberation and found it really useful for a project I'm working on. Thanks a ton for that!

I have one question, though. Your predicted audio looks clean in the spectrograms, but I can't find the code that converts these predicted normalized spectrograms back into audio waveforms. I see a utils function called reconstruct_wave, but that seems to be for inverting unnormalized spectrograms.

Since normalized spectrograms are the input and output used to train your model, I'm guessing the spectrograms predicted during evaluation are normalized too. In that case, how do I un-normalize the predicted spectrograms and then invert them? Or am I missing something obvious in these inversions?

It would be really helpful for my project if you could help me with this. Please reach out to me at [email protected] or just answer here, if you are happy to clear up my question!

Looking forward to hearing from you

Thanks and Regards
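
As a rough illustration of the un-normalization step asked about above, the sketch below assumes the spectrograms were min-max scaled to [0, 1] and that the log scale is decibels computed as 20·log10 of the magnitude. Both are assumptions, since the repository's actual normalization is not described here, and `normalize` and `denormalize` are hypothetical helper names.

```python
import numpy as np

def normalize(log_spec, lo, hi):
    """Min-max scale a log-spectrogram to [0, 1] using fixed bounds."""
    return (log_spec - lo) / (hi - lo)

def denormalize(norm_spec, lo, hi):
    """Invert `normalize`: map [0, 1] values back to the original log scale."""
    return norm_spec * (hi - lo) + lo

# Hypothetical log-magnitude spectrogram in dB, shape (n_bins, n_frames).
rng = np.random.default_rng(0)
log_spec = rng.uniform(-80.0, 0.0, size=(128, 50))

# The bounds must be the ones used at training time, so store them with the model.
lo, hi = log_spec.min(), log_spec.max()

norm_spec = normalize(log_spec, lo, hi)    # what the network sees and predicts
restored = denormalize(norm_spec, lo, hi)  # back to dB
magnitude = 10.0 ** (restored / 20.0)      # dB to linear magnitude (assumes 20*log10)
```

The un-normalized spectrogram can then be handed to an inversion routine that expects unnormalized input, such as the reconstruct_wave utility mentioned in the question.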
