diegoleon96 / neural-speech-dereverberation

Machine and Deep Learning models for speech dereverberation

License: GNU General Public License v3.0

Python 100.00%

speech dereverberation speech-enhancement

Neural-Speech-Dereverberation

Data

LibriSpeech for speech audio files [1]. Available: https://www.openslr.org/12
Omni and MARDY datasets for Room Impulse Responses (RIRs) [2, 3]. Available: http://isophonics.org/content/room-impulse-response-data-set and https://www.commsp.ee.ic.ac.uk/~sap/resources/mardy-multichannel-acoustic-reverberation-database-at-york-database/
BUT Speech@FIT Reverb Database for retransmitted data [4]. Available: https://speech.fit.vutbr.cz/software/but-speech-fit-reverb-database
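
This README does not spell out how the speech and RIR datasets are combined. A common way to simulate reverberant speech is to convolve a clean utterance with a room impulse response, and the sketch below assumes that approach. The function name `add_reverb`, its parameters, and the synthetic RIR are illustrative and not taken from the repository.

```python
import numpy as np

def add_reverb(clean, rir, keep_length=True):
    """Simulate reverberant speech by convolving clean speech with an RIR."""
    clean = np.asarray(clean, dtype=np.float64)
    rir = np.asarray(rir, dtype=np.float64)
    reverberant = np.convolve(clean, rir, mode="full")
    if keep_length:
        # Trim the reverberation tail so input and target stay aligned
        reverberant = reverberant[: len(clean)]
    # Rescale only if the result would leave the [-1, 1] range
    peak = np.max(np.abs(reverberant))
    if peak > 1.0:
        reverberant = reverberant / peak
    return reverberant

# Toy data (assumption): an impulse passed through a synthetic decaying "RIR"
rng = np.random.default_rng(0)
rir = rng.standard_normal(800) * np.exp(-np.arange(800) / 100.0)
rir[0] = 1.0  # direct path
clean = np.zeros(1600)
clean[0] = 0.5
wet = add_reverb(clean, rir)
```

With real data, `clean` would be a LibriSpeech utterance and `rir` a measured response resampled to the same sampling rate.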

Models

MLP and LSTM with "Context Window"
Late Reverberation Suppression LSTM [5]
FD-NDLP (WPE + frequency domain) [6]. Implementation taken from https://github.com/helianvine/fdndlp
U-net for speech dereverberation [7]. U-net architecture is based on image segmentation, available: https://github.com/milesial/Pytorch-UNet
Late Reverberation Suppression U-net (proposed method, based on ideas from [5, 7])
GAN training with U-net generator [7]
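
The "Context Window" used by the MLP and LSTM models is not defined in this README. One common reading is that each spectrogram frame is concatenated with its neighbouring frames before being fed to the network. The sketch below assumes that reading. The function name, window sizes, and edge padding are hypothetical choices, not the repository's.

```python
import numpy as np

def add_context(spec, left=2, right=2):
    """Stack each frame with its neighbours.

    spec: array of shape (n_frames, n_bins)
    returns: array of shape (n_frames, (left + 1 + right) * n_bins)
    """
    spec = np.asarray(spec)
    n_frames = spec.shape[0]
    # Repeat the first and last frames so every frame has a full window
    padded = np.pad(spec, ((left, right), (0, 0)), mode="edge")
    # Slice i holds, for every frame t, the frame at position t + i - left
    windows = [padded[i : i + n_frames] for i in range(left + 1 + right)]
    return np.concatenate(windows, axis=1)

# Toy spectrogram: 4 frames, 3 frequency bins
spec = np.arange(12).reshape(4, 3)
ctx = add_context(spec, left=1, right=1)
```

Each row of `ctx` then holds the previous, current, and next frame side by side.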

Speech enhancement example with U-net generator (image not reproduced here).

Metrics

Perceptual Evaluation of Speech Quality (PESQ)
Cepstral Distortion (CD)
Log Likelihood Ratio (LLR)
Frequency-Weighted Segmental Signal to Noise Ratio (fwSNRseg)
Speech to Reverberation Modulation Energy Ratio (SRMR)

Python implementation is taken from: https://github.com/schmiph2/pysepm

Citing

If you use code or ideas from this repository, please cite our publication on arXiv.

References

[1] Vassil Panayotov, Guoguo Chen, Daniel Povey and Sanjeev Khudanpur, "LibriSpeech: an ASR corpus based on public domain audio books", ICASSP 2015.

[2] R. Stewart and M. Sandler, "Database of omnidirectional and B-format room impulse responses," 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, Dallas, TX, USA, 2010, pp. 165-168, doi: 10.1109/ICASSP.2010.5496083.

[3] J. Y. C. Wen, N. D. Gaubitch, E. A. P. Habets, T. Myatt, and P. A. Naylor, "Evaluation of Speech Dereverberation Algorithms using the MARDY Database," Proc. Intl. Workshop Acoust. Echo Noise Control (IWAENC), pp. 12-15, 2006.

[4] I. Szöke, M. Skácel, L. Mošner, J. Paliesek and J. Černocký, "Building and evaluation of a real room impulse response dataset," in IEEE Journal of Selected Topics in Signal Processing, vol. 13, no. 4, pp. 863-876, Aug. 2019, doi: 10.1109/JSTSP.2019.2917582.

[5] Yan Zhao, Deliang Wang, Buye Xu and Tao Zhang, "Late Reverberation Suppression Using Recurrent Neural Networks with Long Short-Term Memory," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018.

[6] T. Nakatani, T. Yoshioka, K. Kinoshita, M. Miyoshi and B. Juang, "Speech Dereverberation Based on Variance-Normalized Delayed Linear Prediction," in IEEE Transactions on Audio, Speech, and Language Processing, vol. 18, no. 7, pp. 1717-1731, Sept. 2010, doi: 10.1109/TASL.2010.2052251.

[7] Ori Ernst, Shlomo E. Chazan, Sharon Gannot and Jacob Goldberger, "Speech Dereverberation Using Fully Convolutional Networks". Faculty of Engineering, Bar-Ilan University, 3 Apr, 2019.

neural-speech-dereverberation's Issues

MARDY no longer available at provided link

I guess I should raise this issue with the original authors of MARDY rather than here,
but the download link http://cspserver5.ee.ic.ac.uk/~sap/uploads/data/MARDY.rar,
provided at https://www.commsp.ee.ic.ac.uk/~sap/resources/mardy-multichannel-acoustic-reverberation-database-at-york-database/ and mentioned in the README here, no longer works:

$ wget --no-check-certificate http://cspserver5.ee.ic.ac.uk/~sap/uploads/data/MARDY.rar
...
--2021-08-17 09:21:39--  http://cspserver5.ee.ic.ac.uk/~sap/uploads/data/MARDY.rar
Resolving cspserver5.ee.ic.ac.uk (cspserver5.ee.ic.ac.uk)... 146.179.44.193
Connecting to cspserver5.ee.ic.ac.uk (cspserver5.ee.ic.ac.uk)|146.179.44.193|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://cspserver5.ee.ic.ac.uk/~sap/uploads/data/MARDY.rar [following]
--2021-08-17 09:21:39--  https://cspserver5.ee.ic.ac.uk/~sap/uploads/data/MARDY.rar
Connecting to cspserver5.ee.ic.ac.uk (cspserver5.ee.ic.ac.uk)|146.179.44.193|:443... connected.
WARNING: no certificate subject alternative name matches
        requested host name ‘cspserver5.ee.ic.ac.uk’.
HTTP request sent, awaiting response... 404 Not Found
2021-08-17 09:21:39 ERROR 404: Not Found.

The WAV audio has artifacts and clipping after waveform reconstruction

Hello, I'm a student and a beginner in neural speech dereverberation. Thank you very much for the code; it has really helped me a lot.

I'm running the LSTM model and have found that the waveform reconstructed from the log mel spectrogram is not perfect. The reconstructed audio has artifacts and its sound quality is poor, so the PESQ and other evaluation metrics of the dereverberated audio are even worse than those of the reverberant speech.

Do I need to change some arguments of the mel spectrogram to solve this problem? Or is there some other reason?

It would be really helpful if somebody could help me with this question. My email address is [email protected], looking forward to hearing from you.

Thanks and Regards
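
One possible source of the clipping mentioned in this issue, independent of the model, is writing a reconstructed waveform whose samples fall outside [-1, 1] to a 16-bit WAV file. The sketch below shows peak normalization before the integer conversion, assuming a floating-point waveform. The helper name `to_int16` and the `headroom` value are illustrative, and this does not recover any detail lost by the mel spectrogram itself.

```python
import numpy as np

def to_int16(wave, headroom=0.99):
    """Scale a float waveform into range, then convert to 16-bit PCM."""
    wave = np.asarray(wave, dtype=np.float64)
    peak = np.max(np.abs(wave)) if wave.size else 0.0
    if peak > headroom:
        # Scale the whole signal down instead of letting samples saturate
        wave = wave * (headroom / peak)
    return np.round(wave * 32767.0).astype(np.int16)

# Toy waveform (assumption) that exceeds the valid range
pcm = to_int16(np.array([0.0, 2.0, -4.0]))
```

Scaling the whole signal preserves the waveform shape, whereas hard clipping at the limits introduces audible distortion.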

Data Preparation Notebook doesn't download MARDY data

I noticed that when I run the Generate_spectrograms notebook it fails because there are no files in the MARDY folder.

To fix this I added the following to the top of the notebook:

!sudo apt-get install -y unrar
!wget http://cspserver5.ee.ic.ac.uk/~sap/uploads/data/MARDY.rar
!mkdir -p data_espec/MARDY
!unrar e -y MARDY.rar data_espec/MARDY

Regarding reconstructing waveforms from normalized predicted spectrograms

Hello, I went through your code for speech dereverberation and I find it really useful for a project I'm working on. Thanks a ton for that!

I have one question, though. I have seen that your predicted audio looks clean in the spectrograms, but I can't find the code that converts these predicted normalized spectrograms back into audio waveforms. I see a utils function called reconstruct_wave, but that seems to be for inverting unnormalized spectrograms.

Since you use normalized spectrograms as the input and output when training your model, I'm guessing the spectrograms predicted during evaluation are normalized too. In that case, how do I un-normalize these predicted spectrograms and then invert them? Or am I missing something obvious in these inversions?

If you could help me with this, it would be really helpful for my project. Please reach out to me at [email protected] or just answer here if you are happy to clear this up!

Looking forward to hearing from you

Thanks and Regards
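
The normalization used in the repository is not shown in this issue, so the following is only a sketch. It assumes per-utterance min-max scaling of a log-magnitude spectrogram to [0, 1]. Under that assumption, inversion requires keeping the minimum and maximum computed at normalization time. All names here are hypothetical.

```python
import numpy as np

def normalize(log_spec):
    """Min-max scale to [0, 1] and return the statistics needed to invert."""
    lo, hi = float(log_spec.min()), float(log_spec.max())
    scale = hi - lo if hi > lo else 1.0  # avoid dividing by zero
    return (log_spec - lo) / scale, (lo, hi)

def denormalize(norm_spec, stats):
    """Undo normalize() using the stored (min, max) pair."""
    lo, hi = stats
    scale = hi - lo if hi > lo else 1.0
    return norm_spec * scale + lo

# Toy log-spectrogram (assumption): 5 frames, 4 bins
rng = np.random.default_rng(0)
log_spec = rng.uniform(-80.0, 0.0, size=(5, 4))
norm_spec, stats = normalize(log_spec)
restored = denormalize(norm_spec, stats)
```

For a predicted spectrogram, the statistics of the corresponding reverberant input would typically be reused, since the clean target is unknown at evaluation time. The restored log-magnitude can then be passed to whatever inversion routine handles unnormalized spectrograms.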
