nunet-tls's Introduction

Monoaural Speech Enhancement Using a Nested U-Net with Two-Level Skip Connections

This is the repository for the paper "Monoaural Speech Enhancement Using a Nested U-Net with Two-Level Skip Connections", which was accepted to INTERSPEECH 2022.

Abstract: Capturing the contextual information in multi-scale is known to be beneficial for improving the performance of DNN-based speech enhancement (SE) models. This paper proposes a new SE model, called NUNet-TLS, having two-level skip connections between the residual U-Blocks nested in each layer of a large U-Net structure. The proposed model also has a causal time-frequency attention (CTFA) at the output of the residual U-Block to boost dynamic representation of the speech context in multi-scale. Even having the two-level skip connections, the proposed model slightly increases the network parameters, but the performance improvement is significant. Experimental results show that the proposed NUNet-TLS has superior performance in various objective evaluation metrics to other state-of-the-art models.

Requirements

This repo is tested on Ubuntu 20.04.

# for training
python == 3.7.9   
pytorch == 1.9.0_cu111   
scipy == 1.6.0      
soundfile == 0.10.3  
# for evaluation
tensorboard == 2.7.0   
pesq == 0.0.2       
pystoi == 0.3.3       
matplotlib == 3.3.3      

Getting started

  1. Install the necessary libraries.
  2. Set directory paths for your dataset. (config.py)
# dataset path
noisy_dirs_for_train = '../Dataset/train/noisy/'   
clean_dirs_for_train = '../Dataset/train/clean/'   
noisy_dirs_for_valid = '../Dataset/valid/noisy/'   
clean_dirs_for_valid = '../Dataset/valid/clean/'   
  • Modify the find_pair function in tools.py so that it matches the file naming of your dataset.
  • If you need to adjust any other parameter settings, simply change them.
  3. Run train_interface.py
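
How find_pair has to change depends on your file names, and its actual signature in tools.py is not shown here. As a minimal sketch, assuming each noisy file has the same file name as its clean counterpart, the pairing could look like the following. The name find_pair_by_name and its arguments are hypothetical and are not part of the repository.

```python
import os


def find_pair_by_name(noisy_dir, clean_dir, ext=".wav"):
    """Hypothetical helper: pair noisy and clean files that share a file name.

    Returns a list of (noisy_path, clean_path) tuples sorted by file name.
    Noisy files without a same-named clean file are skipped.
    """
    # Collect the clean file names once so each lookup is a set membership test.
    clean_names = {
        name for name in os.listdir(clean_dir) if name.lower().endswith(ext)
    }
    pairs = []
    for name in sorted(os.listdir(noisy_dir)):
        if name.lower().endswith(ext) and name in clean_names:
            pairs.append(
                (os.path.join(noisy_dir, name), os.path.join(clean_dir, name))
            )
    return pairs
```

With the paths from config.py, this would be called as find_pair_by_name(noisy_dirs_for_train, clean_dirs_for_train). If your noisy files carry a suffix or an SNR tag in their names, the matching condition is the part to adapt.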

Results

Demo

We randomly selected one sample at 10 dB SNR for demonstration.

1_Clean.mov
1_Noisy.mov
1_DCCRN+C.mov
1_FullSubNet.mov
1_SADNUNet.mov
1_NUNet-TLS.mov

References

U2-Net: Going deeper with nested u-structure for salient object detection
X. Qin, Z. Zhang, C. Huang, M. Dehghan, O. R. Zaiane, and M. Jagersand
[paper] [code]
A nested u-net with self-attention and dense connectivity for monaural speech enhancement
X. Xiang, X. Zhang, and H. Chen
[paper]
Time-frequency attention for monaural speech enhancement
Q. Zhang, Q. Song, Z. Ni, A. Nicolson, and H. Li
[paper]

nunet-tls's People

Contributors

seorim0

nunet-tls's Issues

about Data Normalization function: minMaxNorm

Hello!

Over the last few days, I have been trying to change the dataloader in your DCCRN project so that it loads wav files directly from folders instead of one huge .npy file. (https://github.com/seorim0/DCCRN-with-various-loss-functions)

Fortunately, I found the same request in the Issues (seorim0/DCCRN-with-various-loss-functions#4), which helped me a lot.
Now I have found a function in NUNet-TLS --> tools.py --> minMaxNorm(wav, eps=1e-8). However, is the value it returns correct?

tools.py, lines 72-76:
def minMaxNorm(wav, eps=1e-8):
    max = np.max(abs(wav))
    min = np.min(abs(wav))
    wav = (wav - min) / (max - min + eps)
    return wav

The wav data should contain negative values, so should we change the code to something like this?

def minMaxNorm(wav, eps=1e-8):
    max = np.max(wav)
    min = np.min(wav)
    wav = (wav - min) / (max - min + eps)
    return wav

In other words, should we drop the abs()?
Thank you!
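
To make the difference between the two versions concrete, the sketch below applies both to a small made-up signal. The function names and the sample values are illustrative assumptions and are not code from tools.py. The local variables are named hi and lo so that Python's built-in max and min are not shadowed.

```python
import numpy as np


def min_max_norm_abs(wav, eps=1e-8):
    """Variant quoted from tools.py: min and max are taken over |wav|."""
    hi = np.max(np.abs(wav))
    lo = np.min(np.abs(wav))
    return (wav - lo) / (hi - lo + eps)


def min_max_norm_plain(wav, eps=1e-8):
    """Proposed variant: min and max are taken over the signed samples."""
    hi = np.max(wav)
    lo = np.min(wav)
    return (wav - lo) / (hi - lo + eps)


# Made-up signal with negative samples and one exact zero (silence).
wav = np.array([-0.5, -0.25, 0.0, 0.25, 1.0])

with_abs = min_max_norm_abs(wav)
plain = min_max_norm_plain(wav)

# With abs(): the smallest magnitude here is 0, so the signal is only divided
# by its peak magnitude. Negative samples stay negative, silence stays at 0.
print(with_abs.min(), with_abs.max())

# Without abs(): the output spans [0, 1] and silence moves to about 1/3.
print(plain.min(), plain.max(), plain[2])
```

On this input the abs() version behaves like peak normalization and keeps the waveform centered on zero, while the version without abs() rescales to [0, 1] and shifts the zero level. Which behavior is wanted depends on what the rest of the pipeline expects, and the thread does not settle that.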

The paper

Hello, as a newcomer to speech enhancement, I would really like to read the paper behind this project. Apologies if I have missed it, but I could not find the paper online. Could you share it here?

real_time

First of all, thank you very much for your work. I have been studying the TFA module recently and have some questions about the script. 1. The shape of ZF is [1 D], whereas in the script it is [T D]. 2. time_seq is the number added before the input x. Why is this done for real-time processing, how should the value of time_seq (32 in the script) be set, and what does it relate to? Thank you again!
