
GTCRN

This repository is the official implementation of the ICASSP2024 paper: GTCRN: A Speech Enhancement Model Requiring Ultralow Computational Resources.

Audio examples are available at Audio examples of GTCRN.

About GTCRN

Grouped Temporal Convolutional Recurrent Network (GTCRN) is a speech enhancement model requiring ultralow computational resources, featuring only 23.7 K parameters and 33.0 MMACs per second. Experimental results show that our proposed model not only surpasses RNNoise, a typical lightweight model with a similar computational burden, but also achieves competitive performance compared to recent baseline models with significantly higher computational resource requirements.
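As a quick sanity check of the parameter count, the model can be inspected directly. The sketch below assumes that gtcrn.py in this repository exposes a GTCRN class with a default constructor; adjust the import if the actual name differs.

```python
import torch
from gtcrn import GTCRN  # assumed module/class name in this repository

model = GTCRN().eval()

# Count trainable parameters; the paper reports about 23.7 K.
num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {num_params / 1e3:.1f} K")
```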

Note:

  • Although the complexity reported in the paper is 39.6 MMACs per second, we find that it can be further reduced to 33.0 MMACs per second. This reduction is achieved by modifying only the ERB module: the invariant mapping from linear-frequency bands to ERB bands in the low-frequency region is implemented by simple concatenation instead of matrix multiplication (see the sketch after this list).
  • The explicit feature rearrangement layer in the grouped RNN, implemented via feature shuffle, can make the model unstreamable. We therefore discard it and achieve feature rearrangement implicitly through the subsequent FC layer in the DPGRNN.
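To illustrate the ERB simplification above, here is a minimal sketch. The band split point, matrix contents, and tensor shapes are illustrative assumptions rather than the exact configuration used in the repository: low-frequency bins that would be mapped one-to-one are simply passed through and concatenated, and only the remaining bins are multiplied by a banding matrix.

```python
import torch

B, T = 1, 100                        # batch size and number of frames (illustrative)
F_lin, F_low, F_erb = 257, 65, 64    # linear bins, bins kept as-is, banded high bins (illustrative)

feat = torch.rand(B, T, F_lin)                  # per-frame spectral features
erb_matrix = torch.rand(F_lin - F_low, F_erb)   # banding matrix for the high-frequency bins only

# Instead of multiplying all F_lin bins by a (F_lin, F_low + F_erb) matrix whose
# low-frequency block is the identity, keep the low bins untouched and band the rest.
low = feat[..., :F_low]                    # identity mapping -> plain concatenation
high = feat[..., F_low:] @ erb_matrix      # actual ERB banding
erb_feat = torch.cat([low, high], dim=-1)  # (B, T, F_low + F_erb)
```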

Performance

Experiments show that GTCRN not only outperforms RNNoise by a substantial margin on the VCTK-DEMAND and DNS3 datasets, but also achieves competitive performance compared to several baseline models with significantly higher computational overhead.

Table 1: Performance on VCTK-DEMAND test set

| Model | Para. (M) | MACs (G/s) | SISNR | PESQ | STOI |
|---|---|---|---|---|---|
| Noisy | - | - | 8.45 | 1.97 | 0.921 |
| RNNoise (2018) | 0.06 | 0.04 | - | 2.29 | - |
| PercepNet (2020) | 8.00 | 0.80 | - | 2.73 | - |
| DeepFilterNet (2022) | 1.80 | 0.35 | 16.63 | 2.81 | 0.942 |
| S-DCCRN (2022) | 2.34 | - | - | 2.84 | 0.940 |
| GTCRN (proposed) | 0.02 | 0.04 | 18.83 | 2.87 | 0.940 |

Table 2: Performance on DNS3 blind test set.

| Model | Para. (M) | MACs (G/s) | DNSMOS-P.808 | BAK | SIG | OVRL |
|---|---|---|---|---|---|---|
| Noisy | - | - | 2.96 | 2.65 | 3.20 | 2.33 |
| RNNoise (2018) | 0.06 | 0.04 | 3.15 | 3.45 | 3.00 | 2.53 |
| S-DCCRN (2022) | 2.34 | - | 3.43 | - | - | - |
| GTCRN (proposed) | 0.02 | 0.04 | 3.44 | 3.90 | 3.00 | 2.70 |

Pre-trained Models

Pre-trained models, trained on the DNS3 and VCTK-DEMAND datasets respectively, are provided in the checkpoints folder.

The inference procedure is presented in infer.py.
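For orientation, a minimal offline inference sketch is shown below. The checkpoint path, the state-dict key, the STFT settings (512-point FFT, 256-sample hop, square-root Hann window), and the (B, F, T, 2) input layout are assumptions; infer.py remains the authoritative reference.

```python
import soundfile as sf
import torch
from gtcrn import GTCRN  # assumed class name in this repository

# Load a pre-trained checkpoint (path and state-dict key are placeholders).
model = GTCRN().eval()
state = torch.load("checkpoints/your_checkpoint.tar", map_location="cpu")
model.load_state_dict(state["model"] if isinstance(state, dict) and "model" in state else state)

# Read a 16 kHz mono noisy mixture and compute its STFT.
mix, fs = sf.read("noisy.wav", dtype="float32")
window = torch.hann_window(512).pow(0.5)
spec = torch.stft(torch.from_numpy(mix), n_fft=512, hop_length=256,
                  window=window, return_complex=True)   # (F, T), complex
spec_ri = torch.view_as_real(spec).unsqueeze(0)          # assumed layout (B, F, T, 2)

with torch.no_grad():
    est_ri = model(spec_ri)[0]                            # enhanced real/imag spectrogram

enh = torch.istft(torch.view_as_complex(est_ri.contiguous()),
                  n_fft=512, hop_length=256, window=window)
sf.write("enhanced.wav", enh.numpy(), fs)
```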

Streaming Inference

A streaming GTCRN is provided in the stream folder; it achieves a real-time factor (RTF) of 0.07 on a 12th Gen Intel(R) Core(TM) i5-12400 CPU @ 2.50 GHz.
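As a rough way to reproduce an RTF figure, one can time inference on a known duration of audio and divide the wall-clock time by that duration. The sketch below times the offline model on dummy input as a stand-in; the stream folder measures the streaming model frame by frame, which is the number quoted above. The shapes and STFT settings here are assumptions.

```python
import time
import torch
from gtcrn import GTCRN  # assumed class name in this repository

model = GTCRN().eval()

seconds, sample_rate, n_fft, hop = 10.0, 16000, 512, 256   # assumed STFT settings
n_frames = int(seconds * sample_rate / hop)
dummy = torch.randn(1, n_fft // 2 + 1, n_frames, 2)        # assumed (B, F, T, 2) layout

with torch.no_grad():
    start = time.perf_counter()
    model(dummy)
    elapsed = time.perf_counter() - start

print(f"RTF ~ {elapsed / seconds:.3f}")  # processing time divided by audio duration
```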

Related Repositories

SEtrain: A training code template for DNN-based speech enhancement.

TRT-SE: An example of how to convert a speech enhancement model into a streaming format and deploy it using ONNX or TensorRT.


gtcrn's Issues

Have you found the cause of the onnxsim.simplify export error?

First of all, thank you very much for your work. When I export with onnxsim.simplify, I get the following error:
RuntimeError: /project/third_party/onnx-optimizer/onnxoptimizer/passes/eliminate_shape_gather.h:48: runTransform: Assertion 'indices_val < dims.size()' failed.
After comparing with DeepVQE, I checked the following:
  • replacing chunk with slicing
  • removing the transposed convolutions
  • removing groups
  • removing the dilated convolutions
  • removing LayerNorm
None of these revealed the cause; the error above still occurs.
Have you located the problem?
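For context, a typical export-and-simplify pipeline that can trigger this kind of assertion looks like the sketch below; the dummy input shape and opset version are assumptions, and the reported RuntimeError would be raised by the simplify call.

```python
import onnx
import torch
from onnxsim import simplify
from gtcrn import GTCRN  # assumed class name in this repository

model = GTCRN().eval()
dummy = torch.randn(1, 257, 100, 2)  # assumed (B, F, T, 2) input layout

# Export the PyTorch model to ONNX, then run onnx-simplifier on it.
torch.onnx.export(model, dummy, "gtcrn.onnx", opset_version=12,
                  input_names=["spec"], output_names=["est"])

onnx_model = onnx.load("gtcrn.onnx")
simplified, ok = simplify(onnx_model)   # the assertion reported above fires inside this call
assert ok, "onnx-simplifier could not validate the simplified model"
onnx.save(simplified, "gtcrn_sim.onnx")
```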

License?

Hi, what license is this code released under?

Question: the purpose of nn.Unfold in the SFE module

Hello, congratulations on completing such excellent work and having it accepted at ICASSP 2024. Regarding the Subband Feature Extraction (SFE) module, you use an nn.Unfold operation; could you explain the design idea behind this part and what role it plays? Thank you for your answer!
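For readers with the same question: with a kernel that spans a few neighbouring frequency bins, nn.Unfold stacks each bin together with its neighbours along the channel dimension, so every sub-band carries local spectral context. The kernel size, padding, and tensor shapes below are illustrative assumptions rather than the exact SFE configuration.

```python
import torch
import torch.nn as nn

B, C, T, F = 1, 2, 100, 257   # illustrative real/imag spectrogram: (batch, channels, frames, bins)
x = torch.randn(B, C, T, F)

# Unfold with a (1, 3) kernel along frequency: each position gathers a bin and
# its two frequency neighbours, tripling the channel dimension.
unfold = nn.Unfold(kernel_size=(1, 3), stride=(1, 1), padding=(0, 1))
patches = unfold(x)                          # (B, C*3, T*F)
subband_feat = patches.view(B, C * 3, T, F)  # neighbouring bins stacked as extra channels
print(subband_feat.shape)                    # torch.Size([1, 6, 100, 257])
```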

Loss gets NaN

I use the loss function in loss.py to train my network, but the loss becomes NaN in some epochs. How can I fix this problem?
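NaNs in SI-SNR-style losses usually come from a division or logarithm on near-silent segments. A common mitigation, sketched below, is to add a small epsilon to every denominator and to the log argument; this is a generic guard, not necessarily the exact form used in loss.py.

```python
import torch

def si_snr_loss(est: torch.Tensor, ref: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Negative SI-SNR with epsilon guards against division by zero and log(0)."""
    est = est - est.mean(dim=-1, keepdim=True)
    ref = ref - ref.mean(dim=-1, keepdim=True)
    # Project the estimate onto the reference to split it into target and noise parts.
    proj = (est * ref).sum(-1, keepdim=True) * ref / (ref.pow(2).sum(-1, keepdim=True) + eps)
    noise = est - proj
    ratio = proj.pow(2).sum(-1) / (noise.pow(2).sum(-1) + eps)
    return -10.0 * torch.log10(ratio + eps).mean()
```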

48k_training

Hello! Thank you for your work and for open-sourcing it! If I want to use a 48 kHz sampling rate, do I need to modify the model code, or is it enough to change the input data?
Best wishes for your work!

Question about the computational cost of the baseline models

How were the MACs of RNNoise in Table 1 and Table 2 of the README computed? Counting only the network part, they should not be as large as 0.04 G/s. Using the counting method in your code, I get 5.53 M MACs (about 0.0055 G/s) with a frame length of 512 and a hop of 256, and 8.74 M MACs (about 0.0087 G/s) with a frame length of 320 and a hop of 160.
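The conversion underlying these numbers is simple: MACs per second equals MACs per frame multiplied by frames per second, where frames per second is the sample rate divided by the hop size. A tiny sketch of that arithmetic is below; the per-frame figure is back-calculated from the 5.53 M MACs per second quoted in the question and is only illustrative.

```python
def macs_per_second(macs_per_frame: float, sample_rate: int = 16000, hop: int = 256) -> float:
    # One forward pass per frame; frames per second = sample_rate / hop.
    return macs_per_frame * sample_rate / hop

# Roughly 88.5 K MACs per frame at a 256-sample hop at 16 kHz gives about 0.0055 G/s,
# matching the 5.53 M MACs per second quoted above.
print(macs_per_second(88.5e3) / 1e9)
```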

Question: unpleasant perceptual quality in low-SNR scenarios

Hi Xiaobin,
Thank you very much for your open-source spirit; I am currently using your open-source training framework SEtrain.
I find that the small-parameter networks I train with it do not suppress noise cleanly in low-SNR scenarios, and the enhanced speech sounds unpleasant. Changing the network structure, changing the mask, and using MSE or SI-SNR alone bring no obvious improvement, and the problem becomes more pronounced as the parameter count decreases.
The model you released also leaves residual noise and sounds unpleasant on low-SNR audio. Could you offer some suggestions to help solve this problem?
Looking forward to your reply, thank you!
