echocatzh / mtfaa-net



Multi-Scale Temporal Frequency Convolutional Network With Axial Attention for Speech Enhancement

License: MIT License

Python 100.00%

acoustic-echo-cancellation dns-challenge speech-enhancement

mtfaa-net's Introduction

Last update time: 07-20-2022

Hi, I'm Shimin Zhang (张是民)

📕 Research interests: Speech Enhancement (including acoustic echo cancellation, noise suppression, target speaker extraction)
📫 How to reach me: [email protected]


mtfaa-net's People

Contributors

Stargazers

Watchers

Forkers

wendongj pzhang266 jinmingche hbwu-ntu ioyy900205 lupengliu spiralanch miblue119 dangf15 ishine nanless fragrantrookie ai-x-king daiyuuu aaronhsueh0506 noise-suppression jzi040941 zhongshijun speechwatch chengwei-ouyang normonisping johnusher gedebabin yegeli sherryyu33 runngezhang caochenbin wang-asher spxnn p-entol wdwlinda xuanphu108 koobh kodavatimahendra ahyswang nullnan2023 max-3l newoneincntk huqingli ouleiwa mxe191 hniceday zzzzzzxm potato-boys runngezhang-jx poelsen icassp-papers robotseye zheqiushui panhu

mtfaa-net's Issues

question about the network

Thanks for your code. One point still confuses me: the input to the U-Net structure is the magnitude after the phase encoder, but the U-Net outputs a two-stage mask, where one stage is a magnitude mask and the other is a phase and magnitude mask. Since no phase information seems to be fed into the U-Net, how can it estimate a correct phase mask? Or does the phase encoder output, although it is a magnitude, still include phase information?
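A toy illustration of why a magnitude computed after a complex-valued layer can still depend on phase. The two-tap weighting below is a hypothetical stand-in for a learned complex convolution, not the repository's phase encoder: two inputs with identical per-bin magnitudes but different relative phases produce different output magnitudes.

```python
import numpy as np

def complex_layer_magnitude(x, w):
    """Toy complex layer over two bins followed by a magnitude: |w0*x0 + w1*x1|.

    Hypothetical stand-in for a complex convolution plus magnitude operation.
    """
    return np.abs(np.sum(w * x))

w = np.array([1.0 + 0.0j, 1.0 + 0.0j])                      # fixed toy weights
x_in_phase = np.ones(2) * np.exp(1j * np.array([0.0, 0.0]))
x_anti_phase = np.ones(2) * np.exp(1j * np.array([0.0, np.pi]))

# Identical magnitudes [1, 1]: a magnitude-only front end cannot tell them apart.
print(np.abs(x_in_phase), np.abs(x_anti_phase))
# After the complex layer the magnitudes differ: 2.0 versus approximately 0.0.
print(complex_layer_magnitude(x_in_phase, w), complex_layer_magnitude(x_anti_phase, w))
```

Because the weights mix bins (or channels) before the magnitude is taken, interference between inputs encodes their relative phase in the resulting magnitude.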

Some questions about the code

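Hello! First of all, thank you very much for contributing this project's code; it has helped me a lot in learning AEC! Since I am new to this field, there are some parts of your code I don't quite understand, and I would like to ask you about them. If you can find time in your busy schedule to reply, I will be extremely grateful!
My first question: "sigs" in your code has the comment "sigs: list [B N] of len(sigs)". What do B and N refer to here? Do they mean the three bands 0-8 kHz, 8-16 kHz and 16-48 kHz? Or is N perhaps the audio length? In an earlier issue someone mentioned 3 channels; does that mean audio channels?
My second question: the model code has the comment "# D / E ?". Does it refer to choosing between deep noise suppression (D) and echo cancellation (E)?
Looking forward to your kind reply!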
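A sketch of one plausible reading of "list [B N] of len(sigs)", assuming (as is common in PyTorch audio code, but not confirmed by the repository) that B is the batch size, N the number of time-domain samples, and that "sigs" holds several aligned signals, for example a microphone signal and a far-end reference for AEC. NumPy arrays stand in for tensors, and all sizes are hypothetical.

```python
import numpy as np

B, fs, seconds = 4, 48000, 2          # hypothetical batch size, sample rate, clip length
N = fs * seconds                      # samples per signal

rng = np.random.default_rng(0)
mic = rng.standard_normal((B, N)).astype(np.float32)   # near-end microphone, [B, N]
ref = rng.standard_normal((B, N)).astype(np.float32)   # far-end reference, [B, N]

# A Python list whose entries each have shape [B, N]; its length is len(sigs).
sigs = [mic, ref]
for i, s in enumerate(sigs):
    print(i, s.shape)   # (4, 96000) for every entry
print(len(sigs))        # 2
```

Under this reading, neither B nor N is a frequency band; any band-wise processing would act on the STFT computed inside the model.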

About the lookahead

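In the paper, MTFAA-Net-Streaming has a lookahead of 40 ms, but I could not find where this is implemented in the code. The frame shift in the paper is 8 ms, so it should use 5 future frames. Where is this reflected?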
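One common way to give a frame-based model a fixed lookahead is asymmetric padding along the time axis. The sketch below is hypothetical (the function name and kernel size are illustrative, not taken from the repository) and shows how 40 ms / 8 ms = 5 future frames could be realized in a single layer. In a deep network the per-layer lookahead adds up, so the 5 frames might instead be spread over several layers.

```python
import numpy as np

hop_ms, lookahead_ms = 8, 40
lookahead_frames = lookahead_ms // hop_ms   # 40 ms / 8 ms = 5 frames

def conv_time_with_lookahead(x, kernel, lookahead):
    """1-D convolution over frames that sees `lookahead` future frames.

    Pads (K - 1 - lookahead) frames on the past side and `lookahead` frames on
    the future side, so output frame t uses input frames
    t - (K - 1 - lookahead) ... t + lookahead, and the output length equals len(x).
    """
    k = len(kernel)
    assert 0 <= lookahead <= k - 1
    padded = np.concatenate([np.zeros(k - 1 - lookahead), x, np.zeros(lookahead)])
    # np.correlate in "valid" mode slides the kernel without flipping it.
    return np.correlate(padded, kernel, mode="valid")

x = np.zeros(20)
x[10] = 1.0                                 # impulse at frame 10
y = conv_time_with_lookahead(x, np.ones(7), lookahead_frames)
print(np.nonzero(y)[0])                     # [ 5  6  7  8  9 10 11]: frame 5 already reacts to frame 10
```

Setting the lookahead to 0 turns the same layer into a strictly causal one.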

Training with custom data

Hi,

How can I train a model with my own dataset? Where can I find sample usage for training the model?

Thank you
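One common way to prepare custom data for a noise-suppression model is to create (noisy, clean) training pairs by mixing clean speech with noise at a chosen SNR. This is a generic sketch, not a recipe from the repository; file loading, the SNR range, and echo simulation for AEC are left out and would be your own choices. The noise clip is assumed to be at least as long as the clean clip.

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db, eps=1e-12):
    """Scale `noise` so that the clean-to-noise power ratio equals `snr_db`, then add it.

    Returns (noisy, clean): the model input and the training target.
    """
    noise = noise[: len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2) + eps
    gain = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + gain * noise, clean

rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)   # stand-in for a speech clip
noise = rng.standard_normal(20000)                           # stand-in for a noise clip
noisy, target = mix_at_snr(clean, noise, snr_db=5.0)
```

Drawing `snr_db` at random per example (for instance from a range such as -5 to 20 dB) is a common way to cover different noise levels.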

about ERB band filter bank

Hi! I wonder if we can reduce nerb, which is currently set to 256. I think using fewer bands could reduce IOPS.

Based on the Baidu paper, I guess they are using the ERB bands applied in [10].

As I noticed, the PercepNet paper (A Perceptually-Motivated Approach for Low-Complexity, Real-Time Enhancement of Fullband Speech) uses 34 bands with a highest frequency of 20 kHz.
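In an ERB filterbank the number of bands is just a parameter. The sketch below builds triangular filters spaced uniformly on the Glasberg and Moore ERB-rate scale, ERB-rate(f) = 21.4 log10(1 + 0.00437 f). It is an illustrative construction, not the one used in the repository's erb.py, and the 48 kHz / 960-point FFT / 20 kHz settings are assumptions chosen to mirror the PercepNet-style configuration mentioned above.

```python
import numpy as np

def hz_to_erb_rate(f):
    return 21.4 * np.log10(1.0 + 0.00437 * f)

def erb_rate_to_hz(e):
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437

def erb_triangular_fbank(nerb, nfft, fs, fmin=0.0, fmax=None):
    """Return a [nerb, nfft // 2 + 1] matrix of triangular ERB-spaced filters."""
    fmax = fs / 2 if fmax is None else fmax
    bin_freqs = np.linspace(0.0, fs / 2, nfft // 2 + 1)
    # nerb + 2 points equally spaced in ERB rate: lower edge, centers, upper edge.
    edges = erb_rate_to_hz(np.linspace(hz_to_erb_rate(fmin), hz_to_erb_rate(fmax), nerb + 2))
    fb = np.zeros((nerb, len(bin_freqs)))
    for i in range(nerb):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (bin_freqs - lo) / (mid - lo)
        falling = (hi - bin_freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

fb = erb_triangular_fbank(34, 960, 48000, fmax=20000.0)
frame = np.random.default_rng(0).standard_normal(960)
band_energy = fb @ np.abs(np.fft.rfft(frame, n=960)) ** 2   # shape (34,)
```

The band projection costs nerb x (nfft // 2 + 1) multiply-adds per frame, so fewer bands shrink this step and every layer that operates on the band axis, at the price of coarser frequency resolution.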

Did you normalize signals when you calculate loss?

Hi, thanks for your great work.
I find that the loss hardly decreases when I train your MTFAA. I don't normalize the signals when I calculate the loss.
Maybe I should normalize the signals as in "Data augmentation and loss normalization for deep noise suppression". I would like to know how you calculate the loss.
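A minimal sketch of per-utterance level normalization for a loss: both estimate and target are divided by the target's RMS before the MSE, so the loss no longer depends on the overall gain of the utterance. This is a generic form and may differ from the exact formulation in the cited paper; the function name is hypothetical.

```python
import numpy as np

def level_normalized_mse(estimate, target, eps=1e-8):
    """MSE after dividing both signals by the target's RMS level.

    Loud and quiet utterances then contribute on a comparable scale.
    """
    scale = np.sqrt(np.mean(target ** 2)) + eps
    return np.mean(((estimate - target) / scale) ** 2)

rng = np.random.default_rng(0)
t = rng.standard_normal(16000)
e = t + 0.1 * rng.standard_normal(16000)
l_quiet = level_normalized_mse(e, t)
l_loud = level_normalized_mse(100.0 * e, 100.0 * t)   # same utterance, 40 dB louder
```

Without normalization the loud copy would yield a loss 10,000 times larger, which can let a few loud utterances dominate the gradients.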

real performance

Did you test the model's performance on real AEC data? And what are its FLOPs and parameter count?
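Parameter and compute figures are usually obtained by summing per-layer counts (in PyTorch, the total parameter count is commonly computed as sum(p.numel() for p in model.parameters())). The hypothetical helper below shows the standard arithmetic for one 2-D convolution; the layer sizes in the example are illustrative, not MTFAA's.

```python
def conv2d_cost(c_in, c_out, k_h, k_w, h_out, w_out, bias=True, groups=1):
    """Parameter count and multiply-accumulates (MACs) of one 2-D convolution."""
    params = (c_in // groups) * c_out * k_h * k_w + (c_out if bias else 0)
    macs = (c_in // groups) * k_h * k_w * c_out * h_out * w_out
    return params, macs

params, macs = conv2d_cost(c_in=16, c_out=32, k_h=3, k_w=3, h_out=100, w_out=64)
print(params, macs)   # 4640 29491200
```

FLOPs are often quoted as roughly 2 x MACs. For a streaming model, the per-second cost is the per-frame cost times the frame rate, which is 125 frames per second for an 8 ms hop.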

CUDA out of memory when using the network to train

Hello,

First of all, thank you for providing the implementation. It was very helpful for understanding the paper.

I have one question, though. When I try to train the network on 30-second 48 kHz audio, I always run into a CUDA out-of-memory error, even with the batch size set to 1. Have you seen this in your experiments, or do you have any advice?

Anything will be greatly appreciated!
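A common workaround is to train on short random segments instead of whole clips. With an 8 ms hop, a 30 s clip has about 3750 frames versus 500 for a 4 s segment; activation memory grows at least linearly with the number of frames, and attention over the time axis grows quadratically. The sketch below is a hypothetical helper (the 4 s length is an assumption) that crops the same window from aligned signals.

```python
import numpy as np

def random_segment(signals, seg_len, rng):
    """Crop the same random window from aligned signals (e.g. mic, ref, target).

    Signals shorter than seg_len are zero-padded at the end.
    """
    n = min(len(s) for s in signals)
    if n <= seg_len:
        return [np.pad(s[:n], (0, seg_len - n)) for s in signals]
    start = rng.integers(0, n - seg_len + 1)
    return [s[start:start + seg_len] for s in signals]

fs = 48000
seg_len = 4 * fs                                  # hypothetical 4 s training segments
rng = np.random.default_rng(0)
clip = [np.zeros(30 * fs), np.zeros(30 * fs)]     # stand-ins for a 30 s mic/target pair
out = random_segment(clip, seg_len, rng)
print([c.shape for c in out])                     # [(192000,), (192000,)]
```

Full-length clips can still be used for validation, where no gradients are stored.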

erb.py reports an error: expected np.ndarray (got tuple)

The report is as follows:

  File "E:/code_paper/MTFAA-Net-main/erb.py", line 24, in __init__
    filter = th.from_numpy(filter).float()
TypeError: expected np.ndarray (got tuple)

The error occurs on line 24 of erb.py:
filter = th.from_numpy(filter).float()
Here "filter" is a tuple with two members.

My Python version is 3.9.12.
My spafe version is 0.2.0.
My torch version is 1.12.1.

I hope you can help me.
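Starting with spafe 0.2.0, the filter-bank functions appear to return a tuple of (filter matrix, center frequencies) instead of a bare array, which would match the two-member tuple reported here. A hedged fix is to unpack the first element before calling th.from_numpy; the helper name below is hypothetical, and pinning spafe to a release before 0.2.0 is an alternative.

```python
import numpy as np

def to_filter_matrix(fbank_result):
    """Return the filter-bank matrix whether spafe gave an array or a tuple.

    Assumes that a returned tuple holds the [nfilts, nfft // 2 + 1] filter
    matrix first and the center frequencies second.
    """
    if isinstance(fbank_result, tuple):
        fbank_result = fbank_result[0]
    return np.ascontiguousarray(fbank_result, dtype=np.float32)

m = to_filter_matrix((np.ones((4, 257)), np.arange(4)))   # tuple-style result
```

Line 24 of erb.py could then read filter = th.from_numpy(to_filter_matrix(filter)).float(), which works with both older and newer spafe return types.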

On connecting the LAEC to the model

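Hello, and thank you for open-sourcing such an excellent project. The paper says that "introducing additional conditional information from the LAEC can further improve the model's performance on the echo task; however, if the LAEC and the model are simply connected together, the distortion introduced by the LAEC degrades the system's performance." I have a few questions:
1. The model has three inputs, right? The mixture audio, the LAEC output, and the far-end signal.
2. What is the additional conditioning that the LAEC output goes through? I don't quite understand this part. Does "directly connecting the LAEC to the model" mean feeding the LAEC output straight into the model together with the mixture audio and the far-end signal? I would appreciate your guidance.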
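For context, a linear AEC (LAEC) is typically an adaptive filter that estimates the echo from the far-end signal and subtracts it from the microphone signal. The sketch below uses NLMS, a standard choice that may differ from the LAEC used in the paper, and then stacks the three signals from question 1 into one input array. The function name, echo path, and filter length are all hypothetical.

```python
import numpy as np

def nlms_aec(far_end, mic, taps=64, mu=0.5, eps=1e-6):
    """Time-domain NLMS echo canceller; returns (error signal, filter weights)."""
    w = np.zeros(taps)
    x_buf = np.zeros(taps)                  # most recent far-end samples, newest first
    err = np.zeros_like(mic)
    for n in range(len(mic)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = far_end[n]
        e = mic[n] - w @ x_buf              # mic minus estimated echo
        w += mu * e * x_buf / (x_buf @ x_buf + eps)
        err[n] = e
    return err, w

rng = np.random.default_rng(0)
far = rng.standard_normal(8000)                   # far-end reference
h = 0.5 * rng.standard_normal(16)                 # hypothetical echo path
mic = np.convolve(far, h)[: len(far)]             # echo-only microphone signal
laec_out, _ = nlms_aec(far, mic, taps=32)
inputs = np.stack([mic, laec_out, far])           # [3, N]: mixture, LAEC output, far end
```

Feeding the LAEC output as an extra channel, rather than replacing the microphone signal with it, leaves the network free to fall back on the unprocessed mixture where the linear filter introduces distortion.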

License

Hi @echocatzh

I think many people would like to use your awesome work for both commercial and non-commercial purposes.
It would be great for them, and of course for you as well, if you could add an explicit license.
Would you be able to add a license file?