Last update time: 07-20-2022
Hi, I'm Shimin Zhang (张是民)
- 📕 Research interests: Speech Enhancement (including acoustic echo cancellation, noise suppression, target speaker extraction)
- 📫 How to reach me: [email protected]
Multi-Scale Temporal Frequency Convolutional Network With Axial Attention for Speech Enhancement
License: MIT License
Thanks for your code. One thing still confuses me. The input to the U-Net structure is the magnitude produced by the phase encoder, but the U-Net output is a two-stage mask: the first stage is a magnitude mask, and the second is a phase mask together with a magnitude mask. Since no phase information is fed into the U-Net, how can it estimate a correct phase mask? Or does the output of the phase encoder, although it is a magnitude, still carry phase information?
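Hello! First of all, thank you very much for contributing this project code; it has been a great help in my study of AEC. I am new to this field, so there are a few places in your code that I do not quite understand and would like to ask you about. I would be very grateful if you could find time to reply.
My first question: sig in your code has the comment "sigs: list [B N] of len(sigs)". What do B and N refer to here? Do they refer to the three bands 0-8 kHz, 8-16 kHz, and 16-48 kHz? Or is N perhaps the length of the audio? Someone in an earlier issue mentioned 3 channels; does that mean audio channels?
My second question: the model code has the comment "# D / E ?". Does it ask whether the task is deep noise suppression or echo cancellation?
I look forward to your reply!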
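The paper says MTFAA-Net-Streaming has a lookahead of 40 ms, but I could not find where this appears in the code. The frame shift in the paper is 8 ms, so it should use 5 future frames. Where is that reflected?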
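The arithmetic behind this question can be written out as a minimal sketch. The helper names are hypothetical and the 48 kHz sample rate is an assumption for illustration; nothing here is taken from the repository code.

```python
import math


def lookahead_frames(lookahead_ms: float, hop_ms: float) -> int:
    """Number of future frames needed to cover the lookahead, rounded up."""
    return math.ceil(lookahead_ms / hop_ms)


def lookahead_samples(lookahead_ms: float, sample_rate_hz: int) -> int:
    """Lookahead expressed in samples at the given sample rate."""
    return round(lookahead_ms * sample_rate_hz / 1000)


# Values quoted in the question: 40 ms lookahead, 8 ms frame shift.
frames = lookahead_frames(40, 8)
# Assumed sample rate of 48 kHz (hypothetical, for illustration only).
samples = lookahead_samples(40, 48000)
print(frames, samples)  # 5 1920
```

In a streaming model, a lookahead of this size would normally show up as the number of future frames a layer is allowed to see, so that is the kind of parameter to search for.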
Hi,
How can I train a model with my own dataset? Where can I find sample usage for training the model?
Thank you
Hi! I wonder if we can reduce nerb, which is currently set to 256. I think fewer bands would reduce IOPS.
Based on the Baidu paper, I guess they are using the ERB bands applied in [10].
As I noticed in the PercepNet paper (A Perceptually-Motivated Approach for Low-Complexity, Real-Time Enhancement of Fullband Speech), they use 34 bands with a highest frequency of 20 kHz.
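To get a feel for what a smaller band count means, here is a minimal sketch that spaces band edges uniformly on the ERB-rate scale, using the Glasberg and Moore approximation ERBS(f) = 21.4 log10(1 + 0.00437 f). This is an illustration only: it is neither the filterbank built in the repository's erb.py nor the exact band layout of PercepNet, and the function names are hypothetical.

```python
import math


def hz_to_erb_rate(f_hz: float) -> float:
    """Glasberg & Moore ERB-rate scale (number of ERBs below f_hz)."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)


def erb_rate_to_hz(erb_rate: float) -> float:
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (erb_rate / 21.4) - 1.0) / 0.00437


def erb_band_edges(n_bands: int, f_max_hz: float, f_min_hz: float = 0.0):
    """Return n_bands + 1 edge frequencies, uniformly spaced in ERB-rate."""
    lo = hz_to_erb_rate(f_min_hz)
    hi = hz_to_erb_rate(f_max_hz)
    step = (hi - lo) / n_bands
    return [erb_rate_to_hz(lo + i * step) for i in range(n_bands + 1)]


# Compare a coarse layout (34 bands up to 20 kHz) with a fine one (256 bands).
coarse = erb_band_edges(34, 20000.0)
fine = erb_band_edges(256, 20000.0)
print(len(coarse) - 1, len(fine) - 1)  # 34 256
```

Fewer bands make each band wider, most noticeably at high frequencies, so the cost saving trades against spectral resolution.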
Hi, thanks for your great work.
I find that the loss hardly decreases when I train your MTFAA. I don't normalize the signals when I calculate the loss.
Maybe I should normalize the signals as in "Data augmentation and loss normalization for deep noise suppression". I would like to know how you calculate the loss.
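One simple form of level normalization is to scale both the estimate and the target by the RMS of the target before computing the error, so that loud and quiet utterances contribute comparably. The sketch below shows only that idea on plain Python lists; it is an assumption for illustration, not necessarily the scheme used in the cited paper or by the repository author, and the function names are hypothetical.

```python
import math


def rms(x):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def level_normalized_mse(estimate, target, eps=1e-8):
    """MSE after scaling both signals by 1 / RMS(target)."""
    scale = 1.0 / (rms(target) + eps)
    total = sum(((e - t) * scale) ** 2 for e, t in zip(estimate, target))
    return total / len(target)


target = [0.1, -0.2, 0.3, -0.4]
estimate = [0.12, -0.18, 0.33, -0.35]

# The same pair at ten times the level gives (almost) the same loss.
loud_target = [10 * v for v in target]
loud_estimate = [10 * v for v in estimate]
print(level_normalized_mse(estimate, target))
print(level_normalized_mse(loud_estimate, loud_target))
```

Without the scaling, the second pair would produce a loss 100 times larger than the first, even though the relative error is identical.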
Did you test the model's performance on real AEC data? And what are its FLOPs and parameter count?
Hello,
First of all, thank you for providing the implementation. It was very helpful for understanding the paper.
I have one question, though. When I try to train the network on 30-second 48 kHz audio, I always run into a CUDA out of memory error, even with the batch size set to 1. Have you seen that in your experiments, or do you have any advice?
Anything will be greatly appreciated!
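A common way to limit memory use with long recordings is to train on short random segments instead of the full 30 seconds, since activation memory grows with sequence length. The sketch below crops the same window from several time-aligned signals (for example microphone, far-end reference, and target). The function name and the 3-second segment length are hypothetical choices, not settings from the repository, and a crop alone does not guarantee that the model fits in memory.

```python
import random


def random_crop(signals, segment_len, rng=random):
    """Cut the same random window out of several time-aligned signals.

    If the signals are shorter than segment_len, they are returned whole
    (truncated to the shortest one).
    """
    n = min(len(s) for s in signals)
    seg = min(segment_len, n)
    start = rng.randrange(n - seg + 1)
    return [s[start:start + seg] for s in signals]


sample_rate = 48000                 # 48 kHz, as in the question
full_len = 30 * sample_rate         # 30 s recording: 1,440,000 samples
segment_len = 3 * sample_rate       # hypothetical 3 s training segment

mic = [0.0] * full_len
ref = [0.0] * full_len
mic_seg, ref_seg = random_crop([mic, ref], segment_len, random.Random(0))
print(len(mic_seg), len(ref_seg))  # 144000 144000
```

Drawing a fresh window each epoch lets the model still see the whole recording over time.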
The report is as follows:
File "E:/code_paper/MTFAA-Net-main/erb.py", line 24, in __init__
filter = th.from_numpy(filter).float()
TypeError: expected np.ndarray (got tuple)
The error occurred on line 24 of erb.py.
filter = th.from_numpy(filter).float()
"filter" is a tuple with two members.
My Python version is 3.9.12.
My spafe version is 0.2.0.
My torch version is 1.12.1.
I hope you can help me with this.
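The traceback says th.from_numpy received a tuple where it expects a NumPy array. A possible workaround, sketched below, is to unwrap the tuple before the conversion. It assumes the first tuple member is the filterbank matrix, which is an assumption about the installed spafe version and should be confirmed by printing the type and shape of each member. The helper name is hypothetical.

```python
def unwrap_filterbank(result):
    """Return the filterbank whether `result` is the array or a tuple.

    Assumption: when a tuple is returned, its first member is the
    filterbank matrix and the remaining members are auxiliary data.
    """
    if isinstance(result, tuple):
        return result[0]
    return result


# Stand-in values (plain lists) that mimic the two possible return shapes.
old_style = [[1.0, 0.0], [0.0, 1.0]]
new_style = ([[1.0, 0.0], [0.0, 1.0]], [100.0, 200.0])

print(unwrap_filterbank(old_style) == unwrap_filterbank(new_style))  # True
```

In erb.py the unwrapped value would then be the one passed to th.from_numpy(...).float() on line 24.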
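Hello, and thank you for open-sourcing such an excellent project. The paper says that "introducing additional conditional information in the LAEC can further improve the model's performance on the echo task. However, if the LAEC is simply connected to the model, the distortion introduced by the LAEC degrades the system's performance". I have a few questions:
1. The model takes three inputs, right: the mixed audio, the LAEC data, and the far-end data?
2. What does the additional condition that the LAEC output has to pass through refer to? I do not quite understand this. Does connecting the LAEC directly to the model mean feeding the LAEC output straight into the model together with the mixed audio and the far-end data? Any guidance would be appreciated.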
Hi @echocatzh
I think many people would like to use your awesome work for both commercial and non-commercial purposes.
It would be great for the people who use it, and of course for you as well, if you could add an explicit license.
Would you be able to add a license file?