TSTNN
This is an official PyTorch implementation of the paper "TSTNN: Two-Stage Transformer based Neural Network for Speech Enhancement in Time Domain", which has been accepted by ICASSP 2021. More details will be shown soon!
file_name = 'psquare_17.5'
test_file_list_path = '/media/concordia/DATA/KaiWang/pytorch_learn/pytorch_for_speech/voice_bank/Transformer/v5/test_file_break' + '/' + file_name
audio_file_save = '/media/concordia/DATA/KaiWang/pytorch_learn/pytorch_for_speech/voice_bank/Transformer/v5/audio_file' + '/' + 'enhanced_' + file_name
I don't quite understand what file_name and test_file_list_path mean here.
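As I read the snippet above, file_name selects one serialized test-file list, and the enhanced audio is saved under a matching name next to it. A minimal sketch of that path construction (the '/path/to/...' directories are placeholders of mine, not the repo's actual layout):

```python
import os

# Hypothetical reading of the snippet: file_name picks a test-file list,
# and the enhanced output path reuses the same name with a prefix.
file_name = 'psquare_17.5'
test_file_list_path = os.path.join('/path/to/test_file_break', file_name)
audio_file_save = os.path.join('/path/to/audio_file', 'enhanced_' + file_name)

print(test_file_list_path)  # /path/to/test_file_break/psquare_17.5
print(audio_file_save)      # /path/to/audio_file/enhanced_psquare_17.5
```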
Hello,
Kudos for your work. We would like to have a pre-trained model which can reproduce the results quoted in the paper. Thanks in advance.
Question 1:
In following codes:
train_clean_path = '/media/concordia/DATA/KaiWang/pytorch_learn/pytorch_for_speech/dataset/voice_bank/trainset/clean_trainset'
train_noisy_path = '/media/concordia/DATA/KaiWang/pytorch_learn/pytorch_for_speech/dataset/voice_bank/trainset/noisy_trainset'
train_mix_path = '/media/concordia/DATA/KaiWang/pytorch_learn/pytorch_for_speech/dataset/voice_bank_mix/trainset'
I know that train_clean_path's input is the clean trainset, but I don't know what the input of train_noisy_path is. Is it only the noise, or the mixture of noise and the clean trainset? I would also like to know what the input of train_mix_path is, if possible.
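For context on the question: in VoiceBank-DEMAND style setups, the "noisy" set is usually the mixture of clean speech and noise at some SNR, not the noise alone. A minimal sketch of how such a pair can be built (this is an illustration of mine, not the authors' preprocessing code):

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Scale noise so the clean/noise power ratio equals snr_db, then add."""
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise

rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)   # 1 s of dummy "speech" at 16 kHz
noise = rng.standard_normal(16000)   # dummy noise of the same length
noisy = mix_at_snr(clean, noise, snr_db=5.0)
```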
Question 2:
Could you please share the trained weights with me?
Looking forward to your reply. Thank you!
Hello, I'd like to ask: if I use a dataset other than the one you used, how should the log files and paths be set up?
May I ask how the real-time performance of this model is?
Hello, I want to ask what the input of test.py is. A .wav file?
Hello, may I ask approximately how large your model is? Thanks.
(Referring to lines 62 to 64 at commit 3a1dac4.)
Hi, I followed the code and tried to generate the training data with "gen_pair.py",
but a file is missing, so I cannot go further. Is there something I missed in the preprocessing?
Thanks !
Can you share the download link of your dataset?
In the TSTNN paper, the two-stage transformer block passes the input through a local transformer followed by a global transformer. This means the sequence length for the local transformer must be the frame length, and that for the global transformer must be the number of frames. This makes sense in Dual_Transformer, where dim1 is input.shape[3], but I'm confused by the DPTNet implementation, where their dim1 is input.shape[2]. In their implementation, aren't they passing through the global transformer first, before the local transformer? I'm just confused about how to implement a local transformer and a global transformer separately.
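One way to see the distinction in the question above is that "local" versus "global" is just which axis gets folded into the batch dimension before attention. A minimal numpy sketch of my reading of the dual-path layout (shapes and names are my assumptions, not the repo's code):

```python
import numpy as np

# Input layout assumed: [batch, channels, num_frames, frame_len].
# The local (intra-frame) transformer should attend over frame_len,
# the global (inter-frame) transformer over num_frames.
x = np.zeros((2, 64, 100, 32))  # B, C, K (num_frames), S (frame_len)
B, C, K, S = x.shape

# Local stage: fold num_frames into batch -> sequence axis is frame_len.
local_in = x.transpose(0, 2, 3, 1).reshape(B * K, S, C)

# Global stage: fold frame_len into batch -> sequence axis is num_frames.
global_in = x.transpose(0, 3, 2, 1).reshape(B * S, K, C)

print(local_in.shape)   # (200, 32, 64)
print(global_in.shape)  # (64, 100, 64)
```

Whichever reshape an implementation applies first determines whether its first transformer acts locally or globally, so comparing dim1 against input.shape[2] vs. input.shape[3] is really a question of which axis is folded first.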
I am trying to reproduce the TSTNN network. The dataset is 50 hours, and each epoch iterates 12,000 times. Training is very slow: one epoch takes about 8 hours. I wonder if it was the same when you trained?
Hi, I have already read your paper, and it is definitely an innovative one. I've also been working on the Demucs model recently, and you compared TSTNN with Demucs in your experiments. The Facebook AI lab described that the causal Demucs, with a lookahead of 37 ms, can run in real time on a specified CPU. So I have a question about TSTNN: have you computed or estimated the time delay of the model, and does it satisfy real-time operation?
Hoping to receive your reply so we can communicate with each other!
Dear Mr. Wang,
I tried to replicate "TSTNN: Two-Stage Transformer Based Neural Network for Speech Enhancement in the Time Domain" [Kai Wang, Bengbeng He, Wei-Ping Zhu], but there are some problems.
I used gen_pair.py to generate the pairs from the dataset with 28 speakers, and I didn't change any other code in this repository. However, after 150 epochs on an RTX 3090 with batch_size 2, the best model evaluated with test.py showed worse performance than the results in the paper. The results are as follows.
Result 1 // loss = loss_time + 0.6 * loss_freq

| | Replicated | Original (in paper) |
|---|---|---|
| STOI | 0.9450 | 0.95 |
| SSNR | 9.2971 | 9.70 |
| PESQ | 2.8254 | 2.96 |
| CSIG | 4.2356 | 4.33 |
| CBAK | 3.4413 | 3.53 |
| COVL | 3.5462 | 3.67 |
Result 2 // loss = 0.8 * loss_time + 0.2 * loss_freq

| | Replicated | Original (in paper) |
|---|---|---|
| STOI | 0.9478 | 0.95 |
| SSNR | 9.4876 | 9.70 |
| PESQ | 2.9128 | 2.96 |
| CSIG | 4.2624 | 4.33 |
| CBAK | 3.4989 | 3.53 |
| COVL | 3.6072 | 3.67 |
Do you have any advice/reference on how to approach the best performance?
Also, I'm confused about the weighting of loss_time. Should it be 'loss = 0.4 * loss_time + 0.6 * loss_freq' or 'loss = 0.8 * loss_time + 0.2 * loss_freq'?
Best regards
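The two weightings being asked about can be sketched as a single combined loss. This is an illustration of mine (using an L1 waveform loss and an L1 loss on FFT magnitudes as stand-ins), not a statement of which weighting the paper actually used:

```python
import numpy as np

def combined_loss(est, ref, w_time, w_freq, n_fft=512):
    """Weighted sum of a time-domain L1 loss and a magnitude-spectrum L1 loss."""
    loss_time = np.mean(np.abs(est - ref))
    est_mag = np.abs(np.fft.rfft(est, n=n_fft))
    ref_mag = np.abs(np.fft.rfft(ref, n=n_fft))
    loss_freq = np.mean(np.abs(est_mag - ref_mag))
    return w_time * loss_time + w_freq * loss_freq

rng = np.random.default_rng(0)
est, ref = rng.standard_normal(512), rng.standard_normal(512)
variant_a = combined_loss(est, ref, w_time=0.4, w_freq=0.6)  # first weighting asked about
variant_b = combined_loss(est, ref, w_time=0.8, w_freq=0.2)  # second weighting asked about
```

Either variant goes to zero on a perfect estimate; the weights only change how gradients are split between the waveform and spectral terms.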
I'd like to ask how you run the code; could you tell me the steps?
Hello, these files are not well organized, which makes them hard to follow. I want to reproduce your experimental results. Could you describe the purpose of each file in the README? Thanks!