
tstnn's Introduction

TSTNN

This is the official PyTorch implementation of the paper "TSTNN: Two-Stage Transformer based Neural Network for Speech Enhancement in Time Domain", which has been accepted at ICASSP 2021. More details will be shared soon!

tstnn's People

Contributors

key2miao


tstnn's Issues

What is the input of test.py?

file_name = 'psquare_17.5'
test_file_list_path = '/media/concordia/DATA/KaiWang/pytorch_learn/pytorch_for_speech/voice_bank/Transformer/v5/test_file_break' + '/' + file_name
audio_file_save = '/media/concordia/DATA/KaiWang/pytorch_learn/pytorch_for_speech/voice_bank/Transformer/v5/audio_file' + '/' + 'enhanced_' + file_name
I don't understand what file_name and test_file_list_path mean here.
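For what it's worth, the naming suggests test_file_list_path points at a directory of pre-chunked test utterances for one noise condition ('psquare_17.5' would be the psquare noise at 17.5 dB SNR), which test.py then enumerates. A minimal sketch of that assumed layout (the directory structure and file names below are hypothetical, not from the repo):

```python
import os
import tempfile

# Hypothetical layout: one directory per noise condition, containing the
# chunked test utterances that test.py enumerates and enhances one by one.
root = tempfile.mkdtemp()
file_name = 'psquare_17.5'                      # noise type + SNR in dB
test_dir = os.path.join(root, 'test_file_break', file_name)
save_dir = os.path.join(root, 'audio_file', 'enhanced_' + file_name)
os.makedirs(test_dir)
os.makedirs(save_dir)

# Fake a few pre-chunked test files so the enumeration below has input.
for name in ('p232_001_0', 'p232_001_1', 'p232_002_0'):
    open(os.path.join(test_dir, name), 'w').close()

test_files = sorted(os.listdir(test_dir))       # what test.py would loop over
```

Under this assumption, changing file_name just selects which noise/SNR condition of the testset is enhanced.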

Pre-trained model

Hello,
Kudos for your work. We would like to have a pre-trained model which can reproduce the results quoted in the paper. Thanks in advance.

What are the inputs of gen_pair.py?

Question 1:
In following codes:
train_clean_path = '/media/concordia/DATA/KaiWang/pytorch_learn/pytorch_for_speech/dataset/voice_bank/trainset/clean_trainset'
train_noisy_path = '/media/concordia/DATA/KaiWang/pytorch_learn/pytorch_for_speech/dataset/voice_bank/trainset/noisy_trainset'
train_mix_path = '/media/concordia/DATA/KaiWang/pytorch_learn/pytorch_for_speech/dataset/voice_bank_mix/trainset'

I know that train_clean_path's input is the clean trainset, but I don't know what the input of train_noisy_path is. Is it only the noise, or the mix of noise and clean speech? I would also like to know what the input of train_mix_path is, if possible.
Question 2:
Could you please share the trained weights with me?

Looking forward to your reply. Thank you!

About the log files

Hello, if the dataset I use is not the one you used, how should the log files and paths be written?

Hello

May I ask how the real-time performance of this model is?

test.py

Hello, I want to ask: what is the input of test.py? A .wav file?

Model size

Hello, may I ask roughly how large your model is? Thanks!

File missing - "log_testset.txt"

TSTNN/gen_pair.py

Lines 62 to 64 in 3a1dac4

test_log_path = '/media/concordia/DATA/KaiWang/pytorch_learn/pytorch_for_speech/dataset/voice_bank/log/logfiles/log_testset.txt'
with open(test_log_path, 'r') as test_log_file:
file_list = [line.split()[0] for line in test_log_file.readlines() if '2.5' in line]

Hi, I followed the code and tried to generate the training data with gen_pair.py,
but a file is missing, so I cannot go any further. Is there something I missed in the preprocessing?

Thanks !
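As a possible workaround until the original log is shared: gen_pair.py only uses the first whitespace-separated token of each line (the utterance name) and keeps lines containing '2.5', so a substitute log_testset.txt can be regenerated from the testset wav names. The sketch below fakes a testset folder and assigns every utterance a placeholder SNR of 2.5; the real per-utterance SNRs come from the VoiceBank-DEMAND recipe and are an assumption here:

```python
import os
import tempfile

# Fake testset folder standing in for clean_testset_wav (names illustrative).
testset_dir = tempfile.mkdtemp()
for name in ('p232_001.wav', 'p232_002.wav', 'p257_010.wav'):
    open(os.path.join(testset_dir, name), 'w').close()

# Rebuild a log with one "<utt_name> <snr>" line per wav. The 2.5 SNR is a
# placeholder; the official log assigns each utterance its actual test SNR.
log_path = os.path.join(testset_dir, 'log_testset.txt')
with open(log_path, 'w') as f:
    for wav in sorted(os.listdir(testset_dir)):
        if wav.endswith('.wav'):
            f.write(wav[:-4] + ' 2.5\n')

# The same filter gen_pair.py applies to the log:
with open(log_path) as f:
    file_list = [line.split()[0] for line in f if '2.5' in line]
```

Note that the `'2.5' in line` filter in gen_pair.py also matches 12.5, so the substitute SNR values matter if you want to reproduce the exact file selection.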

Input shape for Dual_Transformer

In the TSTNN paper, the two-stage transformer block passes the input into a local transformer followed by a global transformer. This means that the sequence length (in here) for the local transformer must be the frame length, and that for the global transformer must be the number of frames. This makes sense in Dual_Transformer that dim1 must be input.shape[3], but I'm confused with that of the DPTNet implementation, wherein their dim1 is input.shape[2]. In their implementation, aren't they passing in the global transformer first before the local transformer? I'm just confused with how to implement a local transformer and a global transformer separately.
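A shape-only sketch may help untangle the two orderings. With hypothetical dimension names (batch B, channels C, number of frames T, frame length F), the local (intra-frame) pass treats the frame length as the sequence dimension and the global (inter-frame) pass treats the number of frames as the sequence dimension; which transformer runs first is just the order in which these two reshapes are applied:

```python
import torch

B, C, T, F = 2, 64, 10, 16      # batch, channels, num_frames, frame_length
x = torch.randn(B, C, T, F)

# Local (intra-frame) view: each of the B*T frames becomes a sequence of
# length F, so a transformer on this tensor attends within a frame.
local_in = x.permute(0, 2, 3, 1).reshape(B * T, F, C)

# Global (inter-frame) view: each of the B*F positions becomes a sequence of
# length T, so a transformer on this tensor attends across frames.
global_in = x.permute(0, 3, 2, 1).reshape(B * F, T, C)
```

So whether dim1 is input.shape[3] or input.shape[2] only encodes which view an implementation builds first; the two attention scopes themselves are determined by which axis is folded into the batch.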

TSTNN network training is very slow

I try to reproduce the TSTNN network. The data set is 50 hours, and each epoch is iterated 12000 times. The training is very slow. An epoch requires about 8 hours of training. I wonder if it is the same when you train?

Time Delay of TSTNN

Hi, I have already read your paper and it is definitely innovative. I've also been working on the Demucs model recently, and you compared TSTNN with Demucs in your experiments. The Facebook AI lab described the causal Demucs as having a lookahead of 37 ms and being able to run in real time on a specified CPU in their paper. So I have a question about TSTNN: have you computed or estimated the time delay of the model, and does it satisfy real-time running?
Hoping to receive your reply so that we can communicate with each other!
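Independent of TSTNN's internals, one common way to check real-time feasibility is to time the forward pass on a fixed-length chunk and divide by the chunk's audio duration (a real-time factor below 1 means faster than real time). A generic sketch with a dummy stand-in for the network:

```python
import time

def real_time_factor(process, chunk_seconds, n_runs=5):
    """Average wall-clock time per chunk divided by the chunk's duration."""
    start = time.perf_counter()
    for _ in range(n_runs):
        process()
    elapsed = (time.perf_counter() - start) / n_runs
    return elapsed / chunk_seconds

# Dummy stand-in for running model inference on a 1-second audio chunk.
rtf = real_time_factor(lambda: sum(range(10000)), chunk_seconds=1.0)
```

Note this measures throughput only; algorithmic latency (lookahead, like the 37 ms quoted for causal Demucs) is a separate property fixed by the model's receptive field.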

Replicating experiments

Dear Mr. Wang,

I tried to replicate "TSTNN: Two-Stage Transformer Based Neural Network for Speech Enhancement in the Time Domain" [Kai Wang, Bengbeng He, Wei-Ping Zhu], but I ran into some problems.

I used gen_pair.py to generate the pairs from the 28-speaker dataset, and I didn't change any other code in this repository. However, after 150 epochs on an RTX 3090 with batch_size 2, the best model evaluated by test.py showed worse performance than reported in the paper. The results are as follows.
Result 1 // loss = loss_time + 0.6 * loss_freq

Metric   Replicating   Original (in paper)
STOI     0.9450        0.95
SSNR     9.2971        9.70
PESQ     2.8254        2.96
CSIG     4.2356        4.33
CBAK     3.4413        3.53
COVL     3.5462        3.67

Result 2 // loss = 0.8 * loss_time + 0.2 * loss_freq

Metric   Replicating   Original (in paper)
STOI     0.9478        0.95
SSNR     9.4876        9.70
PESQ     2.9128        2.96
CSIG     4.2624        4.33
CBAK     3.4989        3.53
COVL     3.6072        3.67

Do you have any advice/reference on how to approach the best performance?
Also, I'm confused about the weighting of loss_time. Should it be 'loss = 0.4 * loss_time + 0.6 * loss_freq' or 'loss = 0.8 * loss_time + 0.2 * loss_freq'?
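For reference, the combined objective being asked about can be sketched as below. The weights are parameterized so either variant can be tried; the STFT settings (n_fft, hop length, Hann window) are assumptions for illustration, not the paper's exact configuration:

```python
import torch
import torch.nn.functional as F

def combined_loss(est, clean, alpha=0.8, n_fft=512, hop=256):
    # Time-domain term: sample-wise MSE between enhanced and clean waveforms.
    loss_time = F.mse_loss(est, clean)
    # Frequency-domain term: MSE on STFT magnitudes (settings are assumed).
    window = torch.hann_window(n_fft)
    spec = lambda x: torch.stft(x, n_fft, hop_length=hop, window=window,
                                return_complex=True).abs()
    loss_freq = F.mse_loss(spec(est), spec(clean))
    # alpha=0.8 gives the 'loss = 0.8 * loss_time + 0.2 * loss_freq' variant;
    # alpha=0.4 gives the other one mentioned above.
    return alpha * loss_time + (1 - alpha) * loss_freq

est = torch.randn(2, 16000)      # batch of 1-second, 16 kHz waveforms
clean = torch.randn(2, 16000)
loss = combined_loss(est, clean)
```

Since only the relative weighting differs between the two variants, trying both (as the results tables above do) and comparing validation metrics is a reasonable way to settle it absent the authors' answer.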


Best regards

About the order of running the code

May I ask how you run the code? Could you tell me the steps?

hi

Hello, these unorganized files are hard for me to follow. I want to reproduce your experimental results. Could you describe the purpose of each file in the readme? Thanks!

Why do I get particularly low indicators?

By the fifth epoch it was already overfitting. I used the VCTK train_56spks dataset and test_25spks dataset.
These are my trained metrics:
[image: training metrics]

Because the metrics were changing slowly by the 5th epoch, I only trained for 10 epochs.
