
tstnn's Introduction

TSTNN

This is the official PyTorch implementation of the paper "TSTNN: Two-Stage Transformer based Neural Network for Speech Enhancement in Time Domain", which has been accepted at ICASSP 2021. More details will be shared soon!

tstnn's People

Contributors

key2miao


tstnn's Issues

What is the input of test.py?

file_name = 'psquare_17.5'
test_file_list_path = '/media/concordia/DATA/KaiWang/pytorch_learn/pytorch_for_speech/voice_bank/Transformer/v5/test_file_break' + '/' + file_name
audio_file_save = '/media/concordia/DATA/KaiWang/pytorch_learn/pytorch_for_speech/voice_bank/Transformer/v5/audio_file' + '/' + 'enhanced_' + file_name
I don't understand what file_name and test_file_list_path mean here.
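For what it's worth, the naming suggests test_file_list_path points at a directory of pre-chunked test utterances for one noise condition ('psquare_17.5' would be the psquare noise at 17.5 dB SNR), which test.py then enumerates. A minimal sketch of that assumed layout (the directory structure and file names below are hypothetical, not from the repo):

```python
import os
import tempfile

# Hypothetical layout: one directory per noise condition, containing the
# chunked test utterances that test.py enumerates and enhances one by one.
root = tempfile.mkdtemp()
file_name = 'psquare_17.5'                      # noise type + SNR in dB
test_dir = os.path.join(root, 'test_file_break', file_name)
save_dir = os.path.join(root, 'audio_file', 'enhanced_' + file_name)
os.makedirs(test_dir)
os.makedirs(save_dir)

# Fake a few pre-chunked test files so the enumeration below has input.
for name in ('p232_001_0', 'p232_001_1', 'p232_002_0'):
    open(os.path.join(test_dir, name), 'w').close()

test_files = sorted(os.listdir(test_dir))       # what test.py would loop over
```

Under this assumption, changing file_name just selects which noise/SNR condition of the testset is enhanced.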

Pre-trained model

Hello,
Kudos for your work. We would like to have a pre-trained model which can reproduce the results quoted in the paper. Thanks in advance.

What are the inputs of gen_pair.py?

Question 1:
In following codes:
train_clean_path = '/media/concordia/DATA/KaiWang/pytorch_learn/pytorch_for_speech/dataset/voice_bank/trainset/clean_trainset'
train_noisy_path = '/media/concordia/DATA/KaiWang/pytorch_learn/pytorch_for_speech/dataset/voice_bank/trainset/noisy_trainset'
train_mix_path = '/media/concordia/DATA/KaiWang/pytorch_learn/pytorch_for_speech/dataset/voice_bank_mix/trainset'

I know that train_clean_path's input is the clean trainset, but I don't know what the input of train_noisy_path is. Is it only the noise, or the mix of noise and clean speech? I would also like to know what the input of train_mix_path is, if possible.
Question 2:
Could you please share the trained weights with me?

Looking forward to your reply. Thank you!

About the log files

Hello, if the dataset I use is not the one you used, how should the log files and paths be written?

Hello

May I ask how the real-time performance of this model is?

test.py

Hello, I want to ask: what is the input of test.py? A .wav file?

Model size

Hello, may I ask roughly how large your model is? Thanks!

File missing - "log_testset.txt"

TSTNN/gen_pair.py

Lines 62 to 64 in 3a1dac4

test_log_path = '/media/concordia/DATA/KaiWang/pytorch_learn/pytorch_for_speech/dataset/voice_bank/log/logfiles/log_testset.txt'
with open(test_log_path, 'r') as test_log_file:
file_list = [line.split()[0] for line in test_log_file.readlines() if '2.5' in line]

Hi, I followed the code and tried to generate the training data with gen_pair.py,
but a file is missing, so I cannot go any further. Is there something I missed in the preprocessing?

Thanks !
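As a possible workaround until the original log is shared: gen_pair.py only uses the first whitespace-separated token of each line (the utterance name) and keeps lines containing '2.5', so a substitute log_testset.txt can be regenerated from the testset wav names. The sketch below fakes a testset folder and assigns every utterance a placeholder SNR of 2.5; the real per-utterance SNRs come from the VoiceBank-DEMAND recipe and are an assumption here:

```python
import os
import tempfile

# Fake testset folder standing in for clean_testset_wav (names illustrative).
testset_dir = tempfile.mkdtemp()
for name in ('p232_001.wav', 'p232_002.wav', 'p257_010.wav'):
    open(os.path.join(testset_dir, name), 'w').close()

# Rebuild a log with one "<utt_name> <snr>" line per wav. The 2.5 SNR is a
# placeholder; the official log assigns each utterance its actual test SNR.
log_path = os.path.join(testset_dir, 'log_testset.txt')
with open(log_path, 'w') as f:
    for wav in sorted(os.listdir(testset_dir)):
        if wav.endswith('.wav'):
            f.write(wav[:-4] + ' 2.5\n')

# The same filter gen_pair.py applies to the log:
with open(log_path) as f:
    file_list = [line.split()[0] for line in f if '2.5' in line]
```

Note that the `'2.5' in line` filter in gen_pair.py also matches 12.5, so the substitute SNR values matter if you want to reproduce the exact file selection.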

Input shape for Dual_Transformer

In the TSTNN paper, the two-stage transformer block passes the input into a local transformer followed by a global transformer. This means that the sequence length (in here) for the local transformer must be the frame length, and that for the global transformer must be the number of frames. This makes sense in Dual_Transformer that dim1 must be input.shape[3], but I'm confused with that of the DPTNet implementation, wherein their dim1 is input.shape[2]. In their implementation, aren't they passing in the global transformer first before the local transformer? I'm just confused with how to implement a local transformer and a global transformer separately.
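A shape-only sketch may help untangle the two orderings. With hypothetical dimension names (batch B, channels C, number of frames T, frame length F), the local (intra-frame) pass treats the frame length as the sequence dimension and the global (inter-frame) pass treats the number of frames as the sequence dimension; which transformer runs first is just the order in which these two reshapes are applied:

```python
import torch

B, C, T, F = 2, 64, 10, 16      # batch, channels, num_frames, frame_length
x = torch.randn(B, C, T, F)

# Local (intra-frame) view: each of the B*T frames becomes a sequence of
# length F, so a transformer on this tensor attends within a frame.
local_in = x.permute(0, 2, 3, 1).reshape(B * T, F, C)

# Global (inter-frame) view: each of the B*F positions becomes a sequence of
# length T, so a transformer on this tensor attends across frames.
global_in = x.permute(0, 3, 2, 1).reshape(B * F, T, C)
```

So whether dim1 is input.shape[3] or input.shape[2] only encodes which view an implementation builds first; the two attention scopes themselves are determined by which axis is folded into the batch.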

TSTNN network training is very slow

I try to reproduce the TSTNN network. The data set is 50 hours, and each epoch is iterated 12000 times. The training is very slow. An epoch requires about 8 hours of training. I wonder if it is the same when you train?

Time Delay of TSTNN

Hi, I have already read your paper and it is definitely innovative. I've also been working on the Demucs model recently, and you compared TSTNN with Demucs in your experiments. The Facebook AI lab described the causal Demucs as having a lookahead of 37 ms and being able to run in real time on a specified CPU in their paper. So I have a question about TSTNN: have you computed or estimated the time delay of the model, and does it satisfy real-time running?
Hoping to receive your reply so that we can communicate with each other!
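Independent of TSTNN's internals, one common way to check real-time feasibility is to time the forward pass on a fixed-length chunk and divide by the chunk's audio duration (a real-time factor below 1 means faster than real time). A generic sketch with a dummy stand-in for the network:

```python
import time

def real_time_factor(process, chunk_seconds, n_runs=5):
    """Average wall-clock time per chunk divided by the chunk's duration."""
    start = time.perf_counter()
    for _ in range(n_runs):
        process()
    elapsed = (time.perf_counter() - start) / n_runs
    return elapsed / chunk_seconds

# Dummy stand-in for running model inference on a 1-second audio chunk.
rtf = real_time_factor(lambda: sum(range(10000)), chunk_seconds=1.0)
```

Note this measures throughput only; algorithmic latency (lookahead, like the 37 ms quoted for causal Demucs) is a separate property fixed by the model's receptive field.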

Replicating experiments

Dear Mr. Wang,

I tried to replicate "TSTNN: Two-Stage Transformer Based Neural Network for Speech Enhancement in the Time Domain" [Kai Wang, Bengbeng He, Wei-Ping Zhu], but I ran into some problems.

I used gen_pair.py to generate the pairs from the 28-speaker dataset, and I didn't change any other code in this repository. However, after 150 epochs on an RTX 3090 with batch_size 2, the best model evaluated by test.py showed worse performance than reported in the paper. The results are as follows.
Result 1 // loss = loss_time + 0.6 * loss_freq

Metric   Replicating   Original (in paper)
STOI     0.9450        0.95
SSNR     9.2971        9.70
PESQ     2.8254        2.96
CSIG     4.2356        4.33
CBAK     3.4413        3.53
COVL     3.5462        3.67

Result 2 // loss = 0.8 * loss_time + 0.2 * loss_freq

Metric   Replicating   Original (in paper)
STOI     0.9478        0.95
SSNR     9.4876        9.70
PESQ     2.9128        2.96
CSIG     4.2624        4.33
CBAK     3.4989        3.53
COVL     3.6072        3.67

Do you have any advice/reference on how to approach the best performance?
Also, I'm confused about the weighting of loss_time. Should it be 'loss = 0.4 * loss_time + 0.6 * loss_freq' or 'loss = 0.8 * loss_time + 0.2 * loss_freq'?
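For reference, the combined objective being asked about can be sketched as below. The weights are parameterized so either variant can be tried; the STFT settings (n_fft, hop length, Hann window) are assumptions for illustration, not the paper's exact configuration:

```python
import torch
import torch.nn.functional as F

def combined_loss(est, clean, alpha=0.8, n_fft=512, hop=256):
    # Time-domain term: sample-wise MSE between enhanced and clean waveforms.
    loss_time = F.mse_loss(est, clean)
    # Frequency-domain term: MSE on STFT magnitudes (settings are assumed).
    window = torch.hann_window(n_fft)
    spec = lambda x: torch.stft(x, n_fft, hop_length=hop, window=window,
                                return_complex=True).abs()
    loss_freq = F.mse_loss(spec(est), spec(clean))
    # alpha=0.8 gives the 'loss = 0.8 * loss_time + 0.2 * loss_freq' variant;
    # alpha=0.4 gives the other one mentioned above.
    return alpha * loss_time + (1 - alpha) * loss_freq

est = torch.randn(2, 16000)      # batch of 1-second, 16 kHz waveforms
clean = torch.randn(2, 16000)
loss = combined_loss(est, clean)
```

Since only the relative weighting differs between the two variants, trying both (as the results tables above do) and comparing validation metrics is a reasonable way to settle it absent the authors' answer.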


Best regards

About the order of running the code

May I ask how you run the code? Could you tell me the steps?

hi

Hello, these unorganized files are hard for me to follow. I want to reproduce your experimental results. Could you describe the purpose of each file in the readme? Thanks!

Why do I get particularly low indicators?

By the fifth epoch it was already overfitting. I used the VCTK train_56spks dataset and test_25spks dataset.
These are my trained metrics:
[image: training metrics]

Because the metrics were changing slowly by the 5th epoch, I only trained for 10 epochs.
