transform-average-concatenate (TAC) method for end-to-end microphone permutation and number invariant ad-hoc beamforming.
I am new to machine learning and want to improve by reproducing your work, but I have encountered many problems during training. I'm sorry to disturb you. Could you please share the training script if possible? It would help me a lot.
Could you please share the training script? I cannot reproduce the results after reading:
FaSNet, https://doi.org/10.1109/ASRU46091.2019.9003849
TAC, https://doi.org/10.1109/ICASSP40776.2020.9054177
Thanks in advance.
In the single-channel speech separation task, SI-SNRi is calculated as follows:
SI_SNR0 = calc_SISNR(source, estimate_source)
SI_SNR1 = calc_SISNR(source, mixture)
SI_SNRi = SI_SNR0 - SI_SNR1
But in the multi-channel speech separation task, what are the equivalents of source, estimate_source, and mixture?
Assume there are 2 speakers to be separated. Then source is the 2 raw speech signals, each containing 1 speaker, and estimate_source is the 2 separated audios, each containing 1 speaker.
If so, how do we calculate the SI-SNR between source and mixture, given that the mixture contains multiple channels of audio?
I hope I've made myself clear on this issue.
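For reference, here is a minimal single-channel SI-SNR sketch in NumPy that matches the pseudocode above. The function names and the zero-mean/projection details are my assumptions, not the repository's actual implementation:

```python
import numpy as np

def si_snr(source, estimate, eps=1e-8):
    """Scale-invariant SNR (dB) between a reference and an estimate."""
    source = source - np.mean(source)
    estimate = estimate - np.mean(estimate)
    # Project the estimate onto the reference to get the scaled target.
    s_target = np.dot(estimate, source) / (np.dot(source, source) + eps) * source
    e_noise = estimate - s_target
    return 10 * np.log10((np.dot(s_target, s_target) + eps) /
                         (np.dot(e_noise, e_noise) + eps))

def si_snr_improvement(source, estimate, mixture):
    """SI-SNRi: improvement of the separated estimate over the raw mixture."""
    return si_snr(source, estimate) - si_snr(source, mixture)
```

For the multi-channel case, one common convention is to evaluate against a single reference channel of the mixture, but that is exactly the ambiguity the question is asking the author to resolve.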
Hi,
I'm trying to use FaSNet as a frontend to denoise and dereverberate the audio at the same time. I noticed that in create_dataset.py you save spk1_echoic_sig and spk2_echoic_sig as the labels, which I think corresponds to "Reverberant clean" in your paper (please correct me if I'm wrong). What should I do if I want to do dereverberation as well (like the "Clean source" or "Mel-spectrogram" targets in your paper)?
To be more specific,
Hello,
I am a little confused about these operations in create_dataset.py:
"spk2 = spk2 / np.sqrt(np.sum(spk2**2)+1e-8) * 1e2" and
"noise = noise / np.sqrt(np.sum(noise**2)+1e-8) * np.sqrt(np.sum((spk1+spk2)**2)+1e-8)"
Could you please explain why they multiply by "1e2" and "np.sqrt(np.sum((spk1+spk2)**2)+1e-8)" at the end?
Thanks.
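For what it's worth, my reading of those two lines (an interpretation, not confirmed by the author) is: the first rescales a signal to unit energy and then to a fixed reference gain (the "* 1e2"), and the second gives the noise the same energy as the two-speaker mixture, i.e. a 0 dB starting point before any further SNR offset. A sketch under that assumption, with function names of my own:

```python
import numpy as np

def normalize_energy(sig, scale=1e2, eps=1e-8):
    # Unit-energy normalization followed by a fixed reference gain.
    return sig / np.sqrt(np.sum(sig ** 2) + eps) * scale

def scale_noise_to_mixture(noise, spk1, spk2, eps=1e-8):
    # Give the noise the same energy as the two-speaker mixture, so the
    # mixture-to-noise ratio starts at 0 dB before further SNR scaling.
    mix_norm = np.sqrt(np.sum((spk1 + spk2) ** 2) + eps)
    return noise / np.sqrt(np.sum(noise ** 2) + eps) * mix_norm
```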
Hello
Can you please provide instructions on how to use the suggested loss function?
Is the loss used for training contained in the sdr.py file? If so, how do I call it? If not, is there an implementation you can point me to?
Hi, I see that the data generation part needs to read a pkl file, but if I want to modify the room size and the number of mics, how do I generate the pkl? Does this mean I need to manually modify the pkl file you provided?
Hello, I am new to speech separation with Python. Can you tell me the details of using the datasets? I want to know how to input the data during training, because I have seen the code that generates the data. The file naming format is 'spk1_mic'+str(mic+1)+'.wav', so how do I use the multi-channel information, and how should I set the batch size?
Thank you very much.
Hi , Yi,
Great work, and I've been following your research; the model seems small and effective. I notice there are two tasks, ESE and ESS, but have you experimented with this beamformer for acoustic echo cancellation (AEC)? I suppose AEC is still substantially important in real scenarios.
Maybe this module could serve as a frontend for AEC, or be part of a larger network that covers both AEC and denoising. Does it benefit AEC compared to the original noisy signal?
Any thoughts on that? I am working on far-field AEC and trying to integrate this into our network if possible. Looking forward to your reply.
BRs,
Yi
Hi,
Because the URL of the 100 Nonspeech Sounds corpus has changed, I cannot download it. Could you share it with me? Thanks a lot!
I tried a reimplementation of dual-path RNN TasNet after reading your paper:
DUAL-PATH RNN: EFFICIENT LONG SEQUENCE MODELING FOR TIME-DOMAIN SINGLE-CHANNEL SPEECH SEPARATION
Is the general structure published in this repository the same as dual-path RNN TasNet? I understand this repository is written for TAC.
There seem to be some improvements like gated outputs. Are these modules included in dual-path RNN TasNet?
How do I generate the "original clean signal" training target when the distance from the speaker to the mic is not fixed?
I have tried using h(f) = \alpha \exp(-j 2\pi f d / c) (an attenuated pure delay, i.e. a scaled and shifted impulse response in the time domain) and fftconvolve-ing it with the clean speech signal to get the direct-path label. However, my small Conv-TasNet does not work well with this data generation method. Can you give me some suggestions about this? Thank you very much.
Hi, I want to ask a question about the model's forward function. It takes two inputs: the feature tensor called 'input' and another tensor called 'num_mic'. I saw your example num_mic = torch.from_numpy(np.array([3, 2])).view(-1,).type(x.type()). Does that mean the batch contains two items, one with a 3-mic condition and one with a 2-mic condition? Suppose I use an 8-channel dataset; should I set num_mic = np.array([8])? Thank you very much.
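In case it helps other readers, my understanding (inferred from the example in the question, not confirmed by the author) is that num_mic has one entry per batch item giving that item's active microphone count, so a fixed 8-channel batch repeats 8 for every entry. A sketch with hypothetical shapes; the model call is a placeholder:

```python
import numpy as np
import torch

# Hypothetical batch: 2 mixtures, 8 microphones, 32000 samples each.
x = torch.randn(2, 8, 32000)

# One entry per batch item; with a fixed 8-mic array every entry is 8.
num_mic = torch.from_numpy(np.array([8, 8])).view(-1,).type(x.type())

# est_sources = model(x, num_mic)  # 'model' is not defined in this sketch
```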
Your work is awesome! I am interested in how to run this project and use the model.
Would you mind adding more details about running these scripts?