
TAC's Issues

Could you please share the training script?

I am new to machine learning and want to improve by reproducing your work, but I have encountered many problems during training. Sorry to disturb you. Could you please share the training script if possible? It would help me a lot.

how to calculate SI-SNRi in multi-channel speech separation task

In the single-channel speech separation task, SI-SNRi is calculated as follows:
SI_SNR0 = calc_SISNR(source, estimate_source)
SI_SNR1 = calc_SISNR(source, mixture)
SI_SNRi = SI_SNR0 - SI_SNR1
But in the multi-channel speech separation task, what are the equivalents of source, estimate_source, and mixture?
Assume there are 2 speakers to be separated. Then source is the 2 raw speech audios, each containing 1 speaker, and estimate_source is the 2 separated audios, each containing 1 speaker.
If so, how should the SI-SNR between source and mixture be calculated, given that the mixture contains 2 channels of audio?

I hope I've made myself plain on this issue.
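For reference, here is a minimal sketch of one common convention for the multi-channel case: pick a reference microphone and compute the unprocessed baseline against that channel, averaging the improvement over speakers. The calc_sisnr helper, the choice of reference mic, and the use of per-speaker target signals are assumptions for illustration, not a protocol stated by the authors.

import numpy as np

def calc_sisnr(reference, estimate, eps=1e-8):
    # Scale-invariant SNR between two 1-D signals (zero-mean, then project).
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    s_target = np.dot(estimate, reference) / (np.dot(reference, reference) + eps) * reference
    e_noise = estimate - s_target
    return 10 * np.log10(np.sum(s_target ** 2) / (np.sum(e_noise ** 2) + eps))

def sisnri_multichannel(sources, estimates, mixture, ref_mic=0):
    # sources, estimates: (n_spk, T); mixture: (n_mic, T).
    # The unprocessed baseline is the chosen reference channel of the mixture.
    improvements = []
    for spk in range(sources.shape[0]):
        sisnr_sep = calc_sisnr(sources[spk], estimates[spk])
        sisnr_mix = calc_sisnr(sources[spk], mixture[ref_mic])
        improvements.append(sisnr_sep - sisnr_mix)
    return float(np.mean(improvements))

In practice the pairing between estimates and sources is usually resolved with a permutation-invariant criterion before the improvement is averaged.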

Use FasNet as Frontend

Hi,

I'm trying to use FasNet as a frontend to denoise and dereverberate the audio at the same time. I noticed that in create_dataset.py you save spk1_echoic_sig and spk2_echoic_sig as the labels, which I think refers to "Reverberant clean" in your paper (please correct me if I'm wrong). What should I do if I want to do dereverberation (like "Clean source" or "Mel-spectrogram" in your paper)?

To be more specific,

  1. Do I need to normalize or rescale the original audio? (I find that the energy of the original audio is much larger than that of the echoic one; the SI-SDR would even be positive when given the original audio.)
  2. What should I do to follow the shift-invariant training?
  3. If I want to learn the mel-spectrogram, what is the input, audio or mel-spectrogram? Do I need to flatten the last dimension to be compatible with the current loss function? (There will be one more dimension storing the mel-spectrogram features.)

how to understand the rescaling with snr in create_dataset.py

Hello,
I am a little confused about these operations:
"spk2 = spk2 / np.sqrt(np.sum(spk2**2)+1e-8) * 1e2" and
"noise = noise / np.sqrt(np.sum(noise**2)+1e-8) * np.sqrt(np.sum((spk1+spk2)**2)+1e-8)"
in create_dataset.py. Could you please explain why you multiply by "1e2" and by "np.sqrt(np.sum((spk1+spk2)**2)+1e-8)" at the end?
thanks
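A sketch of one reading of the quoted lines (an interpretation, not a verified excerpt of create_dataset.py): each speaker signal is rescaled to a fixed L2 norm of 1e2, and the noise is rescaled so that its energy matches the combined speech energy, i.e. roughly a 0 dB speech-to-noise ratio before any further SNR-dependent gain is applied.

import numpy as np

def unit_energy(x, eps=1e-8):
    # Rescale x to (approximately) unit L2 norm.
    return x / np.sqrt(np.sum(x ** 2) + eps)

# Toy signals standing in for real speech and noise.
spk1 = unit_energy(np.random.randn(16000)) * 1e2
spk2 = unit_energy(np.random.randn(16000)) * 1e2
noise = np.random.randn(16000)

# Match the noise energy to the total speech energy (about 0 dB SNR).
noise = unit_energy(noise) * np.sqrt(np.sum((spk1 + spk2) ** 2) + 1e-8)

mixture = spk1 + spk2 + noise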

Loss function

Hello
Can you please provide instructions on how to use the suggested loss function?
Is the loss used for training contained in the sdr.py file? If so, how should it be called? And if not, is there an implementation you can point to?
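For orientation, here is a minimal SI-SNR training loss sketch in PyTorch. Whether this matches the exact routine in sdr.py is an assumption; in the paper's setting it would normally be wrapped in permutation invariant training over the speakers.

import torch

def si_snr_loss(estimate, target, eps=1e-8):
    # estimate, target: (batch, T). Returns negative SI-SNR averaged over the batch.
    estimate = estimate - estimate.mean(dim=-1, keepdim=True)
    target = target - target.mean(dim=-1, keepdim=True)
    scale = torch.sum(estimate * target, dim=-1, keepdim=True) / \
            (torch.sum(target ** 2, dim=-1, keepdim=True) + eps)
    s_target = scale * target
    e_noise = estimate - s_target
    si_snr = 10 * torch.log10(torch.sum(s_target ** 2, dim=-1) /
                              (torch.sum(e_noise ** 2, dim=-1) + eps) + eps)
    return -si_snr.mean()

loss = si_snr_loss(torch.randn(4, 32000), torch.randn(4, 32000))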

how to generate the pkl?

Hi, I see that the data generation part needs to read the pkl file, but if I want to modify the room size and the number of mics, how do I generate the pkl? Does this mean I need to manually modify the pkl file you provided?
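If the pkl only stores simulation configurations, one way to experiment is to load it with the standard pickle module, inspect its structure, and write a modified copy. The file name below and the assumption that the file can be edited this way are illustrative, not taken from the repo.

import pickle

with open('configs.pkl', 'rb') as f:        # hypothetical file name
    configs = pickle.load(f)

print(type(configs))                         # inspect before assuming any structure
print(configs[0] if isinstance(configs, (list, tuple)) else configs)

# ...edit the loaded object (e.g. room size, mic count) as appropriate...

with open('configs_modified.pkl', 'wb') as f:
    pickle.dump(configs, f)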

How to use the dataset in the paper?

Hello, I am new to speech separation with Python. Can you tell me the details of using the dataset? I want to know how to feed the data during training, because I have seen the code that generates the data. The data naming format is 'spk1_mic'+str(mic+1)+'.wav', so how do I use the multi-channel information, and how should I set the batch size?
Thank you very much.
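A sketch of assembling one multi-channel training example: the (batch, n_mic, samples) layout is an assumption based on the forward(input, num_mic) interface discussed elsewhere in these issues, and the mixture file prefix is hypothetical.

import numpy as np
import soundfile as sf
import torch

def load_multichannel(prefix, n_mic):
    # Load 'prefix_mic1.wav' ... 'prefix_micN.wav' and stack to (n_mic, T).
    channels = [sf.read('%s_mic%d.wav' % (prefix, m + 1))[0] for m in range(n_mic)]
    return np.stack(channels, axis=0)

mixture = torch.from_numpy(load_multichannel('mixture', n_mic=6)).float()
batch = mixture.unsqueeze(0)            # (1, n_mic, T); a DataLoader builds larger batches
num_mic = torch.tensor([6.0])           # number of valid microphones per example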

can FasNet be used as a frontend for subsequent AEC

Hi Yi,

Great work. I've been following your research, and the model seems small and effective. I notice there are two tasks, ESE and ESS, but have you experimented with this beamformer for AEC (echo cancellation)? I suppose AEC is still substantially important in real scenarios.

Maybe this module could serve as a frontend for AEC, or be part of a larger network covering both AEC and denoising. Would it benefit AEC more than using the original noisy signal?

Any thoughts on that? I am working on far-field AEC and trying to integrate it into our network if possible. Looking forward to your reply.

BRs,

Yi

The network architecture related to dual-path RNN TasNet.

I tried to reimplement dual-path RNN TasNet by reading your paper.

DUAL-PATH RNN: EFFICIENT LONG SEQUENCE MODELING FOR TIME-DOMAIN SINGLE-CHANNEL SPEECH SEPARATION

Is the general structure published in this repository the same as dual-path RNN TasNet? I understand this repository is written for TAC.
There seem to be some improvements like gated outputs. Are these modules included in dual-path RNN TasNet?

TAC/FaSNet.py

Lines 15 to 21 in 96640a8

# gated output layer
self.output = nn.Sequential(nn.Conv1d(self.feature_dim, self.output_dim, 1),
                            nn.Tanh()
                            )
self.output_gate = nn.Sequential(nn.Conv1d(self.feature_dim, self.output_dim, 1),
                                 nn.Sigmoid()
                                 )
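These two branches are presumably combined as an elementwise product, which is the usual gated-output pattern; the snippet below is an illustrative, self-contained sketch rather than a verified excerpt of FaSNet.py.

import torch
import torch.nn as nn

feature_dim, output_dim = 64, 64
output = nn.Sequential(nn.Conv1d(feature_dim, output_dim, 1), nn.Tanh())
output_gate = nn.Sequential(nn.Conv1d(feature_dim, output_dim, 1), nn.Sigmoid())

features = torch.randn(2, feature_dim, 100)        # (batch, feature_dim, time)
gated = output(features) * output_gate(features)   # tanh branch modulated by a sigmoid gate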

Q about the data generation!

How do I generate the "original clean signal" as the training target when the distance from the speaker to the mic is not fixed?
I have tried to use h(f) = \alpha \exp(-j 2 \pi f d / c) (whose inverse transform gives an impulse response in the time domain) and fftconvolve it with the clean speech signal to obtain the direct-path label signal. However, my small Conv-TasNet does not work well with data generated this way. Can you give me some suggestions about this? Thank you very much.
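One way to realize the direct-path target described above is to apply the pure-delay transfer function in the frequency domain. The 1/r attenuation, the sampling rate, and the toy input below are assumptions for illustration, not the authors' data pipeline.

import numpy as np

def direct_path_target(clean, distance_m, fs=16000, c=343.0):
    # Delay and attenuate the clean signal according to the source-to-mic distance,
    # i.e. apply h(f) = alpha * exp(-j * 2 * pi * f * d / c) in the frequency domain.
    alpha = 1.0 / max(distance_m, 1e-3)             # simple 1/r attenuation (assumption)
    delay = distance_m / c                          # propagation delay in seconds
    out_len = len(clean) + int(round(delay * fs))
    n_fft = int(2 ** np.ceil(np.log2(out_len)))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    H = alpha * np.exp(-1j * 2 * np.pi * freqs * delay)
    return np.fft.irfft(np.fft.rfft(clean, n_fft) * H, n_fft)[:out_len]

target = direct_path_target(np.random.randn(16000), distance_m=1.5)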

num_mic array

Hi, I want to ask a question about the model's forward function. It has two inputs: one is the feature tensor called 'input', and the other is a tensor called 'num_mic'. I saw your example num_mic = torch.from_numpy(np.array([3, 2])).view(-1,).type(x.type()). Does that mean the batch contains two examples, one with 3 mics and another with 2 mics? Suppose I use an 8-channel dataset, should I set num_mic = np.array([8])? Thank you very much.
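One reading of this interface (not confirmed by the authors): num_mic stores the number of valid microphones for each utterance in the batch, so a batch of fixed 8-channel data would simply repeat 8.

import numpy as np
import torch

batch_size, n_mic, n_samples = 4, 8, 64000
x = torch.randn(batch_size, n_mic, n_samples)       # toy multi-channel batch
num_mic = torch.from_numpy(np.full(batch_size, 8)).view(-1,).type(x.type())
# est = model(x, num_mic)                            # model: the FaSNet/TAC instance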

how to run this project

Your work is awesome! I am interested in how to run this project with the model.
Would you mind adding more details about running these scripts?
