
nbss's People

Contributors

quancs, yang-yujie


nbss's Issues

How to train model for more than 2 speakers?

I trained the model from the NBSS branch for 2-speaker separation using the wsj0 dataset, and it worked perfectly. Now I want to train the model for more than 2 speakers. What steps should I follow?

License

Thanks for sharing the code!

Under what license is this code released?

Loss Nan Value

I am getting NaN values for the loss, and a CUDA error while training.

Error in WHAMR! Training

I tried to train SpatialNet on the WHAMR! dataset with the command python SharedTrainer.py fit --config=configs/SpatialNet.yaml --config=configs/datasets/whamr.yaml --model.arch.dim_input=12 --model.arch.dim_output=4 --model.arch.num_freqs=129 --trainer.precision=bf16-mixed --model.compile=True --data.batch_size=[2,4] --trainer.devices=0,1,2,3, --trainer.max_epochs=100, but I got an error:

Traceback (most recent call last):
  File "/opt/anaconda3/envs/lzx/lib/python3.8/site-packages/pytorch_lightning/trainer/call.py", line 43, in _call_and_handle_interrupt
    return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
  File "/opt/anaconda3/envs/lzx/lib/python3.8/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 102, in launch
    return function(*args, **kwargs)
  File "/opt/anaconda3/envs/lzx/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 580, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/opt/anaconda3/envs/lzx/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 989, in _run
    results = self._run_stage()
  File "/opt/anaconda3/envs/lzx/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1033, in _run_stage
    self._run_sanity_check()
  File "/opt/anaconda3/envs/lzx/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1062, in _run_sanity_check
    val_loop.run()
  File "/opt/anaconda3/envs/lzx/lib/python3.8/site-packages/pytorch_lightning/loops/utilities.py", line 182, in _decorator
    return loop_run(self, *args, **kwargs)
  File "/opt/anaconda3/envs/lzx/lib/python3.8/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 134, in run
    self._evaluation_step(batch, batch_idx, dataloader_idx, dataloader_iter)
  File "/opt/anaconda3/envs/lzx/lib/python3.8/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 391, in _evaluation_step
    output = call._call_strategy_hook(trainer, hook_name, *step_args)
  File "/opt/anaconda3/envs/lzx/lib/python3.8/site-packages/pytorch_lightning/trainer/call.py", line 309, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/opt/anaconda3/envs/lzx/lib/python3.8/site-packages/pytorch_lightning/strategies/strategy.py", line 402, in validation_step
    return self._forward_redirection(self.model, self.lightning_module, "validation_step", *args, **kwargs)
  File "/opt/anaconda3/envs/lzx/lib/python3.8/site-packages/pytorch_lightning/strategies/strategy.py", line 633, in __call__
    wrapper_output = wrapper_module(*args, **kwargs)
  File "/opt/anaconda3/envs/lzx/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/anaconda3/envs/lzx/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/anaconda3/envs/lzx/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1519, in forward
    else self._run_ddp_forward(*inputs, **kwargs)
  File "/opt/anaconda3/envs/lzx/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1355, in _run_ddp_forward
    return self.module(*inputs, **kwargs)  # type: ignore[index]
  File "/opt/anaconda3/envs/lzx/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/anaconda3/envs/lzx/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/anaconda3/envs/lzx/lib/python3.8/site-packages/pytorch_lightning/strategies/strategy.py", line 626, in wrapped_forward
    out = method(*_args, **_kwargs)
  File "/data/lzx/SpatialNet/NBSS/SharedTrainer.py", line 153, in validation_step
    yr_hat = self.forward(x)
  File "/data/lzx/SpatialNet/NBSS/SharedTrainer.py", line 107, in forward
    X, stft_paras = self.stft.stft(x[:, self.channels])  # [B,C,F,T], complex
  File "/data/lzx/SpatialNet/NBSS/models/io/stft.py", line 55, in stft
    X = torch.stft(x, n_fft=self.n_fft, hop_length=self.n_hop, win_length=self.win_len, window=self.window, return_complex=True)
  File "/opt/anaconda3/envs/lzx/lib/python3.8/site-packages/torch/functional.py", line 648, in stft
    input = F.pad(input.view(extended_shape), [pad, pad], pad_mode)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/lzx/SpatialNet/NBSS/SharedTrainer.py", line 335, in <module>
    cli = TrainCLI(
  File "/opt/anaconda3/envs/lzx/lib/python3.8/site-packages/pytorch_lightning/cli.py", line 386, in __init__
    self._run_subcommand(self.subcommand)
  File "/opt/anaconda3/envs/lzx/lib/python3.8/site-packages/pytorch_lightning/cli.py", line 677, in _run_subcommand
    fn(**fn_kwargs)
  File "/opt/anaconda3/envs/lzx/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 544, in fit
    call._call_and_handle_interrupt(
  File "/opt/anaconda3/envs/lzx/lib/python3.8/site-packages/pytorch_lightning/trainer/call.py", line 68, in _call_and_handle_interrupt
    trainer._teardown()
  File "/opt/anaconda3/envs/lzx/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1012, in _teardown
    self.strategy.teardown()
  File "/opt/anaconda3/envs/lzx/lib/python3.8/site-packages/pytorch_lightning/strategies/ddp.py", line 405, in teardown
    super().teardown()
  File "/opt/anaconda3/envs/lzx/lib/python3.8/site-packages/pytorch_lightning/strategies/parallel.py", line 127, in teardown
    super().teardown()
  File "/opt/anaconda3/envs/lzx/lib/python3.8/site-packages/pytorch_lightning/strategies/strategy.py", line 528, in teardown
    self.lightning_module.cpu()
  File "/opt/anaconda3/envs/lzx/lib/python3.8/site-packages/lightning_fabric/utilities/device_dtype_mixin.py", line 79, in cpu
    return super().cpu()
  File "/opt/anaconda3/envs/lzx/lib/python3.8/site-packages/torch/nn/modules/module.py", line 967, in cpu
    return self._apply(lambda t: t.cpu())
  File "/opt/anaconda3/envs/lzx/lib/python3.8/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "/opt/anaconda3/envs/lzx/lib/python3.8/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "/opt/anaconda3/envs/lzx/lib/python3.8/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "/opt/anaconda3/envs/lzx/lib/python3.8/site-packages/torch/nn/modules/module.py", line 833, in _apply
    param_applied = fn(param)
  File "/opt/anaconda3/envs/lzx/lib/python3.8/site-packages/torch/nn/modules/module.py", line 967, in <lambda>
    return self._apply(lambda t: t.cpu())
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

/opt/anaconda3/envs/lzx/lib/python3.8/site-packages/torch/lib/../../../.././libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0xc0)[0x7fc55453ff00]
[ubuntu:39556] [ 4] /opt/anaconda3/envs/lzx/lib/python3.8/site-packages/torch/lib/../../../.././libstdc++.so.6(+0xb643c)[0x7fc55453e43c]
[ubuntu:39556] [ 5] /opt/anaconda3/envs/lzx/lib/python3.8/site-packages/torch/lib/../../../.././libstdc++.so.6(+0xb648e)[0x7fc55453e48e]
[ubuntu:39556] [ 6] /opt/anaconda3/envs/lzx/lib/python3.8/site-packages/torch/lib/../../../.././libstdc++.so.6(+0xb6435)[0x7fc55453e435]
[ubuntu:39556] [ 7] /opt/anaconda3/envs/lzx/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so(+0xcc8475)[0x7fc4b780f475]
[ubuntu:39556] [ 8] /opt/anaconda3/envs/lzx/lib/python3.8/site-packages/torch/lib/../../../.././libstdc++.so.6(+0xd3e95)[0x7fc55455be95]
[ubuntu:39556] [ 9] /lib/x86_64-linux-gnu/libpthread.so.0(+0x8609)[0x7fc5583c9609]
[ubuntu:39556] [10] /lib/x86_64-linux-gnu/libc.so.6(clone+0x43)[0x7fc558194353]
[ubuntu:39556] *** End of error message ***
Aborted (core dumped)

May I ask whether you have encountered a similar problem, and how to fix it?
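The error message itself suggests a first debugging step: rerun with CUDA_LAUNCH_BLOCKING=1 so that the stack trace points at the kernel that actually failed rather than at a later, unrelated call. A minimal sketch (the variable must be set before torch creates its CUDA context, e.g. at the very top of SharedTrainer.py, or exported in the shell instead):

import os

# CUDA_LAUNCH_BLOCKING makes kernel launches synchronous, so the reported stack trace
# corresponds to the call that actually triggered the device-side assert.
# It must be set before the CUDA context is created, i.e. before any tensor is moved to a GPU.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

With launches made synchronous, the assert usually surfaces at the real offending operation (often an out-of-range index or label) rather than at the torch.stft call shown above.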

Unable to run training

Hi,

I am trying to train SpatialNet with SMS_WSJ data. I am using the example from your README:

python SharedTrainer.py fit \
 --config=configs/SpatialNet.yaml \ # network config
 --config=configs/datasets/sms_wsj_plus.yaml \ # dataset config
 --model.channels=[0,1,2,3,4,5] \ # the channels used
 --model.arch.dim_input=12 \ # input dim per T-F point, i.e. 2 * the number of channels
 --model.arch.dim_output=4 \ # output dim per T-F point, i.e. 2 * the number of sources
 --model.arch.num_freqs=129 \ # the number of frequencies, related to model.stft.n_fft
 --trainer.precision=bf16-mixed \ # mixed precision training, can also be 16-mixed or 32, where 32 can produce the best performance
 --model.compile=true \ # compile the network, requires torch>=2.0. the compiled model is trained much faster
 --data.batch_size=[2,4] \ # batch size for train and val
 --trainer.devices=0, \
 --trainer.max_epochs=100

but I have run into some trouble.

  1. When attempting to run it, I got multiple errors about unknown parameters. From invoking python SharedTrainer.py fit --help, it appears that parameter values are expected to be passed with a space, not =, and that the configs for model and data are expected to be passed via --trainer and --data, not --config.
  2. After fixing the above issues and running again, I got the following error:
error: Parser key "data":
  Not a valid subclass of LightningDataModule. 
.......
  Subclass types expect one of:
  - a class path (str)
  - a dict with class_path entry
  - a dict without class_path but with init_args entry (class path given previously)

I looked into configs/datasets/sms_wsj_plus.yaml and found that it indeed describes a dict without a top-level 'class_path' key; only the value of the 'data' key has one:

data:
  class_path: data_loaders.sms_wsj_plus.SmsWsjPlusDataModule
  init_args:
    sms_wsj_dir: data/sms_wsj/data
    rir_dir: datasets/SMS_WSJ_Plus_rirs/
    target: direct_path
    datasets: ["train_si284", "test_dev93", "test_eval92", "test_eval92"]
    audio_time_len: [4.0, 4.0, null, null]
    ovlp: mid
    speech_overlap_ratio: [0.1, 1.0]
    sir: [-5, 5]
    snr: [0, 20]
    num_spk: 2
    noise_type: ["babble", "white"]
    batch_size: [2, 1]

  3. So I removed the top-level 'data' key so that the dict has 'class_path' at its top level. But now I am getting another error, namely:
error: Parser key "data":
  'type' object is not subscriptable

I also tried replacing the path to the yaml with the actual class path data_loaders.sms_wsj_plus.SmsWsjPlusDataModule for the --data parameter, but the same error is obtained.

Please explain what I am doing wrong.

Best regards,
Maxim

NBSSCLI.py: error: 'Configuration check failed :: No action for destination key "trainer.num_processes" to check its value.'

Hi,

I recently received a trained checkpoint (ckpt) file from my colleague and attempted to test and run it on my own device. To ensure consistency, I used the same configuration and ckpt files that my colleague successfully used on her device. However, when I tried to test the model on my device, I encountered the following error in the configuration:

NBSSCLI.py: error: 'Configuration check failed :: No action for destination key "trainer.num_processes" to check its value.'

I'm not sure what could be causing this issue and would appreciate any insights or guidance on how to fix it.

Thank you in advance for your assistance!
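One possible cause (a guess, not a confirmed fix) is that the saved config was written by an older pytorch-lightning version whose Trainer still accepted num_processes, while the version installed on your device no longer does. A minimal sketch that strips the stale key from the saved config before testing again:

import yaml

cfg_path = "config.yaml"  # hypothetical path to the config file received from the colleague

with open(cfg_path) as f:
    cfg = yaml.safe_load(f)

# Remove Trainer arguments that the installed pytorch-lightning version no longer accepts.
cfg.get("trainer", {}).pop("num_processes", None)

with open(cfg_path, "w") as f:
    yaml.safe_dump(cfg, f)

Installing the same pytorch-lightning version your colleague used would avoid editing the config at all.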

How can I generate the training data?

I currently only have the LDC93S6A dataset. Running the make step of sms_wsj always fails, so I have only generated the two files wsj_8k_zeromean (containing the wav files) and wsj_8k_zermean.json (in which the gender and transcription fields are all empty; my Kaldi setup does not run). My final goal is to apply this model to a 2-mic array for noise reduction. How can I get this example to run? Is clean data all that is needed? Are there step-by-step instructions?

Asking for help: the OnlineSpatialNet Mamba version does not work

Awesome job!
I encountered some problems when trying to reproduce the OnlineSpatialNet Mamba version, and I hope to get your help.
When I set inference=False, the model's forward pass works normally, but when I set inference=True, it fails.
Here is the traceback:

  Traceback (most recent call last):
    File "/mnt/raid2/user_space/lizixuan/projects/SpatialNet_Casual/models/arch/OnlineSpatialNet.py", line 418, in <module>
      res = model(x, inference=True).mean()
    File "/home/lizixuan/miniconda3/envs/SpatialNet/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
      return self._call_impl(*args, **kwargs)
    File "/home/lizixuan/miniconda3/envs/SpatialNet/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1561, in _call_impl
      result = forward_call(*args, **kwargs)
    File "/mnt/raid2/user_space/lizixuan/projects/SpatialNet_Casual/models/arch/OnlineSpatialNet.py", line 349, in forward
      x, attn = m(x, mask, chunkwise_recurrent, self.rope, None, inference)
    File "/home/lizixuan/miniconda3/envs/SpatialNet/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
      return self._call_impl(*args, **kwargs)
    File "/home/lizixuan/miniconda3/envs/SpatialNet/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1561, in _call_impl
      result = forward_call(*args, **kwargs)
    File "/mnt/raid2/user_space/lizixuan/projects/SpatialNet_Casual/models/arch/OnlineSpatialNet.py", line 160, in forward
      x = x + self._mamba(x, self.mhsa, self.norm_mhsa, self.dropout_mhsa, inference)
    File "/mnt/raid2/user_space/lizixuan/projects/SpatialNet_Casual/models/arch/OnlineSpatialNet.py", line 179, in _mamba
      xi = mamba.forward(x[:, [i], :], inference_params)
    File "/home/lizixuan/miniconda3/envs/SpatialNet/lib/python3.10/site-packages/mamba_ssm/modules/mamba_simple.py", line 131, in forward
      out, _, _ = self.step(hidden_states, conv_state, ssm_state)
    File "/home/lizixuan/miniconda3/envs/SpatialNet/lib/python3.10/site-packages/mamba_ssm/modules/mamba_simple.py", line 248, in step
      y = selective_state_update(
    File "/home/lizixuan/miniconda3/envs/SpatialNet/lib/python3.10/site-packages/mamba_ssm/ops/triton/selective_state_update.py", line 137, in selective_state_update
      _selective_scan_update_kernel[grid](
    File "/home/lizixuan/miniconda3/envs/SpatialNet/lib/python3.10/site-packages/triton/runtime/autotuner.py", line 305, in run
      return self.fn.run(*args, **kwargs)
    File "/home/lizixuan/miniconda3/envs/SpatialNet/lib/python3.10/site-packages/triton/runtime/autotuner.py", line 305, in run
      return self.fn.run(*args, **kwargs)
    File "/home/lizixuan/miniconda3/envs/SpatialNet/lib/python3.10/site-packages/triton/runtime/autotuner.py", line 305, in run
      return self.fn.run(*args, **kwargs)
    [Previous line repeated 1 more time]
    File "/home/lizixuan/miniconda3/envs/SpatialNet/lib/python3.10/site-packages/triton/runtime/jit.py", line 550, in run
      bin.c_wrapper(
    File "/home/lizixuan/miniconda3/envs/SpatialNet/lib/python3.10/site-packages/triton/compiler/compiler.py", line 692, in __getattribute__
      self._init_handles()
    File "/home/lizixuan/miniconda3/envs/SpatialNet/lib/python3.10/site-packages/triton/compiler/compiler.py", line 683, in _init_handles
      mod, func, n_regs, n_spills = fn_load_binary(self.metadata["name"], self.asm[bin_path], self.shared, device)
  RuntimeError: Triton Error [CUDA]: device kernel image is invalid

Additionally, I ran this on a single V100 (32 GB) GPU; here is my environment configuration:

python == 3.10.14
torch == 2.2.2+cu118
causal-conv1d == 1.2.0.post2
mamba-ssm == 1.2.0.post1

My WeChat ID is zx1292982431, in case that makes communication more convenient.
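As a guess rather than a confirmed diagnosis: a Triton "device kernel image is invalid" error often indicates that the compiled kernel does not match the GPU architecture (the V100 is compute capability 7.0) or the installed CUDA/driver combination. A small snippet for collecting the relevant versions when reporting or comparing environments:

import torch
import triton

# "device kernel image is invalid" usually points at a kernel compiled for a different
# GPU architecture or CUDA/driver combination; these versions help narrow it down.
print("torch:", torch.__version__, "| built against CUDA:", torch.version.cuda)
print("triton:", triton.__version__)
print("GPU:", torch.cuda.get_device_name(0),
      "| compute capability:", torch.cuda.get_device_capability(0))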

Can I use shorter training and testing utterances?

I notice that both the training and testing utterances are 4 seconds long, and that inference is described as: "the evaluation utterances are first chunked to 4-second segments and processed by the network, with 2-second overlapping between consecutive segments."

If I want to shrink the network input, is there any chance I can use shorter audio, say 200 ms long?
Can I use 4-second utterances for training and 200 ms for inference?
If not, can I use 200 ms for both training and inference?
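For reference, the chunked evaluation quoted above can be sketched roughly as follows. This is an illustration only, not the repository's implementation: separate is a hypothetical wrapper that runs the trained network on one segment, and keeping the speaker permutation consistent across overlapping segments is omitted.

import torch

def chunked_separation(x, separate, fs=8000, seg_s=4.0, hop_s=2.0):
    """Overlap-add separation: process 4-s segments with a 2-s hop and average the overlaps.

    x:        [channels, samples] mixture tensor
    separate: callable mapping [channels, seg_len] -> [num_spk, seg_len]
              (a hypothetical wrapper around the trained network)
    """
    seg, hop = int(seg_s * fs), int(hop_s * fs)
    n = x.shape[-1]
    if n <= seg:  # short input: pad to a single segment and trim the output back
        y = separate(torch.nn.functional.pad(x, (0, seg - n)))
        return y[..., :n]

    starts = list(range(0, n - seg + 1, hop))
    if starts[-1] != n - seg:      # make sure the tail of the recording is covered
        starts.append(n - seg)
    out, weight = None, torch.zeros(n, device=x.device)
    for s in starts:
        y = separate(x[:, s:s + seg])             # [num_spk, seg]
        # NOTE: in practice the speaker permutation must be kept consistent between
        # overlapping segments (e.g. by matching on the overlapped region); omitted here.
        if out is None:
            out = torch.zeros(y.shape[0], n, device=x.device)
        out[:, s:s + seg] += y
        weight[s:s + seg] += 1.0
    return out / weight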

A problem encountered when running generate_rirs.py

Hello, why does running generate_rirs.py produce the following error? Have you encountered it before?

(NBSS) root@autodl-container-d47211bbac-724f9c76:~/autodl-tmp/NBSS-main# python generate_rirs.py
Traceback (most recent call last):
  File "generate_rirs.py", line 18, in <module>
    import gpuRIR
  File "/root/miniconda3/envs/NBSS/lib/python3.8/site-packages/gpuRIR/__init__.py", line 9, in <module>
    from gpuRIR_bind import gpuRIR_bind
ImportError: /root/miniconda3/envs/NBSS/lib/python3.8/site-packages/gpuRIR_bind.cpython-38-x86_64-linux-gnu.so: undefined symbol: cufftExecC2R

dataset issue

Hi!
I've learned from your paper that you remix the WSJ0 dataset in the manner used in Fasnet. Actually I don't have WSJ0 dataset, but I generated mixed utterances with the data generation script used in Fasnet (https://github.com/yluo42/TAC/tree/master/data), which also contains 20000, 5000 and 3000 mixed utterances for training, validation and test respectively.
So I wonder if I can directly use the data generated above to train your model? It would be great if you could give me some advice on how to modify the code!
Looking forward to your reply! Thank you!

Does OSpatialNet seem to overfit?

Hello, I trained speech separation models with the open-source SpatialNet and OSpatialNet. SpatialNet's performance is indeed very impressive, but for OSpatialNet the training loss decreases normally while the results on the test set are very poor. Could this be an overfitting problem?

How to train SpatialNet using CHiME3/4 dataset

Hi. I'm interested in SpatialNet and thank you for providing the code.
I would like to reproduce the results from the SpatialNet paper on the CHiME3/4 dataset (Table X). However, it seems that the available dataloader only includes a dataset created with noise from CHiME3/4 (chime3_moving.py). Could you possibly provide a dataloader for the CHiME3/4 dataset?

data issue

Hello, I would like to know where I should put the wsj0-mix dataset in order to generate the RIR dataset.

`state` and `share_qk` options

Hi again,

  • Is the state argument of the different forward methods always None in your experiments? If not, when should it be set to something other than None?
  • When using Retention, I can see you are sharing the query and key projection layers when RoPE is disabled here. Can you explain why? This does not seem to be explained in the paper.

How to use custom dataset with SpatialNet

This is an interesting project and I am very interested in it. However, I am having trouble understanding how to use a custom dataset with SpatialNet effectively. Can you guide me on:
How to properly format and prepare my custom dataset for use with this project?
What are the best practices for importing and integrating the custom dataset into the project?
Are there any specific steps or configurations needed to make the project compatible with my custom dataset?
Thank you!

GPU memory requirements

Hi and thanks again for this cool project.

Could you provide some insight on the GPU memory requirements for training the different configurations of the online SpatialNet (MHSA vs. Retention vs. Mamba, and 4-s vs. 32-s utterances)? I am currently facing GPU out-of-memory errors on an A100 40 GB GPU when using Retention, 4-s utterances and a batch size of 4 utterances. My sampling rate is 16 kHz as opposed to 8 kHz in your paper, but I doubled the STFT window and hop lengths, so the dimensionality along the time axis should be the same.
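One way to get concrete numbers for a given setup (an illustration only; model, batch and loss_fn are placeholders for the configured network, one dataloader batch, and the training loss) is to measure the peak allocated memory of a single training step:

import torch

def peak_memory_of_step(model, batch, loss_fn, device="cuda"):
    """Peak allocated GPU memory (GiB) for one forward + backward pass."""
    model = model.to(device)
    x, y = (t.to(device) for t in batch)   # one (input, target) pair from the dataloader
    torch.cuda.reset_peak_memory_stats(device)
    loss = loss_fn(model(x), y)
    loss.backward()
    torch.cuda.synchronize(device)
    return torch.cuda.max_memory_allocated(device) / 1024**3

Comparing this number across the MHSA, Retention and Mamba variants and across utterance lengths makes it easy to see which configuration fits on a given card.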

Uniform Linear Array is not as good as circular array?

I tried the code on a uniform linear array, with SDR used as the loss. However, the training loss is around -7 dB to -8 dB, which is much higher than with the circular array. Are there any possible reasons? Do you have any suggestions for ULAs?

usage: NBSSCLI.py [-h] [--config CONFIG] [--print_config [={comments,skip_null,skip_default}+]] {fit,validate,test,predict,tune} ... NBSSCLI-debug.py: error: 'Configuration check failed :: No action for destination key "test.model.lr_scheduler_kwargs.mode" to check its value.'

python NBSSCLI.py test --config=logs/NBSS_ifp/version_66/config-test.yaml --ckpt_path=logs/NBSS_ifp/version_66/checkpoints/epoch707_neg_si_sdr-13.7777.ckpt --trainer.gpus=0, --data.seeds="{'train':null,'val':2,'test':3}"
Hello, we encountered this problem when running the test according to the example you provided. How can we solve it?

Question about mamba edition.

Dear Lee,
Awesome job and congratulations!
It seems that there is only the multi-head self-attention version of SpatialNet here. Will you release the online Mamba version in the future?
Best!

For adhoc configuration

What can be done to apply the algorithm to a non-circular or random microphone array geometry?

will you combine it with speaker embedding?

Most target speaker extraction work is single-channel; multi-channel target speaker extraction is less researched. Also, many target speaker extraction networks operate in the time domain, and their performance is poor on real-world reverberant recordings.

SpatialNet performs well on real-world reverberant recordings, so I wonder whether you will combine it with speaker embeddings?

I tried combining SpatialNet with a speaker embedding by replacing the bottleneck in PEA-TSE 3.0 with the SpatialNet structure, but the result is not good on real multi-channel recordings.

Generating the multi-channel dataset

Hello, is there a script for generating multi-channel data from WSJ0? I am not sure how to do this part: after generating the room impulse responses with generate_rirs, the code in data_loaders does not quite match up.

dataset

Where can I find the WSJ0 dataset to run the project? Can I use other datasets instead of WSJ0?

ModuleNotFoundError: No module named 'torchmetrics.audio.utils'

  1. When I train NBSS, something goes wrong in "NBSS\NBSS-main\models\NBSS_ifp.py", line 15:
    ModuleNotFoundError: No module named 'torchmetrics.audio.utils'
    It seems this module (torchmetrics.audio.utils) does not exist?

  2. In "NBSS\NBSS-main\generate_rirs.py", line 18, I could not import gpuRIR on Windows 10.
    How can I get gpuRIR?

using one real recordings to inference

Hi, dear author. I was glad to see the state-of-the-art results on the multi-channel speech separation task, and thank you for open-sourcing the code. Thanks to the detailed introduction in the README, I can now train the model with a two-mic config. I just have a simple question: if I want to run inference on a real recorded 2-mic signal whose length is not 4 seconds, what should I do? Sorry to disturb you.
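A rough sketch of the input preparation for that case, assuming the recording is a 2-channel wav file and that the network is fed [batch, channels, samples] tensors as in the training code (the file name and the checkpoint-loading class are placeholders):

import soundfile as sf
import torch

# Load a real 2-mic recording and arrange it as [batch, channels, samples].
wav, fs = sf.read("recording_2mic.wav")           # soundfile returns [samples, channels]
x = torch.from_numpy(wav.T).float().unsqueeze(0)  # -> [1, 2, samples]

# Loading the trained Lightning module; substitute the module class actually used for training.
# model = TrainModule.load_from_checkpoint("path/to/checkpoint.ckpt")
# model.eval()
# with torch.no_grad():
#     y_hat = model(x)   # separated sources; for very long recordings, chunked processing
#                        # with overlap (as in the earlier sketch) can be used instead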
