audio-westlakeu / fs-eend

The official PyTorch implementation of "Frame-wise streaming end-to-end speaker diarization with non-autoregressive self-attention-based attractors". [ICASSP 2024]

License: MIT License

Python 100.00%

online-inference speaker-diarization frame-wise self-attention end-to-end pytorch

fs-eend's Issues

inference

Hello, when running inference with python train_diaxxx.py, is that file the train_dia.py script?

idea

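Hello, have you run any experiments that concatenate speaker embeddings extracted by a pre-trained speaker verification model onto the frame-level speaker embeddings? I have been trying this approach, but the results have consistently fallen short of expectations.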

pre-trained model

Hello, author. I have submitted a request to obtain permission to download the pre-trained model for my research purposes. I kindly request your approval for this permission. Thank you.

About the dataset format

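Hello! I am a student who moved into speaker diarization from another field. I am confused about the Kaldi data format used for speaker diarization: I do not know the exact folder layout or the details of the data list files, and I do not know how to preprocess datasets other than CALLHOME (I have no CALLHOME dataset to use as a reference). I currently want to convert the CN-Celeb dataset into the CALLHOME-style format used for diarization. Could you share a screenshot of the data lists or a directory tree so I can see the dataset format?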
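For readers with the same question, here is a minimal sketch of the list files that a generic Kaldi-style data directory contains (wav.scp, segments, utt2spk, spk2utt, reco2dur). It assumes FS-EEND reads the same files as other EEND-style recipes, which this thread does not confirm; the helper name write_kaldi_dir, the utterance-id scheme, and the example paths are hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def write_kaldi_dir(out_dir, recordings, segments):
    """Write a generic Kaldi-style data directory (illustrative helper).

    recordings: {rec_id: (wav_path, duration_sec)}
    segments:   list of (rec_id, spk_id, start_sec, end_sec)
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # wav.scp: "<rec_id> <path-to-wav>", reco2dur: "<rec_id> <duration>"
    wav_lines = [f"{rec} {path}" for rec, (path, _) in recordings.items()]
    dur_lines = [f"{rec} {dur:.2f}" for rec, (_, dur) in recordings.items()]

    seg_lines, u2s_lines = [], []
    spk2utt = defaultdict(list)
    for rec, spk, start, end in segments:
        # Hypothetical utterance id: speaker id first, then recording and
        # start/end times in centiseconds, so ids sort by speaker.
        utt = f"{spk}_{rec}_{round(start * 100):07d}_{round(end * 100):07d}"
        # segments: "<utt_id> <rec_id> <start_sec> <end_sec>"
        seg_lines.append(f"{utt} {rec} {start:.2f} {end:.2f}")
        # utt2spk: "<utt_id> <spk_id>"
        u2s_lines.append(f"{utt} {spk}")
        spk2utt[spk].append(utt)

    # spk2utt: "<spk_id> <utt_id> <utt_id> ..."
    s2u_lines = [f"{spk} {' '.join(sorted(utts))}" for spk, utts in spk2utt.items()]

    files = {
        "wav.scp": wav_lines,
        "reco2dur": dur_lines,
        "segments": seg_lines,
        "utt2spk": u2s_lines,
        "spk2utt": s2u_lines,
    }
    for name, lines in files.items():
        # Kaldi tools expect every list file to be sorted by its first field.
        (out / name).write_text("\n".join(sorted(lines)) + "\n")
    return out
```

Note that CN-Celeb is distributed as a speaker recognition corpus, so multi-speaker recordings with time-stamped segments would have to be built (for example, by simulating mixtures) before a directory like this can describe them.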

Corpus hours for fine-tuning

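Hello, if I want to continue fine-tuning this model, how many hours of data would you recommend?
And what is the minimum number of minutes each speaker should speak?

I look forward to your reply. Thank you.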

Evaluation

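Hello, for evaluation, do I need to run gen_h5_output.py first and then compute the DER? What is the input to gen_h5_output.py? I cannot find the files the script requires among my training outputs.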
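As background on the metric itself, the following is a minimal sketch of a frame-level diarization error rate (DER) computed from binary speaker-activity matrices. It is not the repository's scoring code and says nothing about what gen_h5_output.py expects; the function name frame_der is hypothetical, and it assumes the reference and hypothesis have the same number of frames and speaker columns, with no collar applied.

```python
from itertools import permutations


def frame_der(ref, hyp):
    """Frame-level DER for 0/1 activity matrices (lists of per-frame rows).

    Tries every speaker permutation of the hypothesis and keeps the one
    with the fewest errors (miss + false alarm + speaker confusion).
    """
    total_ref = sum(sum(frame) for frame in ref)
    if total_ref == 0:
        raise ValueError("reference contains no speech frames")

    n_spk = len(ref[0])
    best_err = None
    for perm in permutations(range(n_spk)):
        miss = false_alarm = confusion = 0
        for r, h in zip(ref, hyp):
            h_perm = [h[p] for p in perm]
            n_ref, n_hyp = sum(r), sum(h_perm)
            n_correct = sum(1 for a, b in zip(r, h_perm) if a == 1 and b == 1)
            miss += max(n_ref - n_hyp, 0)
            false_alarm += max(n_hyp - n_ref, 0)
            confusion += min(n_ref, n_hyp) - n_correct
        err = miss + false_alarm + confusion
        if best_err is None or err < best_err:
            best_err = err
    return best_err / total_ref


# Two speakers, four frames: the hypothesis has the speaker columns swapped
# and misses one speaker in the overlapped last frame.
ref = [[1, 0], [1, 0], [0, 1], [1, 1]]
hyp = [[0, 1], [0, 1], [1, 0], [0, 1]]
der = frame_der(ref, hyp)  # 1 missed speaker-frame out of 5 reference speaker-frames
```

The exhaustive permutation search is only practical for a handful of speakers; scoring tools typically solve the speaker assignment more efficiently and may apply a collar around segment boundaries.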

use pre-trained model infer dataset

Hello, I would like to evaluate the simu_avg_41_50epo.ckpt pre-trained model on AMI without fine-tuning.
I have run into many setbacks along the way and am unsure whether I am on the right track, so I would appreciate your guidance.
Here is what I have tried so far:
First, I prepared the AMI test set in Kaldi format, made a copy of spk_onl_tfm_enc_dec_nonautoreg_infer.yaml named ami_infer.yaml, and changed train_data_dir and val_data_dir in it to point to my test set.
Then I ran python train_dia.py --configs conf/ami_infer.yaml --gpus 0 --test_from_folder FS-EEND_simu_41_50epo_avg_model and got an error because no ckpt file was picked up, so I changed ckpts = [x for x in all_files if (".ckpt" in x) and ("epoch" in x) and int(x.split("=")[1].split("-")[0])>=configs["log"]["start_epoch"] and int(x.split("=")[1].split("-")[0])<=configs["log"]["end_epoch"]] to ckpts = [x for x in all_files if (".ckpt" in x)]
But then I hit this error:

Traceback (most recent call last):
  File "/mnt/HDD/HDD2/DTDwind/FS-EEND/train_dia.py", line 217, in <module>
    train(configs, gpus=setup.gpus, checkpoint_resume=setup.checkpoint_resume, test_folder=setup.test_from_folder)
  File "/mnt/HDD/HDD2/DTDwind/FS-EEND/train_dia.py", line 185, in train
    for name, param in state_dict.items():
AttributeError: 'float' object has no attribute 'items'

I found that the program could not read the values in the ckpt correctly. After changing state_dict = torch.load(test_folder + "/" + c, map_location="cpu")["state_dict"] to state_dict = torch.load(test_folder + "/" + c, map_location="cpu"), the values loaded without problems.
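The two workarounds described so far (relaxing the checkpoint filename filter and dropping the ["state_dict"] lookup) can be sketched as small helpers that handle both cases without editing the comprehension by hand. This is an illustration under stated assumptions, not the repository's code: the names select_checkpoints and extract_state_dict are hypothetical, files without an "epoch=<n>" tag are assumed to be averaged models that should always be kept, and only names ending in .ckpt are accepted (slightly stricter than the original ".ckpt" in x test).

```python
import re
from collections.abc import Mapping

EPOCH_RE = re.compile(r"epoch=(\d+)")


def select_checkpoints(filenames, start_epoch, end_epoch):
    """Pick checkpoint files to test.

    Names with an 'epoch=<n>' tag are kept only if n lies in
    [start_epoch, end_epoch]; untagged .ckpt files (assumed to be
    averaged models) are always kept.
    """
    selected = []
    for name in filenames:
        if not name.endswith(".ckpt"):
            continue
        match = EPOCH_RE.search(name)
        if match is None or start_epoch <= int(match.group(1)) <= end_epoch:
            selected.append(name)
    return sorted(selected)


def extract_state_dict(checkpoint):
    """Return the parameter mapping from a loaded checkpoint object.

    `checkpoint` is whatever torch.load(path, map_location="cpu") returned.
    If it wraps the weights under a 'state_dict' key that is itself a
    mapping, that inner mapping is used; otherwise the object is assumed
    to already be the parameter mapping.
    """
    if not isinstance(checkpoint, Mapping):
        raise TypeError(f"unexpected checkpoint type: {type(checkpoint).__name__}")
    inner = checkpoint.get("state_dict")
    if isinstance(inner, Mapping):
        return inner
    return checkpoint


names = ["epoch=3-step=10.ckpt", "epoch=45-step=9.ckpt",
         "simu_avg_41_50epo.ckpt", "notes.txt"]
picked = select_checkpoints(names, start_epoch=40, end_epoch=50)
```

The fallback in extract_state_dict mirrors the manual change above: when the "state_dict" entry is missing or is not a mapping (as the 'float' object error suggests), the loaded object itself is treated as the weights.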

Next I hit TypeError: mel() takes 0 positional arguments but 3 were given, which I resolved by changing the call to use keyword arguments: mel_basis = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels).
But then I hit this error:

raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'TransformerEncoderFusionLayer' object has no attribute 'self_attn'. Did you mean: 'self_attn1'?

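This seems to indicate that the model stored in the ckpt differs from what the code expects, but judging by other people's issues, they were able to run the program without problems.
I do not understand why I am running into so many problems. Am I missing a step? I would be grateful if you could point me to the correct procedure.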
