nttcslab-sp / eend-vector-clustering

This repository contains a set of codes to run (i.e., train, perform inference with, evaluate) a diarization method called EEND-vector-clustering.

License: Other
Hello, I found there is a parameter named num_speakers in train.yaml. Does that mean the number of speakers in the audio should equal num_speakers?
Recently, I have been comparing our algorithm against your paper's CALLHOME results. Could you kindly provide the RTTM hypotheses for CALLHOME from the original paper? Thanks a lot.
Hello there,
Thanks for your efforts in open-sourcing the code, it's vital for us trying to reproduce the result presented in the paper.
But I've come across a RuntimeError when adapting the model with our private data, which shows:
/*/EEND-vector-clustering/eend/pytorch_backend/train.py:186: RuntimeWarning: invalid value encountered in true_divide
fet_arr[spk] = org / norm
...
Traceback (most recent call last):
...
RuntimeError: The loss (nan) is not finite.
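The RuntimeWarning above is the usual NumPy 0/0 case: if a speaker has no accumulated embeddings, both the sum (`org`) and the count (`norm`) are zero, and the division produces NaN. A minimal reproduction (the variable names mirror the traceback; this is not the repo's code):

```python
import numpy as np

# Hypothetical reproduction: a speaker slot with no accumulated
# embeddings has an all-zero sum and a zero count.
org = np.zeros(4, dtype=np.float32)   # accumulated embedding, all zeros
norm = 0.0                            # number of embeddings for this speaker

fet = org / norm                      # RuntimeWarning: invalid value encountered

print(np.isnan(fet).all())            # True: the whole embedding row is NaN
```

Once such a NaN row enters the model's embedding table, any loss computed from it is NaN, which matches the "loss (nan) is not finite" error.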
After some debugging, I found the problem actually happens during the backpropagation step when an entry in the embedding array is left as all zeros:
EEND-vector-clustering/eend/pytorch_backend/train.py
Lines 173 to 186 in b3649ee
Since the embeddings are actually loaded from the dumped speaker embeddings generated by the save_spkv_lab.py script when adapting the model, I suspect there might be an issue in the save_spkv_lab function.
After some careful step-by-step checking with pdb, I found that silent-speaker labels are actually added to the all_labels variable when dumping the speaker embeddings:
EEND-vector-clustering/eend/pytorch_backend/infer.py
Lines 349 to 355 in b3649ee
Even when if torch.sum(t_chunked_t[sigma[i]]) > 0 holds, lab can still be -1, which is treated as a silent speaker according to the code in:
EEND-vector-clustering/eend/pytorch_backend/diarization_dataset.py
Lines 94 to 99 in b3649ee
Since these silent-speaker labels are -1 and Python lists support negative indexing, the issue is silently ignored when dumping the embeddings but causes exceptions once training begins.
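The failure mode is easy to demonstrate in isolation: a label of -1 is a perfectly valid negative index, so writing into that slot silently overwrites the last real speaker's row instead of raising an IndexError (illustrative only, not the repo's code):

```python
import numpy as np

# Hypothetical illustration: one row per speaker index.
fet_arr = np.zeros((3, 4))
emb = np.ones(4)

lab = -1                 # the "silent speaker" marker from the dataset
fet_arr[lab] += emb      # no error: silently accumulates into row 2

print(fet_arr[2])        # the last real speaker's row is now corrupted
```

This is why the bug stays invisible at dump time: nothing fails until the mis-attributed (or still all-zero) rows are consumed during training.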
I could fix this issue by appending a speaker label to all_labels only when lab >= 0 while saving speaker embeddings; the subsequent training process then continued smoothly, resulting in a well-performing model.
But before opening any PR, I would like to know whether you have ever come across this issue, or have any idea why it happens.
Thanks!
Hello, I was wondering whether you have evaluated different network architectures. I modified the network following the Transformer paper (number of layers, number of heads, and hidden-unit size), and I found that the results do not improve (and even get worse on some unseen test wavs) as the network becomes more complex.
Hi,
Thanks for this amazing work and open-sourcing it!
I do have a question regarding the number of speakers in the simulated training data. Do you use a fixed number of 3 or a maximum of 3? I saw in run.sh that you used 'simu_opts_num_speaker=3' for all elements of simu_opts_num_speaker_array, so the speaker number should be fixed at 3, right?
Btw, any chance you could share the trained/adapted models?
Cheers,
Xiang
Hello, when I modified the layer and head numbers of the transformer in train.yaml, I got a RuntimeError:
RuntimeError: shape '[128, -1, 12, 21]' is invalid for input of size 4915200
spk_loss_ratio: 0.03
spkv_dim: 256
max_epochs: 120
input_transform: logmel23_mn
lr: 0.001
optimizer: noam
num_speakers: 3
gradclip: 5
chunk_size: 150
batchsize: 128
num_workers: 8
hidden_size: 256
context_size: 7
subsampling: 10
frame_size: 200
frame_shift: 80
sampling_rate: 8000
noam_scale: 1.0
noam_warmup_steps: 25000
transformer_encoder_n_heads: 12
transformer_encoder_n_layers: 8
transformer_encoder_dropout: 0.1
seed: 777
feature_nj: 100
batchsize_per_gpu: 8
test_run: 0
I haven't gone through the model structure code yet. So these two parameters cannot be modified arbitrarily? Or are they related to another parameter (e.g., context_size)?
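The constraint at play here is most likely the standard multi-head attention one, not anything specific to this repo: hidden_size must be divisible by the number of heads, since each head gets hidden_size // n_heads dimensions. With hidden_size: 256 and transformer_encoder_n_heads: 12 the division leaves a remainder, which is consistent with the invalid reshape in the error. A quick check (illustrative sketch, not the repo's code):

```python
# Each attention head gets hidden_size // n_heads dimensions,
# so hidden_size must be divisible by n_heads.
hidden_size = 256
for n_heads in (4, 8, 12):
    head_dim, rem = divmod(hidden_size, n_heads)
    status = "OK" if rem == 0 else "invalid"
    print(f"n_heads={n_heads}: head_dim={head_dim}, remainder={rem} -> {status}")
# 256 / 12 leaves a remainder of 4, so reshaping the tensor into
# [..., 12, 21] cannot match its actual element count.
```

So heads like 4, 8, or 16 should work with hidden_size 256, while 12 cannot, unless hidden_size is changed accordingly.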
Hi, for some reason my training process was interrupted. I want to resume training from the latest checkpoint and continue on the old data. There is a parameter --spkv_lab: "file path of speaker vector with label and speaker ID conversion table for adaptation". Which file exactly does it mean? I tried featlab_chunk_indices.txt but failed, and I cannot find another suitable file... please help.
Thanks
When I run the CALLHOME recipe using the default config file, the GPU utilization is extremely low (less than 10%). Is this normal?