eend-vector-clustering's Issues

Requesting the CALLHOME results

I would like to compare our algorithm with the CALLHOME results reported in your paper. Could you kindly provide the RTTM hypotheses for CALLHOME from the original paper? Thanks a lot.

Potential issue when excluding silent speakers

Hello there,

Thanks for your efforts in open-sourcing the code; it is invaluable for us in trying to reproduce the results presented in the paper.

Problem

I've come across a RuntimeError when adapting the model on our private data:

/*/EEND-vector-clustering/eend/pytorch_backend/train.py:186: RuntimeWarning: invalid value encountered in true_divide
  fet_arr[spk] = org / norm
...
Traceback (most recent call last):
...
RuntimeError: The loss (nan) is not finite.

Detail

After some debugging, I found that the problem actually happens during the backpropagation step when an entry in the embedding layer is left all-zero:

fet_arr = np.zeros([spk_num, fet_dim])
# sum
bs = spklabs.shape[0]
for i in range(bs):
    if spkidx_tbl[spklabs[i]] == -1:
        raise ValueError(spklabs[i])
    fet_arr[spkidx_tbl[spklabs[i]]] += spkvecs[i]
# normalize
for spk in range(spk_num):
    org = fet_arr[spk]
    norm = np.linalg.norm(org, ord=2)
    fet_arr[spk] = org / norm
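
For illustration, here is a minimal standalone sketch (my own, not repository code) reproducing the failure mode: a speaker row that never receives an embedding keeps an L2 norm of 0.0, and 0/0 yields NaN, which later propagates into the loss:

import numpy as np

fet_arr = np.zeros([2, 4])
fet_arr[0] += np.array([3.0, 0.0, 4.0, 0.0])  # speaker 1 never gets an embedding

for spk in range(2):
    org = fet_arr[spk]
    norm = np.linalg.norm(org, ord=2)  # 0.0 for the untouched row
    fet_arr[spk] = org / norm          # 0/0 -> nan, triggering the RuntimeWarning

print(fet_arr[1])  # [nan nan nan nan]

A guard such as fet_arr[spk] = org / norm if norm > 0 else org would avoid the NaNs, though it does not explain why the row is empty in the first place.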

Since the embeddings are loaded from the speaker embeddings dumped by the save_spkv_lab.py script during adaptation, I suspect there may be an issue in the save_spkv_lab function.

After some careful step-by-step checking with pdb, I found that silent-speaker labels are in fact added to the all_labels variable when dumping the speaker embeddings:

for i in range(args.num_speakers):
    # Exclude samples corresponding to silent speaker
    if torch.sum(t_chunked_t[sigma[i]]) > 0:
        vec = outputs[i+1][0].cpu().detach().numpy()
        lab = chunk_data[2][sigma[i]]
        all_outputs.append(vec)
        all_labels.append(lab)

Even when torch.sum(t_chunked_t[sigma[i]]) > 0, lab can still be -1, which marks a silent speaker according to this code:

S_arr = -1 * np.ones(n_speakers).astype(np.int64)
for seg in filtered_segments:
    speaker_index = speakers.index(self.data.utt2spk[seg['utt']])
    all_speaker_index = self.all_speakers.index(
        self.data.utt2spk[seg['utt']])
    S_arr[speaker_index] = all_speaker_index

(This is what confuses me, since it should not happen: both lab and T/t_chunked are produced from the information in kaldi_obj.utt2spk.)

Since these silent-speaker labels are -1 and Python sequences support negative indexing, the issue passes silently when dumping the embeddings but raises exceptions once training begins.
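
To see why it passes silently (a standalone illustration, not repository code): a -1 label used as an index simply addresses the last entry instead of raising:

import numpy as np

fet_arr = np.zeros([3, 4])
lab = -1                    # silent-speaker label that slipped through
fet_arr[lab] += np.ones(4)  # no error: -1 silently addresses the LAST row
print(fet_arr)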

Question

I could simply fix this issue by appending the speaker label to all_labels only when lab is non-negative (i.e., skipping samples with lab < 0) when saving the speaker embeddings; the subsequent training then runs smoothly and produces a well-performing model.
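
For concreteness, this is the kind of guard I mean, adapted from the dump loop above (the lab check is my addition, not an official fix):

for i in range(args.num_speakers):
    # Exclude samples corresponding to silent speaker
    if torch.sum(t_chunked_t[sigma[i]]) > 0:
        vec = outputs[i+1][0].cpu().detach().numpy()
        lab = chunk_data[2][sigma[i]]
        if lab < 0:
            # Skip labels that are still -1 (silent speaker)
            continue
        all_outputs.append(vec)
        all_labels.append(lab)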

But before opening a PR, I would like to know whether you have ever come across this issue, or whether you have any idea why it happens.

Thanks!

Performance of different network architectures

Hello, I was wondering whether you have evaluated different network architectures. I modified the network following the Transformer paper (number of layers, number of heads, and hidden-unit size), and found that the results do not improve (and even get worse on some unseen test wavs) as the network becomes more complex.

Number of speakers for simulated training data

Hi,

Thanks for this amazing work and for open-sourcing it!
I have a question regarding the number of speakers in the simulated training data. Do you use a fixed number of 3 speakers or a maximum of 3? I saw in run.sh that you use 'simu_opts_num_speaker=3' for all the elements of simu_opts_num_speaker_array, so the speaker count should be fixed at 3, right?

By the way, any chance you could share the trained/adapted models?

Cheers,
Xiang

Invalid input shape after modifying the layer and head numbers in train.yaml

Hello, after modifying the Transformer layer and head numbers in train.yaml, I got a RuntimeError:

RuntimeError: shape '[128, -1, 12, 21]' is invalid for input of size 4915200

My train.yaml:

spk_loss_ratio: 0.03
spkv_dim: 256
max_epochs: 120
input_transform: logmel23_mn
lr: 0.001
optimizer: noam
num_speakers: 3
gradclip: 5
chunk_size: 150
batchsize: 128
num_workers: 8
hidden_size: 256
context_size: 7
subsampling: 10
frame_size: 200
frame_shift: 80
sampling_rate: 8000
noam_scale: 1.0
noam_warmup_steps: 25000
transformer_encoder_n_heads: 12
transformer_encoder_n_layers: 8
transformer_encoder_dropout: 0.1
seed: 777
feature_nj: 100
batchsize_per_gpu: 8
test_run: 0

I haven't gone through the model code yet. Can these two parameters not be modified freely, or are they tied to other parameters (e.g., context_size)?
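
For what it's worth (my reading of standard multi-head attention, not a maintainer's answer): the hidden size must be divisible by the number of heads, because attention reshapes the hidden dimension into n_heads chunks of hidden_size // n_heads. With hidden_size: 256 and 12 heads, 256 // 12 = 21 but 12 * 21 = 252 != 256, so the view in the error message cannot tile the 128 * 150 * 256 = 4915200 input elements. A minimal reproduction:

import torch

batch, chunk, hidden_size = 128, 150, 256
x = torch.zeros(batch, chunk, hidden_size)  # 128 * 150 * 256 = 4915200 elements

n_heads = 12
head_dim = hidden_size // n_heads           # floor: 21, but 12 * 21 = 252 != 256
try:
    x.view(batch, -1, n_heads, head_dim)    # shape '[128, -1, 12, 21]' is invalid ...
except RuntimeError as e:
    print(e)

n_heads = 8                                 # 256 % 8 == 0, so this reshape succeeds
x.view(batch, -1, n_heads, hidden_size // n_heads)

So the layer count can usually be changed freely, but transformer_encoder_n_heads should divide hidden_size (e.g., 8 or 16 for hidden_size 256).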

Which file should I pass to spkv_lab when resuming training with initmodel?

Hi, for some reason my training process was interrupted. I want to resume training from the latest checkpoint and continue on the old data. There is a parameter --spkv_lab ("file path of speaker vector with label and speaker ID conversion table for adaptation"): which file exactly does it refer to? I tried featlab_chunk_indices.txt but it failed, and I cannot find another suitable file... please help.
Thanks

GPU utilization extremely low

When I run the CALLHOME recipe with the default config file, GPU utilization is extremely low (less than 10%). Is this normal?
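
In my experience (generic PyTorch advice, not specific to this recipe), low utilization usually means the GPU is starved by data loading. A rough way to check where time goes, sketched with hypothetical loader/model/optimizer objects:

import time
import torch

def profile_step_split(loader, model, optimizer, device, n_batches=50):
    # Rough split of wall time between waiting on the DataLoader and GPU compute.
    load_t, compute_t = 0.0, 0.0
    it = iter(loader)
    t0 = time.perf_counter()
    for _ in range(n_batches):
        x, y = next(it)                       # time spent waiting on workers
        load_t += time.perf_counter() - t0
        t0 = time.perf_counter()
        loss = model(x.to(device), y.to(device))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        torch.cuda.synchronize()              # flush async GPU work before timing
        compute_t += time.perf_counter() - t0
        t0 = time.perf_counter()
    print(f"data wait: {load_t:.1f}s, compute: {compute_t:.1f}s")

If the data-wait term dominates, increasing num_workers in the config is the first thing I would try.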
