Comments (11)
pretrained/weights.h5
has no relationship with the dataset. Once you have trained the GhostVLAD model (for speaker recognition), it supports open-set data, so you can use new speakers outside the training dataset.
from speaker-diarization.
Please refer to https://github.com/taylorlu/Speaker-Diarization#dataset
You can use either of these datasets.
Before training the uis-rnn model, you should generate the embeddings for every speaker.
I assume by using these datasets you are concatenating independent utterances. I wonder if it would be better to use smaller datasets with real dialogues. There are some free datasets: https://github.com/wq2012/awesome-diarization#datasets, but I haven't tested them yet.
Yes, real dialogues should be more suitable for training the model, since they contain the overlap information of adjacent windows.
In uis-rnn, the embeddings seem to be shuffled against each other; you can read the uis-rnn code for more detail.
Thanks for the reply.
How is it appropriate to train the uis-rnn model with this dataset https://github.com/taylorlu/Speaker-Diarization#dataset? There are no speaker changes in this dataset.
Please read the code of https://github.com/taylorlu/Speaker-Diarization/blob/master/ghostvlad/generate_embeddings.py.
I just concatenate the utterances of [10, 20] speakers after VAD and generate the embedding of each sliding window one by one. The final training data will contain the speaker-change information.
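The concatenation step described above can be sketched roughly as follows. This is a minimal toy sketch, not the repo's actual code: the window and hop sizes are arbitrary placeholders, and the raw window is kept where the real pipeline would feed it through GhostVLAD to obtain an embedding. It shows how windows that straddle an utterance boundary carry speaker-change information even though each source utterance has a single speaker.

```python
import numpy as np

def make_training_sequence(utterances, labels, win=25, hop=10):
    """Concatenate single-speaker utterances and label each sliding
    window with its majority speaker, so the resulting sequence
    contains speaker-change information."""
    signal = np.concatenate(utterances)
    # per-frame speaker labels for the concatenated signal
    frame_labels = np.concatenate(
        [np.full(len(u), lab) for u, lab in zip(utterances, labels)])
    windows, win_labels = [], []
    for start in range(0, len(signal) - win + 1, hop):
        seg = signal[start:start + win]
        seg_labels = frame_labels[start:start + win]
        # majority vote; windows overlapping a boundary mark the change
        vals, counts = np.unique(seg_labels, return_counts=True)
        windows.append(seg)  # placeholder: real code embeds seg with GhostVLAD
        win_labels.append(vals[np.argmax(counts)])
    return np.stack(windows), np.array(win_labels)

# toy example: two "speakers" with constant-valued frames
utts = [np.zeros(100), np.ones(80)]
X, y = make_training_sequence(utts, labels=[0, 1])
```

The label sequence `y` starts at speaker 0, ends at speaker 1, and flips somewhere in the middle, which is exactly the change information uis-rnn trains on.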
Many thanks for the reply.
Sorry @taylorlu, I would like to clarify a point. If I want to use my own dataset, I first run generate_embeddings.py to create training_data.npz, and then run train.py and speakerDiarization.py. In generate_embeddings.py I obviously change the path of the dataset, but should I also change pretrained/weights.h5, or not?
Thank you in advance.
Ok, thanks
Hi, I am trying to train on my own dataset (a bunch of audio files), but I am getting errors while running generate_embeddings.py to create the training_data.npz file.
error:
Not a directory: 'Dataset/.nfs000000010433337000000001/*.wav'
updated part:
```python
def prepare_data(SRC_PATH):
    wavDir = os.listdir(SRC_PATH)
    wavDir.sort()
    allpath_list = []
    allspk_list = []
    for i, spkDir in enumerate(wavDir):  # each speaker's directory
        spk = spkDir  # speaker name
        wavPath = os.path.join(SRC_PATH, spkDir, '*.wav')
        for wav in os.listdir(wavPath):  # wav file
            utter_path = os.path.join(wavPath, wav)
            allpath_list.append(utter_path)
            allspk_list.append(i)
        if i > 100:
            break
    path_spk_list = list(zip(allpath_list, allspk_list))
    return path_spk_list
```
It would be great if you could suggest some possible ways to resolve this issue.
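One likely cause, reading the traceback: `os.listdir` expects a directory path, but the snippet passes it a glob pattern (`.../` plus `*.wav`), and the stale NFS handle (`.nfs000000010433337000000001`) in the dataset root is a plain file, not a speaker directory. A sketch of a possible fix (not the repo's official code) is to expand the pattern with `glob` and skip non-directory entries:

```python
import os
import glob

def prepare_data(SRC_PATH):
    """Collect (wav_path, speaker_index) pairs, one subdirectory per speaker."""
    allpath_list = []
    allspk_list = []
    # keep only real directories, skipping stray files such as .nfs* handles
    spk_dirs = sorted(d for d in os.listdir(SRC_PATH)
                      if os.path.isdir(os.path.join(SRC_PATH, d)))
    for i, spkDir in enumerate(spk_dirs):  # each speaker's directory
        # glob expands the *.wav pattern instead of listdir-ing it
        for utter_path in sorted(glob.glob(os.path.join(SRC_PATH, spkDir, '*.wav'))):
            allpath_list.append(utter_path)
            allspk_list.append(i)
        if i > 100:
            break
    return list(zip(allpath_list, allspk_list))
```

`glob.glob` returns full paths, so the separate `os.path.join(wavPath, wav)` step is no longer needed.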