Comments (11)

taylorlu commented on August 11, 2024

pretrained/weights.h5 does not depend on your dataset. Once the ghostvlad model (for speaker recognition) has been trained, it supports open-set data, so you can use new speakers that are outside the ghostvlad training dataset.

from speaker-diarization.

taylorlu commented on August 11, 2024

Please refer to https://github.com/taylorlu/Speaker-Diarization#dataset
You can use either of these datasets.
Before training the uisrnn model, you should generate the embeddings for every speaker.

gen35 commented on August 11, 2024

I assume by using these datasets you are concatenating independent utterances. I wonder if it would be better to use smaller datasets with real dialogues. There are some free datasets: https://github.com/wq2012/awesome-diarization#datasets, but I haven't tested them yet.

taylorlu commented on August 11, 2024

Yes, real dialogues should be more suitable for training the model, since it takes the overlapping information of adjacent windows into account.
In uis-rnn, the embeddings seem to be shuffled; you can read the uis-rnn code for more detail.

gen35 commented on August 11, 2024

Thanks for the reply.

Turan111 commented on August 11, 2024

How is it appropriate to train the uisrnn model with this dataset https://github.com/taylorlu/Speaker-Diarization#dataset ? There are no speaker changes in this dataset.

taylorlu commented on August 11, 2024

Please read the code at https://github.com/taylorlu/Speaker-Diarization/blob/master/ghostvlad/generate_embeddings.py.
I simply concatenate the utterances of [10,20] speakers after VAD and generate the embedding of each sliding window, one by one. The final training data therefore contains the speaker change information.
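
As a rough illustration of that idea, here is a simplified sketch. It is not the code in generate_embeddings.py: the function name `build_training_sequence`, the `embed_fn` callback (a stand-in for the ghostvlad embedding model), the window and hop sizes, and the majority-vote labeling of each window are all assumptions made for the example.

```python
from collections import Counter


def build_training_sequence(utterances, embed_fn, win_len=40, hop_len=20):
    """Concatenate per-speaker utterances and embed each sliding window.

    utterances: list of (speaker_id, frames), where frames is a list of
                feature vectors assumed to be already trimmed by VAD.
    embed_fn:   hypothetical stand-in for the embedding model; it maps a
                list of win_len frames to one embedding.
    win_len, hop_len: illustrative values, not taken from the repository.
    """
    frames, frame_labels = [], []
    for speaker_id, feats in utterances:
        frames.extend(feats)
        frame_labels.extend([speaker_id] * len(feats))

    embeddings, labels = [], []
    for start in range(0, len(frames) - win_len + 1, hop_len):
        end = start + win_len
        embeddings.append(embed_fn(frames[start:end]))
        # Label the window by its majority speaker (an assumption). Windows
        # that straddle an utterance boundary are where a speaker change
        # appears in the resulting sequence.
        labels.append(Counter(frame_labels[start:end]).most_common(1)[0][0])
    return embeddings, labels
```

Even though each source utterance has a single speaker, the label sequence produced this way switches from one speaker to the next at every utterance boundary.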

Turan111 commented on August 11, 2024

Many thanks for the reply.

giorgionanfa commented on August 11, 2024

Sorry @taylorlu, I would like to clarify one point. If I want to use my own dataset, I first run generate_embeddings.py to create training_data.npz, and then run train.py and speakerDiarization.py. In generate_embeddings.py I obviously change the dataset path, but should I also change pretrained/weights.h5?

Thank you in advance

giorgionanfa commented on August 11, 2024

Ok, thanks

SanaullahOfficial commented on August 11, 2024

Hi, I am trying to train on my own dataset (a set of audio files), but I get an error when running generate_embeddings.py to create the training_data.npz file.
Error:
Not a directory: 'Dataset/.nfs000000010433337000000001/*.wav'

updated part:

`def prepare_data(SRC_PATH):
    wavDir = os.listdir(SRC_PATH)
    wavDir.sort()

    allpath_list = []
    allspk_list = []
    for i,spkDir in enumerate(wavDir):   # Each speaker's directory
        spk = spkDir    # speaker name
        wavPath = os.path.join(SRC_PATH, spkDir, '*.wav')
        for wav in os.listdir(wavPath): # wavfile
            utter_path = os.path.join(wavPath, wav)
            allpath_list.append(utter_path)
            allspk_list.append(i)
        if(i>100):
            break

    path_spk_list = list(zip(allpath_list, allspk_list))
    return path_spk_list`

It would be great if you could suggest some possible ways to resolve this issue.
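
Judging from the error message, two things appear to go wrong. First, os.listdir does not expand wildcards, so a path ending in '*.wav' is treated as a literal directory name. Second, the dataset folder contains a hidden .nfs... entry (typically a temporary file left behind by NFS) that is a file rather than a speaker directory. A minimal sketch of one way to handle both, assuming the layout Dataset/<speaker>/<utterance>.wav; the `max_speakers` parameter is a hypothetical replacement for the hard-coded `i>100` check and is not part of the repository:

```python
import glob
import os


def prepare_data(src_path, max_speakers=None):
    """Return a list of (wav_path, speaker_index) pairs.

    Assumes the layout src_path/<speaker>/<utterance>.wav.
    max_speakers is a hypothetical cap on the number of speakers used.
    """
    # Keep only real directories; this skips stray files such as the
    # hidden .nfs* entries that can appear on NFS mounts.
    speaker_dirs = sorted(
        name for name in os.listdir(src_path)
        if os.path.isdir(os.path.join(src_path, name))
    )
    if max_speakers is not None:
        speaker_dirs = speaker_dirs[:max_speakers]

    path_spk_list = []
    for spk_index, spk_dir in enumerate(speaker_dirs):
        # glob expands the '*.wav' pattern; os.listdir takes it literally.
        pattern = os.path.join(src_path, spk_dir, '*.wav')
        for utter_path in sorted(glob.glob(pattern)):
            path_spk_list.append((utter_path, spk_index))
    return path_spk_list
```

Note that speaker indices here count only directories, so a stray file no longer shifts the numbering.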
