Comments (11)
pretrained/weights.h5
has no relationship with the dataset. Once you have trained the GhostVLAD model (for speaker recognition), it supports open-set data, so you can use new speakers outside the training dataset.
from speaker-diarization.
Please refer to https://github.com/taylorlu/Speaker-Diarization#dataset
You can use either of these datasets.
Before training the uis-rnn model, you should generate the embeddings for every speaker.
I assume by using these datasets you are concatenating independent utterances. I wonder if it would be better to use smaller datasets with real dialogues. There are some free datasets: https://github.com/wq2012/awesome-diarization#datasets, but I haven't tested them yet.
Yes, real dialogues should be more suitable for training the model, since they contain the overlap information of adjacent windows.
In uis-rnn, the embeddings seem to be shuffled against each other; you can read the uis-rnn code for more detail.
Thanks for the reply.
How is it appropriate to train the uis-rnn model with this dataset https://github.com/taylorlu/Speaker-Diarization#dataset? There are no speaker changes in this dataset.
Please read the code of https://github.com/taylorlu/Speaker-Diarization/blob/master/ghostvlad/generate_embeddings.py.
I just concatenate the utterances of [10, 20] speakers after VAD and generate the embedding of each sliding window one by one. The final training data will contain the speaker-change information.
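The concatenation step described above can be sketched roughly as follows. This is a minimal toy sketch, not the repo's actual code: the window and hop sizes are arbitrary placeholders, and the raw window is kept where the real pipeline would feed it through GhostVLAD to obtain an embedding. It shows how windows that straddle an utterance boundary carry speaker-change information even though each source utterance has a single speaker.

```python
import numpy as np

def make_training_sequence(utterances, labels, win=25, hop=10):
    """Concatenate single-speaker utterances and label each sliding
    window with its majority speaker, so the resulting sequence
    contains speaker-change information."""
    signal = np.concatenate(utterances)
    # per-frame speaker labels for the concatenated signal
    frame_labels = np.concatenate(
        [np.full(len(u), lab) for u, lab in zip(utterances, labels)])
    windows, win_labels = [], []
    for start in range(0, len(signal) - win + 1, hop):
        seg = signal[start:start + win]
        seg_labels = frame_labels[start:start + win]
        # majority vote; windows overlapping a boundary mark the change
        vals, counts = np.unique(seg_labels, return_counts=True)
        windows.append(seg)  # placeholder: real code embeds seg with GhostVLAD
        win_labels.append(vals[np.argmax(counts)])
    return np.stack(windows), np.array(win_labels)

# toy example: two "speakers" with constant-valued frames
utts = [np.zeros(100), np.ones(80)]
X, y = make_training_sequence(utts, labels=[0, 1])
```

The label sequence `y` starts at speaker 0, ends at speaker 1, and flips somewhere in the middle, which is exactly the change information uis-rnn trains on.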
Many thanks for the reply.
Sorry @taylorlu, I would like to clarify a point. If I want to use my own dataset, I first run generate_embeddings.py to create training_data.npz, and then run train.py and speakerDiarization.py. In generate_embeddings.py I obviously change the path of the dataset, but should I also change pretrained/weights.h5, or not?
Thank you in advance.
Ok, thanks
Hi, I am trying to train on my own dataset (a bunch of audio files), but I am getting errors while running generate_embeddings.py to create the training_data.npz file.
error:
Not a directory: 'Dataset/.nfs000000010433337000000001/*.wav'
updated part:
```python
def prepare_data(SRC_PATH):
    wavDir = os.listdir(SRC_PATH)
    wavDir.sort()
    allpath_list = []
    allspk_list = []
    for i, spkDir in enumerate(wavDir):  # each speaker's directory
        spk = spkDir  # speaker name
        wavPath = os.path.join(SRC_PATH, spkDir, '*.wav')
        for wav in os.listdir(wavPath):  # wav file
            utter_path = os.path.join(wavPath, wav)
            allpath_list.append(utter_path)
            allspk_list.append(i)
        if i > 100:
            break
    path_spk_list = list(zip(allpath_list, allspk_list))
    return path_spk_list
```
It would be great if you could suggest some possible ways to resolve this issue.
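One likely cause, reading the traceback: `os.listdir` expects a directory path, but the snippet passes it a glob pattern (`.../` plus `*.wav`), and the stale NFS handle (`.nfs000000010433337000000001`) in the dataset root is a plain file, not a speaker directory. A sketch of a possible fix (not the repo's official code) is to expand the pattern with `glob` and skip non-directory entries:

```python
import os
import glob

def prepare_data(SRC_PATH):
    """Collect (wav_path, speaker_index) pairs, one subdirectory per speaker."""
    allpath_list = []
    allspk_list = []
    # keep only real directories, skipping stray files such as .nfs* handles
    spk_dirs = sorted(d for d in os.listdir(SRC_PATH)
                      if os.path.isdir(os.path.join(SRC_PATH, d)))
    for i, spkDir in enumerate(spk_dirs):  # each speaker's directory
        # glob expands the *.wav pattern instead of listdir-ing it
        for utter_path in sorted(glob.glob(os.path.join(SRC_PATH, spkDir, '*.wav'))):
            allpath_list.append(utter_path)
            allspk_list.append(i)
        if i > 100:
            break
    return list(zip(allpath_list, allspk_list))
```

`glob.glob` returns full paths, so the separate `os.path.join(wavPath, wav)` step is no longer needed.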