Comments (6)
1: About the 1st question:
Yes, the clustering method was designed to assign a speaker label to each clip of a long audio recording, and it accounts for newly appearing speakers.
For realtime use, you would need to reimplement the uisrnnModel.predict function, since it takes a fixed-size list of embeddings as input and decodes with beam search.
2: It is better to use the same language for your training data and your evaluation data, since different languages have different tempo and features, unless a more robust speaker recognition method is available.
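To illustrate point 1, here is a minimal sketch of what incremental labelling could look like when embeddings arrive one at a time. Note that this is not UIS-RNN and not the repository's code: it swaps in a simple greedy cosine-similarity clustering as a stand-in, and the class name OnlineSpeakerClusterer, the update method, and the 0.7 threshold are all hypothetical choices made for illustration.

```python
import numpy as np


class OnlineSpeakerClusterer:
    """Greedy online clustering of speaker embeddings (a stand-in, not UIS-RNN)."""

    def __init__(self, threshold=0.7):
        # Hypothetical cosine-similarity threshold; tune it on your own data.
        self.threshold = threshold
        self.centroids = []  # running sum of unit-norm embeddings per speaker

    @staticmethod
    def _unit(v):
        # Normalise to unit length so a dot product equals cosine similarity.
        return v / (np.linalg.norm(v) + 1e-12)

    def update(self, embedding):
        """Assign one incoming embedding to a speaker and return the label."""
        v = self._unit(np.asarray(embedding, dtype=np.float64))
        if self.centroids:
            sims = [float(np.dot(v, self._unit(c))) for c in self.centroids]
            best = int(np.argmax(sims))
            if sims[best] >= self.threshold:
                # Close enough to a known speaker: fold it into that centroid.
                self.centroids[best] = self.centroids[best] + v
                return best
        # No sufficiently similar speaker so far: open a new one.
        self.centroids.append(v.copy())
        return len(self.centroids) - 1


clusterer = OnlineSpeakerClusterer(threshold=0.7)
stream = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [1.0, 0.05]]  # toy embeddings
labels = [clusterer.update(e) for e in stream]
print(labels)
```

Each call to update uses only the embeddings seen so far, so a label can be emitted as soon as a clip is embedded. A real replacement for predict would still need the sequential modelling that UIS-RNN provides, which this sketch leaves out.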
from speaker-diarization.
Hi @giorgionanfa, have you tried live diarization? I tried it, but I always get the same speaker [0].
@taylorlu Could you please give more details about how to reimplement the function? What steps should I take to make it work for realtime diarization? Thank you very much!
OK, thanks so much, I will try.
Hi @rohithkodali, I have not tried it because I have been working on other topics in the meantime. How did you modify the code?
@rohithkodali do you have any update?