The toolkit performs speaker diarization (determining 'who spoke when?') using the information bottleneck (IB) criterion. Specifically, it groups speech segments (X) into clusters (C) by minimizing the mutual information between the segments and the clusters, while maximizing the mutual information between the clusters and a set of relevance variables (Y). For speaker diarization, the relevance variables are typically the components of a GMM trained on all the voiced frames.
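The trade-off described above can be written as the objective I(Y;C) - (1/beta) * I(X;C), where beta is the trade-off parameter. The following is a minimal sketch of evaluating that objective for a hard clustering on toy data. It is not the toolkit's implementation: the function names, the toy probabilities, and the beta value are illustrative assumptions.

```python
import math


def mutual_information(joint):
    """I(A;B) in nats for a joint distribution given as a list of rows."""
    pa = [sum(row) for row in joint]
    pb = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:
                mi += p * math.log(p / (pa[i] * pb[j]))
    return mi


def ib_objective(p_x, p_y_given_x, assignment, num_clusters, beta):
    """Evaluate I(Y;C) - (1/beta) * I(X;C) for a hard clustering.

    p_x[i]         : prior probability of segment i
    p_y_given_x[i] : distribution over relevance variables for segment i
    assignment[i]  : cluster index assigned to segment i
    """
    num_y = len(p_y_given_x[0])
    # Joint p(c, y): sum of p(x) * p(y|x) over the segments in cluster c.
    joint_cy = [[0.0] * num_y for _ in range(num_clusters)]
    # Joint p(x, c): equals p(x) for the assigned cluster, 0 elsewhere.
    joint_xc = [[0.0] * num_clusters for _ in p_x]
    for i, (px, c) in enumerate(zip(p_x, assignment)):
        joint_xc[i][c] = px
        for k in range(num_y):
            joint_cy[c][k] += px * p_y_given_x[i][k]
    return mutual_information(joint_cy) - mutual_information(joint_xc) / beta


# Toy example (hypothetical values): four segments, two relevance variables.
p_x = [0.25, 0.25, 0.25, 0.25]
p_y_given_x = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]

# Grouping segments with similar relevance distributions scores higher.
good = ib_objective(p_x, p_y_given_x, [0, 0, 1, 1], num_clusters=2, beta=10.0)
bad = ib_objective(p_x, p_y_given_x, [0, 1, 0, 1], num_clusters=2, beta=10.0)
print(good > bad)
```

Both clusterings here have the same I(X;C), so the difference comes entirely from how much information about Y the clusters preserve.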
Python libraries: numpy, scipy, scikit-learn, librosa, kaldi_io (optional)
An installed Kaldi toolkit is highly recommended, but not mandatory
For a quick demo, execute runAMIExample.py or runSyntheticExample.py without any arguments.
The excerpts from the AMI Meeting Corpus come along with manual annotations for speaker turns, labels, and VAD. Each audio file contains two speakers. The synthetic example provides a visualization using a dendrogram.
For more comprehensive usage, refer to infoBottleneck.py:
usage: infoBottleneck.py [-h] [--beta BETA] [--segLen SEGLEN]
[--frameRate FRAMERATE] [--numCluster NUMCLUSTER]
[--library LIBRARY] [--vadFile VADFILE]
[--gmmFile GMMFILE] [--localGMM LOCALGMM]
[--kaldiRoot KALDIROOT] [--numMix NUMMIX]
[--minBlockLen MINBLOCKLEN]
[--numRealignments NUMREALIGNMENTS]
wavFile rttmFile
Execute with the help option for more information about each parameter, including default values.
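As an illustration of the usage string above, the sketch below assembles a command line from Python. The option names come from the usage text; the file names (meeting.wav, meeting.rttm) and the option values are hypothetical placeholders, and the helper build_command is not part of the toolkit.

```python
import shlex
import sys


def build_command(wav_file, rttm_file, **options):
    """Assemble an infoBottleneck.py command line from keyword options."""
    cmd = [sys.executable, "infoBottleneck.py"]
    for name, value in options.items():
        cmd += ["--" + name, str(value)]
    # The two positional arguments come last, as in the usage string.
    cmd += [wav_file, rttm_file]
    return cmd


# Hypothetical file names and parameter values, for illustration only.
cmd = build_command("meeting.wav", "meeting.rttm", beta=10, numCluster=2)
print(shlex.join(cmd))

# To actually run it from the repository root:
# import subprocess; subprocess.run(cmd, check=True)
```

Check the help output for the expected type and default of each option before choosing values.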
All values are Diarization Error Rates (DER, %); lower is better.
Method | AMI (ihm) | ICSI |
---|---|---|
Bayesian Information Criterion | 32.64 | 41.54 |
Idiap IB Toolkit | 27.55 | 38.35 |
SAIL IB Toolkit | 28.40 | 39.50 |
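For readers unfamiliar with the metric, DER is conventionally the sum of missed speech, false alarm, and speaker confusion time, divided by the total reference speech time. The sketch below shows that arithmetic. The durations are made-up values, not taken from the experiments above, and which error components the reported numbers include is not stated here.

```python
def diarization_error_rate(missed, false_alarm, confusion, total_speech):
    """Return DER in percent; all arguments are durations in seconds."""
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return 100.0 * (missed + false_alarm + confusion) / total_speech


# Hypothetical durations for a 10-minute reference of scored speech.
der = diarization_error_rate(
    missed=30.0, false_alarm=12.0, confusion=48.0, total_speech=600.0
)
print(f"{der:.2f}")  # 15.00
```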
D. Vijayasenan, F. Valente and H. Bourlard, "An Information Theoretic Approach to Speaker Diarization of Meeting Data," in IEEE Transactions on Audio, Speech, and Language Processing, vol. 17, no. 7, pp. 1382-1393, Sept. 2009.
Manoj Kumar ([email protected])
This project is licensed under the MIT License; see the LICENSE.md file for details.