KaldiBasedSpeakerVerification
========================================
Author: Qianhui Wan
Version: 1.0.0
Date   : 2018-01-23

Prerequisite
------------
1. Kaldi 5.3, together with ATLAS and OpenFst, which Kaldi requires.
https://github.com/kaldi-asr/kaldi

2. libfvad, a voice activity detection (VAD) library based on WebRTC's VAD engine.
https://github.com/dpirch/libfvad

Installation
------------
1. Install Kaldi 5.3:
git clone https://github.com/kaldi-asr/kaldi.git kaldi --origin upstream
cd kaldi

2. Install Kaldi's required libraries:
cd to /kaldi/tools and follow the instructions in the INSTALL file there.

3. Compile and finish the Kaldi installation:
cd to /kaldi/src and follow the instructions in the INSTALL file there.

4. Install libfvad:
git clone https://github.com/dpirch/libfvad
cd libfvad
./bootstrap
./configure
make
make install (this step may require sudo)

5. Install KaldiBasedSpeakerVerification:

cd KaldiBasedSpeakerVerification/src
Edit the makefile to provide the correct locations of this project and the libraries.
make
(This builds 3 executables under /src: enroll, identifySpeaker and extractFeatures.)


Project file structure (under KaldiBasedSpeakerVerification folder)
----------------------------------
/examples
 contains enroll and test examples, along with example data

/examples/iv
 contains the i-vectors extracted during enrollment. This folder can be empty before any speaker is enrolled, but must contain 2 files (train_iv.ark and train_num_utts.ark) before testing.
 
/examples/mat
 contains the background model data and must contain six files (final.dubm, final.ubm, final.ie, final.plda, transform.mat and mean_vec).
 
/scripts
 contains scripts mainly used to create the background model.

/src
 contains the code for the 3 applications: creating a background model, enrolling speakers and identifying speakers.



Main applications
-------------------------------------------------
/src/enroll.cpp
 This program extracts speech features from one speaker's audio and uses them to enroll (register) that speaker.
 Usage: enroll speakerId wavefile
 The output should look like:
 Not registered speaker: speakerId. Created a new spkid
 or
 Found registered speaker: speakerId. Updated speaker model

 The wavefile should be in .wav format.

 This will create/update two files in /iv: train_iv.ark and train_num_utts.ark.
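This README does not describe the speaker model format. The two file names suggest, as an assumption only, that each speaker is stored as an average i-vector plus the number of utterances behind it. Under that assumption, the create-or-update step could be sketched as follows; the dictionary is a hypothetical stand-in for the two .ark files, and enroll_ivector is a hypothetical name, not the project's code.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory stand-in for train_iv.ark + train_num_utts.ark:
# speaker id -> (mean i-vector, number of utterances averaged so far).
SpeakerModels = Dict[str, Tuple[List[float], int]]


def enroll_ivector(models: SpeakerModels, speaker_id: str, ivector: List[float]) -> bool:
    """Add one utterance's i-vector to a speaker model.

    Returns True if a new speaker was created, False if an existing
    model was updated.
    """
    if speaker_id not in models:
        models[speaker_id] = (list(ivector), 1)
        return True
    mean, n = models[speaker_id]
    if len(mean) != len(ivector):
        raise ValueError("i-vector dimension mismatch")
    # Incremental mean: (n * old_mean + x) / (n + 1), so earlier
    # utterances never need to be reloaded.
    updated = [(n * m + x) / (n + 1) for m, x in zip(mean, ivector)]
    models[speaker_id] = (updated, n + 1)
    return False
```

Keeping the utterance count is what makes the update incremental: averaging only the stored mean with the new i-vector would over-weight the most recent utterance.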

/src/identifySpeaker.cpp
 This program processes a given audio clip and outputs a speaker identification result roughly every 3.2 seconds.
 Usage: identifySpeaker wavefile
 The output should look like:
 Family member detected! Speaker: 225
 Family member detected! Speaker: 225
 Stranger detected!
 Family member detected! Speaker: 227
 Family member detected! Speaker: 227
 ...

 It also outputs a score for each segment, which can be used to adjust the decision threshold for different audio conditions.
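The per-segment decision can be pictured with the hypothetical sketch below; it is not the program's code. It assumes each ~3.2 s segment is scored against every enrolled speaker and that the best score is compared with a threshold. The README gives no default threshold, so any value passed in is a placeholder to be tuned as described above.

```python
from typing import Dict, List, Optional, Tuple

SEGMENT_SECONDS = 3.2  # segment length quoted in this README


def segment_bounds(num_samples: int, sample_rate: int,
                   seg_seconds: float = SEGMENT_SECONDS) -> List[Tuple[int, int]]:
    """Split a clip into consecutive [start, end) sample ranges of ~seg_seconds."""
    seg_len = int(round(seg_seconds * sample_rate))
    return [(start, min(start + seg_len, num_samples))
            for start in range(0, num_samples, seg_len)]


def identify(scores: Dict[str, float], threshold: float) -> Optional[str]:
    """Return the best-scoring enrolled speaker, or None (a stranger)
    when even the best score falls below the threshold."""
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None
```

For instance, with a hypothetical threshold of 10, a segment whose best score is 4.97931 would be reported as a stranger and one scoring 33.7779 for speaker 84 as that family member. Raising the threshold trades missed family members for fewer false accepts.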


Examples
-------------------------------------------------
After installing all the required software, you can run the following examples to check that your installation works.

1. Make sure there are three folders in /examples:
  /example_data
  /iv
  /mat (because of GitHub's file size limit, final.ie was split into several parts; to reassemble it, run: cat iepart* > final.ie)

2. Run ./test1Enroll.sh
This enrolls all speech files in /example_data/enroll.
The output should look like:

The total active speech is 1.61 seconds.
No registered speaker: 174. Create a new spkid
Done.
The total active speech is 15 seconds.
Found registered speaker: 174. Update speaker model
Done.
The total active speech is 0.88 seconds.
No registered speaker: 84. Create a new spkid
Done.
The total active speech is 3.47 seconds.
Found registered speaker: 84. Update speaker model
Done.

3. Run ./test1Test.sh
This tests the speech file /example_data/test/84/84-121550-0030.wav against all registered speakers.
The output should look like:

Effective speech length: 2.605s.No family member detected.		(score: 4.97931)
Effective speech length: 5.685s.Family member detected! Speaker: 84	(score: 33.7779)
Speech data is finished!
Done.
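The "total active speech" and "effective speech length" figures presumably come from voice activity detection. As a generic sketch, not the project's code (which may compute these figures differently), the arithmetic for a frame-based VAD such as libfvad looks like this: libfvad accepts 8, 16, 32 or 48 kHz audio in frames of 10, 20 or 30 ms, and fvad_process() returns 1 for a speech frame and 0 for a non-speech frame. The per-frame decisions are assumed inputs here.

```python
from typing import Sequence

# Constraints documented by libfvad (fvad_set_sample_rate / fvad_process).
VALID_RATES = (8000, 16000, 32000, 48000)
VALID_FRAME_MS = (10, 20, 30)


def frame_length(sample_rate: int, frame_ms: int) -> int:
    """Number of 16-bit samples to pass to the VAD per frame."""
    if sample_rate not in VALID_RATES or frame_ms not in VALID_FRAME_MS:
        raise ValueError("unsupported sample rate or frame length")
    return sample_rate * frame_ms // 1000


def active_speech_seconds(decisions: Sequence[int], frame_ms: int) -> float:
    """Total duration of the frames the VAD flagged as speech (decision == 1)."""
    return sum(1 for d in decisions if d == 1) * frame_ms / 1000
```

At 16 kHz with 30 ms frames, each call to the VAD consumes 480 samples.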


*Note:
Kaldi will also print log messages that look like:
LOG ([5.3.96~1-7ee7]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG ([5.3.96~1-7ee7]:ComputeDerivedVars():ivector-extractor.cc:204) Done.

Each of these messages indicates that one audio segment has been processed; they can be suppressed by setting Kaldi's verbose level.

Background Model Training
-------------------------------------
/src/extractFeatures
 This program extracts 20-dimensional MFCCs (including energy), appends deltas and double deltas (giving 60-dimensional features), and applies CMVN.
 Usage: extractFeatures wav.scp ark,scp:feat.ark,feat.scp
 Input: wav.scp, a text file listing one utterance name and its file path per line
 Output: feat.ark and feat.scp, in the standard Kaldi archive and script formats.
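A pure-Python sketch of the post-processing described above, useful for checking feature dimensions. It uses the standard regression formula for deltas with a window of 2 frames (Kaldi's add-deltas default, to our understanding) and clamps frame indices at utterance edges, so its edge values may differ slightly from Kaldi's. The README does not say whether extractFeatures normalizes per utterance, or whether it normalizes variances as well as means; cmvn() below assumes per-utterance mean and variance normalization.

```python
from typing import List

Matrix = List[List[float]]  # frames x dims


def deltas(feats: Matrix, window: int = 2) -> Matrix:
    """Regression deltas d_t = sum_n n*(c[t+n] - c[t-n]) / (2*sum_n n^2),
    with frame indices clamped to the utterance boundaries."""
    T = len(feats)
    denom = 2 * sum(n * n for n in range(1, window + 1))
    out = []
    for t in range(T):
        row = [0.0] * len(feats[t])
        for n in range(1, window + 1):
            nxt = feats[min(t + n, T - 1)]
            prv = feats[max(t - n, 0)]
            for d in range(len(row)):
                row[d] += n * (nxt[d] - prv[d])
        out.append([v / denom for v in row])
    return out


def add_deltas(feats: Matrix) -> Matrix:
    """Append deltas and double deltas: 20 dims in, 60 dims out."""
    d1 = deltas(feats)
    d2 = deltas(d1)
    return [a + b + c for a, b, c in zip(feats, d1, d2)]


def cmvn(feats: Matrix, norm_vars: bool = True) -> Matrix:
    """Per-utterance mean (and optionally variance) normalization."""
    if not feats:
        return []
    T, D = len(feats), len(feats[0])
    means = [sum(f[d] for f in feats) / T for d in range(D)]
    out = [[f[d] - means[d] for d in range(D)] for f in feats]
    if norm_vars:
        # Floor the standard deviation so constant dimensions stay at 0.
        stds = [max((sum(r[d] ** 2 for r in out) / T) ** 0.5, 1e-10) for d in range(D)]
        out = [[r[d] / stds[d] for d in range(D)] for r in out]
    return out
```

Following the order stated above, a 20-dimensional MFCC matrix would go through add_deltas() first and cmvn() second.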

/scripts/data_prep.sh
 Usage: data_prep.sh path_to_speech path_to_info
 Prepares the text files used by later steps; see data_prep.sh for details.
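The exact outputs of data_prep.sh are documented only in the script itself. As a hypothetical illustration of the Kaldi-style text files such a step typically produces, the sketch below builds wav.scp and utt2spk lines from a list of .wav paths. It assumes LibriSpeech-style file names of the form speaker-chapter-utterance.wav, as in the example file 84-121550-0030.wav, and sorts by utterance id because Kaldi tools expect sorted lists. prepare_lists is a hypothetical name.

```python
import os
from typing import Iterable, List, Tuple


def prepare_lists(wav_paths: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Build Kaldi-style wav.scp ('utt path') and utt2spk ('utt spk') lines."""
    entries = []
    for path in wav_paths:
        utt = os.path.splitext(os.path.basename(path))[0]  # e.g. 84-121550-0030
        spk = utt.split("-")[0]  # assumption: speaker id is the first field
        entries.append((utt, spk, path))
    entries.sort()  # Kaldi expects lists sorted by utterance id
    wav_scp = [f"{utt} {path}" for utt, _, path in entries]
    utt2spk = [f"{utt} {spk}" for utt, spk, _ in entries]
    return wav_scp, utt2spk
```

Writing each list to a file, one entry per line, gives inputs in the shape expected by extractFeatures and utt2spk_to_spk2utt.pl.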

/scripts/utt2spk_to_spk2utt.pl
 Usage: utt2spk_to_spk2utt.pl utt2spk > spk2utt
 Creates the spk2utt file from a given utt2spk file.
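The transformation is a simple inversion of the utt2spk mapping. The Python sketch below mirrors it, emitting speakers in order of first appearance as the Kaldi script of the same name does, to our understanding; it is an illustration, not a drop-in replacement for the Perl script.

```python
from typing import Dict, Iterable, List


def utt2spk_to_spk2utt(lines: Iterable[str]) -> List[str]:
    """Invert 'utt spk' lines into 'spk utt1 utt2 ...' lines.

    Speakers appear in order of first occurrence, and each speaker's
    utterances keep their input order.
    """
    spk2utts: Dict[str, List[str]] = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(f"bad utt2spk line: {line!r}")
        utt, spk = fields
        spk2utts.setdefault(spk, []).append(utt)
    return [" ".join([spk] + utts) for spk, utts in spk2utts.items()]
```

An open file can be passed directly, e.g. `with open("utt2spk") as f: print("\n".join(utt2spk_to_spk2utt(f)))`.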

/scripts/train_ubm.sh
 Usage: train_ubm.sh path_to_feat path_to_mat
 Output: final.dubm, final.ubm
 See train_ubm.sh for details.
 
/scripts/train_ivextractor.sh
 Usage: train_ivextractor.sh path_to_feat path_to_mat
 Output: final.ie
 See train_ivextractor.sh for details.
 
/scripts/train_comp_plda.sh
 Usage: train_comp_plda.sh path_to_feat path_to_mat
 Output: final.plda, transform.mat, mean_vec
 See train_comp_plda.sh for details.
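The README does not say how final.plda, transform.mat and mean_vec are used at scoring time. In Kaldi's speaker-recognition recipes, i-vectors are typically centered with a global mean (ivector-subtract-global-mean), projected with a transform such as LDA (transform-vec) and length-normalized to a norm of sqrt(dim) (ivector-normalize-length) before PLDA scoring. Assuming this project follows the same order, which the README does not confirm, the preprocessing can be sketched as below; like transform-vec, it accepts either a linear transform or an affine one whose last column is an offset.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def preprocess_ivector(iv: Vector, mean_vec: Vector, transform: Matrix) -> Vector:
    """Center, project and length-normalize one i-vector before PLDA scoring."""
    dim = len(iv)
    if len(mean_vec) != dim:
        raise ValueError("mean_vec dimension mismatch")
    centered = [x - m for x, m in zip(iv, mean_vec)]

    projected = []
    for row in transform:
        if len(row) not in (dim, dim + 1):
            raise ValueError("transform must have dim or dim + 1 columns")
        value = sum(w * x for w, x in zip(row, centered))  # linear part
        if len(row) == dim + 1:
            value += row[-1]  # affine offset column
        projected.append(value)

    # Length normalization: rescale so the norm equals sqrt(output dim).
    norm = math.sqrt(sum(x * x for x in projected))
    if norm == 0.0:
        return projected
    scale = math.sqrt(len(projected)) / norm
    return [x * scale for x in projected]
```

Length normalization makes i-vectors better match the Gaussian assumptions of PLDA, which is why the recipes apply it last.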

The following folders are created while the scripts run:
 /dev_data
 contains the development dataset's speech information, MFCC features and i-vectors
 
 /mat
 contains all trained models:
 final.dubm, final.ubm, final.ie, final.plda, transform.mat, mean_vec

Note: The whole process can take several hours (e.g. 5 to 6 hours on CentOS running in VirtualBox).
Note: The paths in all scripts must be edited manually (as in the examples); this can be avoided by adding all paths to environment variables.


kaldibasedspeakerverification's Issues

Error while running make...

I have installed libfvad properly and I have the latest Kaldi, but I get the following error when running make in KaldiBasedSpeakerVerification/src. I have provided the correct locations in the makefile.

amol@amol-HP-406-G1-MT:/media/amol/data/libfvad/KaldiBasedSpeakerVerification/src$ make
g++ -std=c++11 -I/home/amol/kaldi/src -I/home/amol/kaldi/tools/openfst/include -DHAVE_ATLAS -I/home/amol/kaldi/tools/ATLAS/include -I/media/amol/data/libfvad/KaldiBasedSpeakerVerification/src -I/media/amol/data/libfvad/KaldiBasedSpeakerVerification/src/extension -I/media/amol/data/libfvad/include -I. -o enroll enroll.cpp -Lusr/local/lib -L/home/amol/kaldi/src/lib -L/home/amol/kaldi/tools/openfst/lib -L/usr/lib64/atlas -lkaldi-base -lkaldi-feat -lkaldi-util -lkaldi-matrix -lkaldi-hmm -lkaldi-gmm -lkaldi-ivector -lpthread -lkaldi-transform -lkaldi-tree -lfst -lfvad
/usr/bin/ld: cannot find -lkaldi-base
/usr/bin/ld: cannot find -lkaldi-feat
/usr/bin/ld: cannot find -lkaldi-util
/usr/bin/ld: cannot find -lkaldi-matrix
/usr/bin/ld: cannot find -lkaldi-hmm
/usr/bin/ld: cannot find -lkaldi-gmm
/usr/bin/ld: cannot find -lkaldi-ivector
collect2: error: ld returned 1 exit status
makefile:22: recipe for target 'enroll' failed
make: *** [enroll] Error 1

The test output does not look the same as yours.

I followed the instructions and ran the scripts successfully, but something seems to be wrong with my experiment:
Effective speech length: 2.605s.No family member detected. (score: -25.5264)
Effective speech length: 5.685s.No family member detected. (score: 11.2911)
The scores are not correct.
Could you please give me some tips on this?
ๆ•่Žท

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.