KaldiBasedSpeakerVerification
========================================
Author: Qianhui Wan
Version: 1.0.0
Date   : 2018-01-23

Prerequisite
------------
1. Kaldi 5.3, along with ATLAS and OpenFst, which Kaldi requires.
https://github.com/kaldi-asr/kaldi

2. libfvad, a voice activity detection (VAD) library based on WebRTC's VAD engine.
https://github.com/dpirch/libfvad
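libfvad classifies fixed-length frames of 16-bit PCM audio as speech or non-speech. According to its documentation, it accepts sample rates of 8000, 16000, 32000 and 48000 Hz and frame lengths of 10, 20 or 30 ms. The Python sketch below shows how per-frame decisions could be turned into a "total active speech" figure like the one the enrollment example prints. The function names are hypothetical, and the frame length this project actually uses is an assumption.

```python
# Sketch: converting per-frame VAD decisions into seconds of active speech.
# Assumption: frames are consecutive, non-overlapping and all the same length.
VALID_RATES_HZ = (8000, 16000, 32000, 48000)  # sample rates libfvad accepts
VALID_FRAME_MS = (10, 20, 30)                 # frame lengths libfvad accepts


def frame_length_samples(sample_rate: int, frame_ms: int) -> int:
    """Samples per VAD frame for a supported sample rate / frame length."""
    if sample_rate not in VALID_RATES_HZ:
        raise ValueError(f"unsupported sample rate: {sample_rate}")
    if frame_ms not in VALID_FRAME_MS:
        raise ValueError(f"unsupported frame length: {frame_ms} ms")
    return sample_rate * frame_ms // 1000


def active_speech_seconds(decisions, frame_ms: int) -> float:
    """Total duration of frames the VAD marked as speech (decision == 1)."""
    speech_frames = sum(1 for d in decisions if d == 1)
    return speech_frames * frame_ms / 1000.0


if __name__ == "__main__":
    print(frame_length_samples(16000, 20))             # 320
    print(active_speech_seconds([0, 1, 1, 0, 1], 10))  # 0.03
```

libfvad returns 1 for an active frame and 0 for an inactive one, so counting the 1s and multiplying by the frame length is enough.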

Installation
------------
1. Install Kaldi 5.3:
git clone https://github.com/kaldi-asr/kaldi.git kaldi --origin upstream
cd kaldi

2. Install Kaldi's required libraries:
cd to /kaldi/tools and follow the INSTALL instructions there.

3. Compile and finish Kaldi install:
cd to /kaldi/src and follow the INSTALL instructions there.

4. Install libfvad:
git clone https://github.com/dpirch/libfvad
cd libfvad
./bootstrap
./configure
make
make install (this step may require sudo)

5. Install KaldiBasedSpeakerVerification:
cd KaldiBasedSpeakerVerification/src
Edit the makefile to set the correct locations of this project and of the libraries.
make
(This builds three executables under /src: enroll, identifySpeaker and extractFeatures.)


Project file structure (under the KaldiBasedSpeakerVerification folder)
-----------------------------------------------------------------------
/examples
 contains enroll and test examples, along with example data

/examples/iv
 contains the i-vectors extracted during enrollment (train_iv.ark and train_num_utts.ark). This folder can be empty before any speaker is enrolled, but both files must exist before testing.
 
/examples/mat
 contains the background model data and must hold six files (final.dubm, final.ubm, final.ie, final.plda, transform.mat, mean_vec).
 
/scripts
 contains scripts used mainly to create the background model.
 
/src
 contains the code for three applications: creating a background model, enrolling speakers, and identifying speakers.



Main applications
-------------------------------------------------
/src/enroll.cpp
 This program extracts speech features from a recording of one speaker and adds them to that speaker's model.
 Usage: enroll speakerId wavefile
 The output should look like:
 No registered speaker: speakerId. Create a new spkid
 or
 Found registered speaker: speakerId. Update speaker model

 The wavefile should be in .wav format.

 This will create/update two files in /iv: train_iv.ark and train_num_utts.ark.
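This README does not say how an existing speaker model is updated. Because a per-speaker utterance count is stored in train_num_utts.ark next to the i-vectors in train_iv.ark, one plausible scheme is an incremental mean of the enrolled i-vectors. The sketch below assumes exactly that. It is not taken from enroll.cpp, and the in-memory dictionary stands in for the two .ark files. The status strings copy the example output shown later in this README.

```python
# Hypothetical sketch of an enrollment update (not taken from enroll.cpp).
# Assumption: each speaker model is the mean of that speaker's enrollment
# i-vectors, stored together with the number of utterances averaged so far,
# mirroring the roles of train_iv.ark and train_num_utts.ark.
from typing import Dict, List, Tuple

SpeakerModels = Dict[str, Tuple[List[float], int]]


def enroll(models: SpeakerModels, speaker_id: str, ivector: List[float]) -> str:
    """Add one utterance's i-vector to the speaker's model; return a status line."""
    if speaker_id not in models:
        models[speaker_id] = (list(ivector), 1)
        return f"No registered speaker: {speaker_id}. Create a new spkid"

    mean, count = models[speaker_id]
    if len(mean) != len(ivector):
        raise ValueError("i-vector dimension does not match the stored model")
    # Incremental mean: new_mean = (old_mean * count + x) / (count + 1)
    updated = [(m * count + x) / (count + 1) for m, x in zip(mean, ivector)]
    models[speaker_id] = (updated, count + 1)
    return f"Found registered speaker: {speaker_id}. Update speaker model"


if __name__ == "__main__":
    models: SpeakerModels = {}
    print(enroll(models, "174", [1.0, 2.0]))
    print(enroll(models, "174", [3.0, 4.0]))
    print(models["174"])  # ([2.0, 3.0], 2)
```

Storing the count is what makes the update incremental. Without it, every earlier i-vector would have to be kept to recompute the mean.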

/src/identifySpeaker.cpp
 This program processes a given audio clip and outputs a speaker identification roughly every 3.2 seconds.
 Usage: identifySpeaker wavefile
 The output should look like:
 Family member detected! Speaker: 225
 Family member detected! Speaker: 225
 Stranger detected!
 Family member detected! Speaker: 227
 Family member detected! Speaker: 227
 ...

 It also outputs a score for each segment, which can be used to adjust the decision threshold for different audio conditions.
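The decision logic works roughly like this: cut the audio into segments of about 3.2 seconds, score each segment against every enrolled speaker, and report the best-scoring speaker only if that score clears a threshold. The Python sketch below illustrates the idea. The function names are hypothetical, and so is the threshold of 20.0, which was chosen only because it separates the two scores in this README's test example. The real program may segment and score differently.

```python
# Hypothetical sketch of per-segment speaker identification decisions.
# SEGMENT_SECONDS comes from this README; THRESHOLD is an illustrative value.
from typing import Dict, List, Tuple

SEGMENT_SECONDS = 3.2
THRESHOLD = 20.0  # hypothetical; tune it using the printed scores


def segment_bounds(num_samples: int, sample_rate: int,
                   seg_seconds: float = SEGMENT_SECONDS) -> List[Tuple[int, int]]:
    """Split [0, num_samples) into consecutive segments of ~seg_seconds."""
    seg_len = int(round(seg_seconds * sample_rate))
    return [(start, min(start + seg_len, num_samples))
            for start in range(0, num_samples, seg_len)]


def identify(scores: Dict[str, float], threshold: float = THRESHOLD) -> str:
    """Pick the best-scoring enrolled speaker; accept only if above threshold."""
    if not scores:
        raise ValueError("no enrolled speakers to score against")
    speaker, best = max(scores.items(), key=lambda kv: kv[1])
    if best >= threshold:
        return f"Family member detected! Speaker: {speaker}"
    return "Stranger detected!"


if __name__ == "__main__":
    print(segment_bounds(7 * 16000, 16000))
    # [(0, 51200), (51200, 102400), (102400, 112000)]
    print(identify({"84": 33.7779, "174": 2.0}))  # Family member detected! Speaker: 84
    print(identify({"84": 4.97931}))              # Stranger detected!
```

Raising the threshold trades missed family members for fewer strangers accepted as family, so it is worth recalibrating whenever the microphone or room changes.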


Examples
-------------------------------------------------
After installing all the required software, you can run the following examples to check that your installation works.

1. Make sure there are three folders in /examples:
  /example_data
  /iv
  /mat (because of GitHub's file size limit, final.ie was split into several parts; to reassemble it, run cat iepart* > final.ie in that folder)

2. Run ./test1Enroll.sh
This enrolls all speech files in /example_data/enroll.
The output should look like:

The total active speech is 1.61 seconds.
No registered speaker: 174. Create a new spkid
Done.
The total active speech is 15 seconds.
Found registered speaker: 174. Update speaker model
Done.
The total active speech is 0.88 seconds.
No registered speaker: 84. Create a new spkid
Done.
The total active speech is 3.47 seconds.
Found registered speaker: 84. Update speaker model
Done.

3. Run ./test1Test.sh
This tests the speech file /example_data/test/84/84-121550-0030.wav against all registered speakers.
The output should look like:

Effective speech length: 2.605s.No family member detected.		(score: 4.97931)
Effective speech length: 5.685s.Family member detected! Speaker: 84	(score: 33.7779)
Speech data is finished!
Done.
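Before running the test script, it helps to confirm that the expected files are in place, because a missing model file can cause a confusing failure. The sketch below lists what is missing under /examples. It uses the file names given in this README: the two /iv files written by enroll and the six model files in /mat. The function name is hypothetical.

```python
# Hypothetical helper: report files missing under the examples folder.
# File names come from this README; the function name is illustrative.
from pathlib import Path
from typing import List

REQUIRED_FILES = {
    "iv": ["train_iv.ark", "train_num_utts.ark"],  # written by enroll
    "mat": ["final.dubm", "final.ubm", "final.ie",
            "final.plda", "transform.mat", "mean_vec"],
}


def missing_files(examples_dir: str) -> List[str]:
    """Return relative paths that should exist before running ./test1Test.sh."""
    root = Path(examples_dir)
    missing = []
    if not (root / "example_data").is_dir():
        missing.append("example_data/")
    for subdir, names in REQUIRED_FILES.items():
        for name in names:
            if not (root / subdir / name).is_file():
                missing.append(f"{subdir}/{name}")
    return missing


if __name__ == "__main__":
    import sys
    problems = missing_files(sys.argv[1] if len(sys.argv) > 1 else "examples")
    print("OK" if not problems else "Missing: " + ", ".join(problems))
```

The /iv files only appear after ./test1Enroll.sh has run, so they are expected to be missing on a fresh checkout.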


*Note:
Kaldi also prints log messages such as:
LOG ([5.3.96~1-7ee7]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG ([5.3.96~1-7ee7]:ComputeDerivedVars():ivector-extractor.cc:204) Done.

These messages indicate that an audio segment has been processed. They can be suppressed by adjusting Kaldi's verbose level.

Background Model Training
-------------------------------------
/src/extractFeatures
 The program extracts 20-dimensional MFCCs (including energy), appends deltas and double deltas (60 dimensions in total), and applies CMVN.
 Usage: extractFeatures wav.scp ark,scp:feat.ark,feat.scp
 Input: wav.scp, a text file listing each speech file's name and path
 Output: feat.ark and feat.scp, in the standard Kaldi format.
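For reference, the sketch below shows a common way to append first- and second-order regression deltas to a feature matrix (20 static dimensions become 60) and then apply per-utterance mean and variance normalization, following the order this README gives. It is a plain-Python illustration built on several assumptions. It uses the HTK-style delta formula with a window of 2 and edge replication, which is not guaranteed to match Kaldi's add-deltas bit for bit. It also uses whole-utterance CMVN, whereas extractFeatures may use a different variant, such as a sliding window.

```python
# Sketch: appending deltas / double deltas and applying per-utterance CMVN.
# HTK-style regression deltas with window 2 and edge replication; this is an
# assumption and is not guaranteed to match Kaldi's add-deltas exactly.
import math
from typing import List

Matrix = List[List[float]]  # frames x dimensions


def deltas(feats: Matrix, window: int = 2) -> Matrix:
    """d_t = sum_n n * (c[t+n] - c[t-n]) / (2 * sum_n n^2), edges replicated."""
    num_frames = len(feats)
    denom = 2 * sum(n * n for n in range(1, window + 1))
    out = []
    for t in range(num_frames):
        row = []
        for j in range(len(feats[t])):
            acc = 0.0
            for n in range(1, window + 1):
                ahead = feats[min(t + n, num_frames - 1)][j]
                behind = feats[max(t - n, 0)][j]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def add_deltas(feats: Matrix) -> Matrix:
    """Concatenate static, delta and delta-delta features (20 -> 60 dims)."""
    d1 = deltas(feats)
    d2 = deltas(d1)
    return [s + a + b for s, a, b in zip(feats, d1, d2)]


def cmvn(feats: Matrix, norm_vars: bool = True) -> Matrix:
    """Per-utterance mean (and optionally variance) normalization per dimension."""
    if not feats:
        return []
    num_frames, dim = len(feats), len(feats[0])
    means = [sum(row[j] for row in feats) / num_frames for j in range(dim)]
    stds = [1.0] * dim
    if norm_vars:
        for j in range(dim):
            var = sum((row[j] - means[j]) ** 2 for row in feats) / num_frames
            stds[j] = math.sqrt(var) if var > 0 else 1.0  # avoid divide-by-zero
    return [[(row[j] - means[j]) / stds[j] for j in range(dim)] for row in feats]
```

Usage: features = cmvn(add_deltas(mfcc)), where mfcc is a list of 20-dimensional frames. Many Kaldi recipes apply CMVN before adding deltas instead, so check extractFeatures for the order it actually uses.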

/scripts/data_prep.sh
 usage: data_prep.sh path_to_speech path_to_info
 prepares the text files needed by later steps; refer to data_prep.sh for details
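The script data_prep.sh is the authoritative reference for which files it writes. As a rough illustration of the Kaldi text-file conventions involved, the sketch below builds wav.scp lines ("<utterance-id> <path>") and utt2spk lines ("<utterance-id> <speaker-id>") from a list of .wav paths, sorted by utterance ID as Kaldi tools expect. It assumes, hypothetically, that each file stem is the utterance ID and that its first dash-separated field is the speaker ID, as in 84-121550-0030.wav. The function names are illustrative.

```python
# Hypothetical sketch of building Kaldi-style wav.scp and utt2spk lists.
# Assumption: the file stem is the utterance ID and its first '-'-separated
# field is the speaker ID (e.g. 84-121550-0030.wav -> speaker 84).
from pathlib import Path
from typing import Iterable, List, Tuple


def kaldi_lists(wav_paths: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (wav.scp lines, utt2spk lines), sorted by utterance ID."""
    entries = []
    for p in wav_paths:
        path = Path(p)
        utt_id = path.stem
        spk_id = utt_id.split("-", 1)[0]
        entries.append((utt_id, spk_id, str(path)))
    entries.sort()  # Kaldi tools expect these files sorted by their first field
    wav_scp = [f"{utt} {path}" for utt, _, path in entries]
    utt2spk = [f"{utt} {spk}" for utt, spk, _ in entries]
    return wav_scp, utt2spk


def write_lines(lines: List[str], filename: str) -> None:
    with open(filename, "w", encoding="utf-8") as f:
        f.writelines(line + "\n" for line in lines)


if __name__ == "__main__":
    import sys
    paths = sorted(str(p) for p in Path(sys.argv[1]).rglob("*.wav"))
    scp, u2s = kaldi_lists(paths)
    write_lines(scp, "wav.scp")
    write_lines(u2s, "utt2spk")
```

Paths containing spaces would break this simple whitespace-separated format, so keep the speech data in a directory without spaces.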

/scripts/utt2spk_to_spk2utt.pl
 usage: utt2spk_to_spk2utt.pl utt2spk > spk2utt 
 creates the spk2utt file from the given utt2spk file
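The conversion is a simple regrouping. Each utt2spk line maps one utterance to a speaker, and each spk2utt line lists all the utterances of one speaker. A Python equivalent could look like the sketch below, which is not a drop-in replacement for the Perl script. Speakers are emitted in the order they first appear, and the function name is illustrative.

```python
# Sketch: Python equivalent of utt2spk_to_spk2utt.pl (illustrative only).
from typing import Dict, Iterable, List


def utt2spk_to_spk2utt(utt2spk_lines: Iterable[str]) -> List[str]:
    """Group utterance IDs by speaker; speakers keep their first-seen order."""
    spk2utts: Dict[str, List[str]] = {}  # dicts preserve insertion order
    for line in utt2spk_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError(f"bad utt2spk line: {line!r}")
        utt_id, spk_id = fields
        spk2utts.setdefault(spk_id, []).append(utt_id)
    return [f"{spk} {' '.join(utts)}" for spk, utts in spk2utts.items()]


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], encoding="utf-8") as f:
        for out_line in utt2spk_to_spk2utt(f):
            print(out_line)
```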

/scripts/train_ubm.sh
 usage: train_ubm.sh path_to_feat path_to_mat
 output: final.dubm, final.ubm
 please refer to train_ubm.sh for details
 
/scripts/train_ivextractor.sh
 usage: train_ivextractor.sh path_to_feat path_to_mat
 output: final.ie
 please refer to train_ivextractor.sh for details
 
/scripts/train_comp_plda.sh
 usage: train_comp_plda.sh path_to_feat path_to_mat
 output: final.plda, transform.mat, mean_vec
 please refer to train_comp_plda.sh for details
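In typical Kaldi i-vector recipes, a global mean vector and a transform matrix are applied to each i-vector before PLDA scoring. The mean is subtracted, the vector is projected by the transform (for example an LDA matrix), and the result is length-normalized. The sketch below shows those three steps, on the assumption that mean_vec and transform.mat play these roles here. It does not reproduce how this project loads or applies the files, and it handles only a purely linear transform (no offset column). The length normalization scales the vector to a norm of sqrt(dim), which is assumed to match the default of Kaldi's ivector-normalize-length.

```python
# Sketch of i-vector preprocessing before PLDA scoring (assumption modeled on
# common Kaldi recipes, not on this project's code): subtract mean_vec, apply
# a linear transform.mat, then length-normalize to norm sqrt(dim).
import math
from typing import List


def preprocess_ivector(ivector: List[float], mean_vec: List[float],
                       transform: List[List[float]]) -> List[float]:
    if len(ivector) != len(mean_vec):
        raise ValueError("i-vector and mean_vec dimensions differ")
    centered = [x - m for x, m in zip(ivector, mean_vec)]
    if any(len(row) != len(centered) for row in transform):
        raise ValueError("transform must have one column per i-vector dimension")
    # Matrix-vector product: one output value per transform row.
    projected = [sum(w * c for w, c in zip(row, centered)) for row in transform]
    norm = math.sqrt(sum(v * v for v in projected))
    if norm == 0.0:
        return projected  # nothing to scale
    scale = math.sqrt(len(projected)) / norm
    return [v * scale for v in projected]


if __name__ == "__main__":
    out = preprocess_ivector([3.0, 5.0], [1.0, 1.0], [[1.0, 0.0], [0.0, 1.0]])
    print(out, math.sqrt(sum(v * v for v in out)))  # norm is sqrt(2)
```

Length normalization makes i-vectors of different durations more comparable, which is one reason recipes apply it just before PLDA.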

The following folders will be created while the scripts run:
 /dev_data
 contains development dataset speech information, MFCC features and i-vectors 
 
 /mat
 contains all trained models:
 final.dubm, final.ubm, final.ie, final.plda, transform.mat, mean_vec

Note: The whole process can take several hours (e.g. 5 to 6 hours on CentOS running in VirtualBox).
Note: All scripts must be edited manually to set the paths (as with the examples); this can be avoided by adding all paths to environment variables.
