
kaldi_training's Introduction

GMM-HMM and TDNN Training Procedure Repository

In Kaldi, all projects live inside the egs directory under the Kaldi root directory. An example is shown in the figure below.

[Figure: example egs folder structure]
Ref: https://www.eleanorchodroff.com/tutorial/kaldi/familiarization.html

The explanation here is mainly derived from the references given below. First let's go through the wsj folder structure, and then create an example directory of our own.

Data Preparation

WSJ directory structure

Inside the wsj directory we find versioned recipe directories such as s5, where the actual files reside.

  • The utils, steps and local directories contain the files needed for further processing.
  • The exp directory contains all the model parameters, whether for a GMM or a TDNN model; the acoustic model lives here.
  • The conf directory contains config files that set parameters such as the audio sampling frequency, beam and lattice-beam widths, etc.
  • The data directory contains all the input data needed for training, validation and testing. As input for ASR we need audio, transcripts, words and their phonetic representations, non-silence phones, silence phones, etc. These files have to be created by the user.

Inside the data directory we have the train, lang and dict directories.

  • In the train sub-directory, four files fundamentally need to be created: wav.scp, text, utt2spk and spk2utt. Further details of these files can be found here and here.
  • In the dict sub-directory (referred to here as local/lang), we need the files that are described in detail here and here.
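For illustration, the four train files can be sketched as follows. The utterance IDs, speaker IDs and audio paths are made up; adapt them to your corpus:

```shell
# Sketch of the four core files in data/train. Kaldi ships
# utils/utt2spk_to_spk2utt.pl for the last step, but the same inversion
# can be done with awk, as shown below.
mkdir -p data/train

# wav.scp: <utterance-id> <path-to-audio>
cat > data/train/wav.scp <<'EOF'
spk1_utt1 /corpus/audio/spk1_utt1.wav
spk1_utt2 /corpus/audio/spk1_utt2.wav
spk2_utt1 /corpus/audio/spk2_utt1.wav
EOF

# text: <utterance-id> <transcript>
cat > data/train/text <<'EOF'
spk1_utt1 hello world
spk1_utt2 good morning
spk2_utt1 hello again
EOF

# utt2spk: <utterance-id> <speaker-id>
cat > data/train/utt2spk <<'EOF'
spk1_utt1 spk1
spk1_utt2 spk1
spk2_utt1 spk2
EOF

# spk2utt is the inverse mapping: <speaker-id> <utt-id-1> <utt-id-2> ...
awk '{utts[$2] = utts[$2] " " $1} END {for (s in utts) print s utts[s]}' \
  data/train/utt2spk | sort > data/train/spk2utt
```

Kaldi also expects these files to be sorted by their first field, which the IDs above already are.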

Creating custom directory

[Figure: custom project folder structure]
Since the utils and steps directories are common to many projects, we can simply create symbolic links to them, as shown here.

After creating the train (and correspondingly validation and test) and dictionary directories, we create L.fst. For that we need an OOV entry, which is used for any word not present in the lexicon; the OOV symbol itself must appear in the lexicon as a word. Follow the commands here to create the lang directory, where L.fst is built. This will be used later. L.fst is simply the pronunciation model for the corpus. After we create a language model in the following steps, a G.fst file will also be created in this location.
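A minimal dictionary directory can be sketched as follows, with a hypothetical two-word lexicon. Note how the OOV symbol <UNK> appears in the lexicon as a word, mapped to the spoken-noise phone SPN; the prepare_lang call at the end requires a Kaldi installation, so it is only shown as a comment:

```shell
# Minimal data/local/dict with an illustrative lexicon.
mkdir -p data/local/dict

cat > data/local/dict/lexicon.txt <<'EOF'
<UNK> SPN
hello HH AH L OW
world W ER L D
EOF

cat > data/local/dict/silence_phones.txt <<'EOF'
SIL
SPN
EOF

echo "SIL" > data/local/dict/optional_silence.txt

# Every non-silence phone used by the lexicon, one per line:
cat > data/local/dict/nonsilence_phones.txt <<'EOF'
AH
D
ER
HH
L
OW
W
EOF

# With these files in place, the lang directory (and L.fst) is built by
# Kaldi's prepare_lang script, with the OOV symbol as its second argument:
# utils/prepare_lang.sh data/local/dict "<UNK>" data/local/lang data/lang
```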

After this we proceed to compute features from the audio. Config files are set as shown here. The config.ini file contains a variable (mfcc_conf) that holds the path to the MFCC config file.
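A typical mfcc.conf is just a few compute-mfcc-feats options, for example (values here are illustrative; --sample-frequency must match your audio):

```
--use-energy=false
--sample-frequency=16000
```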

At this stage the train folder, dictionary folder and pronunciation model have been prepared. In a hybrid ASR system, an HCLG graph is obtained from four components: the acoustic model (H), the context transducer (C), the pronunciation model (L) and the language model (G). All of these components are obtained individually, and then the decoding graph is constructed. The pronunciation model (L) is already in place, so we now turn to acoustic model (H) training.
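Schematically, the decoding graph is built by composing these transducers pairwise, determinizing and minimizing after each composition (a simplified form of the construction described in the Kaldi decoding-graph documentation):

```latex
HCLG = \min\Bigl(\det\bigl(H \circ \min\bigl(\det\bigl(C \circ \min(\det(L \circ G))\bigr)\bigr)\bigr)\Bigr)
```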

GMM-HMM Training

Monophone training steps are described here. Next we move to triphone training, using the alignments obtained from monophone training. There are three levels of triphone training, and the parameters for each are given in the config.ini file. After each level we run the decode script to get the decoded output of the trained triphone system.
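The three triphone levels map onto standard Kaldi scripts. A rough sketch of the whole GMM-HMM pass follows; it requires a Kaldi egs-style directory (cmd.sh, steps/, utils/), and the exp/ names and the <num-senones> <num-gaussians> pairs here are illustrative, since the repository reads the real values from config.ini:

```shell
. ./cmd.sh

# Monophone training, then align the data with the monophone model
steps/train_mono.sh --nj 4 --cmd "$train_cmd" data/train data/lang exp/mono
steps/align_si.sh --nj 4 --cmd "$train_cmd" data/train data/lang exp/mono exp/mono_ali

# Triphone level 1: delta + delta-delta features
steps/train_deltas.sh --cmd "$train_cmd" 2000 10000 data/train data/lang exp/mono_ali exp/tri1
steps/align_si.sh --nj 4 --cmd "$train_cmd" data/train data/lang exp/tri1 exp/tri1_ali

# Triphone level 2: LDA + MLLT feature transforms
steps/train_lda_mllt.sh --cmd "$train_cmd" 2500 15000 data/train data/lang exp/tri1_ali exp/tri2
steps/align_si.sh --nj 4 --cmd "$train_cmd" data/train data/lang exp/tri2 exp/tri2_ali

# Triphone level 3: speaker-adaptive training (fMLLR)
steps/train_sat.sh --cmd "$train_cmd" 2500 15000 data/train data/lang exp/tri2_ali exp/tri3

# Build the decoding graph and decode a test set
utils/mkgraph.sh data/lang exp/tri3 exp/tri3/graph
steps/decode_fmllr.sh --nj 4 --cmd "$decode_cmd" exp/tri3/graph data/test exp/tri3/decode_test
```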

TDNN Training

After GMM-HMM training we proceed to TDNN training. A brief overview is given here. Training parameters can be changed under stage 16, where train.py is called. Once the baseline training has finished, any test set can be decoded with the trained model.
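The stage-16 call looks roughly like the following; the flag values are illustrative and depend on data size, and the directory names are assumptions rather than the repository's actual paths:

```shell
steps/nnet3/chain/train.py \
  --stage -10 \
  --cmd "$train_cmd" \
  --trainer.num-epochs 4 \
  --trainer.optimization.num-jobs-initial 2 \
  --trainer.optimization.num-jobs-final 4 \
  --trainer.optimization.initial-effective-lrate 0.001 \
  --trainer.optimization.final-effective-lrate 0.0001 \
  --feat-dir data/train_hires \
  --tree-dir exp/chain/tree \
  --lat-dir exp/chain/lats \
  --dir exp/chain/tdnn1a
```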

References


kaldi_training's Issues

Readme updates

Add details of the code that should be run for training: which file and which command?

Go through https://kaldi-asr.org/doc/kaldi_for_dummies.html for an initial understanding of the file structure used in Kaldi.

From https://kaldi-asr.org/doc/kaldi_for_dummies.html, which sections does one need to focus on (e.g. Data preparation)? Are all the steps there necessary? The details for this are covered in the following paragraph of the Readme:

"In Kaldi, operations are executed in a stage-wise manner. Before proceeding to training we have to prepare four files (utt2spk, spk2utt, text, wav.scp)."

Discuss this first and then provide the link to the above with the specific sections one should focus on.

Where should these folders be created?

Next we need dictionary files that contain the lexicon, silence phones and non-silence phones.

How are these obtained?

After creating the data directory and the dictionary directory, we can proceed to create the language model using the dictionary.

How?

Add a config file listing every parameter used for training: the data folder for training, and the multiple configurable parameters within each step (MFCC feature extraction, monophone, triphone). What was used and why? How does this change with different data and languages?

For someone looking at this code to begin training from scratch for a new language, it will be extremely difficult. The steps need to be made as simple as possible.

Code running details

Add details of the code that should be run for training: which file and which command?
What does the code expect as input?

What will be the code's output?

Include this in the Readme

More details on Training needed

How do we choose the following parameters in the config file? How does this change with language and amount of data?

[tri1]
senones=7500
gaussians=80

[tri2]
senones=7500
gaussians=80

[tri3]
senones=7500
gaussians=80
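Because the values live in named sections, a plain grep over config.ini cannot distinguish tri1's senones from tri2's. A small awk helper (hypothetical, not part of the repository) can pull a key out of a given section:

```shell
# get_cfg FILE SECTION KEY -- print the value of KEY inside [SECTION].
get_cfg() {
  awk -F= -v s="[$2]" -v k="$3" '
    $0 == s         { in_s = 1; next }  # entered the wanted section
    /^\[/           { in_s = 0 }        # any other section header ends it
    in_s && $1 == k { print $2; exit }
  ' "$1"
}

# Example config matching the snippet above:
cat > config.ini <<'EOF'
[tri1]
senones=7500
gaussians=80

[tri2]
senones=7500
gaussians=80
EOF

get_cfg config.ini tri1 senones   # prints 7500
```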
