
kenza-bouzid / ssr_labs


Labs of DD2119 Speech and Speaker Recognition @ KTH

Jupyter Notebook 99.93% Python 0.07%


DD2119 Speech and Speaker Recognition course

Authors: Anna Sánchez Espunyes, Kenza Bouzid

Solutions to the labs of the SSR course @ KTH. Each lab contains implementations of speech recognition algorithms as well as notebooks with experiments.

Lab1 - Feature Extraction

The objective is to experiment with different features commonly used for speech analysis and recognition.

Tasks:

compute Mel Filterbank and MFCC features step-by-step
examine features
evaluate correlation between features
compare utterances with Dynamic Time Warping
illustrate the discriminative power of the features with respect to words
perform hierarchical clustering of utterances
train and analyze a Gaussian Mixture Model of the feature vectors.
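
As an illustration of the utterance comparison task, here is a minimal Dynamic Time Warping sketch in plain Python. It is not the code from the notebooks: the function names `euclidean` and `dtw`, the Euclidean local distance, and the normalization of the global distance by the sum of the two sequence lengths are all assumptions made for this example.

```python
import math


def euclidean(a, b):
    """Local distance between two feature vectors (assumed choice)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(x, y, dist=euclidean):
    """Return the length-normalized DTW distance between two sequences.

    x and y are lists of feature vectors (one vector per frame).
    """
    n, m = len(x), len(y)
    # acc[i][j] holds the minimal accumulated cost of aligning x[:i] with y[:j].
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(x[i - 1], y[j - 1])
            # Allowed moves: advance in x only, in y only, or in both.
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    # Normalizing by n + m is an assumption of this sketch.
    return acc[n][m] / (n + m)
```

Two utterances that differ only in the duration of a segment get distance zero, for example `dtw([[0], [1], [2]], [[0], [1], [1], [2]])`. A matrix of pairwise `dtw` values between utterances is the kind of input a hierarchical clustering step would consume.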

Lab2 - Hidden Markov Models with Gaussian Emissions

Objectives:

implement the algorithms for the evaluation and decoding of Hidden Markov Models (HMMs)
use your implementation to perform isolated word recognition
implement the algorithms for training Gaussian Hidden Markov Models (G-HMMs)
explain the meaning of the forward, backward and state posterior probabilities evaluated on speech utterances

Tasks:

The overall task is to implement and test methods for isolated word recognition:

combine phonetic HMMs into word HMMs using a lexicon
implement the forward-backward algorithm
use it to compute the log likelihood of spoken utterances given a Gaussian HMM
perform isolated word recognition
implement the Viterbi algorithm, and use it to compute Viterbi path and likelihood
compare and comment on the Viterbi and Forward likelihoods
implement the Baum-Welch algorithm to update the parameters of the emission probability distributions
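
The forward and Viterbi tasks can be sketched in the log domain as follows. This is a hedged illustration, not the lab's implementation: the names `logsumexp`, `forward_loglik` and `viterbi` and the list-of-lists inputs are assumptions, and the per-frame emission log-likelihoods `log_emlik` are taken as given rather than computed from Gaussian parameters.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of log values."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_loglik(log_emlik, log_pi, log_A):
    """Log likelihood of the observations, summed over all state paths.

    log_emlik[t][j]: log emission likelihood of frame t in state j
    log_pi[j]:       log prior of starting in state j
    log_A[i][j]:     log transition probability from state i to state j
    """
    n_states = len(log_pi)
    alpha = [log_pi[j] + log_emlik[0][j] for j in range(n_states)]
    for obs in log_emlik[1:]:
        alpha = [
            logsumexp([alpha[i] + log_A[i][j] for i in range(n_states)]) + obs[j]
            for j in range(n_states)
        ]
    return logsumexp(alpha)


def viterbi(log_emlik, log_pi, log_A):
    """Return (log likelihood of the best path, best state path)."""
    n_states = len(log_pi)
    delta = [log_pi[j] + log_emlik[0][j] for j in range(n_states)]
    backptrs = []
    for obs in log_emlik[1:]:
        new_delta, ptr = [], []
        for j in range(n_states):
            # Best predecessor of state j at this frame.
            best_i = max(range(n_states), key=lambda i: delta[i] + log_A[i][j])
            ptr.append(best_i)
            new_delta.append(delta[best_i] + log_A[best_i][j] + obs[j])
        delta = new_delta
        backptrs.append(ptr)
    # Backtrack from the most likely final state.
    last = max(range(n_states), key=lambda j: delta[j])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return delta[last], path
```

The two recursions differ only in replacing the sum over predecessor states with a maximum, so the Viterbi log likelihood can never exceed the forward log likelihood. How close the two values are indicates how much of the probability mass is concentrated on the single best path.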

Lab3 - Phoneme Recognition with Deep Neural Networks

Objectives:

create phonetic annotations of speech recordings using predefined phonetic models
use software libraries to define and train Deep Neural Networks (DNNs) for phoneme recognition
explain the difference between HMM and DNN training
compare which speech features are more suitable for each model and explain why

Tasks:

using predefined Gaussian-emission HMM phonetic models, create time aligned phonetic transcriptions of the TIDIGITS database,
define appropriate DNN models for phoneme recognition using Keras,
train and evaluate the DNN models on a frame-by-frame recognition score,
repeat the training by varying model parameters and input features
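
The frame-by-frame recognition score can be illustrated with a small helper. This is a sketch under assumptions, not the evaluation code from the notebooks: the name `frame_accuracy` is hypothetical, and it assumes the network outputs one list of class posteriors per frame and that the targets are integer state or phoneme indices obtained from the forced alignment.

```python
def frame_accuracy(posteriors, targets):
    """Fraction of frames whose most probable class matches the target.

    posteriors: list of per-frame lists of class scores (e.g. softmax outputs)
    targets:    list of integer class indices, one per frame
    """
    if len(posteriors) != len(targets):
        raise ValueError("posteriors and targets must have the same length")
    if not targets:
        raise ValueError("at least one frame is required")
    correct = 0
    for frame, target in zip(posteriors, targets):
        # Index of the highest-scoring class for this frame.
        predicted = max(range(len(frame)), key=frame.__getitem__)
        correct += predicted == target
    return correct / len(targets)
```

Because every frame is scored independently, this measure ignores temporal structure. It is useful for comparing input features and model sizes, but it is not a word or phoneme error rate.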
