
keyword_spotting_system's Introduction

Speech command recognition (Keyword Spotting)

In this project we use the Speech Commands dataset, which contains short (one-second) audio clips of English commands stored as WAV files. In more detail, version 0.02 of the dataset contains 105,829 utterances of 35 short words, recorded by thousands of different people. It was released on April 11th, 2018 under the Creative Commons BY 4.0 license and was collected through crowdsourcing via AIY by Google. Some of these words are "yes", "no", "up", "down", "left", "right", "on", "off", "stop" and "go".

This project was developed as the final project for the course Human Data Analytics.

Notebooks

  • Data Analysis And Preprocessing Inspection
    This notebook loads and prepares the dataset, splitting it into training, validation, and test sets. It also provides some information about the dataset through plots and introduces the functions used to pre-process the data (for example, adding noise).

  • Keyword Spotting: general training notebook
    This notebook defines the general training and testing pipeline and describes the validation metrics used. Training is performed with our baseline model cnn-one-fpool3, taken from [Arik17].

  • Bayesian Optimization and Feature Comparison with CNN
    This notebook is used to train our custom CNN models. With the first of these models we perform a Bayesian optimization, and we use it to inspect the importance of dropout and batch normalization, to carry out a feature comparison, and to study the effect of data augmentation on the training set.

  • Keyword Spotting: ResNet architecture and Triplet Loss implementation
    In this notebook we experiment with ResNet models for the keyword spotting task. We start by implementing a simple ResNet architecture inspired by [Tang18]; then, motivated by [Vygon21], we modify this model and train it to obtain a meaningful embedded representation of the input signals. Finally, we use k-NN to classify these intermediate representations.

  • Keyword Spotting: a neural attention model for speech command recognition
    This notebook implements an attention model for speech command recognition. It is a modification of a demo notebook prepared by the authors of the paper A neural attention model for speech command recognition.

  • Keyword Spotting: Conformer
    In this notebook, using the library audio_classification_models, we implement a baseline Conformer architecture inspired by [Gulati20]. This model combines Convolutional Neural Networks and Transformers to model both local and global features of an audio sequence in a parameter-efficient way. Specifically, we use only one Conformer block in order to reduce the number of model parameters. Moreover, we tune the hyperparameters with Bayesian optimization to find, among the models with fewer than 2M parameters, the one that achieves the best accuracy.

  • Keyword Spotting: GAN-based classification
    In this notebook we try to implement a GAN-based classifier inspired by the paper GAN-based Data Generation for Speech Emotion Recognition. Unfortunately, to date we have not been able to figure out how to properly train the generator and discriminator in this specific case. As a result, we cannot currently test this approach.
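
As an illustration of the final step of the ResNet and Triplet Loss notebook listed above (k-NN classification on learned embeddings), here is a minimal sketch. It is not the code used in the notebook: the function name knn_predict, the two-dimensional toy embeddings, and the choice of Euclidean distance are all assumptions made for the example, and it uses only the Python standard library.

```python
import math
from collections import Counter


def knn_predict(train_embeddings, train_labels, query, k=3):
    """Return the majority label among the k training embeddings closest to query.

    Hypothetical helper: Euclidean distance is an assumption, not necessarily
    the metric used in the notebook.
    """
    if len(train_embeddings) != len(train_labels):
        raise ValueError("embeddings and labels must have the same length")
    if not 1 <= k <= len(train_embeddings):
        raise ValueError("k must be between 1 and the number of training points")

    # Distance from the query to every training embedding, paired with its label.
    distances = [
        (math.dist(embedding, query), label)
        for embedding, label in zip(train_embeddings, train_labels)
    ]
    distances.sort(key=lambda pair: pair[0])

    # Majority vote over the k nearest neighbours.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy 2-D "embeddings" standing in for the network's intermediate representations.
train_embeddings = [(0.0, 0.1), (0.1, 0.0), (0.9, 1.0), (1.0, 0.9)]
train_labels = ["yes", "yes", "no", "no"]

print(knn_predict(train_embeddings, train_labels, (0.2, 0.1), k=3))
```

In practice the embeddings would come from the trained network and would have many more dimensions; the voting logic stays the same.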

Utils

Demo App

This repository includes a demo application that can be run as a Python script with python demo_ks.py. It lets you select the model to use and, once started, detects the words of the Speech Commands dataset from the microphone (or any chosen input device).

You can also find a notebook that plays commands from the dataset, so that the application can be tested with non-real-time signals.
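
One common way to apply a model trained on one-second clips to a continuous signal is to cut the signal into overlapping one-second windows and classify each window. The sketch below shows only that windowing step and is not taken from demo_ks.py: the function name sliding_windows, the 16 kHz sample rate, and the half-second hop are assumptions for illustration.

```python
def sliding_windows(samples, sample_rate=16000, window_s=1.0, hop_s=0.5):
    """Yield (start_index, window) pairs of fixed-length windows over samples.

    Hypothetical helper: sample_rate, window_s, and hop_s are illustrative
    defaults, not values read from the demo application.
    """
    window = int(sample_rate * window_s)
    hop = int(sample_rate * hop_s)
    if window <= 0 or hop <= 0:
        raise ValueError("window and hop must span at least one sample")

    # Only full windows are produced; a trailing partial window is dropped.
    for start in range(0, len(samples) - window + 1, hop):
        yield start, samples[start:start + window]


# Two seconds of placeholder audio (silence) at the assumed sample rate.
signal = [0.0] * 32000
for start, window in sliding_windows(signal):
    # A real application would pass each window to the selected model here.
    print(start, len(window))
```

Overlapping windows make it less likely that a spoken word is split across two consecutive windows, at the cost of running the model more often.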

Collaborators

  • Daniele Ninni
  • Nicola Zomer
