Code Monkey home page Code Monkey logo

aind2-speech-recognition's Introduction

Artificial Intelligence Engineer Nanodegree

Different Deep Learning approaches to an Acoustic model for an ASR pipeline

Project overview

In this notebook, some different approaches are used to build the acoustic model for an end-to-end automatic speech recognition (ASR) pipeline:

  • Model 0: RNN
  • Model 1: RNN + TimeDistributed Dense
  • Model 2: CNN + RNN + TimeDistributed Dense
  • Model 3: Deeper RNN + TimeDistributed Dense
  • Model 4: Bidirectional RNN + TimeDistributed Dense
  • Model 5: CNN + Deep RNN + TimeDistributed Dense
  • Models Comparison and discussion of the models 1, 2, 3, 4 and 5
  • Final Model: Dilated Convolution + Deep RNN + TimeDistributed Dense
  • Discussion of final model architecture

The first part of this notebook investigates the LibriSpeech, the dataset that will be used to train and evaluate this pipeline. The wav signal is preprocessed in order to obtain frequencies and MFCC features. The final discussion includes observations about using either tensor as input features for the pipeline.

sample_models.py module, includes code for all the models. Once this module is imported in the notebook, the different architectures are trained within the notebook.

The second and longest part of the notebook includes discussion about the performance of all the models and compares results on using spectrogram or MFCC features.

The third part shows a predicted transcription based on the probability distribution of the chosen acoustic models, the output on the second part of the notebook.

ASR Screenshot

Install environment, Test

Requirements

FloydHub is a platform for training and deploying deep learning models in the cloud. It removes the hassle of launching your own cloud instances and configuring the environment. For example, FloydHub will automatically set up an AWS instance with TensorFlow, the entire Python data science toolkit, and a GPU. Then you can run your scripts or Jupyter notebooks on the instance. For this project:

floyd run --mode jupyter --gpu --env tensorflow-1.0

You can see your instance on the list is running and has ID XXXXXXXXXXXXXXXXXXXXXX. So you can stop this instance with Floyd stop XXXXXXXXXXXXXXXXXXXXXX. Also, if you want more information about that instance, use Floyd info XXXXXXXXXXXXXXXXXXXXXX

Environments

FloydHub comes with a bunch of popular deep learning frameworks such as TensorFlow, Keras, Caffe, Torch, etc. You can specify which framework you want to use by setting the environment. Here's the list of environments FloydHub has available, with more to come!

Datasets

With FloydHub, you are uploading data from your machine to their remote instance. It's a really bad idea to upload large datasets like CIFAR along with your scripts. Instead, you should always download the data on the FloydHub instance instead of uploading it from your machine.

Further Reading: How and Why mount data to your job

Usage

floyd run --gpu --env tensorflow-1.1 --data tontoninten/datasets/librispeech/1:LibriSpeech --mode jupyter

You only need to mount the data to your job, since the dataset may have already been uploaded by other user

Output

Often you'll be writing data out, things like TensorFlow checkpoints, updated notebooks, trained models and HDF5 files. You will find all these files, you can get links to the data with:

floyd output run_ID

Special Thanks

We(Udacity) have borrowed the create_desc_json.py and flac_to_wav.sh files from the ba-dls-deepspeech repository, along with some functions used to generate spectrograms.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.