Code Monkey home page Code Monkey logo

rnnt-speech-recognition's Introduction

RNN-Transducer Speech Recognition

End-to-end speech recognition using RNN-Transducer in Tensorflow 2.0

Overview

This speech recognition model is based off Google's Streaming End-to-end Speech Recognition For Mobile Devices research paper and is implemented in Python 3 using Tensorflow 2.0

Setup Your Environment

To setup your environment, run the following command:

git clone --recurse https://github.com/noahchalifour/rnnt-speech-recognition.git
cd rnnt-speech-recognition
pip install tensorflow==2.2.0 # or tensorflow-gpu==2.2.0 for GPU support
pip install -r requirements.txt
./scripts/build_rnnt.sh # to setup the rnnt loss

Common Voice

You can find and download the Common Voice dataset here

Convert all MP3s to WAVs

Before you can train a model on the Common Voice dataset, you must first convert all the audio mp3 filetypes to wavs. Do so by running the following command:

NOTE: Make sure you have ffmpeg installed on your computer, as it uses that to convert mp3 to wav

./scripts/common_voice_convert.sh <data_dir> <# of threads>
python scripts/remove_missing_samples.py \
    --data_dir <data_dir> \
    --replace_old

Preprocessing dataset

After converting all the mp3s to wavs you need to preprocess the dataset, you can do so by running the following command:

python preprocess_common_voice.py \
    --data_dir <data_dir> \
    --output_dir <preprocessed_dir>

Training a model

To train a simple model, run the following command:

python run_rnnt.py \
    --mode train \
    --data_dir <path to data directory>

rnnt-speech-recognition's People

Contributors

noahchalifour avatar sman1 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.