
DeepSpeech-Keras

DeepSpeech-Keras makes speech-to-text analysis easy.

from deepspeech import load

deepspeech = load('pl')
files = ['path/to/audio.wav']
sentences = deepspeech(files)

Using DeepSpeech-Keras you can:

  • perform speech-to-text analysis using pre-trained models
  • tune pre-trained models to your needs
  • create new models on your own

All of this is built on the Keras API and Python 3.6. The main principle behind the project is that the program and its structure should be easy to use and understand.

DeepSpeech-Keras key features:

  • Multi-GPU support: data parallelism via multi_gpu_model, which gives a quasi-linear speedup on up to 8 GPUs.
  • CuDNN support: the model uses the CuDNNLSTM implementation (NVIDIA cuDNN). CPU devices are also supported.
  • DataGenerator: feature extraction (on CPU) can run in parallel with model training (on GPU). Moreover, it can use precomputed features saved in an HDF5 file.
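
For reference, a minimal sketch of what data parallelism with multi_gpu_model looks like in plain Keras (illustrative only; the project may wire this up internally through its configuration):

from deepspeech import load
from keras.utils import multi_gpu_model

deepspeech = load('pl')
# Replicate the underlying Keras model across GPUs; each batch is split into
# sub-batches that are processed in parallel (assumes 8 GPUs are visible).
parallel_model = multi_gpu_model(deepspeech.model, gpus=8)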

Installation

You can use pip (package not published yet):

pip install deepspeech-keras

Otherwise clone the code and create a new environment via conda:

git clone https://github.com/rolczynski/DeepSpeech-Keras.git
conda env create -f=environment.yml     # or use: environment-gpu.yml
conda activate DeepSpeech-Keras

Getting started

Speech recognition is a tough task. You don't need to know all the details to use one of the pre-trained models. However, it's worth understanding the crucial conceptual components:

  • Input: mono, 16-bit, 16 kHz WAVE files (up to 5 seconds)
  • FeaturesExtractor: converts audio files into MFCC features (see the sketch after this list)
  • Model: CTC model defined in Keras (references: [1], [2])
  • Decoder: a beam search algorithm with language model support that decodes the sequence of probabilities using the Alphabet
  • DataGenerator: streams data to the model via a generator
  • Callbacks: a set of functions that monitor the training
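
For intuition, a minimal sketch of MFCC extraction from a mono 16-bit 16 kHz WAVE file, using the python_speech_features package; the project's own FeaturesExtractor (audio.py) may use different parameters or another implementation:

from scipy.io import wavfile
from python_speech_features import mfcc

sample_rate, signal = wavfile.read('path/to/audio.wav')   # mono, 16-bit, 16 kHz
assert sample_rate == 16000

# 25 ms windows with a 10 ms step are common defaults; the configured
# win_len / win_step values may differ.
features = mfcc(signal, samplerate=sample_rate, winlen=0.025, winstep=0.01)
print(features.shape)   # (number of frames, number of cepstral coefficients)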

Overview

A loaded pre-trained model already has all the components. The prediction can be invoked implicitly via the __call__ method, or more explicitly:

from deepspeech import load

deepspeech = load('pl')             # Also can be: load(dir='model_dir')
files = ['to/test/sample.wav']

X = deepspeech.get_features(files)
y_hat = deepspeech.predict(X)
sentences = deepspeech.decode(y_hat)
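
Under the hood, decode turns the per-frame character probabilities into text. The project uses a beam search with language model support; for intuition only, a minimal greedy (best-path) decode looks roughly like this (continuing from the example above; the character list is just a hypothetical stand-in for the real Alphabet):

import numpy as np

alphabet = list(" abcdefghijklmnopqrstuvwxyz")   # hypothetical stand-in for alphabet.txt
blank_index = len(alphabet)                      # by convention the last class is the CTC blank

def greedy_decode(probabilities):
    """Best-path decode: argmax per frame, collapse repeats, drop CTC blanks."""
    best_path = np.argmax(probabilities, axis=-1)
    labels, previous = [], None
    for label in best_path:
        if label != previous and label != blank_index:
            labels.append(label)
        previous = label
    return ''.join(alphabet[index] for index in labels)

sentences = [greedy_decode(sample) for sample in y_hat]   # y_hat: (batch, time, characters)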

Tune pre-trained model

The heart of deepspeech is the Keras model (deepspeech.model). You can make use of all the available Keras methods, such as predict_on_batch or get_weights, and use the model directly.
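
For instance, a minimal sketch (continuing from the prediction example above; whether the output of get_features can be fed straight to the underlying model is an assumption here):

deepspeech.model.summary()                     # standard Keras model inspection
weights = deepspeech.model.get_weights()       # list of numpy weight arrays
y_hat = deepspeech.model.predict_on_batch(X)   # plain Keras call on one batch of features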

However, rather than writing your own training routine from scratch, you can use the deepspeech.fit method. The algorithm's attributes (generators, optimizer, CTC loss, callbacks etc.) are already set and ready to use. All you have to do is create a new configuration.yaml file or modify an existing one.

from deepspeech import DeepSpeech

deepspeech = DeepSpeech.construct('/tuned-model/configuration.yaml',
                                  '/tuned-model/alphabet.txt')
deepspeech.load('pl')       # Raises an error if a different architecture is defined

train_generator = deepspeech.create_generator('train.csv', batch_size=32)
dev_generator = deepspeech.create_generator('dev.csv', batch_size=32)

deepspeech.fit(train_generator, dev_generator, epochs=5)
deepspeech.save('/tuned-model/weights.hdf5')

Creating new models

If you have several GPUs and more than 500 hours of labeled audio samples, you can start creating new models. You can use the run.py script as a guideline. In the configuration file you specify parameters for all five components:

features_extractor:       # audio.py: FeatureExtractor parameters (e.g. win_len or win_step)
model:                    # model.py: Model parameters (e.g. units)
optimizer:                # deepspeech.optimizer: Optimizer parameters (e.g. name or lr)
callbacks:                # callback.py: List of callback parameters (e.g. tensorboard)
decoder:                  # ctc_decoder.py: Decoder parameters (e.g. naive)

For more details check the default configuration for Polish. Of course, you don't have to use the configuration file. You can define all required dependencies (decoder, model etc.) and pass them to the deepspeech init. Moreover, each component can be replaced with your own objects.
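
A hypothetical configuration.yaml fragment, shown only to illustrate the shape; the values below are made up, so check the default Polish configuration in the repository for the real ones:

features_extractor:
  win_len: 0.025          # illustrative value
  win_step: 0.01          # illustrative value
model:
  units: 800              # illustrative value
optimizer:
  name: adam
  lr: 0.001
callbacks:
  - tensorboard           # illustrative; the exact callback syntax may differ
decoder:
  naive: true             # illustrative; the exact decoder options may differ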

Useful scripts

In the scripts folder you can find e.g.:

  • confusion_matrix.py: Investigate which mistakes were made and plot the results.
  • consumer.py: Manage different experiments. Create a queue file and write one command line to execute per line.
  • features.py: Extract features and save them in an HDF5 file, ready to be used by the DataGenerator.
  • evaluate.py: Evaluate a model. Calculates WER, CER and other statistics (a minimal WER sketch follows below).
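
For reference, WER is the word-level edit distance between the predicted and the reference transcript, divided by the number of reference words; CER is the same measure computed over characters. A minimal, dependency-free sketch (not necessarily how evaluate.py computes it):

def edit_distance(reference, hypothesis):
    """Levenshtein distance between two sequences (lists of words or characters)."""
    rows, cols = len(reference) + 1, len(hypothesis) + 1
    distance = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        distance[i][0] = i
    for j in range(cols):
        distance[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            substitution = distance[i - 1][j - 1] + (reference[i - 1] != hypothesis[j - 1])
            distance[i][j] = min(distance[i - 1][j] + 1,   # deletion
                                 distance[i][j - 1] + 1,   # insertion
                                 substitution)
    return distance[-1][-1]

def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance / number of reference words."""
    return edit_distance(reference.split(), hypothesis.split()) / len(reference.split())

def cer(reference, hypothesis):
    """Character Error Rate: character-level edit distance / number of reference characters."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)

print(wer('ala ma kota', 'ala ma psa'))   # 0.33... - one substituted word out of three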

Pre-trained models

The calculations are in progress.

Related work

Amazing work has already been done. Check out the implementations of Mozilla DeepSpeech (TensorFlow), deepspeech.pytorch (PyTorch) or even KerasDeepSpeech (Keras). This project is complementary to the projects listed above and tries to be more friendly to newcomers.

Contributing

Have a question? Like the tool? Don't like it? Open an issue and let's talk about it! Pull requests are appreciated! Interesting areas to improve:

  • support for different languages (or e.g. importing weights from a pre-trained DeepSpeech model)
  • rewrite the Keras model using tensorflow.keras
  • use Eager Execution to make debugging more interactive
  • Seq2Seq model training with a language model (+attention)
