
mozilla / deepspeech

DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.

License: Mozilla Public License 2.0

Python 21.39% Shell 10.76% Makefile 0.91% C 11.23% C++ 46.94% JavaScript 0.45% CMake 0.90% Java 1.32% C# 2.77% Starlark 0.39% SWIG 0.46% Ruby 0.04% TypeScript 0.56% Swift 1.76% Objective-C 0.01% Awk 0.12%
deep-learning machine-learning neural-networks tensorflow speech-recognition speech-to-text deepspeech embedded on-device offline

deepspeech's People

Contributors

aayagar001, andi4191, andrenatal, bernardohenz, bprfh, carlfm01, catalinvoss, cwiiis, daanzu, ftyers, gardenia22, georgefedoseev, gvoysey, igorfritzsch, imrahul361, imskr, jendker, jrmeyer, juandspy, kdavis-mozilla, lissyx, mach327, mcfletch, mychiux413, nanonabla, nicolaspanel, rcgale, reuben, rhamnett, tilmankamp

deepspeech's Issues

Benchmark the Warp-CTC algorithm

Benchmark the Warp-CTC algorithm against the TensorFlow CTC algorithm to see whether Warp-CTC is worth our time to integrate.

As integrating the Warp-CTC algorithm will be time-consuming, this relatively quick benchmark will let us make a go/no-go decision on whether to integrate the Warp-CTC algorithm into TensorFlow at all.
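For reference, a rough sketch of the kind of micro-benchmark intended here (illustrative, not code from this repo): time the forward/backward pass of TensorFlow's CTC loss on synthetic data, then swap in the Warp-CTC op at the marked line and compare. It assumes the same TF 0.x-era APIs the notebook below uses; shapes, label length, and iteration count are invented.

import time

import numpy as np
import tensorflow as tf
from tensorflow.python.ops import ctc_ops

# Invented shapes, for illustration only
n_steps, batch_size, n_classes = 200, 32, 29

# Time-major activations, as ctc_loss expects
inputs = tf.constant(np.random.randn(n_steps, batch_size, n_classes).astype(np.float32))
seq_len = tf.fill([batch_size], n_steps)

# Random label batch as a SparseTensor (label values must be < n_classes - 1)
label_len = 15
indices = [[b, t] for b in range(batch_size) for t in range(label_len)]
values = np.random.randint(0, n_classes - 1, len(indices)).astype(np.int32)
labels = tf.SparseTensor(indices, values, [batch_size, label_len])

# The op under test; a Warp-CTC binding would be swapped in on this line
loss = ctc_ops.ctc_loss(inputs, labels, seq_len)
grad = tf.gradients(loss, [inputs])[0]

with tf.Session() as session:
    start = time.time()
    for _ in range(100):
        session.run(grad)
    print "100 forward/backward passes: %.3fs" % (time.time() - start)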

Cloning master and running training on GPU raises 'InvalidArgumentError'

andre@linuxdnn:~/DeepSpeech$ git pull && runipy DeepSpeech.ipynb
Already up-to-date.
10/15/2016 06:00:04 PM INFO: Reading notebook DeepSpeech.ipynb
10/15/2016 06:00:05 PM INFO: Running cell:
import os
import time
import json
import datetime
import tempfile
import subprocess
import numpy as np
from math import ceil
import tensorflow as tf
from util.log import merge_logs
from util.gpu import get_available_gpus
from util.importers.ted_lium import read_data_sets
from util.text import sparse_tensor_value_to_texts, wers
from tensorflow.python.ops import ctc_ops

I tensorflow/stream_executor/dso_loader.cc:116] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:116] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:116] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:116] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:116] successfully opened CUDA library libcurand.so.8.0 locally
10/15/2016 06:00:05 PM INFO: Cell returned
10/15/2016 06:00:05 PM INFO: Running cell:
learning_rate = 0.001 # TODO: Determine a reasonable value for this
beta1 = 0.9 # TODO: Determine a reasonable value for this
beta2 = 0.999 # TODO: Determine a reasonable value for this
epsilon = 1e-8 # TODO: Determine a reasonable value for this
training_iters = 15 # TODO: Determine a reasonable value for this
batch_size = 64 # TODO: Determine a reasonable value for this
display_step = 1 # TODO: Determine a reasonable value for this
validation_step = 1 # TODO: Determine a reasonable value for this
checkpoint_step = 5 # TODO: Determine a reasonable value for this
checkpoint_dir = tempfile.gettempdir() # TODO: Determine a reasonable value for this

10/15/2016 06:00:05 PM INFO: Cell returned
10/15/2016 06:00:05 PM INFO: Running cell:
dropout_rate = 0.05 # TODO: Validate this is a reasonable value

# This global placeholder will be used for all dropout definitions
dropout_rate_placeholder = tf.placeholder(tf.float32)

# The feed_dict used for training employs the given dropout_rate
feed_dict_train = { dropout_rate_placeholder: dropout_rate }

# While the feed_dict used for validation, test and train progress reporting employs zero dropout
feed_dict = { dropout_rate_placeholder: 0.0 }

10/15/2016 06:00:06 PM INFO: Cell returned
10/15/2016 06:00:06 PM INFO: Running cell:
relu_clip = 20 # TODO: Validate this is a reasonable value

10/15/2016 06:00:06 PM INFO: Cell returned
10/15/2016 06:00:06 PM INFO: Running cell:
n_input = 26 # TODO: Determine this programmatically from the sample rate

10/15/2016 06:00:06 PM INFO: Cell returned
10/15/2016 06:00:06 PM INFO: Running cell:
n_context = 9 # TODO: Determine the optimal value using a validation data set

10/15/2016 06:00:06 PM INFO: Cell returned
10/15/2016 06:00:06 PM INFO: Running cell:
n_hidden_1 = n_input + 2*n_input*n_context # Note: This value was not specified in the original paper
n_hidden_2 = n_input + 2*n_input*n_context # Note: This value was not specified in the original paper
n_hidden_5 = n_input + 2*n_input*n_context # Note: This value was not specified in the original paper

10/15/2016 06:00:06 PM INFO: Cell returned
10/15/2016 06:00:06 PM INFO: Running cell:
n_cell_dim = n_input + 2*n_input*n_context # TODO: Is this a reasonable value

10/15/2016 06:00:06 PM INFO: Cell returned
10/15/2016 06:00:06 PM INFO: Running cell:
n_hidden_3 = 2 * n_cell_dim

10/15/2016 06:00:06 PM INFO: Cell returned
10/15/2016 06:00:06 PM INFO: Running cell:
n_character = 29 # TODO: Determine if this should be extended with other punctuation

10/15/2016 06:00:06 PM INFO: Cell returned
10/15/2016 06:00:06 PM INFO: Running cell:
n_hidden_6 = n_character

10/15/2016 06:00:06 PM INFO: Cell returned
10/15/2016 06:00:06 PM INFO: Running cell:
def variable_on_cpu(name, shape, initializer):
    # Use the /cpu:0 device for scoped operations
    with tf.device('/cpu:0'):
        # Create or get apropos variable
        var = tf.get_variable(name=name, shape=shape, initializer=initializer)
    return var

10/15/2016 06:00:06 PM INFO: Cell returned
10/15/2016 06:00:06 PM INFO: Running cell:
def BiRNN(batch_x, n_steps):
    # Input shape: [batch_size, n_steps, n_input + 2*n_input*n_context]
    batch_x = tf.transpose(batch_x, [1, 0, 2])  # Permute n_steps and batch_size
    # Reshape to prepare input for first layer
    batch_x = tf.reshape(batch_x, [-1, n_input + 2*n_input*n_context]) # (n_steps*batch_size, n_input + 2*n_input*n_context)

    # Hidden layer with clipped RELU activation and dropout
    b1 = variable_on_cpu('b1', [n_hidden_1], tf.random_normal_initializer())
    h1 = variable_on_cpu('h1', [n_input + 2*n_input*n_context, n_hidden_1], tf.random_normal_initializer())
    layer_1 = tf.minimum(tf.nn.relu(tf.add(tf.matmul(batch_x, h1), b1)), relu_clip)
    layer_1 = tf.nn.dropout(layer_1, (1.0 - dropout_rate_placeholder))
    # Hidden layer with clipped RELU activation and dropout
    b2 = variable_on_cpu('b2', [n_hidden_2], tf.random_normal_initializer())
    h2 = variable_on_cpu('h2', [n_hidden_1, n_hidden_2], tf.random_normal_initializer())
    layer_2 = tf.minimum(tf.nn.relu(tf.add(tf.matmul(layer_1, h2), b2)), relu_clip)
    layer_2 = tf.nn.dropout(layer_2, (1.0 - dropout_rate_placeholder))
    # Hidden layer with clipped RELU activation and dropout
    b3 = variable_on_cpu('b3', [n_hidden_3], tf.random_normal_initializer())
    h3 = variable_on_cpu('h3', [n_hidden_2, n_hidden_3], tf.random_normal_initializer())
    layer_3 = tf.minimum(tf.nn.relu(tf.add(tf.matmul(layer_2, h3), b3)), relu_clip)
    layer_3 = tf.nn.dropout(layer_3, (1.0 - dropout_rate_placeholder))

    # Define lstm cells with tensorflow
    # Forward direction cell
    lstm_fw_cell = tf.nn.rnn_cell.BasicLSTMCell(n_cell_dim, forget_bias=1.0)
    # Backward direction cell
    lstm_bw_cell = tf.nn.rnn_cell.BasicLSTMCell(n_cell_dim, forget_bias=1.0)

    # Split data because rnn cell needs a list of inputs for the BRNN inner loop
    layer_3 = tf.split(0, n_steps, layer_3)

    # Get lstm cell output
    outputs, output_state_fw, output_state_bw = tf.nn.bidirectional_rnn(cell_fw=lstm_fw_cell,
                                                                        cell_bw=lstm_bw_cell,
                                                                        inputs=layer_3,
                                                                        dtype=tf.float32)

    # Reshape outputs from a list of n_steps tensors each of shape [batch_size, 2*n_cell_dim]
    # to a single tensor of shape [n_steps*batch_size, 2*n_cell_dim]
    outputs = tf.pack(outputs)
    outputs = tf.reshape(outputs, [-1, 2*n_cell_dim])

    # Hidden layer with clipped RELU activation and dropout
    b5 = variable_on_cpu('b5', [n_hidden_5], tf.random_normal_initializer())
    h5 = variable_on_cpu('h5', [(2 * n_cell_dim), n_hidden_5], tf.random_normal_initializer())
    layer_5 = tf.minimum(tf.nn.relu(tf.add(tf.matmul(outputs, h5), b5)), relu_clip)
    layer_5 = tf.nn.dropout(layer_5, (1.0 - dropout_rate_placeholder))
    # Hidden layer of logits
    b6 = variable_on_cpu('b6', [n_hidden_6], tf.random_normal_initializer())
    h6 = variable_on_cpu('h6', [n_hidden_5, n_hidden_6], tf.random_normal_initializer())
    layer_6 = tf.add(tf.matmul(layer_5, h6), b6)

    # Reshape layer_6 from a tensor of shape [n_steps*batch_size, n_hidden_6]
    # to a tensor of shape [batch_size, n_steps, n_hidden_6]
    layer_6 = tf.reshape(layer_6, [n_steps, batch_size, n_hidden_6])
    layer_6 = tf.transpose(layer_6, [1, 0, 2])  # Permute n_steps and batch_size

    # Return layer_6
    return layer_6

10/15/2016 06:00:07 PM INFO: Cell returned
10/15/2016 06:00:07 PM INFO: Running cell:
def calculate_accuracy_and_loss(batch_set):
    # Obtain the next batch of data
    batch_x, batch_y, n_steps = batch_set.next_batch()

    # Set batch_seq_len for the batch
    batch_seq_len = batch_x.shape[0] * [n_steps]

    # Calculate the logits of the batch using BiRNN
    logits = BiRNN(batch_x, n_steps)

    # CTC loss requires the logits be time major
    logits = tf.transpose(logits, [1, 0, 2])

    # Compute the CTC loss
    total_loss = ctc_ops.ctc_loss(logits, batch_y, batch_seq_len)

    # Calculate the average loss across the batch
    avg_loss = tf.reduce_mean(total_loss)

    # Beam search decode the batch
    decoded, _ = ctc_ops.ctc_beam_search_decoder(logits, batch_seq_len)

    # Compute the edit (Levenshtein) distance
    distance = tf.edit_distance(tf.cast(decoded[0], tf.int32), batch_y)

    # Compute the accuracy
    accuracy = tf.reduce_mean(distance)

    # Return results to the caller
    return avg_loss, accuracy, decoded, batch_y

10/15/2016 06:00:07 PM INFO: Cell returned
10/15/2016 06:00:07 PM INFO: Running cell:
def create_optimizer():
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate,
                                       beta1=beta1,
                                       beta2=beta2,
                                       epsilon=epsilon)
    return optimizer

10/15/2016 06:00:07 PM INFO: Cell returned
10/15/2016 06:00:07 PM INFO: Running cell:

# Get a list of the available gpu's ['/gpu:0', '/gpu:1'...]
available_devices = get_available_gpus()

# If there are no GPU's use the CPU
if 0 == len(available_devices):
    available_devices = ['/cpu:0']

I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 0 with properties:
name: GeForce GTX 950
major: 5 minor: 2 memoryClockRate (GHz) 1.405
pciBusID 0000:01:00.0
Total memory: 1.95GiB
Free memory: 1.92GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:972] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 950, pci bus id: 0000:01:00.0)
10/15/2016 06:00:07 PM INFO: Cell returned
10/15/2016 06:00:07 PM INFO: Running cell:
def get_tower_results(batch_set, optimizer=None):
    # Tower decodings to return
    tower_decodings = []
    # Tower labels to return
    tower_labels = []
    # Tower gradients to return
    tower_gradients = []

    # Loop over available_devices
    for i in xrange(len(available_devices)):
        # Execute operations of tower i on device i
        with tf.device(available_devices[i]):
            # Create a scope for all operations of tower i
            with tf.name_scope('tower_%d' % i) as scope:
                # Calculate the avg_loss and accuracy and retrieve the decoded
                # batch along with the original batch's labels (Y) of this tower
                avg_loss, accuracy, decoded, labels = calculate_accuracy_and_loss(batch_set)

                # Allow for variables to be re-used by the next tower
                tf.get_variable_scope().reuse_variables()

                # Retain tower's decoded batch
                tower_decodings.append(decoded)

                # Retain tower's labels (Y)
                tower_labels.append(labels)

                # If we are in training, there will be an optimizer given and
                # only then we will compute and retain gradients on base of the loss
                if optimizer is not None:
                    # Compute gradients for model parameters using tower's mini-batch
                    gradients = optimizer.compute_gradients(avg_loss)

                    # Retain tower's gradients
                    tower_gradients.append(gradients)

    # Return results to caller
    return tower_decodings, tower_labels, tower_gradients, avg_loss, accuracy

10/15/2016 06:00:07 PM INFO: Cell returned
10/15/2016 06:00:07 PM INFO: Running cell:
def average_gradients(tower_gradients):
    # List of average gradients to return to the caller
    average_grads = []

    # Loop over gradient/variable pairs from all towers
    for grad_and_vars in zip(*tower_gradients):
        # Introduce grads to store the gradients for the current variable
        grads = []

        # Loop over the gradients for the current variable
        for g, _ in grad_and_vars:
            # Add 0 dimension to the gradients to represent the tower.
            expanded_g = tf.expand_dims(g, 0)
            # Append on a 'tower' dimension which we will average over below.
            grads.append(expanded_g)

        # Average over the 'tower' dimension
        grad = tf.concat(0, grads)
        grad = tf.reduce_mean(grad, 0)

        # Create a gradient/variable tuple for the current variable with its average gradient
        grad_and_var = (grad, grad_and_vars[0][1])

        # Add the current tuple to average_grads
        average_grads.append(grad_and_var)

    # Return result to caller
    return average_grads

10/15/2016 06:00:08 PM INFO: Cell returned
10/15/2016 06:00:08 PM INFO: Running cell:
def apply_gradients(optimizer, average_grads):
    apply_gradient_op = optimizer.apply_gradients(average_grads)
    return apply_gradient_op

10/15/2016 06:00:08 PM INFO: Cell returned
10/15/2016 06:00:08 PM INFO: Running cell:
def log_variable(variable, gradient=None):
    name = variable.name
    mean = tf.reduce_mean(variable)
    tf.scalar_summary(name + '/mean', mean)
    tf.scalar_summary(name + '/sttdev', tf.sqrt(tf.reduce_mean(tf.square(variable - mean))))
    tf.scalar_summary(name + '/max', tf.reduce_max(variable))
    tf.scalar_summary(name + '/min', tf.reduce_min(variable))
    tf.histogram_summary(name, variable)
    if gradient is not None:
        if isinstance(gradient, tf.IndexedSlices):
            grad_values = gradient.values
        else:
            grad_values = gradient
        if grad_values is not None:
            tf.histogram_summary(name + "/gradients", grad_values)

10/15/2016 06:00:08 PM INFO: Cell returned
10/15/2016 06:00:08 PM INFO: Running cell:
def log_grads_and_vars(grads_and_vars):
    for gradient, variable in grads_and_vars:
        log_variable(variable, gradient=gradient)

10/15/2016 06:00:08 PM INFO: Cell returned
10/15/2016 06:00:08 PM INFO: Running cell:
logs_dir = "logs"
log_dir = '%s/%s' % (logs_dir, time.strftime("%Y%m%d-%H%M%S"))

def get_git_revision_hash():
    return subprocess.check_output(['git', 'rev-parse', 'HEAD']).strip()

def get_git_branch():
    return subprocess.check_output(['git', 'rev-parse', '--abbrev-ref', 'HEAD']).strip()

10/15/2016 06:00:08 PM INFO: Cell returned
10/15/2016 06:00:08 PM INFO: Running cell:
def decode_batch(data_set):
    # Get gradients for each tower (Runs across all GPU's)
    tower_decodings, tower_labels, _, _, _ = get_tower_results(data_set)
    return tower_decodings, tower_labels

10/15/2016 06:00:08 PM INFO: Cell returned
10/15/2016 06:00:08 PM INFO: Running cell:
def calculate_wer(session, tower_decodings, tower_labels):
    originals = []
    results = []

    # Normalization
    tower_decodings = [j for i in tower_decodings for j in i]

    # Iterating over the towers
    for i in range(len(tower_decodings)):
        decoded, labels = session.run([tower_decodings[i], tower_labels[i]], feed_dict)
        originals.extend(sparse_tensor_value_to_texts(labels))
        results.extend(sparse_tensor_value_to_texts(decoded))

    # Pairwise calculation of all rates
    rates, mean = wers(originals, results)
    return zip(originals, results, rates), mean

10/15/2016 06:00:08 PM INFO: Cell returned
10/15/2016 06:00:08 PM INFO: Running cell:
def print_wer_report(session, caption, tower_decodings, tower_labels, show_example=True):
    items, mean = calculate_wer(session, tower_decodings, tower_labels)
    print "%s WER: %f09" % (caption, mean)
    if len(items) > 0 and show_example:
        print "Example (WER = %f09)" % items[0][2]
        print " - source: \"%s\"" % items[0][0]
        print " - result: \"%s\"" % items[0][1]
    return items, mean

10/15/2016 06:00:08 PM INFO: Cell returned
10/15/2016 06:00:08 PM INFO: Running cell:
def train(session, data_sets):
    # Calculate the total number of batches
    total_batches = data_sets.train.total_batches

    # Create optimizer
    optimizer = create_optimizer()

    # Get gradients for each tower (Runs across all GPU's)
    tower_decodings, tower_labels, tower_gradients, tower_loss, accuracy = \
        get_tower_results(data_sets.train, optimizer)

    # Validation step preparation
    validation_tower_decodings, validation_tower_labels = decode_batch(data_sets.dev)

    # Average tower gradients
    avg_tower_gradients = average_gradients(tower_gradients)

    # Add logging of averaged gradients
    log_grads_and_vars(avg_tower_gradients)

    # Apply gradients to modify the model
    apply_gradient_op = apply_gradients(optimizer, avg_tower_gradients)

    # Create a saver to checkpoint the model
    saver = tf.train.Saver(tf.all_variables())

    # Prepare tensor board logging
    merged = tf.merge_all_summaries()
    writer = tf.train.SummaryWriter(log_dir, session.graph)

    # Init all variables in session
    session.run(tf.initialize_all_variables())

    # Init recent word error rate levels
    last_train_wer = 0.0
    last_validation_wer = 0.0

    # Loop over the data set for training_epochs epochs
    for epoch in range(training_iters):
        # Define total accuracy for the epoch
        total_accuracy = 0

        # Validation step
        if epoch % validation_step == 0:
            _, last_validation_wer = print_wer_report(session, "Validation", validation_tower_decodings, validation_tower_labels)
            print

        # Loop over the batches
        for batch in range(int(ceil(float(total_batches)/len(available_devices)))):
            # Compute the average loss for the last batch
            _, batch_avg_loss = session.run([apply_gradient_op, tower_loss], feed_dict_train)

            # Add batch to total_accuracy
            total_accuracy += session.run(accuracy, feed_dict_train)

            # Log all variable states in current step
            step = epoch * total_batches + batch * len(available_devices)
            summary_str = session.run(merged, feed_dict_train)
            writer.add_summary(summary_str, step)
            writer.flush()

        # Print progress message
        if epoch % display_step == 0:
            print "Epoch:", '%04d' % (epoch+1), "avg_cer=", "{:.9f}".format((total_accuracy / total_batches))
            _, last_train_wer = print_wer_report(session, "Training", tower_decodings, tower_labels)
            print

        # Checkpoint the model
        if (epoch % checkpoint_step == 0) or (epoch == training_iters - 1):
            checkpoint_path = os.path.join(checkpoint_dir, 'model.ckpt')
            print "Checkpointing in directory", "%s" % checkpoint_dir
            saver.save(session, checkpoint_path, global_step=epoch)
            print

    # Indicate optimization has concluded
    print "Optimization Finished!"
    return last_train_wer, last_validation_wer

10/15/2016 06:00:08 PM INFO: Cell returned
10/15/2016 06:00:08 PM INFO: Running cell:

# Define CPU as device on which the multi-gpu training is orchestrated
with tf.device('/gpu:0'):
    # Obtain ted lium data
    ted_lium = read_data_sets(tf.get_default_graph(), './data/ted', batch_size, n_input, n_context)

    # Create session in which to execute
    session = tf.Session(config=tf.ConfigProto(allow_soft_placement=True, log_device_placement=True))

    # Take start time for time measurement
    time_started = datetime.datetime.utcnow()

    # Train the network
    last_train_wer, last_validation_wer = train(session, ted_lium)

    # Take final time for time measurement
    time_finished = datetime.datetime.utcnow()

    # Calculate duration in seconds
    duration = time_finished - time_started
    duration = duration.days * 86400 + duration.seconds

10/15/2016 11:52:33 PM INFO: Cell raised uncaught exception:

InvalidArgumentError Traceback (most recent call last)
<ipython-input> in <module>()
2 with tf.device('/gpu:0'):
3 # Obtain ted lium data
----> 4 ted_lium = read_data_sets(tf.get_default_graph(), './data/ted', batch_size, n_input, n_context)
5
6 # Create session in which to execute

/home/andre/DeepSpeech/util/importers/ted_lium.pyc in read_data_sets(graph, data_dir, batch_size, numcep, numcontext, thread_count)
114 TED_DATA = "TEDLIUM_release2.tar.gz"
115 TED_DATA_URL = "http://www.openslr.org/resources/19/TEDLIUM_release2.tar.gz"
--> 116 local_file = base.maybe_download(TED_DATA, data_dir, TED_DATA_URL)
117
118 # Conditionally extract TED data

/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/datasets/base.pyc in maybe_download(filename, work_directory, source_url)
140 temp_file_name = tmpfile.name
141 urllib.request.urlretrieve(source_url, temp_file_name)
--> 142 gfile.Copy(temp_file_name, filepath)
143 with gfile.GFile(filepath) as f:
144 size = f.size()

/usr/local/lib/python2.7/dist-packages/tensorflow/python/lib/io/file_io.pyc in copy(oldpath, newpath, overwrite)
299 with errors.raise_exception_on_not_ok_status() as status:
300 pywrap_tensorflow.CopyFile(
--> 301 compat.as_bytes(oldpath), compat.as_bytes(newpath), overwrite, status)
302
303

/usr/lib/python2.7/contextlib.pyc in __exit__(self, type, value, traceback)
22 if type is None:
23 try:
---> 24 self.gen.next()
25 except StopIteration:
26 return

/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/errors.pyc in raise_exception_on_not_ok_status()
461 None, None,
462 compat.as_text(pywrap_tensorflow.TF_Message(status)),
--> 463 pywrap_tensorflow.TF_GetCode(status))
464 finally:
465 pywrap_tensorflow.TF_DeleteStatus(status)

InvalidArgumentError: /tmp/tmpnlUgej
10/15/2016 11:52:33 PM INFO: Shutdown kernel
10/15/2016 11:52:36 PM WARNING: Exiting with nonzero exit status
andre@linuxdnn:~/DeepSpeech$
andre@linuxdnn:~/DeepSpeech$ git pull && runipy DeepSpeech.ipynb
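
For triage, a hypothetical workaround sketch: base.maybe_download() only downloads when the target file is missing, so pre-fetching the archive by hand should sidestep the failing gfile.Copy() of the temp file. The URL and directory below are taken from the traceback; the snippet itself is illustrative, not code from this repo.

import os
import urllib

data_dir = './data/ted'
ted_data_url = 'http://www.openslr.org/resources/19/TEDLIUM_release2.tar.gz'

if not os.path.isdir(data_dir):
    os.makedirs(data_dir)
filepath = os.path.join(data_dir, 'TEDLIUM_release2.tar.gz')
if not os.path.exists(filepath):
    # Download directly to the final location instead of a /tmp file,
    # so maybe_download() finds it and skips its temp-file copy.
    urllib.urlretrieve(ted_data_url, filepath)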

WER is calculated using Levenshtein distance on chars, not words

WER should be calculated using the Levenshtein distance on words, not chars [1].

However, the current code calculates the WER using the Levenshtein distance on chars, not words [2][3].

Thus the current code should be fixed to calculate the WER using the Levenshtein distance on words, not chars.
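
A minimal sketch of the proposed fix (illustrative, not the repo's util.text code): split the transcripts into word tokens, run the classic dynamic-programming edit distance over those tokens, and normalize by the length of the reference.

def levenshtein(a, b):
    # Classic dynamic-programming edit distance over the elements of a and b
    m, n = len(a), len(b)
    prev = range(n + 1)
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cur[j] = min(prev[j] + 1,                           # deletion
                         cur[j - 1] + 1,                        # insertion
                         prev[j - 1] + (a[i - 1] != b[j - 1]))  # substitution
        prev = cur
    return prev[n]

def wer(original, result):
    # WER = word-level edit distance normalized by the reference length
    original, result = original.split(), result.split()
    return float(levenshtein(original, result)) / len(original)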

WER calculation is incorrect

An example result from print_wer_report() on the TED data is:

Epoch: 0012 avg_cer= 0.013866566
Training WER: 0.20625009
Example (WER = 66.00000009)
 - source: "we have"
 - result: "we hage"

As the source is "we have" and the result "we hage", the WER for the example should not be 66.000 but 0.5, i.e. 50%.
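
For the record, the arithmetic behind the expected value under the word-level definition: this example has one substituted word ("have" -> "hage") against a two-word reference.

# Expected value for this example: one word substitution,
# no insertions or deletions, two reference words.
substitutions, insertions, deletions = 1, 0, 0
reference_words = 2
wer = float(substitutions + insertions + deletions) / reference_words
print wer   # 0.5, i.e. 50%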
