deepasr's Introduction

DeepAsr

DeepAsr is an open-source Keras (TensorFlow) implementation of an end-to-end Automatic Speech Recognition (ASR) engine. It supports multiple speech recognition architectures.

Supported ASR architectures:

  • Baidu's Deep Speech 2
  • DeepAsrNetwork1

Using DeepAsr you can:

  • perform speech-to-text using pre-trained models
  • tune pre-trained models to your needs
  • create new models on your own

DeepAsr key features:

  • Multi-GPU support: Training can be distributed using a TensorFlow distribution Strategy, and you can experiment with a mixed precision policy.
  • CuDNN support: Models use the CuDNNLSTM implementation by NVIDIA Developers. CPU devices are also supported.
  • DataGenerator: Feature extraction is performed during model training, which helps with large datasets.

Installation

You can use pip:

pip install deepasr

Getting started

Speech recognition is a tough task. You don't need to know all the details to use one of the pretrained models. However, it is worth understanding the crucial conceptual components:

  • Input: Audio files (WAV or FLAC), mono, 16-bit, 16 kHz (up to 5 seconds)
  • FeaturesExtractor: Converts audio files into MFCC features or spectrograms
  • Model: CTC model defined in Keras (references: [1], [2])
  • Decoder: Greedy or BeamSearch algorithms, optionally with language model support, decode a sequence of probabilities using the Alphabet
  • DataGenerator: Stream data to the model via generator
  • Callbacks: Set of functions monitoring the training
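
To make the Decoder step concrete, here is a minimal sketch of greedy CTC decoding in plain Python: take the most likely label in each frame, collapse consecutive repeats, then drop the blank label. The function name, the toy two-letter alphabet, and the position of the blank index are assumptions for illustration only; they are not taken from DeepAsr's GreedyDecoder, which may work differently.

```python
from typing import Sequence


def greedy_ctc_decode(probs: Sequence[Sequence[float]], alphabet: str, blank: int) -> str:
    """Greedy CTC decoding: argmax per frame, collapse repeats, drop blanks.

    `probs` holds one probability row per time frame. `blank` is the index of
    the CTC blank label (assumed here to sit after the alphabet characters).
    """
    # Most likely label index in every frame.
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in probs]

    chars = []
    prev = None
    for idx in best:
        # Emit a character only when the label changes and is not the blank.
        if idx != prev and idx != blank:
            chars.append(alphabet[idx])
        prev = idx
    return "".join(chars)


# Toy example: alphabet "ab", blank label at index 2.
frames = [
    [0.8, 0.1, 0.1],  # a
    [0.7, 0.2, 0.1],  # a (repeat, collapsed)
    [0.1, 0.1, 0.8],  # blank (separates the two a's)
    [0.6, 0.3, 0.1],  # a
    [0.1, 0.8, 0.1],  # b
    [0.2, 0.7, 0.1],  # b (repeat, collapsed)
]
decoded = greedy_ctc_decode(frames, "ab", blank=2)
print(decoded)  # aab
```

The blank frame between the second and third rows is what lets the output contain two consecutive "a" characters. Beam search keeps several candidate label sequences per frame instead of only the single best one.
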
import numpy as np
import pandas as pd
import tensorflow as tf
import deepasr as asr

# get CTCPipeline
def get_config(feature_type: str = 'spectrogram', multi_gpu: bool = False):
    # audio feature extractor
    features_extractor = asr.features.preprocess(feature_type=feature_type, features_num=161,
                                                 samplerate=16000,
                                                 winlen=0.02,
                                                 winstep=0.025,
                                                 winfunc=np.hanning)

    # input label encoder
    alphabet_en = asr.vocab.Alphabet(lang='en')
    # training model
    model = asr.model.get_deepspeech2(
        input_dim=161,
        output_dim=29,
        is_mixed_precision=True
    )
    # model optimizer
    optimizer = tf.keras.optimizers.Adam(
        learning_rate=1e-4,  # `lr` is a deprecated alias in TF 2.x
        beta_1=0.9,
        beta_2=0.999,
        epsilon=1e-8
    )
    # output label decoder
    decoder = asr.decoder.GreedyDecoder()
    # decoder = asr.decoder.BeamSearchDecoder(beam_width=100, top_paths=1)
    # CTCPipeline
    pipeline = asr.pipeline.ctc_pipeline.CTCPipeline(
        alphabet=alphabet_en, features_extractor=features_extractor, model=model, optimizer=optimizer, decoder=decoder,
        sample_rate=16000, mono=True, multi_gpu=multi_gpu
    )
    return pipeline


train_data = pd.read_csv('train_data.csv')

pipeline = get_config(feature_type = 'fbank', multi_gpu=False)

# train asr model
history = pipeline.fit(train_dataset=train_data, batch_size=128, epochs=500)
# history = pipeline.fit_generator(train_dataset = train_data, batch_size=32, epochs=500)

pipeline.save('./checkpoint')

A loaded pre-trained model has all components. Prediction can be invoked just by calling pipeline.predict().

import pandas as pd
import deepasr as asr
import numpy as np
test_data = pd.read_csv('test_data.csv')

# get testing audio and transcript from dataset
index = np.random.randint(test_data.shape[0])
data = test_data.iloc[index]
test_file = data.iloc[0]  # first column: audio file path
test_transcript = data.iloc[1]  # second column: transcript
# Test Audio file
print("Audio File:",test_file)
# Test Transcript
print("Audio Transcript:", test_transcript)
print("Transcript length:",len(test_transcript))

pipeline = asr.pipeline.load('./checkpoint')
print("Prediction", pipeline.predict(test_file))
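
Because the pipeline expects mono, 16-bit, 16 kHz audio of up to 5 seconds, it can help to check a file before passing it to predict(). The following is a sketch using only Python's standard `wave` module; `check_wav` is a hypothetical helper, not part of DeepAsr, and it handles WAV only (the `wave` module does not read FLAC).

```python
import wave
from typing import List


def check_wav(path: str, rate: int = 16000, max_seconds: float = 5.0) -> List[str]:
    """Return a list of format problems; an empty list means the file matches."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            problems.append(f"expected mono, got {wav.getnchannels()} channels")
        if wav.getsampwidth() != 2:  # sample width is reported in bytes
            problems.append(f"expected 16-bit samples, got {8 * wav.getsampwidth()}-bit")
        if wav.getframerate() != rate:
            problems.append(f"expected {rate} Hz, got {wav.getframerate()} Hz")
        duration = wav.getnframes() / wav.getframerate()
        if duration > max_seconds:
            problems.append(f"expected at most {max_seconds} s, got {duration:.2f} s")
    return problems
```

A call such as `check_wav(test_file)` could run just before `pipeline.predict(test_file)`, with any returned problems printed or raised as an error.
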

References

The fundamental repositories:

deepasr's People

Contributors

eric-mc2, scionoftech

deepasr's Issues

Problem with training on DeepSpeech2 model

Hi there.
I am trying to use the DeepSpeech2 model in DeepAsr_CTC_Pipeline.ipynb, but I get the following error.
Could you please help me with that?

Model: "DeepAsr"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
the_input (InputLayer)          [(None, None, 161)]  0                                            
__________________________________________________________________________________________________
BN_1 (BatchNormalization)       (None, None, 161)    644         the_input[0][0]                  
__________________________________________________________________________________________________
Conv1D_1 (Conv1D)               (None, None, 512)    412672      BN_1[0][0]                       
__________________________________________________________________________________________________
CBN_1 (BatchNormalization)      (None, None, 512)    2048        Conv1D_1[0][0]                   
__________________________________________________________________________________________________
Conv1D_2 (Conv1D)               (None, None, 512)    1311232     CBN_1[0][0]                      
__________________________________________________________________________________________________
CBN_2 (BatchNormalization)      (None, None, 512)    2048        Conv1D_2[0][0]                   
__________________________________________________________________________________________________
Conv1D_3 (Conv1D)               (None, None, 512)    1311232     CBN_2[0][0]                      
__________________________________________________________________________________________________
BN_2 (BatchNormalization)       (None, None, 512)    2048        Conv1D_3[0][0]                   
__________________________________________________________________________________________________
bidirectional_1 (Bidirectional) (None, None, 1600)   6307200     BN_2[0][0]                       
__________________________________________________________________________________________________
dropout (Dropout)               (None, None, 1600)   0           bidirectional_1[0][0]            
__________________________________________________________________________________________________
bidirectional_2 (Bidirectional) (None, None, 1600)   11529600    dropout[0][0]                    
__________________________________________________________________________________________________
dropout_1 (Dropout)             (None, None, 1600)   0           bidirectional_2[0][0]            
__________________________________________________________________________________________________
bidirectional_3 (Bidirectional) (None, None, 1600)   11529600    dropout_1[0][0]                  
__________________________________________________________________________________________________
dropout_2 (Dropout)             (None, None, 1600)   0           bidirectional_3[0][0]            
__________________________________________________________________________________________________
bidirectional_4 (Bidirectional) (None, None, 1600)   11529600    dropout_2[0][0]                  
__________________________________________________________________________________________________
dropout_3 (Dropout)             (None, None, 1600)   0           bidirectional_4[0][0]            
__________________________________________________________________________________________________
bidirectional_5 (Bidirectional) (None, None, 1600)   11529600    dropout_3[0][0]                  
__________________________________________________________________________________________________
BN_3 (BatchNormalization)       (None, None, 1600)   6400        bidirectional_5[0][0]            
__________________________________________________________________________________________________
time_distributed (TimeDistribut (None, None, 1024)   1639424     BN_3[0][0]                       
__________________________________________________________________________________________________
the_output (TimeDistributed)    (None, None, 29)     29725       time_distributed[0][0]           
__________________________________________________________________________________________________
the_labels (InputLayer)         [(None, None)]       0                                            
__________________________________________________________________________________________________
input_length (InputLayer)       [(None, 1)]          0                                            
__________________________________________________________________________________________________
label_length (InputLayer)       [(None, 1)]          0                                            
__________________________________________________________________________________________________
ctc (Lambda)                    (None, 1)            0           the_output[0][0]                 
                                                                 the_labels[0][0]                 
                                                                 input_length[0][0]               
                                                                 label_length[0][0]               
==================================================================================================
Total params: 57,143,073
Trainable params: 57,136,479
Non-trainable params: 6,594
__________________________________________________________________________________________________
Feature Extraction in progress...
Feature Extraction completed.
input features:  (6347, 659, 161)
input labels:  (6347, 149)
Model training initiated...
Epoch 1/800

---------------------------------------------------------------------------

CancelledError                            Traceback (most recent call last)

<ipython-input-14-7be27ee27e0c> in <module>()
      1 # train asr model
----> 2 history = pipeline.fit(train_dataset = train_data, batch_size=128, epochs=800)
      3 
      4 # history = pipeline.fit_iter(train_dataset = train_data, batch_size=32, epochs=3,iter_num=500,checkpoint=project_path+'checkpoints')
      5 # history = pipeline.fit_generator(train_dataset = train_data, batch_size=32, epochs=500)

7 frames

/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     58     ctx.ensure_initialized()
     59     tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
---> 60                                         inputs, attrs, num_outputs)
     61   except core._NotOkStatusException as e:
     62     if name is not None:

CancelledError:  [_Derived_]RecvAsync is cancelled.
	 [[{{node DeepAsr/ctc/Log/_64}}]] [Op:__inference_train_function_27699]

Function call stack:
train_function

Multi-gpu context is lost before training

When the CTCPipeline fit() methods are called, they compile the model, overwriting the model variable and clearing the distributed data-parallel context. Thus, the model runs on only a single GPU even if multiple GPUs are visible to TensorFlow.
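
For context, the usual Keras pattern is to create and compile the model inside the distribution strategy's scope, so that its variables are mirrored across the visible devices. The sketch below shows that general pattern with `tf.distribute.MirroredStrategy`. `build_and_compile` and its tiny layer stack are hypothetical stand-ins, not DeepAsr code, and this is not presented as the fix adopted by the project.

```python
import tensorflow as tf


def build_and_compile(input_dim: int = 161, output_dim: int = 29) -> tf.keras.Model:
    """Hypothetical stand-in for a pipeline's model construction and compile step."""
    inputs = tf.keras.Input(shape=(None, input_dim))
    hidden = tf.keras.layers.Dense(32, activation="relu")(inputs)
    outputs = tf.keras.layers.Dense(output_dim, activation="softmax")(hidden)
    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
        loss="categorical_crossentropy",
    )
    return model


# MirroredStrategy uses all visible GPUs, or falls back to a single device.
strategy = tf.distribute.MirroredStrategy()

# Both model creation and compile happen inside the scope, so any later
# recompile should also be done within the same scope.
with strategy.scope():
    model = build_and_compile()

print("replicas in sync:", strategy.num_replicas_in_sync)
```

If a fit() method recompiles the model, doing that compile under the same `strategy.scope()` is one way to avoid falling back to a single device.
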

Error while saving trained model in pipeline Jupyter notebook

OSError Traceback (most recent call last)

in ()
1 # save deepasr ctc pipeline
----> 2 pipeline.save('checkpoints')

5 frames

/usr/local/lib/python3.6/dist-packages/h5py/_hl/files.py in make_fid(name, mode, userblock_size, fapl, fcpl, swmr)
177 fid = h5f.create(name, h5f.ACC_EXCL, fapl=fapl, fcpl=fcpl)
178 elif mode == 'w':
--> 179 fid = h5f.create(name, h5f.ACC_TRUNC, fapl=fapl, fcpl=fcpl)
180 elif mode == 'a':
181 # Open in append mode (read/write).

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/h5f.pyx in h5py.h5f.create()

OSError: Unable to create file (unable to open file: name = 'checkpoints/network.h5', errno = 2, error message = 'No such file or directory', flags = 13, o_flags = 242)
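
The message says that `checkpoints/network.h5` cannot be created because the path does not exist, which suggests that the target directory is missing at save time. A possible workaround, assuming `pipeline.save()` does not create the directory itself, is to create it first. `ensure_dir` is a hypothetical helper name.

```python
import os


def ensure_dir(path: str) -> str:
    """Create `path` (and any missing parents) if needed, then return it."""
    os.makedirs(path, exist_ok=True)  # no error if the directory already exists
    return path


# Usage with the call from the report above (commented out, since it needs a pipeline):
# pipeline.save(ensure_dir('checkpoints'))
```
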

Cannot convert a symbolic Tensor

While running the following code on Colab:

model = asr.model.get_deepasrnetwork1(
    input_dim=161,
    output_dim=29,
    is_mixed_precision=True
)

The error I receive is as follows:
[image]

I could not proceed from here. I need help on what to do to proceed from this point. Thanks
