DeepBhashan

DeepBhashan - Personalised Speech Recognition Using Deep Neural Networks

About :

The speech recognition problem can be solved with traditional approaches such as hidden Markov models (HMMs). With increased computational power and bigger datasets, deep neural networks (DNNs) can achieve better accuracy than HMMs. Broad structure:

  • Speech
  • Feature extraction using a pre-trained base model
  • Fine-tuning the encoder network (LSTM/RNNs)
  • Fine-tuning the English LM for Hindi speech
  • Repeating the tasks for multiple users

Pre-processing data :

We took a Hindi audiobook from LibriVox with 33 chapters of about 20-25 minutes each, along with their MP3s and text files. Each Hindi text file was converted to English-alphabet (romanised) text with the Google Translate API, using the script csv_mapping_EnglishToHindi.py. Each sentence was separated individually, and special characters and Unicode symbols were removed from the text. The aeneas library was used to preprocess the text files and generate a JSON sync map aligning each text file with its audio file.
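The exact cleaning rules are not given here. As a minimal sketch, assume sentences end with a Devanagari danda (।), a double danda (॥) or Latin `.`, `?` and `!`, and that "special characters" means anything other than lowercase Latin letters, apostrophes and spaces in the romanised text. `split_sentences` and `clean_sentence` are hypothetical helpers, not functions from the repository's scripts.

```python
import re

# Assumed sentence terminators: Devanagari danda (।), double danda (॥) and Latin . ? !
SENTENCE_END = re.compile(r"[।॥.?!]+")
# Assumed definition of "special characters": anything that is not a lowercase
# Latin letter, an apostrophe or a space (applied to the romanised text).
DISALLOWED = re.compile(r"[^a-z' ]+")


def split_sentences(text: str) -> list[str]:
    """Split a chapter's text into individual, non-empty sentences."""
    return [part.strip() for part in SENTENCE_END.split(text) if part.strip()]


def clean_sentence(sentence: str) -> str:
    """Lowercase, replace disallowed characters with spaces and collapse whitespace."""
    cleaned = DISALLOWED.sub(" ", sentence.lower())
    return " ".join(cleaned.split())


if __name__ == "__main__":
    chapter = "Na apane kisee bhaee se, aur na hee mere kisee parivaar ke sadasy se! Bas keval ek hee."
    for sentence in split_sentences(chapter):
        print(clean_sentence(sentence))
```

Writing each cleaned sentence on its own line suits aeneas' plain-text input mode, where every line becomes one fragment of the sync map.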

Preparing the dataset :

The chapter-wise text files are in the folder dataset -> text_files. To prepare the JSON files, run the script Preparing_jsons.py. After the JSON files are created, fine-tune them manually so that the text is correctly matched with the audio in each JSON. Once the JSONs are done, run Preparing_csvs.py to prepare a CSV for each chapter. Merge all the CSVs into a final one, then use Splitting_data.py to separate it into training, testing and validation data with a split ratio of 70:20:10. To separate the WAV files listed in each CSV, use Splitting_wav_files.py.
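Preparing_csvs.py and Splitting_data.py are not reproduced in this README. The sketch below shows the two core steps under stated assumptions: it reads the JSON layout aeneas writes for a sync map (a top-level "fragments" list whose entries carry "begin" and "end" times as strings and a "lines" list of text), and it shuffles before splitting, which the original script may or may not do. The function names and the fixed seed are hypothetical.

```python
import json
import random

# Train/test/validation percentages, from the README's 70:20:10 ratio.
SPLIT_PERCENT = (70, 20, 10)


def segments_from_syncmap(syncmap_text: str) -> list[tuple[float, float, str]]:
    """Return (begin_seconds, end_seconds, transcript) for each non-empty fragment
    of an aeneas JSON sync map."""
    data = json.loads(syncmap_text)
    segments = []
    for fragment in data["fragments"]:
        transcript = " ".join(fragment["lines"]).strip()
        if transcript:  # skip fragments whose text is empty
            segments.append((float(fragment["begin"]), float(fragment["end"]), transcript))
    return segments


def split_rows(rows, seed: int = 0):
    """Shuffle rows reproducibly, then split them into train, test and validation lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    # Integer percentages avoid floating-point rounding in the split sizes.
    n_train = len(rows) * SPLIT_PERCENT[0] // 100
    n_test = len(rows) * SPLIT_PERCENT[1] // 100
    train = rows[:n_train]
    test = rows[n_train:n_train + n_test]
    validation = rows[n_train + n_test:]  # the remainder, roughly 10%
    return train, test, validation


if __name__ == "__main__":
    syncmap = json.dumps({"fragments": [
        {"begin": "0.000", "end": "2.680", "lines": ["na apane kisee bhaee se"]},
        {"begin": "2.680", "end": "3.100", "lines": [""]},
    ]})
    print(segments_from_syncmap(syncmap))
    train, test, validation = split_rows(range(100))
    print(len(train), len(test), len(validation))
```

Letting the validation set take the remainder means no row is dropped when the row count is not a multiple of 10.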

Preparing the Hindi Devanagari dataset :

The folder named dataset contains a Hindi dictionary and an English dictionary. To prepare the one-to-one mapping, build an array of unique English-alphabet words with English_dictionary_fromHindi.py. These arrays are then used by csv_mapping_EnglishToHindi.py to produce the final CSVs in Devanagari text.
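The README does not show how the one-to-one mapping is stored. Assuming it behaves like a dictionary from each unique English-alphabet word to its Devanagari spelling, the two steps reduce to the sketch below. `unique_words` and `to_devanagari` are hypothetical stand-ins for English_dictionary_fromHindi.py and csv_mapping_EnglishToHindi.py, and the sample dictionary entries are illustrative only.

```python
def unique_words(transcripts):
    """Collect the unique words of all transcripts, in first-seen order."""
    seen = set()
    words = []
    for transcript in transcripts:
        for word in transcript.split():
            if word not in seen:
                seen.add(word)
                words.append(word)
    return words


def to_devanagari(transcript, mapping):
    """Replace each romanised word with its Devanagari form.

    Words missing from the mapping are kept unchanged and also returned,
    so the dictionary can be extended before the final CSVs are written.
    """
    words = transcript.split()
    missing = [word for word in words if word not in mapping]
    converted = " ".join(mapping.get(word, word) for word in words)
    return converted, missing


if __name__ == "__main__":
    # Illustrative entries written for this example, not the project's dictionary.
    mapping = {"na": "न", "apane": "अपने", "kisee": "किसी", "bhaee": "भाई", "se": "से"}
    print(unique_words(["na apane kisee", "kisee bhaee se"]))
    print(to_devanagari("na apane kisee bhaee se", mapping))
```

Returning the unmapped words, rather than raising on the first one, makes it easy to list every gap in the dictionary in a single pass over the CSV.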

Training :

  • The pre-trained DeepSpeech model was used, and its parameters were fine-tuned on the pre-processed data. All the training was done from scratch.

  • Language models were generated with KenLM for both the Devanagari and the English-alphabet versions of the Hindi speech dataset.

  • The DeepSpeech model was trained on the English-alphabet Hindi speech dataset. Training completed successfully, and the training and validation loss curves are shown below.

Loss curve

The model was trained for 20 epochs, by which point the validation loss had begun to stagnate.
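How stagnation was judged is not stated. One common criterion is patience-based early stopping: stop once several consecutive epochs fail to improve on the best earlier validation loss by some margin. The sketch below implements that rule; `patience=3` and `min_delta=0.05` are arbitrary placeholders, not the settings used for this model, and the loss values in the demo are made up.

```python
def has_stagnated(val_losses, patience=3, min_delta=0.05):
    """True when none of the last `patience` epochs beat the best earlier
    validation loss by more than `min_delta`."""
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    best_recent = min(val_losses[-patience:])
    return best_recent > best_before - min_delta


def stopping_epoch(val_losses, patience=3, min_delta=0.05):
    """Return the first (1-based) epoch at which training would stop, or None."""
    for epoch in range(1, len(val_losses) + 1):
        if has_stagnated(val_losses[:epoch], patience, min_delta):
            return epoch
    return None


if __name__ == "__main__":
    # Made-up loss values for illustration only; these are not the model's losses.
    losses = [5.0, 3.1, 2.4, 2.2, 2.19, 2.21, 2.18]
    print(stopping_epoch(losses))
```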

The code, trained models, training/test/validation data and LMs can be found here: Code

The link contains separate folders for all the WAV files and the corresponding CSV files used for training, testing and validation, along with the models used for training. There are two separate models, one for English data and one for Hindi data. The English model was trained on 11 hours of data, with fine-tuning of its parameters. The Hindi model, built using transfer learning, was trained on 1 hour of personalised data. Both folders contain the respective language models, alphabet files, vocabulary files, LM files and LM scorers, along with the data needed for training and inference.
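Because LMs were built for both English-alphabet and Devanagari transcripts, each setup needs an alphabet file matching its characters. DeepSpeech reads the alphabet from a UTF-8 text file with one character per line, where the space character sits on a line of its own and lines starting with `#` are comments. The sketch below derives such a file from a list of transcripts; `build_alphabet` is a hypothetical helper, not a DeepSpeech API, and it assumes the transcripts are already cleaned and contain no `#`.

```python
def build_alphabet(transcripts) -> str:
    """Return alphabet-file contents: a comment line, then one character per
    line in code-point order (the space gets a line of its own), ending with a newline."""
    chars = sorted({ch for transcript in transcripts for ch in transcript if ch not in "\r\n"})
    if "#" in chars:
        # A line starting with '#' would be read as a comment, not a label.
        raise ValueError("'#' found in transcripts; clean it out first")
    lines = ["# Alphabet generated from the training transcripts"] + chars
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_alphabet(["na apane kisee", "bhaee se"]), end="")
    print(build_alphabet(["न अपने"]), end="")
```

Since Python iterates over code points, Devanagari vowel signs such as े end up on their own lines in this sketch.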

Results :

Sample outputs for some of the sentences are shown below :

  • Audio_1

    Ground truth: na apane kisee bhaee se aur na hee mere kisee parivaar ke sadasy se
    DeepSpeech output with LM: na apane kisee bhaee se aur na hee mere kisee parivaar ke sadasy se

  • Audio_2

    Ground truth: bas keval ek hee aap aaj ke baad kisee se bhee is vishay mein koee baat nahin karoge
    DeepSpeech output with LM: bas keval ek hee ki aap aaj ke baad kisee se bhee is vishay mein koee baat nahin karenge
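The samples above give transcripts but no error rate. Word error rate (WER), the word-level edit distance divided by the number of reference words, is the usual metric for this comparison. The dependency-free sketch below is a standard formulation, not necessarily the evaluation used by the project.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) divided by the
    number of reference words, via a word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j]: edit distance between the reference prefix handled so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # delete ref[i - 1]
                curr[j - 1] + 1,     # insert hyp[j - 1]
                prev[j - 1] + cost,  # substitute, or match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    truth = "bas keval ek hee aap aaj ke baad kisee se bhee is vishay mein koee baat nahin karoge"
    output = "bas keval ek hee ki aap aaj ke baad kisee se bhee is vishay mein koee baat nahin karenge"
    print(f"Audio_2 WER: {word_error_rate(truth, output):.3f}")
```

For Audio_2 this counts two errors against 18 reference words (the inserted word "ki" and "karenge" substituted for "karoge"), a WER of about 0.11. Audio_1 is transcribed exactly, so its WER is 0.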
