Air Traffic Control Automatic Speech Recognition Project

Here are the steps for implementing an automatic speech recognition system for an air traffic control communication dataset, carried out during an internship at the Asr Gooyesh Company:

  • First phase: searching for relevant articles and presenting them in PowerPoint format
  • Second phase: fine-tuning the wav2vec2-large-xlsr-53 model on the ShEMO (Persian Speech Emotion Detection) database
  • Third phase: fine-tuning the wav2vec2-base model on the English Timit dataset
  • Fourth phase: fine-tuning the wav2vec2-large-robust model on the air traffic dataset

First phase: Searching for relevant articles and presenting them

After searching for various articles on this subject with the help of my group mate, I collected the results in this Google Sheet. I then narrowed them down in this Google Sheet and selected the reference article titled "How Does Pre-trained Wav2Vec 2.0 Perform on Domain Shifted ASR? An Extensive Benchmark on Air Traffic Control Communications". I presented a summary of what I learned in this PowerPoint.

Second phase: Fine-tuning the wav2vec2-large-xlsr-53 model on the ShEMO (Persian Speech Emotion Detection) database

I ran the code available on HuggingFace to understand what fine-tuning a model involves and how it should be done. I did this in my mother tongue (Persian) to make the preprocessing steps and results more comprehensible.

Third phase: Fine-tuning the wav2vec2-base model on the English Timit dataset

I ran the code available on HuggingFace to get closer to the main project, which was in English.

Fourth phase: Fine-tuning the wav2vec2-large-robust model on the air traffic dataset

The dataset used was ATCOSIM. It consists of ten hours of speech data recorded during real-time ATC simulations, automatically segmented and orthographically transcribed. The utterances are in English and were spoken by ten non-native operational controllers.

The most important stages:

  • Prepare Data, Tokenizer, Feature Extractor:

  • Generate a new CSV file with a column containing the audio file paths.

  • Load the train and test datasets:
  • Split the dataset into train and test sets.
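The CSV generation and train/test split described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the project's actual code: the column names `path` and `transcription`, the `<id>.wav` file layout, the 20% test fraction, and the helper names `build_manifest` and `split_manifest` are all assumptions made for the example.

```python
import csv
import random
from pathlib import Path


def build_manifest(rows, audio_root, out_csv):
    """Write a CSV that adds a 'path' column pointing at each audio file.

    `rows` is an iterable of (recording_id, transcription) pairs. The column
    names and the '<id>.wav' naming scheme are assumptions for this sketch.
    """
    audio_root = Path(audio_root)
    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["path", "transcription"])
        writer.writeheader()
        for recording_id, text in rows:
            writer.writerow({
                "path": str(audio_root / f"{recording_id}.wav"),
                "transcription": text,
            })


def split_manifest(csv_path, test_fraction=0.2, seed=42):
    """Shuffle the manifest rows and split them into (train, test) lists."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        records = list(csv.DictReader(f))
    # A seeded shuffle keeps the split reproducible between runs.
    random.Random(seed).shuffle(records)
    n_test = max(1, round(len(records) * test_fraction))
    return records[n_test:], records[:n_test]
```

Each record returned by `split_manifest` is a plain dict, so the audio path and its transcription stay together through the later preprocessing steps.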

  • Create Wav2Vec2 Feature Extractor:

    • Downsample the data, because the ATCOSIM dataset is sampled at 32 kHz but the pre-trained model expects audio sampled at 16 kHz.
  • Preprocess Data

  • Add a "speech" column to the dataset that holds the audio read from each file.
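The downsampling and "speech" column steps above can be illustrated with a small standard-library sketch. It is not the project's implementation: it assumes mono 16-bit PCM WAV files and records stored as dicts with a `path` key (a hypothetical layout), and it halves the rate by averaging neighbouring samples, which is only a crude stand-in for the proper resampling filter that an audio library such as librosa or torchaudio would apply.

```python
import array
import sys
import wave


def read_wav(path):
    """Read a mono 16-bit PCM WAV file; return (float samples, sample rate)."""
    with wave.open(str(path), "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles mono 16-bit PCM")
        rate = wav.getframerate()
        pcm = array.array("h")
        pcm.frombytes(wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    # Scale 16-bit integers to floats in [-1.0, 1.0).
    return [s / 32768.0 for s in pcm], rate


def downsample_by_two(samples):
    """Halve the sampling rate (e.g. 32 kHz -> 16 kHz).

    Each output sample is the mean of two neighbouring input samples; the
    averaging acts as a very crude low-pass filter before every second
    sample is dropped. A trailing unpaired sample is discarded.
    """
    return [
        (samples[i] + samples[i + 1]) / 2.0
        for i in range(0, len(samples) - 1, 2)
    ]


def add_speech(record, target_rate=16_000):
    """Return a copy of `record` with a 'speech' column of audio samples."""
    samples, rate = read_wav(record["path"])
    if rate == 2 * target_rate:
        samples = downsample_by_two(samples)
    elif rate != target_rate:
        raise ValueError(f"unsupported sample rate: {rate}")
    return {**record, "speech": samples, "sampling_rate": target_rate}
```

Applying `add_speech` to every record gives each example its audio at 16 kHz alongside the original path, which is the shape the feature extractor step needs.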

  • Training and Evaluation

  • Prepare the training arguments for our pre-trained model.

  • After training, we reach a WER of around 0.3, which is reasonable.

  • In the final step, we evaluate the model and inspect ten random examples of its output, with a WER of 35%.
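For readers unfamiliar with the metric, the word error rate (WER) reported above is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The sketch below computes it for a single utterance with a standard dynamic-programming edit distance. The function name `word_error_rate` and the example sentences are made up for illustration and are not taken from the project or from ATCOSIM.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one of ten reference words is missing from the output.
reference = "lufthansa five one two climb flight level three two zero"
hypothesis = "lufthansa five one two climb level three two zero"
print(word_error_rate(reference, hypothesis))
```

A WER over a whole test set is usually computed by summing the edit distances over all utterances and dividing by the total number of reference words, so it can differ from the average of per-utterance values.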
