Code Monkey home page Code Monkey logo

bert-lid's Introduction

BERT-LID: Leveraging BERT to Improve Spoken Language Identification

We propose a BERT-based language identification system (BERT-LID) to improve language identification performance, especially on short-duration speech segments. We extend the original BERT model by taking the phonetic posteriorgrams (PPG) derived from the front-end phone recognizer as input. Then we deployed the optimal deep classifier followed by it for language identification.

Datasets

We use OLR20, TIMIT&THCHS-30, TAL_ASR datasets for experiments. Among them, in order to test in segment audio, we perform segmentation processing on TIMIT&THCHS-30, where the window length is 1s and the window movement is 1s for segmentation. The specific usage of data is shown in data. The TAL_ASR data is forcibly aligned.

Feature extraction

We directly obtains the phoneme features of the audio through Phoneme recognizer based on long temporal context, and then obtains tokens through the BertTokenizer model provided by BERT. We take this feature as input. For the specific processing of data, see the 'get_data.py'.

Experiments

Load data

Load the data by adjusting the path parameters in 'load_data.py'. In the experiment, we divided the dataset into three datasets: train, test, and dev.

Models

Our model is saved in BertCNN, BertRCNN, BertDPCNN, BertLSTM, and pay attention to whether bert in the model-related file is turned on during use.

Program

Run the program using the following command:

python run_me.py

Important parameters are as follows:

  • model_name: Select the input model

  • label_list:Given label list

  • bert_voacb_file:Input the bert model

Or you can use the following command to run multiple times and adjust the super parameters at the same time:

bash run_me.sh

References

[1] P. Schwarz, "Phoneme Recognition based on Long Temporal Context, PhD Thesis", Brno University of Technology, 2009

[2] TAL_ASR: https://ai.100tal.com/dataset

[2] Devlin, Jacob, et al. "Bert: Pre-training of deep bidirectional transformers for language understanding." arXiv preprint arXiv:1810.04805 (2018).

[4] https://github.com/codertimo/BERT-pytorch

[5] Schwarz, Petr, et al. "Phoneme recognizer based on long temporal context." Speech Processing Group, Faculty of Information Technology, Brno University of Technology.[Online]. Available: http://speech. fit. vutbr. cz/en/software (2006).

Citation

Please cite our paper if you find this work useful:

@InProceedings{NieYT2022_ISCSLP,
  author    = {Yuting Nie and Junhong Zhao and Wei-Qiang Zhang and Jinfeng Bai},
  booktitle = {International Symposium on Chinese Spoken Language Processing (ISCSLP)},
  title     = {BERT-LID: Leveraging BERT to improve spoken language identification},
  address   = {Singapore},
  month     = {12},
  year      = {2022},
  url       = {https://arxiv.org/abs/2203.00328},
}

bert-lid's People

Contributors

lydianyt avatar zwq1979 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.