speaker_embedding_moco

Introduction

This repository contains the code release for our paper Learning Speaker Embedding with Momentum Contrast.

The code has been developed using Kaldi and PyTorch. Kaldi is used only for feature extraction and post-processing. All neural networks are trained using PyTorch.

The purpose of the project is to make research on neural-network-based speaker verification easier. We also try to reproduce some of the results in our papers.

Requirements

The code is tested with the following dependencies.

  • Python: 3.6.8
  • Kaldi: 5.5
  • kaldi-io: 0.9.4
  • numpy: 1.16.4
  • Pillow: 6.2.1
  • scikit-learn: 0.22.2.post1
  • six: 1.13.0
  • tensorboardX: 1.1
  • torch: 1.2.0

Kaldi must be installed separately. The remaining requirements can be installed via pip with:
pip install -r requirements.txt

Usage

Prerequisites

Set KALDI_ROOT properly in path.sh. Link $KALDI_ROOT/egs/wsj/s5/utils to utils.

Step1: Train MoCo Model

To train the MoCo Model with SpecAugment, run:

sh train_moco.sh \
  --voxceleb1_root [voxceleb1 dir] \
  --voxceleb2_root [voxceleb2 dir] \
  --rirs_noises_root [rirs_noises dir] \
  --musan_root [musan dir] \
  --data [train data dir] \
  --exp [exp dir]
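
For readers new to momentum contrast, the two core ingredients of MoCo training can be sketched in a few lines of plain Python: the momentum update of the key encoder and the contrastive (InfoNCE) loss for a single query. This is a minimal illustration, not the code in this repository. The function names are hypothetical, and the momentum (0.999) and temperature (0.07) defaults are illustrative values, not settings read from train_moco.sh.

```python
import math


def momentum_update(key_params, query_params, m=0.999):
    """Move each key-encoder parameter toward the query encoder: k <- m*k + (1-m)*q."""
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def info_nce_loss(query, positive_key, queue, temperature=0.07):
    """Contrastive loss for one query; the positive key sits at index 0,
    the queue holds negative keys from earlier batches."""
    q = l2_normalize(query)
    keys = [l2_normalize(positive_key)] + [l2_normalize(k) for k in queue]
    logits = [dot(q, k) / temperature for k in keys]
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
    return log_sum_exp - logits[0]  # cross-entropy with target index 0
```

The loss is small when the query is close to its positive key and far from the queued negatives, and the momentum update keeps the key encoder changing slowly so that the queued keys stay consistent.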

Step2: Train AAM-Softmax

To train AAM-Softmax with the pretrained MoCo model, run:

sh train_xvector.sh \
  --voxceleb1_root [voxceleb1 dir] \
  --voxceleb2_root [voxceleb2 dir] \
  --rirs_noises_root [rirs_noises dir] \
  --musan_root [musan dir] \
  --moco_model [pretrained MoCo model] \
  --data [train data dir] \
  --exp [exp dir]

Note: If the MoCo model doesn't exist, the script trains the model from scratch, as in the standard Xvector recipe.
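
As a rough illustration of what AAM-Softmax (additive angular margin softmax) computes, the sketch below adds a margin to the angle of the target class before applying a scaled softmax cross-entropy. It is a simplified, single-example version in plain Python with hypothetical names. The margin and scale defaults are illustrative and are not taken from train_xvector.sh.

```python
import math


def aam_softmax_loss(cosines, target, margin=0.2, scale=30.0):
    """Cross-entropy over scaled cosines, with an angular margin on the target class.

    cosines: cosine similarity between the embedding and each class weight.
    Simplification: no special handling when theta + margin exceeds pi.
    """
    logits = []
    for idx, cos_theta in enumerate(cosines):
        cos_theta = max(-1.0, min(1.0, cos_theta))  # guard acos against rounding
        if idx == target:
            theta = math.acos(cos_theta)
            logits.append(scale * math.cos(theta + margin))
        else:
            logits.append(scale * cos_theta)
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
    return log_sum_exp - logits[target]
```

With margin=0.0 this reduces to an ordinary scaled softmax loss. A positive margin lowers the target logit, so the network must separate classes by a larger angle to reach the same loss.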

Step3: Evaluate the performance with the trained model

After training, to evaluate the performance on the test set, run:

sh test.sh \
  --data [test data dir] \
  --exp [exp] \
  --dir [trained model dir] \
  --mdl [model name, default final.pkl] \
  --plda_score [apply plda if true else apply cosine score, default true]
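
When --plda_score is false, trials are scored with cosine similarity. The sketch below shows what cosine scoring of a trial list amounts to, assuming embeddings are available as plain lists of floats keyed by an ID. The function names and the data layout are hypothetical and do not mirror the internals of test.sh.

```python
import math


def cosine_score(enroll, test):
    """Cosine similarity between an enrollment and a test embedding."""
    dot = sum(a * b for a, b in zip(enroll, test))
    norm = math.sqrt(sum(a * a for a in enroll)) * math.sqrt(sum(b * b for b in test))
    return dot / norm


def score_trials(embeddings, trials):
    """Score each (enroll_id, test_id) pair; higher means more likely the same speaker."""
    return [(e, t, cosine_score(embeddings[e], embeddings[t])) for e, t in trials]
```

Cosine scoring needs no backend training, whereas PLDA is a trained backend model, which is why the two backends are reported separately.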

Setting

  • Training data: All of VoxCeleb2, plus the training portion of VoxCeleb1.
  • Test data: The test portion of VoxCeleb1.
  • Learning rate: For standard Xvector and MoCo, it starts at 1e-4 and is gradually reduced to 1e-5 during training. For AAM-Softmax, it starts at 1e-5 and is gradually reduced to 1e-6.
  • Chunk Size: 200 to 400.
  • Batch Size: 1024.
  • Backend Classifier: PLDA and Cosine.
  • Evaluate Model: We select epoch 900 (checkpoint_e900.pkl) for evaluation, although AAM-Softmax trained from the pretrained MoCo model converges faster.
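
The settings above do not state how the learning rate is reduced or how chunk sizes are chosen within the range. The sketch below assumes a geometric decay between the stated start and end values and a uniformly random chunk size per batch. Both choices, and the function names, are assumptions made for illustration only.

```python
import random


def decayed_lr(step, total_steps, lr_start=1e-4, lr_end=1e-5):
    """Geometric interpolation from lr_start to lr_end (the decay shape is an assumption)."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return lr_start * (lr_end / lr_start) ** frac


def sample_chunk_size(rng, low=200, high=400):
    """Pick a chunk size uniformly in [low, high] (uniform sampling is an assumption)."""
    return rng.randint(low, high)


rng = random.Random(0)
schedule = [decayed_lr(s, 4) for s in range(5)]  # 1e-4 at step 0 down to 1e-5 at step 4
chunk = sample_chunk_size(rng)
```

For the AAM-Softmax stage, the same helper would be called with lr_start=1e-5 and lr_end=1e-6.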

Performance

PLDA

Method            EER(%)  minDCF(0.01)  minDCF(0.001)
Ivector           5.467   0.4859        0.6213
Xvector           3.34    0.3795        0.6138
Xvector-AAM       2.55    0.3464        0.5848
Xvector-AAM-MoCo  2.423   0.2856        0.3850

Cosine

Method            EER(%)  minDCF(0.01)  minDCF(0.001)
Ivector           14.65   0.7195        0.8661
Xvector           7.349   0.5799        0.7418
Xvector-AAM       2.306   0.2647        0.3372
Xvector-AAM-MoCo  2.402   0.2232        0.3573
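
For reference, the equal error rate and minimum detection cost columns above can be computed from lists of target and non-target trial scores roughly as follows. This is a brute-force sketch in plain Python with hypothetical function names, assuming unit miss and false-alarm costs. It is not the scoring tool used to produce the tables.

```python
def error_rates(target_scores, nontarget_scores):
    """Return (threshold, miss_rate, false_alarm_rate) for every candidate threshold."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    thresholds.append(thresholds[-1] + 1.0)  # one threshold that rejects every trial
    rates = []
    for t in thresholds:
        miss = sum(s < t for s in target_scores) / len(target_scores)
        false_alarm = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        rates.append((t, miss, false_alarm))
    return rates


def equal_error_rate(target_scores, nontarget_scores):
    """EER as a fraction: the point where miss and false-alarm rates are closest."""
    rates = error_rates(target_scores, nontarget_scores)
    _, miss, false_alarm = min(rates, key=lambda r: abs(r[1] - r[2]))
    return (miss + false_alarm) / 2.0


def min_dcf(target_scores, nontarget_scores, p_target=0.01, c_miss=1.0, c_fa=1.0):
    """Minimum normalized detection cost over all thresholds."""
    rates = error_rates(target_scores, nontarget_scores)
    costs = [c_miss * miss * p_target + c_fa * fa * (1.0 - p_target) for _, miss, fa in rates]
    return min(costs) / min(c_miss * p_target, c_fa * (1.0 - p_target))
```

Multiply the value returned by equal_error_rate by 100 to compare it with the EER(%) column. The value in parentheses in the cost columns corresponds to p_target.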

DET

[Figure: DET curves]

Citation

If you use this code, please cite the following paper:
Ke Ding, Xuanji He, Guanglu Wan. Learning Speaker Embedding with Momentum Contrast. arXiv preprint arXiv:2001.01986 (2020)

Contact

If you have any questions, please feel free to contact us:

Author E-mail
Ke Ding [email protected]
Xuanji He [email protected]
Guanglu Wan [email protected]

License

The code is BSD-style licensed, as found in the LICENSE file.
