rawnet's Introduction

Overview

This repository includes implementations of speaker verification systems that input raw waveforms.

Currently, it includes four systems, all written in Python. Detailed instructions for each system are given in the individual README files.

RawNet3 in ESPnet

As part of the open-source ESPnet-SPK project, a RawNet3 model pre-trained with the ESPnet-SPK framework is available for easy access. The architecture is unchanged, but the enhanced framework slightly improves performance.

  • Performance
    • Vox1-O: EER 0.73%
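
For readers unfamiliar with the metric: the equal error rate (EER) is the operating point at which the false acceptance rate equals the false rejection rate. The following is a minimal sketch of how an EER could be computed from trial scores, not this repository's evaluation code. The function name `compute_eer` is hypothetical, and the sketch assumes higher scores mean "same speaker" and that scores contain no ties.

```python
import numpy as np

def compute_eer(scores, labels):
    """Approximate EER for similarity scores (label 1 = same speaker, 0 = different)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)

    # Sort trials from highest to lowest score; accepting the top-k trials
    # corresponds to sweeping the decision threshold from strict to lenient.
    order = np.argsort(-scores, kind="stable")
    labels = labels[order]

    n_target = labels.sum()
    n_nontarget = labels.size - n_target

    # Error rates after accepting 0, 1, ..., N trials.
    far = np.concatenate(([0.0], np.cumsum(~labels) / n_nontarget))
    frr = np.concatenate(([1.0], 1.0 - np.cumsum(labels) / n_target))

    # Pick the threshold where the two error rates are closest and average them.
    idx = np.argmin(np.abs(far - frr))
    return float((far[idx] + frr[idx]) / 2.0)
```

Multiplying the returned value by 100 gives a percentage comparable in form to the figures quoted in this README.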

Usage

As shown in Figure 3 of the ESPnet-SPK paper, a few lines of code are sufficient to extract RawNet3 embeddings. Refer to the code snippet below and replace np.zeros with your raw waveform.

  • ESPnet installation is a prerequisite
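import numpy as np
from espnet2.bin.spk_inference import Speech2Embedding

# Load the pre-trained RawNet3 model identified by this model tag.
speech2spk_embed = Speech2Embedding.from_pretrained(model_tag="espnet/voxcelebs12_rawnet3")
# np.zeros(16500) is a placeholder input; replace it with your raw waveform.
embedding = speech2spk_embed(np.zeros(16500))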
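
Once two embeddings have been extracted, a verification decision is commonly made by comparing them with cosine similarity. The sketch below is an illustration under assumptions, not part of ESPnet-SPK: `cosine_score` is a hypothetical helper, and it assumes the embeddings are already NumPy-compatible arrays (the return type of `Speech2Embedding` is not stated here; a PyTorch tensor would first need something like `.detach().cpu().numpy()`).

```python
import numpy as np

def cosine_score(emb_a, emb_b):
    """Cosine similarity between two embeddings (hypothetical helper)."""
    # Flatten so that shapes such as (1, D) and (D,) are treated alike.
    a = np.asarray(emb_a, dtype=float).ravel()
    b = np.asarray(emb_b, dtype=float).ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-in vectors; in practice these would be two extracted embeddings.
rng = np.random.default_rng(0)
score = cosine_score(rng.standard_normal(192), rng.standard_normal(192))
```

A higher score suggests the two utterances are more likely from the same speaker; the accept/reject threshold has to be tuned on held-out trials.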

The ESPnet-SPK paper is currently available on arXiv.

@article{jung2024espnet,
  title={ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models},
  author={Jung, Jee-weon and Zhang, Wangyou and Shi, Jiatong and Aldeneh, Zakaria and Higuchi, Takuya and Theobald, Barry-John and Abdelaziz, Ahmed Hussen and Watanabe, Shinji},
  journal={arXiv preprint arXiv:2401.17230},
  year={2024}
}

RawNet3

  • PyTorch implementation
  • Performance
    • supervised learning with AAM-Softmax: EER 0.89%
    • self-supervised learning: EER 5.40%
  • Training recipe
  • Inference
    • Pre-trained weight parameters are stored on HuggingFace and included as a submodule.
    • The Vox1-O benchmark is available in RawNet3.
    • Extracting a speaker embedding from any 16 kHz, 16-bit mono utterance is supported.
  • Published as a conference paper in Interspeech 2022.
@article{jung2022pushing,
  title={Pushing the limits of raw waveform speaker recognition},
  author={Jung, Jee-weon and Kim, You Jin and Heo, Hee-Soo and Lee, Bong-Jin and Kwon, Youngki and Chung, Joon Son},
  journal={Proc. Interspeech},
  year={2022}
}
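
The inference notes above state that input utterances should be 16 kHz, 16-bit mono. A minimal sketch of checking a WAV file against that format before embedding extraction, using only the standard-library `wave` module and NumPy, might look as follows. The function name `load_16k_mono_wav` is hypothetical, and scaling the samples to floats in [-1, 1) is an assumption about what the model expects rather than something this README specifies.

```python
import wave
import numpy as np

def load_16k_mono_wav(path):
    """Read a WAV file, verifying it is 16 kHz, 16-bit, mono (hypothetical helper)."""
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != 16000:
            raise ValueError(f"expected 16000 Hz, got {wav.getframerate()} Hz")
        if wav.getsampwidth() != 2:
            raise ValueError(f"expected 16-bit samples, got {8 * wav.getsampwidth()}-bit")
        if wav.getnchannels() != 1:
            raise ValueError(f"expected mono, got {wav.getnchannels()} channels")
        frames = wav.readframes(wav.getnframes())

    # WAV PCM data is little-endian signed 16-bit; scale to [-1, 1) (assumption).
    samples = np.frombuffer(frames, dtype="<i2")
    return samples.astype(np.float32) / 32768.0
```

Files in other formats would need resampling or channel mixing first, which this sketch deliberately leaves out.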

RawNet2_modified

  • Code refactoring
  • Performance
    • EER 1.91%
      • Trained using VoxCeleb2
      • VoxCeleb1 original trial
    • Will be used as a baseline system for the authors' future work

RawNet2

@article{jung2020improved,
  title={Improved RawNet with Feature Map Scaling for Text-independent Speaker Verification using Raw Waveforms},
  author={Jung, Jee-weon and Kim, Seung-bin and Shim, Hye-jin and Kim, Ju-ho and Yu, Ha-Jin},
  journal={Proc. Interspeech},
  pages={3583--3587},
  year={2020}
}

RawNet

@article{jung2019RawNet,
  title={RawNet: Advanced end-to-end deep neural network using raw waveforms for text-independent speaker verification},
  author={Jung, Jee-weon and Heo, Hee-soo and Kim, Ju-ho and Shim, Hye-jin and Yu, Ha-jin},
  journal={Proc. Interspeech},
  pages={1268--1272},
  year={2019}
}
