rawnet's Introduction

Overview

This GitHub project provides a PyTorch implementation for reproducing the experiments and DNN models used in the paper Improved RawNet with Feature Map Scaling for Text-independent Speaker Verification using Raw Waveforms, which has been submitted to Interspeech 2020 as a conference paper. The trained model is available at 'Pre-trained_model/rawnet2_best_weights.pt' and the extracted speaker embeddings are available at spk_embd/.

To reproduce the original RawNet paper, please refer to the 'RawNet1' folder.

Usage

Environment Setting

We conducted our experiments on NVIDIA GPU Cloud (NGC) using the 'nvcr.io/nvidia/pytorch:19.10-py3' image; refer to launch_ngc.sh. Training used two Titan V GPUs.

Training RawNet2

  1. Download the VoxCeleb1 and VoxCeleb2 datasets and move them to DB/.
    (Alternatively, pass the directories of your datasets as arguments using --DB DIR_TO_VOX1 and --DB_vox2 DIR_TO_VOX2.)
    A file tree will be added later for reference.

  2. (Optional) Enter the virtual environment using NGC.

  3. Run train_RawNet2.py -name NAME
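
As a rough illustration of how the steps above fit together, the sketch below assembles the training command from the flags mentioned in this section (-name, --DB, --DB_vox2). The experiment name and dataset paths are hypothetical placeholders, and invoking the script through the current Python interpreter is an assumption rather than something the repository prescribes.

```python
import shlex
import sys


def build_train_command(name, vox1_dir=None, vox2_dir=None):
    """Assemble the argv list for train_RawNet2.py.

    Only the flags named in this README are used; the dataset flags are
    omitted when the datasets already live under DB/.
    """
    cmd = [sys.executable, "train_RawNet2.py", "-name", name]
    if vox1_dir is not None:
        cmd += ["--DB", vox1_dir]
    if vox2_dir is not None:
        cmd += ["--DB_vox2", vox2_dir]
    return cmd


# Hypothetical experiment name and dataset locations.
cmd = build_train_command("my_experiment", "/data/VoxCeleb1", "/data/VoxCeleb2")
print(shlex.join(cmd))
# To actually launch training, pass the list to subprocess.run(cmd, check=True).
```

Building the command as a list avoids shell quoting issues when a dataset path contains spaces.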

Evaluating the Trained Model to reproduce the EER reported in the paper.

  1. Go into the Pre-trained_model folder.
  2. Download the extracted RawNet2 speaker embeddings for the VoxCeleb1 devset Here (too big to upload to GitHub).
  3. Move the downloaded speaker embeddings to spk_embd/
  4. Run evaluate_pretrained_RawNet2.py
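
For readers unfamiliar with the metric, the following is a minimal sketch of how an equal error rate (EER) can be computed from trial scores and target/non-target labels. It is not the code in evaluate_pretrained_RawNet2.py; the function name and the simple threshold sweep are assumptions made for illustration, and the quadratic-time loop is only suitable for small trial lists.

```python
def equal_error_rate(scores, labels):
    """Approximate the EER by sweeping thresholds over the observed scores.

    scores: similarity score per trial (higher means "same speaker").
    labels: 1 for target (same-speaker) trials, 0 for non-target trials.
    """
    targets = [s for s, l in zip(scores, labels) if l == 1]
    nontargets = [s for s, l in zip(scores, labels) if l == 0]
    if not targets or not nontargets:
        raise ValueError("need at least one target and one non-target trial")

    best_gap, eer = None, None
    for thr in sorted(set(scores)):
        # False acceptance: non-target trials scoring at or above the threshold.
        far = sum(s >= thr for s in nontargets) / len(nontargets)
        # False rejection: target trials scoring below the threshold.
        frr = sum(s < thr for s in targets) / len(targets)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy trials: two target and two non-target scores that are fully separated.
print(equal_error_rate([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))
```

The EER is taken at the threshold where the false acceptance and false rejection rates are closest, and reported as their average.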

Utilizing Extracted Speaker Embeddings.

We encourage using the extracted speaker embeddings for further speaker embedding enhancement studies or back-end studies, since the RawNet2 paper adopts simple cosine similarity for back-end classification.

Speaker embeddings are located under spk_embd/ and are saved using pickle. The pickled object is a dictionary:
Key: Utterance ID (Spk/videoID/segID). Value: Speaker embedding.
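
A minimal sketch of working with such a dictionary is shown below: it loads a pickled embedding file, splits an utterance ID into its speaker part, and scores two embeddings with cosine similarity. The helper names and the file name in the usage comment are hypothetical, and the embeddings are assumed to be one-dimensional sequences of floats (NumPy arrays would also work with this code).

```python
import math
import pickle


def load_embeddings(path):
    """Load a pickled dict mapping utterance ID -> speaker embedding."""
    with open(path, "rb") as f:
        return pickle.load(f)


def speaker_of(utt_id):
    """Return the speaker part of an utterance ID of the form Spk/videoID/segID."""
    return utt_id.split("/")[0]


def cosine_similarity(a, b):
    """Cosine similarity between two 1-D embeddings."""
    dot = sum(float(x) * float(y) for x, y in zip(a, b))
    norm_a = math.sqrt(sum(float(x) ** 2 for x in a))
    norm_b = math.sqrt(sum(float(y) ** 2 for y in b))
    return dot / (norm_a * norm_b)


# Usage (hypothetical file name; substitute the file you placed under spk_embd/):
# embeddings = load_embeddings("spk_embd/your_embedding_file.pk")
# utt_a, utt_b = list(embeddings)[:2]
# print(speaker_of(utt_a), speaker_of(utt_b),
#       cosine_similarity(embeddings[utt_a], embeddings[utt_b]))
```

Only unpickle files from a source you trust, since loading a pickle can execute arbitrary code.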

Email [email protected] for other details :-).

BibTeX

This repository provides the code for reproducing the papers below.

@article{jung2020improved,
  title={Improved RawNet with Feature Map Scaling for Text-independent Speaker Verification using Raw Waveforms},
  author={Jung, Jee-weon and Kim, Seung-bin and Shim, Hye-jin and Kim, Ju-ho and Yu, Ha-Jin},
  journal={arXiv preprint arXiv:2004.00526},
  year={2020}
}
@article{jung2019RawNet,
  title={RawNet: Advanced end-to-end deep neural network using raw waveforms for text-independent speaker verification},
  author={Jung, Jee-weon and Heo, Hee-soo and Kim, Ju-ho and Shim, Hye-jin and Yu, Ha-jin},
  journal={Proc. Interspeech 2019},
  pages={1268--1272},
  year={2019}
}

TO-DO

  1. Add comments to the code.
  2. Add the file tree of the datasets.

Log

  • 2020.04.01. : Initial commit
  • 2020.04.02. : Evaluate Pre-trained Model validated
  • 2020.04.02. : Evaluated training

Contributors

jungjee, kimho1wq
