SpeakerGAN

Introduction

This repository is an unofficial implementation of the SpeakerGAN paper by Mingming Huang ([email protected]) and Tiezheng Wang ([email protected]), with thanks to TongFeng for advice.

SpeakerGAN paper

SpeakerGAN: Speaker identification with conditional generative adversarial network, by Liyang Chen, Yifeng Liu, Wendong Xiao, Yingxue Wang, Haiyong Xie.

Usage

Step 1: VAD preprocessing.

$python vad.py filelist_with_absolute_path   # Saves a VAD-processed copy in the same directory, with '_vad' appended to the filename.
$cat filelist_with_absolute_path
/datasdc/librispeech/train-clean-100/458/126305/458-126305-0041.wav
/datasdc/librispeech/train-clean-100/4051/11218/4051-11218-0009.wav
/datasdc/librispeech/train-clean-100/7635/105409/7635-105409-0022.wav
.
.
.
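
For reference, the following is a minimal sketch of what the Step 1 pass could look like, assuming 16-bit mono PCM wav input (as in LibriSpeech) and webrtcvad in mode 3 (noted in the differences section below). The function and output naming are illustrative, not the repo's actual vad.py.

import sys
import wave
import webrtcvad

FRAME_MS = 30  # webrtcvad accepts 10, 20 or 30 ms frames

def keep_voiced(path):
    vad = webrtcvad.Vad(3)                         # mode 3 = most aggressive
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        pcm = wf.readframes(wf.getnframes())
        params = wf.getparams()
    frame_bytes = int(rate * FRAME_MS / 1000) * 2  # 2 bytes per 16-bit sample
    voiced = b"".join(
        pcm[i:i + frame_bytes]
        for i in range(0, len(pcm) - frame_bytes + 1, frame_bytes)
        if vad.is_speech(pcm[i:i + frame_bytes], rate)
    )
    out_path = path.replace(".wav", "_vad.wav")    # saved next to the input
    with wave.open(out_path, "wb") as out:
        out.setparams(params)                      # frame count is patched on close
        out.writeframes(voiced)

if __name__ == "__main__":
    with open(sys.argv[1]) as filelist:            # filelist_with_absolute_path
        for line in filelist:
            keep_voiced(line.strip())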

Step 2: train / test / generate:

python speakergan.py  # You may need to change the path to the VAD-preprocessed wav files.

Training took us about 65 hours on a single NVIDIA A100 (Ampere) GPU, with the help of a Redis cache.

Our results

Accuracy: 98.1955% on the test set, evaluated on the fixed first 1.6 seconds of each utterance with checkpoint model/2200_D.pkl.

Plots of test accuracy, loss_d / loss_g, and the learning rate schedule are included in the repository.

Generated samples from the Generator checkpoint model/2200_G.pkl (generated_feature plot) are also included.

Details of paper

The following are details about this paper.

================ input ==================

  1. feature: fbank, 8000 Hz, 25 ms frame, 10 ms shift. shape: (160, 64) (see the feature-extraction sketch after this list)

  2. dataset: LibriSpeech train-clean-100, 251 speakers (POIs)

  3. data preprocessing: VAD, mean and variance normalization, shuffling.

  4. 60% train / 40% test.
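
As referenced in item 1, here is a hedged sketch of the feature pipeline: 64-dim fbank at 8000 Hz, 25 ms frames with a 10 ms shift, mean/variance normalization, and truncation to 160 frames. torchaudio is an assumption here; the repo's actual frontend may differ.

import torch
import torchaudio

def extract_fbank(wav_path, num_frames=160):
    waveform, sr = torchaudio.load(wav_path)              # (1, num_samples)
    if sr != 8000:
        waveform = torchaudio.functional.resample(waveform, sr, 8000)
    feats = torchaudio.compliance.kaldi.fbank(
        waveform,
        sample_frequency=8000.0,
        frame_length=25.0,     # ms
        frame_shift=10.0,      # ms
        num_mel_bins=64,
    )                                                     # (T, 64) log-mel fbank
    feats = (feats - feats.mean(dim=0)) / (feats.std(dim=0) + 1e-8)
    return feats[:num_frames]                             # (160, 64) when T >= 160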

================ model architecture ==================

  1. dataflow: data -> feature extraction -> G & D

  2. model architecture:

    G: gated CNN, encoder-decoder, Huber loss + adversarial loss

    D: ResNet blocks, temporal average pooling, FC, softmax, cross-entropy loss + adversarial loss

  3. G: shuffler layer, GLU

  4. D: ReLU (see the block sketch after this list)
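
The block sketch referenced above, in PyTorch: a gated-CNN layer with GLU for the generator and a plain ReLU residual block for the discriminator. Channel counts, kernel sizes and the use of BatchNorm are assumptions, not the paper's exact configuration.

import torch
import torch.nn as nn

class GatedConv2d(nn.Module):
    """Conv layer whose output is gated by GLU (generator side)."""
    def __init__(self, in_ch, out_ch, kernel=3):
        super().__init__()
        # produce 2*out_ch channels so GLU can gate them back down to out_ch
        self.conv = nn.Conv2d(in_ch, 2 * out_ch, kernel, padding=kernel // 2)

    def forward(self, x):
        return nn.functional.glu(self.conv(x), dim=1)

class ResBlock(nn.Module):
    """Residual block with ReLU activations (discriminator side)."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return torch.relu(x + self.body(x))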

================ training ==================

  1. lr: 0.0005 for epochs 0-9, 0.0002 for epochs 9-49

  2. L(D): λ1 = λ2 = 1

  3. batch_size: 128 (differs from the paper)

  4. epochs: 2200 (differs from the paper)

  5. D train steps : G train steps = 4 : 1

  6. L_adv loss: label smoothing, real label 1 -> 0.7 ~ 1.0, fake label 0 -> 0.0 ~ 0.3 (see the training sketch after this list)
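
The training sketch referenced in item 6 covers only the adversarial part of the schedule: four D updates per G update and uniform label smoothing on the adversarial targets. The Huber and identification (cross-entropy) terms are omitted, and all names (D, G, d_opt, g_opt, g_input) are placeholders rather than the repo's actual objects; D is assumed to end in a sigmoid so BCELoss applies.

import torch

D_STEPS_PER_G = 4
adv_loss = torch.nn.BCELoss()

def smoothed_labels(batch_size, real):
    # real label 1 -> U(0.7, 1.0), fake label 0 -> U(0.0, 0.3)
    lo, hi = (0.7, 1.0) if real else (0.0, 0.3)
    return torch.empty(batch_size, 1).uniform_(lo, hi)

def train_step(D, G, d_opt, g_opt, real_feats, g_input, step):
    # discriminator update (every step)
    d_opt.zero_grad()
    fake_feats = G(g_input).detach()
    d_loss = adv_loss(D(real_feats), smoothed_labels(len(real_feats), True)) \
           + adv_loss(D(fake_feats), smoothed_labels(len(fake_feats), False))
    d_loss.backward()
    d_opt.step()

    # generator update (every D_STEPS_PER_G-th step)
    if step % D_STEPS_PER_G == 0:
        g_opt.zero_grad()
        fake_feats = G(g_input)
        # the generator tries to make D score its output as real
        g_loss = adv_loss(D(fake_feats), torch.ones(len(fake_feats), 1))
        g_loss.backward()
        g_opt.step()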

======== uncertainties or differences from the paper ========

  1. weight/bias initialization: we use xavier_uniform for weights and zeros for biases.

  2. we use PyTorch's built-in Huber loss.

  3. for shorter wavs, the paper pads with zeros; we pad by repeating the feature (see the padding sketch after this list).

  4. the exact gated CNN architecture.

  5. we use webrtcvad mode 3 for VAD preprocessing.

  6. Paper error 1: we think the paper is missing a plus sign in formula (5).

  7. Paper error 2: we think the structure of conv6 in the Generator is wrong; the output channel count should be 64.
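
The padding sketch referenced in item 3, assuming (T, 64) fbank matrices and a 160-frame target; the helper name is illustrative.

import torch

def pad_to_length(feats, target=160):
    """feats: (T, 64) fbank matrix; returns a (target, 64) matrix."""
    if len(feats) >= target:
        return feats[:target]
    # paper-style alternative: torch.nn.functional.pad(feats, (0, 0, 0, target - len(feats)))
    repeats = -(-target // len(feats))          # ceil division
    return feats.repeat(repeats, 1)[:target]    # this repo: repeat the feature, then crop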
