pytorch-speech-commands

Convolutional neural networks for Google speech commands data set with PyTorch.

General

We, xuyuan and tugstugi, participated in the Kaggle competition TensorFlow Speech Recognition Challenge and finished in 10th place. This repository contains a simplified and cleaned-up version of our team's code.

Features

  • 1x32x32 mel-spectrogram as network input
  • a single network implementation for both the CIFAR10 and Google speech commands data sets
  • faster audio data augmentation on STFT
  • Kaggle private LB scores evaluated on 150,000+ audio files

Results

Because of the competition's time limit, we trained most of the nets with SGD using ReduceLROnPlateau for 70 epochs. For the training parameters and dependencies, see TRAINING.md. Stopping training earlier sometimes produces a better Kaggle score.
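To illustrate the reduce-on-plateau idea mentioned above, here is a minimal, framework-free sketch of the rule: the learning rate is multiplied by a factor once the validation loss has failed to improve for more than `patience` epochs. The class name `PlateauLR` and the values of `factor`, `patience`, and `min_lr` are hypothetical and chosen for illustration only; the settings actually used are in TRAINING.md, and real training would use PyTorch's own `torch.optim.lr_scheduler.ReduceLROnPlateau`.

```python
class PlateauLR:
    """Simplified reduce-on-plateau rule (hypothetical helper, not the PyTorch class)."""

    def __init__(self, lr, factor=0.1, patience=5, min_lr=1e-6):
        self.lr = lr
        self.factor = factor          # multiplier applied on a plateau (assumed value)
        self.patience = patience      # epochs without improvement to tolerate (assumed value)
        self.min_lr = min_lr          # lower bound for the learning rate
        self.best = float("inf")      # best validation loss seen so far
        self.bad_epochs = 0           # consecutive epochs without improvement

    def step(self, val_loss):
        """Record one epoch's validation loss and return the learning rate to use next."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs > self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.bad_epochs = 0
        return self.lr


# Made-up validation losses: the loss stalls at 0.8, which triggers one reduction.
scheduler = PlateauLR(lr=0.01, factor=0.1, patience=2)
losses = [1.0, 0.8, 0.8, 0.8, 0.8, 0.7]
history = [scheduler.step(loss) for loss in losses]
```

In this toy run the rate stays at its initial value until the third consecutive epoch without improvement, then drops by the chosen factor.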

| Model | CIFAR10 test set accuracy | Speech Commands test set accuracy | Speech Commands test set accuracy with crop | Speech Commands Kaggle private LB score | Speech Commands Kaggle private LB score with crop | Remarks |
|---|---|---|---|---|---|---|
| VGG19 BN | 93.56% | 97.337235% | 97.527432% | 0.87454 | 0.88030 | |
| ResNet32 | - | 96.181419% | 96.196050% | 0.87078 | 0.87419 | |
| WRN-28-10 | - | 97.937089% | 97.922458% | 0.88546 | 0.88699 | |
| WRN-28-10-dropout | 96.22% | 97.702999% | 97.717630% | 0.89580 | 0.89568 | |
| WRN-52-10 | - | 98.039503% | 97.980980% | 0.88159 | 0.88323 | another trained model has 97.52%/0.89322 |
| ResNext29 8x64 | - | 97.190929% | 97.161668% | 0.89533 | 0.89733 | our best model during competition |
| DPN92 | - | 97.190929% | 97.249451% | 0.89075 | 0.89286 | |
| DenseNet-BC (L=100, k=12) | 95.52% | 97.161668% | 97.147037% | 0.88946 | 0.89134 | |
| DenseNet-BC (L=190, k=40) | - | 97.117776% | 97.147037% | 0.89369 | 0.89521 | |

Results with Mixup

After the competition, some of the networks were retrained using mixup: Beyond Empirical Risk Minimization by Hongyi Zhang, Moustapha Cisse, Yann N. Dauphin and David Lopez-Paz.
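As a rough illustration of what mixup does, the sketch below blends two training examples and their one-hot labels with a weight drawn from a Beta(alpha, alpha) distribution, following the formulation in the mixup paper. The function name `mixup_pair`, the toy inputs, and `alpha=0.2` are assumptions for illustration; the README does not state which alpha or which implementation was used for the retrained networks.

```python
import random


def mixup_pair(x1, y1, x2, y2, alpha=0.2, rng=random):
    """Blend two (input, one-hot label) examples with a Beta(alpha, alpha) weight.

    Hypothetical helper: alpha=0.2 is an assumed value, not taken from this repository.
    """
    lam = rng.betavariate(alpha, alpha)
    # The same convex combination is applied to the inputs and to the labels.
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


# Toy example: two 3-value "inputs" with two-class one-hot labels.
rng = random.Random(0)  # seeded only to make the sketch repeatable
x, y, lam = mixup_pair([1.0, 0.0, 2.0], [1.0, 0.0],
                       [0.0, 4.0, 2.0], [0.0, 1.0],
                       alpha=0.2, rng=rng)
```

The mixed label `y` is no longer one-hot but still sums to 1, so it can be used with a loss that accepts soft targets.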

| Model | CIFAR10 test set accuracy | Speech Commands test set accuracy | Speech Commands test set accuracy with crop | Speech Commands Kaggle private LB score | Speech Commands Kaggle private LB score with crop | Remarks |
|---|---|---|---|---|---|---|
| VGG19 BN | - | 97.483541% | 97.542063% | 0.89521 | 0.89839 | |
| WRN-52-10 | - | 97.454279% | 97.498171% | 0.90273 | 0.90355 | same score as the 16th place in Kaggle |
