project-acoustic-scene-classification-dcase's Introduction

DL-DIY potential project ideas

  • check out challenge description [here]
  • take into consideration information of different sensors during training, e.g. [residual normalization]
  • train a larger network and then compress it into a smaller one with pruning, distillation or lottery ticket hypothesis, [example]
  • train over multiple domains using adversarial domain adaptation [ref]
  • train network in a multi-task manner, e.g., adding new granularity of classes [ref]

Separable convolutions and test-time augmentations for low-complexity and calibrated acoustic scene classification (DCASE21 Challenge)

Gilles Puy, Himalaya Jain, Andrei Bursuc
valeo.ai, Paris, France

This repo contains the code to reproduce the results of the systems we submitted to Task1a of the DCASE21 challenge. Please refer to link1 and link2 for more information about the challenge.

If you find this code useful, please cite our technical report:

@techreport{vai21dcase,
  title={Separable convolutions and test-time augmentations for low-complexity and calibrated acoustic scene classification},
  author={Puy, Gilles and Jain, Himalaya and Bursuc, Andrei},
  institution={{DCASE2021 Challenge}},
  year={2021},
}

Preparation

Environment

  • Python >= 3.7
  • CUDA >= 10.2
$ pip install torch===1.7.1 torchvision===0.8.2 torchaudio===0.7.2 -f https://download.pytorch.org/whl/torch_stable.html
$ pip install tqdm scikit-learn tensorboard pandas pyaml torchlibrosa
$ apt install -y libsndfile1

To help you re-create this environment, we also provide the dockerfile used to run the experiments in /path/to/SP4ASC/Dockerfile.

Installation

  1. Clone the repo:
$ git clone https://github.com/valeoai/SP4ASC
  2. Optional. Install this repository:
$ pip install -e /path/to/SP4ASC

With this editable install, you can edit the code on the fly and import the functions and classes of sp4asc in other projects as well.

  3. If needed, you can uninstall this package by typing:
$ pip uninstall sp4asc

DCASE21 Datasets

If not already done, please download the development and evaluation datasets from here.

We suppose that these datasets are stored in /path/to/SP4ASC/data, which should thus contain the following sub-directories:

/path/to/SP4ASC/data/TAU-urban-acoustic-scenes-2020-mobile-development/ 
/path/to/SP4ASC/data/TAU-urban-acoustic-scenes-2021-mobile-evaluation/

If the datasets are stored elsewhere on your system, you can create soft links as follows:

$ mkdir /path/to/SP4ASC/data/
$ ln -s /path/to/TAU-urban-acoustic-scenes-2020-mobile-development/ /path/to/SP4ASC/data/
$ ln -s /path/to/TAU-urban-acoustic-scenes-2021-mobile-evaluation/ /path/to/SP4ASC/data/
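
Before running the scripts, it can help to confirm that the layout described above is in place. The following is a minimal sketch using only the standard library; the helper name `missing_datasets` is hypothetical and is not part of sp4asc.

```python
from pathlib import Path

# Sub-directories expected under /path/to/SP4ASC/data (names taken from above).
EXPECTED = (
    "TAU-urban-acoustic-scenes-2020-mobile-development",
    "TAU-urban-acoustic-scenes-2021-mobile-evaluation",
)


def missing_datasets(data_root):
    """Return the expected sub-directories that are absent from data_root.

    Path.is_dir() follows symbolic links, so soft-linked datasets count as present.
    """
    root = Path(data_root)
    return [name for name in EXPECTED if not (root / name).is_dir()]


if __name__ == "__main__":
    # Replace with the actual location of your clone.
    print(missing_datasets("/path/to/SP4ASC/data"))
```

An empty list means both datasets were found.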

Running the code

Testing

Our trained models are available in /path/to/SP4ASC/trained_models/.

  1. The results of the model trained with cross entropy and without mixup can be reproduced by typing:
$ cd /path/to/SP4ASC/
$ python test.py --config configs/cnn6_small_dropout_2_specAugment_128_2_32_2.py --nb_aug 30
  2. The results of the model trained with cross entropy and mixup can be reproduced by typing:
$ cd /path/to/SP4ASC/
$ python test.py --config configs/cnn6_small_dropout_2_specAugment_128_2_16_2_mixup_2.py --nb_aug 30
  3. The results of the model trained with focal loss will be made available.

These models were trained and saved using 32-bit floats. In test.py, each model is compressed by combining each convolutional layer with its batchnorm layer, which brings it to 62,474 parameters, and is then quantized to 16-bit floats.
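
To illustrate the two compression steps mentioned above, here is a framework-free sketch of folding an inference-mode batchnorm into the preceding convolution and then rounding the folded parameters to 16-bit floats. It is not the code used in test.py: the function names, the toy layer sizes and the `eps` value are assumptions made for this example.

```python
import math
import struct


def fold_batchnorm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold an inference-mode batchnorm into the preceding convolution.

    weights[o] is the flattened filter of output channel o; the other
    arguments hold one value per output channel.
    """
    folded_w, folded_b = [], []
    for o, filt in enumerate(weights):
        scale = gamma[o] / math.sqrt(var[o] + eps)
        folded_w.append([w * scale for w in filt])
        folded_b.append((bias[o] - mean[o]) * scale + beta[o])
    return folded_w, folded_b


def to_float16(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def conv1x1(x, weights, bias):
    """Apply a 1x1 convolution to the channel vector x of one time-frequency bin."""
    return [sum(w * xi for w, xi in zip(filt, x)) + b for filt, b in zip(weights, bias)]


# Toy layer (assumed values): 2 input channels, 2 output channels.
W = [[0.5, -1.0], [2.0, 0.25]]
b = [0.1, -0.2]
gamma, beta = [1.5, 0.8], [0.0, 0.3]
mean, var = [0.2, -0.1], [4.0, 0.25]

x = [1.0, 2.0]
# Reference: convolution followed by batchnorm in inference mode.
reference = [
    (y - mean[o]) / math.sqrt(var[o] + 1e-5) * gamma[o] + beta[o]
    for o, y in enumerate(conv1x1(x, W, b))
]
# Folded layer: a single convolution, no batchnorm parameters left.
W_f, b_f = fold_batchnorm(W, b, gamma, beta, mean, var)
folded = conv1x1(x, W_f, b_f)

# Store the folded parameters as 16-bit floats.
W_h = [[to_float16(w) for w in filt] for filt in W_f]
b_h = [to_float16(v) for v in b_f]
```

After folding, the batchnorm scale, shift and running statistics no longer need to be stored, which is why the parameter count drops; the 16-bit rounding then halves the storage per remaining parameter at the cost of a small numerical error.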

Training

A script to train a model with cross-entropy and mixup is available at /path/to/SP4ASC/train.py. This script should be called with a config file that sets the training parameters. An example of such a file is given in configs/example.py.
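
For readers unfamiliar with mixup, the idea can be sketched as follows. This is a generic illustration and not the code in train.py; the function name and the default `alpha=0.2` are assumptions chosen for the example.

```python
import random


def mixup(x1, y1, x2, y2, alpha=0.2):
    """Blend two examples and their one-hot labels with a Beta(alpha, alpha) weight.

    alpha=0.2 is an assumed default, not a value taken from the sp4asc configs.
    """
    lam = random.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


random.seed(0)
x, y, lam = mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0])
print(lam, x, y)
```

The network is then trained on the blended input `x` against the soft label `y`.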

One can train a model by typing:

$ cd /path/to/SP4ASC/
$ python train.py --config configs/example.py

Once trained, this model can be evaluated by typing:

$ cd /path/to/SP4ASC/
$ python test.py --config configs/example.py --nb_aug 10

The argument XX after --nb_aug defines the number of augmentations done at test time. Set XX to 0 to remove all augmentations.
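
The role of --nb_aug can be pictured with the sketch below, which averages class probabilities over the input and a number of augmented copies. It is an illustration only: `toy_model` and `random_shift` are hypothetical stand-ins, and scoring the unmodified input alongside the augmented copies, as well as averaging probabilities rather than logits, are assumptions rather than a description of test.py.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into probabilities (numerically stable)."""
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_tta(model, augment, x, nb_aug):
    """Average class probabilities over x and nb_aug augmented copies of x.

    With nb_aug == 0 only the unmodified input is scored.
    """
    views = [x] + [augment(x) for _ in range(nb_aug)]
    probs = [softmax(model(v)) for v in views]
    n_classes = len(probs[0])
    return [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]


def toy_model(x):
    """Hypothetical stand-in for a network: two scores from the two halves of x."""
    half = len(x) // 2
    return [sum(x[:half]), sum(x[half:])]


def random_shift(x):
    """Hypothetical augmentation: circular shift by a random offset."""
    k = random.randrange(len(x))
    return x[k:] + x[:k]


random.seed(0)
signal = [0.9, 0.1, 0.0, 0.2]
print(predict_with_tta(toy_model, random_shift, signal, nb_aug=0))
print(predict_with_tta(toy_model, random_shift, signal, nb_aug=10))
```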

Retraining the provided models

The training parameters used for each of the provided models can be found in /path/to/SP4ASC/configs.

  1. The model trained with cross entropy and without mixup can be retrained by typing:
$ cd /path/to/SP4ASC/
$ python train.py --config configs/cnn6_small_dropout_2_specAugment_128_2_32_2.py

BEWARE: This will erase the provided checkpoint in the directory /path/to/SP4ASC/trained_models/configs.cnn6_small_dropout_2_specAugment_128_2_32_2! This can be avoided by making a copy of configs/cnn6_small_dropout_2_specAugment_128_2_32_2.py, editing the field -out_dir in the copied file, and using this new file after the argument --config.

  2. The model trained with cross entropy and with mixup can be retrained by typing:
$ cd /path/to/SP4ASC/
$ python train.py --config configs/cnn6_small_dropout_2_specAugment_128_2_16_2_mixup_2.py

BEWARE: This will erase the provided checkpoint in the directory /path/to/SP4ASC/trained_models/configs.cnn6_small_dropout_2_specAugment_128_2_16_2_mixup_2! This can be avoided by making a copy of configs/cnn6_small_dropout_2_specAugment_128_2_16_2_mixup_2.py, editing the field -out_dir in the copied file, and using this new file after the argument --config.

  3. The pre-trained model with focal loss will be made available.

Using sp4asc model

You can reuse sp4asc in your own project by first installing this package (see above). Then, import the model by typing

from sp4asc.models import Cnn6_60k

The constructor takes two arguments: dropout, a scalar indicating the dropout rate, and spec_aug, a list of the form [T, n, F, m], where T is the size of the mask on the time axis, n the number of masks on the time axis, F the size of the mask on the frequency axis, and m the number of masks on the frequency axis.

net = Cnn6_60k(dropout=0.2, spec_aug=[128, 2, 16, 2])
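
To make the meaning of [T, n, F, m] concrete, here is a pure-Python sketch of SpecAugment-style masking on a small spectrogram stored as a list of time frames. It is not the implementation used inside Cnn6_60k: the function names are hypothetical, and sampling each mask width uniformly below the given size and filling masked cells with zeros are assumptions.

```python
import random


def mask_stripes(spec, axis, max_width, num_masks, rng=random):
    """Zero out num_masks random stripes of width < max_width along one axis.

    spec is a list of time frames, each a list of frequency bins;
    axis 0 masks time frames, axis 1 masks frequency bins.
    """
    n_time, n_freq = len(spec), len(spec[0])
    size = n_time if axis == 0 else n_freq
    out = [row[:] for row in spec]  # copy so the input is left untouched
    for _ in range(num_masks):
        width = rng.randrange(min(max_width, size))  # 0 <= width < max_width
        start = rng.randrange(size - width)
        for t in range(n_time):
            for f in range(n_freq):
                idx = t if axis == 0 else f
                if start <= idx < start + width:
                    out[t][f] = 0.0
    return out


def spec_augment(spec, spec_aug):
    """Apply time masks, then frequency masks, with spec_aug = [T, n, F, m]."""
    T, n, F, m = spec_aug
    return mask_stripes(mask_stripes(spec, 0, T, n), 1, F, m)


random.seed(0)
spec = [[1.0] * 8 for _ in range(20)]  # 20 time frames, 8 frequency bins
augmented = spec_augment(spec, [4, 2, 3, 2])
for frame in augmented:
    print("".join("#" if v else "." for v in frame))
```

With the values used above, spec_aug=[128, 2, 16, 2] would correspond to two time masks of size 128 and two frequency masks of size 16.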

Acknowledgements

Our architecture is based on CNN6 which is described in PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition. We modified the original CNN6 architecture by using separable convolutions and changing the number of channels per layer to meet the complexity constraints of the DCASE21 Task1a challenge.
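
As a rough indication of why separable convolutions help with a parameter budget, the sketch below compares parameter counts for a standard convolution and for a depthwise convolution followed by a 1 x 1 pointwise convolution. This depthwise-plus-pointwise factorization is an assumption about what "separable" means here, and the layer sizes are illustrative, not taken from the sp4asc architecture.

```python
def conv_params(c_in, c_out, k, bias=False):
    """Parameters of a standard k x k convolution."""
    return c_in * c_out * k * k + (c_out if bias else 0)


def separable_conv_params(c_in, c_out, k, bias=False):
    """Parameters of a depthwise k x k convolution followed by a 1 x 1 pointwise one."""
    depthwise = c_in * k * k + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


# Illustrative layer sizes (assumed for this example).
print(conv_params(64, 128, 3))            # 73728
print(separable_conv_params(64, 128, 3))  # 8768
```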

The original implementation of CNN6 without separable convolutions is available here. We are grateful to the authors for providing this implementation.

We are also grateful to the authors of torchlibrosa, which we use to compute the log-mel spectrograms.

License

The repository is released under the Apache 2.0 license.
