
Audio Classification

This project provides several approaches for training/fine-tuning an audio gender recognition model. The code can be reused for any other audio classification task by changing the number of classes and the input dataset.

Dataset format

The dataset should be a CSV file with two columns: audio_path and label.

                                          audio_path   label
0  /home/ai/projects/speech/dataset/asr/new-raw-0.wav  female
1  /home/ai/projects/speech/dataset/asr/samples_1.wav  male
2  /home/ai/projects/speech/dataset/asr/new-raw-2.wav  female
3  /home/ai/projects/speech/dataset/asr/new-raw-3.wav  male
4  /home/ai/projects/speech/dataset/asr/new-raw-4.wav  female
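For reference, a minimal sketch of producing a CSV in this format with pandas; the paths and output file name below are hypothetical examples, not files from this repository.

```python
# Minimal sketch: build the two-column CSV expected by the training scripts.
# The audio paths and output file name are hypothetical.
import pandas as pd

df = pd.DataFrame(
    {
        "audio_path": [
            "/data/audio/sample_0.wav",
            "/data/audio/sample_1.wav",
        ],
        "label": ["female", "male"],
    }
)
df.to_csv("dataset.csv", index=False)  # columns: audio_path, label
```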

Models

  1. LSTM_Model: trains an LSTM on MFCC features for audio classification, using PyTorch Lightning.
    1. The structure is adapted from the LearnedVector repository, which contains a wake-word model.
  2. transformer_scratch: trains an audio classification model built from a Transformer block, with MFCCs as inputs, using PyTorch Lightning.
    1. The main implementation is taken from AnubhavGupta3377's Text-Classification-Models-Pytorch repository.
    2. It is modified to train on audio samples.
  3. wav2vec2: fine-tunes wav2vec2-base as an audio classification model using the Hugging Face Trainer (a minimal sketch follows below).
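The wav2vec2 route can be sketched as follows. This is not the repository's actual training script, just a hedged illustration of fine-tuning wav2vec2-base with the Hugging Face Trainer; the CSV name, label mapping, 4-second clip length, and hyperparameters are assumptions.

```python
# Minimal sketch (assumptions: dataset.csv with audio_path/label columns,
# a binary female/male mapping, audio clipped/padded to 4 seconds at 16 kHz).
import pandas as pd
import torchaudio
from datasets import Dataset
from transformers import (AutoFeatureExtractor, AutoModelForAudioClassification,
                          Trainer, TrainingArguments)

feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/wav2vec2-base")
model = AutoModelForAudioClassification.from_pretrained("facebook/wav2vec2-base",
                                                        num_labels=2)
label2id = {"female": 0, "male": 1}

def preprocess(example):
    # Load, downmix to mono, and resample to the 16 kHz expected by wav2vec2-base.
    wav, sr = torchaudio.load(example["audio_path"])
    wav = torchaudio.functional.resample(wav.mean(dim=0), sr, 16_000)
    feats = feature_extractor(wav.numpy(), sampling_rate=16_000,
                              max_length=4 * 16_000, padding="max_length",
                              truncation=True)
    return {"input_values": feats["input_values"][0],
            "label": label2id[example["label"]]}

df = pd.read_csv("dataset.csv")  # columns: audio_path, label
ds = Dataset.from_pandas(df).map(preprocess,
                                 remove_columns=["audio_path", "label"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="wav2vec2-gender",
                           per_device_train_batch_size=4, num_train_epochs=3),
    train_dataset=ds,
)
trainer.train()
```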

Results on Gender Recognition

Trained and evaluated on a custom dataset. You can download the Common Voice dataset and use its samples instead.

Model         Train Acc (%)   Val Acc (%)   Train F1 (%)   Val F1 (%)
LSTM          89              90            90.83          91
Wav2vec2      -               96.4          -              96.4
Transformer   85.1            81.7          87.1           84.6
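For reference, a minimal sketch of how accuracy and F1 percentages like those above can be computed with scikit-learn; the labels and predictions below are made up, and the table's F1 averaging choice is not stated here.

```python
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical labels/predictions (0 = female, 1 = male); not from the repo.
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

print(f"Accuracy: {accuracy_score(y_true, y_pred) * 100:.1f}%")
# Binary F1 for the positive class; swap in average="macro" for macro-F1.
print(f"F1-score: {f1_score(y_true, y_pred) * 100:.1f}%")
```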

References:

  1. https://github.com/LearnedVector/A-Hackers-AI-Voice-Assistant
  2. https://github.com/huggingface/transformers
  3. https://github.com/AnubhavGupta3377/Text-Classification-Models-Pytorch
  4. https://pytorch.org/tutorials/beginner/transformer_tutorial.html
  5. https://github.com/pooya-mohammadi/deep_utils


audio-classification-pytorch's Issues

IndexError: index -1 is out of bounds for dimension 1 with size 0

Could you please tell me how I should modify the training file when replacing the binary classification task with a multi-class classification task? Thank you for open-sourcing this project.

python train.py
Sanity Checking DataLoader 0: 0%| | 0/2 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/sty/audio/train.py", line 147, in <module>
    main()
  File "/sty/audio/train.py", line 135, in main
    trainer.fit(model=lit_model, train_dataloaders=train_loader, val_dataloaders=val_loader)
  File "/root/anaconda3/envs/audio/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 608, in fit
    call._call_and_handle_interrupt(
  File "/root/anaconda3/envs/audio/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 38, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/root/anaconda3/envs/audio/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 650, in _fit_impl
    self._run(model, ckpt_path=self.ckpt_path)
  File "/root/anaconda3/envs/audio/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1103, in _run
    results = self._run_stage()
  File "/root/anaconda3/envs/audio/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1182, in _run_stage
    self._run_train()
  File "/root/anaconda3/envs/audio/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1195, in _run_train
    self._run_sanity_check()
  File "/root/anaconda3/envs/audio/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1267, in _run_sanity_check
    val_loop.run()
  File "/root/anaconda3/envs/audio/lib/python3.10/site-packages/pytorch_lightning/loops/loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "/root/anaconda3/envs/audio/lib/python3.10/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 152, in advance
    dl_outputs = self.epoch_loop.run(self._data_fetcher, dl_max_batches, kwargs)
  File "/root/anaconda3/envs/audio/lib/python3.10/site-packages/pytorch_lightning/loops/loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "/root/anaconda3/envs/audio/lib/python3.10/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 137, in advance
    output = self._evaluation_step(**kwargs)
  File "/root/anaconda3/envs/audio/lib/python3.10/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 234, in _evaluation_step
    output = self.trainer._call_strategy_hook(hook_name, *kwargs.values())
  File "/root/anaconda3/envs/audio/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1485, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/root/anaconda3/envs/audio/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 390, in validation_step
    return self.model.validation_step(*args, **kwargs)
  File "/sty/audio/train.py", line 62, in validation_step
    return self.get_step_metrics(batch)
  File "/sty/audio/train.py", line 36, in get_step_metrics
    logits = self.model(mfcc)
  File "/root/anaconda3/envs/audio/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/sty/audio/model/transformer_cls.py", line 37, in forward
    final_feature_map = encoded_features[:, -1, :]
IndexError: index -1 is out of bounds for dimension 1 with size 0
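For context, the error indicates that dimension 1 (the time dimension) of encoded_features is empty, so there is no last time step to select. A minimal reproduction and a hedged guard follow; the tensor shapes and error message are illustrative, not taken from the repository's code.

```python
import torch

encoded_features = torch.zeros(8, 0, 64)  # batch=8, zero time steps, 64 features
# encoded_features[:, -1, :]  # would raise:
# IndexError: index -1 is out of bounds for dimension 1 with size 0

# Defensive check before selecting the last time step:
if encoded_features.size(1) == 0:
    raise ValueError("Encoder produced zero time steps; check that the MFCC "
                     "sequence for this sample is not empty or over-truncated.")
final_feature_map = encoded_features[:, -1, :]
```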
