yeyupiaoling / audioclassification-paddlepaddle

82.0 4.0 14.0 588 KB

Audio classification implemented with PaddlePaddle, supporting models such as EcapaTdnn, PANNS, TDNN, Res2Net, and ResNetSE, along with several preprocessing methods.

License: Apache License 2.0

Python 99.40% Shell 0.60%

paddlepaddle librosa urbansound8k audio-classification ecapa-tdnn res2net tdnn panns resnet-se

audioclassification-paddlepaddle's Introduction

Hello, developers!

Core projects

Project type	Pytorch version	PaddlePaddle version
Speech recognition	MASR	PPASR
Voiceprint recognition	VoiceprintRecognition-Pytorch	VoiceprintRecognition-PaddlePaddle
Sound classification	AudioClassification-Pytorch	AudioClassification-PaddlePaddle
Speech emotion recognition	SpeechEmotionRecognition-Pytorch	SpeechEmotionRecognition-PaddlePaddle
Speech synthesis	VITS-Pytorch	VITS-PaddlePaddle

Speech projects

Speech recognition with PaddlePaddle dynamic graph: PPASR
Speech recognition with Pytorch: MASR
Fine-tuning the Whisper model and accelerating inference: Whisper-Finetune
Speech recognition with PaddlePaddle static graph: PaddlePaddle-DeepSpeech
Sound classification with Pytorch: AudioClassification-Pytorch
Sound classification with PaddlePaddle: AudioClassification-PaddlePaddle
Voiceprint recognition with PaddlePaddle: VoiceprintRecognition-PaddlePaddle
Voiceprint recognition with Pytorch: VoiceprintRecognition-Pytorch
Voiceprint recognition with Tensorflow: VoiceprintRecognition-Tensorflow
Voiceprint recognition with Keras: VoiceprintRecognition-Keras
Speech emotion recognition with PaddlePaddle: SpeechEmotionRecognition-PaddlePaddle
Speech emotion recognition with Pytorch: SpeechEmotionRecognition-Pytorch
VITS speech synthesis with PaddlePaddle: VITS-PaddlePaddle
VITS speech synthesis with Pytorch: VITS-Pytorch

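Vision projects

Face recognition with PaddlePaddle: PaddlePaddle-MobileFaceNets
Face recognition with Pytorch: Pytorch-MobileFaceNet
SSD object detection model with PaddlePaddle: PaddlePaddle-SSD
MTCNN face keypoint detection model with Pytorch: Pytorch-MTCNN
MTCNN face keypoint detection model with PaddlePaddle: PaddlePaddle-MTCNN
CRNN text recognition model with PaddlePaddle: PaddlePaddle-CRNN
CrowdNet crowd density model with PaddlePaddle: PaddlePaddle-CrowdNet
Age and gender recognition with MXNET: Age-Gender-MXNET
Deploying image classification models on Android with the Tensorflow Lite, Paddle Lite, MNN, and TNN frameworks: ClassificationForAndroid
PP-YOLOE model with PaddlePaddle: PP-YOLOE
Face detection, mask recognition, and keypoint detection models deployed on Android: FaceKeyPointsMask
Semantic segmentation model deployed on Android to replace a person's background: ChangeHumanBackground
Face recognition with Tensorflow: Tensorflow-FaceRecognition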

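Tutorial series

PaddlePaddle V2 tutorial series: LearnPaddle
PaddlePaddle Fluid tutorial series: LearnPaddle2

Book source code

Source code for 《PaddlePaddle从入门到实战》 (PaddlePaddle: From Beginner to Practice): PaddlePaddleCourse
Source code for 《深度学习应用实战之PaddlePaddle》 (Deep Learning Applications in Practice with PaddlePaddle): BookSource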

audioclassification-paddlepaddle's Issues

Error during training

The output is as follows:

-----------  Configuration Arguments -----------
batch_size: 32
gpus: 0
input_shape: (None, 1, 128, 128)
learning_rate: 0.001
num_classes: 10
num_epoch: 50
num_workers: 4
save_model: models/
test_list_path: dataset/test_list.txt
train_list_path: dataset/train_list.txt
------------------------------------------------
W1230 03:44:49.589563  1696 device_context.cc:404] Please NOTE: device: 0, GPU Compute Capability: 6.0, Driver API Version: 11.2, Runtime API Version: 10.2
W1230 03:44:49.593922  1696 device_context.cc:422] device: 0, cuDNN Version: 7.6.
-------------------------------------------------------------------------------
   Layer (type)         Input Shape          Output Shape         Param #    
===============================================================================
     Conv2D-1        [[1, 1, 128, 128]]    [1, 64, 64, 64]         3,136     
   BatchNorm2D-1     [[1, 64, 64, 64]]     [1, 64, 64, 64]          256      
      ReLU-1         [[1, 64, 64, 64]]     [1, 64, 64, 64]           0       
    MaxPool2D-1      [[1, 64, 64, 64]]     [1, 64, 32, 32]           0       
     Conv2D-2        [[1, 64, 32, 32]]     [1, 64, 32, 32]        36,864     
   BatchNorm2D-2     [[1, 64, 32, 32]]     [1, 64, 32, 32]          256      
      ReLU-2         [[1, 64, 32, 32]]     [1, 64, 32, 32]           0       
     Conv2D-3        [[1, 64, 32, 32]]     [1, 64, 32, 32]        36,864     
   BatchNorm2D-3     [[1, 64, 32, 32]]     [1, 64, 32, 32]          256      
   BasicBlock-1      [[1, 64, 32, 32]]     [1, 64, 32, 32]           0       
     Conv2D-4        [[1, 64, 32, 32]]     [1, 64, 32, 32]        36,864     
   BatchNorm2D-4     [[1, 64, 32, 32]]     [1, 64, 32, 32]          256      
      ReLU-3         [[1, 64, 32, 32]]     [1, 64, 32, 32]           0       
     Conv2D-5        [[1, 64, 32, 32]]     [1, 64, 32, 32]        36,864     
   BatchNorm2D-5     [[1, 64, 32, 32]]     [1, 64, 32, 32]          256      
   BasicBlock-2      [[1, 64, 32, 32]]     [1, 64, 32, 32]           0       
     Conv2D-6        [[1, 64, 32, 32]]     [1, 64, 32, 32]        36,864     
   BatchNorm2D-6     [[1, 64, 32, 32]]     [1, 64, 32, 32]          256      
      ReLU-4         [[1, 64, 32, 32]]     [1, 64, 32, 32]           0       
     Conv2D-7        [[1, 64, 32, 32]]     [1, 64, 32, 32]        36,864     
   BatchNorm2D-7     [[1, 64, 32, 32]]     [1, 64, 32, 32]          256      
   BasicBlock-3      [[1, 64, 32, 32]]     [1, 64, 32, 32]           0       
     Conv2D-9        [[1, 64, 32, 32]]     [1, 128, 16, 16]       73,728     
   BatchNorm2D-9     [[1, 128, 16, 16]]    [1, 128, 16, 16]         512      
      ReLU-5         [[1, 128, 16, 16]]    [1, 128, 16, 16]          0       
     Conv2D-10       [[1, 128, 16, 16]]    [1, 128, 16, 16]       147,456    
  BatchNorm2D-10     [[1, 128, 16, 16]]    [1, 128, 16, 16]         512      
     Conv2D-8        [[1, 64, 32, 32]]     [1, 128, 16, 16]        8,192     
   BatchNorm2D-8     [[1, 128, 16, 16]]    [1, 128, 16, 16]         512      
   BasicBlock-4      [[1, 64, 32, 32]]     [1, 128, 16, 16]          0       
     Conv2D-11       [[1, 128, 16, 16]]    [1, 128, 16, 16]       147,456    
  BatchNorm2D-11     [[1, 128, 16, 16]]    [1, 128, 16, 16]         512      
      ReLU-6         [[1, 128, 16, 16]]    [1, 128, 16, 16]          0       
     Conv2D-12       [[1, 128, 16, 16]]    [1, 128, 16, 16]       147,456    
  BatchNorm2D-12     [[1, 128, 16, 16]]    [1, 128, 16, 16]         512      
   BasicBlock-5      [[1, 128, 16, 16]]    [1, 128, 16, 16]          0       
     Conv2D-13       [[1, 128, 16, 16]]    [1, 128, 16, 16]       147,456    
  BatchNorm2D-13     [[1, 128, 16, 16]]    [1, 128, 16, 16]         512      
      ReLU-7         [[1, 128, 16, 16]]    [1, 128, 16, 16]          0       
     Conv2D-14       [[1, 128, 16, 16]]    [1, 128, 16, 16]       147,456    
  BatchNorm2D-14     [[1, 128, 16, 16]]    [1, 128, 16, 16]         512      
   BasicBlock-6      [[1, 128, 16, 16]]    [1, 128, 16, 16]          0       
     Conv2D-15       [[1, 128, 16, 16]]    [1, 128, 16, 16]       147,456    
  BatchNorm2D-15     [[1, 128, 16, 16]]    [1, 128, 16, 16]         512      
      ReLU-8         [[1, 128, 16, 16]]    [1, 128, 16, 16]          0       
     Conv2D-16       [[1, 128, 16, 16]]    [1, 128, 16, 16]       147,456    
  BatchNorm2D-16     [[1, 128, 16, 16]]    [1, 128, 16, 16]         512      
   BasicBlock-7      [[1, 128, 16, 16]]    [1, 128, 16, 16]          0       
     Conv2D-18       [[1, 128, 16, 16]]     [1, 256, 8, 8]        294,912    
  BatchNorm2D-18      [[1, 256, 8, 8]]      [1, 256, 8, 8]         1,024     
      ReLU-9          [[1, 256, 8, 8]]      [1, 256, 8, 8]           0       
     Conv2D-19        [[1, 256, 8, 8]]      [1, 256, 8, 8]        589,824    
  BatchNorm2D-19      [[1, 256, 8, 8]]      [1, 256, 8, 8]         1,024     
     Conv2D-17       [[1, 128, 16, 16]]     [1, 256, 8, 8]        32,768     
  BatchNorm2D-17      [[1, 256, 8, 8]]      [1, 256, 8, 8]         1,024     
   BasicBlock-8      [[1, 128, 16, 16]]     [1, 256, 8, 8]           0       
     Conv2D-20        [[1, 256, 8, 8]]      [1, 256, 8, 8]        589,824    
  BatchNorm2D-20      [[1, 256, 8, 8]]      [1, 256, 8, 8]         1,024     
      ReLU-10         [[1, 256, 8, 8]]      [1, 256, 8, 8]           0       
     Conv2D-21        [[1, 256, 8, 8]]      [1, 256, 8, 8]        589,824    
  BatchNorm2D-21      [[1, 256, 8, 8]]      [1, 256, 8, 8]         1,024     
   BasicBlock-9       [[1, 256, 8, 8]]      [1, 256, 8, 8]           0       
     Conv2D-22        [[1, 256, 8, 8]]      [1, 256, 8, 8]        589,824    
  BatchNorm2D-22      [[1, 256, 8, 8]]      [1, 256, 8, 8]         1,024     
      ReLU-11         [[1, 256, 8, 8]]      [1, 256, 8, 8]           0       
     Conv2D-23        [[1, 256, 8, 8]]      [1, 256, 8, 8]        589,824    
  BatchNorm2D-23      [[1, 256, 8, 8]]      [1, 256, 8, 8]         1,024     
   BasicBlock-10      [[1, 256, 8, 8]]      [1, 256, 8, 8]           0       
     Conv2D-24        [[1, 256, 8, 8]]      [1, 256, 8, 8]        589,824    
  BatchNorm2D-24      [[1, 256, 8, 8]]      [1, 256, 8, 8]         1,024     
      ReLU-12         [[1, 256, 8, 8]]      [1, 256, 8, 8]           0       
     Conv2D-25        [[1, 256, 8, 8]]      [1, 256, 8, 8]        589,824    
  BatchNorm2D-25      [[1, 256, 8, 8]]      [1, 256, 8, 8]         1,024     
   BasicBlock-11      [[1, 256, 8, 8]]      [1, 256, 8, 8]           0       
     Conv2D-26        [[1, 256, 8, 8]]      [1, 256, 8, 8]        589,824    
  BatchNorm2D-26      [[1, 256, 8, 8]]      [1, 256, 8, 8]         1,024     
      ReLU-13         [[1, 256, 8, 8]]      [1, 256, 8, 8]           0       
     Conv2D-27        [[1, 256, 8, 8]]      [1, 256, 8, 8]        589,824    
  BatchNorm2D-27      [[1, 256, 8, 8]]      [1, 256, 8, 8]         1,024     
   BasicBlock-12      [[1, 256, 8, 8]]      [1, 256, 8, 8]           0       
     Conv2D-28        [[1, 256, 8, 8]]      [1, 256, 8, 8]        589,824    
  BatchNorm2D-28      [[1, 256, 8, 8]]      [1, 256, 8, 8]         1,024     
      ReLU-14         [[1, 256, 8, 8]]      [1, 256, 8, 8]           0       
     Conv2D-29        [[1, 256, 8, 8]]      [1, 256, 8, 8]        589,824    
  BatchNorm2D-29      [[1, 256, 8, 8]]      [1, 256, 8, 8]         1,024     
   BasicBlock-13      [[1, 256, 8, 8]]      [1, 256, 8, 8]           0       
     Conv2D-31        [[1, 256, 8, 8]]      [1, 512, 4, 4]       1,179,648   
  BatchNorm2D-31      [[1, 512, 4, 4]]      [1, 512, 4, 4]         2,048     
      ReLU-15         [[1, 512, 4, 4]]      [1, 512, 4, 4]           0       
     Conv2D-32        [[1, 512, 4, 4]]      [1, 512, 4, 4]       2,359,296   
  BatchNorm2D-32      [[1, 512, 4, 4]]      [1, 512, 4, 4]         2,048     
     Conv2D-30        [[1, 256, 8, 8]]      [1, 512, 4, 4]        131,072    
  BatchNorm2D-30      [[1, 512, 4, 4]]      [1, 512, 4, 4]         2,048     
   BasicBlock-14      [[1, 256, 8, 8]]      [1, 512, 4, 4]           0       
     Conv2D-33        [[1, 512, 4, 4]]      [1, 512, 4, 4]       2,359,296   
  BatchNorm2D-33      [[1, 512, 4, 4]]      [1, 512, 4, 4]         2,048     
      ReLU-16         [[1, 512, 4, 4]]      [1, 512, 4, 4]           0       
     Conv2D-34        [[1, 512, 4, 4]]      [1, 512, 4, 4]       2,359,296   
  BatchNorm2D-34      [[1, 512, 4, 4]]      [1, 512, 4, 4]         2,048     
   BasicBlock-15      [[1, 512, 4, 4]]      [1, 512, 4, 4]           0       
     Conv2D-35        [[1, 512, 4, 4]]      [1, 512, 4, 4]       2,359,296   
  BatchNorm2D-35      [[1, 512, 4, 4]]      [1, 512, 4, 4]         2,048     
      ReLU-17         [[1, 512, 4, 4]]      [1, 512, 4, 4]           0       
     Conv2D-36        [[1, 512, 4, 4]]      [1, 512, 4, 4]       2,359,296   
  BatchNorm2D-36      [[1, 512, 4, 4]]      [1, 512, 4, 4]         2,048     
   BasicBlock-16      [[1, 512, 4, 4]]      [1, 512, 4, 4]           0       
AdaptiveAvgPool2D-1   [[1, 512, 4, 4]]      [1, 512, 1, 1]           0       
     Linear-1            [[1, 512]]            [1, 10]             5,130     
===============================================================================
Total params: 21,300,554
Trainable params: 21,266,506
Non-trainable params: 34,048
-------------------------------------------------------------------------------
Input size (MB): 0.06
Forward/backward pass size (MB): 28.00
Params size (MB): 81.26
Estimated Total Size (MB): 109.32
-------------------------------------------------------------------------------

Epoch 0: StepDecay set learning rate to 0.001.
/usr/local/lib/python3.7/dist-packages/librosa/core/audio.py:165: UserWarning: PySoundFile failed. Trying audioread instead.
  warnings.warn("PySoundFile failed. Trying audioread instead.")
/usr/local/lib/python3.7/dist-packages/librosa/core/audio.py:165: UserWarning: PySoundFile failed. Trying audioread instead.
  warnings.warn("PySoundFile failed. Trying audioread instead.")
ERROR:root:DataLoader reader thread raised an exception!
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/dataloader/dataloader_iter.py", line 411, in _thread_loop
    batch = self._get_data()
  File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/dataloader/dataloader_iter.py", line 525, in _get_data
    batch.reraise()
  File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/dataloader/worker.py", line 168, in reraise
    raise self.exc_type(msg)
ValueError: DataLoader worker(2) caught ValueError with message:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/dataloader/worker.py", line 320, in _worker_loop
    batch = fetcher.fetch(indices)
  File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/dataloader/fetcher.py", line 99, in fetch
    data = [self.dataset[idx] for idx in batch_indices]
  File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/dataloader/fetcher.py", line 99, in <listcomp>
    data = [self.dataset[idx] for idx in batch_indices]
  File "/content/AudioClassification-PaddlePaddle/reader.py", line 36, in __getitem__
    spec_mag = load_audio(audio_path, mode=self.model, spec_len=self.spec_len)
  File "/content/AudioClassification-PaddlePaddle/reader.py", line 14, in load_audio
    crop_start = random.randint(0, spec_mag.shape[1] - spec_len)
  File "/usr/lib/python3.7/random.py", line 222, in randint
    return self.randrange(a, b+1)
  File "/usr/lib/python3.7/random.py", line 200, in randrange
    raise ValueError("empty range for randrange() (%d,%d, %d)" % (istart, istop, width))
ValueError: empty range for randrange() (0,-15, -15)


Traceback (most recent call last):
  File "train.py", line 125, in <module>
    train(args)
  File "train.py", line 85, in train
    for batch_id, (spec_mag, label) in enumerate(train_loader()):
  File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/dataloader/dataloader_iter.py", line 585, in __next__
    data = self._reader.read_next_var_list()
SystemError: (Fatal) Blocking queue is killed because the data reader raises an exception.
  [Hint: Expected killed_ != true, but received killed_:1 == true:1.] (at /paddle/paddle/fluid/operators/reader/blocking_queue.h:166)

/usr/local/lib/python3.7/dist-packages/librosa/core/audio.py:165: UserWarning: PySoundFile failed. Trying audioread instead.
  warnings.warn("PySoundFile failed. Trying audioread instead.")
/usr/local/lib/python3.7/dist-packages/librosa/core/audio.py:165: UserWarning: PySoundFile failed. Trying audioread instead.
  warnings.warn("PySoundFile failed. Trying audioread instead.")
/usr/local/lib/python3.7/dist-packages/librosa/core/audio.py:165: UserWarning: PySoundFile failed. Trying audioread instead.
  warnings.warn("PySoundFile failed. Trying audioread instead.")
/usr/local/lib/python3.7/dist-packages/librosa/core/audio.py:165: UserWarning: PySoundFile failed. Trying audioread instead.
  warnings.warn("PySoundFile failed. Trying audioread instead.")
/usr/local/lib/python3.7/dist-packages/librosa/core/audio.py:165: UserWarning: PySoundFile failed. Trying audioread instead.
  warnings.warn("PySoundFile failed. Trying audioread instead.")
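
The final `ValueError: empty range for randrange() (0,-15, -15)` comes from `random.randint(0, spec_mag.shape[1] - spec_len)` in `load_audio`. Because `randint(a, b)` calls `randrange(a, b + 1)`, a stop value of -15 means the upper bound was -16, which suggests one clip produced a spectrogram 16 frames shorter than `spec_len`. Below is a minimal sketch of one possible workaround: zero-pad short spectrograms along the time axis before cropping. The helper name `crop_or_pad` and the choice of zero-padding are assumptions for illustration, not the repository's code.

```python
import random

import numpy as np


def crop_or_pad(spec_mag, spec_len, training=True):
    """Return a (n_mels, spec_len) window from a (n_mels, n_frames) array.

    Hypothetical helper: pads short inputs with zeros instead of failing.
    """
    n_frames = spec_mag.shape[1]
    if n_frames < spec_len:
        # Too short: zero-pad on the right along the time axis.
        pad = spec_len - n_frames
        return np.pad(spec_mag, ((0, 0), (0, pad)), mode="constant")
    if training:
        # The upper bound is >= 0 here, so randint cannot raise.
        start = random.randint(0, n_frames - spec_len)
    else:
        # Deterministic crop for evaluation (one option among several).
        start = 0
    return spec_mag[:, start:start + spec_len]


# Dummy spectrograms standing in for real features.
short = np.ones((128, 112), dtype=np.float32)  # 16 frames short of 128
long_ = np.ones((128, 300), dtype=np.float32)
print(crop_or_pad(short, 128).shape)
print(crop_or_pad(long_, 128).shape)
```

An alternative is to drop clips that are too short when the data list is generated, which avoids feeding padded silence to the model.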

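BaiduCN1.2k Model pretrained acoustic model

Hello, is it possible to load a pretrained acoustic model to classify audio tags?

Running train.py raises this error: KeyError: "attribute 'fft_window' already exists." How can I fix it?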

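Sound anomaly detection

How could I add a sound anomaly detection feature? For example, I have defined ten classes, and when a detected sound does not belong to any of them, the output should be an "other" class. My idea is to compare the sound against each class and return a similarity score per class, treating anything below a threshold as "other". The question is how to obtain this score with your code.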
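
The thresholding idea described above can be sketched without depending on any particular model API. The example below assumes the classifier yields one raw score (logit) per class, converts the scores to softmax probabilities, and returns "other" when the highest probability is below a threshold. The function names, class names, and the 0.6 threshold are hypothetical; softmax confidence is only a crude similarity score, and the threshold would need tuning on held-out data.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_reject(logits, class_names, threshold=0.6, other="other"):
    """Return (label, score); label is `other` if the top score is too low."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return other, probs[best]
    return class_names[best], probs[best]


names = ["dog_bark", "siren", "drilling"]  # illustrative class names
print(predict_with_reject([4.0, 0.5, 0.1], names))  # one class dominates
print(predict_with_reject([1.0, 0.9, 1.1], names))  # no class stands out
```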

Training error: ValueError: invalid literal for int() with base 10: '知更鸟'

From the code, the logic appears to convert the value that was read to int here:
return spec_mag, np.array(int(label), dtype=np.int64)
Does this mean that in train_list.txt (audio path\tclass label for the audio), the class label must be a number?
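
Since the quoted line calls `int(label)`, the label column has to be parseable as an integer, and a class name such as '知更鸟' is not. A minimal sketch of converting name labels to integer IDs before training follows. The helper names and sample paths are hypothetical, and it assumes each line has exactly one tab separating the path from the label.

```python
def build_label_map(lines):
    """Map each distinct class name to a stable integer ID (sorted order)."""
    names = sorted(
        {line.rstrip("\n").split("\t")[1] for line in lines if line.strip()}
    )
    return {name: idx for idx, name in enumerate(names)}


def encode_lines(lines, label_map):
    """Rewrite 'path<TAB>name' lines as 'path<TAB>id' lines."""
    encoded = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        path, name = line.rstrip("\n").split("\t")
        encoded.append(f"{path}\t{label_map[name]}")
    return encoded


# Illustrative list-file content with name labels.
raw = [
    "audio/a.wav\trobin",
    "audio/b.wav\tsparrow",
    "audio/c.wav\trobin",
]
label_map = build_label_map(raw)
print(label_map)
for line in encode_lines(raw, label_map):
    print(line)
```

The same `label_map` should be saved and reused for the test list and at inference time so that IDs stay consistent.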

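Is there a C++ implementation of the audio classification inference process?

About the UrbanSound8K dataset

Are the training labels correct? fold1-fold10 are not class labels, are they?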
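
For context, the UrbanSound8K metadata file (`metadata/UrbanSound8K.csv`) has a `fold` column, which is a cross-validation split, and a separate `classID` column, which is the class label. The sketch below shows one way to build `path\tlabel` lines from that metadata using `classID`. The function name, the `dataset/audio` root, and the sample rows are assumptions for illustration; it does not reproduce how this repository generates its lists.

```python
import csv
import io

# Illustrative rows in the UrbanSound8K metadata layout.
SAMPLE_CSV = """slice_file_name,fsID,start,end,salience,fold,classID,class
100032-3-0-0.wav,100032,0.0,0.317551,1,5,3,dog_bark
100263-2-0-117.wav,100263,58.5,62.5,1,5,2,children_playing
"""


def rows_to_list_lines(csv_file, audio_root="dataset/audio"):
    """Build 'path<TAB>classID' lines; `fold` is used only for the folder."""
    lines = []
    for row in csv.DictReader(csv_file):
        path = f"{audio_root}/fold{row['fold']}/{row['slice_file_name']}"
        lines.append(f"{path}\t{int(row['classID'])}")
    return lines


list_lines = rows_to_list_lines(io.StringIO(SAMPLE_CSV))
for line in list_lines:
    print(line)
```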
