yeyupiaoling / audioclassification-pytorch

A PyTorch implementation of sound classification that supports EcapaTdnn, PANNS, TDNN, Res2Net, ResNetSE, and other models, as well as a variety of preprocessing methods.

License: Apache License 2.0

Python 99.37% Shell 0.63%

audioclassification ecapa-tdnn panns pytorch res2net resnet-se urbansound8k

audioclassification-pytorch's Issues

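Training on my own dataset

When training on my own dataset, my audio clips are 30 s long. Do I need to split them into shorter clips myself, for example 3 s each, or can I train on the 30 s clips directly? If they should be split, what clip length works best?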
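
If splitting turns out to be needed, the chunk boundaries can be computed as in the minimal sketch below. The function name `chunk_bounds` is hypothetical, and the 3 s chunk length and 0.5 s minimum length are illustrative assumptions, not values confirmed by the project for this case.

```python
def chunk_bounds(num_samples, sample_rate, chunk_seconds=3.0, min_seconds=0.5):
    """Return (start, end) sample indices for fixed-length chunks.

    A trailing remainder shorter than min_seconds is dropped.
    """
    chunk = int(chunk_seconds * sample_rate)
    min_len = int(min_seconds * sample_rate)
    bounds = []
    for start in range(0, num_samples, chunk):
        end = min(start + chunk, num_samples)
        if end - start >= min_len:
            bounds.append((start, end))
    return bounds


# A 30 s clip sampled at 16 kHz is cut into ten 3 s chunks.
bounds = chunk_bounds(num_samples=30 * 16000, sample_rate=16000)
print(len(bounds), bounds[0], bounds[-1])
```

Each `(start, end)` pair can then be used to slice the waveform array before writing the chunk to its own file.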

Is CUDA 11.6 required? Training fails after installing CUDA 12.1

I am on torch==2.2.2. The problem is that CUDA 11.6 is hard to install, and my GPU's CUDA version is already 12.1.

StepLR

In your code, the scheduler is created as scheduler = StepLR(optimizer, step_size=args.learning_rate, gamma=0.8, verbose=True). That means step_size is set to the learning rate, 1e-3. Is something wrong here?
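
In PyTorch's StepLR, step_size is the number of epochs between decays, so passing the learning rate there does look unintended. The sketch below reproduces the closed-form schedule in plain Python to show what an integer step_size does; the helper name `step_lr` and the value `step_size=10` are assumptions for illustration, not the project's settings.

```python
def step_lr(base_lr, epoch, step_size, gamma):
    """Closed-form StepLR schedule: multiply by gamma every step_size epochs."""
    if not isinstance(step_size, int) or step_size < 1:
        raise ValueError("step_size counts epochs and must be a positive integer")
    return base_lr * gamma ** (epoch // step_size)


# The learning rate stays flat within each 10-epoch block, then drops by 20%.
for epoch in (0, 9, 10, 25):
    print(epoch, step_lr(base_lr=1e-3, epoch=epoch, step_size=10, gamma=0.8))
```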

Hello, I get the following error when training on my own dataset. How can I fix it?
size mismatch for fc.weight: copying a param with shape torch.Size([10, 192]) from checkpoint, the shape in current model is torch.Size([6, 192]).
size mismatch for fc.bias: copying a param with shape torch.Size([10]) from checkpoint, the shape in current model is torch.Size([6]).
用自己的数据集训练报错.txt
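
The shapes in the message suggest that the checkpoint was saved from a 10-class model while the current model has 6 classes. One common workaround, sketched below under that assumption, is to drop checkpoint entries whose shapes no longer match and load the rest non-strictly. The helper `drop_mismatched` and the parameter name `conv.weight` are hypothetical; only the `.shape` attribute of each entry is used, so the demo runs with simple stand-in objects instead of tensors.

```python
from types import SimpleNamespace


def drop_mismatched(checkpoint_state, model_state):
    """Keep only checkpoint entries whose name and shape match the model."""
    kept, skipped = {}, []
    for name, value in checkpoint_state.items():
        target = model_state.get(name)
        if target is not None and tuple(value.shape) == tuple(target.shape):
            kept[name] = value
        else:
            skipped.append(name)
    return kept, skipped


# Stand-ins for tensors: only .shape matters for the filtering step.
checkpoint = {
    "conv.weight": SimpleNamespace(shape=(256, 64, 5)),
    "fc.weight": SimpleNamespace(shape=(10, 192)),
    "fc.bias": SimpleNamespace(shape=(10,)),
}
model = {
    "conv.weight": SimpleNamespace(shape=(256, 64, 5)),
    "fc.weight": SimpleNamespace(shape=(6, 192)),
    "fc.bias": SimpleNamespace(shape=(6,)),
}
kept, skipped = drop_mismatched(checkpoint, model)
print(sorted(kept), skipped)
# With real PyTorch objects the result would be loaded via
# model.load_state_dict(kept, strict=False), leaving fc.* freshly initialized.
```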

Multi-gpu training RuntimeError

return forward_call(*input, **kwargs)
File "/root/miniconda3/envs/ac/lib/python3.8/site-packages/torchaudio/transforms/transforms.py", line 106, i$
forward
return F.spectrogram(
File "/root/miniconda3/envs/ac/lib/python3.8/site-packages/torchaudio/functional/functional.py", line 112, in
spectrogram
spec_f = torch.stft(
File "/root/miniconda3/envs/ac/lib/python3.8/site-packages/torch/functional.py", line 606, in stft
return VF.stft(input, n_fft, hop_length, win_length, window, # type: ignore[attr-defined]
RuntimeError: stft input and window must be on the same device but got self on cuda:1 and window on cuda:0
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1237 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 1 (pid: 1238) of binary: /root/miniconda3/envs/ac/bin/python
Traceback (most recent call last):
File "/root/miniconda3/envs/ac/bin/torchrun", line 33, in <module>
sys.exit(load_entry_point('torch==1.12.1', 'console_scripts', 'torchrun')())
File "/root/miniconda3/envs/ac/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
return f(*args, **kwargs)
File "/root/miniconda3/envs/ac/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main
run(args)

How to classify an audio clip of about 10 milliseconds

Can a segmented speech clip of about 10 milliseconds be classified?

The same torchaudio warning is issued repeatedly during training and inference

I really appreciate your excellent work on this project; it has helped me a lot in learning audio AI. As the title says, torchaudio keeps printing the following warning:

UserWarning: At least one mel filterbank has all zero values. The value for n_mels (64) may be set too high. Or, the value for n_freqs (513) may be set too low.

My main concern is whether this affects the model's accuracy. Could you please help?
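
One possible cause, offered as a guess rather than a diagnosis: a config quoted in another issue on this page uses sample_rate: 16000 together with f_max: 14000, which is above the 8000 Hz Nyquist frequency, so the highest mel filters would cover no FFT bin at all. The sketch below is a simplified stand-alone check, not torchaudio's code. It assumes the HTK mel formula and triangular filters, and the function names are hypothetical.

```python
import math


def hz_to_mel(freq):
    return 2595.0 * math.log10(1.0 + freq / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def count_empty_filters(sample_rate, n_fft, n_mels, f_min, f_max):
    """Count triangular mel filters that contain no FFT bin frequency."""
    n_freqs = n_fft // 2 + 1
    nyquist = sample_rate / 2
    bin_freqs = [nyquist * k / (n_freqs - 1) for k in range(n_freqs)]
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    # n_mels triangles need n_mels + 2 edge points, evenly spaced in mel.
    edges = [mel_to_hz(m_min + (m_max - m_min) * i / (n_mels + 1))
             for i in range(n_mels + 2)]
    empty = 0
    for i in range(n_mels):
        low, high = edges[i], edges[i + 2]
        if not any(low < f < high for f in bin_freqs):
            empty += 1
    return empty


above_nyquist = count_empty_filters(16000, 1024, 64, f_min=50.0, f_max=14000.0)
at_nyquist = count_empty_filters(16000, 1024, 64, f_min=50.0, f_max=8000.0)
print(above_nyquist, at_nyquist)
```

If the check reports empty filters only for the larger f_max, the all-zero filters produce constant feature rows, which carry no information but also should not break training.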

No PANNS model

The PANNS model mentioned in the README is missing.

learning

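How does the program report a result for sounds that are completely unrelated to the trained classes?
When I run infer_recoder.py and cover the microphone, the system still reports one of the trained classes, and it is always the same class.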
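
A softmax classifier always spreads its probability over the trained classes, so some class is reported even for silence. A common mitigation is to reject low-confidence predictions, as in the sketch below. The function `predict_with_reject`, the label names, and the 0.6 threshold are assumptions for illustration; this is not how the project's predictor is known to behave.

```python
import math


def softmax(logits):
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_reject(logits, labels, threshold=0.6):
    """Return (label, score), or ("unknown", score) below the threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return "unknown", probs[best]
    return labels[best], probs[best]


labels = ["dog_bark", "siren", "drilling"]
print(predict_with_reject([5.0, 0.0, 0.0], labels))
print(predict_with_reject([0.1, 0.0, 0.05], labels))
```

A threshold does not help when the model is confidently wrong on silence; adding a dedicated background or silence class to the training data is another option worth considering.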

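The data preprocessing parameters in the config file are broken: feature_method: 'Spectrogram' raises an error during parameter parsing

As the title says, I selected Spectrogram for data preprocessing. The original MelSpectrogram works fine, but Spectrogram raises an error. Judging from the error, the parameter parsing is at fault. Details are below; I hope you can update it.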

Pretrained model

Hello, is there a pretrained model available for testing?

How to detect and classify multiple sounds at the same time?

Hey!
Thank you so much for this interesting project.
I have some questions about audio classification. Could you help me understand?
Your project works properly. But when a WAV file contains several kinds of sound at the same time, how can they be detected and classified?
For example, the figure below is the spectrogram of a sample WAV file that contains cricket and dog-barking sounds at some moments.
But your project detected only the dog barking.

Thanks!
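
A single-label classifier returns one class per clip. For sounds that occur at different moments, one workaround is to classify short overlapping windows and merge the results into timed events, as sketched below. The `classify` callable is a hypothetical stand-in for any per-window predictor, and the window and hop lengths are arbitrary. Sounds that truly overlap in time would still need a multi-label model, which this sketch does not provide.

```python
def window_bounds(num_samples, sample_rate, window_seconds=1.0, hop_seconds=0.5):
    win = int(window_seconds * sample_rate)
    hop = int(hop_seconds * sample_rate)
    last_start = max(num_samples - win, 0)
    return [(s, min(s + win, num_samples)) for s in range(0, last_start + 1, hop)]


def detect_events(samples, sample_rate, classify):
    """Classify each window and merge consecutive windows with the same label."""
    events = []
    for start, end in window_bounds(len(samples), sample_rate):
        label = classify(samples[start:end])
        t0, t1 = start / sample_rate, end / sample_rate
        if events and events[-1][0] == label:
            events[-1] = (label, events[-1][1], t1)  # extend the running event
        else:
            events.append((label, t0, t1))
    return events


# Toy signal at 10 samples per second: 2 s of "cricket" (0), then 2 s of "dog" (1).
samples = [0] * 20 + [1] * 20
toy_classify = lambda window: "dog" if sum(window) / len(window) > 0.5 else "cricket"
events = detect_events(samples, sample_rate=10, classify=toy_classify)
print(events)
```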

KeyError: 'n_mfcc'

Traceback (most recent call last):
  File "train.py", line 20, in <module>
    trainer = MAClsTrainer(configs=args.configs, use_gpu=args.use_gpu)
  File "/data/disk1/ybZhang/AudioClassification-Pytorch/macls/trainer.py", line 67, in __init__
    self.audio_featurizer = AudioFeaturizer(feature_conf=self.configs.feature_conf, **self.configs.preprocess_conf)
  File "/data/disk1/ybZhang/AudioClassification-Pytorch/macls/data_utils/featurizer.py", line 28, in __init__
    n_mfcc=self._feature_conf.n_mfcc,
KeyError: 'n_mfcc'

Can FLAC files be used as a dataset?

Many thanks for providing such an awesome project!
I have a batch of audio files in FLAC format. Can they be used as a dataset for this project?

Fixing a broken link

I saw on the PyDigger site that there is a broken link to GitHub.

In the setup.py file,
url='https://github.com/yeyupiaoling/AudioClassification_Pytorch',

the URL is broken.
I am opening a PR to fix it.

RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor

I changed the input to a float tensor, but I still get a similar error.

How to increase concurrency and reduce CPU usage

My code is as follows:
from fastapi import FastAPI, File, UploadFile
from macls.predict import MAClsPredictor
from macls.utils.utils import add_arguments, print_arguments
import io

# Create the FastAPI application
app = FastAPI()

# Global variable that holds the model
predictor = None

# Load the model and configuration
def load_model():
    global predictor
    # configs = 'configs/resnet_se.yml'
    configs = 'configs/cam++.yml'
    model_path = 'models/CAMPPlus_Fbank/best_model/'
    # model_path = 'models/ResNetSE_Fbank/best_model/'
    use_gpu = False
    predictor = MAClsPredictor(configs=configs, model_path=model_path, use_gpu=use_gpu)

# Define the route
@app.post("/predict")
async def predict_audio(file: UploadFile):
    global predictor
    if predictor is None:
        load_model()
    audio_data = await file.read()
    label, score = predictor.predict(audio_data=audio_data)
    return {"label": label, "score": score}

if __name__ == "__main__":
    import uvicorn
    load_model()  # Load the model at startup
    uvicorn.run(app, host="0.0.0.0", port=8898)

When I call this from Postman, CPU usage is high and concurrent requests are not supported.
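
In the handler above, the prediction runs directly inside an async function, which would block the event loop for the duration of each call. A common pattern is to hand blocking work to a bounded thread pool, sketched here with the standard library only. `blocking_predict` is a hypothetical stand-in for the real predictor call, and the pool size of 2 is an arbitrary assumption.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_predict(audio_data):
    """Hypothetical stand-in for a blocking model call."""
    time.sleep(0.1)
    return "label", len(audio_data)


# The pool size caps how many predictions run at once, which also bounds CPU load.
executor = ThreadPoolExecutor(max_workers=2)


async def handle_request(audio_data):
    loop = asyncio.get_running_loop()
    # Run the blocking call off the event loop so other requests keep being served.
    return await loop.run_in_executor(executor, blocking_predict, audio_data)


async def main():
    payloads = [b"x" * n for n in range(1, 5)]
    return await asyncio.gather(*(handle_request(p) for p in payloads))


results = asyncio.run(main())
print(results)
```

Whether the predictor is safe to call from several threads at once is not known here, so the pool size may need to stay at 1. PyTorch's torch.set_num_threads can additionally cap the CPU threads used per prediction; its effect on this service is untested.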

RuntimeWarning during training

I got a warning during training. The detailed message is as follows:
/data/aigc/speech/AudioClassification-Pytorch/macls/data_utils/audio.py:501: RuntimeWarning: divide by zero encountered in log10
  return 10 * np.log10(mean_square)
Should I be concerned about it?
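
This warning appears when mean_square is exactly zero, which happens for a clip (or crop) that is completely silent; the logarithm then evaluates to negative infinity. A minimal sketch of a guarded version follows. The function name `rms_db` and the floor value `eps` are assumptions, not the project's code.

```python
import math


def rms_db(samples, eps=1e-10):
    """RMS level in dB, floored so silent input never reaches log10(0)."""
    mean_square = sum(x * x for x in samples) / max(len(samples), 1)
    return 10.0 * math.log10(max(mean_square, eps))


print(rms_db([0.0] * 100))             # silent clip: floored instead of -inf
print(rms_db([0.5, -0.5, 0.5, -0.5]))  # ordinary clip
```

Checking the dataset for all-zero files is also worthwhile, since such clips add nothing to training.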

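After training, what accuracy can be reached on the UrbanSound8K test set?

Can the trained model be ported to an embedded device?

The dataset has no train_list.txt or test_list.txt

How is the label file label_list.txt generated?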
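
As a rough sketch of how such files could be generated, the function below scans a folder with one subdirectory per class. The directory layout, the tab-separated "path, label index" line format, the 10% test split, and the function name `build_lists` are all assumptions; the project's own data-preparation script should be preferred if it is available.

```python
import os
import random


def build_lists(audio_root, out_dir, test_ratio=0.1, seed=0):
    """Write train_list.txt, test_list.txt, and label_list.txt into out_dir."""
    labels = sorted(d for d in os.listdir(audio_root)
                    if os.path.isdir(os.path.join(audio_root, d)))
    rows = []
    for index, name in enumerate(labels):
        class_dir = os.path.join(audio_root, name)
        for fname in sorted(os.listdir(class_dir)):
            if fname.lower().endswith((".wav", ".flac", ".mp3")):
                rows.append(f"{os.path.join(class_dir, fname)}\t{index}")
    random.Random(seed).shuffle(rows)  # fixed seed keeps the split reproducible
    n_test = max(1, int(len(rows) * test_ratio)) if rows else 0
    os.makedirs(out_dir, exist_ok=True)
    outputs = (("test_list.txt", rows[:n_test]),
               ("train_list.txt", rows[n_test:]),
               ("label_list.txt", labels))
    for fname, lines in outputs:
        with open(os.path.join(out_dir, fname), "w", encoding="utf-8") as f:
            f.write("\n".join(lines) + ("\n" if lines else ""))
    return labels, len(rows) - n_test, n_test
```

Calling `build_lists("dataset/audio", "dataset")` would then produce the three files next to the audio folder, with label indices following the alphabetical order of the class directories.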

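Number of training epochs

How many epochs are appropriate, and how can I keep the best model or stop training when the best result is reached?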
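
A generic way to handle both points is to track the best validation accuracy, save a checkpoint whenever it improves, and stop after a fixed number of epochs without improvement. The sketch below shows that logic only; the class name `BestTracker`, the patience value, and the accuracy numbers are made up for illustration and are not taken from the project.

```python
class BestTracker:
    """Track the best validation accuracy and signal early stopping."""

    def __init__(self, patience=5):
        self.patience = patience
        self.best_acc = float("-inf")
        self.best_epoch = None
        self.bad_epochs = 0

    def update(self, epoch, acc):
        """Return (is_best, should_stop) for this epoch's accuracy."""
        if acc > self.best_acc:
            self.best_acc, self.best_epoch, self.bad_epochs = acc, epoch, 0
            return True, False
        self.bad_epochs += 1
        return False, self.bad_epochs >= self.patience


tracker = BestTracker(patience=2)
for epoch, acc in enumerate([0.70, 0.82, 0.81, 0.80, 0.90]):
    is_best, should_stop = tracker.update(epoch, acc)
    if is_best:
        pass  # a real training loop would save the checkpoint here
    if should_stop:
        break
print(tracker.best_epoch, tracker.best_acc, epoch)
```

With a patience of 2 the loop above stops before the final, higher accuracy is seen, which illustrates the trade-off: a larger patience costs more training time but is less likely to stop too early.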

Hello, can audio classification be used to pick out the desired speech segments from a longer recording?

How to run with MFCC features

I can't run training with MFCC features.

My config:
dataset_conf:
  batch_size: 32
  num_class: 10
  num_workers: 8
  min_duration: 0.5
  chunk_duration: 3
  do_vad: False
  sample_rate: 16000
  use_dB_normalization: True
  target_dB: -20
  train_list: 'dataset/train_list.txt'
  test_list: 'dataset/test_list.txt'
  label_list_path: 'dataset/label_list.txt'

preprocess_conf:
  feature_method: 'MFCC'

feature_conf:
  sample_rate: 16000
  n_mfcc: 40
  melkwargs:
    n_fft: 1024
    hop_length: 320
    win_length: 1024
    f_min: 50.0
    f_max: 14000.0
    n_mels: 64

optimizer_conf:
  learning_rate: 0.001
  weight_decay: 1e-6

model_conf:
  embd_dim: 192
  channels: 256

train_conf:
  max_epoch: 30
  log_interval: 10

use_model: 'ecapa_tdnn'

RuntimeError: Given groups=1, weight of size [256, 64, 5], expected input[32, 40, 151] to have 64 channels, but got 40 channels instead
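
The error says the first convolution expects 64 input channels but received 40, which lines up with n_mels: 64 and n_mfcc: 40 in the config. It looks as if the model's input size follows n_mels while the MFCC features have n_mfcc rows, though that is an inference from the shapes, not from the code. The sketch below reproduces the reported input shape from the config values; the helper `feature_shape` is hypothetical and assumes centered STFT framing.

```python
def feature_shape(batch_size, chunk_duration, sample_rate, hop_length,
                  feature_method, n_mels, n_mfcc):
    """Shape (batch, channels, frames) of the features fed to the model."""
    num_samples = int(chunk_duration * sample_rate)
    frames = num_samples // hop_length + 1  # centered STFT adds one frame
    channels = n_mfcc if feature_method == "MFCC" else n_mels
    return batch_size, channels, frames


shape = feature_shape(batch_size=32, chunk_duration=3, sample_rate=16000,
                      hop_length=320, feature_method="MFCC",
                      n_mels=64, n_mfcc=40)
print(shape)  # (32, 40, 151), matching "expected input[32, 40, 151]"
```

Under that reading, one untested workaround would be to set n_mfcc to the same value as n_mels so that both sides agree on 64 channels.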