
audioclassification-pytorch's People

Contributors

shmuel44, yeyupiaoling

audioclassification-pytorch's Issues

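Training on my own dataset

When training on my own dataset, my audio clips are 30 s long. Do I need to split them into shorter sub-clips myself, for example 3 s each, or can I train directly on the 30 s clips? If they should be split, what clip length works best?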
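If splitting turns out to be needed, the idea can be sketched as below. This is a minimal illustration, not the project's own preprocessing: the function name `split_into_chunks`, the 16 kHz sample rate, and the choice to drop a short trailing chunk are all assumptions, and the 3 s length is simply the example value from the question.

```python
def split_into_chunks(samples, sample_rate, chunk_seconds=3.0, keep_remainder=False):
    """Split a 1-D sequence of audio samples into consecutive fixed-length chunks."""
    chunk_len = int(round(chunk_seconds * sample_rate))
    if chunk_len <= 0:
        raise ValueError("chunk_seconds * sample_rate must be at least one sample")
    chunks = [samples[i:i + chunk_len] for i in range(0, len(samples), chunk_len)]
    # Drop a trailing chunk shorter than chunk_len unless the caller wants it.
    if chunks and len(chunks[-1]) < chunk_len and not keep_remainder:
        chunks.pop()
    return chunks


# A 30 s clip at 16 kHz, represented by silence purely for illustration.
clip = [0.0] * (30 * 16000)
chunks = split_into_chunks(clip, sample_rate=16000, chunk_seconds=3.0)
print(len(chunks), len(chunks[0]))
```

Each chunk would then be written out as its own file and listed in the training list with the label of the original clip.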

StepLR

In your code, the scheduler is created with StepLR(optimizer, step_size=args.learning_rate, gamma=0.8, verbose=True). That means step_size is set to the learning rate (1e-3) rather than to a number of epochs. Is something wrong here?
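For context, PyTorch's StepLR multiplies the learning rate by gamma once every step_size epochs, so step_size is meant to be a whole number of epochs. The pure-Python sketch below reproduces that schedule to show what an integer step_size does; the helper name `step_lr` and the value step_size=10 are illustrative assumptions, not values taken from the project.

```python
def step_lr(base_lr, gamma, step_size, epoch):
    """Learning rate at a given epoch under a step-decay schedule."""
    if not isinstance(step_size, int) or step_size < 1:
        raise ValueError("step_size should be a positive integer number of epochs")
    # The rate is multiplied by gamma once for every completed block of step_size epochs.
    return base_lr * gamma ** (epoch // step_size)


for epoch in (0, 9, 10, 20):
    print(epoch, step_lr(base_lr=1e-3, gamma=0.8, step_size=10, epoch=epoch))
```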

Error when training on a custom dataset

Hello, when I train on my own dataset I get the following error. How can I fix it?
size mismatch for fc.weight: copying a param with shape torch.Size([10, 192]) from checkpoint, the shape in current model is torch.Size([6, 192]).
size mismatch for fc.bias: copying a param with shape torch.Size([10]) from checkpoint, the shape in current model is torch.Size([6]).
Attachment: 用自己的数据集训练报错.txt
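The message indicates that the checkpoint was saved from a model with a 10-class output layer, while the current model has 6 classes. One common workaround, sketched here under assumptions, is to load only the parameters whose shapes match and let the classification layer train from scratch. The helper `filter_matching_params`, the `FakeTensor` stand-in, and the `conv.weight` entry are hypothetical; the function only relies on each value having a `.shape`, which PyTorch tensors also provide.

```python
from collections import namedtuple


def filter_matching_params(checkpoint_state, model_state):
    """Keep checkpoint entries whose name and shape both match the current model."""
    kept, skipped = {}, []
    for name, value in checkpoint_state.items():
        target = model_state.get(name)
        if target is not None and tuple(value.shape) == tuple(target.shape):
            kept[name] = value
        else:
            skipped.append(name)
    return kept, skipped


# Stand-in for tensors so the sketch runs without PyTorch installed.
FakeTensor = namedtuple("FakeTensor", "shape")

checkpoint = {
    "conv.weight": FakeTensor((256, 64, 5)),  # hypothetical backbone parameter
    "fc.weight": FakeTensor((10, 192)),
    "fc.bias": FakeTensor((10,)),
}
model = {
    "conv.weight": FakeTensor((256, 64, 5)),
    "fc.weight": FakeTensor((6, 192)),
    "fc.bias": FakeTensor((6,)),
}

kept, skipped = filter_matching_params(checkpoint, model)
print(sorted(kept), skipped)
```

With real PyTorch state dicts, the `kept` entries could then be passed to `model.load_state_dict(kept, strict=False)`. Starting training without the old checkpoint is the simpler alternative.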

Multi-gpu training RuntimeError

return forward_call(*input, **kwargs)
File "/root/miniconda3/envs/ac/lib/python3.8/site-packages/torchaudio/transforms/transforms.py", line 106, i$
forward
return F.spectrogram(
File "/root/miniconda3/envs/ac/lib/python3.8/site-packages/torchaudio/functional/functional.py", line 112, in spectrogram
spec_f = torch.stft(
File "/root/miniconda3/envs/ac/lib/python3.8/site-packages/torch/functional.py", line 606, in stft
return VF.stft(input, n_fft, hop_length, win_length, window, # type: ignore[attr-defined]
RuntimeError: stft input and window must be on the same device but got self on cuda:1 and window on cuda:0
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1237 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 1 (pid: 1238) of binary: /root/miniconda3/envs/ac/bin/python
Traceback (most recent call last):
File "/root/miniconda3/envs/ac/bin/torchrun", line 33, in
sys.exit(load_entry_point('torch==1.12.1', 'console_scripts', 'torchrun')())
File "/root/miniconda3/envs/ac/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
return f(*args, **kwargs)
File "/root/miniconda3/envs/ac/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main
run(args)

The same torchaudio warning is issued repeatedly during training and inference

Thank you for your excellent work on this project; it has helped me a lot in learning about audio AI. As the title says, torchaudio repeatedly shows the following warning:

UserWarning: At least one mel filterbank has all zero values. The value for n_mels (64) may be set too high. Or, the value for n_freqs (513) may be set too low.

My main concern is whether this affects the model's accuracy. Can you help me, please?
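One way this warning can arise is when the upper mel frequency limit lies above the Nyquist frequency, so that some triangular filters fall entirely outside the range covered by the FFT bins. The issue does not state the settings in use, so the values below (16 kHz sample rate, f_min 50 Hz, f_max 14000 Hz) are assumptions for illustration. The sketch uses the HTK-style mel formula and hypothetical helper names.

```python
import math


def hz_to_mel(hz):
    """HTK-style conversion from Hz to mel."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def count_filters_above_nyquist(sample_rate, n_mels, f_min, f_max):
    """Count mel filters whose whole triangle lies at or above sample_rate / 2."""
    nyquist = sample_rate / 2.0
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    # n_mels triangular filters are defined by n_mels + 2 evenly spaced mel points.
    edges = [mel_to_hz(m_min + (m_max - m_min) * i / (n_mels + 1))
             for i in range(n_mels + 2)]
    # Filter i spans edges[i]..edges[i + 2]; if its lower edge is at or above
    # Nyquist, no FFT bin can fall inside it, so all of its weights are zero.
    return sum(1 for i in range(n_mels) if edges[i] >= nyquist)


print(count_filters_above_nyquist(16000, 64, 50.0, 14000.0))
print(count_filters_above_nyquist(16000, 64, 50.0, 8000.0))
```

Under these assumed settings, the all-zero filters only produce constant feature rows, so lowering f_max to at most half the sample rate would be one way to silence the warning.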

learning

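Hi, how does the program report results for sounds that are completely unrelated to the trained classes?
When I run infer_recoder.py and cover the microphone, the system still reports one of the trained classes as the result, and it is always the same class.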
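This behaviour is what a plain softmax classifier does: it always picks the highest-scoring class among those it was trained on. A minimal sketch of one possible mitigation, rejecting low-confidence predictions, follows. The function names, the "unknown" label, and the 0.6 threshold are assumptions, not part of the project.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_rejection(logits, labels, threshold=0.6):
    """Return (label, score), or ("unknown", score) when the top score is too low."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return "unknown", probs[best]
    return labels[best], probs[best]


labels = ["dog_bark", "siren", "engine"]  # hypothetical class names
print(predict_with_rejection([5.0, 0.0, 0.0], labels))
print(predict_with_rejection([0.1, 0.0, 0.0], labels))
```

A threshold does not help when the model is confidently wrong on unfamiliar input. In that case, adding a background or "other" class to the training data, including silence, is a commonly used alternative.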

How to detect and classify multiple sounds at the same time?

Hey!
Thank you so much for this interesting project.
I have a question about audio classification that I hope you can clarify.
Your project works properly. But when a WAV file contains several kinds of sound at the same time, how can they all be detected and classified?
For example, the figure below is the spectrogram of a sample WAV file that contains cricket and dog-barking sounds at some moments.
But your project only detected the dog barking.
image
Thanks!
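A single-label classifier reports only its top class, which would be consistent with only the dog barking being returned. One common alternative is multi-label classification, where each class gets an independent sigmoid score and every class above a threshold is reported. The sketch below shows only that decision step and assumes a model retrained to produce per-class logits; the logits, class names, and threshold are hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def multilabel_predict(logits, labels, threshold=0.5):
    """Return every (label, score) pair whose independent score reaches the threshold."""
    scores = [sigmoid(x) for x in logits]
    return [(label, score) for label, score in zip(labels, scores) if score >= threshold]


labels = ["dog_bark", "cricket", "siren"]
print(multilabel_predict([2.0, 1.5, -3.0], labels))
```

Sounds that occur at different moments rather than simultaneously could also be found by classifying short windows of the file one at a time.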

KeyError: 'n_mfcc'

Traceback (most recent call last):
  File "train.py", line 20, in <module>
    trainer = MAClsTrainer(configs=args.configs, use_gpu=args.use_gpu)
  File "/data/disk1/ybZhang/AudioClassification-Pytorch/macls/trainer.py", line 67, in __init__
    self.audio_featurizer = AudioFeaturizer(feature_conf=self.configs.feature_conf, **self.configs.preprocess_conf)
  File "/data/disk1/ybZhang/AudioClassification-Pytorch/macls/data_utils/featurizer.py", line 28, in __init__
    n_mfcc=self._feature_conf.n_mfcc,
KeyError: 'n_mfcc'

Can FLAC files be used as a dataset?

Many thanks for providing such an awesome project!
I have a batch of audio files in FLAC format. Can they be used as a dataset for this project?

Fixing a broken link

I saw on the PyDigger site that the project's link to GitHub is broken.

In the setup.py file:
url='https://github.com/yeyupiaoling/AudioClassification_Pytorch',

This URL is broken.
I am opening a PR to fix it.

How to increase concurrency and reduce CPU usage

My code is as follows:
from fastapi import FastAPI, File, UploadFile
from macls.predict import MAClsPredictor
from macls.utils.utils import add_arguments, print_arguments
import io

# Create the FastAPI application
app = FastAPI()

# Global variable that holds the model
predictor = None

# Load the model and configuration
def load_model():
    global predictor
    # configs = 'configs/resnet_se.yml'
    configs = 'configs/cam++.yml'
    model_path = 'models/CAMPPlus_Fbank/best_model/'
    # model_path = 'models/ResNetSE_Fbank/best_model/'
    use_gpu = False
    predictor = MAClsPredictor(configs=configs, model_path=model_path, use_gpu=use_gpu)

# Define the route
@app.post("/predict")
async def predict_audio(file: UploadFile):
    global predictor
    if predictor is None:
        load_model()
    audio_data = await file.read()
    label, score = predictor.predict(audio_data=audio_data)
    return {"label": label, "score": score}

if __name__ == "__main__":
    import uvicorn
    load_model()  # Load the model at startup
    uvicorn.run(app, host="0.0.0.0", port=8898)

When I call the endpoint from Postman, CPU usage is high and concurrent requests are not supported.
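A likely contributor to the concurrency problem is that a blocking prediction call inside an `async def` route holds the event loop, so other requests wait until it finishes. The sketch below shows the general pattern of moving a blocking call onto a bounded thread pool using only the standard library. `blocking_predict` is a stand-in for `predictor.predict`, and the pool size of 2 is an arbitrary assumption.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

# Bounded pool: at most this many predictions run at once (illustrative value).
EXECUTOR = ThreadPoolExecutor(max_workers=2)


def blocking_predict(audio_data):
    """Stand-in for a blocking, CPU-heavy call such as predictor.predict(...)."""
    time.sleep(0.05)
    return "label", float(len(audio_data))


async def predict_async(audio_data):
    loop = asyncio.get_running_loop()
    # Run the blocking call in a worker thread so the event loop stays responsive.
    return await loop.run_in_executor(EXECUTOR, blocking_predict, audio_data)


async def handle_many(payloads):
    """Simulate several requests arriving at once."""
    return await asyncio.gather(*(predict_async(p) for p in payloads))


results = asyncio.run(handle_many([b"a", b"bb", b"ccc"]))
print(results)
```

In FastAPI, declaring the route with plain `def` instead of `async def` has a similar effect, because FastAPI runs such routes in its own thread pool. How much real parallelism this gives depends on whether the underlying library releases the GIL. For CPU usage, capping the worker threads PyTorch uses with `torch.set_num_threads` may help, though the right value depends on the machine.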

RuntimeWarning during training

I got a warning during training. The detailed message is as follows:
/data/aigc/speech/AudioClassification-Pytorch/macls/data_utils/audio.py:501: RuntimeWarning: divide by zero encountered in log10
return 10 * np.log10(mean_square)
Should I be concerned about it?
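The warning means `mean_square` was exactly zero for at least one clip, which happens when every sample is zero (a fully silent clip), so the logarithm evaluates to negative infinity. A minimal sketch of one common guard, clamping the value to a small floor before taking the logarithm, is shown below. The function name `rms_db` and the floor `eps=1e-10` are assumptions, not the project's code.

```python
import math


def rms_db(samples, eps=1e-10):
    """Mean-square level in dB, with a floor so silent input gives a finite value."""
    if not samples:
        raise ValueError("samples must not be empty")
    mean_square = sum(s * s for s in samples) / len(samples)
    # Clamp to eps so log10 never receives zero.
    return 10.0 * math.log10(max(mean_square, eps))


print(rms_db([0.0, 0.0, 0.0, 0.0]))  # silent clip: finite floor instead of -inf
print(rms_db([1.0, -1.0]))           # full-scale square wave
```

It may also be worth checking the dataset for silent or empty files, since they carry no information for training.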

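Number of training epochs

How many epochs of training is appropriate? How can I keep the best model, or stop training once the best result is reached?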
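A common general answer is early stopping: evaluate after each epoch, save a checkpoint whenever the validation score improves, and stop once it has not improved for a set number of epochs. The sketch below illustrates that logic only. The class name, the patience of 2, and the accuracy values are made up for the example and are not results from this project.

```python
class EarlyStopping:
    """Track the best validation score and signal when to stop training."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_score = None
        self.best_epoch = None
        self.bad_epochs = 0

    def update(self, epoch, score):
        """Record a validation score; return (is_best, should_stop)."""
        if self.best_score is None or score > self.best_score + self.min_delta:
            self.best_score, self.best_epoch, self.bad_epochs = score, epoch, 0
            return True, False
        self.bad_epochs += 1
        return False, self.bad_epochs >= self.patience


stopper = EarlyStopping(patience=2)
stopped_at = None
for epoch, accuracy in enumerate([0.70, 0.80, 0.79, 0.78, 0.90]):
    is_best, should_stop = stopper.update(epoch, accuracy)
    if is_best:
        pass  # a real training loop would save the model checkpoint here
    if should_stop:
        stopped_at = epoch
        break

print(stopper.best_epoch, stopper.best_score, stopped_at)
```

The example also shows the trade-off: with a patience of 2 the loop stops before the later, higher score is reached, so patience should be chosen with the noise of the validation metric in mind.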

How to run with MFCC features

I can't run training with MFCC features.

My config:
dataset_conf:
  batch_size: 32
  num_class: 10
  num_workers: 8
  min_duration: 0.5
  chunk_duration: 3
  do_vad: False
  sample_rate: 16000
  use_dB_normalization: True
  target_dB: -20
  train_list: 'dataset/train_list.txt'
  test_list: 'dataset/test_list.txt'
  label_list_path: 'dataset/label_list.txt'

preprocess_conf:
  feature_method: 'MFCC'

feature_conf:
  sample_rate: 16000
  n_mfcc: 40
  melkwargs:
    n_fft: 1024
    hop_length: 320
    win_length: 1024
    f_min: 50.0
    f_max: 14000.0
    n_mels: 64

optimizer_conf:
  learning_rate: 0.001
  weight_decay: 1e-6

model_conf:
  embd_dim: 192
  channels: 256

train_conf:
  max_epoch: 30
  log_interval: 10

use_model: 'ecapa_tdnn'

The error I get:
RuntimeError: Given groups=1, weight of size [256, 64, 5], expected input[32, 40, 151] to have 64 channels, but got 40 channels instead
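The error says the model's first convolution was built for 64 input channels (matching n_mels: 64), while the MFCC features have 40 channels (n_mfcc: 40). In other words, the model's input size has to follow the feature dimension that the chosen feature method produces. The sketch below illustrates that relationship with hypothetical helpers; it is not the project's API, and it assumes a flat config dict with both keys present.

```python
def feature_dim(feature_method, feature_conf):
    """Number of feature channels per frame that the model's first layer receives."""
    if feature_method == "MFCC":
        return feature_conf["n_mfcc"]
    # Mel-based features produce one channel per mel bin.
    return feature_conf["n_mels"]


def check_input_size(model_input_size, feature_method, feature_conf):
    """Raise early with a readable message instead of failing inside the first conv layer."""
    actual = feature_dim(feature_method, feature_conf)
    if actual != model_input_size:
        raise ValueError(
            f"model expects {model_input_size} channels but "
            f"{feature_method} features have {actual}"
        )
    return actual


conf = {"n_mfcc": 40, "n_mels": 64}
print(feature_dim("MFCC", conf))
print(feature_dim("MelSpectrogram", conf))
```

Under this reading, either the model would need to be built with an input size of 40, or n_mfcc would need to be set to the size the model expects.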
