ksanjeevan / crnn-audio-classification

370.0 8.0 80.0 3.56 MB

UrbanSound classification using Convolutional Recurrent Networks in PyTorch

License: MIT License

Python 73.79% Jupyter Notebook 26.21%

audio lstm crnn spectrogram melspectrogram convnet rnn audio-classification pytorch

crnn-audio-classification's Issues

ValueError: optimizer got an empty parameter list

Hi, when I try to train the model, I get the following output. Any idea how to handle it?
./run.py train -c config.json --cfg arch.cfg

Compose(
ProcessChannels(mode=avg)
AdditiveNoise(prob=0.3, sig=0.001, dist_type=normal)
RandomCropLength(prob=0.4, sig=0.25, dist_type=half)
ToTensorAudio()
)
AudioCRNN(
(spec): MelspectrogramStretch(num_mels=128, fft_length=2048, norm=spec_whiten, stretch_param=[0.4, 0.4])
(net): ModuleDict(
(main): Sequential()
)
)
Trainable parameters: 0
Traceback (most recent call last):
  File "./run.py", line 176, in <module>
    train_main(config, args.resume)
  File "./run.py", line 97, in train_main
    optimizer = getattr(torch.optim, opt_name)(trainable_params, **opt_args)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/torch/optim/adam.py", line 42, in __init__
    super(Adam, self).__init__(params, defaults)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/torch/optim/optimizer.py", line 46, in __init__
    raise ValueError("optimizer got an empty parameter list")
ValueError: optimizer got an empty parameter list

Trainable parameters: 0

When I run run.py and log the model, I get the following output:
AudioCRNN(
(spec): MelspectrogramStretch()
(net): ModuleDict(
(main): Sequential()
)
)
Trainable parameters: 0
Traceback (most recent call last):
  File "./run.py", line 175, in <module>
    train_main(config, args.resume)
  File "./run.py", line 96, in train_main
    optimizer = getattr(torch.optim, opt_name)(trainable_params, **opt_args)
  File "/home/wyanqing/.conda/envs/yq/lib/python3.7/site-packages/torch/optim/adam.py", line 48, in __init__
    super(Adam, self).__init__(params, defaults)
  File "/home/wyanqing/.conda/envs/yq/lib/python3.7/site-packages/torch/optim/optimizer.py", line 47, in __init__
    raise ValueError("optimizer got an empty parameter list")
ValueError: optimizer got an empty parameter list
It seems there is a problem with the model. How should I fix this? Thanks!
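
Both reports show `(main): Sequential()` and `Trainable parameters: 0`, so the network built from the `--cfg` file appears to contain no layers by the time the optimizer is created. Why the file parsed to an empty network (path, file contents, or parser version) is not established in these reports. A minimal sketch, assuming only plain PyTorch, that reproduces the condition and fails earlier with a clearer message; `build_optimizer` is a hypothetical helper, not part of the repository:

```python
import torch
import torch.nn as nn

def build_optimizer(model: nn.Module, lr: float = 1e-3) -> torch.optim.Optimizer:
    """Create Adam, failing early with a clearer message if the model is empty."""
    trainable_params = [p for p in model.parameters() if p.requires_grad]
    if not trainable_params:
        raise RuntimeError(
            "Model has no trainable parameters; check that the --cfg file "
            "was found and parsed into actual layers."
        )
    return torch.optim.Adam(trainable_params, lr=lr)

empty_net = nn.Sequential()                 # what the logs show for (main)
real_net = nn.Sequential(nn.Linear(4, 2))   # placeholder network with weights

num_empty = sum(p.numel() for p in empty_net.parameters())
num_real = sum(p.numel() for p in real_net.parameters())   # 4*2 weights + 2 biases
optimizer = build_optimizer(real_net)
```

Printing the parsed model before building the optimizer, as both reporters did, is the quickest way to confirm whether the architecture file was actually read.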

How can I get the input size?

I want to convert the model to Caffe, so I need to know the input size, e.g. something like input = torch.ones([1, 3, 224, 224]).

No License

Hi, can you please add a License to the project?

Thanks.

macOS?

Can this project run on macOS? If so, can some notes be added? I get the following errors (on Catalina). It appears to be a problem with loading the model.

AttributeError: Can't pickle local object 'AugmentationTransform._get_dist.<locals>.<lambda>'

tqdm _trange= 0%| | 0/5 [00:00<?, ?it/s]
0%| | 0/5 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/Users/steve/git/crnn-audio-classification/run.py", line 194, in <module>
    train_main(config, args.resume)
  File "/Users/steve/git/crnn-audio-classification/run.py", line 131, in train_main
    trainer.train()
  File "/Users/steve/git/crnn-audio-classification/train/base_trainer.py", line 89, in train
    result = self._train_epoch(epoch)
  File "/Users/steve/git/crnn-audio-classification/train/trainer.py", line 62, in _train_epoch
    for batch_idx, batch in enumerate(_trange):
  File "/usr/local/lib/python3.8/site-packages/tqdm/std.py", line 1102, in __iter__
    for obj in iterable:
  File "/usr/local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 352, in __iter__
    return self._get_iterator()
  File "/usr/local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 294, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "/usr/local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 801, in __init__
    w.start()
  File "/usr/local/Cellar/python@3.8/3.8.6_2/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/usr/local/Cellar/python@3.8/3.8.6_2/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/usr/local/Cellar/python@3.8/3.8.6_2/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/usr/local/Cellar/python@3.8/3.8.6_2/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/usr/local/Cellar/python@3.8/3.8.6_2/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/local/Cellar/python@3.8/3.8.6_2/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/usr/local/Cellar/python@3.8/3.8.6_2/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'AugmentationTransform._get_dist.<locals>.<lambda>'
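
The traceback ends in `popen_spawn_posix.py`, i.e. the worker processes are started with the "spawn" method (the default on macOS since Python 3.8, and always on Windows). With spawn, everything handed to a DataLoader worker is pickled first, and a lambda created inside a method cannot be pickled. The sketch below illustrates the mechanism with the standard library only; `LocalLambdaTransform` and `PicklableTransform` are hypothetical stand-ins, not the repository's `AugmentationTransform`:

```python
import pickle
import random

class LocalLambdaTransform:
    """Hypothetical stand-in: stores a lambda created inside a method."""
    def __init__(self, sig):
        self.dist = self._get_dist(sig)

    def _get_dist(self, sig):
        return lambda: random.gauss(0.0, sig)   # local object, not picklable

class PicklableTransform:
    """Same behaviour, but the sampling logic lives in a regular method."""
    def __init__(self, sig):
        self.sig = sig

    def dist(self):
        return random.gauss(0.0, self.sig)

def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (AttributeError, pickle.PicklingError):
        return False

lambda_ok = can_pickle(LocalLambdaTransform(0.25))
method_ok = can_pickle(PicklableTransform(0.25))
```

A workaround that avoids touching the transforms is to run the DataLoader with `num_workers=0`, so no worker process is spawned; this trades loading speed for compatibility.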

questions about transforms

Hi, thanks for your excellent work.
I noticed that there is a class called 'ImageTransforms' in transforms.py.
Is it an image transformation applied to the spectrogram?
In other words, are image transformations applicable to spectrograms?

Error while training using notebook

!python run.py train -c my-config.json --cfg crnn.cfg

Compose(
ProcessChannels(mode=avg)
AdditiveNoise(prob=0.3, sig=0.001, dist_type=normal)
RandomCropLength(prob=0.4, sig=0.25, dist_type=half)
ToTensorAudio()
)
Traceback (most recent call last):
  File "run.py", line 176, in <module>
    train_main(config, args.resume)
  File "run.py", line 85, in train_main
    model = getattr(net_module, m_name)(classes, config=config)
  File "/Users/dk/projects/ns/misc_git_projects/crnn-audio-classification/net/model.py", line 29, in __init__
    self.net = parse_cfg(config['cfg'], in_shape=[in_chan, self.spec.num_mels, 400])
  File "/Users/dk/py_env/ts/lib/python3.7/site-packages/torchparse/parser.py", line 139, in parse_cfg
    return CFGParser(fname).get_modules(in_shape)
  File "/Users/dk/py_env/ts/lib/python3.7/site-packages/torchparse/parser.py", line 120, in get_modules
    model = self._flow(in_shape)
  File "/Users/dk/py_env/ts/lib/python3.7/site-packages/torchparse/parser.py", line 108, in _flow
    in_shape = layer.get_out_shape()
  File "/Users/dk/py_env/ts/lib/python3.7/site-packages/torchparse/base_layers.py", line 40, in get_out_shape
    return torch.cat([channel, spatial])
RuntimeError: Expected object of scalar type Long but got scalar type Float for sequence element 1 in sequence argument at position #1 'tensors'
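
The failing call is `torch.cat([channel, spatial])`, and the message says one tensor is Long while the other is Float; the PyTorch version in this report refuses to concatenate mixed dtypes. A minimal sketch of the usual remedy, casting both to the same integer dtype first. The values are hypothetical, and whether patching torchparse this way or pinning library versions is the better fix for this project is not established here:

```python
import torch

channel = torch.tensor([16])             # Long: number of output channels
spatial = torch.tensor([126.0, 398.0])   # Float: e.g. the result of a true division

# Cast to a common integer dtype before concatenating.
out_shape = torch.cat([channel.long(), spatial.long()])
```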

How to use self-made audio files for training

Error: Kernel size can't be greater than actual input size

During inference, if I choose small audio files (e.g. fold10/7913-3-3-0.wav, fold10/7913-3-1-0.wav), the following error occurs:
Kernel size can't be greater than actual input size

How can I fix this?
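
This error typically means that a stack of unpadded convolution and pooling layers has shrunk the feature map below the next kernel size, which is more likely for very short clips. One possible workaround, sketched below with plain PyTorch, is to zero-pad the waveform to a minimum length before inference. `pad_to_min_length` and the value of `min_samples` are assumptions for illustration; the right minimum depends on the actual network and spectrogram settings:

```python
import torch
import torch.nn.functional as F

def pad_to_min_length(waveform: torch.Tensor, min_samples: int) -> torch.Tensor:
    """Right-pad the last (time) dimension with zeros up to min_samples."""
    missing = min_samples - waveform.shape[-1]
    if missing > 0:
        waveform = F.pad(waveform, (0, missing))
    return waveform

short_clip = torch.randn(1, 5000)      # (channels, samples), a very short clip
padded = pad_to_min_length(short_clip, min_samples=22050)
```

Clips that are already long enough pass through unchanged, so the helper can be applied to every input.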

Pretrained weights availability

Hi,

I was wondering if you could provide pretrained weights for the pytorch models.

Thanks

Model

Hello, is there an existing model to use? And can this be run on Jetson Nano? Thank you.

How can I change the batch size?

How can I change the batch size in your code?

Thank you.

Kind regards

EOFError: Ran out of input

I don't know why this error occurs:
D:\Anaconda3\envs\CRNN\lib\site-packages\torchaudio\extension\extension.py:14: UserWarning: torchaudio C++ extension is not available.
warnings.warn('torchaudio C++ extension is not available.')
D:\Anaconda3\envs\CRNN\lib\site-packages\torchaudio\backend\utils.py:64: UserWarning: The interface of "soundfile" backend is planned to change in 0.8.0 to match that of "sox_io" backend and the current interface will be removed in 0.9.0. To use the new interface, do torchaudio.USE_SOUNDFILE_LEGACY_INTERFACE = False before setting the backend to "soundfile". Please refer to pytorch/audio#903 for the detail.
'The interface of "soundfile" backend is planned to change in 0.8.0 to '

0%| | 0/311 [00:00<?, ?it/s]
0%| | 0/311 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "run.py", line 175, in <module>
    train_main(config, args.resume)
  File "run.py", line 115, in train_main
    trainer.train()
  File "D:\pycharm_work\crnn-audio-classification-master\train\base_trainer.py", line 88, in train
    result = self._train_epoch(epoch)
  File "D:\pycharm_work\crnn-audio-classification-master\train\trainer.py", line 61, in _train_epoch
    for batch_idx, batch in enumerate(_trange):
  File "D:\Anaconda3\envs\CRNN\lib\site-packages\tqdm\std.py", line 1180, in __iter__
    for obj in iterable:
  File "D:\Anaconda3\envs\CRNN\lib\site-packages\torch\utils\data\dataloader.py", line 352, in __iter__
    return self._get_iterator()
  File "D:\Anaconda3\envs\CRNN\lib\site-packages\torch\utils\data\dataloader.py", line 294, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "D:\Anaconda3\envs\CRNN\lib\site-packages\torch\utils\data\dataloader.py", line 801, in __init__
    w.start()
  File "D:\Anaconda3\envs\CRNN\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)
  File "D:\Anaconda3\envs\CRNN\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "D:\Anaconda3\envs\CRNN\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "D:\Anaconda3\envs\CRNN\lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__
    reduction.dump(process_obj, to_child)
  File "D:\Anaconda3\envs\CRNN\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'AugmentationTransform._get_dist.<locals>.<lambda>'
D:\Anaconda3\envs\CRNN\lib\site-packages\torchaudio\extension\extension.py:14: UserWarning: torchaudio C++ extension is not available.
warnings.warn('torchaudio C++ extension is not available.')
D:\Anaconda3\envs\CRNN\lib\site-packages\torchaudio\backend\utils.py:64: UserWarning: The interface of "soundfile" backend is planned to change in 0.8.0 to match that of "sox_io" backend and the current interface will be removed in 0.9.0. To use the new interface, do torchaudio.USE_SOUNDFILE_LEGACY_INTERFACE = False before setting the backend to "soundfile". Please refer to pytorch/audio#903 for the detail.
'The interface of "soundfile" backend is planned to change in 0.8.0 to '
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "D:\Anaconda3\envs\CRNN\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "D:\Anaconda3\envs\CRNN\lib\multiprocessing\spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input

When running the task, I found that the CPU was used instead of the GPU. How should I modify the code to make it run on the GPU?
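
In plain PyTorch, both the model and every input batch have to be moved to the GPU explicitly; otherwise computation stays on the CPU. Whether this project exposes a configuration switch for the device is not shown in this report, so the sketch below only illustrates the general pattern, with `nn.Linear` as a placeholder for the real network:

```python
import torch
import torch.nn as nn

# Fall back to the CPU when no CUDA device is visible to PyTorch.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(8, 2).to(device)        # placeholder for the real network
batch = torch.randn(4, 8).to(device)      # inputs must live on the same device
output = model(batch)
```

If `torch.cuda.is_available()` returns False on a machine with a GPU, the installed PyTorch build or driver setup is the first thing to check.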

How to convert the model file to TorchScript

Hello, I would like to know how to convert the model file to TorchScript so that I can call it from C++.
I looked up a lot of information, but the examples I found cover only very simple cases.

For example:
model = torchvision.models.resnet18()
example = torch.rand(1, 3, 224, 224)
traced_script_module = torch.jit.trace(model, example)

Or like this:
my_module = MyModule(10,20)
sm = torch.jit.script(my_module)

But that does not fit this model; I can't convert it that way. Can you help me with this problem?
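
As a rough illustration of tracing a convolutional recurrent model rather than a ResNet, here is a minimal sketch. `TinyCRNN` is a hypothetical stand-in that takes a spectrogram tensor; it is not this repository's `AudioCRNN`, whose forward pass also receives sequence lengths and may need `torch.jit.script` or code changes instead of tracing:

```python
import torch
import torch.nn as nn

class TinyCRNN(nn.Module):
    """Hypothetical stand-in for a conv + LSTM classifier on spectrograms."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.conv = nn.Conv2d(1, 8, kernel_size=3, padding=1)
        self.lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
        self.fc = nn.Linear(16, num_classes)

    def forward(self, spec: torch.Tensor) -> torch.Tensor:
        x = torch.relu(self.conv(spec))     # (batch, 8, mels, frames)
        x = x.mean(dim=2)                   # average over mel bins -> (batch, 8, frames)
        x = x.transpose(1, 2)               # (batch, frames, 8) for the LSTM
        out, _ = self.lstm(x)
        return self.fc(out[:, -1])          # classify from the last time step

model = TinyCRNN().eval()
example = torch.rand(1, 1, 128, 400)        # (batch, channel, mels, frames)
traced = torch.jit.trace(model, example)
```

Calling `traced.save(path)` writes a file that `torch::jit::load` can read from C++. Tracing records only the operations executed for the example input, so data-dependent control flow (for instance, handling of variable sequence lengths) is not captured.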

RuntimeError: mat1 and mat2 shapes cannot be multiplied (24600x1 and 1025x128)

I just installed all the required packages and downloaded the dataset, and then I got:

$ python run.py train -c myconfigs/config.json --cfg crnn.cfg
Compose(
    ProcessChannels(mode=avg)
    AdditiveNoise(prob=0.3, sig=0.001, dist_type=normal)
    RandomCropLength(prob=0.4, sig=0.25, dist_type=half)
    ToTensorAudio()
)
/home/gpu-server/anaconda3/envs/audio_cls/lib/python3.9/site-packages/torchaudio/transforms.py:917: UserWarning: torchaudio.transforms.ComplexNorm has been deprecated and will be removed from future release.Please convert the input Tensor to complex type with `torch.view_as_complex` then use `torch.abs` and `torch.angle`. Please refer to https://github.com/pytorch/audio/issues/1337 for more details about torchaudio's plan to migrate to native complex type.
  warnings.warn(
/home/gpu-server/anaconda3/envs/audio_cls/lib/python3.9/site-packages/torchparse-0.1-py3.9.egg/torchparse/utils.py:54: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  return (spatial + p2 - k)//s + 1
  0%|                                                               | 0/311 [00:00<?, ?it/s]/home/gpu-server/disk/disk1/xuxin_workspace/projects/crnn-audio-classification-master/net/audio.py:10: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  return (lengths + 2 * pad - fft_length + hop_length) // hop_length
/home/gpu-server/anaconda3/envs/audio_cls/lib/python3.9/site-packages/torchaudio/transforms.py:936: UserWarning: torchaudio.functional.functional.complex_norm has been deprecated and will be removed from 0.11 release. Please convert the input Tensor to complex type with `torch.view_as_complex` then use `torch.abs`. Please refer to https://github.com/pytorch/audio/issues/1337 for more details about torchaudio's plan to migrate to native complex type.
  return F.complex_norm(complex_tensor, self.power)
  0%|                                                               | 0/311 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "/home/gpu-server/disk/disk1/xuxin_workspace/projects/crnn-audio-classification-master/run.py", line 175, in <module>
    train_main(config, args.resume)
  File "/home/gpu-server/disk/disk1/xuxin_workspace/projects/crnn-audio-classification-master/run.py", line 115, in train_main
    trainer.train()
  File "/home/gpu-server/disk/disk1/xuxin_workspace/projects/crnn-audio-classification-master/train/base_trainer.py", line 88, in train
    result = self._train_epoch(epoch)
  File "/home/gpu-server/disk/disk1/xuxin_workspace/projects/crnn-audio-classification-master/train/trainer.py", line 68, in _train_epoch
    output = self.model(data)
  File "/home/gpu-server/anaconda3/envs/audio_cls/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/gpu-server/disk/disk1/xuxin_workspace/projects/crnn-audio-classification-master/net/model.py", line 52, in forward
    xt, lengths = self.spec(xt, lengths)                
  File "/home/gpu-server/anaconda3/envs/audio_cls/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/gpu-server/disk/disk1/xuxin_workspace/projects/crnn-audio-classification-master/net/audio.py", line 55, in forward
    x = self.mel_scale(x)
  File "/home/gpu-server/anaconda3/envs/audio_cls/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/gpu-server/anaconda3/envs/audio_cls/lib/python3.9/site-packages/torchaudio/transforms.py", line 386, in forward
    mel_specgram = torch.matmul(specgram.transpose(-1, -2), self.fb).transpose(-1, -2)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (24600x1 and 1025x128)

which is sad
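
The failing line multiplies `specgram.transpose(-1, -2)` by a filter bank of shape 1025x128, i.e. (2048 // 2 + 1) frequency bins by 128 mel bands. That only works when the spectrogram's second-to-last axis is the frequency axis with 1025 bins. The reported left operand, 24600x1, has 24600 = 24 x 1025 rows, which suggests the frequency axis ended up among the leading dimensions instead. One plausible cause, given the deprecation warnings in the log, is a torchaudio version newer than the one the code was written for; that is an assumption, not something the log confirms. The sketch below reproduces the shape logic with random data in plain PyTorch:

```python
import torch

n_freqs, n_mels, n_frames = 1025, 128, 24   # 1025 = 2048 // 2 + 1
fb = torch.rand(n_freqs, n_mels)            # stand-in for the mel filter bank

def apply_mel(specgram: torch.Tensor) -> torch.Tensor:
    """Same matmul pattern as the failing line: expects (..., freq, time)."""
    return torch.matmul(specgram.transpose(-1, -2), fb).transpose(-1, -2)

good = torch.rand(1, n_freqs, n_frames)     # (channel, freq, time)
mel = apply_mel(good)                       # (channel, mel, time)

bad = torch.rand(24, n_freqs, 1, 1)         # freq is no longer the second-to-last axis
try:
    apply_mel(bad)
    failed = False
except RuntimeError:
    failed = True                           # shape mismatch, as in the report
```

Printing the shape of `x` just before `self.mel_scale(x)` in net/audio.py would show which layout the installed versions actually produce.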

Need help as I am a beginner in audio classification

I don't understand what I should return for custom audio data for the CRNN model.

For an image dataset class, we return the image as a NumPy array and its label through the __getitem__ function of the custom dataset class.
Likewise, what should I return, together with the label, in the custom dataset class for my own audio dataset?
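
What this repository's data loader expects from a dataset item is not stated in this thread. As a generic PyTorch pattern, an audio dataset usually returns the waveform as a float tensor together with its sample rate and an integer class index. The sketch below uses synthetic tensors instead of files so that it runs on its own; `ToyAudioDataset` is a hypothetical name:

```python
import torch
from torch.utils.data import Dataset

class ToyAudioDataset(Dataset):
    """Hypothetical dataset: each item is (waveform, sample_rate, class index)."""
    def __init__(self, waveforms, labels, sample_rate=22050):
        assert len(waveforms) == len(labels)
        self.waveforms = waveforms
        self.labels = labels
        self.sample_rate = sample_rate

    def __len__(self):
        return len(self.waveforms)

    def __getitem__(self, idx):
        # Waveform as a float tensor of shape (channels, samples); label as an int.
        return self.waveforms[idx], self.sample_rate, self.labels[idx]

# Synthetic clips of different lengths stand in for audio loaded from disk.
clips = [torch.randn(1, 22050), torch.randn(1, 44100)]
dataset = ToyAudioDataset(clips, labels=[3, 7])
waveform, sr, label = dataset[1]
```

Because clips differ in length, batching them with a DataLoader requires a custom `collate_fn` that pads or crops to a common length.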

How can I customize the model architecture in crnn.cfg?

It seems possible to modify the numbers in crnn.cfg, but how do I add more layers?
I am new to this type of implementation.
Is REPEATx2 a predefined keyword in PyTorch, or a hardcoded name in your code?
If I would like to have 10 CNN layers before the LSTM, how should I modify the file?
Also, if I would like to use a different input size, where should I change it?

[convs_module]
    [conv2d]
        out_channels=16
        kernel_size=3
        stride=1
        padding=valid
    [batchnorm2d]
    [elu]
    [maxpool2d]
        kernel_size=3
        stride=3
    [dropout]
        p=0.1

    REPEATx2
        [conv2d]
            out_channels=32
            kernel_size=4
            stride=1
            padding=valid
        [batchnorm2d]
        [elu]
        [maxpool2d]
            kernel_size=4
            stride=4
        [dropout]
            p=0.1
    END

[moddims]
    permute=[2,1,0]
    collapse=[1,2]

[recur_module]
    [lstm]
        hidden_size = 64
        num_layers = 3
        bidirectional=True

[moddims]
    permute=[1]

About UrbanSound8K

About the UrbanSound8K training dataset: I read the dataset introduction. For training, should I use 9 folds for training and the remaining fold for validation, and then repeat this process? Does that mean there is no true test dataset?
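
UrbanSound8K ships with 10 predefined folds, and the dataset documentation recommends 10-fold cross-validation on exactly those folds, reporting the average score. A minimal sketch of the resulting splits, using only the standard library; `leave_one_fold_out` is a hypothetical helper, not part of this repository:

```python
def leave_one_fold_out(num_folds: int = 10):
    """Yield (train_folds, val_fold) pairs, holding out each fold once."""
    folds = list(range(1, num_folds + 1))
    for held_out in folds:
        train = [f for f in folds if f != held_out]
        yield train, held_out

splits = list(leave_one_fold_out())
first_train, first_val = splits[0]      # train on folds 2..10, validate on fold 1
```

With this protocol each fold serves as the held-out set once, so the averaged score plays the role of the test result.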

I have a problem while I run this project

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/indexes/base.py", line 2897, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'end'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "run.py", line 176, in <module>
    train_main(config, args.resume)
  File "run.py", line 78, in train_main
    data_manager = getattr(data_module, config['data']['type'])(config['data'])
  File "/tf/soundclassify/crnn-audio-git/data/data_manager.py", line 131, in __init__
    self.metadata_df = self._remove_too_small(metadata_df, 1)
  File "/tf/soundclassify/crnn-audio-git/data/data_manager.py", line 140, in _remove_too_small
    dur_cond = (df['end'] - df['start'])>=min_sec
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/frame.py", line 2980, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/indexes/base.py", line 2899, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'end'
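
The traceback shows `_remove_too_small` computing `df['end'] - df['start']`, so the `KeyError` means the loaded metadata table has no `end` column, for example because a different or custom CSV was loaded. The sketch below shows the same duration filter with an explicit column check; `remove_too_small` here is a hypothetical re-implementation on a tiny synthetic table, not the repository's code:

```python
import pandas as pd

REQUIRED_COLUMNS = {"start", "end"}

def remove_too_small(df: pd.DataFrame, min_sec: float) -> pd.DataFrame:
    """Keep clips lasting at least min_sec, with a clear error if columns are missing."""
    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"metadata is missing columns: {sorted(missing)}")
    return df[(df["end"] - df["start"]) >= min_sec]

# Synthetic metadata: clip durations are 4.0 s, 0.5 s and 1.0 s.
metadata = pd.DataFrame({"start": [0.0, 1.0, 2.5], "end": [4.0, 1.5, 3.5]})
kept = remove_too_small(metadata, min_sec=1)
```

Printing `metadata_df.columns` right after the CSV is loaded is a quick way to check which columns the file actually provides.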