ksanjeevan / crnn-audio-classification
UrbanSound classification using Convolutional Recurrent Networks in PyTorch
License: MIT License
Hi, when I try to train the model, I get the following output. Any idea how to handle it?
./run.py train -c config.json --cfg arch.cfg
Compose(
ProcessChannels(mode=avg)
AdditiveNoise(prob=0.3, sig=0.001, dist_type=normal)
RandomCropLength(prob=0.4, sig=0.25, dist_type=half)
ToTensorAudio()
)
AudioCRNN(
(spec): MelspectrogramStretch(num_mels=128, fft_length=2048, norm=spec_whiten, stretch_param=[0.4, 0.4])
(net): ModuleDict(
(main): Sequential()
)
)
Trainable parameters: 0
Traceback (most recent call last):
File "./run.py", line 176, in <module>
train_main(config, args.resume)
File "./run.py", line 97, in train_main
optimizer = getattr(torch.optim, opt_name)(trainable_params, **opt_args)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/torch/optim/adam.py", line 42, in __init__
super(Adam, self).__init__(params, defaults)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/torch/optim/optimizer.py", line 46, in __init__
raise ValueError("optimizer got an empty parameter list")
ValueError: optimizer got an empty parameter list
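The printout above shows `(main): Sequential()` with no layers and `Trainable parameters: 0`, so the optimizer is handed nothing to optimize. One possible way to surface this earlier is a small guard before the optimizer is built. This is only a sketch: `select_trainable` is a hypothetical helper that is not part of the repository, and the hint about the `--cfg` file is an assumption about the likely cause.

```python
def select_trainable(params):
    """Return the parameters that require gradients, or fail with a clear message.

    Hypothetical helper: works on any iterable of objects exposing a
    `requires_grad` attribute, such as the output of `model.parameters()`.
    """
    selected = [p for p in params if getattr(p, "requires_grad", False)]
    if not selected:
        # Assumption: an empty network usually means the architecture file
        # passed via --cfg was not found or defines no layers.
        raise ValueError(
            "model has no trainable parameters; "
            "check that the --cfg file exists and defines at least one layer"
        )
    return selected
```

Called as `select_trainable(model.parameters())` right before the optimizer is created, it would replace the generic "empty parameter list" error with a message that points at the architecture file.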
When I run run.py and log the model, I get the following printout:
"AudioCRNN(
(spec): MelspectrogramStretch()
(net): ModuleDict(
(main): Sequential()
)
)
Trainable parameters: 0
Traceback (most recent call last):
File "./run.py", line 175, in <module>
train_main(config, args.resume)
File "./run.py", line 96, in train_main
optimizer = getattr(torch.optim, opt_name)(trainable_params, **opt_args)
File "/home/wyanqing/.conda/envs/yq/lib/python3.7/site-packages/torch/optim/adam.py", line 48, in __init__
super(Adam, self).__init__(params, defaults)
File "/home/wyanqing/.conda/envs/yq/lib/python3.7/site-packages/torch/optim/optimizer.py", line 47, in __init__
raise ValueError("optimizer got an empty parameter list")
ValueError: optimizer got an empty parameter list".
It seems there is a problem with the model. How should I fix this? Thanks!
I now want to convert the model to Caffe, so I need to know the input size, for example something like input = torch.ones([1, 3, 224, 224]).
Hi, can you please add a License to the project?
Thanks.
Can this project run on macOS? If so, could some notes be added? I get the following errors (on Catalina). It appears to be a problem with loading the model.
AttributeError: Can't pickle local object 'AugmentationTransform._get_dist.<locals>.<lambda>'
tqdm _trange= 0%| | 0/5 [00:00<?, ?it/s]
0%| | 0/5 [00:00<?, ?it/s]Traceback (most recent call last):
File "/Users/steve/git/crnn-audio-classification/run.py", line 194, in <module>
train_main(config, args.resume)
File "/Users/steve/git/crnn-audio-classification/run.py", line 131, in train_main
trainer.train()
File "/Users/steve/git/crnn-audio-classification/train/base_trainer.py", line 89, in train
result = self._train_epoch(epoch)
File "/Users/steve/git/crnn-audio-classification/train/trainer.py", line 62, in _train_epoch
for batch_idx, batch in enumerate(_trange):
File "/usr/local/lib/python3.8/site-packages/tqdm/std.py", line 1102, in __iter__
for obj in iterable:
File "/usr/local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 352, in __iter__
return self._get_iterator()
File "/usr/local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 294, in _get_iterator
return _MultiProcessingDataLoaderIter(self)
File "/usr/local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 801, in __init__
w.start()
File "/usr/local/Cellar/python@3.8/3.8.6_2/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/usr/local/Cellar/python@3.8/3.8.6_2/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "/usr/local/Cellar/python@3.8/3.8.6_2/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
return Popen(process_obj)
File "/usr/local/Cellar/python@3.8/3.8.6_2/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
super().__init__(process_obj)
File "/usr/local/Cellar/python@3.8/3.8.6_2/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "/usr/local/Cellar/python@3.8/3.8.6_2/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
reduction.dump(process_obj, fp)
File "/usr/local/Cellar/python@3.8/3.8.6_2/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'AugmentationTransform._get_dist.<locals>.<lambda>'
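The traceback goes through `popen_spawn_posix.py`, so the DataLoader workers are started with the "spawn" method, which pickles everything handed to the worker process. Functions defined inside another function cannot be pickled, which matches the error message. The sketch below reproduces the pattern with the standard library only and shows one possible fix. The names `make_dist_local`, `make_dist_picklable` and `_gauss` are hypothetical and are not taken from the repository's `_get_dist`.

```python
import pickle
import random
from functools import partial


def make_dist_local(sig):
    # Mirrors the failing pattern: the returned lambda lives inside a function,
    # so pickle cannot look it up by name.
    return lambda: random.gauss(0.0, sig)


def _gauss(sig):
    # Module-level function: pickle can reference it by its qualified name.
    return random.gauss(0.0, sig)


def make_dist_picklable(sig):
    # functools.partial of a module-level function is picklable.
    return partial(_gauss, sig)


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        return False
```

Another workaround, assuming the data loader's worker count is configurable, is to run with zero workers so that no worker process is spawned and nothing needs to be pickled, at the cost of slower data loading.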
Hi, thanks for your excellent work.
I noticed that there is a class called ImageTransforms in transforms.py.
Is this an image transformation applied to the spectrogram?
In other words, are image transformations applicable to spectrograms?
!python run.py train -c my-config.json --cfg crnn.cfg
Compose(
ProcessChannels(mode=avg)
AdditiveNoise(prob=0.3, sig=0.001, dist_type=normal)
RandomCropLength(prob=0.4, sig=0.25, dist_type=half)
ToTensorAudio()
)
Traceback (most recent call last):
File "run.py", line 176, in <module>
train_main(config, args.resume)
File "run.py", line 85, in train_main
model = getattr(net_module, m_name)(classes, config=config)
File "/Users/dk/projects/ns/misc_git_projects/crnn-audio-classification/net/model.py", line 29, in __init__
self.net = parse_cfg(config['cfg'], in_shape=[in_chan, self.spec.num_mels, 400])
File "/Users/dk/py_env/ts/lib/python3.7/site-packages/torchparse/parser.py", line 139, in parse_cfg
return CFGParser(fname).get_modules(in_shape)
File "/Users/dk/py_env/ts/lib/python3.7/site-packages/torchparse/parser.py", line 120, in get_modules
model = self._flow(in_shape)
File "/Users/dk/py_env/ts/lib/python3.7/site-packages/torchparse/parser.py", line 108, in _flow
in_shape = layer.get_out_shape()
File "/Users/dk/py_env/ts/lib/python3.7/site-packages/torchparse/base_layers.py", line 40, in get_out_shape
return torch.cat([channel, spatial])
RuntimeError: Expected object of scalar type Long but got scalar type Float for sequence element 1 in sequence argument at position #1 'tensors'
During inference, if I choose short audio files (e.g. fold10/7913-3-3-0.wav, fold10/7913-3-1-0.wav), the following error occurs:
Kernel size can't be greater than actual input size
How can I fix this?
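One way to see why short clips fail is to trace a spatial dimension through the convolution and pooling layers using the same output-size formula that appears in the torchparse warning quoted elsewhere on this page, `(spatial + p2 - k)//s + 1`. The sketch below is not part of the repository. The kernel and stride values are read from the crnn.cfg listing quoted in another issue, and it assumes that `REPEATx2` repeats its block twice and that `padding=valid` means no padding.

```python
# Hypothetical helper, not part of the repository.
# (kernel_size, stride) pairs: conv 3/1, maxpool 3/3, then twice conv 4/1 + maxpool 4/4.
# Assumption: REPEATx2 repeats its block twice and padding=valid adds no padding.
LAYERS = [(3, 1), (3, 3)] + [(4, 1), (4, 4)] * 2


def trace_size(size, layers=LAYERS):
    """Return the size left after all layers, or raise if a kernel no longer fits."""
    for kernel, stride in layers:
        if kernel > size:
            raise ValueError(f"kernel size {kernel} exceeds input size {size}")
        size = (size - kernel) // stride + 1
    return size


def min_input_size(layers=LAYERS, limit=10_000):
    """Smallest input size (searched below `limit`) that passes through every layer."""
    for size in range(1, limit):
        try:
            trace_size(size, layers)
        except ValueError:
            continue
        return size
    raise ValueError("no valid input size found below limit")
```

Under these assumptions a 400-frame input shrinks to 7 frames and at least 95 spectrogram frames are needed along the time axis. Clips shorter than that would have to be padded or skipped before inference; the training traceback elsewhere on this page shows a `_remove_too_small` step, which suggests short clips are filtered during training.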
Hi,
I was wondering if you could provide pretrained weights for the pytorch models.
Thanks
Hello, is there an existing model to use? And can this be run on Jetson Nano? Thank you.
Hi,
How can I change the batch size in your code?
Thank you.
Kind regards
I don't know why this error occurs:
————————————————————
D:\Anaconda3\envs\CRNN\lib\site-packages\torchaudio\extension\extension.py:14: UserWarning: torchaudio C++ extension is not available.
warnings.warn('torchaudio C++ extension is not available.')
D:\Anaconda3\envs\CRNN\lib\site-packages\torchaudio\backend\utils.py:64: UserWarning: The interface of "soundfile" backend is planned to change in 0.8.0 to match that of "sox_io" backend and the current interface will be removed in 0.9.0. To use the new interface, do torchaudio.USE_SOUNDFILE_LEGACY_INTERFACE = False
before setting the backend to "soundfile". Please refer to pytorch/audio#903 for the detail.
'The interface of "soundfile" backend is planned to change in 0.8.0 to '
0%| | 0/311 [00:00<?, ?it/s]
0%| | 0/311 [00:00<?, ?it/s]
Traceback (most recent call last):
File "run.py", line 175, in <module>
train_main(config, args.resume)
File "run.py", line 115, in train_main
trainer.train()
File "D:\pycharm_work\crnn-audio-classification-master\train\base_trainer.py", line 88, in train
result = self._train_epoch(epoch)
File "D:\pycharm_work\crnn-audio-classification-master\train\trainer.py", line 61, in _train_epoch
for batch_idx, batch in enumerate(_trange):
File "D:\Anaconda3\envs\CRNN\lib\site-packages\tqdm\std.py", line 1180, in __iter__
for obj in iterable:
File "D:\Anaconda3\envs\CRNN\lib\site-packages\torch\utils\data\dataloader.py", line 352, in __iter__
return self._get_iterator()
File "D:\Anaconda3\envs\CRNN\lib\site-packages\torch\utils\data\dataloader.py", line 294, in _get_iterator
return _MultiProcessingDataLoaderIter(self)
File "D:\Anaconda3\envs\CRNN\lib\site-packages\torch\utils\data\dataloader.py", line 801, in __init__
w.start()
File "D:\Anaconda3\envs\CRNN\lib\multiprocessing\process.py", line 112, in start
self._popen = self._Popen(self)
File "D:\Anaconda3\envs\CRNN\lib\multiprocessing\context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "D:\Anaconda3\envs\CRNN\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "D:\Anaconda3\envs\CRNN\lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__
reduction.dump(process_obj, to_child)
File "D:\Anaconda3\envs\CRNN\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'AugmentationTransform._get_dist.<locals>.<lambda>'
D:\Anaconda3\envs\CRNN\lib\site-packages\torchaudio\extension\extension.py:14: UserWarning: torchaudio C++ extension is not available.
warnings.warn('torchaudio C++ extension is not available.')
D:\Anaconda3\envs\CRNN\lib\site-packages\torchaudio\backend\utils.py:64: UserWarning: The interface of "soundfile" backend is planned to change in 0.8.0 to match that of "sox_io" backend and the current interface will be removed in 0.9.0. To use the new interface, do torchaudio.USE_SOUNDFILE_LEGACY_INTERFACE = False
before setting the backend to "soundfile". Please refer to pytorch/audio#903 for the detail.
'The interface of "soundfile" backend is planned to change in 0.8.0 to '
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "D:\Anaconda3\envs\CRNN\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "D:\Anaconda3\envs\CRNN\lib\multiprocessing\spawn.py", line 115, in _main
self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
Hello, I would like to know how to convert the model file to TorchScript so that I can call it from C++.
I looked up a lot of information, but the examples I found are too simple.
For example:
model = torchvision.models.resnet18()
example = torch.rand(1, 3, 224, 224)
traced_script_module = torch.jit.trace(model, example)
or like this:
my_module = MyModule(10,20)
sm = torch.jit.script(my_module)
But that does not suit this project; I can't convert the model like that. Can you help me with this problem?
I just installed all the required packages and downloaded the dataset, and then I got:
$ python run.py train -c myconfigs/config.json --cfg crnn.cfg
Compose(
ProcessChannels(mode=avg)
AdditiveNoise(prob=0.3, sig=0.001, dist_type=normal)
RandomCropLength(prob=0.4, sig=0.25, dist_type=half)
ToTensorAudio()
)
/home/gpu-server/anaconda3/envs/audio_cls/lib/python3.9/site-packages/torchaudio/transforms.py:917: UserWarning: torchaudio.transforms.ComplexNorm has been deprecated and will be removed from future release.Please convert the input Tensor to complex type with `torch.view_as_complex` then use `torch.abs` and `torch.angle`. Please refer to https://github.com/pytorch/audio/issues/1337 for more details about torchaudio's plan to migrate to native complex type.
warnings.warn(
/home/gpu-server/anaconda3/envs/audio_cls/lib/python3.9/site-packages/torchparse-0.1-py3.9.egg/torchparse/utils.py:54: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
return (spatial + p2 - k)//s + 1
0%| | 0/311 [00:00<?, ?it/s]/home/gpu-server/disk/disk1/xuxin_workspace/projects/crnn-audio-classification-master/net/audio.py:10: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
return (lengths + 2 * pad - fft_length + hop_length) // hop_length
/home/gpu-server/anaconda3/envs/audio_cls/lib/python3.9/site-packages/torchaudio/transforms.py:936: UserWarning: torchaudio.functional.functional.complex_norm has been deprecated and will be removed from 0.11 release. Please convert the input Tensor to complex type with `torch.view_as_complex` then use `torch.abs`. Please refer to https://github.com/pytorch/audio/issues/1337 for more details about torchaudio's plan to migrate to native complex type.
return F.complex_norm(complex_tensor, self.power)
0%| | 0/311 [00:01<?, ?it/s]
Traceback (most recent call last):
File "/home/gpu-server/disk/disk1/xuxin_workspace/projects/crnn-audio-classification-master/run.py", line 175, in <module>
train_main(config, args.resume)
File "/home/gpu-server/disk/disk1/xuxin_workspace/projects/crnn-audio-classification-master/run.py", line 115, in train_main
trainer.train()
File "/home/gpu-server/disk/disk1/xuxin_workspace/projects/crnn-audio-classification-master/train/base_trainer.py", line 88, in train
result = self._train_epoch(epoch)
File "/home/gpu-server/disk/disk1/xuxin_workspace/projects/crnn-audio-classification-master/train/trainer.py", line 68, in _train_epoch
output = self.model(data)
File "/home/gpu-server/anaconda3/envs/audio_cls/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/gpu-server/disk/disk1/xuxin_workspace/projects/crnn-audio-classification-master/net/model.py", line 52, in forward
xt, lengths = self.spec(xt, lengths)
File "/home/gpu-server/anaconda3/envs/audio_cls/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/gpu-server/disk/disk1/xuxin_workspace/projects/crnn-audio-classification-master/net/audio.py", line 55, in forward
x = self.mel_scale(x)
File "/home/gpu-server/anaconda3/envs/audio_cls/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/gpu-server/anaconda3/envs/audio_cls/lib/python3.9/site-packages/torchaudio/transforms.py", line 386, in forward
mel_specgram = torch.matmul(specgram.transpose(-1, -2), self.fb).transpose(-1, -2)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (24600x1 and 1025x128)
which is sad
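The right-hand matrix in the error, 1025x128, is consistent with a mel filterbank for the `fft_length=2048` and `num_mels=128` shown in the model printout elsewhere on this page, since a one-sided FFT of length 2048 has 2048 // 2 + 1 = 1025 frequency bins. The left operand has size 1 where 1025 is expected, so the tensor reaching `mel_scale` does not have the frequency axis where the filterbank expects it. The deprecation warnings about complex tensors hint at a torchaudio version mismatch, but that is an assumption. A small shape check like the following sketch (hypothetical helpers, not repository code) can confirm where the shape goes wrong.

```python
def onesided_bins(fft_length):
    """Number of frequency bins in a one-sided spectrum of the given FFT length."""
    return fft_length // 2 + 1


def check_mel_input(spec_shape, fft_length, freq_axis=-2):
    """Raise if the frequency axis of a spectrogram shape does not match the FFT length.

    Assumption: the spectrogram is laid out as (..., freq, time), which is why
    the frequency axis defaults to -2.
    """
    expected = onesided_bins(fft_length)
    actual = spec_shape[freq_axis]
    if actual != expected:
        raise ValueError(
            f"expected {expected} frequency bins on axis {freq_axis}, got {actual}"
        )
    return expected
```

Calling `check_mel_input(tuple(x.shape), 2048)` just before the `self.mel_scale(x)` line in net/audio.py would show whether the spectrogram already has an unexpected layout at that point.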
I can't work out what I should return for custom audio data for the CRNN model.
For an image dataset class, we return the image as a NumPy array and its label through the __getitem__ function of the custom dataset class.
Likewise, what should my custom dataset class return, along with the label, for my custom audio dataset?
It seems possible to modify the numbers in crnn.cfg,
but how do I add more layers in crnn.cfg?
I am a newbie to this type of implementation.
Is REPEATx2 a predefined keyword in PyTorch or a hardcoded name in your code?
If I would like to have 10 CNN layers before the LSTM, how should I modify the file?
Also, if I would like to use a different input size, where should I change it?
[convs_module]
[conv2d]
out_channels=16
kernel_size=3
stride=1
padding=valid
[batchnorm2d]
[elu]
[maxpool2d]
kernel_size=3
stride=3
[dropout]
p=0.1
REPEATx2
[conv2d]
out_channels=32
kernel_size=4
stride=1
padding=valid
[batchnorm2d]
[elu]
[maxpool2d]
kernel_size=4
stride=4
[dropout]
p=0.1
END
[moddims]
permute=[2,1,0]
collapse=[1,2]
[recur_module]
[lstm]
hidden_size = 64
num_layers = 3
bidirectional=True
[moddims]
permute=[1]
About the UrbanSound8K training dataset: I read the dataset introduction. In my training, should I use 9 folds for training and the remaining fold for validation, and repeat this process for each fold? Does this procedure have no true test set?
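UrbanSound8K ships with 10 predefined folds (fold1 to fold10), so the procedure described in the question is 10-fold cross-validation over those folds, with the score averaged across the 10 runs. A minimal sketch of how the splits could be enumerated is shown below. `fold_splits` is a hypothetical helper, not part of the repository, and it does not say how the project's config selects folds.

```python
def fold_splits(n_folds=10):
    """Yield (train_folds, held_out_fold) pairs for leave-one-fold-out evaluation."""
    folds = list(range(1, n_folds + 1))
    for held_out in folds:
        # Every fold except the held-out one is used for training.
        train = [f for f in folds if f != held_out]
        yield train, held_out
```

Each of the 10 runs trains on 9 folds and evaluates on the remaining one. If a separate validation set is needed for tuning, one of the 9 training folds can be set aside in each run.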
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/pandas/core/indexes/base.py", line 2897, in get_loc
return self._engine.get_loc(key)
File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'end'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "run.py", line 176, in <module>
train_main(config, args.resume)
File "run.py", line 78, in train_main
data_manager = getattr(data_module, config['data']['type'])(config['data'])
File "/tf/soundclassify/crnn-audio-git/data/data_manager.py", line 131, in __init__
self.metadata_df = self._remove_too_small(metadata_df, 1)
File "/tf/soundclassify/crnn-audio-git/data/data_manager.py", line 140, in _remove_too_small
dur_cond = (df['end'] - df['start'])>=min_sec
File "/usr/local/lib/python3.6/dist-packages/pandas/core/frame.py", line 2980, in __getitem__
indexer = self.columns.get_loc(key)
File "/usr/local/lib/python3.6/dist-packages/pandas/core/indexes/base.py", line 2899, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'end'
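The failing line, `dur_cond = (df['end'] - df['start'])>=min_sec`, needs the metadata table to contain both a `start` and an `end` column, and the `KeyError: 'end'` says the loaded file has no `end` column. A quick check of the CSV header before training can confirm this. The helper below is a sketch using only the standard library; `missing_columns` is hypothetical and not part of the repository.

```python
import csv
import io

# Columns that the duration filter in the traceback relies on.
REQUIRED_COLUMNS = ("start", "end")


def missing_columns(csv_text, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the CSV header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    names = {name.strip() for name in header}
    return [column for column in required if column not in names]
```

Passing the text of the metadata CSV to `missing_columns` returns an empty list when both columns are present. A custom dataset without slice timestamps would need those columns added, or the duration filter adapted.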