vvvb-github / avsegformer

[AAAI 2024] AVSegFormer: Audio-Visual Segmentation with Transformer

Home Page: https://arxiv.org/abs/2307.01146

Python 72.62% Shell 0.39% C++ 2.69% Cuda 24.30%

audio-visual-segmentation multimodal-deep-learning semantic-segmentation vision-transformer

avsegformer's Issues

Problem shape '[1029, 320, 32]' is invalid for input of size 1317120

There is a problem with the reproduction results

Hello. I ran this model on the S4 dataset with an image size of (224, 224), but the result I can reproduce is only 0.734. I did not modify the configuration file.

Was the paper accepted by MM23?

Hello, may I ask whether the paper was accepted by MM?

Question about the AVSS pre-training

When training the model on the AVSS dataset, we find that the mIoU is about 20 with the Res50 backbone and about 30 with the PVT-v2 backbone at 11 epochs. Could you please confirm whether this is normal? We have completed training for a total of 30 epochs, and in the subsequent 20 epochs we observed an increase of approximately 6 points.

Problem shape '[1029, 320, 32]' is invalid for input of size 1317120

When training on the AVSS dataset, the audio_fea extracted by VGGish has size bs * 10 in its first dimension, which does not match the subsequent feature matrix, whose first dimension is bs. The problem appears at "out2 = self.cross_attn(query, src, src, key_padding_mask=padding_mask)[0]", which raises this error:
File "/home/ptr/hzw/AVSegFormer-master/model/AVSegFormer.py", line 75, in forward
pred, mask_feature = self.head(img_feat, audio_feat)
File "/home/ptr/anaconda3/envs/AVS/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ptr/hzw/AVSegFormer-master/model/head/AVSegHead.py", line 223, in forward
memory, outputs = self.transformer(query, src_flatten, spatial_shapes,
File "/home/ptr/anaconda3/envs/AVS/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ptr/hzw/AVSegFormer-master/model/utils/transformer.py", line 160, in forward
outputs = self.decoder(query, memory, reference_points,
File "/home/ptr/anaconda3/envs/AVS/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ptr/hzw/AVSegFormer-master/model/utils/transformer.py", line 139, in forward
out = layer(out, src, reference_points, spatial_shapes,
File "/home/ptr/anaconda3/envs/AVS/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ptr/hzw/AVSegFormer-master/model/utils/transformer.py", line 117, in forward
out2 = self.cross_attn(
File "/home/ptr/anaconda3/envs/AVS/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ptr/anaconda3/envs/AVS/lib/python3.8/site-packages/torch/nn/modules/activation.py", line 1003, in forward
attn_output, attn_output_weights = F.multi_head_attention_forward(
File "/home/ptr/anaconda3/envs/AVS/lib/python3.8/site-packages/torch/nn/functional.py", line 5044, in multi_head_attention_forward
k = k.contiguous().view(k.shape[0], bsz * num_heads, head_dim).transpose(0, 1)
RuntimeError: shape '[1029, 320, 32]' is invalid for input of size 1317120
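
The numbers in the error message can be unpacked by hand. The target shape passed to `view` is `(src_len, bsz * num_heads, head_dim)`, as the last traceback line shows, and `bsz` is taken from the query, while the input size reflects how many samples the key actually holds. A minimal sketch of that arithmetic, assuming 8 attention heads (the head count is not stated in this thread, and `infer_batch_sizes` is a hypothetical helper name):

```python
def infer_batch_sizes(target_shape, numel, num_heads):
    """Recover the query and key batch sizes from a failed k.view(...) call.

    target_shape is (src_len, bsz * num_heads, head_dim), where bsz comes
    from the query; numel is the real element count of the key tensor.
    """
    src_len, bsz_times_heads, head_dim = target_shape
    if bsz_times_heads % num_heads:
        raise ValueError("num_heads does not divide the middle dimension")
    query_batch = bsz_times_heads // num_heads
    embed_dim = num_heads * head_dim
    per_sample = src_len * embed_dim  # elements one key sample contributes
    if numel % per_sample:
        raise ValueError("input size is not a whole number of samples")
    key_batch = numel // per_sample
    return query_batch, key_batch


# Values copied from the RuntimeError above; num_heads=8 is an assumption.
query_batch, key_batch = infer_batch_sizes((1029, 320, 32), 1317120, num_heads=8)
print(f"query batch = {query_batch}, key batch = {key_batch}")
```

Under that assumption the query carries 40 samples while the key carries 5, so the two inputs to `self.cross_attn` disagree in their batch dimension, which is the kind of mismatch described above.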

cannot find /lib//x86_64-linux-gnu/libmvec.so.1 inside /

einsum(): the number of subscripts in the equation (2) does not match the number of dimensions (1) for operand 1 and no ellipsis was given

File "/home/hwh/Project/AVS/AVSegFormer-master/model/head/AVSegHead.py", line 238, in forward
mask_feature = self.fusion_block(mask_feature, audio_feat)
File "/home/hwh/anaconda3/envs/AVS39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/hwh/Project/AVS/AVSegFormer-master/model/utils/fusion_block.py", line 44, in forward
fusion_map = torch.einsum('bchw,bc->bchw', feature_map, x.squeeze())
File "/home/hwh/anaconda3/envs/AVS39/lib/python3.9/site-packages/torch/functional.py", line 378, in einsum
return _VF.einsum(equation, operands) # type: ignore[attr-defined]
RuntimeError: einsum(): the number of subscripts in the equation (2) does not match the number of dimensions (1) for operand 1 and no ellipsis was given
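
One way to end up with a 1-D operand here is that `x.squeeze()` is called with no argument. In PyTorch that removes every size-1 dimension, so if a batch (or one device's slice of it) holds a single sample, the batch dimension disappears along with the one that was meant to be dropped. The sketch below mimics that rule on plain shape tuples rather than real tensors. The shape `(B, 1, C)` for `x` and the channel count 256 are assumptions, since the thread does not state them, and `squeeze_shape` is a hypothetical helper.

```python
def squeeze_shape(shape, dim=None):
    """Return the shape that squeezing a tensor of `shape` would produce.

    dim=None drops every size-1 dimension (like x.squeeze());
    an integer dim drops only that dimension, and only if its size is 1
    (like x.squeeze(dim)).
    """
    if dim is None:
        return tuple(s for s in shape if s != 1)
    dim = dim % len(shape)  # allow negative indices
    if shape[dim] != 1:
        return tuple(shape)
    return tuple(s for i, s in enumerate(shape) if i != dim)


# Assumed audio feature shape (B, 1, C) with C = 256.
for batch in (4, 1):
    shape = (batch, 1, 256)
    print(shape, "squeeze() ->", squeeze_shape(shape),
          "| squeeze(1) ->", squeeze_shape(shape, 1))
```

With a batch of 4 both forms give a 2-D result that fits the `bc` subscripts, but with a batch of 1 only the dimension-specific squeeze keeps two dimensions. If that is the cause in this case, squeezing only the intended dimension would avoid the error.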

change the batch_size=1

When I set batch_size=1, the following error is displayed: RuntimeError: Caught RuntimeError in replica 2 on device 2.

Question about the img_size?

Your paper says that the img_size is 224x224, but in your code the img_size is 512. Which one is right? https://github.com/vvvb-github/AVSegFormer/blob/master/dataloader/s4_dataset.py
