vvvb-github / avsegformer
[AAAI 2024] AVSegFormer: Audio-Visual Segmentation with Transformer
Home Page: https://arxiv.org/abs/2307.01146
Hello, I trained this model on the S4 dataset with image size (224, 224), but the result I can reproduce is only 0.734. I did not modify the configuration file.
Hello, has the paper been accepted by MM?
When training the model on the AVSS dataset, we find that the mIoU is about 20 with the Res50 backbone and about 30 with the PVT-v2 backbone at 11 epochs. Could you please confirm whether this is normal? We completed training for a total of 30 epochs, and over the subsequent 20 epochs we observed an increase of approximately 6 points.
When training with the AVSS dataset, the audio_fea extracted by VGGish has bs * 10 as its first dimension, which does not match the subsequent feature matrix, whose first dimension is bs. The problem appears at "out2 = self.cross_attn(query, src, src, key_padding_mask=padding_mask)[0]", which raises this error:
File "/home/ptr/hzw/AVSegFormer-master/model/AVSegFormer.py", line 75, in forward
pred, mask_feature = self.head(img_feat, audio_feat)
File "/home/ptr/anaconda3/envs/AVS/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ptr/hzw/AVSegFormer-master/model/head/AVSegHead.py", line 223, in forward
memory, outputs = self.transformer(query, src_flatten, spatial_shapes,
File "/home/ptr/anaconda3/envs/AVS/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ptr/hzw/AVSegFormer-master/model/utils/transformer.py", line 160, in forward
outputs = self.decoder(query, memory, reference_points,
File "/home/ptr/anaconda3/envs/AVS/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ptr/hzw/AVSegFormer-master/model/utils/transformer.py", line 139, in forward
out = layer(out, src, reference_points, spatial_shapes,
File "/home/ptr/anaconda3/envs/AVS/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ptr/hzw/AVSegFormer-master/model/utils/transformer.py", line 117, in forward
out2 = self.cross_attn(
File "/home/ptr/anaconda3/envs/AVS/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ptr/anaconda3/envs/AVS/lib/python3.8/site-packages/torch/nn/modules/activation.py", line 1003, in forward
attn_output, attn_output_weights = F.multi_head_attention_forward(
File "/home/ptr/anaconda3/envs/AVS/lib/python3.8/site-packages/torch/nn/functional.py", line 5044, in multi_head_attention_forward
k = k.contiguous().view(k.shape[0], bsz * num_heads, head_dim).transpose(0, 1)
RuntimeError: shape '[1029, 320, 32]' is invalid for input of size 1317120
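The numbers in this RuntimeError can be checked by hand to see how far apart the two batch dimensions are. The sketch below is plain arithmetic on the values in the message, not code from the repository; the helper name `view_mismatch` is hypothetical. It relies on the fact that in `multi_head_attention_forward` the batch size `bsz` is taken from the query, while `k.shape[0]` is the key's sequence length.

```python
from math import prod

def view_mismatch(target_shape, actual_numel):
    """Compare the element count a view() needs with what the tensor holds."""
    required = prod(target_shape)
    return required, actual_numel, required / actual_numel

# Numbers copied from the RuntimeError above.
required, actual, ratio = view_mismatch([1029, 320, 32], 1317120)
print(required, actual, ratio)

# Per sequence position, the element count equals batch * embed_dim.
# Expected value, derived from the query: (bsz * num_heads) * head_dim.
per_pos_expected = 320 * 32
# Actual value, derived from the key tensor the view() was applied to.
per_pos_actual = 1317120 // 1029
print(per_pos_expected, per_pos_actual, per_pos_expected // per_pos_actual)
```

Since the embedding dimension is the same on both sides, the factor of 8 means the query's batch dimension is 8 times the key's batch dimension. That is consistent with the audio-derived query and the image-derived `src` being batched differently, although the exact cause depends on the configuration in use.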
File "/home/hwh/Project/AVS/AVSegFormer-master/model/head/AVSegHead.py", line 238, in forward
mask_feature = self.fusion_block(mask_feature, audio_feat)
File "/home/hwh/anaconda3/envs/AVS39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/hwh/Project/AVS/AVSegFormer-master/model/utils/fusion_block.py", line 44, in forward
fusion_map = torch.einsum('bchw,bc->bchw', feature_map, x.squeeze())
File "/home/hwh/anaconda3/envs/AVS39/lib/python3.9/site-packages/torch/functional.py", line 378, in einsum
return _VF.einsum(equation, operands) # type: ignore[attr-defined]
RuntimeError: einsum(): the number of subscripts in the equation (2) does not match the number of dimensions (1) for operand 1 and no ellipsis was given
When I set batchsize=1, the error "RuntimeError: Caught RuntimeError in replica 2 on device 2" is raised.
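One plausible reading of the einsum error is that `x.squeeze()` with no argument removes every size-1 dimension, including the batch dimension when a replica receives a single sample. The sketch below models only the shape rules of `squeeze` in pure Python. The helpers `squeeze_all` and `squeeze_dim`, the assumed layout `(batch, 1, C)` for `x`, and the channel count 256 are illustrative assumptions, not taken from the repository.

```python
def squeeze_all(shape):
    """Shape after squeeze() with no argument: every size-1 dim is dropped."""
    return tuple(d for d in shape if d != 1)

def squeeze_dim(shape, dim):
    """Shape after squeeze(dim): only that dim is dropped, and only if it is 1."""
    if shape[dim] == 1:
        return shape[:dim] + shape[dim + 1:]
    return shape

C = 256  # hypothetical channel count
for batch in (4, 1):
    x_shape = (batch, 1, C)  # assumed layout of the audio feature
    # Print: batch size, shape after squeeze(), shape after squeeze(1).
    print(batch, squeeze_all(x_shape), squeeze_dim(x_shape, 1))
```

With a batch of 1, `squeeze_all` leaves a 1-D shape, which cannot match the two subscripts `bc` in the einsum equation, whereas squeezing only dimension 1 keeps the batch axis. If the assumed layout holds, replacing `x.squeeze()` with `x.squeeze(1)` in `fusion_block.py` would be the corresponding change. The "replica 2 on device 2" wording suggests the batch is being split across several GPUs, so a single replica can receive one sample even when the global batch size is larger.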
Your paper says that the img_size is 224x224, but in your code the img_size is 512. Which one is right? https://github.com/vvvb-github/AVSegFormer/blob/master/dataloader/s4_dataset.py