Thanks a lot for the paper and sharing the code. It seems that for CIFAR10 dataset

Attentional pooling for CIFAR10/100, STL10 dataset about attentionalpoolingaction (closed)

rohitgirdhar commented on September 7, 2024

from attentionalpoolingaction.

Comments (3)

rohitgirdhar commented on September 7, 2024

Thanks @LiliMeng for the feedback and trying out attentional pooling on CIFAR-10!
Here's what I think might be happening here:

1. Given the connection we show between attentional pooling and second-order pooling (attentional pooling is a low-rank approximation of it), and given that a variant of second-order pooling has been shown to be useful for fine-grained recognition (bilinear CNNs) and, concurrently, for other fine-grained tasks like VQA (relation networks), the 10-way classification task on CIFAR might not be "fine-grained" enough.
2. Again using the connection to second-order pooling, I think attentional pooling would be most useful for tasks that require "interaction" features ($X^TX$), i.e., pairwise features of one part of the image/video with another. I think this is especially true for action recognition and VQA, which might explain the recent success of similar self-attention methods on these tasks (Attend and Interact, Non-local Neural Networks).
3. The resolution of the images is also important. If your base network has a large receptive field, you might want to increase the resolution of the input images (simple resizing would work), so that the last-layer neurons look at different regions of the image and attentional pooling can down-weight some of those regions.
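
To make the low-rank connection to second-order pooling concrete, here is a minimal pure-Python sketch (not code from the repository). It assumes $X$ is an $n \times f$ matrix of final-layer features ($n$ spatial positions, $f$ channels) scored against an $f \times f$ weight matrix $W$ via $\mathrm{Tr}(X^\top X W^\top)$; the vectors `a` and `b` are hypothetical weights. Restricting $W$ to the rank-1 form $ab^\top$ collapses the score to $(Xa)^\top(Xb)$, the dot product of two $n$-dimensional attention maps.

```python
import math
import random


def matvec(X, v):
    # (n x f) matrix times length-f vector -> length-n vector
    return [sum(x * w for x, w in zip(row, v)) for row in X]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def gram(X):
    # X^T X: f x f channel co-occurrence matrix, summed over all n positions
    f = len(X[0])
    return [[sum(row[j] * row[k] for row in X) for k in range(f)]
            for j in range(f)]


def second_order_score(X, W):
    # Full second-order pooling score: Tr(X^T X W^T) = sum_jk (X^T X)_jk * W_jk
    G = gram(X)
    f = len(G)
    return sum(G[j][k] * W[j][k] for j in range(f) for k in range(f))


def rank1_attention_score(X, a, b):
    # With W = a b^T the same score is (X a) . (X b): two length-n
    # attention maps combined by a dot product, no f x f matrix required.
    return dot(matvec(X, a), matvec(X, b))


random.seed(0)
n, f = 64, 16  # e.g. an 8x8 spatial grid flattened to 64 positions
X = [[random.gauss(0, 1) for _ in range(f)] for _ in range(n)]
a = [random.gauss(0, 1) for _ in range(f)]
b = [random.gauss(0, 1) for _ in range(f)]
W = [[a_j * b_k for b_k in b] for a_j in a]  # rank-1 weight matrix a b^T

full = second_order_score(X, W)
low_rank = rank1_attention_score(X, a, b)
print(math.isclose(full, low_rank, rel_tol=1e-9, abs_tol=1e-9))  # True
```

The point of the rank-1 form is cost: the full score builds an $f \times f$ Gram matrix ($O(nf^2)$ work), while the factored form needs only two matrix-vector products ($O(nf)$), which is why such a module can stay lightweight.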

That said, our attention module is super light-weight, and seems to mostly maintain or improve performance; it might be useful to keep around in your network architectures 🙂

LiliMeng commented on September 7, 2024

Thanks a lot @rohitgirdhar for your kind and detailed reply! :)

I also tried attentional pooling on CIFAR-100 with ResNet-32; the result is 69.44% (with attentional pooling) vs. 70.05% (without). CIFAR-100 groups its 100 classes into 20 superclasses of 5 classes each (for example, rabbit and squirrel are both "small mammals"), but maybe it is not "fine-grained" enough either? The CIFAR-10/100 feature map before the pooling layer is [batch_size, 8, 8, 64]. Maybe 8x8 is too small for weighting certain regions. Or, because in CIFAR-10/100 the object to be recognized already takes up most of the image, additional attention may not be helpful?
Is image resolution also important? I also tried STL-10, whose images are 96x96 (three times the side length of CIFAR-10/100's 32x32); the result is 79.69% (with attentional pooling) vs. 80.92% (without).
I'll try using larger images and activity datasets.
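
As a hedged illustration of what the attention head sees on a [batch_size, 8, 8, 64] feature map, here is a sketch that produces K class scores, assuming one class-specific vector `a_k` per class and one shared class-agnostic vector `b`. The function name, weight names, and random initialisation are hypothetical, and the exact normalisation and training setup in the paper may differ. It also makes the 8x8 concern concrete: each attention map has only 64 cells to weight.

```python
import random


def attentional_pool_logits(feats, a, b):
    """Hypothetical head. feats: [batch][H][W][C] nested lists;
    a: K x C class-specific weights; b: length-C class-agnostic weights.
    Returns a [batch][K] list of scores (X a_k)^T (X b)."""
    logits = []
    for fmap in feats:
        # Flatten the H x W grid into n = H*W positions of C channels (X is n x C).
        X = [cell for row in fmap for cell in row]
        # Class-agnostic saliency map X b: one scalar per position.
        h = [sum(x_c * b_c for x_c, b_c in zip(x, b)) for x in X]
        scores = []
        for a_k in a:
            # Class-specific attention map X a_k for class k.
            t_k = [sum(x_c * a_c for x_c, a_c in zip(x, a_k)) for x in X]
            # Score for class k: dot product of the two maps over positions.
            scores.append(sum(t * s for t, s in zip(t_k, h)))
        logits.append(scores)
    return logits


random.seed(0)
batch, height, width, channels, num_classes = 2, 8, 8, 64, 10
feats = [[[[random.gauss(0, 1) for _ in range(channels)]
           for _ in range(width)] for _ in range(height)] for _ in range(batch)]
a = [[random.gauss(0, 0.1) for _ in range(channels)] for _ in range(num_classes)]
b = [random.gauss(0, 0.1) for _ in range(channels)]
logits = attentional_pool_logits(feats, a, b)
print(len(logits), len(logits[0]))  # 2 10
```

In a real network these scores would feed a softmax cross-entropy loss and `a`, `b` would be learned; here they are random only to show shapes.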

rohitgirdhar commented on September 7, 2024

Thanks for trying the other experiments. Yes, the resolution of the image is quite important, as you want different features at the last layer to focus on different areas of the image.
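
A back-of-the-envelope sketch of why resolution matters here, assuming a CIFAR-style ResNet-32 with total stride 4 (three stages at strides 1, 2, 2, which matches the 8x8x64 map for 32x32 inputs reported above) and assuming the same network is applied unchanged to larger inputs. The helper `attention_grid` and the 4x-resized CIFAR row are hypothetical illustrations.

```python
def attention_grid(input_side, total_stride=4):
    """Return (side, cells) of the final feature map for a square input."""
    side = input_side // total_stride
    return side, side * side


inputs = [
    ("CIFAR-10/100", 32),
    ("STL-10", 96),
    ("CIFAR resized x4 (hypothetical)", 128),
]
for name, px in inputs:
    side, cells = attention_grid(px)
    print(f"{name}: {px}x{px} input -> {side}x{side} map, {cells} positions")
```

The number of positions attentional pooling can weight grows with the square of the input side. Because the receptive field of a final-layer unit is fixed in pixels, upsampling an image also makes each unit cover a smaller fraction of the object, which is what lets different units look at different regions.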
