Comments (3)

rohitgirdhar commented on September 7, 2024

Thanks @LiliMeng for the feedback and trying out attentional pooling on CIFAR-10!
Here's what I think might be happening here:

  1. Given the connection we show to second-order pooling (attentional pooling is a low-rank approximation of it), a variant of which has been shown to be useful for fine-grained recognition (bilinear CNN) and, concurrently, for other fine-grained tasks like VQA (relation nets), the 10-way classification task on CIFAR might not be "fine-grained" enough to benefit.
  2. Again using the connection to second-order pooling, I think attentional pooling would be most useful for tasks that require "interaction" features ($X^TX$), i.e. pairwise features relating one part of the image/video to another. I think this is especially true in action recognition and VQA, which might explain the recent success of similar self-attention methods on these tasks (Attend and Interact, Non-local Neural Networks).
  3. The resolution of the images is also important. If your base network has a large receptive field, you might want to increase the resolution of the input images (simple resizing would work), so that the last-layer neurons look at different regions of the image and attentional pooling can down-weight certain regions.

That said, our attention module is super light-weight, and seems to mostly maintain or improve performance; it might be useful to keep around in your network architectures 🙂
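To make the low-rank connection in point 1 concrete, here is a minimal sketch, not the repository's actual code. It assumes a feature map flattened to `X` with one row per spatial location, and a rank-1 classifier weight `W = a b^T` on the second-order features `X^T X`; the names `a`, `b`, and both function names are made up for illustration. Under those assumptions the second-order score equals an attention map `X a` multiplied location by location with per-location scores `X b`, then summed.

```python
import math
import random


def matvec(X, v):
    """Multiply an (n x f) matrix, stored as a list of rows, by a length-f vector."""
    return [sum(x * w for x, w in zip(row, v)) for row in X]


def second_order_score(X, a, b):
    """Score Tr(W^T X^T X) with the rank-1 weight W = a b^T (hypothetical names)."""
    f = len(a)
    # Second-order pooled features G = X^T X, shape (f x f).
    G = [[sum(row[i] * row[j] for row in X) for j in range(f)] for i in range(f)]
    # Tr((a b^T)^T G) = sum_ij a_i * b_j * G_ij
    return sum(a[i] * b[j] * G[i][j] for i in range(f) for j in range(f))


def attentional_score(X, a, b):
    """Same score without ever forming X^T X: (X a) . (X b)."""
    attention = matvec(X, a)      # one attention weight per spatial location
    per_location = matvec(X, b)   # one class score per spatial location
    return sum(h * s for h, s in zip(attention, per_location))


# Example sized like the CIFAR feature map discussed in this thread:
# 8 x 8 = 64 locations, 64 channels.
random.seed(0)
n, f = 64, 64
X = [[random.gauss(0.0, 1.0) for _ in range(f)] for _ in range(n)]
a = [random.gauss(0.0, 1.0) for _ in range(f)]
b = [random.gauss(0.0, 1.0) for _ in range(f)]

full = second_order_score(X, a, b)
cheap = attentional_score(X, a, b)
print(math.isclose(full, cheap, rel_tol=1e-9, abs_tol=1e-6))
```

The attentional form needs only two matrix-vector products per image, which is why the module adds so little cost on top of the base network.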

from attentionalpoolingaction.

LiliMeng commented on September 7, 2024

Thanks a lot @rohitgirdhar for your kind and detailed reply! :)

  1. I also tried attentional pooling on CIFAR-100 with ResNet-32; the result is 69.44% (with attentional pooling) vs. 70.05% (without). Although CIFAR-100 groups its classes into 20 superclasses of 5 sub-classes each, with similar sub-classes such as rabbit and squirrel, maybe it is still not "fine-grained" enough? The CIFAR-10/100 feature map before the pooling layer is [batch_size, 8, 8, 64]. Maybe 8x8 is too small for weighting certain regions. Or maybe, because the objects to recognize in CIFAR-10/100 already take up most of the image, additional attention is not helpful?

  2. Is the resolution of the images also important? I also tried STL-10, whose images are 96x96 (three times the side length of CIFAR-10/100); the result is 79.69% (with attentional pooling) vs. 80.92% (without).

  3. I'll try larger images and activity-recognition datasets.
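As a rough back-of-the-envelope check on how many spatial locations attention has to work with, the sketch below assumes the network downsamples by a total stride of 4. That figure is inferred from the 32x32 input and 8x8 feature map quoted above; the thread does not state the stride, and the helper name `feature_map_side` is made up. The STL-10 and resized-input figures further assume the same network is reused unchanged, which the thread also does not confirm.

```python
def feature_map_side(input_side, total_stride=4):
    """Side length of the final feature map, assuming a fixed total stride."""
    return input_side // total_stride


# 32x32 is the native CIFAR size, 64x64 a hypothetical resized CIFAR input,
# and 96x96 the native STL-10 size.
for name, side in [("CIFAR-10/100", 32), ("CIFAR resized", 64), ("STL-10", 96)]:
    s = feature_map_side(side)
    print(f"{name}: {side}x{side} input -> {s}x{s} map, {s * s} locations")
```

More locations only help if the units at different locations actually see different image regions, so the receptive field of the base network matters as much as the raw map size.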

rohitgirdhar commented on September 7, 2024

Thanks for trying the other experiments. Yes, the resolution of the image is quite important, as you want different features at the last layer to focus on different areas of the image.

