
Comments (3)

iyaja commented on July 20, 2024

Hi, sorry for the late reply.

> Can you talk about the inspiration for doing this, or explain it, I am confused about it

Sure. The main idea is to use regular convolution-based operations across three different "rotated" copies of the 3D input tensor. We hypothesized that this allows the network to capture cross-dimensional features in a way that regular 2D convolutions could not. For example, if the channel dimension represents time (such as in a video where frames are stacked and passed through a CNN), the network could learn features that correspond to an object moving up and down or side-to-side.
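To make the "rotated copies" idea concrete, here is a rough NumPy sketch, not the actual implementation. It assumes each branch permutes the tensor, pools away the leading axis, builds a 2D gate, and permutes back, and that the three branch outputs are averaged. The names `z_pool`, `toy_gate`, `branch`, and `triplet_sketch` are made up for illustration, and `toy_gate` is only a stand-in for the learned convolution, so this shows the data flow and shapes, not trained behaviour.

```python
import numpy as np

def z_pool(x):
    """Stack max and mean over axis 0: (D, A, B) -> (2, A, B)."""
    return np.stack([x.max(axis=0), x.mean(axis=0)])

def toy_gate(pooled):
    """Stand-in for a learned conv + sigmoid (assumption): average the
    two pooled maps and squash the result into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-pooled.mean(axis=0)))

def branch(x, perm):
    """Rotate x with perm, scale it by a 2D gate, then rotate back."""
    inv = np.argsort(perm)              # inverse permutation
    xr = np.transpose(x, perm)          # "rotated" copy of the input
    gate = toy_gate(z_pool(xr))         # shape: xr.shape[1:]
    return np.transpose(xr * gate, inv)

def triplet_sketch(x):
    """Average three branches, each pooling away a different axis."""
    perms = [(0, 1, 2), (1, 0, 2), (2, 1, 0)]
    return sum(branch(x, p) for p in perms) / 3.0

x = np.random.default_rng(0).normal(size=(4, 5, 6))  # (C, H, W)
y = triplet_sketch(x)
print(y.shape)
```

Each permutation puts a different axis first, so the 2D gate in each branch spans a different pair of the remaining two axes. The output keeps the input's shape.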

You can find more information about this idea of cross-dimensional interaction, the motivation for our method, and our analysis in our paper.

from triplet-attention.

Mr-Da-Yang commented on July 20, 2024

Thank you very much for your full response!
What I mean is that this operation is actually a variant of spatial attention. So, have you considered doing the same on channel attention?


iyaja commented on July 20, 2024

> What I mean is that this operation is actually a variant of spatial attention.

Yep, you're right. Triplet Attention applies a variant of CBAM's spatial attention across different permutations of the input.

> So, have you considered doing the same on channel attention?

No, not really. But I suppose this would work as well, to a degree.

As we mention in the paper, one of the main motivations of Triplet Attention is efficiently learning to extract "cross-dimensional" features. By using spatial attention, each branch can learn features that span two of the three dimensions.

If you used three channel attention branches, you wouldn't be able to learn those kinds of cross-dimensional feature extractors, since the canonical channel attention only operates across one dimension of a tensor.
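A quick way to see the difference is to compare the shapes of the pooled descriptors. This is only a sketch with plain mean pooling, assuming an SE-style squeeze for channel attention. The learned layers are left out, and the variable names are illustrative.

```python
import numpy as np

x = np.random.default_rng(0).normal(size=(8, 16, 16))  # (C, H, W)

# Channel-attention-style squeeze: pool away H and W,
# leaving one value per channel (a 1D descriptor).
channel_descriptor = x.mean(axis=(1, 2))

# Spatial-attention-style pooling: pool away a single axis,
# leaving a 2D map over the other two.
hw_map = x.mean(axis=0)  # spans H and W
cw_map = x.mean(axis=1)  # spans C and W
ch_map = x.mean(axis=2)  # spans C and H

print(channel_descriptor.shape, hw_map.shape, cw_map.shape, ch_map.shape)
```

The channel descriptor varies along a single axis, whereas each pooled map varies along two, which is what lets a branch pick up interactions between a pair of dimensions.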

