
soft's People

Contributors

andy-zd, lzrobots, victorllu, yjhmitweb


soft's Issues

Ask for pretrained model

Dear Jiachen,
Thanks for your work! Would you be willing to share the models pretrained on ImageNet? Best wishes!

Code for DeiT

Thank you for publishing your code. I saw the DeiT ablation in your paper. Is there a chance you could also provide code to reproduce that? If you'd prefer to contact me in private, my email is [email protected]

Thanks again!

How should the linear complexity claimed in the paper be understood?

Apologies for asking in Chinese for convenience.

The key to the linear cost in the paper is downsampling with a strided conv, but once the conv is trained its kernel size and stride are fixed, so the sampling ratio is fixed too.
That means that after training, if a longer sequence is used at test time, the number of landmarks m grows with the sequence length n, and the complexity is still O(n^2) rather than O(n).
I looked at the OpenReview comments and a reviewer seems to have raised this; the rebuttal says m is fixed at 49, but when the test sequence is longer that seems impossible without changing the stride. Nystromformer's adaptive pooling feels more in keeping with the meaning of landmarks.
Also, the conv that generates the landmarks is followed by a norm and a GELU; is that actually the key to convergence?
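A minimal sketch (assumed shapes and layer choices, not the authors' code) of the two landmark-sampling strategies being compared here: a strided conv, whose number of landmarks m grows with the input resolution, versus Nystromformer-style adaptive pooling, which keeps m fixed at 49 regardless of input size.

    import torch
    import torch.nn as nn

    class StridedConvLandmarks(nn.Module):
        """Landmarks via a strided conv: m = (H/s) * (W/s), so m grows with the input size."""
        def __init__(self, dim, stride=4):
            super().__init__()
            self.proj = nn.Conv2d(dim, dim, kernel_size=stride, stride=stride)

        def forward(self, x):                               # x: (B, C, H, W) token map
            return self.proj(x).flatten(2).transpose(1, 2)  # (B, m, C), m depends on H, W

    class AdaptivePoolLandmarks(nn.Module):
        """Landmarks via adaptive pooling: m is fixed (e.g. 7*7 = 49) for any H, W."""
        def __init__(self, output_size=7):
            super().__init__()
            self.pool = nn.AdaptiveAvgPool2d(output_size)

        def forward(self, x):                               # x: (B, C, H, W) token map
            return self.pool(x).flatten(2).transpose(1, 2)  # (B, 49, C) for output_size=7

    x_small = torch.randn(1, 64, 28, 28)
    x_large = torch.randn(1, 64, 56, 56)
    print(StridedConvLandmarks(64)(x_small).shape, StridedConvLandmarks(64)(x_large).shape)
    # (1, 49, 64) and (1, 196, 64): m grows with the sequence length
    print(AdaptivePoolLandmarks()(x_small).shape, AdaptivePoolLandmarks()(x_large).shape)
    # (1, 49, 64) and (1, 49, 64): m stays at 49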

Substitute regular attention module with softmax-free attention module

Hello,

The background is that the computation platform I'm using is limited in that the softmax operator costs a lot of time, so I'm trying to substitute the regular attention modules with softmax-free attention modules.

I have one question about the structure of SOFT. The core of the softmax-free attention module runs like this:

    def forward(self, X, H, W):
        # Project the input tokens into queries and values (note: no separate
        # key projection) and split them into attention heads.
        Q = self.split_heads(self.W_q(X))
        V = self.split_heads(self.W_v(X))
        # Softmax-free attention over the token map of spatial size H x W.
        attn_out = self.attn(Q, V, H, W)
        attn_out = self.combine_heads(attn_out)
        # Final feed-forward projection.
        out = self.ff(attn_out)
        return out

As Q and V are both generated from X, does that mean this attention module is akin to a self-attention module rather than a cross-attention module, where Q, K, and V come from different domains? If that is the case, is there any suggestion for substituting a regular cross-attention module with softmax-free attention? Thanks.

Best,
Chenxi
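One possible direction, sketched below. This is not part of the SOFT codebase: SoftmaxFreeCrossAttention and attn_core are hypothetical placeholders, and attn_core is assumed to behave like the self.attn(Q, V, H, W) call in the snippet above. The only structural change is projecting the queries from one input and the values from a second input; whether the landmark sampling inside the core still makes sense when the two inputs have different spatial layouts would need to be checked against the actual implementation.

    import torch.nn as nn

    class SoftmaxFreeCrossAttention(nn.Module):
        """Hypothetical cross-attention wrapper around an assumed softmax-free core."""
        def __init__(self, dim, attn_core, num_heads=8):
            super().__init__()
            self.num_heads = num_heads
            self.W_q = nn.Linear(dim, dim)   # queries projected from the "target" sequence
            self.W_v = nn.Linear(dim, dim)   # values projected from the "source" sequence
            self.attn = attn_core            # assumed softmax-free core: attn(Q, V, H, W)
            self.ff = nn.Linear(dim, dim)

        def split_heads(self, x):            # (B, N, C) -> (B, heads, N, C // heads)
            B, N, C = x.shape
            return x.view(B, N, self.num_heads, C // self.num_heads).transpose(1, 2)

        def combine_heads(self, x):          # inverse of split_heads
            B, H, N, D = x.shape
            return x.transpose(1, 2).reshape(B, N, H * D)

        def forward(self, X_q, X_kv, H, W):
            Q = self.split_heads(self.W_q(X_q))    # queries from one domain
            V = self.split_heads(self.W_v(X_kv))   # values (and landmarks) from the other
            out = self.combine_heads(self.attn(Q, V, H, W))
            return self.ff(out)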

Regular softmax attention in last block

Hello,

First, nice job on the work; I think this is a really interesting paper with a lot of potential to enable further theoretical investigation of deep attention mechanisms.

Looking into the code, I noticed that the final block of SOFT uses a normal softmax attention layer. Is there a reason for this? Also, did you notice any quantitative or qualitative differences between the attention heatmaps produced by this regular softmax layer and those from the approximated, softmax-free attention layers?

Thanks in advance for your time and work

some issue

Thanks for your work. In your code, line 6 of the file <substraction> always produces errors. Please help me.
