Comments (9)

Dieguli commented on August 19, 2024

@ridgerchu I really appreciate the effort you have made to answer my questions. I think everything is clear now. I will get back to you in case anything else arises. Thanks!

ridgerchu commented on August 19, 2024

Hi, thank you for reaching out and for your interest in our work. I'd like to clarify that in the latest version of our paper, which you can find at this link, we no longer use the SynOps metric. We've decided that it wasn't the most appropriate measure for our purposes. Instead, we've switched to using the theoretical power consumption and have provided detailed steps for its calculation in the paper. Please refer to the linked document for more in-depth information.

Dieguli commented on August 19, 2024

Hi @ridgerchu, thank you for your previous answer. I have taken a look at the paper and I am able to replicate almost everything except the energy-consumption estimate, which is what interests me the most. I would appreciate it if you could explain how to obtain the spiking firing rate from the provided code or the already-trained model. Furthermore, I have not managed to infer the attention values reported in Table 1. Could you explain why the MACs for the second row are "2T^2d vs 6Td", and why the MACs for the first row are 3d^2T, based on the general equations for the SRWKV and SRFNN blocks as well as the one for the self-attention mechanism? I would really appreciate your help.

ridgerchu commented on August 19, 2024

Hi,

To measure the spiking rate, you can use PyTorch's forward hooks. A hook lets you record the outputs of the network's layers, from which you can compute the output firing rate.
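
A minimal runnable sketch of that approach (the ToySpike module is a placeholder invented for this example; in SpikeGPT you would hook the actual spiking-neuron layers):

```python
import torch
import torch.nn as nn

# Toy stand-in for a spiking layer: thresholds its input to binary spikes.
# This exists only to make the sketch runnable.
class ToySpike(nn.Module):
    def forward(self, x):
        return (x > 0).float()

model = nn.Sequential(nn.Linear(16, 32), ToySpike(), nn.Linear(32, 8), ToySpike())

firing_rates = {}

def make_hook(name):
    def hook(module, inputs, output):
        # For binary spike tensors, the mean over all elements is the
        # fraction of neurons that fired, i.e. the firing rate R.
        firing_rates[name] = output.detach().float().mean().item()
    return hook

handles = [
    module.register_forward_hook(make_hook(name))
    for name, module in model.named_modules()
    if isinstance(module, ToySpike)  # match your real spiking modules here
]

with torch.no_grad():
    model(torch.randn(4, 16))  # one forward pass on representative data

for h in handles:
    h.remove()
print(firing_rates)  # e.g. {'1': 0.49, '3': 0.52}
```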

Regarding the MACs: the term '3Td^2' refers to the computational cost of producing the Q, K, and V matrices; each of the three projections requires Td^2 operations. In the attention mechanism itself, the product QK^T is a (T x d) by (d x T) matrix multiplication costing T^2 d operations, and multiplying the resulting T x T attention scores by V costs another T^2 d. This is where the terms quadratic in T come from.
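
As a quick sanity check, here are those counts with concrete values (d = 512 and T = 3072, the values quoted later in this thread):

```python
# Rough sanity check of the MAC counts above.
T, d = 3072, 512

qkv_projections = 3 * T * d**2  # Q, K, V: three (T x d)(d x d) products
attention = 2 * T**2 * d        # QK^T plus (attention scores) @ V

print(f"projections: {qkv_projections:,} MACs")  # 2,415,919,104
print(f"attention:   {attention:,} MACs")        # 9,663,676,416
```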

I hope this explanation clarifies your queries. Feel free to reach out if you have more questions!

Dieguli commented on August 19, 2024

Hi @ridgerchu, thanks a lot for the help with the spiking rate calculation!

But I am still struggling to define the computational complexity of the model in order to derive the energy consumption from it. I will try to lay out my doubts clearly:

  1. I understand that the self-attention mechanism involves 3 operations: the dot product of Q and K, the scaling of this dot product, and the multiplication of the attention scores with V, which gives a total of 2T^2d + T^2 FLOPs. We then have to multiply the resulting number by Emac to get the energy consumption (my current accounting is sketched after this list). As you can see, I do not understand where the additional two terms of rows 1 and 4 come from. I understand that for SpikeGPT the number of FLOPs of f(Q/R,K,V) is 6Td, since here you use the RWKV variant inspired by the Attention Free Transformer. Can you explain what the 'Q/R,K,V' contribution means in the case of both Vanilla-GPT and SpikeGPT, and how you compute its different values? Also, why does it involve only AC operations, rather than MAC operations, in the case of SpikeGPT?

  2. Finally, I would like to know how you compute the FLOPs of the 3 MLPs (the values in rows 5, 6 and 7). Firstly, I would like to confirm that they are a contribution from the SRFNN block. Secondly, I would like to confirm that they correspond to the computations involving the Mp, Mg and Ms matrices. I would appreciate it if you could give the FL_MLP_i values in terms of T and d.
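
To make my current accounting concrete, here is how I am computing the vanilla self-attention energy; the T, d, and Emac values are my own assumptions, which may be exactly where I go wrong:

```python
# My understanding of the vanilla self-attention energy estimate.
# T, d, and E_MAC = 4.6 pJ (the common 45 nm figure) are my assumptions.
T, d = 3072, 512
E_MAC = 4.6e-12  # joules per MAC

attn_flops = 2 * T**2 * d + T**2  # QK^T, scaling, and scores @ V
print(f"{attn_flops:,} FLOPs -> {attn_flops * E_MAC:.4f} J")
```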

Sorry for such a long question; I understand that answering it means explaining step by step all the calculations involved in that section of the paper, but perhaps it would also be helpful to include this in the supplementary materials as a clarification for reviewers.

ridgerchu commented on August 19, 2024

Hi, thank you for reaching out with your questions!

Self-Attention Mechanism Complexity: Regarding the self-attention mechanism's computational complexity and its relation to energy consumption, we align our methodology with the approach used in Spike-Driven Transformer; specifically, we employ Eac and Emac calculations similar to theirs. In their Spiking Neural Network (SNN) model they use Eac, which we have also adopted. For the Td calculation, we followed the precedent set by models like AFT, RWKV, and SpikeGPT, where the combination of the R/Q, K, and V variables involves element-wise products, leading to a complexity of Td.
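
As a rough sketch of the scaling difference (the constant 6 is taken from the 6Td entry in Table 1, not re-derived here):

```python
# In AFT/RWKV-style mixing, each of the T timesteps does a handful of
# element-wise operations over d channels (no T x T matrix is formed),
# so the cost is c * T * d for a small constant c; c = 6 matches Table 1.
T, d = 3072, 512
print(f"6Td   = {6 * T * d:,}")      # 9,437,184 -- linear in T
print(f"2T^2d = {2 * T**2 * d:,}")   # 9,663,676,416 -- quadratic in T
```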

Calculations of Mp, Mg, and Ms Matrices: The additional terms you're inquiring about originate from the Mp, Mg, and Ms matrices. For Mg, its computation is based on 4Td^2 (with d=512, T=3072), resulting in 3,221,225,472. This number, when multiplied by R=0.15 and Eac=0.9, yields a value of 434,865,438.72. For Mp and Ms, we initially considered their calculation to be Td^2, since their matrix size is four times smaller. This was our calculation at the time, and while we strove for accuracy, we acknowledge there might be areas that lack rigor. If you find any discrepancies or have concerns, please feel free to point them out!
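
For reference, those numbers can be reproduced directly (the pJ unit for Eac is an assumption here; the paper gives the exact units):

```python
# Reproducing the Mg numbers quoted above.
T, d = 3072, 512
R, E_AC = 0.15, 0.9  # firing rate and energy per AC (0.9 pJ assumed)

macs_mg = 4 * T * d**2        # 3,221,225,472
energy = macs_mg * R * E_AC   # 434,865,438.72 (in pJ under that assumption)
print(f"{macs_mg:,} ACs -> {energy:,.2f}")
```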

Dieguli commented on August 19, 2024

Hi @ridgerchu, I was wondering if you knew where I could find the wkv implementation class in PyTorch, rather than the current CUDA one. Thanks!

ridgerchu commented on August 19, 2024

Hi, you can find the PyTorch-style RWKV code here: link

Dieguli commented on August 19, 2024

@ridgerchu really appreciate your support. Thanks!
