Comments (9)

Dieguli commented on August 19, 2024

@ridgerchu I really appreciate the effort you have made to answer my questions. I think everything is clear now. I will get back to you in case anything else arises. Thanks!

ridgerchu commented on August 19, 2024

Hi, thank you for reaching out and for your interest in our work. I'd like to clarify that in the latest version of our paper, which you can find at this link, we no longer use the SynOps metric. We've decided that it wasn't the most appropriate measure for our purposes. Instead, we've switched to using the theoretical power consumption and have provided detailed steps for its calculation in the paper. Please refer to the linked document for more in-depth information.

Dieguli commented on August 19, 2024

Hi @ridgerchu, thank you for your previous answer. I have taken a look at the paper and I am able to replicate almost everything except the energy-consumption estimate, which is what interests me the most. I would appreciate it if you could explain how to obtain the spiking firing rate from the provided code or the already-trained model. Furthermore, I have not managed to infer the attention values reported in Table 1. Could you explain why the MACs for the second row are "2T^2d vs 6Td", and why the MACs for the first row are 3d^2T, based on the general equations for the SRWKV and SRFNN blocks as well as the one for the self-attention mechanism? I would really appreciate your help.

ridgerchu commented on August 19, 2024

Hi,

To measure the spiking rate, you can use PyTorch's forward hooks. A hook lets you record the outputs of the network's layers, from which you can compute the output firing rate.
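
A minimal runnable sketch of that approach (the ToySpike module is a placeholder invented for this example; in SpikeGPT you would hook the actual spiking-neuron layers):

```python
import torch
import torch.nn as nn

# Toy stand-in for a spiking layer: thresholds its input to binary spikes.
# This exists only to make the sketch runnable.
class ToySpike(nn.Module):
    def forward(self, x):
        return (x > 0).float()

model = nn.Sequential(nn.Linear(16, 32), ToySpike(), nn.Linear(32, 8), ToySpike())

firing_rates = {}

def make_hook(name):
    def hook(module, inputs, output):
        # For binary spike tensors, the mean over all elements is the
        # fraction of neurons that fired, i.e. the firing rate R.
        firing_rates[name] = output.detach().float().mean().item()
    return hook

handles = [
    module.register_forward_hook(make_hook(name))
    for name, module in model.named_modules()
    if isinstance(module, ToySpike)  # match your real spiking modules here
]

with torch.no_grad():
    model(torch.randn(4, 16))  # one forward pass on representative data

for h in handles:
    h.remove()
print(firing_rates)  # e.g. {'1': 0.49, '3': 0.52}
```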

Regarding the MACs: the term '3Td^2' refers to the computational cost of producing the Q, K, and V matrices; each of the three projections requires Td^2 operations. In the attention mechanism itself, the product QK^T is a (T x d) by (d x T) matrix multiplication costing T^2 d operations, and multiplying the resulting T x T attention scores by V costs another T^2 d. This is where the terms quadratic in T come from.
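
As a quick sanity check, here are those counts with concrete values (d = 512 and T = 3072, the values quoted later in this thread):

```python
# Rough sanity check of the MAC counts above.
T, d = 3072, 512

qkv_projections = 3 * T * d**2  # Q, K, V: three (T x d)(d x d) products
attention = 2 * T**2 * d        # QK^T plus (attention scores) @ V

print(f"projections: {qkv_projections:,} MACs")  # 2,415,919,104
print(f"attention:   {attention:,} MACs")        # 9,663,676,416
```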

I hope this explanation clarifies your queries. Feel free to reach out if you have more questions!

Dieguli commented on August 19, 2024

Hi @ridgerchu, thanks a lot for the help with the spiking rate calculation!

But I am still struggling to define the computational complexity of the model in order to derive the energy consumption from it. I will try to lay out my doubts clearly:

  1. I understand that the self-attention mechanism involves 3 operations: the dot product of Q and K, the scaling of this dot product, and the multiplication of the attention scores with V, which gives a total of 2T^2d + T^2 FLOPs. We then have to multiply the resulting number by Emac to get the energy consumption (my current accounting is sketched after this list). As you can see, I do not understand where the additional two terms of rows 1 and 4 come from. I understand that for SpikeGPT the number of FLOPs of f(Q/R,K,V) is 6Td, since here you use the RWKV variant inspired by the Attention Free Transformer. Can you explain what the 'Q/R,K,V' contribution means in the case of both Vanilla-GPT and SpikeGPT, and how you compute its different values? Also, why does it involve only AC operations, rather than MAC operations, in the case of SpikeGPT?

  2. Finally, I would like to know how you compute the FLOPs of the 3 MLPs (the values in rows 5, 6 and 7). Firstly, I would like to confirm that they are a contribution from the SRFNN block. Secondly, I would like to confirm that they correspond to the computations involving the Mp, Mg and Ms matrices. I would appreciate it if you could give the FL_MLP_i values in terms of T and d.
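
To make my current accounting concrete, here is how I am computing the vanilla self-attention energy; the T, d, and Emac values are my own assumptions, which may be exactly where I go wrong:

```python
# My understanding of the vanilla self-attention energy estimate.
# T, d, and E_MAC = 4.6 pJ (the common 45 nm figure) are my assumptions.
T, d = 3072, 512
E_MAC = 4.6e-12  # joules per MAC

attn_flops = 2 * T**2 * d + T**2  # QK^T, scaling, and scores @ V
print(f"{attn_flops:,} FLOPs -> {attn_flops * E_MAC:.4f} J")
```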

Sorry for such a long question; I understand that answering it means explaining step by step all the calculations involved in that section of the paper, but perhaps it would also be helpful to include this in the supplementary materials as a clarification for reviewers.

ridgerchu commented on August 19, 2024

Hi, thank you for reaching out with your questions!

Self-Attention Mechanism Complexity: Regarding the self-attention mechanism's computational complexity and its relation to energy consumption, we align our methodology with the approach used in Spike-Driven Transformer; specifically, we employ Eac and Emac calculations similar to theirs. In their Spiking Neural Network (SNN) model they use Eac, which we have also adopted. For the Td calculation, we followed the precedent set by models like AFT, RWKV, and SpikeGPT, where the combination of the R/Q, K, and V variables involves element-wise products, leading to a complexity of Td.
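
As a rough sketch of the scaling difference (the constant 6 is taken from the 6Td entry in Table 1, not re-derived here):

```python
# In AFT/RWKV-style mixing, each of the T timesteps does a handful of
# element-wise operations over d channels (no T x T matrix is formed),
# so the cost is c * T * d for a small constant c; c = 6 matches Table 1.
T, d = 3072, 512
print(f"6Td   = {6 * T * d:,}")      # 9,437,184 -- linear in T
print(f"2T^2d = {2 * T**2 * d:,}")   # 9,663,676,416 -- quadratic in T
```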

Calculations of Mp, Mg, and Ms Matrices: The additional terms you're inquiring about originate from the Mp, Mg, and Ms matrices. For Mg, its computation is based on 4Td^2 (with d=512, T=3072), resulting in 3,221,225,472. This number, when multiplied by R=0.15 and Eac=0.9, yields a value of 434,865,438.72. For Mp and Ms, we initially considered their calculation to be Td^2, since their matrix size is four times smaller. This was our calculation at the time, and while we strove for accuracy, we acknowledge there might be areas that lack rigor. If you find any discrepancies or have concerns, please feel free to point them out!
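
For reference, those numbers can be reproduced directly (the pJ unit for Eac is an assumption here; the paper gives the exact units):

```python
# Reproducing the Mg numbers quoted above.
T, d = 3072, 512
R, E_AC = 0.15, 0.9  # firing rate and energy per AC (0.9 pJ assumed)

macs_mg = 4 * T * d**2        # 3,221,225,472
energy = macs_mg * R * E_AC   # 434,865,438.72 (in pJ under that assumption)
print(f"{macs_mg:,} ACs -> {energy:,.2f}")
```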

Dieguli commented on August 19, 2024

Hi @ridgerchu, I was wondering if you knew where I could find the wkv implementation class in PyTorch, rather than the current CUDA one. Thanks!

ridgerchu commented on August 19, 2024

Hi, you can find the PyTorch-style RWKV code here: link

Dieguli commented on August 19, 2024

@ridgerchu really appreciate your support. Thanks!
