Hi. Thanks for sharing this work. I have a question regarding the wr

Question: Global write tokens or recurrent about recurrent-memory-transformer-pytorch HOT 7 CLOSED

lucidrains commented on June 8, 2024

from recurrent-memory-transformer-pytorch.

Comments (7)

IcarusWizard commented on June 8, 2024

My thought is that, with the recurrent structure, the skip connections in the transformer can implicitly keep the next memory close to the previous memory.
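
To make the residual argument concrete, here is a minimal PyTorch sketch. The `ResidualBlock` module, its sizes, and the random memory tensor are hypothetical stand-ins, not code from this repository. It only illustrates that a pre-norm block of the form `x + f(x)` tends to return something close to its input at default initialization.

```python
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)

class ResidualBlock(nn.Module):
    """Hypothetical pre-norm feedforward block with a skip connection."""

    def __init__(self, dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.ff = nn.Sequential(
            nn.Linear(dim, dim * 4),
            nn.GELU(),
            nn.Linear(dim * 4, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Skip connection: the input is carried through unchanged
        # and the block only adds an update on top of it.
        return x + self.ff(self.norm(x))

dim = 64
block = ResidualBlock(dim)

# Stand-in for the memory tokens handed over from the previous segment.
prev_memory = torch.randn(1, 4, dim)

with torch.no_grad():
    next_memory = block(prev_memory)

# Size of the update relative to the size of the incoming memory.
rel_change = ((next_memory - prev_memory).norm() / prev_memory.norm()).item()

# Direction of the new memory compared with the old one.
cos_sim = F.cosine_similarity(
    next_memory.flatten(), prev_memory.flatten(), dim=0
).item()

print(f"relative change: {rel_change:.3f}, cosine similarity: {cos_sim:.3f}")
```

Training can of course move the block away from this regime, so this is an intuition for the starting point rather than a guarantee.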

IcarusWizard commented on June 8, 2024

LGTM. There is one small detail: I'm not sure whether you need a stop gradient. In Fig. 2 of the RMT paper, the gradient arrow points to the write, but in Fig. 4 of the RMDT paper, the gradient arrow points to the read. I think that won't be a big issue as long as the training is stable.
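
For readers unfamiliar with the term, a stop gradient in PyTorch is usually expressed with `Tensor.detach()`. The sketch below is a generic illustration under stated assumptions: `step` is a hypothetical stand-in for one segment's forward pass and `init_memory` for a learned initial memory. It does not reproduce either paper's figure or this repository's code.

```python
import torch
from torch import nn

torch.manual_seed(0)

dim = 8

# Hypothetical stand-in for one segment's pass of a recurrent memory model.
step = nn.Linear(dim, dim)

# Hypothetical learned initial memory.
init_memory = nn.Parameter(torch.randn(1, dim))

def run_two_segments(detach_memory: bool):
    """Run two segments and return the gradient that reaches init_memory."""
    step.zero_grad()
    init_memory.grad = None

    # Segment 1 produces (writes) a memory.
    memory = step(init_memory)

    if detach_memory:
        # Stop gradient: segment 2 sees the values but not the graph.
        memory = memory.detach()

    # Segment 2 consumes (reads) the memory.
    out = step(memory)
    out.sum().backward()

    return init_memory.grad

grad_with_stop = run_two_segments(detach_memory=True)
grad_without_stop = run_two_segments(detach_memory=False)

print("with stop gradient:   ", grad_with_stop)
print("without stop gradient:", grad_without_stop)
```

With the detach in place, the loss on the second segment no longer updates anything upstream of the memory; without it, gradients flow back through both segments.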

lucidrains commented on June 8, 2024

@IcarusWizard ah that's interesting

would be nice if they ablated that

my take is, should make little difference, as in the first layer, the write memories have access to all the read memories

lucidrains commented on June 8, 2024

@IcarusWizard but i can get that change in, to be faithful to the paper

lucidrains commented on June 8, 2024

@IcarusWizard yup

do you want to see if the latest commit lines up with the paper better?

lucidrains commented on June 8, 2024

@IcarusWizard i'm pretty sure it won't do anything. the way i had it before is akin to an attention pooling step on the first layer, and we know that works well from some other papers
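
A rough sketch of what an attention pooling step looks like, for context. All names, shapes, and the single-head formulation are assumptions made for illustration, not the repository's implementation: a set of write queries attends over the read memories concatenated with the segment tokens and pools them into new memory slots.

```python
import torch

torch.manual_seed(0)

batch, num_mem, seq_len, dim = 2, 4, 16, 32

# Hypothetical inputs: memories read from the previous segment,
# the current segment's token embeddings, and write queries
# (random here, learned in a real model).
read_memories = torch.randn(batch, num_mem, dim)
tokens = torch.randn(batch, seq_len, dim)
write_queries = torch.randn(batch, num_mem, dim)

# The write queries can see every read memory and every token.
context = torch.cat((read_memories, tokens), dim=1)

# Single-head scaled dot-product attention, written out by hand.
scores = write_queries @ context.transpose(-1, -2) / dim ** 0.5
attn = scores.softmax(dim=-1)

# Each write slot becomes a weighted average of the context.
pooled = attn @ context

print(pooled.shape)
```

A real layer would also apply learned query, key, and value projections and multiple heads; they are omitted here to keep the pooling step visible.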

lucidrains commented on June 8, 2024

@IcarusWizard ok, i've made it a hyperparameter

thanks for reporting this!
