
Comments (5)

Jamie-Stirling commented on August 23, 2024

Hi all, thanks very much for raising this and identifying the issue. I'll update the comments when I get time.

from retnet.

DinoMan commented on August 23, 2024

@Qiu30 I just had a closer look at the code (and tests.py). Note that s_n_1s is a list for MultiScaleRetention: the shape in the comment, (batch_size, heads, head_size, head_size), should be read as a list of heads tensors, each with shape (batch_size, head_size, head_size). For RetNet, the state is a list of lists, with each innermost element again having shape (batch_size, head_size, head_size). To summarize:

Retention --> Sn-1: tensor with shape (batch x head_dim x head_dim)
MultiScaleRetention --> Sn-1s: list with num_head elements (tensors), each with shape (batch x head_dim x head_dim)
RetNet --> Sn-1s: list with num_layers elements. Each element is a list with num_head elements (tensors), each with shape (batch x head_dim x head_dim)
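
To make the nesting concrete, here is a minimal, dependency-free sketch of the three state layouts summarized above. It is not the repository's code: nested Python lists stand in for tensors, and the names zeros, shape_of, num_heads and the sizes chosen are assumptions made for illustration only.

```python
def zeros(*shape):
    """Build nested lists of 0.0 with the given shape (a stand-in for a tensor)."""
    if len(shape) == 1:
        return [0.0] * shape[0]
    return [zeros(*shape[1:]) for _ in range(shape[0])]


def shape_of(t):
    """Return the shape of a rectangular nested list as a tuple."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0]
    return tuple(dims)


# Illustrative sizes only (hypothetical values, not taken from the repository).
batch_size, num_heads, head_dim, num_layers = 2, 4, 8, 3

# Retention: a single (batch x head_dim x head_dim) state.
s_retention = zeros(batch_size, head_dim, head_dim)

# MultiScaleRetention: a list with one such state per head.
s_msr = [zeros(batch_size, head_dim, head_dim) for _ in range(num_heads)]

# RetNet: a list with one per-head list for each layer.
s_retnet = [
    [zeros(batch_size, head_dim, head_dim) for _ in range(num_heads)]
    for _ in range(num_layers)
]

print(shape_of(s_retention))              # shape of one state
print(len(s_msr), shape_of(s_msr[0]))     # heads, then shape of one head's state
print(len(s_retnet), len(s_retnet[0]))    # layers, then heads per layer
```

With real tensors the same structure would be indexed as state[layer][head], giving one (batch x head_dim x head_dim) tensor.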

I hope this helps.


Qiu30 commented on August 23, 2024

@DinoMan @Jamie-Stirling Thank you for your replies. That matches my understanding, but I have one more question: what is the initial value of s_n_1? I searched the paper and could not find it.


Jamie-Stirling commented on August 23, 2024

Hi, in the code I initialize this to zeros; this detail is not mentioned in the paper. I'm not sure how the choice of initial value affects training, but setting it to zeros ensures that only the keys and values computed from the first token affect the state at t=1, akin to a transformer.

I would say that setting a nonzero constant or trainable value for the initial state is analogous to introducing a bias term, and so may affect the way the RetNet trains. I'm not an expert, though, so it may be best to ask the authors of the original paper to make sure.
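
As a rough illustration of the point about the initial state, the sketch below applies one step of the recurrent update from the RetNet paper, S_n = gamma * S_{n-1} + K_n^T V_n, for a single head and a single batch element. It is a toy written with plain Python lists rather than the repository's implementation, and the helper names outer and step, as well as the numbers used, are assumptions for illustration.

```python
def outer(k, v):
    """Outer product k^T v of two vectors, as a nested list."""
    return [[ki * vj for vj in v] for ki in k]


def step(s_prev, k, v, gamma):
    """One recurrent state update: gamma * s_prev + k^T v."""
    kv = outer(k, v)
    return [
        [gamma * s_prev[i][j] + kv[i][j] for j in range(len(v))]
        for i in range(len(k))
    ]


head_dim = 2
gamma = 0.5                        # illustrative decay value
k1, v1 = [1.0, 2.0], [3.0, 4.0]    # key and value of the first token (made up)

# Zero initial state: the state at t=1 depends only on the first token.
s0_zero = [[0.0] * head_dim for _ in range(head_dim)]
s1_from_zero = step(s0_zero, k1, v1, gamma)

# Nonzero constant initial state: every entry gains an offset of gamma * 1.0,
# independent of the input, much like a bias term.
s0_const = [[1.0] * head_dim for _ in range(head_dim)]
s1_from_const = step(s0_const, k1, v1, gamma)

print(s1_from_zero == outer(k1, v1))  # True
print(s1_from_const)
```

The second case shows why a nonzero or trainable initial state acts like a bias: the extra term does not depend on any token, and it decays by a factor of gamma at each subsequent step.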


Qiu30 commented on August 23, 2024

@Jamie-Stirling I understand, thanks for your reply!
