
Comments (7)

takgto commented on June 29, 2024

Thank you for your continued support.
According to the JetMoE technical page ( https://research.myshell.ai/jetmoe ), JetMoE has two MoE layers, a Mixture of Attention heads (MoA) and a Mixture of MLP Experts (MoE), similar to ModuleFormer ( https://arxiv.org/abs/2306.04640 ). So the LlamaMoE model might not fit JetMoE.
Separately, I have asked the JetMoE authors for parameter-mapping information ( myshell-ai/JetMoE#11 ), but unfortunately I haven't received a reply yet.
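To make the structural difference concrete, here is a minimal sketch of the shared mechanism, top-k routing over a set of experts, which JetMoE applies to both the attention and the MLP sub-layers. All names and sizes here are illustrative assumptions, not JetMoE's actual implementation:

import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    # Route each token to its top-k experts and mix their outputs.
    def __init__(self, dim, hidden, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(dim, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (tokens, dim)
        weights, idx = self.router(x).topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

In a Mixtral-style model only the MLP is replaced by such a layer; JetMoE (per the page above) routes attention the same way, which is why a LlamaMoE-shaped mapping probably won't cover it.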


rasbt commented on June 29, 2024

Hi there,
thanks for the suggestion! New models are always welcome. JetMoE is currently not on the priority list because of the many other requests and features to be added, but if you want to contribute it, that would be very welcome!


rasbt commented on June 29, 2024

I added a doc describing how to add a new model to LitGPT in case this comes in handy: https://github.com/Lightning-AI/litgpt/blob/main/tutorials/developer-docs/adding-models.md


takgto commented on June 29, 2024

> I added a doc describing how to add a new model to LitGPT in case this comes in handy: https://github.com/Lightning-AI/litgpt/blob/main/tutorials/developer-docs/adding-models.md

Thanks so much for the information; it's really valuable.
Currently, I'm having difficulty updating the checkpoint conversion script (convert_hf_checkpoint.py) for the new model (jetmoe/jetmoe-8b). I think it needs another weight_map in the script, but I can't work out the LitGPT-side names for several of the new model's keys, marked below:
weight_map = {
    "model.embed_tokens.weight": "transformer.wte.weight",
    "model.layers.{}.mlp.output_linear.weight": ?,  # "?" marks an unknown key
    "model.layers.{}.mlp.router.layer.weight": ?,
    "model.layers.{}.input_layernorm.weight": "transformer.h.{}.norm_1.weight",
    "model.layers.{}.mlp.bias": ?,
    "model.layers.{}.mlp.input_linear.weight": ?,
    "model.layers.{}.post_attention_layernorm.weight": "transformer.h.{}.norm_2.weight",
    "model.layers.{}.self_attention.experts.bias": ?,
    "model.layers.{}.self_attention.experts.input_linear.weight": ?,
    "model.layers.{}.self_attention.experts.output_linear.weight": ?,
    "model.layers.{}.self_attention.experts.router.layer.weight": "transformer.h.{}.attn.experts.out_proj.weight",
    "model.layers.{}.self_attention.kv_proj.weight": ?,
    "model.norm.weight": "transformer.ln_f.weight",
    "model.layers.{}.self_attention.q_proj.weight": "transformer.h.{}.attn.q_proj.weight",
    "model.layers.{}.self_attention.k_proj.weight": "transformer.h.{}.attn.k_proj.weight",
    "model.layers.{}.self_attention.v_proj.weight": "transformer.h.{}.attn.v_proj.weight",
}
Do you know of any tools or documentation that would help identify those unknown keys?
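For reference, here is roughly how the HF-side parameter names can be listed without materializing the 8B weights (assuming transformers and accelerate are installed; loading JetMoE required trust_remote_code=True at the time of writing):

from accelerate import init_empty_weights
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("jetmoe/jetmoe-8b", trust_remote_code=True)
with init_empty_weights():  # parameters live on the meta device, so no weights are downloaded
    model = AutoModelForCausalLM.from_config(config, trust_remote_code=True)

for name, param in model.state_dict().items():
    print(name, tuple(param.shape))

Printing the same listing for LitGPT's GPT(config).state_dict() and matching entries by tensor shape is often the quickest way to pair up the remaining keys.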


rasbt commented on June 29, 2024

That's a good question, and it's usually the tricky part. It can be pretty hard to find the corresponding layer, sometimes because of naming conventions and sometimes because the layer may not be supported yet. I think in this case the LlamaMoE might be a good template to look at:

if config.mlp_class_name == "LLaMAMoE":
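For context, the conversion script resolves these maps by template matching: each "{}" in a weight_map key stands in for the layer (or expert) index. A simplified sketch of that mechanism, just to show the idea (the actual helper in convert_hf_checkpoint.py differs in its details):

import re

def map_key(hf_key, weight_map):
    # Translate one HF parameter name into its LitGPT counterpart,
    # treating every "{}" in the template as an integer index.
    for template, target in weight_map.items():
        pattern = re.escape(template).replace(r"\{\}", r"(\d+)")
        match = re.fullmatch(pattern, hf_key)
        if match:
            return target.format(*match.groups())
    return None

# map_key("model.layers.3.input_layernorm.weight", weight_map)
# -> "transformer.h.3.norm_1.weight"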


rasbt commented on June 29, 2024

I haven't read the JetMoE paper; do they also have different attention experts? In that case, this would not be supported yet. The LlamaMoE is only for the MLP layers, as in Mixtral.


rasbt commented on June 29, 2024

Oh I see, the Mixture of Attention heads (MoA) part will be a bit tricky then; that's currently not supported by LitGPT and would have to be written from scratch. That could make a contribution like this fairly involved.
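For anyone who wants to attempt it, here is a rough sketch of what the MoA part might look like, inferred only from the checkpoint key names above (a shared kv_proj, per-expert input/output projections, and a per-token router). Single-head, no causal masking, and purely illustrative, not JetMoE's actual code:

import torch
import torch.nn as nn
import torch.nn.functional as F

class MoASketch(nn.Module):
    def __init__(self, dim, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.kv_proj = nn.Linear(dim, 2 * dim, bias=False)  # shared across experts
        self.router = nn.Linear(dim, n_experts, bias=False)
        # each expert owns its query ("input") and output projections
        self.input_linear = nn.ModuleList(nn.Linear(dim, dim, bias=False) for _ in range(n_experts))
        self.output_linear = nn.ModuleList(nn.Linear(dim, dim, bias=False) for _ in range(n_experts))

    def forward(self, x):  # x: (seq, dim)
        k, v = self.kv_proj(x).chunk(2, dim=-1)  # keys/values computed once for all experts
        weights, idx = self.router(x).topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for e in range(len(self.input_linear)):
            for slot in range(self.top_k):
                mask = idx[:, slot] == e
                if not mask.any():
                    continue
                q = self.input_linear[e](x[mask])  # queries only for tokens routed to expert e
                attn = F.softmax(q @ k.t() / k.shape[-1] ** 0.5, dim=-1)
                out[mask] += weights[mask, slot, None] * self.output_linear[e](attn @ v)
        return out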
