Comments (7)
Thank you for your continued support.
According to the JetMoE technical website ( https://research.myshell.ai/jetmoe ), JetMoE has two MoE layers: a Mixture of Attention heads (MoA) and a Mixture of MLP experts (MoE), which looks like ModuleFormer ( https://arxiv.org/abs/2306.04640 ). So the LlamaMoE model might not fit JetMoE.
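To illustrate the difference, here is a rough, hypothetical sketch based only on the description above, not on JetMoE's actual code: in a Mixtral/LlamaMoE-style block only the MLP is a mixture of experts, while a JetMoE/ModuleFormer-style block routes over attention experts as well. All class and parameter names below are made up for illustration.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TopKRouter(nn.Module):
        # Scores all experts per token and keeps only the top-k.
        def __init__(self, dim: int, n_experts: int, k: int):
            super().__init__()
            self.layer = nn.Linear(dim, n_experts, bias=False)
            self.k = k

        def forward(self, x: torch.Tensor):  # x: (n_tokens, dim)
            logits = self.layer(x)
            weights, idx = torch.topk(logits, self.k, dim=-1)
            return F.softmax(weights, dim=-1), idx  # both (n_tokens, k)

    class MoEMLP(nn.Module):
        # Mixture of MLP experts, roughly what LlamaMoE / Mixtral covers.
        def __init__(self, dim: int, hidden: int, n_experts: int, k: int):
            super().__init__()
            self.router = TopKRouter(dim, n_experts, k)
            self.experts = nn.ModuleList(
                [nn.Sequential(nn.Linear(dim, hidden), nn.SiLU(), nn.Linear(hidden, dim))
                 for _ in range(n_experts)]
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            weights, idx = self.router(x)
            out = torch.zeros_like(x)
            for slot in range(idx.shape[-1]):        # each of the k routing slots
                for e, expert in enumerate(self.experts):
                    mask = idx[:, slot] == e          # tokens routed to expert e
                    if mask.any():
                        out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
            return out

    # JetMoE/ModuleFormer additionally replace standard multi-head attention with a
    # Mixture of Attention heads (MoA): the same kind of router selects attention
    # "experts" per token, which a LlamaMoE-style block does not model.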
Separately, I have asked the JetMoE maintainers to provide parameter mapping information ( myshell-ai/JetMoE#11 ). Unfortunately, I haven't received a reply yet.
Hi there,
thanks for the suggestion! New models are always welcome. JetMoE is currently not on the priority list due to the many other requests and features to be added, but if you want to contribute it, that'd be welcome!
I added a doc describing how to add a new model to LitGPT in case this comes in handy: https://github.com/Lightning-AI/litgpt/blob/main/tutorials/developer-docs/adding-models.md
Thanks so much for the information. It is really valuable to me.
Currently, I am having difficulty updating the checkpoint conversion script (convert_hf_checkpoint.py) for the new model (jetmoe/jetmoe-8b). I think it needs another weight_map in the script. However, I can't figure out some keys for the new model, as shown below.
weight_map = {
    "model.embed_tokens.weight": "transformer.wte.weight",
    "model.layers.{}.mlp.output_linear.weight": ?,  # "?" means the key is unknown
    "model.layers.{}.mlp.router.layer.weight": ?,
    "model.layers.{}.input_layernorm.weight": "transformer.h.{}.norm_1.weight",
    "model.layers.{}.mlp.bias": ?,
    "model.layers.{}.mlp.input_linear.weight": ?,
    "model.layers.{}.post_attention_layernorm.weight": "transformer.h.{}.norm_2.weight",
    "model.layers.{}.self_attention.experts.bias": ?,
    "model.layers.{}.self_attention.experts.input_linear.weight": ?,
    "model.layers.{}.self_attention.experts.output_linear.weight": ?,
    "model.layers.{}.self_attention.experts.router.layer.weight": "transformer.h.{}.attn.experts.out_proj.weight",
    "model.layers.{}.self_attention.kv_proj.weight": ?,
    "model.norm.weight": "transformer.ln_f.weight",
    "model.layers.{}.self_attention.q_proj.weight": "transformer.h.{}.attn.q_proj.weight",
    "model.layers.{}.self_attention.k_proj.weight": "transformer.h.{}.attn.k_proj.weight",
    "model.layers.{}.self_attention.v_proj.weight": "transformer.h.{}.attn.v_proj.weight",
}
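For reference, the HF-side names on the left can be enumerated from the checkpoint's index file. A minimal sketch, assuming jetmoe/jetmoe-8b ships sharded safetensors with the standard model.safetensors.index.json:

    import json
    from huggingface_hub import hf_hub_download

    # The index file's "weight_map" lists every parameter name and the shard it lives in.
    index_path = hf_hub_download("jetmoe/jetmoe-8b", "model.safetensors.index.json")
    with open(index_path) as f:
        index = json.load(f)

    for name in sorted(index["weight_map"]):
        print(name)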
Do you know of any tools or documentation to find out those unknown keys?
That's a good question, and usually the tricky part. It can be pretty hard to find the corresponding layer, sometimes due to naming conventions and sometimes because it may not be supported yet. I think in this case the LlamaMoE might be a good template to look at.
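One way to see which transformer.h.{}... target names exist on the LitGPT side is to instantiate an existing MoE config and print its state dict. A rough sketch; the config name and the meta-device trick are assumptions, and any supported MoE model would do as a reference:

    import torch
    from litgpt.config import Config
    from litgpt.model import GPT

    # "Mixtral-8x7B-v0.1" is assumed to be a registered LitGPT config name.
    config = Config.from_name("Mixtral-8x7B-v0.1")

    # Instantiating on the meta device allocates no real weights, so it is cheap.
    with torch.device("meta"):
        model = GPT(config)

    for name in model.state_dict():
        print(name)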
I haven't read the JetMoE paper; do they also have different attention experts? In that case, this would not be supported yet. The LlamaMoE is only for the MLP layers, as in Mixtral.
Oh I see, the Mixture of Attention heads (MoA) part will be tricky then; that's currently not supported by LitGPT and would have to be coded, so it might be a challenging contribution.