chatterbox's Introduction

Hi there 👋

I'm Enze, a research engineer in natural language processing.

My research interests are natural language processing, recommender systems, computational advertising, and information retrieval; my broader interests include contrastive learning, multi-modal pre-training, transfer learning, and their applications in advertising, search, and recommendation.

Coding is probably my favorite thing. If there is anything else you would like to know, please drop me an email or leave a comment on my website.

chatterbox's People

Contributors

enze5088


chatterbox's Issues

Some questions about the MLP layer

import torch.nn as nn
from transformers.activations import ACT2FN

class LlamaMLP(nn.Module):
    def __init__(
        self,
        hidden_size: int,
        intermediate_size: int,
        hidden_act: str,
    ):
        super().__init__()
        # Three projections instead of the usual two: gate_proj and up_proj
        # expand to intermediate_size; down_proj maps back to hidden_size.
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.act_fn = ACT2FN[hidden_act]  # "silu" for LLaMA

    def forward(self, x):
        # Gated feed-forward: the activated gate branch scales the up branch
        # elementwise before the down projection.
        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
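A quick shape check using the class above (the sizes here are LLaMA-7B's published hidden_size/intermediate_size, used only as an illustration; they are not necessarily this repo's config):

import torch

mlp = LlamaMLP(hidden_size=4096, intermediate_size=11008, hidden_act="silu")
x = torch.randn(2, 16, 4096)  # (batch, seq_len, hidden_size)
print(mlp(x).shape)           # torch.Size([2, 16, 4096])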

The code above is LLaMA's MLP layer. Does anyone know why it first projects with self.gate_proj, applies the activation, takes the elementwise product with self.up_proj(x), and only then maps back with self.down_proj? In a standard Transformer FFN there is no operation like self.gate_proj(x) * self.up_proj(x).
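For context, this gating is the SwiGLU feed-forward variant (Shazeer, "GLU Variants Improve Transformer", 2020), which the LLaMA paper adopts in place of the plain two-matrix FFN; the activated gate branch modulates each intermediate channel multiplicatively. A minimal sketch of the difference, with illustrative sizes:

import torch
import torch.nn as nn
import torch.nn.functional as F

hidden, inter = 8, 32  # illustrative, not the repo's config

up = nn.Linear(hidden, inter, bias=False)
down = nn.Linear(inter, hidden, bias=False)
gate = nn.Linear(hidden, inter, bias=False)

def ffn(x):
    # Standard Transformer FFN: down(act(up(x)))
    return down(F.relu(up(x)))

def swiglu_ffn(x):
    # LLaMA-style gated FFN: down(silu(gate(x)) * up(x))
    return down(F.silu(gate(x)) * up(x))

x = torch.randn(2, hidden)
print(ffn(x).shape, swiglu_ffn(x).shape)  # both torch.Size([2, 8])

To keep the parameter count comparable to a 4*d FFN despite the third matrix, the LLaMA paper shrinks the intermediate dimension to roughly 2/3 * 4 * hidden_size.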

block_size and max_position_embeddings

In src/train/model_file/LLaMA-zh-base/config.json, max_position_embeddings is 2048, which means the maximum input length is 2048. But in src/train/preprocess.py, block_size defaults to 512. What is the reason for this?
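I haven't verified this repo's preprocessing in detail, but a common pattern (e.g. the group_texts step in HuggingFace's run_clm.py) is to concatenate tokenized documents and cut them into fixed block_size chunks: max_position_embeddings only caps the longest sequence the model can embed, while training on shorter 512-token blocks reduces memory and compute. A sketch of that grouping step (names and sizes are illustrative):

from itertools import chain

def group_texts(examples, block_size=512):
    # Concatenate all tokenized sequences, then split into block_size chunks,
    # dropping the ragged remainder (mirrors HuggingFace's run_clm.py).
    concatenated = {k: list(chain(*examples[k])) for k in examples}
    total_len = (len(concatenated["input_ids"]) // block_size) * block_size
    result = {
        k: [t[i:i + block_size] for i in range(0, total_len, block_size)]
        for k, t in concatenated.items()
    }
    result["labels"] = result["input_ids"].copy()  # causal-LM targets
    return result

batch = {"input_ids": [[1, 2, 3], [4, 5, 6, 7, 8]]}
print(group_texts(batch, block_size=4))
# {'input_ids': [[1, 2, 3, 4], [5, 6, 7, 8]], 'labels': [[1, 2, 3, 4], [5, 6, 7, 8]]}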
