bert4pytorch's People

Contributors

muqiujun-ai

bert4pytorch's Issues

KeyError: 'bert.embeddings.LayerNorm.gamma'

Does anyone know how to resolve this error?
File "F:\software\Anaconda\envs\tensor\lib\site-packages\bert4pytorch\modeling.py", line 71, in load_weights_from_pytorch_checkpoint
state_dict[new_key] = state_dict.pop(old_key)
KeyError: 'bert.embeddings.LayerNorm.gamma'
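
One possible workaround, sketched under assumptions: the traceback shows an unconditional `state_dict.pop(old_key)`, which fails when the checkpoint does not contain the expected key. A plausible (unconfirmed) cause is that the checkpoint stores the LayerNorm parameters under `weight`/`bias` rather than `gamma`/`beta`. The helper `rename_keys_if_present` and the key mapping below are hypothetical illustrations, not part of bert4pytorch; they only show how a rename can skip keys that are absent.

```python
def rename_keys_if_present(state_dict, key_mapping):
    """Rename keys in place, skipping any old key the checkpoint lacks.

    Returns the list of old keys that were not found, so the caller can
    decide whether their absence is actually a problem.
    """
    missing = []
    for old_key, new_key in key_mapping.items():
        if old_key in state_dict:
            state_dict[new_key] = state_dict.pop(old_key)
        else:
            missing.append(old_key)
    return missing


# Toy checkpoint that already uses the weight/bias naming (assumed scenario).
checkpoint = {
    "bert.embeddings.LayerNorm.weight": [1.0],
    "bert.embeddings.LayerNorm.bias": [0.0],
}
# Hypothetical mapping from the old names to the new ones.
mapping = {
    "bert.embeddings.LayerNorm.gamma": "bert.embeddings.LayerNorm.weight",
    "bert.embeddings.LayerNorm.beta": "bert.embeddings.LayerNorm.bias",
}
missing = rename_keys_if_present(checkpoint, mapping)
```

With this guard the rename no longer raises `KeyError`; whether the remaining keys then match what the model expects still has to be checked against the actual checkpoint.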

basic_masked_language_model.py

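In this example:
Input: [CLS]科学[MASK][MASK]是第一生产力[SEP]
The predicted result is ",," (two commas) rather than 技术 ("technology").
The model used is bert-base-chinese from the Hugging Face model hub.
A large number of warnings appear while the model is loading:
[image]

What is the problem here?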

Small bug in the LayerNorm class

if conditional:
    self.dense1 = nn.Linear(2 * hidden_size, hidden_size, bias=False)
    self.dense.weight.data.uniform_(0, 0)  # <-- this should be self.dense1; the same applies to self.dense2 below
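
A minimal sketch of the corrected initialisation, assuming PyTorch is installed. The class name `ConditionalLayerNormSketch` is hypothetical, and the shape of `dense2` is assumed to mirror `dense1` because the report only says the same fix applies to it. `nn.init.zeros_` is used here in place of `uniform_(0, 0)`; both leave the weights at zero.

```python
from torch import nn


class ConditionalLayerNormSketch(nn.Module):
    """Hypothetical stand-in showing only the constructor lines under discussion."""

    def __init__(self, hidden_size, conditional=False):
        super().__init__()
        self.conditional = conditional
        if conditional:
            self.dense1 = nn.Linear(2 * hidden_size, hidden_size, bias=False)
            # Zero-initialise the layer that was just created (dense1, not dense).
            nn.init.zeros_(self.dense1.weight)
            # Assumption: dense2 has the same shape and initialisation as dense1.
            self.dense2 = nn.Linear(2 * hidden_size, hidden_size, bias=False)
            nn.init.zeros_(self.dense2.weight)


layer = ConditionalLayerNormSketch(hidden_size=8, conditional=True)
```

In the reported snippet, `self.dense` is never assigned, so the original line would presumably raise an `AttributeError` as soon as `conditional=True` is used.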

Transformer Quality in Linear Time: gated attention unit and FLASH code

Transformer Quality in Linear Time

Weizhe Hua, Zihang Dai, Hanxiao Liu, Quoc V. Le
We revisit the design choices in Transformers, and propose methods to address their weaknesses in handling long sequences. First, we propose a simple layer named gated attention unit, which allows the use of a weaker single-head attention with minimal quality loss. We then propose a linear approximation method complementary to this new layer, which is accelerator-friendly and highly competitive in quality. The resulting model, named FLASH, matches the perplexity of improved Transformers over both short (512) and long (8K) context lengths, achieving training speedups of up to 4.9× on Wiki-40B and 12.1× on PG-19 for auto-regressive language modeling, and 4.8× on C4 for masked language modeling.
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE)
Cite as: arXiv:2202.10447 [cs.LG]
(or arXiv:2202.10447v1 [cs.LG] for this version)

https://doi.org/10.48550/arXiv.2202.10447

https://arxiv.org/abs/2202.10447

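Question about LabelSmoothingCrossEntropy

The first part of the total loss returned by LabelSmoothingCrossEntropy is loss*self.eps/c, where c is the number of classes. I have seen some formulas that divide by the number of classes minus one here instead.
Could someone clarify whether one should be subtracted or not?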
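
The two denominators correspond to two different smoothing conventions, which the following pure-Python sketch compares. It assumes that `loss` in the expression above is the negative log-probability summed over all classes; all function names are hypothetical. Dividing by c spreads the mass eps uniformly over all c classes (the true class included), while dividing by c - 1 gives the true class exactly 1 - eps and spreads eps over the other classes only. Both target distributions sum to 1.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def targets_uniform_over_all(num_classes, target, eps):
    """(1 - eps) on the true class plus eps / c on every class."""
    return [(1.0 - eps) * (1.0 if k == target else 0.0) + eps / num_classes
            for k in range(num_classes)]


def targets_uniform_over_others(num_classes, target, eps):
    """1 - eps on the true class, eps / (c - 1) on each other class."""
    return [1.0 - eps if k == target else eps / (num_classes - 1)
            for k in range(num_classes)]


def cross_entropy(q, log_p):
    """Cross-entropy between a target distribution q and log-probabilities."""
    return -sum(qk * lpk for qk, lpk in zip(q, log_p))


def loss_eps_over_c(logits, target, eps):
    """The form from the question: sum_loss * eps / c + (1 - eps) * nll."""
    log_p = log_softmax(logits)
    c = len(logits)
    sum_loss = -sum(log_p)      # negative log-probabilities over all classes
    nll = -log_p[target]        # ordinary negative log-likelihood
    return sum_loss * eps / c + (1.0 - eps) * nll


logits, target, eps = [2.0, 0.5, -1.0, 0.0], 0, 0.1
log_p = log_softmax(logits)
q_all = targets_uniform_over_all(len(logits), target, eps)
q_others = targets_uniform_over_others(len(logits), target, eps)
```

Under these assumptions, `loss_eps_over_c(logits, target, eps)` equals `cross_entropy(q_all, log_p)`, so the eps/c form is internally consistent. The c - 1 form is a different but also valid convention, and the two differ only in how much mass ends up on the true class.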
