muqiujun-ai / bert4pytorch

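An ultra-lightweight PyTorch implementation of BERT, with extensive Chinese comments and an easy-to-modify structure; continuously updated.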

Python 100.00%

bert nlp pytorch transformer

bert4pytorch's Issues

KeyError: 'bert.embeddings.LayerNorm.gamma'

Does anyone know how to solve this problem?
File "F:\software\Anaconda\envs\tensor\lib\site-packages\bert4pytorch\modeling.py", line 71, in load_weights_from_pytorch_checkpoint
state_dict[new_key] = state_dict.pop(old_key)
KeyError: 'bert.embeddings.LayerNorm.gamma'

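Bug when loading Hugging Face's bert-base-chinese

The mapping in the variable mapping function of the modeling file has a bug.
The Hugging Face model file names the LayerNorm parameters gamma and beta, while the author's mapping uses weight and bias, so the keys do not match.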
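One possible workaround is to normalize the LayerNorm key names before the mapping runs, so that a checkpoint using either naming scheme resolves to the same keys. The following is a minimal sketch, not the library's own fix: the function name normalize_layernorm_keys is hypothetical, and it assumes the checkpoint has already been loaded into a plain dict-like state_dict.

```python
from collections import OrderedDict


def normalize_layernorm_keys(state_dict):
    """Return a copy of state_dict with LayerNorm.gamma/beta renamed to weight/bias.

    Hypothetical helper: keys that already use weight/bias are left untouched,
    so the function is safe to call on either naming scheme.
    """
    renamed = OrderedDict()
    for key, value in state_dict.items():
        if key.endswith("LayerNorm.gamma"):
            key = key[: -len("gamma")] + "weight"
        elif key.endswith("LayerNorm.beta"):
            key = key[: -len("beta")] + "bias"
        renamed[key] = value
    return renamed


# Toy example with placeholder values standing in for tensors.
old = {
    "bert.embeddings.LayerNorm.gamma": "g",
    "bert.embeddings.LayerNorm.beta": "b",
    "bert.embeddings.word_embeddings.weight": "w",
}
new = normalize_layernorm_keys(old)
print(list(new.keys()))
```

Calling this on the loaded state dict before the key mapping should avoid the KeyError above, provided the mapping itself expects the weight/bias names.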

basic_masked_language_model.py

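In this example,
Input: [CLS]科学[MASK][MASK]是第一生产力[SEP]
The prediction is ",," (two commas) rather than "技术".
The model used is bert-base-chinese from the Hugging Face model hub.
A large number of warnings appear while the model is loading:

What could be the problem?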

Where can the files under /chinese_L-12_H-768_A-12/ be downloaded?

Batch data loading

The tokenizer in the current version does not seem to support batch data loading. Is that right?
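If the tokenizer only encodes one sentence at a time, batching can be done by hand after encoding. The sketch below is an illustration rather than part of bert4pytorch: pad_batch is a hypothetical helper, and pad_id=0 is an assumption about the vocabulary that should be checked against the vocab file in use.

```python
def pad_batch(sequences, pad_id=0, max_len=None):
    """Pad variable-length token id lists to a common length.

    Returns (input_ids, attention_mask) as nested lists; the mask is 1 for
    real tokens and 0 for padding. pad_id=0 is an assumed [PAD] id.
    """
    if max_len is None:
        max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        seq = list(seq)[:max_len]  # truncate sequences that are too long
        pad = max_len - len(seq)
        input_ids.append(seq + [pad_id] * pad)
        attention_mask.append([1] * len(seq) + [0] * pad)
    return input_ids, attention_mask


# Token ids here are made up for illustration.
ids, mask = pad_batch([[101, 7, 8, 102], [101, 9, 102]])
print(ids)
print(mask)
```

The two nested lists can then be converted to tensors and passed to the model together.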

The returned embedding is not fully consistent with Hugging Face's output

For example, with bert-base-chinese. Has the author done any evaluation or testing on this?

Missing file

Cannot find 'config.json'

A small bug in the LayerNorm class

if conditional:
    self.dense1 = nn.Linear(2 * hidden_size, hidden_size, bias=False)
    self.dense.weight.data.uniform_(0, 0) -------> this should be self.dense1; the same applies to self.dense2 below

RequestsDependencyWarning: urllib3 (1.26.6) or chardet (3.0.4) doesn't match a supported version!

Could you provide the package versions that should be installed?

Is whole word masking (WWM) implemented?

Transformer Quality in Linear Time: gated attention unit and FLASH code

Transformer Quality in Linear Time

Weizhe Hua, Zihang Dai, Hanxiao Liu, Quoc V. Le
We revisit the design choices in Transformers, and propose methods to address their weaknesses in handling long sequences. First, we propose a simple layer named gated attention unit, which allows the use of a weaker single-head attention with minimal quality loss. We then propose a linear approximation method complementary to this new layer, which is accelerator-friendly and highly competitive in quality. The resulting model, named FLASH, matches the perplexity of improved Transformers over both short (512) and long (8K) context lengths, achieving training speedups of up to 4.9× on Wiki-40B and 12.1× on PG-19 for auto-regressive language modeling, and 4.8× on C4 for masked language modeling.
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE)
Cite as: arXiv:2202.10447 [cs.LG]
(or arXiv:2202.10447v1 [cs.LG] for this version)

https://doi.org/10.48550/arXiv.2202.10447

https://arxiv.org/abs/2202.10447

basic_masked_language_model.py always outputs "the"

sentence = "北京[MASK]安门"

With the bert-base-uncased model downloaded from Hugging Face, the prediction is always "the". Why?

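Question about LabelSmoothingCrossEntropy

The first part of the total loss returned by LabelSmoothingCrossEntropy is loss*self.eps/c, where c is the number of classes. Some formulas I have seen divide by the number of classes minus one instead.
Should one be subtracted or not?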
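Both conventions appear in practice, and each gives a valid target distribution. Dividing by c spreads eps uniformly over all classes, including the true one; dividing by c - 1 spreads eps over the other classes only. The sketch below compares the two in plain Python. It assumes the second part of the loss, which the issue does not quote, is (1 - eps) times the usual negative log-likelihood; all function names are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def smoothed_targets(num_classes, target, eps, exclude_target=False):
    """Build the smoothed target distribution under either convention."""
    if exclude_target:
        # eps / (c - 1): mass eps shared by the non-target classes only.
        q = [eps / (num_classes - 1)] * num_classes
        q[target] = 1.0 - eps
    else:
        # eps / c: mass eps shared by all classes, the target included.
        q = [eps / num_classes] * num_classes
        q[target] = 1.0 - eps + eps / num_classes
    return q


def label_smoothing_loss(logits, target, eps=0.1, exclude_target=False):
    """Cross entropy between the smoothed targets and softmax(logits)."""
    log_p = log_softmax(logits)
    q = smoothed_targets(len(logits), target, eps, exclude_target)
    return -sum(qi * lpi for qi, lpi in zip(q, log_p))


def eps_over_c_form(logits, target, eps=0.1):
    """The form quoted in the issue, with an assumed (1 - eps) * nll second part."""
    log_p = log_softmax(logits)
    c = len(logits)
    return -sum(log_p) * eps / c + (1.0 - eps) * (-log_p[target])


logits = [2.0, 0.5, -1.0]
print(label_smoothing_loss(logits, 0))                       # eps / c convention
print(eps_over_c_form(logits, 0))                            # same value, issue's form
print(label_smoothing_loss(logits, 0, exclude_target=True))  # eps / (c - 1) convention
```

Under that assumption, the loss*eps/c form is exactly the eps / c convention, so no subtraction is needed for it to be a proper cross entropy. The c - 1 variant is a different but equally valid choice that keeps the true class weight at exactly 1 - eps.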