Comments (4)
@egg-west Well, the reason I used this LayerNorm is that the Annotated Transformer implementation of "Attention Is All You Need" used this code, and I just copied it from there. So if anyone can answer this question, that would be seriously awesome.
from bert-pytorch.
I believe they should do similar things; however, there is a difference in implementation.
For a given input:
x = torch.tensor([1.,0.,0.,0.])
The Annotated Transformer version gives the output:
tensor([ 1.5000, -0.5000, -0.5000, -0.5000], grad_fn=<ThAddBackward>)
While torch.nn.LayerNorm gives:
tensor([ 1.7320, -0.5773, -0.5773, -0.5773], grad_fn=<AddcmulBackward>)
The layer_norm implementation in PyTorch is here:
https://github.com/pytorch/pytorch/blob/cca247635c6edb323176eeac7a18d3e9ab71c558/caffe2/python/helpers/normalization.py
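The gap between the two outputs above can be reproduced without PyTorch at all. A minimal pure-Python sketch (function names are mine, not from either codebase), assuming the Annotated Transformer computes `(x - mean) / (std + eps)` with the unbiased std (what `torch.std` returns by default), while `torch.nn.LayerNorm` computes `(x - mean) / sqrt(var + eps)` with the biased variance:

```python
import math

def annotated_transformer_ln(x, eps=1e-6):
    # Unbiased std (Bessel's correction, divisor n-1), eps added OUTSIDE the sqrt,
    # mirroring the Annotated Transformer's LayerNorm (with a_2=1, b_2=0).
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))
    return [(v - mean) / (std + eps) for v in x]

def torch_style_ln(x, eps=1e-5):
    # Biased variance (divisor n), eps added INSIDE the sqrt,
    # matching the torch.nn.LayerNorm formula (with weight=1, bias=0).
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    return [(v - mean) / math.sqrt(var + eps) for v in x]

x = [1.0, 0.0, 0.0, 0.0]
print([round(v, 4) for v in annotated_transformer_ln(x)])  # [1.5, -0.5, -0.5, -0.5]
print([round(v, 4) for v in torch_style_ln(x)])            # [1.732, -0.5773, -0.5773, -0.5773]
```

So the visible numerical difference comes mainly from the unbiased-vs-biased standard deviation, with the eps placement contributing a much smaller perturbation.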
@egg-west Is your question solved? 👍
Thank you for the clarification. I guess pulling the epsilon out of the sqrt may speed up the computation.
But yes, they do the same thing.
Related Issues (20)
- how to fine tune model with trained weight
- GELU is available in PyTorch HOT 1
- The trained model is saved with the suffix .model.ep*; how should it be loaded and used afterwards?
- Why Segment Embedding number only 3? HOT 1
- Clarification on Padding Process in BERT Model Construction
- how to do Ner
- The SublayerConnection class called by the forward method in transformer.py: implementation of the residual connection and normalization HOT 1
- In the Next Sentence Prediction task, the original code may choose the same line when you try to sample the negative example
- An error occurred【AttributeError: type object 'BERT' has no attribute 'hidden'】
- IndexError HOT 6
- The exact English pretraining data and Chinese pretraining data that are exact same to the BERT paper's pretraining data.
- why language_model.py has different vectors HOT 1
- Why not use torch.no_grad when evaluating test data? HOT 1
- Does dataset/dataset.py have an error? HOT 1
- It keeps trying to use CUDA despite --with_cuda False option
- Pooler layer?
- bert-vocab? HOT 1
- why specify `ignore_index=0` in the NLLLoss function in BERTTrainer? HOT 1
- What dataset did you use to train model? HOT 2