Comments (2)
As in all neural models, these hyperparameters are first set from experience and then tuned through experiments. The learning rate for fine-tuning BERT should be smaller than the one used when training a model from scratch, as in the original Transformer paper.
You can also use your own learning rate schedule.
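As a rough illustration of a custom schedule, here is a minimal sketch of a warmup followed by inverse-square-root decay, in the style of the schedule from the original Transformer paper. The function name `warmup_inverse_sqrt_lr` and the default values for `base_lr` and `warmup_steps` are assumptions chosen for illustration, not settings confirmed in this thread; tune them for your own setup.

```python
def warmup_inverse_sqrt_lr(step, base_lr=2e-3, warmup_steps=10000):
    """Return the learning rate for a 1-indexed training step.

    The rate grows linearly for `warmup_steps` steps, then decays
    proportionally to 1 / sqrt(step). Both default values are
    illustrative assumptions, not recommended settings.
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    # The linear-warmup term is smaller before warmup_steps,
    # the inverse-sqrt term is smaller afterwards.
    return base_lr * min(step ** -0.5, step * warmup_steps ** -1.5)


if __name__ == "__main__":
    # Print the rate at a few steps to inspect the shape of the curve.
    for s in (1, 1000, 10000, 40000):
        print(s, warmup_inverse_sqrt_lr(s))
```

With this form the peak rate is `base_lr / sqrt(warmup_steps)`, reached at `step == warmup_steps`, so lowering `base_lr` or raising `warmup_steps` is one way to keep the rate small when fine-tuning.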
from bertsum.
Thanks for your answer.
Related Issues (20)
- How to implement Bert + CNN
- How to enter a sentence for prediction
- Question for models/trainer.py#L325 ? HOT 3
- How to train TransformerExt baseline?
- Package Requirements - versions? HOT 1
- Hi i wonder if i want to do a multi-classification task, what should i change?
- How to continue training from previous checkpoints? HOT 4
- I got xent of 3-4 after 30000 training steps, is this normal? HOT 1
- Can someone please explain the validation metrics for me?
- Format of the .story files
- Help needed when processing my own dataset for testing.
- Questions about imbalance sentence distribution of label 1 and label 0 when training
- [CLS] similar context vector on Evaluation
- FileNotFoundError HOT 1
- Extractive Setting? HOT 2
- I also got slightly Lower Rouge score for the same code
- Order inconsistency of output candidate file with original test.json when testing bertSum Extractive HOT 1
- expected mask dtype to be Bool but got Long
- default batch_size is 3000, I don't quite understand, why so huge? HOT 1
- Training in Colab (CNN/DM)