Comments (7)
@iOSGeekerOfChina I haven't decided yet; I just started this project an hour ago haha.
Do you think using the dataset referred to in the paper is a good idea?
Or do you have another good idea? Thanks 👍
from bert-pytorch.
Maybe you can try some multilingual corpus, not just English, hah
@MrRace I'd love to do it, if I had lots of 2080 Tis. https://twitter.com/Tim_Dettmers/status/1050787783004942336
Regarding compute for BERT: Uses 256 TPU-hours similar to the OpenAI model. Lots of TPUs parallelize about 25% better than GPUs. RTX 2080 Ti and V100 should be ~70% matmul and ~90% matmul perf vs TPU if you use 16-bit (important!). BERT ~= 375 RTX 2080 Ti days or 275 V100 days.
@crazyofapple Totally agree haha. Right now I'm trying to train this model on a Korean corpus with two 1080 Tis. But seriously, the model is too big for an individual researcher... we need some NASA-scale GPU power.
Just using the same dataset as the original paper would probably be better, I think.
@codertimo
http://timdettmers.com/2018/10/17/tpus-vs-gpus-for-transformers-bert/
On a standard, affordable GPU machine with 4 GPUs one can expect to train BERT for about 99 days using 16-bit or about 21 days using 8-bit.
Haha 99 days LoL.
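The two estimates above are roughly consistent with each other. A quick back-of-the-envelope check (all figures taken from the comments above; ideal multi-GPU scaling is an assumption for the sketch):

```python
# Sanity check: ~375 single-GPU days on an RTX 2080 Ti in 16-bit (from the
# tweet quoted above), spread across the 4-GPU machine from the blog quote.
total_gpu_days = 375   # RTX 2080 Ti days quoted above
num_gpus = 4

ideal_days = total_gpu_days / num_gpus
print(ideal_days)  # 93.75 -- the quoted ~99 days is slightly higher,
                   # consistent with imperfect multi-GPU scaling
```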
Related Issues (20)
- How to fine-tune the model with trained weights
- GELU is available in PyTorch HOT 1
- The trained model files have the suffix `.model.ep*`; how should they be loaded for later use?
- Why Segment Embedding number only 3? HOT 1
- Clarification on Padding Process in BERT Model Construction
- How to do NER
- The SublayerConnection class called by the forward method in transformer.py: the implementation of the residual connection and layer normalization HOT 1
- In the Next Sentence Prediction task, the original code may choose the same line when sampling a negative example
- An error occurred: AttributeError: type object 'BERT' has no attribute 'hidden'
- IndexError HOT 6
- Where to find English and Chinese pretraining data identical to the BERT paper's pretraining data
- why language_model.py has different vectors HOT 1
- Why not use torch.no_grad when evaluating test data? HOT 1
- Is there an error in dataset/dataset.py? HOT 1
- It keeps trying to use CUDA despite --with_cuda False option
- Pooler layer?
- bert-vocab? HOT 1
- why specify `ignore_index=0` in the NLLLoss function in BERTTrainer? HOT 1
- What dataset did you use to train model? HOT 2
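On the Next Sentence Prediction issue above: a common fix is to re-draw until the sampled line differs from the true next sentence, so a "negative" pair is never actually the real continuation. A minimal sketch (the function and variable names here are hypothetical, not from this repo's code):

```python
import random

def sample_negative(lines, idx):
    """Pick a random line that is NOT the true next sentence (lines[idx + 1]).

    A naive random.randrange(len(lines)) can return idx + 1 itself, which
    would mislabel the true next sentence as a negative example.
    """
    while True:
        j = random.randrange(len(lines))
        if j != idx + 1:  # re-draw if we hit the actual next line
            return lines[j]
```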