Comments (3)
Do you mean text output of the trained model? If so, you can see some sample output in the appendix of a paper we just put up on arXiv: https://arxiv.org/abs/1909.08053.
from megatron-lm.
Thanks, I see them.
So these are prompt completions, correct? So it's GPT-2 on steroids, haha?
Btw, how well do you understand Transformers? I've spoken with many freelancers and collaborators, and they didn't seem to fully grasp them; none could re-code the paper from scratch or explain every detail of the complete thing.
Yes, for the samples in that paper GPT-2 is the underlying model (the code in this repo can also do BERT models).
After writing this code and training these models from scratch, I think those of us working on this project can all say we have a pretty good understanding of transformers. :) All the information is there in the various papers, but it can take some determination to really understand it all. Now that there are several open-source implementations, studying the code is a good way to get to the nuts and bolts. Hopefully this code can help a bit!
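To illustrate the "nuts and bolts" mentioned above, here is a minimal single-head scaled dot-product self-attention sketch in NumPy. This is a simplified illustration only (no masking, no multiple heads, no output projection), not code from Megatron-LM; all names and shapes here are my own choices.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the row max before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """x: (seq_len, d_model); w_q/w_k/w_v: (d_model, d_k) projections."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # Attention scores, scaled by sqrt(d_k) to keep logits well-conditioned.
    scores = q @ k.T / np.sqrt(k.shape[-1])      # (seq_len, seq_len)
    return softmax(scores) @ v                    # weighted sum of values

# Toy example: 4 tokens, model width 8 (arbitrary illustrative sizes).
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (4, 8)
```

Multi-head attention repeats this with several smaller projections in parallel and concatenates the results; the transformer papers spell out the rest (residuals, layer norm, feed-forward blocks).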
Related Issues (20)
- [QUESTION] Why set requires_grad=True for token unpermutation in MoE HOT 1
- [BUG] Unnecessary initialization for router in megatron-core HOT 1
- [ENHANCEMENT] Enable non-gelu activations for BERT LM Head
- [QUESTION] When I use --use-dist-ckpt to load, there is an error and I can't tell if it's my configuration or the code.
- [BUG] Getting distributed rank in save_checkpoint when torch.distributed is not initialized. HOT 5
- [QUESTION] Why Megatron choose sync style training? HOT 1
- [BUG] Failed to load the megatron_mixtral checkpoint HOT 2
- [QUESTION] How Do NCCL_ALGO and Flash Attention Affect Deterministic Training in Megatron? HOT 1
- VocabParallelEmbedding
- How to train multiple binary files at the same time or merge them?
- too many .bin files for dataloader, crashed
- what's the biggest dataset you've tried?
- how to install llama package HOT 1
- [BUG] Resource Leak When Profile Parameter is Enabled
- [QUESTION] add_position_embedding=False in checkpoint_args during Llama3 8B training HOT 2
- [BUG] Get an AttributeError when trying to convert llama3-8B model from HF format to mcore format
- terminate called after throwing an instance of 'c10::DistBackendError'
- [QUESTION] Calculations regarding calculate_per_token_loss parameter
- [BUG] Get an AttributeError when trying to finetune llama3-8B model with multiple nodes
- When will megatron Flash attention 3 be supported?