@YannDubs Sorry for the late response here; I was traveling.
MaxText has historically focused on the largest customers, who were training custom models of their own design, so we have concentrated on correctness and on performance/scalability. We've assumed pretraining customers would have their own secret sauce regarding convergence, etc. To demonstrate correctness, we verify that we can directly reproduce the Chinchilla results:
https://github.com/google/maxtext/blob/main/end_to_end/test_convergence_1b_params.sh
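For context (not something stated in this thread): the Chinchilla scaling results suggest a compute-optimal budget of roughly 20 training tokens per model parameter, which is the kind of budget a 1B-parameter convergence run would target. A minimal sketch of that rule of thumb:

```python
def chinchilla_optimal_tokens(num_params: int, tokens_per_param: int = 20) -> int:
    """Approximate compute-optimal token budget: ~20 tokens per parameter
    (rule of thumb from the Chinchilla scaling study)."""
    return num_params * tokens_per_param

# For a 1B-parameter model this suggests a budget of about 20B tokens.
budget = chinchilla_optimal_tokens(1_000_000_000)
print(budget)  # 20000000000
```

The exact token budget used by `test_convergence_1b_params.sh` is defined in the script itself; the helper above is only an illustration of the scaling heuristic.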
We've gotten a lot of interest in off-the-shelf models that appeal to different folks, so we've been adding support for more models (now Gemma, Llama, and Mistral).
We also have high-performance inference coming soon.
But I think you're asking for something more. Happy to talk live as well (rwitten at google.com).
from maxtext.
Thanks @rwitten, something like the Chinchilla results is what I was asking about, but I was hoping to see the actual training curves and final evaluation results so that I could (1) compare against the original results, and (2) have a reference curve to compare with when modifying the configs/model.
Thanks!
@gobbleturk can you provide?
I've uploaded loss curve data from a run of test_convergence_1b_params.sh here.
Here is a screenshot of some learning metrics from that run, displayed via TensorBoard:
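Once reference loss-curve data like this is published, a modified run can be checked against it numerically rather than by eye. A minimal sketch (all numbers below are hypothetical, not from the actual run) that aligns the two curves with linear interpolation and reports the largest relative deviation:

```python
import numpy as np

def max_relative_gap(ref_steps, ref_loss, run_steps, run_loss):
    """Interpolate the reference loss at the run's steps and return the
    largest relative deviation between the two curves."""
    ref_at_run = np.interp(run_steps, ref_steps, ref_loss)
    return float(np.max(np.abs(np.asarray(run_loss) - ref_at_run) / ref_at_run))

# Hypothetical curves: the run stays within ~2% of the reference.
ref_steps = [0, 100, 200, 300]
ref_loss  = [10.0, 4.0, 3.0, 2.8]
run_steps = [50, 150, 250]
run_loss  = [7.1, 3.55, 2.95]
gap = max_relative_gap(ref_steps, ref_loss, run_steps, run_loss)
```

A tolerance on `gap` (say, a few percent) gives a quick regression check when experimenting with the configs or model architecture.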
Great, thanks!