Comments (6)
It should work on a v3-8. You can also try the `decode.py` script, but for me it worked on the 7B models (Gemma or Mistral).
Amazing @borisdayma! We don't actually officially support Mistral (we do support Llama and Gemma), but we're thrilled things are working for you!
It looks like this is actually available: https://github.com/google/maxtext/blob/main/end_to_end/test_mistral.sh
The only thing I had to change was replacing `tokenizer.mistral` with `tokenizer.model` (is that a typo, or did you rename it in your bucket?). I also chose to convert the bfloat16 weights to float32 rather than float16, since I think float16 could introduce some imprecision.
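For anyone wondering why float32 is the safer target: bfloat16 is just the top 16 bits of a float32, so widening it only appends zero mantissa bits and is always exact. float16 keeps more mantissa bits but has a much smaller exponent range (largest finite value 65504, smallest subnormal about 6e-8), so large weights can overflow and tiny ones can flush to zero. Below is a minimal pure-Python sketch using only the standard `struct` module; the helper names are hypothetical and not part of MaxText or its checkpoint conversion script, and bfloat16 is emulated by rounding float32 bit patterns.

```python
import struct


def to_bf16_bits(x: float) -> int:
    """Round x to bfloat16 (round-to-nearest-even) and return its 16 raw bits.

    bfloat16 keeps the sign, the full 8-bit float32 exponent and the top 7
    mantissa bits. NaN inputs are not handled in this sketch.
    """
    (bits32,) = struct.unpack("<I", struct.pack("<f", x))
    # Add just under half an ulp, plus the lowest kept bit, to round to even.
    rounding_bias = 0x7FFF + ((bits32 >> 16) & 1)
    return ((bits32 + rounding_bias) >> 16) & 0xFFFF


def bf16_to_float32(bits16: int) -> float:
    """Widen bfloat16 to float32 by appending 16 zero mantissa bits (always exact)."""
    (value,) = struct.unpack("<f", struct.pack("<I", bits16 << 16))
    return value


def float32_to_float16(x: float):
    """Narrow to IEEE float16 ('e' format); returns None if x overflows float16."""
    try:
        (value,) = struct.unpack("<e", struct.pack("<e", x))
    except OverflowError:
        return None
    return value


# A typical weight, a large outlier and a tiny value, each stored as bfloat16.
for w in (0.1, 1e5, 1e-8):
    widened = bf16_to_float32(to_bf16_bits(w))
    narrowed = float32_to_float16(widened)
    print(f"{w!r}: bf16->fp32 = {widened!r}, bf16->fp16 = {narrowed!r}")
```

Within float16's normal range the narrowing is actually exact, because float16 has 10 mantissa bits versus bfloat16's 7; the losses in this sketch come only from values outside that range (the 1e5 case overflows, the 1e-8 case becomes 0.0).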
Can I ask what kind of TPU you are using for the test, @borisdayma? I have a v4-32 available that I'd like to use for continued pre-training of Llama2/Mistral 7B, but other frameworks have seemed sub-optimal to me so far.
Yeah, your Mistral inference test is correct. I compared it against the `transformers` output and got the same results.
I'm closing this issue because Mistral seems to already work well after further testing.
Related Issues (20)
- Support for T5
- Supported features
- Issues running test_llama2_7b.sh on TPU VM v3-8
- Gemma instructions were deleted in commit
- Support Qwen1.5
- Support beam search
- Cannot do inference in float32
- Support for RecurrentGemma
- Clarification: how does Llama-2-7b fit on a v4-8 when using Adam?
- Question: Gradient Accumulation
- Support LoRA training
- Consolidate inference related logic under jetstream-maxtext
- DEFAULT_MASK_VALUE causes gradient explosion and nan loss on deep models
- Asignación ("Assignment")
- Reproducing pure computation TFLOPs
- How to convert a model to parameter only checkpoints (unscanned) on a CPU VM
- Update Inference Microbenchmark scripts
- llama_or_mistral_ckpt.py file requiring checkpoints in local file system
- Llama3
- Eval on C4?