@YannDubs Sorry for the late response here; I was traveling.
MaxText has historically focused on the largest customers, who were training custom models of their own design, so we have concentrated on correctness and on performance/scalability. We've assumed pretraining customers would have their own secret sauce regarding convergence, etc. To demonstrate correctness, we verify that we can directly reproduce the Chinchilla results:
https://github.com/google/maxtext/blob/main/end_to_end/test_convergence_1b_params.sh
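For context (not something stated in this thread): the Chinchilla scaling results suggest a compute-optimal budget of roughly 20 training tokens per model parameter, which is the kind of budget a 1B-parameter convergence run would target. A minimal sketch of that rule of thumb:

```python
def chinchilla_optimal_tokens(num_params: int, tokens_per_param: int = 20) -> int:
    """Approximate compute-optimal token budget: ~20 tokens per parameter
    (rule of thumb from the Chinchilla scaling study)."""
    return num_params * tokens_per_param

# For a 1B-parameter model this suggests a budget of about 20B tokens.
budget = chinchilla_optimal_tokens(1_000_000_000)
print(budget)  # 20000000000
```

The exact token budget used by `test_convergence_1b_params.sh` is defined in the script itself; the helper above is only an illustration of the scaling heuristic.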
We've gotten a lot of interest in off-the-shelf models that appeal to different folks, so we've been adding support for more models (now Gemma, Llama, and Mistral).
We also have high-performance inference coming soon.
But I think you're asking for something more. Happy to talk live as well (rwitten at google.com).
from maxtext.
Thanks @rwitten, something like the Chinchilla results is what I was asking about, but I was hoping to see the actual training curves and final evaluation results so that I could (1) compare against the original results, and (2) have a reference curve to compare with when modifying the configs/model.
Thanks!
@gobbleturk can you provide?
I've uploaded loss curve data from a run of test_convergence_1b_params.sh here.
Here is a screenshot of some learning metrics from that run, displayed via TensorBoard:
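Once reference loss-curve data like this is published, a modified run can be checked against it numerically rather than by eye. A minimal sketch (all numbers below are hypothetical, not from the actual run) that aligns the two curves with linear interpolation and reports the largest relative deviation:

```python
import numpy as np

def max_relative_gap(ref_steps, ref_loss, run_steps, run_loss):
    """Interpolate the reference loss at the run's steps and return the
    largest relative deviation between the two curves."""
    ref_at_run = np.interp(run_steps, ref_steps, ref_loss)
    return float(np.max(np.abs(np.asarray(run_loss) - ref_at_run) / ref_at_run))

# Hypothetical curves: the run stays within ~2% of the reference.
ref_steps = [0, 100, 200, 300]
ref_loss  = [10.0, 4.0, 3.0, 2.8]
run_steps = [50, 150, 250]
run_loss  = [7.1, 3.55, 2.95]
gap = max_relative_gap(ref_steps, ref_loss, run_steps, run_loss)
```

A tolerance on `gap` (say, a few percent) gives a quick regression check when experimenting with the configs or model architecture.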
Great, thanks!