Comments (6)

raulpuric commented on July 21, 2024

Hi, the config files are the same; however, we've had to change some layers' structure and names to account for model parallelism.

We wrote a helper function to allow us to load huggingface/openai weights into our model for debugging purposes. It may work in the reverse direction, but we haven't tested either direction in a while. Please shout if it doesn't work for you.
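A minimal sketch of what such a conversion might involve, assuming a HuggingFace `GPT2Model` state dict (keys like `h.0.attn.c_attn.weight`). The target names (`transformer.layers.N.attention.query_key_value`, etc.) are assumptions modeled loosely on Megatron-style naming, not taken from the actual helper, and plain nested lists stand in for tensors so it runs without PyTorch. One detail that does hold for HuggingFace GPT-2: its `Conv1D` layers store weights as (in_features, out_features), so they must be transposed for a `Linear`-style layer.

```python
import re

# HF GPT-2 Conv1D weights are (in_features, out_features); Linear-style
# layers expect (out_features, in_features), so these get transposed.
CONV1D_SUFFIXES = ("attn.c_attn.weight", "attn.c_proj.weight",
                   "mlp.c_fc.weight", "mlp.c_proj.weight")

# ASSUMED per-layer name mapping (HF module -> Megatron-style module).
LAYER_RENAMES = {
    "ln_1": "input_layernorm",
    "attn.c_attn": "attention.query_key_value",
    "attn.c_proj": "attention.dense",
    "ln_2": "post_attention_layernorm",
    "mlp.c_fc": "mlp.dense_h_to_4h",
    "mlp.c_proj": "mlp.dense_4h_to_h",
}

# ASSUMED top-level name mapping.
TOP_LEVEL_RENAMES = {
    "wte.weight": "word_embeddings.weight",
    "wpe.weight": "position_embeddings.weight",
    "ln_f.weight": "transformer.final_layernorm.weight",
    "ln_f.bias": "transformer.final_layernorm.bias",
}

# Matches e.g. "h.3.mlp.c_fc.weight" -> ("3", "mlp.c_fc", "weight").
LAYER_KEY = re.compile(r"^h\.(\d+)\.(.+)\.(weight|bias)$")


def transpose(matrix):
    """Transpose a 2-D nested list."""
    return [list(row) for row in zip(*matrix)]


def convert_key(hf_key):
    """Map one HF parameter name to its (assumed) Megatron-style name."""
    if hf_key in TOP_LEVEL_RENAMES:
        return TOP_LEVEL_RENAMES[hf_key]
    match = LAYER_KEY.match(hf_key)
    if match is None:
        raise KeyError(f"no mapping for {hf_key!r}")
    layer, module, param = match.groups()
    return f"transformer.layers.{layer}.{LAYER_RENAMES[module]}.{param}"


def convert_state_dict(hf_state):
    """Rename keys, transpose Conv1D weights, and drop causal-mask buffers."""
    out = {}
    for key, value in hf_state.items():
        # "attn.bias" / "attn.masked_bias" are attention-mask buffers, not weights.
        if key.endswith(".attn.bias") or key.endswith(".attn.masked_bias"):
            continue
        if key.endswith(CONV1D_SUFFIXES):
            value = transpose(value)
        out[convert_key(key)] = value
    return out
```

Depending on the target code, fused tensors such as the QKV weight may also need reordering; this sketch does not attempt that.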

Note that this will not work for model-parallel weights; we're still working on our serialization strategies to ship/port model-parallel weights.

Raul

from megatron-lm.

harkous commented on July 21, 2024

Thanks a lot, Raul. I will try the function you mentioned and get back to you.

jaredcasper commented on July 21, 2024

Closing this since it has been a while; I hope that worked out for you! If you have more questions please reopen or start a new issue.

PyxAI commented on July 21, 2024

@jaredcasper Any news on the parallel loading part?
I want to import GPT-2 pretrained weights and use Megatron to train in parallel.

jaredcasper commented on July 21, 2024

@PyxAI It's still in the plans, but it's not a priority, so it won't hit the repo for another month or two. Note that we plan to support loading weights only, not optimizer state or anything else.
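To illustrate why a single-GPU checkpoint can't be loaded directly under model parallelism, here is a minimal sketch of sharding full weights across tensor-parallel ranks. It assumes the general column-parallel / row-parallel scheme (split a Linear-style (out, in) weight along the output dimension or the input dimension, respectively); the function names and the idea of passing explicit key sets are hypothetical, not Megatron's serialization format. Nested lists again stand in for tensors.

```python
def split_first_dim(values, world_size):
    """Split along dim 0 (output features for an (out, in) weight, or a 1-D bias).
    Used for column-parallel layers: each rank owns a block of output features."""
    size = len(values)
    if size % world_size != 0:
        raise ValueError(f"dim 0 ({size}) not divisible by world_size ({world_size})")
    chunk = size // world_size
    return [values[r * chunk:(r + 1) * chunk] for r in range(world_size)]


def split_last_dim(matrix, world_size):
    """Split a 2-D (out, in) weight along the input dimension.
    Used for row-parallel layers: each rank owns a block of input features."""
    cols = len(matrix[0])
    if cols % world_size != 0:
        raise ValueError(f"dim 1 ({cols}) not divisible by world_size ({world_size})")
    chunk = cols // world_size
    return [[row[r * chunk:(r + 1) * chunk] for row in matrix]
            for r in range(world_size)]


def shard_for_rank(full_state, rank, world_size, column_parallel, row_parallel):
    """Build one rank's shard of a full state dict.

    column_parallel: keys split along dim 0 (weights and their biases).
    row_parallel: weight keys split along the input dimension.
    Everything else (layernorms, row-parallel biases) is replicated.
    """
    shard = {}
    for key, value in full_state.items():
        if key in column_parallel:
            shard[key] = split_first_dim(value, world_size)[rank]
        elif key in row_parallel:
            shard[key] = split_last_dim(value, world_size)[rank]
        else:
            shard[key] = value
    return shard
```

A fused QKV weight laid out as [Q; K; V] generally can't be split this naively, since each rank needs its own slice of Q, K, and V; how to re-slice it depends on the layout the target code expects.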

usuyama commented on July 21, 2024

@jaredcasper

I found that the helper function has been deleted from master. I'm also interested in loading HuggingFace model weights and continuing pretraining with Megatron-LM. Do you have any current suggestions?

I found the utils.py in the history: https://github.com/NVIDIA/Megatron-LM/blob/c882ac61182d423a89d21b453251a20fb7271a67/megatron/utils.py