Comments (4)

DtYXs commented on May 29, 2024

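Hello. Currently, when training with flash-attn enabled, the checkpoint is saved in exactly the same format as when it is disabled. A checkpoint trained with flash-attn should therefore load directly, so please try that first.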

"state_dict": model.state_dict() if not args.use_flash_attention else convert_state_dict(model.state_dict()),
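The `convert_state_dict` helper used in that line isn't shown in the thread. Below is a minimal sketch of what such a conversion could look like, assuming the flash-attn and standard layouts differ only in the names of the fused QKV projection parameters. The key names (`attn.in_proj_weight` / `attn.Wqkv.weight`) and the function name `convert_state_dict_sketch` are assumptions for illustration, not Chinese-CLIP's actual implementation.

```python
# Hypothetical sketch: rename attention projection keys between the standard
# nn.MultiheadAttention layout and an ASSUMED flash-attn layout ("Wqkv").
# These key names are assumptions, not taken from the Chinese-CLIP source.

STANDARD_TO_FLASH = {
    "attn.in_proj_weight": "attn.Wqkv.weight",
    "attn.in_proj_bias": "attn.Wqkv.bias",
}
FLASH_TO_STANDARD = {v: k for k, v in STANDARD_TO_FLASH.items()}


def convert_state_dict_sketch(state_dict, to_flash):
    """Return a new dict with attention keys renamed to the target layout.

    Values (tensors in a real checkpoint) are passed through unchanged;
    only the keys differ between the two layouts.
    """
    mapping = STANDARD_TO_FLASH if to_flash else FLASH_TO_STANDARD
    converted = {}
    for key, value in state_dict.items():
        new_key = key
        for old, new in mapping.items():
            if key.endswith(old):
                # Keep the module prefix, swap only the parameter suffix.
                new_key = key[: -len(old)] + new
                break
        converted[new_key] = value
    return converted


if __name__ == "__main__":
    standard = {
        "visual.transformer.resblocks.0.attn.in_proj_weight": "W",
        "visual.transformer.resblocks.0.attn.in_proj_bias": "b",
        "visual.transformer.resblocks.0.ln_1.weight": "g",
    }
    flash = convert_state_dict_sketch(standard, to_flash=True)
    print(sorted(flash))
    # Converting back restores the original keys, which is why a checkpoint
    # saved in the standard layout can be loaded by either model variant.
    assert convert_state_dict_sketch(flash, to_flash=False) == standard
```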

from chinese-clip.

ZechengLi19 commented on May 29, 2024

@DtYXs Thanks for your reply, but I think you may have misunderstood what I meant.

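What I want to do is call the load_from_name function directly from my own code to get the model, and have that model support switching straight to flash-attn mode. However, load_from_name currently does not provide a flash-attn option.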


DtYXs commented on May 29, 2024

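@ZechengLi19 I see what you mean. My understanding is that the flash-attn format defined in the current code is specific to the Chinese-CLIP project, and models trained with Chinese-CLIP are automatically converted from the flash-attn format back to the normal format. So I'd like to know in what situation you currently need to load a model in the flash-attn format.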


ZechengLi19 commented on May 29, 2024

@DtYXs For example, say I want to use your trained Chinese-CLIP in other downstream tasks.

I might have baseline code for that downstream task. If I want to swap its backbone, I'd call load_from_name to create a CLIP backbone. And if I then want to fine-tune CLIP, I think adding flash-attn would help speed up my code.

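In other words, I'd be using your repository as a package, so load_from_name is really the only function I'd need to see. Wouldn't flash-attn support there help more people use the model in downstream tasks?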
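The request boils down to a `use_flash_attention` switch on the loader. Here is a minimal sketch of how such a switch could be threaded through, assuming (as DtYXs notes) that released checkpoints are in the standard format and that moving to flash-attn only requires remapping the fused QKV parameter names. `load_backbone`, `Backbone`, `_fake_checkpoint`, and the `Wqkv` key names are hypothetical stand-ins, not Chinese-CLIP's `load_from_name` API.

```python
"""Hypothetical sketch of a load_from_name-style loader with a flash-attn switch.

Everything here is a stand-in: the checkpoint contents, the Backbone class and
the function names are illustrative, not Chinese-CLIP's real API.
"""
from dataclasses import dataclass
from typing import Dict

# ASSUMED rename between standard fused-QKV keys and a flash-attn layout.
FLASH_RENAMES = {
    "attn.in_proj_weight": "attn.Wqkv.weight",
    "attn.in_proj_bias": "attn.Wqkv.bias",
}


@dataclass
class Backbone:
    """Stand-in for a CLIP backbone; records which attention path it uses."""
    name: str
    use_flash_attention: bool
    state_dict: Dict[str, object]


def _fake_checkpoint(name: str) -> Dict[str, object]:
    # Placeholder for downloading/reading a released (standard-format) checkpoint.
    return {
        "visual.transformer.resblocks.0.attn.in_proj_weight": f"{name}-W",
        "visual.transformer.resblocks.0.attn.in_proj_bias": f"{name}-b",
    }


def load_backbone(name: str, use_flash_attention: bool = False) -> Backbone:
    """Load standard weights and, if requested, remap them for a flash-attn model."""
    state_dict = _fake_checkpoint(name)
    if use_flash_attention:
        remapped = {}
        for key, value in state_dict.items():
            for old, new in FLASH_RENAMES.items():
                if key.endswith(old):
                    key = key[: -len(old)] + new
                    break
            remapped[key] = value
        state_dict = remapped
    return Backbone(name, use_flash_attention, state_dict)


if __name__ == "__main__":
    # A downstream baseline only needs this one call to swap in the backbone.
    plain = load_backbone("ViT-B-16")
    fast = load_backbone("ViT-B-16", use_flash_attention=True)
    print(sorted(plain.state_dict))
    print(sorted(fast.state_dict))
```

A real version would construct the flash-attn model and call PyTorch's `load_state_dict` on it; the stub only shows where the flag and the key remapping would plug in.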
