Comments (9)

Sengxian avatar Sengxian commented on July 20, 2024

Hello @hamza-alethea. Sorry, the current conversion script is only for checkpoints produced by our training framework; we will fix it very soon. By the way, FasterTransformer does not support inference on V100 machines. A quantized version of GLM-130B that allows efficient INT8 inference on the V100 will be released in the next few days, so please be patient and keep an eye on our GitHub repo.

from glm-130b.

hamza-alethea avatar hamza-alethea commented on July 20, 2024

Thank you!
Do you know of any other method that could help me reduce the response time of GLM-130B?

from glm-130b.

Sengxian avatar Sengxian commented on July 20, 2024

We have just released the quantized version of GLM-130B. V100 servers can run GLM-130B efficiently in INT8 precision; see Quantization of GLM-130B for details.

from glm-130b.
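
As an aside, here is a minimal, hypothetical sketch of the idea behind such weight-only INT8 releases: linear-layer weights are quantized to INT8 with a per-row absolute-maximum scale and expanded back to floating point on the fly at inference time, shrinking weight storage while keeping activations in higher precision. This is not the repository's actual implementation; every name below is illustrative.

import torch

# Hypothetical weight-only INT8 quantization with per-row absmax scales.
def quantize_int8(weight: torch.Tensor):
    # One scale per output row, so every row maps into [-127, 127].
    scale = weight.abs().max(dim=-1, keepdim=True).values / 127.0
    q = torch.clamp(torch.round(weight / scale), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # The INT8 weights are expanded back to floating point just before the matmul.
    return q.float() * scale

w = torch.randn(4096, 4096)                  # a made-up linear-layer weight
q, s = quantize_int8(w)
print("bytes:", w.nelement() * w.element_size(), "->", q.nelement() * q.element_size())
print("max abs error:", (w - dequantize(q, s)).abs().max().item())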

jiangliqin avatar jiangliqin commented on July 20, 2024

We have just released the quantized version of GLM-130B. V100 servers can run GLM-130B efficiently in INT8 precision; see Quantization of GLM-130B for details.
Hello, can the quantization method referred to in the link also be applied to the GLM-10B model?

from glm-130b.

Sengxian avatar Sengxian commented on July 20, 2024

We have just released the quantized version of GLM-130B. V100 servers can run GLM-130B efficiently in INT8 precision; see Quantization of GLM-130B for details.
Hello, can the quantization method referred to in the link also be applied to the GLM-10B model?

We haven't tried it, but I think a smaller model should be easier to quantize.

from glm-130b.

jiangliqin avatar jiangliqin commented on July 20, 2024

We have just released the quantized version of GLM-130B. V100 servers can run GLM-130B efficiently in INT8 precision; see Quantization of GLM-130B for details.
Hello, can the quantization method referred to in the link also be applied to the GLM-10B model?

We haven't tried it, but I think a smaller model should be easier to quantize.

I just converted GLM-10B to a 4-way (model-parallel) checkpoint, but when I run generate_block.sh it fails to load the model.
[screenshot: model loading error]

from glm-130b.

jiangliqin avatar jiangliqin commented on July 20, 2024

We have just released the quantized version of GLM-130B. V100 servers can run GLM-130B efficiently in INT8 precision; see Quantization of GLM-130B for details.
Hello, can the quantization method referred to in the link also be applied to the GLM-10B model?

We haven't tried it, but I think a smaller model should be easier to quantize.

I just converted GLM-10B to a 4-way (model-parallel) checkpoint, but when I run generate_block.sh it fails to load the model. [screenshot: model loading error]

I changed MPSIZE to 4 in the script.

from glm-130b.
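
A hypothetical sanity check for this kind of loading failure (none of this comes from the GLM repo; the shard-name pattern is only a guess and must be adapted to what your conversion produced): count the model-parallel shard files in the converted checkpoint folder and compare against the MPSIZE set in generate_block.sh, since the model cannot load if the two disagree.

import glob
import os
import sys

checkpoint_dir = sys.argv[1]   # path to the converted 4-way checkpoint folder
mpsize = int(sys.argv[2])      # the MPSIZE value used in generate_block.sh

# The shard naming convention differs between frameworks; this pattern is an assumption.
shards = sorted(glob.glob(os.path.join(checkpoint_dir, "**", "*model_states*.pt"),
                          recursive=True))
print(f"found {len(shards)} shard file(s):")
for path in shards:
    print(" ", path)

if len(shards) != mpsize:
    print(f"mismatch: MPSIZE={mpsize} but {len(shards)} shard(s) on disk; "
          "make these agree before loading")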

jiangliqin avatar jiangliqin commented on July 20, 2024

When accelerating inference with FasterTransformer, should I point it at a single .pt file or at the whole folder of .pt files?
[screenshot]

from glm-130b.

Sengxian avatar Sengxian commented on July 20, 2024

Hello @jiangliqin! This repo is only for the GLM-130B model; we have not yet done quantization for GLM-10B.

from glm-130b.
