Comments (12)

mayank31398 commented on May 12, 2024

https://github.com/huggingface/transformers-bloom-inference/blob/abe365066fec6e03ce0ea2cc8136f2da1254e2ea/bloom-inference-server/ds_inference/grpc_server.py#L33
@cderinbogaz I've hacked my way around it for now: I pass the path to the downloaded model and the checkpoint dict for the model I need, and set model="bigscience/bloom".

I know this isn't the most elegant way to do it :(
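For anyone trying the same workaround: the "checkpoint dict" is presumably a listing of the downloaded shard files. The sketch below assumes the layout used by the DeepSpeed-inference scripts in the transformers-bloom-inference repo, i.e. the keys "type", "checkpoints" and "version". The helpers `build_checkpoint_dict` and `save_checkpoint_json` are hypothetical, and the exact format your MII/DeepSpeed version expects may differ.

```python
import json
from pathlib import Path


def build_checkpoint_dict(model_dir, pattern, model_type="BLOOM"):
    """Collect shard files in model_dir into a DeepSpeed-style checkpoint dict.

    Hypothetical helper; the "type"/"checkpoints"/"version" keys are an
    assumption based on the bloom-inference scripts, not a documented MII API.
    """
    # Sort so the shard order is deterministic across runs.
    files = sorted(str(p) for p in Path(model_dir).glob(pattern))
    if not files:
        raise FileNotFoundError(f"no files matching {pattern!r} in {model_dir}")
    return {"type": model_type, "checkpoints": files, "version": 1.0}


def save_checkpoint_json(ckpt, json_path):
    """Write the checkpoint dict to a JSON file, in case a file path is needed."""
    with open(json_path, "w") as f:
        json.dump(ckpt, f, indent=2)
    return json_path
```

Usage, with a placeholder directory: `build_checkpoint_dict("/path/to/bloom", "*.bin")`. The glob pattern is left to the caller because Hugging Face and DeepSpeed-sharded checkpoints use different file names.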

from deepspeed-mii.

mayank31398 commented on May 12, 2024

@mrwyattii I believe your commit from yesterday fixed this?
Let me know.
I am closely watching this repo :)

mayank31398 commented on May 12, 2024

It seems there is a check in place that prevents the new weights from working with MII.

mayank31398 commented on May 12, 2024

Any updates on this?
@jeffra @RezaYazdaniAminabadi

cderinbogaz commented on May 12, 2024

The same thing also happens with bigscience/bloom-350m, for some reason.

I just ran the example in the README and got this error:
AssertionError: text-generation only supports [.....]
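The error message suggests that MII validates the model name against a per-task list of supported models. Below is a minimal sketch of that kind of name-based check. The model list and both function names are made up for illustration and are not MII's actual code. The sketch also shows why the workaround at the top of this thread presumably works: passing model="bigscience/bloom" together with a local model path satisfies a check that only looks at the name.

```python
# Hypothetical allowlist: NOT MII's real model list or implementation.
SUPPORTED_MODELS = {
    "text-generation": ["bigscience/bloom", "gpt2"],
}


def check_model_supported(task, model_name):
    """Raise AssertionError if model_name is not on the allowlist for task."""
    supported = SUPPORTED_MODELS.get(task, [])
    # An assert is used here only to mirror the AssertionError reported above;
    # real validation code would normally raise ValueError instead.
    assert model_name in supported, f"{task} only supports {supported}"


def validate_deployment(task, model_name, model_path=None):
    # Only the *name* is checked; model_path (where the weights live) is never
    # inspected, so any weights can sit behind an allowlisted name.
    check_model_supported(task, model_name)
    return {"task": task, "model": model_name, "model_path": model_path}
```

With this sketch, `validate_deployment("text-generation", "bigscience/bloom-350m")` fails with "text-generation only supports [...]", while `validate_deployment("text-generation", "bigscience/bloom", "/local/weights")` passes.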

cderinbogaz commented on May 12, 2024

Thanks for the response, @mayank31398!
I think it's a neat solution :)

TahaBinhuraib commented on May 12, 2024

    weight_quantizer.quantize(transpose(sd[0][prefix + 'self_attention.query_key_value.' + 'weight'])))
  File "/opt/conda/lib/python3.7/site-packages/deepspeed/module_inject/replace_module.py", line 100, in copy
    dim=self.in_dim)[self.gpu_index].to(

This is the error I got today while trying int8 inference with bloom.

mayank31398 commented on May 12, 2024

Hi @TahaBinhuraib, I think MII doesn't support int8 models.
Can you try vanilla DS-inference instead?

https://github.com/huggingface/transformers-bloom-inference/tree/main/bloom-inference-server
You can run it via the CLI or deploy a generation server, as described in the instructions there.

mrwyattii commented on May 12, 2024

The fp16 Bloom weights are now supported. Int8 models are also supported, but currently the DeepSpeed sharded int8 weights for the Bloom model will throw an error. I'm working on a fix for this and automatic loading of the sharded weights (so you don't have to manually download the weights and define the checkpoint file list). Those changes will come in #69 and likely another PR.

mayank31398 commented on May 12, 2024

Thanks @mrwyattii

TahaBinhuraib commented on May 12, 2024

Thanks @mrwyattii, can't wait!

mrwyattii commented on May 12, 2024

@mayank31398 @TahaBinhuraib I finally found the time to fix #69 so that it works with int8. You no longer need to download the sharded checkpoint files separately and MII will handle this for you (but it will take a while as the checkpoints are quite large). I just confirmed that it's working on my side, but if you have the opportunity to test it out, please do. The script I used:

import mii

mii_configs = {
    "dtype": "int8",
    "tensor_parallel": 4,
    "port_number": 50950,
}
name = "microsoft/bloom-deepspeed-inference-int8"

mii.deploy(task='text-generation',
           model=name,
           deployment_name="bloom_deployment",
           model_path="/data/bloom-ckpts",
           mii_config=mii_configs)

You will probably want to change the model_path parameter if you run this on your local machine.
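To see why the download takes a while and why int8 pairs well with "tensor_parallel": 4, here is a back-of-the-envelope estimate of the memory needed for the weights alone. It assumes roughly 176B parameters for Bloom, 2 bytes per parameter for fp16 and 1 byte for int8. It ignores activations, the KV cache, quantization scales and framework overhead, so real per-GPU usage will be higher. The fp16 / tensor_parallel=8 row is only an illustrative comparison point, not a configuration from this thread.

```python
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(num_params, dtype, tensor_parallel=1):
    """Approximate per-GPU memory (decimal GB) for model weights only.

    Assumes weights are split evenly across tensor_parallel GPUs and ignores
    activations, KV cache, quantization scales and CUDA context.
    """
    total_bytes = num_params * BYTES_PER_PARAM[dtype]
    return total_bytes / tensor_parallel / 1e9


BLOOM_PARAMS = 176e9  # ~176B parameters, rounded

for dtype, tp in [("fp16", 8), ("int8", 4)]:
    gb = weight_memory_gb(BLOOM_PARAMS, dtype, tp)
    print(f"{dtype}, tensor_parallel={tp}: ~{gb:.0f} GB of weights per GPU")
```

Under these assumptions, both rows come to about 44 GB per GPU. The int8 checkpoint is about 176 GB in total to download, versus about 352 GB for fp16.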
