
Comments (13)

1049451037 commented on May 18, 2024

Here is a simplified script if you do not need model parallel:

from models.cogvlm_model import CogVLMModel
from utils.language import llama2_tokenizer, llama2_text_processor_inference
from utils.vision import get_image_processor
from utils.chat import chat
from sat.model.mixins import CachedAutoregressiveMixin
import argparse
import torch

# load model (single GPU, bf16 inference)
model, model_args = CogVLMModel.from_pretrained(
    "cogvlm-chat",
    args=argparse.Namespace(
        deepspeed=None,
        local_rank=0,
        rank=0,
        world_size=1,
        model_parallel_size=1,
        mode='inference',
        skip_init=True,
        fp16=False,
        bf16=True,
        use_gpu_initialization=True,
        device='cuda',
    ))
model = model.eval()

# tokenizer, image processor, and text processor for chat-style inference
tokenizer = llama2_tokenizer("lmsys/vicuna-7b-v1.5", signal_type="chat")
image_processor = get_image_processor(model_args.eva_args["image_size"][0])
model.add_mixin('auto-regressive', CachedAutoregressiveMixin())
text_processor_infer = llama2_text_processor_inference(tokenizer, None, model.image_length)

# single-turn chat on one image
with torch.no_grad():
    response, history, cache_image = chat(
        "fewshot-data/kobe.png",
        model,
        text_processor_infer,
        image_processor,
        "Describe the image.",
        history=[],
        max_length=2048,
        top_p=0.4,
        temperature=0.8,
        top_k=1,
        invalid_slices=text_processor_infer.invalid_slices,
        no_prompt=False
        )
    print(response)
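
To caption a whole folder of images with this setup, the same chat() call can simply be wrapped in a loop. A minimal sketch follows (the folder path and output file are placeholders, and each image gets a fresh history):

import os
import json

image_dir = "images/"   # placeholder: folder of images to caption
captions = {}

with torch.no_grad():
    for name in sorted(os.listdir(image_dir)):
        if not name.lower().endswith((".png", ".jpg", ".jpeg")):
            continue
        response, _, _ = chat(
            os.path.join(image_dir, name),
            model,
            text_processor_infer,
            image_processor,
            "Describe the image.",
            history=[],   # fresh conversation for every image
            max_length=2048,
            top_p=0.4,
            temperature=0.8,
            top_k=1,
            invalid_slices=text_processor_infer.invalid_slices,
            no_prompt=False
        )
        captions[name] = response

with open("captions.json", "w") as f:
    json.dump(captions, f, ensure_ascii=False, indent=2)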


xinsir6 commented on May 18, 2024

Thank you very much. I will try it to caption the images I collected from the internet.


waltonfuture commented on May 18, 2024

(quoting the simplified single-GPU script above)

How should I change the script to run inference on multiple GPUs (2×4090)?


1049451037 commented on May 18, 2024

Both cli_demo.py and web_demo.py support multiple GPUs. The commands to run them are given in README.md.

You can try simplifying them if you find them more complicated than you need.
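
For reference, those demos are launched through torchrun so the model can be split across GPUs (model parallelism), e.g. something like torchrun --standalone --nnodes=1 --nproc-per-node=2 cli_demo.py with the script arguments taken from README.md; the command in the README is authoritative, this is only a reminder of which launcher is used.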


waltonfuture commented on May 18, 2024

(quoting the simplified single-GPU script above)

[screenshot of the error]
I ran into this error when running the code. Could you help with it?


1049451037 commented on May 18, 2024

It seems your CUDA driver is too old. PyTorch should be built against a CUDA version that your machine's driver supports.
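
A quick way to check for such a mismatch (plain PyTorch, nothing CogVLM-specific) is to compare the CUDA version your PyTorch build was compiled against with what the driver supports:

import torch

# CUDA version this PyTorch build was compiled against (None for a CPU-only build)
print(torch.version.cuda)
# Whether PyTorch can actually initialize the GPU with the installed driver
print(torch.cuda.is_available())

Compare torch.version.cuda with the "CUDA Version" reported by nvidia-smi; if the build requires a newer CUDA than the driver supports, either update the driver or install a PyTorch wheel built for the older CUDA version.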


waltonfuture commented on May 18, 2024

It seems your CUDA driver is too old. PyTorch should be built against a CUDA version that your machine's driver supports.

Thanks a lot! I have fixed the problem. By the way, does CogVLM support multiple images as input?


1049451037 commented on May 18, 2024

FYI: #38


xinsir6 commented on May 18, 2024

Can you provide a faster version, such as 4-bit/8-bit quantization or multi-GPU inference?


1049451037 commented on May 18, 2024

FYI: #75


xinsir6 commented on May 18, 2024

(quoting the simplified single-GPU script above)

In this script, how do I choose which GPU the model is loaded on? I want to load all model parameters on a single card so that I can caption multiple images in parallel, one process per GPU. However, I tried many settings for local_rank, rank, and device, and the parameters still end up on GPU 0. Can you provide some advice?



1049451037 commented on May 18, 2024

You should set CUDA_VISIBLE_DEVICES at the very beginning of your code (before CUDA is initialized), not in the middle of it.

Moreover, if you set the visible devices to 3, you should set your device to cuda:0, because card 3 is now the only visible device and shows up as cuda:0.
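
A minimal sketch of what that looks like in the script (using physical GPU 3 as an example; the variable can equally be exported in the shell before launching Python):

import os
# Must be set before torch/CUDA is initialized, so keep it at the very top of the script
os.environ["CUDA_VISIBLE_DEVICES"] = "3"

import torch
# Inside this process, physical GPU 3 is now the only visible device and is addressed as
# cuda:0, so the loading code above can keep device='cuda' unchanged.

To caption images on several GPUs in parallel, launch one such process per card, each with a different CUDA_VISIBLE_DEVICES value and its own shard of the image list.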


xinsir6 commented on May 18, 2024

Yes, you are right, respect!!!!

