Comments (13)
Here is a simplified script if you do not need model parallel:
import argparse
import torch  # needed for torch.no_grad() below

from models.cogvlm_model import CogVLMModel
from utils.language import llama2_tokenizer, llama2_text_processor_inference
from utils.vision import get_image_processor
from utils.chat import chat
from sat.model.mixins import CachedAutoregressiveMixin

# load model
model, model_args = CogVLMModel.from_pretrained(
    "cogvlm-chat",
    args=argparse.Namespace(
        deepspeed=None,
        local_rank=0,
        rank=0,
        world_size=1,
        model_parallel_size=1,
        mode='inference',
        skip_init=True,
        fp16=False,
        bf16=True,
        use_gpu_initialization=True,
        device='cuda',
    ))
model = model.eval()

# build the tokenizer and the text/image processors
tokenizer = llama2_tokenizer("lmsys/vicuna-7b-v1.5", signal_type="chat")
image_processor = get_image_processor(model_args.eva_args["image_size"][0])
model.add_mixin('auto-regressive', CachedAutoregressiveMixin())
text_processor_infer = llama2_text_processor_inference(tokenizer, None, model.image_length)

# run a single-turn chat on one image
with torch.no_grad():
    response, history, cache_image = chat(
        "fewshot-data/kobe.png",
        model,
        text_processor_infer,
        image_processor,
        "Describe the image.",
        history=[],
        max_length=2048,
        top_p=0.4,
        temperature=0.8,
        top_k=1,
        invalid_slices=text_processor_infer.invalid_slices,
        no_prompt=False)
print(response)
Thank you very much. I will try it to caption the images I collected from the internet.
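If it helps, here is a minimal batch-tagging sketch built on the script above; the glob pattern, prompt, and output file below are hypothetical placeholders:

import glob
import json

import torch

results = {}
with torch.no_grad():
    for path in sorted(glob.glob("downloaded_images/*.jpg")):  # hypothetical input folder
        response, _, _ = chat(
            path,
            model,
            text_processor_infer,
            image_processor,
            "Describe the image.",
            history=[],  # reset history so each caption is independent
            max_length=2048,
            top_p=0.4,
            temperature=0.8,
            top_k=1,
            invalid_slices=text_processor_infer.invalid_slices,
            no_prompt=False)
        results[path] = response

with open("captions.json", "w") as f:  # hypothetical output file
    json.dump(results, f, ensure_ascii=False, indent=2)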
Here is a simplified script if you do not need model parallel: [script quoted above]
How should I change the script to run inference on multiple GPUs (2×4090)?
cli_demo.py and web_demo.py both support multiple GPUs. The commands to run them are introduced in README.md. You can try simplifying them if you think they are not simple enough.
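If you would rather adapt the simplified script than the demos, here is a rough sketch. It assumes sat picks up the rank and world size that torchrun exports, which is how the demos handle it; the filename is hypothetical, and the exact launch flags are in README.md, so double-check against cli_demo.py:

# Launch with one process per GPU, e.g.:
#   torchrun --standalone --nnodes=1 --nproc-per-node=2 simple_demo.py
import argparse
import os

from models.cogvlm_model import CogVLMModel

rank = int(os.environ.get('RANK', 0))
world_size = int(os.environ.get('WORLD_SIZE', 1))

model, model_args = CogVLMModel.from_pretrained(
    "cogvlm-chat",
    args=argparse.Namespace(
        deepspeed=None,
        local_rank=rank,                 # one process per GPU on a single node
        rank=rank,
        world_size=world_size,           # 2 for 2×4090
        model_parallel_size=world_size,  # shard the weights across both cards
        mode='inference',
        skip_init=True,
        fp16=False,
        bf16=True,
        use_gpu_initialization=True,
        device='cuda',
    ))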
Here is a simplified script if you do not need model parallel: [script quoted above]
I hit this bug when running the code. Could you help with it?
It seems your CUDA driver is too old. Your PyTorch build must match the CUDA version available on your machine.
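To confirm, a quick diagnostic sketch; compare the CUDA version PyTorch was built against with the driver version reported by nvidia-smi:

import torch

print(torch.__version__)          # installed PyTorch build
print(torch.version.cuda)         # CUDA version this build was compiled for
print(torch.cuda.is_available())  # False when the driver is older than that CUDA version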
Thanks a lot! I have fixed the problem. By the way, does CogVLM support multiple images as input?
FYI: #38
Can you provide a faster version, such as 4-bit/8-bit quantization or multi-GPU inference?
FYI: #75
Here is a simplified script if you do not need model parallel: [script quoted above]
In this script, how do I choose which GPU the model is loaded on? I want to load all model parameters onto one GPU card so that I can caption multiple images in parallel across multiple GPUs. However, I tried many settings for local_rank, rank, and device, but the parameters still end up on GPU 0. Can you provide some advice?
You should set CUDA_VISIBLE_DEVICES at the very beginning of your code, not in the middle of it. Moreover, if you set your visible devices to 3, you should set your device to cuda:0, because card 3 is now cuda:0.
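For illustration, a minimal sketch of that ordering, assuming you want the single-GPU script above to load onto physical card 3:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "3"  # must run before torch/sat initialize CUDA

import torch  # import only after the mask is in place

# Physical card 3 is now the only visible device, and it is indexed as cuda:0.
print(torch.cuda.device_count())  # -> 1
device = "cuda:0"                 # pass this as device= in the Namespace above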
Yes, you are right, respect!!!!
Related Issues (20)
- Numbered lists appear in prediction results when using cogagent-chat-hf directly (basic-demo)
- About model quantization
- GPU selection / multi-GPU
- Deploy
- Chat using one image and three prompts
- Is CogVLM an open Chinese model now? Does the open-source model already support Chinese Q&A and fine-tuning on Chinese data?
- [CogVLM-chat-v1.1] LM weights are different with vicuna-7b-v1.5
- Running Gradio app locally results in inappropriate error: "NETWORK ERROR DUE TO HIGH TRAFFIC. PLEASE REGENERATE OR REFRESH THIS PAGE."
- Using CogVLM as an API
- Code of finetuning the cogagent on Mind2Web?
- Deploy CogVLM using Docker
- Could we replace the vicuna-7b directly with stronger llm?
- I want to get 3 different answers with the same prompt while clearing the context each time; why are the results always the same?
- Chat with PDF documentation instead of images
- CogAgent vision pretraining model EVA2-CLIP-L
- Does the CogVLM source code support multi-turn dialogue training?
- About the model's visual grounding mechanism
- Running the fine-tuning script errors out with missing arguments
- How to construct a fine-tuning dataset for CogAgent?
- Feasibility of fine-tuning CogVLM with two 3090s?