Comments (13)
Here is a simplified script if you do not need model parallel:
import argparse
import torch  # needed for torch.no_grad() below

from models.cogvlm_model import CogVLMModel
from utils.language import llama2_tokenizer, llama2_text_processor_inference
from utils.vision import get_image_processor
from utils.chat import chat
from sat.model.mixins import CachedAutoregressiveMixin

# load model
model, model_args = CogVLMModel.from_pretrained(
    "cogvlm-chat",
    args=argparse.Namespace(
        deepspeed=None,
        local_rank=0,
        rank=0,
        world_size=1,
        model_parallel_size=1,
        mode='inference',
        skip_init=True,
        fp16=False,
        bf16=True,
        use_gpu_initialization=True,
        device='cuda',
    ))
model = model.eval()

# build the tokenizer and the text/image processors
tokenizer = llama2_tokenizer("lmsys/vicuna-7b-v1.5", signal_type="chat")
image_processor = get_image_processor(model_args.eva_args["image_size"][0])
model.add_mixin('auto-regressive', CachedAutoregressiveMixin())
text_processor_infer = llama2_text_processor_inference(tokenizer, None, model.image_length)

# run a single-turn chat on one image
with torch.no_grad():
    response, history, cache_image = chat(
        "fewshot-data/kobe.png",
        model,
        text_processor_infer,
        image_processor,
        "Describe the image.",
        history=[],
        max_length=2048,
        top_p=0.4,
        temperature=0.8,
        top_k=1,
        invalid_slices=text_processor_infer.invalid_slices,
        no_prompt=False)
print(response)
Thank you very much. I will try it to caption the images I collected from the internet.
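If it helps, here is a minimal batch-tagging sketch built on the script above; the glob pattern, prompt, and output file below are hypothetical placeholders:

import glob
import json

import torch

results = {}
with torch.no_grad():
    for path in sorted(glob.glob("downloaded_images/*.jpg")):  # hypothetical input folder
        response, _, _ = chat(
            path,
            model,
            text_processor_infer,
            image_processor,
            "Describe the image.",
            history=[],  # reset history so each caption is independent
            max_length=2048,
            top_p=0.4,
            temperature=0.8,
            top_k=1,
            invalid_slices=text_processor_infer.invalid_slices,
            no_prompt=False)
        results[path] = response

with open("captions.json", "w") as f:  # hypothetical output file
    json.dump(results, f, ensure_ascii=False, indent=2)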
Here is a simplified script if you do not need model parallel: [script quoted above]
How should I change the script to run inference on multiple GPUs (2×4090)?
cli_demo.py and web_demo.py both support multiple GPUs. The commands to run them are introduced in README.md. You can try simplifying them if you think they are not simple enough.
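If you would rather adapt the simplified script than the demos, here is a rough sketch. It assumes sat picks up the rank and world size that torchrun exports, which is how the demos handle it; the filename is hypothetical, and the exact launch flags are in README.md, so double-check against cli_demo.py:

# Launch with one process per GPU, e.g.:
#   torchrun --standalone --nnodes=1 --nproc-per-node=2 simple_demo.py
import argparse
import os

from models.cogvlm_model import CogVLMModel

rank = int(os.environ.get('RANK', 0))
world_size = int(os.environ.get('WORLD_SIZE', 1))

model, model_args = CogVLMModel.from_pretrained(
    "cogvlm-chat",
    args=argparse.Namespace(
        deepspeed=None,
        local_rank=rank,                 # one process per GPU on a single node
        rank=rank,
        world_size=world_size,           # 2 for 2×4090
        model_parallel_size=world_size,  # shard the weights across both cards
        mode='inference',
        skip_init=True,
        fp16=False,
        bf16=True,
        use_gpu_initialization=True,
        device='cuda',
    ))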
Here is a simplified script if you do not need model parallel: [script quoted above]
I hit this bug when running the code. Could you help with it?
It seems your CUDA driver is too old. Your PyTorch build must match the CUDA version available on your machine.
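To confirm, a quick diagnostic sketch; compare the CUDA version PyTorch was built against with the driver version reported by nvidia-smi:

import torch

print(torch.__version__)          # installed PyTorch build
print(torch.version.cuda)         # CUDA version this build was compiled for
print(torch.cuda.is_available())  # False when the driver is older than that CUDA version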
Thanks a lot! I have fixed the problem. By the way, does CogVLM support multiple images as input?
FYI: #38
Can you provide a faster version, such as 4-bit/8-bit quantization or multi-GPU inference?
FYI: #75
Here is a simplified script if you do not need model parallel: [script quoted above]
In this script, how do I choose which GPU the model is loaded on? I want to load all model parameters onto one GPU card so that I can caption multiple images in parallel across multiple GPUs. However, I tried many settings for local_rank, rank, and device, but the parameters still end up on GPU 0. Can you provide some advice?
You should set CUDA_VISIBLE_DEVICES at the very beginning of your code, not in the middle of it. Moreover, if you set your visible devices to 3, you should set your device to cuda:0, because card 3 is now cuda:0.
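For illustration, a minimal sketch of that ordering, assuming you want the single-GPU script above to load onto physical card 3:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "3"  # must run before torch/sat initialize CUDA

import torch  # import only after the mask is in place

# Physical card 3 is now the only visible device, and it is indexed as cuda:0.
print(torch.cuda.device_count())  # -> 1
device = "cuda:0"                 # pass this as device= in the Namespace above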
Yes, you are right, respect!!!!
Related Issues (20)
- Numbered lists appear in prediction results when using cogagent-chat-hf directly (basic-demo)
- About model quantization
- GPU selection / multi-GPU
- Deploy
- Chat using one image and three prompts
- Is CogVLM an open Chinese model now? Does the open-source model already support Chinese Q&A and fine-tuning on Chinese data?
- [CogVLM-chat-v1.1] LM weights are different with vicuna-7b-v1.5
- Running Gradio app locally results in inappropriate error: "NETWORK ERROR DUE TO HIGH TRAFFIC. PLEASE REGENERATE OR REFRESH THIS PAGE."
- Using CogVLM as an API
- Code of finetuning the cogagent on Mind2Web?
- Deploy CogVLM using Docker
- Could we replace the vicuna-7b directly with stronger llm?
- I want to get 3 different answers with the same prompt while clearing the context each time; why are the results always the same?
- Chat with PDF documentation instead of images
- CogAgent vision pretraining model EVA2-CLIP-L
- Does the CogVLM source code support multi-turn dialogue training?
- About the model's visual grounding mechanism
- Running the fine-tuning script errors out with missing arguments
- How to construct a fine-tuning dataset for CogAgent?
- Feasibility of fine-tuning CogVLM with two 3090s?