
minicpm-v's Issues

Training Data Composition for MiniCPM-V 2

Thanks for the great work!

I am wondering where we can find the training data composition for MiniCPM-V 2. Specifically, is any distillation from other LMMs used?

Which base model is used?

I noticed that the MiniCPM LLM comes in several versions, e.g. fp32, bf16, dpo, and sft.
Which of these LLMs does the multimodal model use?

Question about details on the training procedure and data

Hi, and thanks for this fantastic work!

I am using your models for a scientific project, which requires details about the training procedure and training data. I haven't found a paper or a technical report specifically for this work. Did I miss something?

Are the training procedure and data the same as in https://arxiv.org/pdf/2308.12038.pdf?

Can you please share some details or a technical report?

Thanks :-)

MiniCPM-V Quantization

Hi, is it possible to convert OmniLMM-3B (MiniCPM-V) to a quantized version (e.g. GPTQ)?
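
GPTQ would need a separate calibration toolchain, but as a rough sketch, on-the-fly 4-bit loading via bitsandbytes might look like the following (an assumption on my part; whether the model's custom remote code is compatible with bitsandbytes is untested):

import torch
from transformers import AutoModel, BitsAndBytesConfig

# 4-bit NF4 quantization at load time (hypothetical; compatibility with
# MiniCPM-V's remote code is not confirmed).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModel.from_pretrained(
    'openbmb/MiniCPM-V',
    trust_remote_code=True,
    quantization_config=bnb_config,
)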

OCR use case: the model does not return all of the text

First of all, many thanks to your team for the excellent contribution. I plan to use the model for OCR, reading the content of an image and returning the complete text.

However, after trying several prompts I still cannot get it to output the complete text. Could you advise on a fix or on the correct way to use the model?

The test OCR image is the first page of <MapReduce: Simplified Data Processing on Large Clusters>.


Memory leak when looping inference

Hi, I run out of memory after a number of calls; it looks like a memory leak. Here is the script:

import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained('openbmb/MiniCPM-V', trust_remote_code=True, cache_dir="./cache", torch_dtype=torch.bfloat16)
model = model.to(device='cuda', dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-V', trust_remote_code=True, cache_dir="./cache")

prompt = 'Describe this image.'  # placeholder prompt
files = ['a.jpg', 'b.jpg', 'c.jpg', ...]  # long list of image paths
for image_path in files:
    msgs = [{'role': 'user', 'content': prompt}]
    image = Image.open(image_path).convert('RGB')
    res, context, _ = model.chat(
        image=image,
        msgs=msgs,
        context=None,
        tokenizer=tokenizer,
        sampling=False,
        temperature=0.0
    )
    print(res)
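
A possible mitigation, assuming the leak comes from autograd bookkeeping and lingering Python references rather than from the model itself (a sketch, not a confirmed fix):

import gc

# Run inference without gradient tracking and release per-image objects so
# cached CUDA memory can be reclaimed between iterations.
with torch.inference_mode():
    for image_path in files:
        image = Image.open(image_path).convert('RGB')
        msgs = [{'role': 'user', 'content': prompt}]
        res, context, _ = model.chat(image=image, msgs=msgs, context=None,
                                     tokenizer=tokenizer, sampling=False,
                                     temperature=0.0)
        print(res)
        del image, res, context
        gc.collect()
        torch.cuda.empty_cache()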

Is Mac supported?

When deploying on a Mac, I get an error saying CUDA is missing:

(OmniLMM) crz@crzdeMacBook-Air OmniLMM % pip install flash_attn

Collecting flash_attn
Downloading flash_attn-2.5.2.tar.gz (2.5 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.5/2.5 MB 882.4 kB/s eta 0:00:00
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [20 lines of output]
fatal: not a git repository (or any of the parent directories): .git
/private/var/folders/fn/bz39wtv175s83wx3cm11lptm0000gn/T/pip-install-k91r8ka8/flash-attn_fccd9cf359024ff8acc3369bdc82e45f/setup.py:78: UserWarning: flash_attn was requested, but nvcc was not found. Are you sure your environment has nvcc available? If you're installing within a container from https://hub.docker.com/r/pytorch/pytorch, only images whose names contain 'devel' will provide nvcc.
warnings.warn(
Traceback (most recent call last):
  File "<string>", line 2, in <module>
  File "<pip-setuptools-caller>", line 34, in <module>
  File "/private/var/folders/fn/bz39wtv175s83wx3cm11lptm0000gn/T/pip-install-k91r8ka8/flash-attn_fccd9cf359024ff8acc3369bdc82e45f/setup.py", line 133, in <module>
    CUDAExtension(
  File "/Users/crz/anaconda3/envs/OmniLMM/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1048, in CUDAExtension
    library_dirs += library_paths(cuda=True)
  File "/Users/crz/anaconda3/envs/OmniLMM/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1179, in library_paths
    if (not os.path.exists(_join_cuda_home(lib_dir)) and
  File "/Users/crz/anaconda3/envs/OmniLMM/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2223, in _join_cuda_home
    raise EnvironmentError('CUDA_HOME environment variable is not set. '
OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.

  torch.__version__  = 2.0.1
  
  
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
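
For what it's worth, flash_attn requires nvcc/CUDA and cannot be built on macOS at all. A sketch of loading the model on Apple Silicon instead, assuming the model's remote code can fall back to standard attention when flash_attn is absent (not confirmed):

import torch
from transformers import AutoModel, AutoTokenizer

# Use the Metal (MPS) backend when available; fall back to CPU otherwise.
device = 'mps' if torch.backends.mps.is_available() else 'cpu'
model = AutoModel.from_pretrained('openbmb/MiniCPM-V', trust_remote_code=True,
                                  torch_dtype=torch.float16)
model = model.to(device)
tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-V', trust_remote_code=True)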

Benchmark results of TextVQA

We got different benchmark results when evaluating the MiniCPM-V 2.0 model provided on Hugging Face. For example, we got only 66 on TextVQA, which differs from the 74.1 reported in the technical report. We directly use the chat API in modeling_minicpmv.py like this:

output_ids, context, _ = model.chat(
    image=image_tensor,
    msgs=msgs,
    context=None,
    tokenizer=tokenizer,
    sampling=True,
    temperature=0.7
)
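
Could the gap simply come from sampling at temperature 0.7? A greedy-decoding variant for comparison (our guess at the intended evaluation setting, not a confirmed one):

# Deterministic decoding; benchmark numbers are often reported with greedy search.
output_ids, context, _ = model.chat(
    image=image_tensor,
    msgs=msgs,
    context=None,
    tokenizer=tokenizer,
    sampling=False,
)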

Would you please provide the detailed hyperparameters for the evaluation?

Chat without image input

Apologies for asking this in the issues tab.

Is this model capable of chatting without image input? I see in the benchmarks you have some models that can take both image+text and just text. I'm looking to use this model for a project, but I'm unsure if it'll work for my needs.

Training data and training method?

  1. Could the training data be open-sourced so that the results can be reproduced and benchmark gaming ruled out?
  2. Could you briefly document the training method, e.g. how the projector is trained and whether the ViT is frozen?

Inconsistent MME accuracy for the 3B model

I downloaded the 3B model and ran the official MME test set with the official default prompt. My score came out to 1378.67, which differs considerably from the reported 1452. Could the discrepancy be caused by a different evaluation prompt?

Error about missing methods

Hello. I'm getting the following error when I try to run the MiniCPM-V demo on Ubuntu 22.04:

(OmniLMM) markusc@ubuntu2:~/github/OmniLMM$ /home/markusc/anaconda3/envs/OmniLMM/bin/python /home/markusc/github/OmniLMM/infer.py
Loading checkpoint shards: 100%|████████████████████████████████████████████| 2/2 [00:01<00:00, 1.94it/s]
Traceback (most recent call last):
  File "/home/markusc/github/OmniLMM/infer.py", line 10, in <module>
    answer = chat_model.process(inputs)
AttributeError: 'OmniLMMChat' object has no attribute 'process'

Apart from importing the json module and changing the model path to openbmb/MiniCPM-V, I did not make any modifications to the original code, as follows:

from chat import OmniLMMChat, img2base64
import json

chat_model = OmniLMMChat('openbmb/MiniCPM-V')

im_64 = img2base64('./assets/worldmap_ck.jpg')

msgs = [{"role": "user", "content": "What is interesting about this image?"}]

inputs = {"image": im_64, "question": json.dumps(msgs)}
answer = chat_model.process(inputs)
print(answer)

msgs.append({"role": "assistant", "content": answer})
msgs.append({"role": "user", "content": "Where is China in the image"})

inputs = {"image": im_64, "question": json.dumps(msgs)}
answer = chat_model.process(inputs)
print(answer)
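
Could the demo method have been renamed since that example was written? Replacing the call as below is only a guess on my part, not a confirmed fix:

# Hypothetical fix: if the current chat.py exposes chat() rather than process(),
# the calls above would become:
answer = chat_model.chat(inputs)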

Thanks in advance.

Hallucination in mathematical reasoning?

The model makes a mistake in mathematical reasoning.
prompt: 1+1/sqrt(2)=?
completion: 1 + 1/sqrt(2) can be rewritten as a mathematical expression.

First, we can simplify the 2 under the radical to √2.

Then, we can treat 1 as a constant and add it to 1/√2. To turn 1/√2 into a fraction, we can view it as a fraction with a radical. Squaring the 2 under the radical gives 2. Therefore, 1/√2 can be written as 1/(√2 * √2), i.e. 1/(2 * 2).

Now we can write 1 + 1/√2 as:

1 + 1/√2 = 1 + 1/2

Next, we can convert this expression into a mixed number:

1 + 1/√2 = 2/2 + 1/2 = 3/2

So the result of 1 + 1/√2 is 3/2, or 1 1/2.
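
(For reference, the correct value is 1 + 1/√2 = 1 + √2/2 ≈ 1.707, not 3/2; the completion wrongly simplifies 1/√2 to 1/2.)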

Installed CUDA, but there is a CUDA version mismatch and the code won't run

I installed the dependencies from requirements.txt, then installed PyTorch with:

conda install pytorch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 pytorch-cuda=12.1 -c pytorch -c nvidia

but an error message showed up: RuntimeError: cutlassF: no kernel found to launch!, which suggests my CUDA version is not the correct one. I downloaded the model myself from ModelScope.
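
One thing worth trying (an assumption, not a confirmed fix): "cutlassF: no kernel found to launch" typically means PyTorch's fused scaled-dot-product-attention kernels don't support the current GPU/dtype combination, so forcing the math fallback may get past it:

import torch

# Disable the fused SDPA kernels and fall back to the plain math implementation.
torch.backends.cuda.enable_flash_sdp(False)
torch.backends.cuda.enable_mem_efficient_sdp(False)
torch.backends.cuda.enable_math_sdp(True)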

Evaluation method for Object HalBench

How is the Object HalBench score on the leaderboard computed, and does the computation differ from the one used in RLHF-V?

finetune code

I noticed that there is no finetuning code. Can the VisCPM code be used, or will the team release finetuning code later?

Request for a 4-bit quantized model for OmniLMM-12B

I have tried multi-card inference but it failed; it seems the layers are not dispatched correctly.

import torch
from accelerate import init_empty_weights, load_checkpoint_and_dispatch
from omnilmm.model.omnilmm import OmniLMMForCausalLM  # import path assumed from the repo layout

with init_empty_weights():
    model = OmniLMMForCausalLM.from_pretrained(model_name, tune_clip=True, torch_dtype=torch.bfloat16)
model = load_checkpoint_and_dispatch(
    model, model_name, dtype=torch.bfloat16,
    device_map="balanced",
    no_split_module_classes=['Eva', 'MistralDecoderLayer', 'ModuleList', 'Resampler'],
)

So it would be convenient to provide a quantized model for low-memory cards; ideally one that fits under 20 GB on a single card or 40 GB across multiple cards.

fine tune

Hello, is it possible (or will it eventually be possible) to fine-tune with PEFT LoRA and quantization?
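
In case it is useful, a rough sketch of what PEFT LoRA on the language-model side might look like (the target module names are guesses; the real names depend on the MiniCPM-V architecture):

from peft import LoraConfig, get_peft_model

# model: the loaded MiniCPM-V model (see the loading snippets elsewhere in this thread).
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # hypothetical attention projection names
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()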

Object detection capability

Does the model support detection, i.e. returning bounding boxes? Many multimodal models already support this with strong performance, but in my tests MiniCPM-V does not seem to.
Yet MiniCPM-V's perception of spatial relationships is quite strong, so what led to leaving this capability out of training?
If detection pretraining were added, so that finetuning could pair bbox inputs with the corresponding text, would that make it easier for the model to acquire new knowledge?

Error when using MiniCPM-V on a Redmi K70 (16 GB)

MLCChat failed

Stack trace:
org.apache.tvm.Base$TVMError: TVMError: OpenCL build error for device=0x78b138eaa8
Error: CL_OUT_OF_HOST_MEMORY
Pass
Stack trace:
File "/Users/acha/Desktop/mlc-MiniCPM/3rdparty/tvm/src/runtime/library_module.cc", line 78

at org.apache.tvm.Base.checkCall(Base.java:173)
at org.apache.tvm.Function.invoke(Function.java:130)
at com.modelbest.mlcllm.ChatModule.image(ChatModule.java:64)
at com.modelbest.minicpm.AppViewModel$ChatState$requestImage$f$1$1.invoke(AppViewModel.kt:637)
at com.modelbest.minicpm.AppViewModel$ChatState$requestImage$f$1$1.invoke(AppViewModel.kt:637)
at com.modelbest.minicpm.AppViewModel$ChatState.callBackend(AppViewModel.kt:516)
at com.modelbest.minicpm.AppViewModel$ChatState.requestImage$lambda$4(AppViewModel.kt:637)
at com.modelbest.minicpm.AppViewModel$ChatState.$r8$lambda$-81FfOHgL43Kw6mcY6rLiYQKEuE(Unknown Source:0)
at com.modelbest.minicpm.AppViewModel$ChatState$$ExternalSyntheticLambda2.run(Unknown Source:12)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:487)
at java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:644)
at java.lang.Thread.run(Thread.java:1012)

Error message:
TVMError: OpenCL build error for device=0x78b138eaa8
Error: CL_OUT_OF_HOST_MEMORY
Pass
Stack trace:
File "/Users/acha/Desktop/mlc-MiniCPM/3rdparty/tvm/src/runtime/library_module.cc", line 78

ModuleNotFoundError: No module named 'transformers_modules.MiniCPM-V-2'

(OmniLMM) [zhuguibo@GPU1 OmniLMM]$ python example.py
Traceback (most recent call last):
  File "/mnt/data/dataProcess/LLM_Models/LLM_code/minicpm/OmniLMM/example.py", line 52, in <module>
    option1()
  File "/mnt/data/dataProcess/LLM_Models/LLM_code/minicpm/OmniLMM/example.py", line 5, in option1
    chat_model = OmniLMMChat('/mnt/data/dataProcess/LLM_Models/MiniCPM-V-2.0')
  File "/mnt/data/dataProcess/LLM_Models/LLM_code/minicpm/OmniLMM/chat.py", line 169, in __init__
    self.model = OmniLMM3B(model_path)
  File "/mnt/data/dataProcess/LLM_Models/LLM_code/minicpm/OmniLMM/chat.py", line 141, in __init__
    self.model = AutoModel.from_pretrained(model_path, trust_remote_code=True).to(dtype=torch.bfloat16)
  File "/mnt/data/app/anaconda3/envs/OmniLMM/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 526, in from_pretrained
    config, kwargs = AutoConfig.from_pretrained(
  File "/mnt/data/app/anaconda3/envs/OmniLMM/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 1091, in from_pretrained
    config_class = get_class_from_dynamic_module(
  File "/mnt/data/app/anaconda3/envs/OmniLMM/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 500, in get_class_from_dynamic_module
    return get_class_in_module(class_name, final_module.replace(".py", ""))
  File "/mnt/data/app/anaconda3/envs/OmniLMM/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 200, in get_class_in_module
    module = importlib.import_module(module_path)
  File "/mnt/data/app/anaconda3/envs/OmniLMM/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 992, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 992, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1004, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'transformers_modules.MiniCPM-V-2'
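
A guess at the cause: transformers builds the dynamic-module path (transformers_modules.MiniCPM-V-2) from the checkpoint folder name, so the ".0" in "MiniCPM-V-2.0" gets treated as a module separator. Renaming the local directory to a dot-free name may resolve it:

# Hypothetical fix: use a checkpoint directory name without a dot, e.g.
#   mv /mnt/data/dataProcess/LLM_Models/MiniCPM-V-2.0 /mnt/data/dataProcess/LLM_Models/MiniCPM-V-2
chat_model = OmniLMMChat('/mnt/data/dataProcess/LLM_Models/MiniCPM-V-2')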
