
minicpm-v's Issues

Training Data Composition for MiniCPM-V 2

Thanks for the great work!

I am wondering where we can find the training data composition for MiniCPM-V 2. Specifically, is any distillation from other LMMs used?

Which base model is used?

I noticed that the MiniCPM LLM comes in several versions, e.g. fp32, bf16, dpo, and sft.
Which of these LLMs does the multimodal model use?

Question about details on the training procedure and data

Hi, and thanks for this fantastic work!

I am using your models for a scientific project, which requires details about the training procedure and training data. I haven't found a paper or a technical report specifically for this work. Did I miss something?

Are the training procedure and data the same as in https://arxiv.org/pdf/2308.12038.pdf?

Can you please share some details or a technical report?

Thanks :-)

MiniCPM-V Quantization

Hi, is it possible to convert OmniLMM-3B (MiniCPM-V) to a quantized version (e.g. GPTQ)?
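
GPTQ would need a separate calibration toolchain, but as a rough sketch, on-the-fly 4-bit loading via bitsandbytes might look like the following (an assumption on my part; whether the model's custom remote code is compatible with bitsandbytes is untested):

import torch
from transformers import AutoModel, BitsAndBytesConfig

# 4-bit NF4 quantization at load time (hypothetical; compatibility with
# MiniCPM-V's remote code is not confirmed).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModel.from_pretrained(
    'openbmb/MiniCPM-V',
    trust_remote_code=True,
    quantization_config=bnb_config,
)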

OCR use case: the model does not return all of the text

First of all, many thanks to your team for the excellent contribution. I plan to use the model for OCR, reading the content of an image and returning the complete text.

However, after trying several prompts I still cannot get it to output the complete text. Could you advise on a fix or on the correct way to use the model?

The test OCR image is the first page of <MapReduce: Simplified Data Processing on Large Clusters>.


Memory leak when looping inference

Hi, I run out of memory after a number of calls; it looks like a memory leak. Here is the script:

import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained('openbmb/MiniCPM-V', trust_remote_code=True, cache_dir="./cache", torch_dtype=torch.bfloat16)
model = model.to(device='cuda', dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-V', trust_remote_code=True, cache_dir="./cache")

prompt = 'Describe this image.'  # placeholder prompt
files = ['a.jpg', 'b.jpg', 'c.jpg', ...]  # long list of image paths
for image_path in files:
    msgs = [{'role': 'user', 'content': prompt}]
    image = Image.open(image_path).convert('RGB')
    res, context, _ = model.chat(
        image=image,
        msgs=msgs,
        context=None,
        tokenizer=tokenizer,
        sampling=False,
        temperature=0.0
    )
    print(res)
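
A possible mitigation, assuming the leak comes from autograd bookkeeping and lingering Python references rather than from the model itself (a sketch, not a confirmed fix):

import gc

# Run inference without gradient tracking and release per-image objects so
# cached CUDA memory can be reclaimed between iterations.
with torch.inference_mode():
    for image_path in files:
        image = Image.open(image_path).convert('RGB')
        msgs = [{'role': 'user', 'content': prompt}]
        res, context, _ = model.chat(image=image, msgs=msgs, context=None,
                                     tokenizer=tokenizer, sampling=False,
                                     temperature=0.0)
        print(res)
        del image, res, context
        gc.collect()
        torch.cuda.empty_cache()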

Is Mac supported?

When deploying on a Mac, I get an error saying CUDA is missing:

(OmniLMM) crz@crzdeMacBook-Air OmniLMM % pip install flash_attn

Collecting flash_attn
Downloading flash_attn-2.5.2.tar.gz (2.5 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.5/2.5 MB 882.4 kB/s eta 0:00:00
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [20 lines of output]
fatal: not a git repository (or any of the parent directories): .git
/private/var/folders/fn/bz39wtv175s83wx3cm11lptm0000gn/T/pip-install-k91r8ka8/flash-attn_fccd9cf359024ff8acc3369bdc82e45f/setup.py:78: UserWarning: flash_attn was requested, but nvcc was not found. Are you sure your environment has nvcc available? If you're installing within a container from https://hub.docker.com/r/pytorch/pytorch, only images whose names contain 'devel' will provide nvcc.
warnings.warn(
Traceback (most recent call last):
  File "<string>", line 2, in <module>
  File "<pip-setuptools-caller>", line 34, in <module>
  File "/private/var/folders/fn/bz39wtv175s83wx3cm11lptm0000gn/T/pip-install-k91r8ka8/flash-attn_fccd9cf359024ff8acc3369bdc82e45f/setup.py", line 133, in <module>
    CUDAExtension(
  File "/Users/crz/anaconda3/envs/OmniLMM/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1048, in CUDAExtension
    library_dirs += library_paths(cuda=True)
  File "/Users/crz/anaconda3/envs/OmniLMM/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1179, in library_paths
    if (not os.path.exists(_join_cuda_home(lib_dir)) and
  File "/Users/crz/anaconda3/envs/OmniLMM/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2223, in _join_cuda_home
    raise EnvironmentError('CUDA_HOME environment variable is not set. '
OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.

  torch.__version__  = 2.0.1
  
  
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
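
For what it's worth, flash_attn requires nvcc/CUDA and cannot be built on macOS at all. A sketch of loading the model on Apple Silicon instead, assuming the model's remote code can fall back to standard attention when flash_attn is absent (not confirmed):

import torch
from transformers import AutoModel, AutoTokenizer

# Use the Metal (MPS) backend when available; fall back to CPU otherwise.
device = 'mps' if torch.backends.mps.is_available() else 'cpu'
model = AutoModel.from_pretrained('openbmb/MiniCPM-V', trust_remote_code=True,
                                  torch_dtype=torch.float16)
model = model.to(device)
tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-V', trust_remote_code=True)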

Benchmark results of TextVQA

We got different benchmark results when evaluating the MiniCPM-V 2.0 model provided on Hugging Face. For example, we got only 66 on TextVQA, which differs from the 74.1 reported in the technical report. We directly use the chat API in modeling_minicpmv.py like this:

output_ids, context, _ = model.chat(
    image=image_tensor,
    msgs=msgs,
    context=None,
    tokenizer=tokenizer,
    sampling=True,
    temperature=0.7
)
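
Could the gap simply come from sampling at temperature 0.7? A greedy-decoding variant for comparison (our guess at the intended evaluation setting, not a confirmed one):

# Deterministic decoding; benchmark numbers are often reported with greedy search.
output_ids, context, _ = model.chat(
    image=image_tensor,
    msgs=msgs,
    context=None,
    tokenizer=tokenizer,
    sampling=False,
)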

Would you please provide the detailed hyperparameters for the evaluation?

Chat without image input

Apologies for asking this in the issues tab.

Is this model capable of chatting without image input? I see in the benchmarks you have some models that can take both image+text and just text. I'm looking to use this model for a project, but I'm unsure if it'll work for my needs.

Training data and training method?

  1. Could the training data be open-sourced so that the results can be reproduced and benchmark gaming ruled out?
  2. Could you briefly document the training method, e.g. how the projector is trained and whether the ViT is frozen?

Inconsistent MME accuracy for the 3B model

I downloaded the 3B model and ran the official MME test set with the official default prompt. My score came out to 1378.67, which differs considerably from the reported 1452. Could the discrepancy be caused by a different evaluation prompt?

Error about missing methods

Hello. I'm getting the following error when I try to run the MiniCPM-V demo on Ubuntu 22.04:

(OmniLMM) markusc@ubuntu2:~/github/OmniLMM$ /home/markusc/anaconda3/envs/OmniLMM/bin/python /home/markusc/github/OmniLMM/infer.py
Loading checkpoint shards: 100%|████████████████████████████████████████████| 2/2 [00:01<00:00, 1.94it/s]
Traceback (most recent call last):
  File "/home/markusc/github/OmniLMM/infer.py", line 10, in <module>
    answer = chat_model.process(inputs)
AttributeError: 'OmniLMMChat' object has no attribute 'process'

Apart from importing the json module and changing the model path to openbmb/MiniCPM-V, I did not make any modifications to the original code, as follows:

from chat import OmniLMMChat, img2base64
import json

chat_model = OmniLMMChat('openbmb/MiniCPM-V')

im_64 = img2base64('./assets/worldmap_ck.jpg')

msgs = [{"role": "user", "content": "What is interesting about this image?"}]

inputs = {"image": im_64, "question": json.dumps(msgs)}
answer = chat_model.process(inputs)
print(answer)

msgs.append({"role": "assistant", "content": answer})
msgs.append({"role": "user", "content": "Where is China in the image"})

inputs = {"image": im_64, "question": json.dumps(msgs)}
answer = chat_model.process(inputs)
print(answer)
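
Could the demo method have been renamed since that example was written? Replacing the call as below is only a guess on my part, not a confirmed fix:

# Hypothetical fix: if the current chat.py exposes chat() rather than process(),
# the calls above would become:
answer = chat_model.chat(inputs)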

Thanks in advance.

Hallucination in mathematical reasoning?

The model makes a mistake in mathematical reasoning.
prompt: 1+1/sqrt(2)=?
completion: 1 + 1/sqrt(2) can be rewritten as a mathematical expression.

First, we can simplify the 2 under the radical to √2.

Then, we can treat 1 as a constant and add it to 1/√2. To turn 1/√2 into a fraction, we can view it as a fraction with a radical. Squaring the 2 under the radical gives 2. Therefore, 1/√2 can be written as 1/(√2 * √2), i.e. 1/(2 * 2).

Now we can write 1 + 1/√2 as:

1 + 1/√2 = 1 + 1/2

Next, we can convert this expression into a mixed number:

1 + 1/√2 = 2/2 + 1/2 = 3/2

So the result of 1 + 1/√2 is 3/2, or 1 1/2.
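
(For reference, the correct value is 1 + 1/√2 = 1 + √2/2 ≈ 1.707, not 3/2; the completion wrongly simplifies 1/√2 to 1/2.)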

Installed CUDA, but there is a CUDA version mismatch and the code won't run

I installed the dependencies from requirements.txt, then installed PyTorch with:

conda install pytorch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 pytorch-cuda=12.1 -c pytorch -c nvidia

but an error message showed up: RuntimeError: cutlassF: no kernel found to launch!, which suggests my CUDA version is not the correct one. I downloaded the model myself from ModelScope.
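
One thing worth trying (an assumption, not a confirmed fix): "cutlassF: no kernel found to launch" typically means PyTorch's fused scaled-dot-product-attention kernels don't support the current GPU/dtype combination, so forcing the math fallback may get past it:

import torch

# Disable the fused SDPA kernels and fall back to the plain math implementation.
torch.backends.cuda.enable_flash_sdp(False)
torch.backends.cuda.enable_mem_efficient_sdp(False)
torch.backends.cuda.enable_math_sdp(True)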

Evaluation method for Object HalBench

How is the Object HalBench score on the leaderboard computed, and does the computation differ from the one used in RLHF-V?

finetune code

I noticed that there is no finetuning code. Can the VisCPM code be used, or will the team release finetuning code later?

Request for a 4-bit quantized model for OmniLMM-12B

I have tried multi-card inference but it failed; it seems the layers are not dispatched correctly.

import torch
from accelerate import init_empty_weights, load_checkpoint_and_dispatch
from omnilmm.model.omnilmm import OmniLMMForCausalLM  # import path assumed from the repo layout

with init_empty_weights():
    model = OmniLMMForCausalLM.from_pretrained(model_name, tune_clip=True, torch_dtype=torch.bfloat16)
model = load_checkpoint_and_dispatch(
    model, model_name, dtype=torch.bfloat16,
    device_map="balanced",
    no_split_module_classes=['Eva', 'MistralDecoderLayer', 'ModuleList', 'Resampler'],
)

So it would be convenient to provide a quantized model for low-memory cards; ideally one that fits under 20 GB on a single card or 40 GB across multiple cards.

fine tune

Hello, is it possible (or will it eventually be possible) to fine-tune with PEFT LoRA and quantization?
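
In case it is useful, a rough sketch of what PEFT LoRA on the language-model side might look like (the target module names are guesses; the real names depend on the MiniCPM-V architecture):

from peft import LoraConfig, get_peft_model

# model: the loaded MiniCPM-V model (see the loading snippets elsewhere in this thread).
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # hypothetical attention projection names
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()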

Object detection capability

Does the model support detection, i.e. returning bounding boxes? Many multimodal models already support this with strong performance, but in my tests MiniCPM-V does not seem to.
Yet MiniCPM-V's perception of spatial relationships is quite strong, so what led to leaving this capability out of training?
If detection pretraining were added, so that finetuning could pair bbox inputs with the corresponding text, would that make it easier for the model to acquire new knowledge?

Error when using MiniCPM-V on a Redmi K70 (16 GB)

MLCChat failed

Stack trace:
org.apache.tvm.Base$TVMError: TVMError: OpenCL build error for device=0x78b138eaa8
Error: CL_OUT_OF_HOST_MEMORY
Pass
Stack trace:
File "/Users/acha/Desktop/mlc-MiniCPM/3rdparty/tvm/src/runtime/library_module.cc", line 78

at org.apache.tvm.Base.checkCall(Base.java:173)
at org.apache.tvm.Function.invoke(Function.java:130)
at com.modelbest.mlcllm.ChatModule.image(ChatModule.java:64)
at com.modelbest.minicpm.AppViewModel$ChatState$requestImage$f$1$1.invoke(AppViewModel.kt:637)
at com.modelbest.minicpm.AppViewModel$ChatState$requestImage$f$1$1.invoke(AppViewModel.kt:637)
at com.modelbest.minicpm.AppViewModel$ChatState.callBackend(AppViewModel.kt:516)
at com.modelbest.minicpm.AppViewModel$ChatState.requestImage$lambda$4(AppViewModel.kt:637)
at com.modelbest.minicpm.AppViewModel$ChatState.$r8$lambda$-81FfOHgL43Kw6mcY6rLiYQKEuE(Unknown Source:0)
at com.modelbest.minicpm.AppViewModel$ChatState$$ExternalSyntheticLambda2.run(Unknown Source:12)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:487)
at java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:644)
at java.lang.Thread.run(Thread.java:1012)

Error message:
TVMError: OpenCL build error for device=0x78b138eaa8
Error: CL_OUT_OF_HOST_MEMORY
Pass
Stack trace:
File "/Users/acha/Desktop/mlc-MiniCPM/3rdparty/tvm/src/runtime/library_module.cc", line 78

ModuleNotFoundError: No module named 'transformers_modules.MiniCPM-V-2'

(OmniLMM) [zhuguibo@GPU1 OmniLMM]$ python example.py
Traceback (most recent call last):
  File "/mnt/data/dataProcess/LLM_Models/LLM_code/minicpm/OmniLMM/example.py", line 52, in <module>
    option1()
  File "/mnt/data/dataProcess/LLM_Models/LLM_code/minicpm/OmniLMM/example.py", line 5, in option1
    chat_model = OmniLMMChat('/mnt/data/dataProcess/LLM_Models/MiniCPM-V-2.0')
  File "/mnt/data/dataProcess/LLM_Models/LLM_code/minicpm/OmniLMM/chat.py", line 169, in __init__
    self.model = OmniLMM3B(model_path)
  File "/mnt/data/dataProcess/LLM_Models/LLM_code/minicpm/OmniLMM/chat.py", line 141, in __init__
    self.model = AutoModel.from_pretrained(model_path, trust_remote_code=True).to(dtype=torch.bfloat16)
  File "/mnt/data/app/anaconda3/envs/OmniLMM/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 526, in from_pretrained
    config, kwargs = AutoConfig.from_pretrained(
  File "/mnt/data/app/anaconda3/envs/OmniLMM/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 1091, in from_pretrained
    config_class = get_class_from_dynamic_module(
  File "/mnt/data/app/anaconda3/envs/OmniLMM/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 500, in get_class_from_dynamic_module
    return get_class_in_module(class_name, final_module.replace(".py", ""))
  File "/mnt/data/app/anaconda3/envs/OmniLMM/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 200, in get_class_in_module
    module = importlib.import_module(module_path)
  File "/mnt/data/app/anaconda3/envs/OmniLMM/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 992, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 992, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1004, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'transformers_modules.MiniCPM-V-2'
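
A guess at the cause: transformers builds the dynamic-module path (transformers_modules.MiniCPM-V-2) from the checkpoint folder name, so the ".0" in "MiniCPM-V-2.0" gets treated as a module separator. Renaming the local directory to a dot-free name may resolve it:

# Hypothetical fix: use a checkpoint directory name without a dot, e.g.
#   mv /mnt/data/dataProcess/LLM_Models/MiniCPM-V-2.0 /mnt/data/dataProcess/LLM_Models/MiniCPM-V-2
chat_model = OmniLMMChat('/mnt/data/dataProcess/LLM_Models/MiniCPM-V-2')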
