internlm / internlm-xcomposer

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

License: Apache License 2.0

Python 91.89% Shell 3.32% Jupyter Notebook 4.79%
chatgpt foundation gpt gpt-4 instruction-tuning language-model large-language-model large-vision-language-model llm mllm multi-modality multimodal supervised-finetuning vision-language-model vision-transformer visual-language-learning

internlm-xcomposer's Introduction

InternLM

👋 join us on Discord and WeChat

Introduction

The InternLM2.5 series is released with the following features:

  • Outstanding reasoning capability: State-of-the-art performance on math reasoning, surpassing models like Llama3 and Gemma2-9B.

  • 1M context window: Nearly perfect at finding needles in a 1M-token haystack, with leading performance on long-context tasks such as LongBench. Try it with LMDeploy for 1M-context inference. More details and a file chat demo can be found here.

  • Stronger tool use: InternLM2.5 supports gathering information from more than 100 web pages; the corresponding implementation will be released in Lagent soon. InternLM2.5 also has better tool-use capabilities in instruction following, tool selection, and reflection. See examples.

News

[2024.08.01] We release InternLM2.5-1.8B, InternLM2.5-1.8B-Chat, InternLM2.5-20B and InternLM2.5-20B-Chat. See model zoo below for download or model cards for more details.

[2024.07.19] We release the InternLM2-Reward series of reward models in 1.8B, 7B and 20B sizes. See model zoo below for download or model cards for more details.

[2024.07.03] We release InternLM2.5-7B, InternLM2.5-7B-Chat and InternLM2.5-7B-Chat-1M. See model zoo below for download or model cards for more details.

[2024.03.26] We release InternLM2 technical report. See arXiv for details.

[2024.01.31] We release InternLM2-1.8B, along with the associated chat model. They provide a cheaper deployment option while maintaining leading performance.

[2024.01.23] We release InternLM2-Math-7B and InternLM2-Math-20B with pretraining and SFT checkpoints. They surpass ChatGPT despite their small sizes. See InternLM-Math for details and download.

[2024.01.17] We release InternLM2-7B and InternLM2-20B and their corresponding chat models with stronger capabilities in all dimensions. See model zoo below for download or model cards for more details.

[2023.12.13] InternLM-7B-Chat and InternLM-20B-Chat checkpoints are updated. With an improved finetuning strategy, the new chat models can generate higher quality responses with greater stylistic diversity.

[2023.09.20] InternLM-20B is released with base and chat versions.

Model Zoo

InternLM2.5

Model Transformers(HF) ModelScope(HF) OpenXLab(HF) OpenXLab(Origin) Release Date
InternLM2.5-1.8B 🤗internlm2_5-1_8b internlm2_5-1_8b Open in OpenXLab Open in OpenXLab 2024-08-05
InternLM2.5-1.8B-Chat 🤗internlm2_5-1_8b-chat internlm2_5-1_8b-chat Open in OpenXLab Open in OpenXLab 2024-08-05
InternLM2.5-7B 🤗internlm2_5-7b internlm2_5-7b Open in OpenXLab Open in OpenXLab 2024-07-03
InternLM2.5-7B-Chat 🤗internlm2_5-7b-chat internlm2_5-7b-chat Open in OpenXLab Open in OpenXLab 2024-07-03
InternLM2.5-7B-Chat-1M 🤗internlm2_5-7b-chat-1m internlm2_5-7b-chat-1m Open in OpenXLab Open in OpenXLab 2024-07-03
InternLM2.5-20B 🤗internlm2_5-20b internlm2_5-20b Open in OpenXLab Open in OpenXLab 2024-08-05
InternLM2.5-20B-Chat 🤗internlm2_5-20b-chat internlm2_5-20b-chat Open in OpenXLab Open in OpenXLab 2024-08-05

Notes:

The InternLM2.5 release includes 1.8B, 7B, and 20B versions. The 7B models are efficient for research and applications, while the 20B models are more powerful and support more complex scenarios. The relationships among these models are as follows.

  1. InternLM2.5: Foundation models pre-trained on a large-scale corpus. InternLM2.5 models are recommended for most applications.
  2. InternLM2.5-Chat: Chat models that undergo supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) on top of the InternLM2.5 base models. InternLM2.5-Chat is optimized for instruction following, chat experience, and function calling, and is recommended for downstream applications.
  3. InternLM2.5-Chat-1M: Supports a 1M-token context with performance comparable to InternLM2.5-Chat.

Limitations: Although we have made efforts to ensure the safety of the model during the training process and to encourage the model to generate text that complies with ethical and legal requirements, the model may still produce unexpected outputs due to its size and probabilistic generation paradigm. For example, the generated responses may contain biases, discrimination, or other harmful content. Please do not propagate such content. We are not responsible for any consequences resulting from the dissemination of harmful information.

Supplements: HF refers to the format used by HuggingFace in transformers, whereas Origin denotes the format adopted by the InternLM team in InternEvo.

InternLM2-Reward

InternLM2-Reward is a series of reward models, trained on 2.4 million preference samples and available in 1.8B, 7B, and 20B sizes. These models were used in the PPO training of our chat models. See model cards for more details, and the usage sketch after the table below.

Model RewardBench Score Transformers(HF) ModelScope(HF) OpenXLab(HF) Release Date
InternLM2-1.8B-Reward 80.6 🤗internlm2-1_8b-reward internlm2-1_8b-reward Open in OpenXLab 2024-07-19
InternLM2-7B-Reward 86.6 🤗internlm2-7b-reward internlm2-7b-reward Open in OpenXLab 2024-07-19
InternLM2-20B-Reward 89.5 🤗internlm2-20b-reward internlm2-20b-reward Open in OpenXLab 2024-07-19
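For reference, the sketch below scores a short conversation with one of these reward models. This is a hedged sketch: the get_score helper and its signature are taken from the InternLM2-Reward model cards, so verify the exact method name there before relying on it.

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("internlm/internlm2-7b-reward", trust_remote_code=True)
model = AutoModel.from_pretrained(
    "internlm/internlm2-7b-reward",
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)

chat = [
    {"role": "user", "content": "Give me three tips for time management."},
    {"role": "assistant", "content": "1. Plan your day. 2. Batch similar tasks. 3. Take regular breaks."},
]

# `get_score` is the helper described in the model card (assumption; check the card for the exact API).
score = model.get_score(tokenizer, chat)
print(score)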

InternLM2

(click to expand)

Our previous generation models with advanced capabilities in long-context processing, reasoning, and coding. See model cards for more details.

Model Transformers(HF) ModelScope(HF) OpenXLab(HF) OpenXLab(Origin) Release Date
InternLM2-1.8B 🤗internlm2-1.8b internlm2-1.8b Open in OpenXLab Open in OpenXLab 2024-01-31
InternLM2-Chat-1.8B-SFT 🤗internlm2-chat-1.8b-sft internlm2-chat-1.8b-sft Open in OpenXLab Open in OpenXLab 2024-01-31
InternLM2-Chat-1.8B 🤗internlm2-chat-1.8b internlm2-chat-1.8b Open in OpenXLab Open in OpenXLab 2024-02-19
InternLM2-Base-7B 🤗internlm2-base-7b internlm2-base-7b Open in OpenXLab Open in OpenXLab 2024-01-17
InternLM2-7B 🤗internlm2-7b internlm2-7b Open in OpenXLab Open in OpenXLab 2024-01-17
InternLM2-Chat-7B-SFT 🤗internlm2-chat-7b-sft internlm2-chat-7b-sft Open in OpenXLab Open in OpenXLab 2024-01-17
InternLM2-Chat-7B 🤗internlm2-chat-7b internlm2-chat-7b Open in OpenXLab Open in OpenXLab 2024-01-17
InternLM2-Base-20B 🤗internlm2-base-20b internlm2-base-20b Open in OpenXLab Open in OpenXLab 2024-01-17
InternLM2-20B 🤗internlm2-20b internlm2-20b Open in OpenXLab Open in OpenXLab 2024-01-17
InternLM2-Chat-20B-SFT 🤗internlm2-chat-20b-sft internlm2-chat-20b-sft Open in OpenXLab Open in OpenXLab 2024-01-17
InternLM2-Chat-20B 🤗internlm2-chat-20b internlm2-chat-20b Open in OpenXLab Open in OpenXLab 2024-01-17

Performance

We have evaluated InternLM2.5 on several important benchmarks using the open-source evaluation tool OpenCompass. Some of the results are shown in the tables below. You are welcome to visit the OpenCompass Leaderboard for more evaluation results.

Base Model

Benchmark InternLM2.5-7B Llama3-8B Yi-1.5-9B
MMLU (5-shot) 71.6 66.4 71.6
CMMLU (5-shot) 79.1 51.0 74.1
BBH (3-shot) 70.1 59.7 71.1
MATH (4-shot) 34.0 16.4 31.9
GSM8K (4-shot) 74.8 54.3 74.5
GPQA (0-shot) 31.3 31.3 27.8

Chat Model

Benchmark InternLM2.5-7B-Chat Llama3-8B-Instruct Gemma2-9B-IT Yi-1.5-9B-Chat GLM-4-9B-Chat Qwen2-7B-Instruct
MMLU (5-shot) 72.8 68.4 70.9 71.0 71.4 70.8
CMMLU (5-shot) 78.0 53.3 60.3 74.5 74.5 80.9
BBH (3-shot CoT) 71.6 54.4 68.2* 69.6 69.6 65.0
MATH (0-shot CoT) 60.1 27.9 46.9 51.1 51.1 48.6
GSM8K (0-shot CoT) 86.0 72.9 88.9 80.1 85.3 82.9
GPQA (0-shot) 38.4 26.1 33.8 37.9 36.9 38.4
  • We use ppl for the MCQ evaluation of base models.
  • The evaluation results were obtained with OpenCompass, and the evaluation configuration can be found in the configuration files provided by OpenCompass.
  • Scores may differ slightly across OpenCompass versions, so please refer to the latest OpenCompass results.
  • * means the result is copied from the original paper.

Requirements

  • Python >= 3.8
  • PyTorch >= 1.12.0 (2.0.0 and above are recommended)
  • Transformers >= 4.38
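A quick sanity check that your environment meets these minimums (a minimal sketch; it only prints the installed versions):

import torch
import transformers

# Print installed versions to compare against the minimums listed above.
print("torch:", torch.__version__)                # >= 1.12.0 required, 2.x recommended
print("transformers:", transformers.__version__)  # >= 4.38 required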

Usages

InternLM supports a diverse range of well-known upstream and downstream projects, such as LLaMA-Factory, vLLM, llama.cpp, and more. This support enables a broad spectrum of users to utilize the InternLM series models more efficiently and conveniently. Tutorials for selected ecosystem projects are available here for your convenience.

In the following sections, we focus on usage with Transformers, ModelScope, and the web demos. The chat models adopt the ChatML format to support both chat and agent applications. For the best results, make sure the installed transformers library meets the following requirement before running inference with Transformers or ModelScope:

transformers >= 4.38
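Because the chat models use a ChatML-style template, you can inspect the exact prompt string the tokenizer builds via apply_chat_template. A minimal sketch, assuming the tokenizer config ships a chat template; the rendered special tokens (e.g. <|im_start|>/<|im_end|> markers) are defined by that config, so treat the printed output as illustrative:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("internlm/internlm2_5-7b-chat", trust_remote_code=True)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "hello"},
]

# Render the chat template without tokenizing to see the ChatML-style prompt
# that will be fed to the model.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)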

Import from Transformers

To load the InternLM2.5-7B-Chat model using Transformers, use the following code:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("internlm/internlm2_5-7b-chat", trust_remote_code=True)
# Set `torch_dtype=torch.float16` to load model in float16, otherwise it will be loaded as float32 and might cause OOM Error.
model = AutoModelForCausalLM.from_pretrained("internlm/internlm2_5-7b-chat", device_map="auto", trust_remote_code=True, torch_dtype=torch.float16)
# (Optional) If on low resource devices, you can load model in 4-bit or 8-bit to further save GPU memory via bitsandbytes.
  # InternLM 7B in 4bit will cost nearly 8GB GPU memory.
  # pip install -U bitsandbytes
  # 8-bit: model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto", trust_remote_code=True, load_in_8bit=True)
  # 4-bit: model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto", trust_remote_code=True, load_in_4bit=True)
model = model.eval()
response, history = model.chat(tokenizer, "hello", history=[])
print(response)
# Output: Hello? How can I help you today?
response, history = model.chat(tokenizer, "please provide three suggestions about time management", history=history)
print(response)
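The remote modeling code also exposes a streaming interface. A hedged sketch, assuming the stream_chat helper described in the model card, which yields (partial_response, history) tuples while the reply is being generated:

# Streaming variant (hedged sketch): `stream_chat` yields partial responses as
# generation proceeds; `model` and `tokenizer` are loaded as shown above.
printed = 0
for response, history in model.stream_chat(tokenizer, "hello", history=[]):
    print(response[printed:], end="", flush=True)
    printed = len(response)
print()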

Import from ModelScope

To load the InternLM2.5-7B-Chat model using ModelScope, use the following code:

import torch
from modelscope import snapshot_download, AutoTokenizer, AutoModelForCausalLM
model_dir = snapshot_download('Shanghai_AI_Laboratory/internlm2_5-7b-chat')
tokenizer = AutoTokenizer.from_pretrained(model_dir, device_map="auto", trust_remote_code=True)
# Set `torch_dtype=torch.float16` to load model in float16, otherwise it will be loaded as float32 and might cause OOM Error.
model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto", trust_remote_code=True, torch_dtype=torch.float16)
# (Optional) If on low resource devices, you can load model in 4-bit or 8-bit to further save GPU memory via bitsandbytes.
  # InternLM 7B in 4bit will cost nearly 8GB GPU memory.
  # pip install -U bitsandbytes
  # 8-bit: model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto", trust_remote_code=True, load_in_8bit=True)
  # 4-bit: model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto", trust_remote_code=True, load_in_4bit=True)
model = model.eval()
response, history = model.chat(tokenizer, "hello", history=[])
print(response)
response, history = model.chat(tokenizer, "please provide three suggestions about time management", history=history)
print(response)

Dialogue

You can interact with the InternLM Chat 7B model through a frontend interface by running the following code:

pip install streamlit
pip install "transformers>=4.38"
streamlit run ./chat/web_demo.py

Deployment by LMDeploy

We use LMDeploy for fast deployment of InternLM.

Inference

With only 4 lines of code, you can run internlm2_5-7b-chat inference after pip install lmdeploy.

from lmdeploy import pipeline
pipe = pipeline("internlm/internlm2_5-7b-chat")
response = pipe(["Hi, pls intro yourself", "Shanghai is"])
print(response)

To reduce the memory footprint, we offer the 4-bit quantized model internlm2_5-7b-chat-4bit, with which inference can be conducted as follows:

from lmdeploy import pipeline
pipe = pipeline("internlm/internlm2_5-7b-chat-4bit")
response = pipe(["Hi, pls intro yourself", "Shanghai is"])
print(response)

Moreover, you can independently enable the 8-bit/4-bit KV cache feature:

from lmdeploy import pipeline, TurbomindEngineConfig
pipe = pipeline("internlm/internlm2_5-7b-chat-4bit",
                backend_config=TurbomindEngineConfig(quant_policy=8))
response = pipe(["Hi, pls intro yourself", "Shanghai is"])
print(response)

Please refer to the guidance for more details about model deployment. For additional deployment tutorials, feel free to explore here.
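Sampling parameters can be tuned through GenerationConfig. A minimal sketch; the parameter names below follow the LMDeploy documentation, so double-check them against your installed version:

from lmdeploy import pipeline, GenerationConfig

pipe = pipeline("internlm/internlm2_5-7b-chat")
# Typical sampling knobs; adjust as needed.
gen_config = GenerationConfig(top_p=0.8, top_k=40, temperature=0.8, max_new_tokens=512)
response = pipe(["Hi, pls intro yourself"], gen_config=gen_config)
print(response)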

1M-long-context Inference

By enabling LMDeploy's Dynamic NTK feature, you can unlock long-context inference.

Note: 1M context length requires 4xA100-80G.

from lmdeploy import pipeline, GenerationConfig, TurbomindEngineConfig

backend_config = TurbomindEngineConfig(
        rope_scaling_factor=2.5,
        session_len=1048576,  # 1M context length
        max_batch_size=1,
        cache_max_entry_count=0.7,
        tp=4)  # 4xA100-80G.
pipe = pipeline('internlm/internlm2_5-7b-chat-1m', backend_config=backend_config)
prompt = 'Use a long prompt to replace this sentence'
response = pipe(prompt)
print(response)

Agent

InternLM2.5-Chat models have excellent tool-utilization capabilities and can handle function calls in a zero-shot manner. They also support analysis that gathers information from more than 100 web pages. See more examples in the agent section.
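The canonical function-calling protocol and ready-made examples are in the agent section (and in Lagent). Purely to illustrate the zero-shot idea, the hypothetical sketch below embeds a made-up tool description in the prompt and asks the chat model to emit a call; the schema and prompt wording here are invented for illustration and are not the official agent format:

import json

# Hypothetical tool schema, invented for illustration only; see the agent section
# of this repo (or Lagent) for the official function-calling format.
weather_tool = {
    "name": "get_weather",
    "description": "Query the current weather for a city.",
    "parameters": {"city": "string"},
}

prompt = (
    "You can call the following tool by replying with a JSON object of the form "
    '{"name": ..., "arguments": {...}}.\n'
    f"Tool: {json.dumps(weather_tool)}\n"
    "Question: What is the weather like in Shanghai today?"
)

# `model` and `tokenizer` are the InternLM2.5-7B-Chat objects loaded in the
# Transformers example above.
response, _ = model.chat(tokenizer, prompt, history=[])
print(response)  # ideally a JSON tool call such as {"name": "get_weather", ...}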

Fine-tuning

Please refer to finetune docs for fine-tuning with InternLM.

Note: We have migrated the entire training functionality of this project to InternEvo for a better user experience. InternEvo provides efficient pre-training and fine-tuning infrastructure for training InternLM.

Evaluation

We utilize OpenCompass for model evaluation. In InternLM2.5, we primarily focus on standard objective evaluation, long-context evaluation (needle in a haystack), data contamination assessment, agent evaluation, and subjective evaluation.

Objective Evaluation

To evaluate the InternLM model, please follow the guidelines in the OpenCompass tutorial. Typically, we use ppl for multiple-choice questions on the Base model and gen for all questions on the Chat model.

Long-Context Evaluation (Needle in a Haystack)

For the Needle in a Haystack evaluation, refer to the tutorial provided in the documentation. Feel free to try it out.

Data Contamination Assessment

To learn more about data contamination assessment, please check the contamination eval.

Agent Evaluation

  • To evaluate tool utilization, please refer to T-Eval.
  • For code interpreter evaluation, use the Math Agent Evaluation provided in the repository.

Subjective Evaluation

  • Please follow the tutorial for subjective evaluation.

Contribution

We appreciate all the contributors for their efforts to improve and enhance InternLM. Community users are highly encouraged to participate in the project. Please refer to the contribution guidelines for instructions on how to contribute to the project.

License

The code is licensed under Apache-2.0, while model weights are fully open for academic research and also allow free commercial usage. To apply for a commercial license, please fill in the application form (English)/申请表(中文). For other questions or collaborations, please contact [email protected].

Citation

@misc{cai2024internlm2,
      title={InternLM2 Technical Report},
      author={Zheng Cai and Maosong Cao and Haojiong Chen and Kai Chen and Keyu Chen and Xin Chen and Xun Chen and Zehui Chen and Zhi Chen and Pei Chu and Xiaoyi Dong and Haodong Duan and Qi Fan and Zhaoye Fei and Yang Gao and Jiaye Ge and Chenya Gu and Yuzhe Gu and Tao Gui and Aijia Guo and Qipeng Guo and Conghui He and Yingfan Hu and Ting Huang and Tao Jiang and Penglong Jiao and Zhenjiang Jin and Zhikai Lei and Jiaxing Li and Jingwen Li and Linyang Li and Shuaibin Li and Wei Li and Yining Li and Hongwei Liu and Jiangning Liu and Jiawei Hong and Kaiwen Liu and Kuikun Liu and Xiaoran Liu and Chengqi Lv and Haijun Lv and Kai Lv and Li Ma and Runyuan Ma and Zerun Ma and Wenchang Ning and Linke Ouyang and Jiantao Qiu and Yuan Qu and Fukai Shang and Yunfan Shao and Demin Song and Zifan Song and Zhihao Sui and Peng Sun and Yu Sun and Huanze Tang and Bin Wang and Guoteng Wang and Jiaqi Wang and Jiayu Wang and Rui Wang and Yudong Wang and Ziyi Wang and Xingjian Wei and Qizhen Weng and Fan Wu and Yingtong Xiong and Chao Xu and Ruiliang Xu and Hang Yan and Yirong Yan and Xiaogui Yang and Haochen Ye and Huaiyuan Ying and Jia Yu and Jing Yu and Yuhang Zang and Chuyu Zhang and Li Zhang and Pan Zhang and Peng Zhang and Ruijie Zhang and Shuo Zhang and Songyang Zhang and Wenjian Zhang and Wenwei Zhang and Xingcheng Zhang and Xinyue Zhang and Hui Zhao and Qian Zhao and Xiaomeng Zhao and Fengzhe Zhou and Zaida Zhou and Jingming Zhuo and Yicheng Zou and Xipeng Qiu and Yu Qiao and Dahua Lin},
      year={2024},
      eprint={2403.17297},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

internlm-xcomposer's People

Contributors

del-zhenwu, eltociear, jjjymmm, li-jinsong, lightdxy, lvhan028, myownskyw7, panzhang0212, sdc17, v3det, vansin, xiaoachen98, yeyimilk, yhcao6, yuhangzang


internlm-xcomposer's Issues

Failed to load 4-bit weights from HuggingFace

Description

Unable to load the quantized weights (4 bits) from HuggingFace

Code

The code is a direct copy from the file examples/example_chat_4bit_en.py

import torch
from transformers import AutoModel, AutoTokenizer

import auto_gptq
from auto_gptq.modeling import BaseGPTQForCausalLM

auto_gptq.modeling._base.SUPPORTED_MODELS = ["InternLMXComposer"]

torch.set_grad_enabled(False)


class InternLMXComposerQForCausalLM(BaseGPTQForCausalLM):
    layers_block_name = "internlm_model.model.layers"
    outside_layer_modules = [
        "query_tokens",
        "flag_image_start",
        "flag_image_end",
        "visual_encoder",
        "Qformer",
        "internlm_model.model.embed_tokens",
        "internlm_model.model.norm",
        "internlm_proj",
        "internlm_model.lm_head",
    ]
    inside_layer_modules = [
        ["self_attn.k_proj", "self_attn.v_proj", "self_attn.q_proj"],
        ["self_attn.o_proj"],
        ["mlp.gate_proj"],
        ["mlp.up_proj"],
        ["mlp.down_proj"],
    ]


# init model and tokenizer
model = InternLMXComposerQForCausalLM.from_quantized(
    "internlm/internlm-xcomposer-7b-4bit", trust_remote_code=True, device="cuda:0"
)
model = model.eval()
tokenizer = AutoTokenizer.from_pretrained(
    "internlm/internlm-xcomposer-7b-4bit", trust_remote_code=True
)
model.model.tokenizer = tokenizer

# example image
image = "examples/images/aiyinsitan.jpg"

# Multi-Turn Text-Image Dialogue
# 1st turn
text = 'Describe this image in detial.'
image = "examples/images/aiyinsitan.jpg"
response, history = model.chat(text, image)
print(f"User: {text}")
print(f"Bot: {response}") 
# The image features a black and white portrait of Albert Einstein, the famous physicist and mathematician. 
# Einstein is seated in the center of the frame, looking directly at the camera with a serious expression on his face. 
# He is dressed in a suit, which adds a touch of professionalism to his appearance. 

Error

Traceback (most recent call last):
  File "/mnt/bd/dev-pierre-oreistein-st/sandbox/test_internlm_vl/test_internlm_vl_4bits", line 35, in <module>
    model = InternLMXComposerQForCausalLM.from_quantized(
  File "/home/pierre/.pyenv/versions/dev3.9/lib/python3.9/site-packages/auto_gptq/modeling/_base.py", line 847, in from_quantized
    raise FileNotFoundError(f"Could not find a model in {model_name_or_path} with a name in {', '.join(searched_files)}. Please specify the argument model_basename to use a custom file name.")
FileNotFoundError: Could not find a model in internlm/internlm-xcomposer-7b-4bit with a name in gptq_model-4bit-128g.safetensors, model.safetensors. Please specify the argument model_basename to use a custom file name.

Ideas

According to this similar issue I need to specify the model file. However, I was unable to find it on HuggingFace. Could you help me with this?

Thanks in advance for your help!

Minimum GPU memory to run example_chat.py

Hello, I am interested in your work and curious about the minimum total GPU memory required to run example_chat.py for testing. I tried it on mine, which has 8GB of memory, clearly not enough. Can you show me the rough range for it?

example_demo code contains internal information

Hi, you seem to have left extra information at L51 and L52 in examples/web_demo.py:

 self.llm_model = AutoModel.from_pretrained('/mnt/petrelfs/share_data/dongxiaoyi/share_models/release_chat', trust_remote_code=True)
        tokenizer = AutoTokenizer.from_pretrained('/mnt/petrelfs/share_data/dongxiaoyi/share_models/release_chat', trust_remote_code=True)

You may want to fix these lines to avoid exposing internal paths.

Suggestion: make examples/web_demo.py more secure

Change the last lines in examples/web_demo.py to make it more secure; not everyone needs to expose the service to the public:

if __name__ == "__main__":
    demo.launch(share=True, server_name="0.0.0.0", server_port=11111)

should become:

demo.launch(share=False, server_name="127.0.0.1", server_port=11111)

AttributeError: 'InternLMXComposerTokenizer' object has no attribute 'sp_model'

After updating web_demo.py, this error occurred:

File "/home/enbo/.cache/huggingface/modules/transformers_modules/internlm-xcomposer/tokenization_InternLM_XComposer.py", line 106, in get_vocab
vocab = {self.convert_ids_to_tokens(i): i for i in range(self.vocab_size)}
File "/home/enbo/.cache/huggingface/modules/transformers_modules/internlm-xcomposer/tokenization_InternLM_XComposer.py", line 94, in vocab_size
return self.sp_model.get_piece_size()
AttributeError: 'InternLMXComposerTokenizer' object has no attribute 'sp_model'

InternLM-XComposer-VL-7B: the model's Chinese ability does not match the demo

import torch
from transformers import AutoModel, AutoTokenizer

torch.set_grad_enabled(False)
model_path = "internlm/internlm-xcomposer-vl-7b"
model = AutoModel.from_pretrained(model_path, trust_remote_code=True).cuda().eval()
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model.tokenizer = tokenizer

image = "./image/aiyinsitan.jpg"
text = '请问这张图片里面的人是谁?并介绍下他。'
response = model.generate(text, image)
print(response)

response: albert einstein

I tried many pictures, but the model's results are unsatisfactory and are mostly in English.

support for multiple GPU inference

Hello, I am interested in your work and I am curious about how to run internlm-xcomposer-7b in an environment that only contains 24GB GPUs. I am looking forward to a new version of inference code that supports multiple gpu inference.

Thank you

ModelScope urllib.error.URLError: <urlopen error [Errno 104] Connection reset by peer>

import torch
from modelscope import snapshot_download, AutoModel, AutoTokenizer

torch.set_grad_enabled(False)

# init model and tokenizer
model_dir = snapshot_download('Shanghai_AI_Laboratory/internlm-xcomposer-7b')
model = AutoModel.from_pretrained(model_dir, trust_remote_code=True).cuda().eval()
root@autodl-container-9e2911833c-01d8deff:~/autodl-tmp# python download.py 
2023-10-10 21:52:08,079 - modelscope - INFO - PyTorch version 1.11.0+cu113 Found.
2023-10-10 21:52:08,081 - modelscope - INFO - Loading ast index from /root/.cache/modelscope/ast_indexer
2023-10-10 21:52:08,119 - modelscope - INFO - Loading done! Current index file version is 1.9.2, with md5 1c9bf186d1e03088e5abfbd8664a1def and a total number of 941 components indexed
2023-10-10 21:52:08,686 - modelscope - WARNING - There is no version specified and there is no version in the model repository,use the master branch, which is fragile, please use it with caution!
2023-10-10 21:52:08,686 - modelscope - INFO - Model revision not specified, use revision: master
Init VIT ... Traceback (most recent call last):
  File "/root/miniconda3/lib/python3.8/urllib/request.py", line 1354, in do_open
    h.request(req.get_method(), req.selector, req.data, headers,
  File "/root/miniconda3/lib/python3.8/http/client.py", line 1252, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/root/miniconda3/lib/python3.8/http/client.py", line 1298, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/root/miniconda3/lib/python3.8/http/client.py", line 1247, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/root/miniconda3/lib/python3.8/http/client.py", line 1007, in _send_output
    self.send(msg)
  File "/root/miniconda3/lib/python3.8/http/client.py", line 947, in send
    self.connect()
  File "/root/miniconda3/lib/python3.8/http/client.py", line 1421, in connect
    self.sock = self._context.wrap_socket(self.sock,
  File "/root/miniconda3/lib/python3.8/ssl.py", line 500, in wrap_socket
    return self.sslsocket_class._create(
  File "/root/miniconda3/lib/python3.8/ssl.py", line 1040, in _create
    self.do_handshake()
  File "/root/miniconda3/lib/python3.8/ssl.py", line 1309, in do_handshake
    self._sslobj.do_handshake()
ConnectionResetError: [Errno 104] Connection reset by peer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "download.py", line 8, in <module>
    model = AutoModel.from_pretrained(model_dir, trust_remote_code=True).cuda().eval()
  File "/root/miniconda3/lib/python3.8/site-packages/modelscope/utils/hf_util.py", line 181, in from_pretrained
    module_obj = module_class.from_pretrained(model_dir, *model_args,
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py", line 560, in from_pretrained
    return model_class.from_pretrained(
  File "/root/miniconda3/lib/python3.8/site-packages/modelscope/utils/hf_util.py", line 78, in from_pretrained
    return ori_from_pretrained(cls, model_dir, *model_args, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/modeling_utils.py", line 3085, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/internlm-xcomposer-7b/modeling_InternLM_XComposer.py", line 43, in __init__
    self.visual_encoder = create_eva_vit_g()
  File "/root/.cache/huggingface/modules/transformers_modules/internlm-xcomposer-7b/modeling_vit.py", line 522, in create_eva_vit_g
    cached_file = download_cached_file(url, check_hash=False, progress=True)
  File "/root/.cache/huggingface/modules/transformers_modules/internlm-xcomposer-7b/modeling_utils.py", line 44, in download_cached_file
    timm_hub.download_cached_file(url, check_hash, progress)
  File "/root/miniconda3/lib/python3.8/site-packages/timm/models/_hub.py", line 85, in download_cached_file
    download_url_to_file(url, cached_file, hash_prefix, progress=progress)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/hub.py", line 457, in download_url_to_file
    u = urlopen(req)
  File "/root/miniconda3/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/root/miniconda3/lib/python3.8/urllib/request.py", line 525, in open
    response = self._open(req, data)
  File "/root/miniconda3/lib/python3.8/urllib/request.py", line 542, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "/root/miniconda3/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/root/miniconda3/lib/python3.8/urllib/request.py", line 1397, in https_open
    return self.do_open(http.client.HTTPSConnection, req,
  File "/root/miniconda3/lib/python3.8/urllib/request.py", line 1357, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 104] Connection reset by peer>
root@autodl-container-9e2911833c-01d8deff:~/autodl-tmp# 

Images are not generated

(xcomposer) ➜  InternLM-XComposer git:(main) ✗ python examples/web_demo.py
/cpfs01/user/huwenxing/miniconda/envs/xcomposer/lib/python3.10/site-packages/gradio/components/button.py:89: UserWarning: Using the update method is deprecated. Simply return a new object instead, e.g. `return gr.Button(...)` instead of `return gr.Button.update(...)`.
  warnings.warn(
Init VIT ... Done
Init Perceive Sampler ... Done
Init InternLM ... Done
Loading checkpoint sha
 load model done:  <class 'transformers_modules.internlm-xcomposer-7b.modeling_InternLM_XComposer.InternLMXComposerForCausalLM'>
/cpfs01/user/huwenxing/InternLM-XComposer/examples/web_demo.py:1009: GradioDeprecationWarning: The `style` method is deprecated. Please set these arguments in the constructor instead.
  chat_textbox = gr.Textbox(
Running on local URL:  http://0.0.0.0:11111
/cpfs01/user/huwenxing/miniconda/envs/xcomposer/lib/python3.10/site-packages/gradio/components/chatbot.py:161: UserWarning: Using the update method is deprecated. Simply return a new object instead, e.g. `return gr.Chatbot(...)` instead of `return gr.Chatbot.update(...)`.
  warnings.warn(
/cpfs01/user/huwenxing/miniconda/envs/xcomposer/lib/python3.10/site-packages/gradio/components/textbox.py:163: UserWarning: Using the update method is deprecated. Simply return a new object instead, e.g. `return gr.Textbox(...)` instead of `return gr.Textbox.update(...)`.
  warnings.warn(
/cpfs01/user/huwenxing/miniconda/envs/xcomposer/lib/python3.10/site-packages/gradio/components/button.py:89: UserWarning: Using the update method is deprecated. Simply return a new object instead, e.g. `return gr.Button(...)` instead of `return gr.Button.update(...)`.
  warnings.warn(
init
/cpfs01/user/huwenxing/miniconda/envs/xcomposer/lib/python3.10/site-packages/gradio/components/markdown.py:92: UserWarning: Using the update method is deprecated. Simply return a new object instead, e.g. `return gr.Markdown(...)` instead of `return gr.Markdown.update(...)`.
  warnings.warn(
/cpfs01/user/huwenxing/miniconda/envs/xcomposer/lib/python3.10/site-packages/gradio/components/gallery.py:143: UserWarning: Using the update method is deprecated. Simply return a new object instead, e.g. `return gr.Gallery(...)` instead of `return gr.Gallery.update(...)`.
  warnings.warn(
/cpfs01/user/huwenxing/miniconda/envs/xcomposer/lib/python3.10/site-packages/gradio/components/button.py:89: UserWarning: Using the update method is deprecated. Simply return a new object instead, e.g. `return gr.Button(...)` instead of `return gr.Button.update(...)`.
  warnings.warn(
/cpfs01/user/huwenxing/miniconda/envs/xcomposer/lib/python3.10/site-packages/gradio/components/textbox.py:163: UserWarning: Using the update method is deprecated. Simply return a new object instead, e.g. `return gr.Textbox(...)` instead of `return gr.Textbox.update(...)`.
  warnings.warn(
/cpfs01/user/huwenxing/miniconda/envs/xcomposer/lib/python3.10/site-packages/gradio/helpers.py:818: UserWarning: Using the update method is deprecated. Simply return a new object instead, e.g. `return gr.Textbox(...)` instead of `return gr.update(...)
  warnings.warn(
/cpfs01/user/huwenxing/miniconda/envs/xcomposer/lib/python3.10/asyncio/events.py:80: GradioUnusedKwargWarning: You have unused kwarg parameters in Button, please remove them: {'mode': 'static'}
  self._context.run(self._callback, *self._args)

Could not create share link. Missing file: /cpfs01/user/huwenxing/miniconda/envs/xcomposer/lib/python3.10/site-packages/gradio/frpc_linux_amd64_v0.2. 

Please check your internet connection. This can happen if your antivirus software blocks the download of this file. You can install manually by following these steps: 

1. Download this file: https://cdn-media.huggingface.co/frpc-gradio-0.2/frpc_linux_amd64
2. Rename the downloaded file to: frpc_linux_amd64_v0.2
3. Move the file to this location: /cpfs01/user/huwenxing/miniconda/envs/xcomposer/lib/python3.10/site-packages/gradio
<object object at 0x7fb765662e10>
Dunhuang, located in the northwest of Gansu Province at the western end of the Hexi Corridor, was an important transport hub and trading town on the ancient Silk Road. It has a rich historical and cultural heritage, including famous sights such as the Mogao Caves, Mingsha Mountain and Crescent Spring, and the Yadan "Devil City". Dunhuang is also one of China's famous historical and cultural cities, with a deep cultural heritage and unique folk customs.

**I. The Mogao Caves**

The Mogao Caves, also known as the "Thousand Buddha Caves", are one of China's four great grottoes. Construction began in the Former Qin period of the Sixteen Kingdoms, more than 1,600 years ago. They are the largest and richest surviving treasure house of Buddhist art in the world and have been called the "Louvre of the East". The Mogao Caves contain 735 caves, more than 45,000 square metres of murals, and over 5,000 painted Buddhist statues, making them one of the largest centres of Buddhist art in the world. Here, visitors can admire exquisite murals, sculptures and musical performances and experience the depth of Buddhist culture.

**II. Mingsha Mountain and Crescent Spring**

Mingsha Mountain and Crescent Spring are a natural wonder in the desert about 40 km northwest of Dunhuang. The terrain is flat and the rolling dunes form a vast desert landscape, with Crescent Spring nestled quietly within it; its water is crystal clear and shaped like a new moon, hence the name "Crescent Spring". At night, when the moon rises, crisp sounds can be heard around the spring, like music from the heavens, refreshing the spirit.

**III. The Yadan "Devil City"**

The Yadan "Devil City" is a typical wind-eroded landform on the Gobi about 100 km southwest of Dunhuang. The landscape here is strange, presenting a desolate, mysterious and eerie scene. Long exposure to wind, sun and rain has worn the rock surfaces into uneven shapes of every kind, some like animals, some like people, some like buildings, a marvel of nature's craftsmanship.

**IV. Other Sights**

Besides the Mogao Caves, Mingsha Mountain and Crescent Spring, and the Yadan Devil City, Dunhuang has many other sights worth visiting, such as Yumen Pass, Yang Pass, Suoyang City, and the ruins of the Han Great Wall. All of these have long histories and cultural value and attract many visitors.

**V. Local Food**

Dunhuang's local food is also very rich. The most famous is donkey-meat yellow noodles, a noodle dish made with donkey meat that is fragrant and delicious and loved by locals and tourists alike. There are also mutton paomo, braised lamb with flatbread, whole roast lamb, and other specialities that should not be missed.

**VI. Travel Tips**

1. Dunhuang's climate is dry with strong sunshine and ultraviolet radiation; visitors should take sun protection and carry sunscreen, a sun hat, and sunglasses. 2. Dunhuang is at a relatively high altitude; visitors should rest well and avoid strenuous exercise to prevent altitude sickness. 3. Dunhuang has many attractions; plan your itinerary in advance and allocate your time sensibly so you do not rush past the important sights. 4. While travelling in Dunhuang, protect the environment, do not litter, and do not damage cultural relics; be a civilized tourist. In short, Dunhuang is a city with a long history, deep culture, and beautiful scenery, well worth a visit. I hope this article helps you better understand Dunhuang and provides some useful information for your trip.


Running web_demo.py, the model fails to initialize

Traceback (most recent call last):
File "/home/batch/projects/InternLM-XComposer/examples/web_demo.py", line 816, in
demo_ui = Demo_UI()
File "/home/batch/projects/InternLM-XComposer/examples/web_demo.py", line 47, in init
self.llm_model = AutoModel.from_pretrained(
File "/home/batch/rt/InternLM-X/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 479, in from_pretrained
return model_class.from_pretrained(
File "/home/batch/rt/InternLM-X/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2675, in from_pretrained
model = cls(config, *model_args, **model_kwargs)
File "/home/batch/.cache/huggingface/modules/transformers_modules/internlm-xcomposer-vl-7b/modeling_InternLM_XComposer.py", line 49, in init
self.Qformer, self.query_tokens = self.init_qformer(
File "/home/batch/.cache/huggingface/modules/transformers_modules/internlm-xcomposer-vl-7b/modeling_InternLM_XComposer.py", line 122, in init_qformer
encoder_config = BertConfig.from_pretrained("bert-base-uncased")
File "/home/batch/rt/InternLM-X/lib/python3.10/site-packages/transformers/configuration_utils.py", line 547, in from_pretrained
config_dict, kwargs = cls.get_config_dict(pretrained_model_name_or_path, **kwargs)
File "/home/batch/rt/InternLM-X/lib/python3.10/site-packages/transformers/configuration_utils.py", line 574, in get_config_dict
config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
File "/home/batch/rt/InternLM-X/lib/python3.10/site-packages/transformers/configuration_utils.py", line 629, in _get_config_dict
resolved_config_file = cached_file(
File "/home/batch/rt/InternLM-X/lib/python3.10/site-packages/transformers/utils/hub.py", line 452, in cached_file
raise EnvironmentError(
OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like bert-base-uncased is not the path to a directory containing a file named config.json.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.

No module named 'transformers_modules.internlm/internlm-xcomposer-7b'

I downloaded internlm-xcomposer-7b and placed it under internlm/internlm-xcomposer-7b, but I get the following error:

PS E:\InternLM-XComposer> python .\examples\web_demo.py
Traceback (most recent call last):
File "E:\cnai\InternLM-XComposer\examples\web_demo.py", line 816, in
demo_ui = Demo_UI()
File "E:\cnai\InternLM-XComposer\examples\web_demo.py", line 47, in init
self.llm_model = AutoModel.from_pretrained(
File "C:\Python\Python310\lib\site-packages\transformers\models\auto\auto_factory.py", line 456, in from_pretrained
config, kwargs = AutoConfig.from_pretrained(
File "C:\Python\Python310\lib\site-packages\transformers\models\auto\configuration_auto.py", line 953, in from_pretrained
config_class = get_class_from_dynamic_module(class_ref, pretrained_model_name_or_path, **kwargs)
File "C:\Python\Python310\lib\site-packages\transformers\dynamic_module_utils.py", line 443, in get_class_from_dynamic_module
return get_class_in_module(class_name, final_module.replace(".py", ""))
File "C:\Python\Python310\lib\site-packages\transformers\dynamic_module_utils.py", line 164, in get_class_in_module
module = importlib.import_module(module_path)
File "C:\Python\Python310\lib\importlib_init_.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "", line 1050, in _gcd_import
File "", line 1027, in _find_and_load
File "", line 992, in _find_and_load_unlocked
File "", line 241, in _call_with_frames_removed
File "", line 1050, in _gcd_import
File "", line 1027, in _find_and_load
File "", line 1004, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'transformers_modules.internlm/internlm-xcomposer-7b'

Why do I get this problem when I use the model to generate?

../aten/src/ATen/native/cuda/Indexing.cu:1239: indexSelectSmallIndex: block: [6,0,0], thread: [55,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1239: indexSelectSmallIndex: block: [6,0,0], thread: [56,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1239: indexSelectSmallIndex: block: [6,0,0], thread: [57,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1239: indexSelectSmallIndex: block: [6,0,0], thread: [58,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1239: indexSelectSmallIndex: block: [6,0,0], thread: [59,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1239: indexSelectSmallIndex: block: [6,0,0], thread: [60,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1239: indexSelectSmallIndex: block: [6,0,0], thread: [61,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1239: indexSelectSmallIndex: block: [6,0,0], thread: [62,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1239: indexSelectSmallIndex: block: [6,0,0], thread: [63,0,0] Assertion srcIndex < srcSelectDimSize failed.

Where might the problem be? Thanks!

AttributeError: 'InternLMXComposerTokenizer' object has no attribute 'sp_model'

The code

model = model.eval()
tokenizer = AutoTokenizer.from_pretrained(
    "../internlm-xcomposer-7b-4bit", trust_remote_code=True
)
model.model.tokenizer = tokenizer

met error

File "/home/root/.cache/huggingface/modules/transformers_modules/internlm-xcomposer-7b-4bit/tokenization_InternLM_XComposer.py", line 94, in vocab_size
    return self.sp_model.get_piece_size()
AttributeError: 'InternLMXComposerTokenizer' object has no attribute 'sp_model'

OCR support?

Is it possible to make it work with OCR capability?

CUDA Out of Memory in Multi-GPU Inference

import torch
from transformers import AutoModel, AutoTokenizer
import argparse

def auto_configure_device_map(num_gpus):
    # visual_encoder counts as 4 layers
    # internlm_model.model.embed_tokens takes 1 layer
    # norm and lm_head take 1 layer
    # transformer.layers take 32 layers
    # 34 layers in total, distributed across num_gpus GPUs
    num_trans_layers = 32
    per_gpu_layers = 38 / num_gpus

    device_map = {
        'visual_encoder': 0,
        'ln_vision': 0,
        'Qformer': 0,
        'internlm_model.model.embed_tokens': 0,
        'internlm_model.model.norm': 0,
        'internlm_model.lm_head': 0,
        'query_tokens': 0,
        'flag_image_start': 0,
        'flag_image_end': 0,
        'internlm_proj.weight': 0,
        'internlm_proj.bias': 0,
    }

    # device_map = {key: 0 for key in device_map.keys()}
    
    used = 6
    gpu_target = 0
    for i in range(num_trans_layers):
        if used >= per_gpu_layers:
            gpu_target += 1
            used = 0
        assert gpu_target < num_gpus
        device_map[f'internlm_model.model.layers.{i}'] = gpu_target
        used += 1

    return device_map

torch.set_grad_enabled(False)

parser = argparse.ArgumentParser()
parser.add_argument("--num_gpus", default=4, type=int)
args = parser.parse_args()

# init model and tokenizer
model = AutoModel.from_pretrained('internlm/internlm-xcomposer-vl-7b', trust_remote_code=True, cache_dir='/storage/internLM/').cuda().eval()
if args.num_gpus > 1:
    from accelerate import dispatch_model
    device_map = auto_configure_device_map(args.num_gpus)
    model = dispatch_model(model, device_map=device_map)

tokenizer = AutoTokenizer.from_pretrained('internlm/internlm-xcomposer-vl-7b', trust_remote_code=True, cache_dir='/storage/internLM/')
model.tokenizer = tokenizer


# example image
image = 'examples/images/aiyinsitan.jpg'

# Single-Turn Pure-Text Dialogue
text = 'Please introduce Einstein.'
with torch.no_grad():
    with model.maybe_autocast():
        response = model.generate(text)
print(response)

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)

An example with Transformers to generate text + images

Hi,

I find the examples for interacting with images with Transformers (VQA, etc.) very interesting. However, how can we actually generate text + images (with the right history context) with HF transformers?

I cannot see an example of this nice feature of your work.

Thanks.

Can't it generate an image on its own?

I am calling the internlm-xcomposer-7b model and entered the following commands:

>>> text='请帮我画一张长城的照片'
>>> response, history = model.chat(text=text, image=None, history=None)
>>> print(response)
I'm sorry, but as a language model I do not have drawing abilities and cannot draw a picture of the Great Wall for you. However, if you would like, I can provide some materials and information about the Great Wall to help you better understand this great structure.
>>>

Can it only generate illustrated articles, and not standalone images?

Running the internlm-xcomposer-7b model in VS Code, it cannot generate results with both text and images

User: Write a popular science article about “Unraveling the Mysteries of Black Holes: A Scientific Overview” with pictures and illustrations.
Bot: I'm sorry, but as an AI language model, I don't have the capability to create visual content such as pictures and illustrations. However, I can provide you with a text-based summary of the popular science article about "Unraveling the Mysteries of Black Holes: A Scientific Overview".

As shown above, I used the prompt mentioned in the paper but did not get the expected result. I don't know why.

Error when training the LoRA part

I want to do LoRA training: I froze the other parts and kept only lora_A and lora_B trainable, but backpropagation raises an error.
# Freeze the parameters of visual_encoder, ln_vision and internlm_model
for param in model.visual_encoder.parameters():
    param.requires_grad = False

for param in model.ln_vision.parameters():
    param.requires_grad = False

for param in model.Qformer.parameters():
    param.requires_grad = False

for param in model.internlm_model.parameters():
    param.requires_grad = False

# Unfreeze the lora_A and lora_B parameters that need to be trained
for name, param in model.named_parameters():
    if "lora_A" in name or "lora_B" in name:
        param.requires_grad = True

Training code:
input_ids = data['input_ids'].to(device, dtype=torch.long)
labels = data['labels'].to(device, dtype=torch.long)
attention_mask = data['attention_mask'].to(device, dtype=torch.long)
outputs = model.internlm_model(
    input_ids=input_ids,
    labels=labels,
    attention_mask=attention_mask
)
loss = outputs.loss
# Backpropagate to compute the current gradients
loss.backward()

The error is as follows:
Traceback (most recent call last):
File "/data/xinyuuliu/InternLM-XComposer/train_model/train.py", line 190, in
main()
File "/data/xinyuuliu/InternLM-XComposer/train_model/train.py", line 175, in main
train(epoch, model, device, training_loader, optimizer, gradient_accumulation_steps,model_output_dir)
File "/data/xinyuuliu/InternLM-XComposer/train_model/train.py", line 55, in train
loss.backward()
File "/root/miniconda3/envs/internLM/lib/python3.9/site-packages/torch/_tensor.py", line 487, in backward
torch.autograd.backward(
File "/root/miniconda3/envs/internLM/lib/python3.9/site-packages/torch/autograd/init.py", line 200, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "/root/miniconda3/envs/internLM/lib/python3.9/site-packages/torch/autograd/function.py", line 274, in apply
return user_fn(self, *args)
File "/root/.cache/huggingface/modules/transformers_modules/internlm-xcomposer-7b/modeling_InternLM.py", line 80, in backward
rotary_emb.apply_rotary(dq1, dq2, rearrange(cos[:seqlen], 's d -> s 1 d'),
NameError: name 'rotary_emb' is not defined

Error when clicking the "Insert a fixed number of Images" button

Traceback (most recent call last):
File "C:\Python\Python310\lib\site-packages\gradio\queueing.py", line 388, in call_prediction
output = await route_utils.call_process_api(
File "C:\Python\Python310\lib\site-packages\gradio\route_utils.py", line 219, in call_process_api
output = await app.get_blocks().process_api(
File "C:\Python\Python310\lib\site-packages\gradio\blocks.py", line 1437, in process_api
result = await self.call_function(
File "C:\Python\Python310\lib\site-packages\gradio\blocks.py", line 1109, in call_function
prediction = await anyio.to_thread.run_sync(
File "C:\Python\Python310\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "C:\Python\Python310\lib\site-packages\anyio_backends_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "C:\Python\Python310\lib\site-packages\anyio_backends_asyncio.py", line 867, in run
result = context.run(func, *args)
File "C:\Python\Python310\lib\site-packages\gradio\utils.py", line 650, in wrapper
response = f(*args, **kwargs)
File "E:\ai\InternLM-XComposer\examples\web_demo.py", line 468, in adjust_img
caps = self.generate_loc_cap(idx_text_sections, int(img_num), progress)
File "E:\ai\InternLM-XComposer\examples\web_demo.py", line 177, in generate_loc_cap
inject_text, locs = self.generate_loc(text_sections, image_num,
File "E:\ai\InternLM-XComposer\examples\web_demo.py", line 132, in generate_loc
for _ in progress.tqdm([1], desc="image spotting"):
TypeError: Progress.tqdm() missing 1 required positional argument: 'iterable'

Error when running web_demo.py

I downloaded the latest code and got the following error when running it:
being examined by the vet, who is taking its temperature.', 10: 'A pet dog cleaning up its own droppings with its owner's help, with the owner guiding it.', 12: 'A pet dog grooming its own fur with its owner's help, with the owner guiding it.'}
https://static.openxlab.org.cn/lingbi/jpg-images/61ec717e9ee8ffd984f79d01838de29e352b6aa9b9a04bb60e56a92f00fa72db.jpg
download image with url
image downloaded
https://static.openxlab.org.cn/lingbi/jpg-images/78668d1138f169a78284213bb2df7991cf77ff48bbdb6f938a3536d238ccbdc7.jpg
download image with url
image downloaded
https://static.openxlab.org.cn/lingbi/jpg-images/9db1b1ecdfc698526459d4bb519ebd1dfc3b9be9e4983ff8b862dd970257793a.jpg
download image with url
image downloaded
https://static.openxlab.org.cn/lingbi/jpg-images/2f3ce59b613d2b7b989a919819ef1aed67dcd56d6381b0ea2af68972c700c367.jpg
download image with url
image downloaded
https://static.openxlab.org.cn/lingbi/jpg-images/3e4914997caf27f88b1219070586a7aa9ce79c6afacf824a640396abab216230.jpg
download image with url
image downloaded
https://static.openxlab.org.cn/lingbi/jpg-images/41eb2c71b25921e4ccb423df6e402caf74d71bc981b9bd699a44bc2d66ec0524.jpg
download image with url
image downloaded
model_select_image
Traceback (most recent call last):
File "/home/enbo/anaconda3/envs/llama2-accessory/lib/python3.10/site-packages/gradio/queueing.py", line 388, in call_prediction
output = await route_utils.call_process_api(
File "/home/enbo/anaconda3/envs/llama2-accessory/lib/python3.10/site-packages/gradio/route_utils.py", line 219, in call_process_api
output = await app.get_blocks().process_api(
File "/home/enbo/anaconda3/envs/llama2-accessory/lib/python3.10/site-packages/gradio/blocks.py", line 1437, in process_api
result = await self.call_function(
File "/home/enbo/anaconda3/envs/llama2-accessory/lib/python3.10/site-packages/gradio/blocks.py", line 1123, in call_function
prediction = await utils.async_iteration(iterator)
File "/home/enbo/anaconda3/envs/llama2-accessory/lib/python3.10/site-packages/gradio/utils.py", line 512, in async_iteration
return await iterator.anext()
File "/home/enbo/anaconda3/envs/llama2-accessory/lib/python3.10/site-packages/gradio/utils.py", line 505, in anext
return await anyio.to_thread.run_sync(
File "/home/enbo/anaconda3/envs/llama2-accessory/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/home/enbo/anaconda3/envs/llama2-accessory/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "/home/enbo/anaconda3/envs/llama2-accessory/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 807, in run
result = context.run(func, *args)
File "/home/enbo/anaconda3/envs/llama2-accessory/lib/python3.10/site-packages/gradio/utils.py", line 488, in run_sync_iterator_async
return next(iterator)
File "/home/enbo/anaconda3/envs/llama2-accessory/lib/python3.10/site-packages/gradio/utils.py", line 638, in gen_wrapper
yield from f(*args, **kwargs)
File "/data/liwx/InternLM-XComposer-main/InternLM-XComposer-main/examples/web_demo.py", line 444, in generate_article
self.selected = self.model_select_image(output_text, caps,
File "/data/liwx/InternLM-XComposer-main/InternLM-XComposer-main/examples/web_demo.py", line 299, in model_select_image
pre_img.append(images[len(pre_img) + ans2idx[answer]].cpu())
KeyError: '<'

The quantized model internlm-xcomposer-7b-4bit fails to run

import torch
from modelscope import snapshot_download, AutoModel, AutoTokenizer
import os

torch.set_grad_enabled(False)

# init model and tokenizer
model_dir = snapshot_download('Shanghai_AI_Laboratory/internlm-xcomposer-7b-4bit', revision = 'master')
model = AutoModel.from_pretrained(model_dir, trust_remote_code=True).cuda().eval()
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
model.tokenizer = tokenizer


What is the difference between InternConvertedInternLMAttention and InternLMAttention?

InternLMAttention is used in huggingface: https://huggingface.co/internlm/internlm-chat-7b/blob/main/modeling_internlm.py#L257
InternConvertedInternLMAttention is used in this repo: https://github.com/InternLM/InternLM-XComposer/blob/main/huggingface/internlm-xcomposer/modeling_InternLM.py#L732

I set intern_converted_llm to false and found that the results were all wrong. What is the difference between InternConvertedInternLMAttention and InternLMAttention?

No response using model.chat

Hi, I'm using InternLM-XComposer to generate some data. I have tried your demo, and it works fine when I use model.generate.
But when I use model.chat(), the model only replies to the first call; subsequent calls are unresponsive and return an empty string.

I'm using:
torch==2.0.1
transformers==4.33.2

My hardware is a single 3090 with 24G GPU memory, so I use the 4-bit quantized models; I tried your examples/example_chat_4bit.py and found this issue.

Is this problem caused by my environment or by something else?

How can I install `rotary_emb`?

from transformers import AutoModel, AutoTokenizer
model = AutoModel.from_pretrained('/root/autodl-tmp/models/internlm7bxc', trust_remote_code=True).cuda().eval()
Traceback (most recent call last):
File "", line 1, in
File "/root/miniconda3/envs/llm_chat/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 550, in from_pretrained
model_class = get_class_from_dynamic_module(
File "/root/miniconda3/envs/llm_chat/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 497, in get_class_from_dynamic_module
return get_class_in_module(class_name, final_module.replace(".py", ""))
File "/root/miniconda3/envs/llm_chat/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 199, in get_class_in_module
module = importlib.import_module(module_path)
File "/root/miniconda3/envs/llm_chat/lib/python3.10/importlib/init.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "", line 1050, in _gcd_import
File "", line 1027, in _find_and_load
File "", line 1006, in _find_and_load_unlocked
File "", line 688, in _load_unlocked
File "", line 883, in exec_module
File "", line 241, in _call_with_frames_removed
File "/root/.cache/huggingface/modules/transformers_modules/internlm7bxc/modeling_InternLM_XComposer.py", line 18, in
from .modeling_InternLM import *
File "/root/.cache/huggingface/modules/transformers_modules/internlm7bxc/modeling_InternLM.py", line 5, in
import rotary_emb
