internlm / internlm-xcomposer

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

License: Apache License 2.0

Python 91.89% Shell 3.32% Jupyter Notebook 4.79%
chatgpt foundation gpt gpt-4 instruction-tuning language-model large-language-model large-vision-language-model llm mllm multi-modality multimodal supervised-finetuning vision-language-model vision-transformer visual-language-learning

internlm-xcomposer's Introduction

InternLM

👋 join us on Discord and WeChat

Introduction

The InternLM2.5 series is released with the following features:

  • Outstanding reasoning capability: State-of-the-art performance on math reasoning, surpassing models like Llama3 and Gemma2-9B.

  • 1M context window: Nearly perfect at finding needles in a 1M-token haystack, with leading performance on long-context tasks such as LongBench. Try it with LMDeploy for 1M-context inference. More details and a file chat demo can be found here.

  • Stronger tool use: InternLM2.5 supports gathering information from more than 100 web pages; the corresponding implementation will be released in Lagent soon. InternLM2.5 also has better tool-use capabilities in instruction following, tool selection, and reflection. See examples.

News

[2024.08.01] We release InternLM2.5-1.8B, InternLM2.5-1.8B-Chat, InternLM2.5-20B and InternLM2.5-20B-Chat. See model zoo below for download or model cards for more details.

[2024.07.19] We release the InternLM2-Reward series of reward models in 1.8B, 7B and 20B sizes. See model zoo below for download or model cards for more details.

[2024.07.03] We release InternLM2.5-7B, InternLM2.5-7B-Chat and InternLM2.5-7B-Chat-1M. See model zoo below for download or model cards for more details.

[2024.03.26] We release InternLM2 technical report. See arXiv for details.

[2024.01.31] We release InternLM2-1.8B, along with the associated chat model. They provide a cheaper deployment option while maintaining leading performance.

[2024.01.23] We release InternLM2-Math-7B and InternLM2-Math-20B with pretraining and SFT checkpoints. They surpass ChatGPT despite their small sizes. See InternLM-Math for details and download.

[2024.01.17] We release InternLM2-7B and InternLM2-20B and their corresponding chat models with stronger capabilities in all dimensions. See model zoo below for download or model cards for more details.

[2023.12.13] InternLM-7B-Chat and InternLM-20B-Chat checkpoints are updated. With an improved finetuning strategy, the new chat models can generate higher quality responses with greater stylistic diversity.

[2023.09.20] InternLM-20B is released with base and chat versions.

Model Zoo

InternLM2.5

Model Transformers(HF) ModelScope(HF) OpenXLab(HF) OpenXLab(Origin) Release Date
InternLM2.5-1.8B 🤗internlm2_5-1_8b internlm2_5-1_8b Open in OpenXLab Open in OpenXLab 2024-08-05
InternLM2.5-1.8B-Chat 🤗internlm2_5-1_8b-chat internlm2_5-1_8b-chat Open in OpenXLab Open in OpenXLab 2024-08-05
InternLM2.5-7B 🤗internlm2_5-7b internlm2_5-7b Open in OpenXLab Open in OpenXLab 2024-07-03
InternLM2.5-7B-Chat 🤗internlm2_5-7b-chat internlm2_5-7b-chat Open in OpenXLab Open in OpenXLab 2024-07-03
InternLM2.5-7B-Chat-1M 🤗internlm2_5-7b-chat-1m internlm2_5-7b-chat-1m Open in OpenXLab Open in OpenXLab 2024-07-03
InternLM2.5-20B 🤗internlm2_5-20b internlm2_5-20b Open in OpenXLab Open in OpenXLab 2024-08-05
InternLM2.5-20B-Chat 🤗internlm2_5-20b-chat internlm2_5-20b-chat Open in OpenXLab Open in OpenXLab 2024-08-05

Notes:

The InternLM2.5 release includes 1.8B, 7B, and 20B versions. The 7B models are efficient for research and applications, while the 20B models are more powerful and support more complex scenarios. The relationships among these models are as follows.

  1. InternLM2.5: Foundation models pre-trained on a large-scale corpus. InternLM2.5 models are recommended for most applications.
  2. InternLM2.5-Chat: Chat models that undergo supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) on top of the InternLM2.5 base models. InternLM2.5-Chat is optimized for instruction following, chat experience, and function calling, and is recommended for downstream applications.
  3. InternLM2.5-Chat-1M: Supports a 1M-token context with performance comparable to InternLM2.5-Chat.

Limitations: Although we have made efforts to ensure the safety of the model during the training process and to encourage the model to generate text that complies with ethical and legal requirements, the model may still produce unexpected outputs due to its size and probabilistic generation paradigm. For example, the generated responses may contain biases, discrimination, or other harmful content. Please do not propagate such content. We are not responsible for any consequences resulting from the dissemination of harmful information.

Supplements: HF refers to the format used by HuggingFace in transformers, whereas Origin denotes the format adopted by the InternLM team in InternEvo.

InternLM2-Reward

InternLM2-Reward is a series of reward models, trained on 2.4 million preference samples and available in 1.8B, 7B, and 20B sizes. These models were used in the PPO training of our chat models. See model cards for more details, and the usage sketch after the table below.

Model RewardBench Score Transformers(HF) ModelScope(HF) OpenXLab(HF) Release Date
InternLM2-1.8B-Reward 80.6 🤗internlm2-1_8b-reward internlm2-1_8b-reward Open in OpenXLab 2024-07-19
InternLM2-7B-Reward 86.6 🤗internlm2-7b-reward internlm2-7b-reward Open in OpenXLab 2024-07-19
InternLM2-20B-Reward 89.5 🤗internlm2-20b-reward internlm2-20b-reward Open in OpenXLab 2024-07-19
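For reference, the sketch below scores a short conversation with one of these reward models. This is a hedged sketch: the get_score helper and its signature are taken from the InternLM2-Reward model cards, so verify the exact method name there before relying on it.

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("internlm/internlm2-7b-reward", trust_remote_code=True)
model = AutoModel.from_pretrained(
    "internlm/internlm2-7b-reward",
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)

chat = [
    {"role": "user", "content": "Give me three tips for time management."},
    {"role": "assistant", "content": "1. Plan your day. 2. Batch similar tasks. 3. Take regular breaks."},
]

# `get_score` is the helper described in the model card (assumption; check the card for the exact API).
score = model.get_score(tokenizer, chat)
print(score)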

InternLM2

(click to expand)

Our previous generation models with advanced capabilities in long-context processing, reasoning, and coding. See model cards for more details.

Model Transformers(HF) ModelScope(HF) OpenXLab(HF) OpenXLab(Origin) Release Date
InternLM2-1.8B 🤗internlm2-1.8b internlm2-1.8b Open in OpenXLab Open in OpenXLab 2024-01-31
InternLM2-Chat-1.8B-SFT 🤗internlm2-chat-1.8b-sft internlm2-chat-1.8b-sft Open in OpenXLab Open in OpenXLab 2024-01-31
InternLM2-Chat-1.8B 🤗internlm2-chat-1.8b internlm2-chat-1.8b Open in OpenXLab Open in OpenXLab 2024-02-19
InternLM2-Base-7B 🤗internlm2-base-7b internlm2-base-7b Open in OpenXLab Open in OpenXLab 2024-01-17
InternLM2-7B 🤗internlm2-7b internlm2-7b Open in OpenXLab Open in OpenXLab 2024-01-17
InternLM2-Chat-7B-SFT 🤗internlm2-chat-7b-sft internlm2-chat-7b-sft Open in OpenXLab Open in OpenXLab 2024-01-17
InternLM2-Chat-7B 🤗internlm2-chat-7b internlm2-chat-7b Open in OpenXLab Open in OpenXLab 2024-01-17
InternLM2-Base-20B 🤗internlm2-base-20b internlm2-base-20b Open in OpenXLab Open in OpenXLab 2024-01-17
InternLM2-20B 🤗internlm2-20b internlm2-20b Open in OpenXLab Open in OpenXLab 2024-01-17
InternLM2-Chat-20B-SFT 🤗internlm2-chat-20b-sft internlm2-chat-20b-sft Open in OpenXLab Open in OpenXLab 2024-01-17
InternLM2-Chat-20B 🤗internlm2-chat-20b internlm2-chat-20b Open in OpenXLab Open in OpenXLab 2024-01-17

Performance

We have evaluated InternLM2.5 on several important benchmarks using the open-source evaluation tool OpenCompass. Some of the results are shown in the tables below. You are welcome to visit the OpenCompass Leaderboard for more evaluation results.

Base Model

Benchmark InternLM2.5-7B Llama3-8B Yi-1.5-9B
MMLU (5-shot) 71.6 66.4 71.6
CMMLU (5-shot) 79.1 51.0 74.1
BBH (3-shot) 70.1 59.7 71.1
MATH (4-shot) 34.0 16.4 31.9
GSM8K (4-shot) 74.8 54.3 74.5
GPQA (0-shot) 31.3 31.3 27.8

Chat Model

Benchmark InternLM2.5-7B-Chat Llama3-8B-Instruct Gemma2-9B-IT Yi-1.5-9B-Chat GLM-4-9B-Chat Qwen2-7B-Instruct
MMLU (5-shot) 72.8 68.4 70.9 71.0 71.4 70.8
CMMLU (5-shot) 78.0 53.3 60.3 74.5 74.5 80.9
BBH (3-shot CoT) 71.6 54.4 68.2* 69.6 69.6 65.0
MATH (0-shot CoT) 60.1 27.9 46.9 51.1 51.1 48.6
GSM8K (0-shot CoT) 86.0 72.9 88.9 80.1 85.3 82.9
GPQA (0-shot) 38.4 26.1 33.8 37.9 36.9 38.4
  • We use ppl for the MCQ evaluation of base models.
  • The evaluation results were obtained with OpenCompass, and the evaluation configuration can be found in the configuration files provided by OpenCompass.
  • Scores may differ slightly across OpenCompass versions, so please refer to the latest OpenCompass results.
  • * means the result is copied from the original paper.

Requirements

  • Python >= 3.8
  • PyTorch >= 1.12.0 (2.0.0 and above are recommended)
  • Transformers >= 4.38
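A quick sanity check that your environment meets these minimums (a minimal sketch; it only prints the installed versions):

import torch
import transformers

# Print installed versions to compare against the minimums listed above.
print("torch:", torch.__version__)                # >= 1.12.0 required, 2.x recommended
print("transformers:", transformers.__version__)  # >= 4.38 required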

Usages

InternLM supports a diverse range of well-known upstream and downstream projects, such as LLaMA-Factory, vLLM, llama.cpp, and more. This support enables a broad spectrum of users to utilize the InternLM series models more efficiently and conveniently. Tutorials for selected ecosystem projects are available here for your convenience.

In the following sections, we focus on usage with Transformers, ModelScope, and the web demos. The chat models adopt the ChatML format to support both chat and agent applications. For the best results, make sure the installed transformers library meets the following requirement before running inference with Transformers or ModelScope:

transformers >= 4.38
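Because the chat models use a ChatML-style template, you can inspect the exact prompt string the tokenizer builds via apply_chat_template. A minimal sketch, assuming the tokenizer config ships a chat template; the rendered special tokens (e.g. <|im_start|>/<|im_end|> markers) are defined by that config, so treat the printed output as illustrative:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("internlm/internlm2_5-7b-chat", trust_remote_code=True)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "hello"},
]

# Render the chat template without tokenizing to see the ChatML-style prompt
# that will be fed to the model.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)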

Import from Transformers

To load the InternLM2.5-7B-Chat model using Transformers, use the following code:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("internlm/internlm2_5-7b-chat", trust_remote_code=True)
# Set `torch_dtype=torch.float16` to load model in float16, otherwise it will be loaded as float32 and might cause OOM Error.
model = AutoModelForCausalLM.from_pretrained("internlm/internlm2_5-7b-chat", device_map="auto", trust_remote_code=True, torch_dtype=torch.float16)
# (Optional) If on low resource devices, you can load model in 4-bit or 8-bit to further save GPU memory via bitsandbytes.
  # InternLM 7B in 4bit will cost nearly 8GB GPU memory.
  # pip install -U bitsandbytes
  # 8-bit: model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto", trust_remote_code=True, load_in_8bit=True)
  # 4-bit: model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto", trust_remote_code=True, load_in_4bit=True)
model = model.eval()
response, history = model.chat(tokenizer, "hello", history=[])
print(response)
# Output: Hello? How can I help you today?
response, history = model.chat(tokenizer, "please provide three suggestions about time management", history=history)
print(response)
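The remote modeling code also exposes a streaming interface. A hedged sketch, assuming the stream_chat helper described in the model card, which yields (partial_response, history) tuples while the reply is being generated:

# Streaming variant (hedged sketch): `stream_chat` yields partial responses as
# generation proceeds; `model` and `tokenizer` are loaded as shown above.
printed = 0
for response, history in model.stream_chat(tokenizer, "hello", history=[]):
    print(response[printed:], end="", flush=True)
    printed = len(response)
print()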

Import from ModelScope

To load the InternLM2.5-7B-Chat model using ModelScope, use the following code:

import torch
from modelscope import snapshot_download, AutoTokenizer, AutoModelForCausalLM
model_dir = snapshot_download('Shanghai_AI_Laboratory/internlm2_5-7b-chat')
tokenizer = AutoTokenizer.from_pretrained(model_dir, device_map="auto", trust_remote_code=True)
# Set `torch_dtype=torch.float16` to load model in float16, otherwise it will be loaded as float32 and might cause OOM Error.
model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto", trust_remote_code=True, torch_dtype=torch.float16)
# (Optional) If on low resource devices, you can load model in 4-bit or 8-bit to further save GPU memory via bitsandbytes.
  # InternLM 7B in 4bit will cost nearly 8GB GPU memory.
  # pip install -U bitsandbytes
  # 8-bit: model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto", trust_remote_code=True, load_in_8bit=True)
  # 4-bit: model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto", trust_remote_code=True, load_in_4bit=True)
model = model.eval()
response, history = model.chat(tokenizer, "hello", history=[])
print(response)
response, history = model.chat(tokenizer, "please provide three suggestions about time management", history=history)
print(response)

Dialogue

You can interact with the InternLM Chat 7B model through a frontend interface by running the following code:

pip install streamlit
pip install "transformers>=4.38"
streamlit run ./chat/web_demo.py

Deployment by LMDeploy

We use LMDeploy for fast deployment of InternLM.

Inference

With only 4 lines of code, you can run internlm2_5-7b-chat inference after pip install lmdeploy.

from lmdeploy import pipeline
pipe = pipeline("internlm/internlm2_5-7b-chat")
response = pipe(["Hi, pls intro yourself", "Shanghai is"])
print(response)

To reduce the memory footprint, we offer the 4-bit quantized model internlm2_5-7b-chat-4bit, with which inference can be conducted as follows:

from lmdeploy import pipeline
pipe = pipeline("internlm/internlm2_5-7b-chat-4bit")
response = pipe(["Hi, pls intro yourself", "Shanghai is"])
print(response)

Moreover, you can independently enable the 8-bit/4-bit KV cache feature:

from lmdeploy import pipeline, TurbomindEngineConfig
pipe = pipeline("internlm/internlm2_5-7b-chat-4bit",
                backend_config=TurbomindEngineConfig(quant_policy=8))
response = pipe(["Hi, pls intro yourself", "Shanghai is"])
print(response)

Please refer to the guidance for more details about model deployment. For additional deployment tutorials, feel free to explore here.
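Sampling parameters can be tuned through GenerationConfig. A minimal sketch; the parameter names below follow the LMDeploy documentation, so double-check them against your installed version:

from lmdeploy import pipeline, GenerationConfig

pipe = pipeline("internlm/internlm2_5-7b-chat")
# Typical sampling knobs; adjust as needed.
gen_config = GenerationConfig(top_p=0.8, top_k=40, temperature=0.8, max_new_tokens=512)
response = pipe(["Hi, pls intro yourself"], gen_config=gen_config)
print(response)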

1M-long-context Inference

By enabling LMDeploy's Dynamic NTK feature, you can unlock long-context inference.

Note: 1M context length requires 4xA100-80G.

from lmdeploy import pipeline, GenerationConfig, TurbomindEngineConfig

backend_config = TurbomindEngineConfig(
        rope_scaling_factor=2.5,
        session_len=1048576,  # 1M context length
        max_batch_size=1,
        cache_max_entry_count=0.7,
        tp=4)  # 4xA100-80G.
pipe = pipeline('internlm/internlm2_5-7b-chat-1m', backend_config=backend_config)
prompt = 'Use a long prompt to replace this sentence'
response = pipe(prompt)
print(response)

Agent

InternLM2.5-Chat models have excellent tool-utilization capabilities and can handle function calls in a zero-shot manner. They also support analysis that gathers information from more than 100 web pages. See more examples in the agent section.
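The canonical function-calling protocol and ready-made examples are in the agent section (and in Lagent). Purely to illustrate the zero-shot idea, the hypothetical sketch below embeds a made-up tool description in the prompt and asks the chat model to emit a call; the schema and prompt wording here are invented for illustration and are not the official agent format:

import json

# Hypothetical tool schema, invented for illustration only; see the agent section
# of this repo (or Lagent) for the official function-calling format.
weather_tool = {
    "name": "get_weather",
    "description": "Query the current weather for a city.",
    "parameters": {"city": "string"},
}

prompt = (
    "You can call the following tool by replying with a JSON object of the form "
    '{"name": ..., "arguments": {...}}.\n'
    f"Tool: {json.dumps(weather_tool)}\n"
    "Question: What is the weather like in Shanghai today?"
)

# `model` and `tokenizer` are the InternLM2.5-7B-Chat objects loaded in the
# Transformers example above.
response, _ = model.chat(tokenizer, prompt, history=[])
print(response)  # ideally a JSON tool call such as {"name": "get_weather", ...}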

Fine-tuning

Please refer to finetune docs for fine-tuning with InternLM.

Note: We have migrated the entire training functionality of this project to InternEvo for a better user experience. InternEvo provides efficient pre-training and fine-tuning infrastructure for training InternLM.

Evaluation

We utilize OpenCompass for model evaluation. In InternLM2.5, we primarily focus on standard objective evaluation, long-context evaluation (needle in a haystack), data contamination assessment, agent evaluation, and subjective evaluation.

Objective Evaluation

To evaluate the InternLM model, please follow the guidelines in the OpenCompass tutorial. Typically, we use ppl for multiple-choice questions on the Base model and gen for all questions on the Chat model.

Long-Context Evaluation (Needle in a Haystack)

For the Needle in a Haystack evaluation, refer to the tutorial provided in the documentation. Feel free to try it out.

Data Contamination Assessment

To learn more about data contamination assessment, please check the contamination eval.

Agent Evaluation

  • To evaluate tool utilization, please refer to T-Eval.
  • For code interpreter evaluation, use the Math Agent Evaluation provided in the repository.

Subjective Evaluation

  • Please follow the tutorial for subjective evaluation.

Contribution

We appreciate all the contributors for their efforts to improve and enhance InternLM. Community users are highly encouraged to participate in the project. Please refer to the contribution guidelines for instructions on how to contribute to the project.

License

The code is licensed under Apache-2.0, while model weights are fully open for academic research and also allow free commercial usage. To apply for a commercial license, please fill in the application form (English)/申请表(中文). For other questions or collaborations, please contact [email protected].

Citation

@misc{cai2024internlm2,
      title={InternLM2 Technical Report},
      author={Zheng Cai and Maosong Cao and Haojiong Chen and Kai Chen and Keyu Chen and Xin Chen and Xun Chen and Zehui Chen and Zhi Chen and Pei Chu and Xiaoyi Dong and Haodong Duan and Qi Fan and Zhaoye Fei and Yang Gao and Jiaye Ge and Chenya Gu and Yuzhe Gu and Tao Gui and Aijia Guo and Qipeng Guo and Conghui He and Yingfan Hu and Ting Huang and Tao Jiang and Penglong Jiao and Zhenjiang Jin and Zhikai Lei and Jiaxing Li and Jingwen Li and Linyang Li and Shuaibin Li and Wei Li and Yining Li and Hongwei Liu and Jiangning Liu and Jiawei Hong and Kaiwen Liu and Kuikun Liu and Xiaoran Liu and Chengqi Lv and Haijun Lv and Kai Lv and Li Ma and Runyuan Ma and Zerun Ma and Wenchang Ning and Linke Ouyang and Jiantao Qiu and Yuan Qu and Fukai Shang and Yunfan Shao and Demin Song and Zifan Song and Zhihao Sui and Peng Sun and Yu Sun and Huanze Tang and Bin Wang and Guoteng Wang and Jiaqi Wang and Jiayu Wang and Rui Wang and Yudong Wang and Ziyi Wang and Xingjian Wei and Qizhen Weng and Fan Wu and Yingtong Xiong and Chao Xu and Ruiliang Xu and Hang Yan and Yirong Yan and Xiaogui Yang and Haochen Ye and Huaiyuan Ying and Jia Yu and Jing Yu and Yuhang Zang and Chuyu Zhang and Li Zhang and Pan Zhang and Peng Zhang and Ruijie Zhang and Shuo Zhang and Songyang Zhang and Wenjian Zhang and Wenwei Zhang and Xingcheng Zhang and Xinyue Zhang and Hui Zhao and Qian Zhao and Xiaomeng Zhao and Fengzhe Zhou and Zaida Zhou and Jingming Zhuo and Yicheng Zou and Xipeng Qiu and Yu Qiao and Dahua Lin},
      year={2024},
      eprint={2403.17297},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

internlm-xcomposer's People

Contributors

del-zhenwu, eltociear, jjjymmm, li-jinsong, lightdxy, lvhan028, myownskyw7, panzhang0212, sdc17, v3det, vansin, xiaoachen98, yeyimilk, yhcao6, yuhangzang


internlm-xcomposer's Issues

Failed to load 4-bit weights from HuggingFace

Description

Unable to load the quantized weights (4 bits) from HuggingFace

Code

The code is a direct copy from the file examples/example_chat_4bit_en.py

import torch
from transformers import AutoModel, AutoTokenizer

import auto_gptq
from auto_gptq.modeling import BaseGPTQForCausalLM

auto_gptq.modeling._base.SUPPORTED_MODELS = ["InternLMXComposer"]

torch.set_grad_enabled(False)


class InternLMXComposerQForCausalLM(BaseGPTQForCausalLM):
    layers_block_name = "internlm_model.model.layers"
    outside_layer_modules = [
        "query_tokens",
        "flag_image_start",
        "flag_image_end",
        "visual_encoder",
        "Qformer",
        "internlm_model.model.embed_tokens",
        "internlm_model.model.norm",
        "internlm_proj",
        "internlm_model.lm_head",
    ]
    inside_layer_modules = [
        ["self_attn.k_proj", "self_attn.v_proj", "self_attn.q_proj"],
        ["self_attn.o_proj"],
        ["mlp.gate_proj"],
        ["mlp.up_proj"],
        ["mlp.down_proj"],
    ]


# init model and tokenizer
model = InternLMXComposerQForCausalLM.from_quantized(
    "internlm/internlm-xcomposer-7b-4bit", trust_remote_code=True, device="cuda:0"
)
model = model.eval()
tokenizer = AutoTokenizer.from_pretrained(
    "internlm/internlm-xcomposer-7b-4bit", trust_remote_code=True
)
model.model.tokenizer = tokenizer

# example image
image = "examples/images/aiyinsitan.jpg"

# Multi-Turn Text-Image Dialogue
# 1st turn
text = 'Describe this image in detial.'
image = "examples/images/aiyinsitan.jpg"
response, history = model.chat(text, image)
print(f"User: {text}")
print(f"Bot: {response}") 
# The image features a black and white portrait of Albert Einstein, the famous physicist and mathematician. 
# Einstein is seated in the center of the frame, looking directly at the camera with a serious expression on his face. 
# He is dressed in a suit, which adds a touch of professionalism to his appearance. 

Error

Traceback (most recent call last):
  File "/mnt/bd/dev-pierre-oreistein-st/sandbox/test_internlm_vl/test_internlm_vl_4bits", line 35, in <module>
    model = InternLMXComposerQForCausalLM.from_quantized(
  File "/home/pierre/.pyenv/versions/dev3.9/lib/python3.9/site-packages/auto_gptq/modeling/_base.py", line 847, in from_quantized
    raise FileNotFoundError(f"Could not find a model in {model_name_or_path} with a name in {', '.join(searched_files)}. Please specify the argument model_basename to use a custom file name.")
FileNotFoundError: Could not find a model in internlm/internlm-xcomposer-7b-4bit with a name in gptq_model-4bit-128g.safetensors, model.safetensors. Please specify the argument model_basename to use a custom file name.

Ideas

According to this similar issue I need to specify the model file. However, I was unable to find it on HuggingFace. Could you help me with this?

Thanks in advance for your help!

Minimum GPU memory to run example_chat.py

Hello, I am interested in your work and curious about the minimum total GPU memory required to run example_chat.py for testing. I tried it on mine, which has 8GB of memory, clearly not enough. Can you show me the rough range for it?

example_demo code contains internal information

Hi, you seem to have left extra information at L51 and L52 in examples/web_demo.py:

 self.llm_model = AutoModel.from_pretrained('/mnt/petrelfs/share_data/dongxiaoyi/share_models/release_chat', trust_remote_code=True)
        tokenizer = AutoTokenizer.from_pretrained('/mnt/petrelfs/share_data/dongxiaoyi/share_models/release_chat', trust_remote_code=True)

You may want to fix these lines to avoid exposing internal paths.

Suggestion: make examples/web_demo.py more secure

Change the last lines in examples/web_demo.py to make it more secure; not everyone needs to expose the service to the public:

if __name__ == "__main__":
    demo.launch(share=True, server_name="0.0.0.0", server_port=11111)

should become:

demo.launch(share=False, server_name="127.0.0.1", server_port=11111)

AttributeError: 'InternLMXComposerTokenizer' object has no attribute 'sp_model'

After updating web_demo.py, this error occurred:

File "/home/enbo/.cache/huggingface/modules/transformers_modules/internlm-xcomposer/tokenization_InternLM_XComposer.py", line 106, in get_vocab
vocab = {self.convert_ids_to_tokens(i): i for i in range(self.vocab_size)}
File "/home/enbo/.cache/huggingface/modules/transformers_modules/internlm-xcomposer/tokenization_InternLM_XComposer.py", line 94, in vocab_size
return self.sp_model.get_piece_size()
AttributeError: 'InternLMXComposerTokenizer' object has no attribute 'sp_model'

InternLM-XComposer-VL-7B: the model's Chinese ability does not match the demo

import torch
from transformers import AutoModel, AutoTokenizer

torch.set_grad_enabled(False)
model_path = "internlm/internlm-xcomposer-vl-7b"
model = AutoModel.from_pretrained(model_path, trust_remote_code=True).cuda().eval()
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model.tokenizer = tokenizer

image = "./image/aiyinsitan.jpg"
text = '请问这张图片里面的人是谁?并介绍下他。'
response = model.generate(text, image)
print(response)

response: albert einstein

I tried many pictures, but the model's results are unsatisfactory and are mostly in English.

support for multiple GPU inference

Hello, I am interested in your work and I am curious about how to run internlm-xcomposer-7b in an environment that only contains 24GB GPUs. I am looking forward to a new version of inference code that supports multiple gpu inference.

Thank you

ModelScope urllib.error.URLError: <urlopen error [Errno 104] Connection reset by peer>

import torch
from modelscope import snapshot_download, AutoModel, AutoTokenizer

torch.set_grad_enabled(False)

# init model and tokenizer
model_dir = snapshot_download('Shanghai_AI_Laboratory/internlm-xcomposer-7b')
model = AutoModel.from_pretrained(model_dir, trust_remote_code=True).cuda().eval()
root@autodl-container-9e2911833c-01d8deff:~/autodl-tmp# python download.py 
2023-10-10 21:52:08,079 - modelscope - INFO - PyTorch version 1.11.0+cu113 Found.
2023-10-10 21:52:08,081 - modelscope - INFO - Loading ast index from /root/.cache/modelscope/ast_indexer
2023-10-10 21:52:08,119 - modelscope - INFO - Loading done! Current index file version is 1.9.2, with md5 1c9bf186d1e03088e5abfbd8664a1def and a total number of 941 components indexed
2023-10-10 21:52:08,686 - modelscope - WARNING - There is no version specified and there is no version in the model repository,use the master branch, which is fragile, please use it with caution!
2023-10-10 21:52:08,686 - modelscope - INFO - Model revision not specified, use revision: master
Init VIT ... Traceback (most recent call last):
  File "/root/miniconda3/lib/python3.8/urllib/request.py", line 1354, in do_open
    h.request(req.get_method(), req.selector, req.data, headers,
  File "/root/miniconda3/lib/python3.8/http/client.py", line 1252, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/root/miniconda3/lib/python3.8/http/client.py", line 1298, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/root/miniconda3/lib/python3.8/http/client.py", line 1247, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/root/miniconda3/lib/python3.8/http/client.py", line 1007, in _send_output
    self.send(msg)
  File "/root/miniconda3/lib/python3.8/http/client.py", line 947, in send
    self.connect()
  File "/root/miniconda3/lib/python3.8/http/client.py", line 1421, in connect
    self.sock = self._context.wrap_socket(self.sock,
  File "/root/miniconda3/lib/python3.8/ssl.py", line 500, in wrap_socket
    return self.sslsocket_class._create(
  File "/root/miniconda3/lib/python3.8/ssl.py", line 1040, in _create
    self.do_handshake()
  File "/root/miniconda3/lib/python3.8/ssl.py", line 1309, in do_handshake
    self._sslobj.do_handshake()
ConnectionResetError: [Errno 104] Connection reset by peer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "download.py", line 8, in <module>
    model = AutoModel.from_pretrained(model_dir, trust_remote_code=True).cuda().eval()
  File "/root/miniconda3/lib/python3.8/site-packages/modelscope/utils/hf_util.py", line 181, in from_pretrained
    module_obj = module_class.from_pretrained(model_dir, *model_args,
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py", line 560, in from_pretrained
    return model_class.from_pretrained(
  File "/root/miniconda3/lib/python3.8/site-packages/modelscope/utils/hf_util.py", line 78, in from_pretrained
    return ori_from_pretrained(cls, model_dir, *model_args, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/modeling_utils.py", line 3085, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/internlm-xcomposer-7b/modeling_InternLM_XComposer.py", line 43, in __init__
    self.visual_encoder = create_eva_vit_g()
  File "/root/.cache/huggingface/modules/transformers_modules/internlm-xcomposer-7b/modeling_vit.py", line 522, in create_eva_vit_g
    cached_file = download_cached_file(url, check_hash=False, progress=True)
  File "/root/.cache/huggingface/modules/transformers_modules/internlm-xcomposer-7b/modeling_utils.py", line 44, in download_cached_file
    timm_hub.download_cached_file(url, check_hash, progress)
  File "/root/miniconda3/lib/python3.8/site-packages/timm/models/_hub.py", line 85, in download_cached_file
    download_url_to_file(url, cached_file, hash_prefix, progress=progress)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/hub.py", line 457, in download_url_to_file
    u = urlopen(req)
  File "/root/miniconda3/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/root/miniconda3/lib/python3.8/urllib/request.py", line 525, in open
    response = self._open(req, data)
  File "/root/miniconda3/lib/python3.8/urllib/request.py", line 542, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "/root/miniconda3/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/root/miniconda3/lib/python3.8/urllib/request.py", line 1397, in https_open
    return self.do_open(http.client.HTTPSConnection, req,
  File "/root/miniconda3/lib/python3.8/urllib/request.py", line 1357, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 104] Connection reset by peer>
root@autodl-container-9e2911833c-01d8deff:~/autodl-tmp# 

Images are not generated

(xcomposer) ➜  InternLM-XComposer git:(main) ✗ python examples/web_demo.py
/cpfs01/user/huwenxing/miniconda/envs/xcomposer/lib/python3.10/site-packages/gradio/components/button.py:89: UserWarning: Using the update method is deprecated. Simply return a new object instead, e.g. `return gr.Button(...)` instead of `return gr.Button.update(...)`.
  warnings.warn(
Init VIT ... Done
Init Perceive Sampler ... Done
Init InternLM ... Done
Loading checkpoint sha
 load model done:  <class 'transformers_modules.internlm-xcomposer-7b.modeling_InternLM_XComposer.InternLMXComposerForCausalLM'>
/cpfs01/user/huwenxing/InternLM-XComposer/examples/web_demo.py:1009: GradioDeprecationWarning: The `style` method is deprecated. Please set these arguments in the constructor instead.
  chat_textbox = gr.Textbox(
Running on local URL:  http://0.0.0.0:11111
/cpfs01/user/huwenxing/miniconda/envs/xcomposer/lib/python3.10/site-packages/gradio/components/chatbot.py:161: UserWarning: Using the update method is deprecated. Simply return a new object instead, e.g. `return gr.Chatbot(...)` instead of `return gr.Chatbot.update(...)`.
  warnings.warn(
/cpfs01/user/huwenxing/miniconda/envs/xcomposer/lib/python3.10/site-packages/gradio/components/textbox.py:163: UserWarning: Using the update method is deprecated. Simply return a new object instead, e.g. `return gr.Textbox(...)` instead of `return gr.Textbox.update(...)`.
  warnings.warn(
/cpfs01/user/huwenxing/miniconda/envs/xcomposer/lib/python3.10/site-packages/gradio/components/button.py:89: UserWarning: Using the update method is deprecated. Simply return a new object instead, e.g. `return gr.Button(...)` instead of `return gr.Button.update(...)`.
  warnings.warn(
init
/cpfs01/user/huwenxing/miniconda/envs/xcomposer/lib/python3.10/site-packages/gradio/components/markdown.py:92: UserWarning: Using the update method is deprecated. Simply return a new object instead, e.g. `return gr.Markdown(...)` instead of `return gr.Markdown.update(...)`.
  warnings.warn(
/cpfs01/user/huwenxing/miniconda/envs/xcomposer/lib/python3.10/site-packages/gradio/components/gallery.py:143: UserWarning: Using the update method is deprecated. Simply return a new object instead, e.g. `return gr.Gallery(...)` instead of `return gr.Gallery.update(...)`.
  warnings.warn(
/cpfs01/user/huwenxing/miniconda/envs/xcomposer/lib/python3.10/site-packages/gradio/components/button.py:89: UserWarning: Using the update method is deprecated. Simply return a new object instead, e.g. `return gr.Button(...)` instead of `return gr.Button.update(...)`.
  warnings.warn(
/cpfs01/user/huwenxing/miniconda/envs/xcomposer/lib/python3.10/site-packages/gradio/components/textbox.py:163: UserWarning: Using the update method is deprecated. Simply return a new object instead, e.g. `return gr.Textbox(...)` instead of `return gr.Textbox.update(...)`.
  warnings.warn(
/cpfs01/user/huwenxing/miniconda/envs/xcomposer/lib/python3.10/site-packages/gradio/helpers.py:818: UserWarning: Using the update method is deprecated. Simply return a new object instead, e.g. `return gr.Textbox(...)` instead of `return gr.update(...)
  warnings.warn(
/cpfs01/user/huwenxing/miniconda/envs/xcomposer/lib/python3.10/asyncio/events.py:80: GradioUnusedKwargWarning: You have unused kwarg parameters in Button, please remove them: {'mode': 'static'}
  self._context.run(self._callback, *self._args)

Could not create share link. Missing file: /cpfs01/user/huwenxing/miniconda/envs/xcomposer/lib/python3.10/site-packages/gradio/frpc_linux_amd64_v0.2. 

Please check your internet connection. This can happen if your antivirus software blocks the download of this file. You can install manually by following these steps: 

1. Download this file: https://cdn-media.huggingface.co/frpc-gradio-0.2/frpc_linux_amd64
2. Rename the downloaded file to: frpc_linux_amd64_v0.2
3. Move the file to this location: /cpfs01/user/huwenxing/miniconda/envs/xcomposer/lib/python3.10/site-packages/gradio
<object object at 0x7fb765662e10>
Dunhuang, located in the northwest of Gansu Province at the western end of the Hexi Corridor, was an important transport hub and trading town on the ancient Silk Road. It has a rich historical and cultural heritage, including famous sights such as the Mogao Caves, Mingsha Mountain and Crescent Spring, and the Yadan "Devil City". Dunhuang is also one of China's famous historical and cultural cities, with a deep cultural heritage and unique folk customs.

**I. The Mogao Caves**

The Mogao Caves, also known as the "Thousand Buddha Caves", are one of China's four great grottoes. Construction began in the Former Qin period of the Sixteen Kingdoms, more than 1,600 years ago. They are the largest and richest surviving treasure house of Buddhist art in the world and have been called the "Louvre of the East". The Mogao Caves contain 735 caves, more than 45,000 square metres of murals, and over 5,000 painted Buddhist statues, making them one of the largest centres of Buddhist art in the world. Here, visitors can admire exquisite murals, sculptures and musical performances and experience the depth of Buddhist culture.

**II. Mingsha Mountain and Crescent Spring**

Mingsha Mountain and Crescent Spring are a natural wonder in the desert about 40 km northwest of Dunhuang. The terrain is flat and the rolling dunes form a vast desert landscape, with Crescent Spring nestled quietly within it; its water is crystal clear and shaped like a new moon, hence the name "Crescent Spring". At night, when the moon rises, crisp sounds can be heard around the spring, like music from the heavens, refreshing the spirit.

**III. The Yadan "Devil City"**

The Yadan "Devil City" is a typical wind-eroded landform on the Gobi about 100 km southwest of Dunhuang. The landscape here is strange, presenting a desolate, mysterious and eerie scene. Long exposure to wind, sun and rain has worn the rock surfaces into uneven shapes of every kind, some like animals, some like people, some like buildings, a marvel of nature's craftsmanship.

**IV. Other Sights**

Besides the Mogao Caves, Mingsha Mountain and Crescent Spring, and the Yadan Devil City, Dunhuang has many other sights worth visiting, such as Yumen Pass, Yang Pass, Suoyang City, and the ruins of the Han Great Wall. All of these have long histories and cultural value and attract many visitors.

**V. Local Food**

Dunhuang's local food is also very rich. The most famous is donkey-meat yellow noodles, a noodle dish made with donkey meat that is fragrant and delicious and loved by locals and tourists alike. There are also mutton paomo, braised lamb with flatbread, whole roast lamb, and other specialities that should not be missed.

**VI. Travel Tips**

1. Dunhuang's climate is dry with strong sunshine and ultraviolet radiation; visitors should take sun protection and carry sunscreen, a sun hat, and sunglasses. 2. Dunhuang is at a relatively high altitude; visitors should rest well and avoid strenuous exercise to prevent altitude sickness. 3. Dunhuang has many attractions; plan your itinerary in advance and allocate your time sensibly so you do not rush past the important sights. 4. While travelling in Dunhuang, protect the environment, do not litter, and do not damage cultural relics; be a civilized tourist. In short, Dunhuang is a city with a long history, deep culture, and beautiful scenery, well worth a visit. I hope this article helps you better understand Dunhuang and provides some useful information for your trip.


Running web_demo.py, the model fails to initialize

Traceback (most recent call last):
File "/home/batch/projects/InternLM-XComposer/examples/web_demo.py", line 816, in
demo_ui = Demo_UI()
File "/home/batch/projects/InternLM-XComposer/examples/web_demo.py", line 47, in init
self.llm_model = AutoModel.from_pretrained(
File "/home/batch/rt/InternLM-X/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 479, in from_pretrained
return model_class.from_pretrained(
File "/home/batch/rt/InternLM-X/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2675, in from_pretrained
model = cls(config, *model_args, **model_kwargs)
File "/home/batch/.cache/huggingface/modules/transformers_modules/internlm-xcomposer-vl-7b/modeling_InternLM_XComposer.py", line 49, in init
self.Qformer, self.query_tokens = self.init_qformer(
File "/home/batch/.cache/huggingface/modules/transformers_modules/internlm-xcomposer-vl-7b/modeling_InternLM_XComposer.py", line 122, in init_qformer
encoder_config = BertConfig.from_pretrained("bert-base-uncased")
File "/home/batch/rt/InternLM-X/lib/python3.10/site-packages/transformers/configuration_utils.py", line 547, in from_pretrained
config_dict, kwargs = cls.get_config_dict(pretrained_model_name_or_path, **kwargs)
File "/home/batch/rt/InternLM-X/lib/python3.10/site-packages/transformers/configuration_utils.py", line 574, in get_config_dict
config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
File "/home/batch/rt/InternLM-X/lib/python3.10/site-packages/transformers/configuration_utils.py", line 629, in _get_config_dict
resolved_config_file = cached_file(
File "/home/batch/rt/InternLM-X/lib/python3.10/site-packages/transformers/utils/hub.py", line 452, in cached_file
raise EnvironmentError(
OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like bert-base-uncased is not the path to a directory containing a file named config.json.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.

No module named 'transformers_modules.internlm/internlm-xcomposer-7b'

I downloaded internlm-xcomposer-7b and placed it under internlm/internlm-xcomposer-7b, but I get the following error:

PS E:\InternLM-XComposer> python .\examples\web_demo.py
Traceback (most recent call last):
File "E:\cnai\InternLM-XComposer\examples\web_demo.py", line 816, in
demo_ui = Demo_UI()
File "E:\cnai\InternLM-XComposer\examples\web_demo.py", line 47, in init
self.llm_model = AutoModel.from_pretrained(
File "C:\Python\Python310\lib\site-packages\transformers\models\auto\auto_factory.py", line 456, in from_pretrained
config, kwargs = AutoConfig.from_pretrained(
File "C:\Python\Python310\lib\site-packages\transformers\models\auto\configuration_auto.py", line 953, in from_pretrained
config_class = get_class_from_dynamic_module(class_ref, pretrained_model_name_or_path, **kwargs)
File "C:\Python\Python310\lib\site-packages\transformers\dynamic_module_utils.py", line 443, in get_class_from_dynamic_module
return get_class_in_module(class_name, final_module.replace(".py", ""))
File "C:\Python\Python310\lib\site-packages\transformers\dynamic_module_utils.py", line 164, in get_class_in_module
module = importlib.import_module(module_path)
File "C:\Python\Python310\lib\importlib_init_.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "", line 1050, in _gcd_import
File "", line 1027, in _find_and_load
File "", line 992, in _find_and_load_unlocked
File "", line 241, in _call_with_frames_removed
File "", line 1050, in _gcd_import
File "", line 1027, in _find_and_load
File "", line 1004, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'transformers_modules.internlm/internlm-xcomposer-7b'

Why do I get this problem when I use the model to generate?

../aten/src/ATen/native/cuda/Indexing.cu:1239: indexSelectSmallIndex: block: [6,0,0], thread: [55,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1239: indexSelectSmallIndex: block: [6,0,0], thread: [56,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1239: indexSelectSmallIndex: block: [6,0,0], thread: [57,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1239: indexSelectSmallIndex: block: [6,0,0], thread: [58,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1239: indexSelectSmallIndex: block: [6,0,0], thread: [59,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1239: indexSelectSmallIndex: block: [6,0,0], thread: [60,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1239: indexSelectSmallIndex: block: [6,0,0], thread: [61,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1239: indexSelectSmallIndex: block: [6,0,0], thread: [62,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1239: indexSelectSmallIndex: block: [6,0,0], thread: [63,0,0] Assertion srcIndex < srcSelectDimSize failed.

Where might the problem be? Thanks!

AttributeError: 'InternLMXComposerTokenizer' object has no attribute 'sp_model'

The code

model = model.eval()
tokenizer = AutoTokenizer.from_pretrained(
    "../internlm-xcomposer-7b-4bit", trust_remote_code=True
)
model.model.tokenizer = tokenizer

met error

File "/home/root/.cache/huggingface/modules/transformers_modules/internlm-xcomposer-7b-4bit/tokenization_InternLM_XComposer.py", line 94, in vocab_size
    return self.sp_model.get_piece_size()
AttributeError: 'InternLMXComposerTokenizer' object has no attribute 'sp_model'

OCR support?

Is it possible to make it work with OCR capability?

CUDA Out of Memory in Multi-GPU Inference

import torch
from transformers import AutoModel, AutoTokenizer
import argparse

def auto_configure_device_map(num_gpus):
    # visual_encoder counts as 4 layers
    # internlm_model.model.embed_tokens takes 1 layer
    # norm and lm_head take 1 layer
    # transformer.layers take 32 layers
    # 34 layers in total, distributed across num_gpus GPUs
    num_trans_layers = 32
    per_gpu_layers = 38 / num_gpus

    device_map = {
        'visual_encoder': 0,
        'ln_vision': 0,
        'Qformer': 0,
        'internlm_model.model.embed_tokens': 0,
        'internlm_model.model.norm': 0,
        'internlm_model.lm_head': 0,
        'query_tokens': 0,
        'flag_image_start': 0,
        'flag_image_end': 0,
        'internlm_proj.weight': 0,
        'internlm_proj.bias': 0,
    }

    # device_map = {key: 0 for key in device_map.keys()}
    
    used = 6
    gpu_target = 0
    for i in range(num_trans_layers):
        if used >= per_gpu_layers:
            gpu_target += 1
            used = 0
        assert gpu_target < num_gpus
        device_map[f'internlm_model.model.layers.{i}'] = gpu_target
        used += 1

    return device_map

torch.set_grad_enabled(False)

parser = argparse.ArgumentParser()
parser.add_argument("--num_gpus", default=4, type=int)
args = parser.parse_args()

# init model and tokenizer
model = AutoModel.from_pretrained('internlm/internlm-xcomposer-vl-7b', trust_remote_code=True, cache_dir='/storage/internLM/').cuda().eval()
if args.num_gpus > 1:
    from accelerate import dispatch_model
    device_map = auto_configure_device_map(args.num_gpus)
    model = dispatch_model(model, device_map=device_map)

tokenizer = AutoTokenizer.from_pretrained('internlm/internlm-xcomposer-vl-7b', trust_remote_code=True, cache_dir='/storage/internLM/')
model.tokenizer = tokenizer


# example image
image = 'examples/images/aiyinsitan.jpg'

# Single-Turn Pure-Text Dialogue
text = 'Please introduce Einstein.'
with torch.no_grad():
    with model.maybe_autocast():
        response = model.generate(text)
print(response)

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)

An example with Transformers to generate text + images

Hi,

I find the examples for interacting with images with Transformers (VQA, etc.) very interesting. However, how can we actually generate text + images (with the right history context) with HF transformers?

I cannot see an example of this nice feature of your work.

Thanks.

Can't it generate an image on its own?

I am calling the internlm-xcomposer-7b model and entered the following commands:

>>> text='请帮我画一张长城的照片'
>>> response, history = model.chat(text=text, image=None, history=None)
>>> print(response)
I'm sorry, but as a language model I do not have drawing abilities and cannot draw a picture of the Great Wall for you. However, if you would like, I can provide some materials and information about the Great Wall to help you better understand this great structure.
>>>

Can it only generate illustrated articles, and not standalone images?

Running the internlm-xcomposer-7b model in VS Code, it cannot generate results with both text and images

User: Write a popular science article about “Unraveling the Mysteries of Black Holes: A Scientific Overview” with pictures and illustrations.
Bot: I'm sorry, but as an AI language model, I don't have the capability to create visual content such as pictures and illustrations. However, I can provide you with a text-based summary of the popular science article about "Unraveling the Mysteries of Black Holes: A Scientific Overview".

As shown above, I used the prompt mentioned in the paper but did not get the expected result. I don't know why.

Error when training the LoRA part

I want to do LoRA training: I froze the other parts and kept only lora_A and lora_B trainable, but backpropagation raises an error.
# Freeze the parameters of visual_encoder, ln_vision and internlm_model
for param in model.visual_encoder.parameters():
    param.requires_grad = False

for param in model.ln_vision.parameters():
    param.requires_grad = False

for param in model.Qformer.parameters():
    param.requires_grad = False

for param in model.internlm_model.parameters():
    param.requires_grad = False

# Unfreeze the lora_A and lora_B parameters that need to be trained
for name, param in model.named_parameters():
    if "lora_A" in name or "lora_B" in name:
        param.requires_grad = True

Training code:
input_ids = data['input_ids'].to(device, dtype=torch.long)
labels = data['labels'].to(device, dtype=torch.long)
attention_mask = data['attention_mask'].to(device, dtype=torch.long)
outputs = model.internlm_model(
    input_ids=input_ids,
    labels=labels,
    attention_mask=attention_mask
)
loss = outputs.loss
# Backpropagate to compute the current gradients
loss.backward()

The error is as follows:
Traceback (most recent call last):
File "/data/xinyuuliu/InternLM-XComposer/train_model/train.py", line 190, in
main()
File "/data/xinyuuliu/InternLM-XComposer/train_model/train.py", line 175, in main
train(epoch, model, device, training_loader, optimizer, gradient_accumulation_steps,model_output_dir)
File "/data/xinyuuliu/InternLM-XComposer/train_model/train.py", line 55, in train
loss.backward()
File "/root/miniconda3/envs/internLM/lib/python3.9/site-packages/torch/_tensor.py", line 487, in backward
torch.autograd.backward(
File "/root/miniconda3/envs/internLM/lib/python3.9/site-packages/torch/autograd/init.py", line 200, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "/root/miniconda3/envs/internLM/lib/python3.9/site-packages/torch/autograd/function.py", line 274, in apply
return user_fn(self, *args)
File "/root/.cache/huggingface/modules/transformers_modules/internlm-xcomposer-7b/modeling_InternLM.py", line 80, in backward
rotary_emb.apply_rotary(dq1, dq2, rearrange(cos[:seqlen], 's d -> s 1 d'),
NameError: name 'rotary_emb' is not defined

Error when clicking the "Insert a fixed number of Images" button

Traceback (most recent call last):
File "C:\Python\Python310\lib\site-packages\gradio\queueing.py", line 388, in call_prediction
output = await route_utils.call_process_api(
File "C:\Python\Python310\lib\site-packages\gradio\route_utils.py", line 219, in call_process_api
output = await app.get_blocks().process_api(
File "C:\Python\Python310\lib\site-packages\gradio\blocks.py", line 1437, in process_api
result = await self.call_function(
File "C:\Python\Python310\lib\site-packages\gradio\blocks.py", line 1109, in call_function
prediction = await anyio.to_thread.run_sync(
File "C:\Python\Python310\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "C:\Python\Python310\lib\site-packages\anyio_backends_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "C:\Python\Python310\lib\site-packages\anyio_backends_asyncio.py", line 867, in run
result = context.run(func, *args)
File "C:\Python\Python310\lib\site-packages\gradio\utils.py", line 650, in wrapper
response = f(*args, **kwargs)
File "E:\ai\InternLM-XComposer\examples\web_demo.py", line 468, in adjust_img
caps = self.generate_loc_cap(idx_text_sections, int(img_num), progress)
File "E:\ai\InternLM-XComposer\examples\web_demo.py", line 177, in generate_loc_cap
inject_text, locs = self.generate_loc(text_sections, image_num,
File "E:\ai\InternLM-XComposer\examples\web_demo.py", line 132, in generate_loc
for _ in progress.tqdm([1], desc="image spotting"):
TypeError: Progress.tqdm() missing 1 required positional argument: 'iterable'

Error when running web_demo.py

I downloaded the latest code and got the following error when running it:
being examined by the vet, who is taking its temperature.', 10: 'A pet dog cleaning up its own droppings with its owner's help, with the owner guiding it.', 12: 'A pet dog grooming its own fur with its owner's help, with the owner guiding it.'}
https://static.openxlab.org.cn/lingbi/jpg-images/61ec717e9ee8ffd984f79d01838de29e352b6aa9b9a04bb60e56a92f00fa72db.jpg
download image with url
image downloaded
https://static.openxlab.org.cn/lingbi/jpg-images/78668d1138f169a78284213bb2df7991cf77ff48bbdb6f938a3536d238ccbdc7.jpg
download image with url
image downloaded
https://static.openxlab.org.cn/lingbi/jpg-images/9db1b1ecdfc698526459d4bb519ebd1dfc3b9be9e4983ff8b862dd970257793a.jpg
download image with url
image downloaded
https://static.openxlab.org.cn/lingbi/jpg-images/2f3ce59b613d2b7b989a919819ef1aed67dcd56d6381b0ea2af68972c700c367.jpg
download image with url
image downloaded
https://static.openxlab.org.cn/lingbi/jpg-images/3e4914997caf27f88b1219070586a7aa9ce79c6afacf824a640396abab216230.jpg
download image with url
image downloaded
https://static.openxlab.org.cn/lingbi/jpg-images/41eb2c71b25921e4ccb423df6e402caf74d71bc981b9bd699a44bc2d66ec0524.jpg
download image with url
image downloaded
model_select_image
Traceback (most recent call last):
File "/home/enbo/anaconda3/envs/llama2-accessory/lib/python3.10/site-packages/gradio/queueing.py", line 388, in call_prediction
output = await route_utils.call_process_api(
File "/home/enbo/anaconda3/envs/llama2-accessory/lib/python3.10/site-packages/gradio/route_utils.py", line 219, in call_process_api
output = await app.get_blocks().process_api(
File "/home/enbo/anaconda3/envs/llama2-accessory/lib/python3.10/site-packages/gradio/blocks.py", line 1437, in process_api
result = await self.call_function(
File "/home/enbo/anaconda3/envs/llama2-accessory/lib/python3.10/site-packages/gradio/blocks.py", line 1123, in call_function
prediction = await utils.async_iteration(iterator)
File "/home/enbo/anaconda3/envs/llama2-accessory/lib/python3.10/site-packages/gradio/utils.py", line 512, in async_iteration
return await iterator.anext()
File "/home/enbo/anaconda3/envs/llama2-accessory/lib/python3.10/site-packages/gradio/utils.py", line 505, in anext
return await anyio.to_thread.run_sync(
File "/home/enbo/anaconda3/envs/llama2-accessory/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/home/enbo/anaconda3/envs/llama2-accessory/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "/home/enbo/anaconda3/envs/llama2-accessory/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 807, in run
result = context.run(func, *args)
File "/home/enbo/anaconda3/envs/llama2-accessory/lib/python3.10/site-packages/gradio/utils.py", line 488, in run_sync_iterator_async
return next(iterator)
File "/home/enbo/anaconda3/envs/llama2-accessory/lib/python3.10/site-packages/gradio/utils.py", line 638, in gen_wrapper
yield from f(*args, **kwargs)
File "/data/liwx/InternLM-XComposer-main/InternLM-XComposer-main/examples/web_demo.py", line 444, in generate_article
self.selected = self.model_select_image(output_text, caps,
File "/data/liwx/InternLM-XComposer-main/InternLM-XComposer-main/examples/web_demo.py", line 299, in model_select_image
pre_img.append(images[len(pre_img) + ans2idx[answer]].cpu())
KeyError: '<'

The quantized model internlm-xcomposer-7b-4bit fails to run

import torch
from modelscope import snapshot_download, AutoModel, AutoTokenizer
import os

torch.set_grad_enabled(False)

# init model and tokenizer
model_dir = snapshot_download('Shanghai_AI_Laboratory/internlm-xcomposer-7b-4bit', revision = 'master')
model = AutoModel.from_pretrained(model_dir, trust_remote_code=True).cuda().eval()
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
model.tokenizer = tokenizer


What is the difference between InternConvertedInternLMAttention and InternLMAttention?

InternLMAttention is used in huggingface: https://huggingface.co/internlm/internlm-chat-7b/blob/main/modeling_internlm.py#L257
InternConvertedInternLMAttention is used in this repo: https://github.com/InternLM/InternLM-XComposer/blob/main/huggingface/internlm-xcomposer/modeling_InternLM.py#L732

I set intern_converted_llm to false and found that the results were all wrong. What is the difference between InternConvertedInternLMAttention and InternLMAttention?

No response using model.chat

Hi, I'm using InternLM-XComposer to generate some data. I have tried your demo, and it works fine when I use model.generate.
But when I use model.chat(), the model only replies to the first call; subsequent calls are unresponsive and return an empty string.

I'm using:
torch==2.0.1
transformers==4.33.2

My hardware is a single 3090 with 24G GPU memory, so I use the 4-bit quantized models; I tried your examples/example_chat_4bit.py and found this issue.

Is this problem caused by my environment or by something else?

How can I install `rotary_emb`?

from transformers import AutoModel, AutoTokenizer
model = AutoModel.from_pretrained('/root/autodl-tmp/models/internlm7bxc', trust_remote_code=True).cuda().eval()
Traceback (most recent call last):
File "", line 1, in
File "/root/miniconda3/envs/llm_chat/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 550, in from_pretrained
model_class = get_class_from_dynamic_module(
File "/root/miniconda3/envs/llm_chat/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 497, in get_class_from_dynamic_module
return get_class_in_module(class_name, final_module.replace(".py", ""))
File "/root/miniconda3/envs/llm_chat/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 199, in get_class_in_module
module = importlib.import_module(module_path)
File "/root/miniconda3/envs/llm_chat/lib/python3.10/importlib/init.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "", line 1050, in _gcd_import
File "", line 1027, in _find_and_load
File "", line 1006, in _find_and_load_unlocked
File "", line 688, in _load_unlocked
File "", line 883, in exec_module
File "", line 241, in _call_with_frames_removed
File "/root/.cache/huggingface/modules/transformers_modules/internlm7bxc/modeling_InternLM_XComposer.py", line 18, in
from .modeling_InternLM import *
File "/root/.cache/huggingface/modules/transformers_modules/internlm7bxc/modeling_InternLM.py", line 5, in
import rotary_emb
