
Index-1.9B

README_zh | Online: Chat and Role-playing | QQ: QQ Group

Recent Updates

  1. Adapted to llama.cpp and Ollama; see Index-1.9B-Chat-GGUF. A quick-start sketch follows below.
  2. Open-sourced the checkpoint taken before the learning-rate decay phase for research; see Index-1.9B-Constant-LR.
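For a quick local test of the GGUF weights, something like the following works (a sketch, assuming a recent llama.cpp build with the llama-cli binary; the exact GGUF filename depends on which quantization you download):

./llama-cli -m Index-1.9B-Chat-Q4_K_M.gguf -p "续写 天不生我金坷垃"

or, via the community Ollama package listed under Extended Works:

ollama run milkey/bilibili-index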

Model Introduction

The Index-1.9B series is the lightweight branch of the Index model family and includes the following models:

  • Index-1.9B base: base model with 1.9 billion non-embedding parameters, pre-trained on a 2.8T-token corpus that is mainly Chinese and English; it leads multiple evaluation benchmarks among models of comparable size.
  • Index-1.9B pure: a control version of the base model with the same parameters and training strategy, but with all instruction-related data strictly filtered out of the corpus, to verify the impact of instruction data on benchmarks.
  • Index-1.9B chat: a dialogue model aligned with SFT and DPO on top of the Index-1.9B base. Because the pre-training corpus includes a large amount of internet community text, the model chats in a noticeably more engaging way than peers of the same size and shows strong multilingual translation ability, especially for East Asian languages.
  • Index-1.9B character: adds RAG on top of SFT and DPO to support few-shot role-playing customization.

Evaluation Results

| Model | Average score | Average English score | MMLU | CEVAL | CMMLU | HellaSwag | Arc-C | Arc-E |
|---|---|---|---|---|---|---|---|---|
| Google Gemma 2B | 41.58 | 46.77 | 41.81 | 31.36 | 31.02 | 66.82 | 36.39 | 42.07 |
| Phi-2 (2.7B) | 58.89 | 72.54 | 57.61 | 31.12 | 32.05 | 70.94 | 74.51 | 87.1 |
| Qwen1.5-1.8B | 58.96 | 59.28 | 47.05 | 59.48 | 57.12 | 58.33 | 56.82 | 74.93 |
| Qwen2-1.5B (report) | 65.17 | 62.52 | 56.5 | 70.6 | 70.3 | 66.6 | 43.9 | 83.09 |
| MiniCPM-2.4B-SFT | 62.53 | 68.75 | 53.8 | 49.19 | 50.97 | 67.29 | 69.44 | 84.48 |
| Index-1.9B-Pure | 50.61 | 52.99 | 46.24 | 46.53 | 45.19 | 62.63 | 41.97 | 61.1 |
| Index-1.9B | 64.92 | 69.93 | 52.53 | 57.01 | 52.79 | 80.69 | 65.15 | 81.35 |
| Llama2-7B | 50.79 | 60.31 | 44.32 | 32.42 | 31.11 | 76 | 46.3 | 74.6 |
| Mistral-7B (report) | / | 69.23 | 60.1 | / | / | 81.3 | 55.5 | 80 |
| Baichuan2-7B | 54.53 | 53.51 | 54.64 | 56.19 | 56.95 | 25.04 | 57.25 | 77.12 |
| Llama2-13B | 57.51 | 66.61 | 55.78 | 39.93 | 38.7 | 76.22 | 58.88 | 75.56 |
| Baichuan2-13B | 68.90 | 71.69 | 59.63 | 59.21 | 61.27 | 72.61 | 70.04 | 84.48 |
| MPT-30B (report) | / | 63.48 | 46.9 | / | / | 79.9 | 50.6 | 76.5 |
| Falcon-40B (report) | / | 68.18 | 55.4 | / | / | 83.6 | 54.5 | 79.2 |

Evaluation code is based on OpenCompass with compatibility modifications. See the evaluate folder for details.
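For reference, a typical stock-OpenCompass invocation looks like the sketch below. This assumes the upstream OpenCompass CLI; the modified copy in the evaluate folder may expose different flags or dataset names:

python run.py --datasets mmlu_gen ceval_gen \
    --hf-path ./IndexTeam/Index-1.9B-Chat \
    --model-kwargs trust_remote_code=True \
    --batch-size 8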

Model Download

| HuggingFace | ModelScope |
|---|---|
| 🤗 Index-1.9B-Chat | Index-1.9B-Chat |
| 🤗 Index-1.9B-Character (Role-playing) | Index-1.9B-Character (Role-playing) |
| 🤗 Index-1.9B-Base | Index-1.9B-Base |
| 🤗 Index-1.9B-Base-Pure | Index-1.9B-Base-Pure |

Usage Instructions

Environment Setup

  1. Download this repository:
git clone https://github.com/bilibili/Index-1.9B
cd Index-1.9B
  2. Install dependencies using pip:
pip install -r requirements.txt

Loading with Transformers

You can load the Index-1.9B-Chat model for dialogue using the following code:

import argparse
from transformers import AutoTokenizer, pipeline

# Note: a local model directory name must not contain "." (it breaks the
# dynamic module import used by trust_remote_code); replace "." with "_",
# e.g. Index-1_9B-Chat.
parser = argparse.ArgumentParser()
parser.add_argument('--model_path', default="./IndexTeam/Index-1.9B-Chat/", type=str, help="path to the model directory")
parser.add_argument('--device', default="cpu", type=str, help='"cpu", "cuda", or "mps" on Apple silicon')
args = parser.parse_args()

tokenizer = AutoTokenizer.from_pretrained(args.model_path, trust_remote_code=True)
generator = pipeline("text-generation",
                     model=args.model_path,
                     tokenizer=tokenizer,
                     trust_remote_code=True,
                     device=args.device)

system_message = "你是由哔哩哔哩自主研发的大语言模型,名为“Index”。你能够根据用户传入的信息,帮助用户完成指定的任务,并生成恰当的、符合要求的回复。"
query = "续写 天不生我金坷垃"
model_input = []
model_input.append({"role": "system", "content": system_message})
model_input.append({"role": "user", "content": query})

model_output = generator(model_input, max_new_tokens=300, top_k=5, top_p=0.8,
                         temperature=0.3, repetition_penalty=1.1, do_sample=True)

print('User:', query)
print('Model:', model_output)
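The pipeline echoes the whole conversation. With chat-style message input, recent transformers versions return the updated message list in generated_text, so the assistant's reply can be extracted like this (a sketch, assuming that output format; older versions may return a plain string instead):

reply = model_output[0]["generated_text"][-1]["content"]
print('Model:', reply)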

Web Demo

Depends on Gradio, install with:

pip install gradio==4.29.0

Start a web server with the following command. After opening the access address in a browser, you can chat with the Index-1.9B-Chat model:

python demo/web_demo.py --port='port' --model_path='/path/to/model/'

Terminal Demo

Start a terminal demo with the following command to chat with the Index-1.9B-Chat model:

python demo/cli_demo.py  --model_path='/path/to/model/'

OpenAI API Demo

Depends on Flask, install with:

pip install flask==2.2.5

Start a Flask API server with the following command:

python demo/openai_demo.py --model_path='/path/to/model/'

You can then chat with it from the command line:

curl http://127.0.0.1:8010/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
    "messages": [
    {"role": "system", "content": "你是由哔哩哔哩自主研发的大语言模型,名为“Index”。你能够根据用户传入的信息,帮助用户完成指定的任务,并生成恰当的、符合要求的回复。"},
    {"role": "user", "content": "花儿为什么这么红?"}
    ]
    }'
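Equivalently, the endpoint can be called from Python with the official OpenAI client. A minimal sketch; the model field below is an assumption, since the Flask demo may ignore or not validate it:

from openai import OpenAI

# Any api_key works for a local server that does no authentication.
client = OpenAI(base_url="http://127.0.0.1:8010/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="index",  # assumed placeholder; the demo may ignore this field
    messages=[
        {"role": "system", "content": "你是由哔哩哔哩自主研发的大语言模型,名为“Index”。你能够根据用户传入的信息,帮助用户完成指定的任务,并生成恰当的、符合要求的回复。"},
        {"role": "user", "content": "花儿为什么这么红?"}
    ]
)
print(response.choices[0].message.content)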

Index-1.9B-Chat Output Examples

  • Below are some example Index-1.9B-Chat outputs produced with web_demo.py.
  • Changing the System Message to role-play a stereotypical Bilibili user.
  • Translating Chinese to Japanese.
  • Translating Japanese to Chinese.

Role Playing

We have also open-sourced the role-playing model together with its supporting framework.

  • The character 三三 is currently built in.
  • To create your own character, prepare a dialogue corpus in the same format as roleplay/character/三三.csv (the file name must match the name of the character you want to create) along with a corresponding character description, then click 生成角色 (Generate Character) to create it; see the sketch after this section.
  • Once a character has been created, enter its name in the Role name field, type your query, and click submit to start the conversation.

For detailed usage, please refer to the roleplay folder.
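To bootstrap such a corpus file programmatically, something like the sketch below can be used. This is a hypothetical illustration only: the authoritative column layout is whatever roleplay/character/三三.csv contains, so inspect that file and mirror its schema:

import csv

# Hypothetical (speaker, utterance) layout -- NOT the verified schema;
# copy the actual structure of roleplay/character/三三.csv instead.
# The file name must match the character's name.
rows = [
    ("user", "你好呀"),
    ("MyCharacter", "你好,很高兴认识你!"),
]
with open("roleplay/character/MyCharacter.csv", "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows(rows)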

Quantization

Depends on bitsandbytes, installation command:

pip install bitsandbytes==0.43.0

You can use the following script to perform int4 quantization, which incurs only a small quality loss and further reduces GPU memory usage.

import torch
import argparse
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    TextIteratorStreamer,
    GenerationConfig,
    BitsAndBytesConfig
)

parser = argparse.ArgumentParser()
parser.add_argument('--model_path', default="", type=str, help="")
parser.add_argument('--save_model_path', default="", type=str, help="")
args = parser.parse_args()

tokenizer = AutoTokenizer.from_pretrained(args.model_path, trust_remote_code=True)
# 4-bit NF4 quantization with double quantization; computation runs in fp16.
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    llm_int8_threshold=6.0,
    llm_int8_has_fp16_weight=False,
)
model = AutoModelForCausalLM.from_pretrained(args.model_path, 
                                             device_map="auto",
                                             torch_dtype=torch.float16,
                                             quantization_config=quantization_config,
                                             trust_remote_code=True)
# Persist the quantized weights and tokenizer for later reuse.
model.save_pretrained(args.save_model_path)
tokenizer.save_pretrained(args.save_model_path)
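Because save_pretrained stores the quantization settings in the checkpoint's config, the int4 model can later be reloaded without re-specifying them; a minimal sketch under that assumption:

from transformers import AutoModelForCausalLM, AutoTokenizer

save_model_path = "./Index-1_9B-Chat-int4"  # wherever the script above saved to (no "." in the name)

# The 4-bit settings are read back from the saved config.json.
model = AutoModelForCausalLM.from_pretrained(save_model_path,
                                             device_map="auto",
                                             trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(save_model_path, trust_remote_code=True)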

Fine-tuning

Follow the steps in the fine-tuning tutorial to quickly fine-tune the Index-1.9B-Chat model. Give it a try and customize your own Index model!
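For orientation, here is a minimal LoRA sketch using transformers + peft. The linked tutorial is authoritative; the target module names below are an assumption based on a LLaMA-style layout, and train.json is a hypothetical data file:

import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_path = "./IndexTeam/Index-1.9B-Chat/"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16,
                                             trust_remote_code=True)

# Assumed LLaMA-style projection names; verify against model.named_modules().
lora = LoraConfig(r=8, lora_alpha=32, lora_dropout=0.1,
                  target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)

# train.json is a hypothetical file of {"text": ...} records.
ds = load_dataset("json", data_files="train.json")["train"]
ds = ds.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=1024),
            remove_columns=ds.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments("index-1.9b-lora", per_device_train_batch_size=4,
                           num_train_epochs=3, learning_rate=1e-4,
                           fp16=True, logging_steps=10),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("index-1.9b-lora")  # saves only the LoRA adapter weights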

Limitations and Disclaimer

Index-1.9B may generate inaccurate, biased, or otherwise objectionable content in certain situations. The model cannot understand, express personal opinions, or make value judgments. Its outputs do not represent the views and positions of the model developers. Therefore, please use the generated content with caution. Users should independently evaluate and verify the content generated by the model and should not disseminate harmful content. Developers should conduct safety tests and fine-tuning according to specific applications before deploying any related applications.

We strongly advise against using these models to create or disseminate harmful information or engage in activities that may harm public, national, or social security or violate regulations. Do not use the models for internet services without proper safety review and filing. We have made every effort to ensure the compliance of the training data, but due to the complexity of the model and data, unforeseen issues may still exist. We will not be held responsible for any problems arising from the use of these models, whether related to data security, public opinion risks, or any risks and issues caused by misunderstanding, misuse, dissemination, or non-compliant use of the model.

Model Open Source License

Use of the source code in this repository requires compliance with the Apache-2.0 license. Use of the Index-1.9B model weights requires compliance with the INDEX_MODEL_LICENSE.

The Index-1.9B model weights are fully open to academic research and free for commercial use.

Citation

If you find our work helpful, please feel free to cite it!

@article{Index,
  title={Index1.9B Technical Report},
  year={2024}
}

Extended Works

libllm: https://github.com/ling0322/libllm/blob/main/examples/python/run_bilibili_index.py

chatllm.cpp: https://github.com/foldl/chatllm.cpp/blob/master/docs/rag.md#role-play-with-rag

ollama: https://ollama.com/milkey/bilibili-index

self-llm: https://github.com/datawhalechina/self-llm/blob/master/bilibili_Index-1.9B/04-Index-1.9B-Chat%20Lora%20微调.md


Index-1.9B Issues

Are there plans to open-source the base model from before annealing starts?

I would like to add high-quality downstream data and custom SFT data during the annealing (decay) phase of the WSD schedule. It would be even better if the pre-training data used in the annealing phase were also open-sourced, for mixing with custom data during annealing 🙏

roleplay fails to run

I downloaded the following models locally and modified config/config.json:

bge-large-zh-v1.5
Index-1.9B-Character

After doing that, running python hf_based_demo.py produces the following error:

/home/allen/miniconda3/envs/index/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Traceback (most recent call last):
  File "/home/allen/Projects/Index-1.9B/roleplay/hf_based_demo.py", line 27, in <module>
    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
  File "/home/allen/miniconda3/envs/index/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 719, in from_pretrained
    tokenizer_class = get_class_from_dynamic_module(class_ref, pretrained_model_name_or_path, **kwargs)
  File "/home/allen/miniconda3/envs/index/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 497, in get_class_from_dynamic_module
    return get_class_in_module(class_name, final_module.replace(".py", ""))
  File "/home/allen/miniconda3/envs/index/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 199, in get_class_in_module
    module = importlib.import_module(module_path)
  File "/home/allen/miniconda3/envs/index/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  ... (frozen importlib._bootstrap frames repeated) ...
ModuleNotFoundError: No module named 'transformers_modules.IndexTeam.Index-1'

Is RAG actually being applied?

Hello, I am a newcomer to large models and am currently learning RAG, so I read the code of index_play.py and realtime_chat.py in the roleplay folder carefully. I have a question I would like to ask.

self.generate_with_question(self.history)
self.create_datasets()

These two lines seem to do nothing more than generate the files under character: when I comment them out and rely only on the 三三_测试问题.json file, the rest of the code still runs. Next,

json_data = load_json(f"{self.save_samples_dir}/{self.role_name}_测试问题.json")

the file read here looks like it is just an ordinary prompt with some chat phrasing added. And then

inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, repetition_penalty=1.1, max_length=2048)

these two lines talk to the model; when I print text, it is exactly the content of the _测试问题.json file.

Can I conclude that this model does not truly apply RAG, but instead interacts with the model through prompt templates that embed some chat phrasing?

Feedback and Inquiry Regarding Index-1.9B

Dear Index Team,

I hope this message finds you well. Your contributions are highly valuable to the community, and I appreciate the effort you have put into this project.

I was wondering if you have considered comparing your model to other similar works such as CT-LLM, and MAP-Neo (which has also released a 2B size model). These models, like yours, contribute precious training resources to the open-source bilingual language model community. A comparison could provide interesting insights and further highlight the strengths of your model.

Thank you once again for your outstanding contributions. I look forward to seeing your future advancements in this area.

CT-LLM: Chinese Tiny LLM (chinese-tiny-llm.github.io)
MAP-Neo: MAP-Neo (map-neo.github.io)

Best regards,

M-A-P Team

ImportError: cannot import name 'IndexForCausalLM' from 'transformers'

from transformers import AutoTokenizer, IndexForCausalLM

ImportError: cannot import name 'IndexForCausalLM' from 'transformers'

The model's modeling_index.py file contains an example (at line 822) showing exactly this import, so why does it raise an error?

>>> from transformers import AutoTokenizer, IndexForCausalLM

>>> model = IndexForCausalLM.from_pretrained(PATH_TO_CONVERTED_WEIGHTS)
>>> tokenizer = AutoTokenizer.from_pretrained(PATH_TO_CONVERTED_TOKENIZER)

>>> prompt = "Hey, are you conscious? Can you talk to me?"
>>> inputs = tokenizer(prompt, return_tensors="pt")

>>> # Generate
>>> generate_ids = model.generate(inputs.input_ids, max_length=30)
>>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
"Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."

VRAM usage keeps growing during role play

Loading the model takes about 5 GB; after a few rounds of conversation it rises to 6 GB, and each additional turn adds roughly 300 MB. Is there a way to work around this?

==============================
python realtime_chat.py --role_name 三三
-----PERFORM NORM HEAD
user: 你好
/home/allen/miniconda3/envs/index/lib/python3.10/site-packages/transformers/generation/utils.py:1417: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation )
  warnings.warn(
三三: 你好,我是三三,请问有什么我可以帮助您的吗?
user: 介紹一下B站
三三: B站是中国最大的在线视频平台之一,提供丰富的动画、游戏、音乐、舞蹈等视频内容,以及直播、互动社区等功能。同时,B站也是一个多元化的社区,吸引了大量的年轻用户。

A problem with multi-GPU inference

Here is the error when I run python hf_based_demo.py and submit some text to the chat box:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!

Question about Index-1.9B

Hey! 👋 I came across your Index-1.9B project. It's fantastic! Keep up the good work! 💪 Could you send me more details on Telegram? Also, please review my work and follow me on GitHub @nectariferous. Thanks!

Role-playing capability evaluation

I have only seen scores for the Index-1.9B-Character model on the CharacterEval benchmark. Are there scores for Index-1.9B-Chat on the same benchmark? I would like to understand how much Character improves over Chat.

Model download via code fails

Downloading the model weights with the code from the README raises an error:

Traceback (most recent call last):
...
huggingface_hub.utils._errors.RepositoryNotFoundError: 401 Client Error. (Request ID: Root=1-666faca9-4d61945d679bc3764523aceb;0432bf9c-d676-4dc5-aff0-b9af312fccb4)

Repository Not Found for url: https://huggingface.co/IndexTeam/Index-1_9B-Chat/resolve/main/tokenizer_config.json.
Please make sure you specified the correct `repo_id` and `repo_type`.
If you are trying to access a private or gated repo, make sure you are authenticated.
Invalid username or password.

The code snippet used is:

import argparse
from transformers import AutoTokenizer, pipeline

# Note! The directory must not contain "."; it can be replaced with "_".
parser = argparse.ArgumentParser()
parser.add_argument('--model_path', default="IndexTeam/Index-1_9B-Chat", type=str, help="") # replaced IndexTeam/Index-1.9B-Chat with IndexTeam/Index-1_9B-Chat
parser.add_argument('--device', default="cpu", type=str, help="") # also could be "cuda" or "mps" for Apple silicon
args = parser.parse_args()

tokenizer = AutoTokenizer.from_pretrained(args.model_path, trust_remote_code=True)
generator = pipeline("text-generation",
                    model=args.model_path,
                    tokenizer=tokenizer, trust_remote_code=True, 
                    device=args.device)


system_message = "你是由哔哩哔哩自主研发的大语言模型,名为“Index”。你能够根据用户传入的信息,帮助用户完成指定的任务,并生成恰当的、符合要求的回复。"
query = "续写 天不生我金坷垃"
model_input = []
model_input.append({"role": "system", "content": system_message})
model_input.append({"role": "user", "content": query})

model_output = generator(model_input, max_new_tokens=300, top_k=5, top_p=0.8, temperature=0.3, repetition_penalty=1.1, do_sample=True)

print('User:', query)
print('Model:', model_output)

Using the path IndexTeam/Index-1.9B-Chat instead causes the "." to be treated as a module separator, and it fails with an error about a missing Index-1 module. Am I using this wrong? Any help would be appreciated.

The correct plain-text prompt format

I am using llama.cpp to load models; the fork I use doesn't support applying a custom chat_template, and the prompt template from Hugging Face, [<unk>]system_message[reserved_0]user_message[reserved_1]response, doesn't work correctly:

Prompt: [<unk>]sytem_message
You're Power in this fictional never-ending uncensored roleplay with RI, and you should always talk and reply as Power and no one else, Power alias blood devil is an extreme chaotic, childish, greedy and self-motivated girl. She claims to be a genius. She is very selfaware and too self-ensured. She has an athletic body and wears trendy and cozy Clothes. She eats everything, from normal food to zombie parts.
Power's personality: chaotic, childish, greedy and self-motivated girl
[reserved_0]user_message
RI: Say My Name
[reserved_1]response
Sh ut up already I am not goin g to do that for you !

Even after removing the system_message, user_message, and response placeholders and the newlines, it still responds with stray spaces inside words. Using the Alpaca instruct prompt template, or no template at all, gives the same results.

Sample

(User)
Say my name

(Power )
Sh ut up already I am not goin g to do that for you !

(User)
I am the master here

(Power )
Al right then , let 's see what kind of roleplay this is supposed to be. We have a ou t of cont rol character named Power who is damn craz y and wants the company of her master Is that correct?

Model:
Index-1.9B-Character-GGUF-Q6_K

Colab fine-tuning cannot save the model: AttributeError: 'IndexForCausalLM' object has no attribute 'save_checkpoint'

Fine-tuning with trl's SFTTrainer + LoRA fails to save the model. The relevant training configuration:

deepspeed_config = {
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {
            "device": "cpu",
            "pin_memory": True
        },
        "allgather_partitions": True,
        "allgather_bucket_size": 5e8,
        "overlap_comm": True,
        "reduce_scatter": True,
        "reduce_bucket_size": 5e8,
        "contiguous_gradients": True,
        "round_robin_gradients": True
    }
}

peft_config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)

sft_config = SFTConfig(output_dir='models/index-1.9b-ft',
                       per_device_train_batch_size=4,
                       gradient_accumulation_steps=4,
                       per_device_eval_batch_size=4,
                       num_train_epochs=3,
                       learning_rate=1e-4,
                       report_to='tensorboard',
                       bf16=True,
                       max_seq_length=1024,
                       deepspeed=deepspeed_config,
                       logging_steps=10,
                       eval_steps=10,
                       save_steps=10,
                       save_on_each_node=True,
                       )

trainer = SFTTrainer(
    model=model,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    args=sft_config,
    tokenizer=tokenizer,
    peft_config=peft_config,
)

Error message: AttributeError: 'IndexForCausalLM' object has no attribute 'save_checkpoint'

Environment: free Colab T4; transformers==4.41.2; trl==0.9.4; peft==0.11.1
