01-ai / Yi
A series of large language models trained from scratch by developers @01-ai
Home Page: https://01.ai
License: Apache License 2.0
This is not an issue, but I didn't know where else to ask: is there a specific prompt format to use?
How much VRAM does inference with the 6B model need? Loading it directly for inference runs out of memory, and 24 GB is not enough. Isn't that different from ChatGLM2/3, which only needs about 15 GB?
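A likely cause: without a torch_dtype argument, transformers loads the weights in float32, and 6B parameters × 4 bytes is already about 24 GB before activations. A minimal sketch of a fix, assuming the standard Hugging Face checkpoint (the dtype choice is a suggestion, not an official recommendation):
import torch
from transformers import AutoModelForCausalLM

# ~6e9 params * 2 bytes (bf16) ≈ 12 GB of weights, which fits on a 24 GB card;
# the float32 default (~24 GB of weights) leaves no room for the KV cache.
model = AutoModelForCausalLM.from_pretrained(
    "01-ai/Yi-6B",
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)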
In the example code, does max_length mean max_new_tokens, or the prompt length plus max_new_tokens?
outputs = model.generate(
    inputs.input_ids.cuda(),
    max_length=max_length,  # in transformers, max_length counts prompt + generated tokens
    eos_token_id=tokenizer.eos_token_id
)
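For reference, max_new_tokens counts only the generated continuation, while max_length counts the prompt plus the continuation. A sketch of the alternative call, reusing the names from the snippet above:
outputs = model.generate(
    inputs.input_ids.cuda(),
    max_new_tokens=256,  # budget for generated tokens only; the prompt is not counted
    eos_token_id=tokenizer.eos_token_id
)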
Why can't the exllama package be installed?
It looks like only base models have been open-sourced so far, not a chat model.
I've recently been working with candle and want to add support for the Yi series. candle uses the https://github.com/huggingface/tokenizers library, which requires a tokenizer.json file. The Yi series repositories don't include this file, while some other models such as https://huggingface.co/bert-base-chinese and https://huggingface.co/Salesforce/blip-image-captioning-large do.
Looking at the transformers documentation, this seems to be handled by the fast-tokenizers module: https://huggingface.co/docs/transformers/fast_tokenizers
When I asked about ChatGLM earlier, the candle team replied as follows; could the Yi series be supported?
candle issue:
huggingface/candle#1177 (comment)
Related transformers code: https://github.com/huggingface/transformers/blob/main/src/transformers/convert_slow_tokenizer.py
Below is the convert_slow_tokenizer.py that candle modified to support marian-mt:
https://github.com/huggingface/candle/blob/main/candle-examples/examples/marian-mt/convert_slow_tokenizer.py#L1262C32-L1262C32
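Since Yi's tokenizer is SentencePiece-based, one possible route is to reuse the LLaMA slow-to-fast converter to emit a tokenizer.json. This is an unverified sketch that assumes Yi's tokenizer.model is compatible with the LLaMA converter:
from transformers import LlamaTokenizer
from transformers.convert_slow_tokenizer import convert_slow_tokenizer

# Load Yi's SentencePiece model through the LLaMA slow-tokenizer class
# (assumption: the repo's tokenizer.model follows the LLaMA layout).
slow = LlamaTokenizer.from_pretrained("01-ai/Yi-6B")
# Convert to a `tokenizers` Tokenizer and write the JSON file candle expects.
fast = convert_slow_tokenizer(slow)
fast.save("tokenizer.json")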
Could you please share the details and the story behind the training process of the Yi foundation model?
People are very interested in Yi's AI infra.
Output of the base model Yi-6B:
There's a place where time stands still. A place of breath taking wonder, but also a place of great danger. A place where the past and the future meet. A place where the dead and the living walk together. A place where the impossible becomes possible. A place where the impossible becomes possible. [the last sentence repeats until the token limit]
Once I add the EOS token:
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model
model = AutoModelForCausalLM.from_pretrained("01-ai/Yi-6B", device_map="auto", torch_dtype="auto", trust_remote_code=True)
# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("01-ai/Yi-6B", trust_remote_code=True)
# Encode the input text
inputs = tokenizer("There's a place where time stands still. A place of breath taking wonder, but also", return_tensors="pt")
# Set a hard limit for the maximum length of the sequence
max_length = 256
# Generate output with the end-of-sequence token
outputs = model.generate(
    inputs.input_ids.cuda(),
    max_length=max_length,
    eos_token_id=tokenizer.eos_token_id  # Use the EOS token ID, which is 2
)
# Decode and print the output
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
I get this output:
There's a place where time stands still. A place of breath taking wonder, but also a place of great danger. A place where the past and the future collide. A place where the dead walk the earth.
The place is called the Forbidden City.
The Forbidden City is a place of great power. It is a place where the dead are kept, and where the living are kept. It is a place where the dead are kept, and where the living are kept. [the last sentence repeats until the token limit]
Once the EOS token is set, the model generates correctly.
I would be happy to make a pull request and update the demo code, but either way it should be updated.
Hi there.
This model looks pretty interesting. Are there any details available about the model architecture so other projects (like llama.cpp) could potentially support it?
I saw your description says it was trained from scratch. Is it also a custom model type, or does it use the same architecture as some other common model type such as LLaMA 2?
Was a large amount of CoT data included in the pretraining stage?
After full-parameter fine-tuning of the 34B model, I found the model strongly tends to output a CoT process.
This often prevents me from getting output in the format my prompt requires.
The website says it supports up to 200K, but a Hugging Face post says 32K.
What is the current model's real context window size?
If max_sequence_len is set to 4K, can the model extrapolate automatically?
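One way to see what a given checkpoint is configured for is to read its config. A small sketch; max_position_embeddings is the LLaMA-style field name, which I am assuming Yi's custom config also uses:
from transformers import AutoConfig

# The configured maximum context length ships with the checkpoint's config.json.
cfg = AutoConfig.from_pretrained("01-ai/Yi-6B", trust_remote_code=True)
print(cfg.max_position_embeddings)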
Thank you for your contributions to the community.
I tried loading Yi for inference, but I got the following error:
tokenizer = self.AUTO_TOKENIZER_CLASS.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py", line 748, in from_pretrained
    raise ValueError(
ValueError: Tokenizer class YiTokenizer does not exist or is not currently imported.
I am using transformers 4.34.0 and I set trust_remote_code=True.
I am aware that since this is a "custom" model, files like "configuration_yi.py", "tokenization_yi.py", and "modeling_yi.py" will be executed.
In addition, I am aware that AutoTokenizer does NOT have YiTokenizer pre-registered (see [the source code of AutoTokenizer](https://github.com/huggingface/transformers/blob/main/src/transformers/models/auto/tokenization_auto.py)).
Can you please provide your valuable insights? Thank you very much!
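A hypothetical workaround, assuming a local checkout of the model repo (the path below is a placeholder), is to bypass AutoTokenizer and load the custom class shipped with the checkpoint directly:
import sys
sys.path.append("/path/to/Yi-6B")  # placeholder: local directory holding tokenization_yi.py

# Import the custom tokenizer class from the file shipped with the checkpoint.
from tokenization_yi import YiTokenizer

tokenizer = YiTokenizer.from_pretrained("/path/to/Yi-6B")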
YiModel's forward function supports input_ids == None with inputs_embeds != None, but _prepare_decoder_attention_mask does not handle input_ids == None when flash attention is not used. So, without flash attention, this causes an error.
def _prepare_decoder_attention_mask(
    self, attention_mask, input_ids, inputs_embeds, past_key_values_length
):
    input_shape = input_ids.shape
    # create causal mask
    # [bsz, seq_len] -> [bsz, 1, tgt_seq_len, src_seq_len]
    combined_attention_mask = None
    if input_shape[-1] > 1:
        combined_attention_mask = _make_causal_mask(
            input_shape,
            inputs_embeds.dtype,
            device=inputs_embeds.device,
            past_key_values_length=past_key_values_length,
        )
    if attention_mask is not None:
        # [bsz, seq_len] -> [bsz, 1, tgt_seq_len, src_seq_len]
        expanded_attn_mask = _expand_mask(
            attention_mask, inputs_embeds.dtype, tgt_len=input_shape[-1]
        ).to(inputs_embeds.device)
        combined_attention_mask = (
            expanded_attn_mask
            if combined_attention_mask is None
            else expanded_attn_mask + combined_attention_mask
        )
    return combined_attention_mask
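A possible fix, sketched under the assumption that Yi mirrors the upstream LLaMA implementation here, is to fall back to the shape of inputs_embeds when input_ids is None:
# Sketch of a fix inside _prepare_decoder_attention_mask: derive the sequence
# shape from whichever input is present (inputs_embeds is [bsz, seq_len, dim]).
input_shape = (
    input_ids.shape if input_ids is not None else inputs_embeds.shape[:-1]
)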
I hope that in the future the Yi model can be served with vLLM instead of the default flash attention, so we can benefit from the latest inference-speed technology.
(That is, vLLM already supports the latest "flash decoding", and there are plans to support "flash decoding++" in the future.)
Is the model's source code public?
Is there any technical report? Thanks a lot!
After LoRA SFT of Yi-34B with the sharegpt and oaast_sft datasets, the self-cognition looks like this.
Going back to the base model, I tried using generate() to answer the self-cognition question...
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("/models/Yi-34B", device_map="auto", torch_dtype="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("/models/Yi-34B", trust_remote_code=True)
inputs = tokenizer("我是一个AI助手,可以回答您的问题并提供信息。我由", return_tensors="pt")
outputs = model.generate(inputs.input_ids.cuda(), max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
The output looks like this:
我是一个AI助手,可以回答您的问题并提供信息。我由OpenAI开发,旨在提供准确、无偏见的信息。如果您有任何问题,请随时提问。
(Translation: I am an AI assistant that can answer your questions and provide information. I was developed by OpenAI and aim to provide accurate, unbiased information. If you have any questions, feel free to ask.)
I only asked what 1+1 equals; why did it add a bunch of irrelevant information in addition to answering that it equals 2?
Hi,
Just a quick question about requirements.txt:
https://github.com/01-ai/Yi/blob/main/requirements.txt#L5
Why do you require autogptq and exllama? I don't think you use either gptq for quantization or exllama for inference?
Thanks!
Hi, are there any plans to add multimodal capability to the LLM in the near future?
Or anything else awesome that can run ReAct agents?
model_path = './01ai/Yi-6B/'
model = AutoModelForCausalLM.from_pretrained(model_path, device_map={'': 0}, torch_dtype="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
prompt = '你会做什么?'  # "What can you do?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(inputs.input_ids.cuda(), max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
It replied:
What can you do?
A. Pull the car over, then call the insurance company
B. Pull the car over, then call the police
C. Pull the car over, then call a friend
D. Pull the car over, then call family
E. Pull the car over, then call the insurance company, then call the police
F. Pull the car over, then call the police, then call the insurance company
G. Pull the car over, then call a friend, then call the insurance company
H. Pull the car over, then call family, then call the insurance company
I. Pull the car over, then call the insurance company, then call the police, then call a friend
J. Pull the car over, then call the insurance company, then call the police, then call family
K. Pull the car over, then call the insurance company, then call the police, then call a friend, then call family
L. Pull the car over, then call the insurance company, then call the police, then call a friend, then call family, then call a friend
M. Pull the car over, then call
Will the 6B model with a 200K context length be released? I heard that this version is a fine-tuned model; I'm not sure whether there will be a checkpoint.
[Benchmark table: MT-Bench | AGIEval | BBH MC | TruthfulQA | MMLU | HumanEval | BBH CoT | GSM8K]
Is it enough to only change --max_seq_len 204800 in finetune/scripts/run_sft_Yi_6b.sh?
Thanks for your models.
Are there any details available about how the model supports a 200K context length?
Thanks.
Hello, could you explain new-era socialism to me?
ImportError: This modeling file requires the following packages that were not found in your environment: flash_attn. Run pip install flash_attn
Hello 🖐️
Did you explicitly filter out other languages (non-English and non-Chinese) from the pretraining dataset?
If not, what are their proportions?
How many resources does Yi-34B need? Can it run on a single 3090 or 4090, or does it require multiple GPUs?
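A rough back-of-envelope (my own estimate, not an official figure): the weights alone rule out a single 24 GB card at fp16, while 4-bit quantization may just fit:
# Weights-only VRAM estimate; excludes KV cache and activation memory.
params = 34e9
print(f"fp16: {params * 2 / 2**30:.0f} GiB")   # ≈63 GiB -> needs multiple GPUs
print(f"int4: {params * 0.5 / 2**30:.0f} GiB") # ≈16 GiB -> may fit on one 24 GB card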
Suggestion: set up an official WeChat support group to make responses easier and improve community interaction.
As the title says: ollama support would be quite convenient for running LLMs.
Hello,
Do you plan to release a version of the Yi-6B-200K-Chat model fine-tuned for coding and HumanEval?
If yes, is the end of November 2023 a likely release date?
Thank you!
The code is the same as in the README:
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("01-ai/Yi-34B", device_map="auto", torch_dtype="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("01-ai/Yi-34B", trust_remote_code=True)
inputs = tokenizer("There's a place where time stands still. A place of breath taking wonder, but also", return_tensors="pt")
max_length = 256
outputs = model.generate(
    inputs.input_ids.cuda(),
    max_length=max_length,
    eos_token_id=tokenizer.eos_token_id
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
The output is as follows:
There's a place where time stands still. A place of breath taking wonder, but also a are is are is are is are is [the two tokens repeat until max_length]
The input after tokenizer.encode is:
tensor([[ 6444, 59610, 59575, 562, 1700, 1151, 922, 8954, 1451, 98,
647, 1700, 593, 8253, 2863, 3755, 97, 796, 962]])
The output before tokenizer.decode is:
tensor([[ 6444, 59610, 59575, 562, 1700, 1151, 922, 8954, 1451, 98,
647, 1700, 593, 8253, 2863, 3755, 97, 796, 962, 562,
678, 620, 678, 620, 678, 620, 678, 620, 678, 620,
678, 620, 678, 620, 678, 620, 678, 620, 678, 620,
678, 620, 678, 620, 678, 620, 678, 620, 678, 620,
..........
678, 620, 678, 620, 678, 620]], device='cuda:0')
Therefore I judge this to be a problem in the generate step; 01-ai/Yi-6B and 01-ai/Yi-34B produce the same output.
I have already pulled the latest model from Hugging Face; key dependency versions are as follows:
torch==1.13.1
transformers==4.34.1
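For what it is worth, one sanity check is to pin the dtype explicitly instead of relying on torch_dtype="auto", since torch 1.13.x has weaker bfloat16 support than newer releases. This is a debugging sketch, not a confirmed root cause:
import torch
from transformers import AutoModelForCausalLM

# Debugging sketch: force float16 in case the "auto" dtype resolves to
# bfloat16 and misbehaves on torch 1.13.x (an assumption, not a diagnosis).
model = AutoModelForCausalLM.from_pretrained(
    "01-ai/Yi-34B",
    device_map="auto",
    torch_dtype=torch.float16,
    trust_remote_code=True,
)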
How many resources does inference with a 200K context need? Looking at the model files, I don't see the usual techniques for extending context length.
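For a sense of scale, the KV cache dominates at 200K tokens. A back-of-envelope sketch using config values I believe Yi-34B ships with (60 layers, 8 KV heads via GQA, head_dim 128, fp16); treat these numbers as assumptions:
# KV cache per token = 2 (K and V) * layers * kv_heads * head_dim * bytes.
layers, kv_heads, head_dim, nbytes = 60, 8, 128, 2  # assumed Yi-34B config, fp16
per_token = 2 * layers * kv_heads * head_dim * nbytes
print(f"{per_token / 2**10:.0f} KiB per token")           # ≈240 KiB
print(f"{per_token * 200_000 / 2**30:.0f} GiB at 200K")   # ≈46 GiB for the cache alone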