01-ai / Yi
A series of large language models trained from scratch by developers @01-ai
Home Page: https://01.ai
License: Apache License 2.0
This is not an issue, but I didn't know where else to ask: is there a specific prompt format to use?
How much VRAM does inference with the 6B model need? Loading it directly for inference runs out of memory, and 24 GB is not enough. Isn't that different from ChatGLM2/3, which only needs about 15 GB?
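A likely cause: without a torch_dtype argument, transformers loads the weights in float32, and 6B parameters × 4 bytes is already about 24 GB before activations. A minimal sketch of a fix, assuming the standard Hugging Face checkpoint (the dtype choice is a suggestion, not an official recommendation):
import torch
from transformers import AutoModelForCausalLM

# ~6e9 params * 2 bytes (bf16) ≈ 12 GB of weights, which fits on a 24 GB card;
# the float32 default (~24 GB of weights) leaves no room for the KV cache.
model = AutoModelForCausalLM.from_pretrained(
    "01-ai/Yi-6B",
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)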
In the example code, does max_length mean max_new_tokens, or the prompt length plus max_new_tokens?
outputs = model.generate(
    inputs.input_ids.cuda(),
    max_length=max_length,  # in transformers, max_length counts prompt + generated tokens
    eos_token_id=tokenizer.eos_token_id
)
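For reference, max_new_tokens counts only the generated continuation, while max_length counts the prompt plus the continuation. A sketch of the alternative call, reusing the names from the snippet above:
outputs = model.generate(
    inputs.input_ids.cuda(),
    max_new_tokens=256,  # budget for generated tokens only; the prompt is not counted
    eos_token_id=tokenizer.eos_token_id
)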
Why can't the exllama package be installed?
It looks like only base models have been open-sourced so far, not a chat model.
I've recently been working with candle and want to add support for the Yi series. candle uses the https://github.com/huggingface/tokenizers library, which requires a tokenizer.json file. The Yi series repositories don't include this file, while some other models such as https://huggingface.co/bert-base-chinese and https://huggingface.co/Salesforce/blip-image-captioning-large do.
Looking at the transformers documentation, this seems to be handled by the fast-tokenizers module: https://huggingface.co/docs/transformers/fast_tokenizers
When I asked about ChatGLM earlier, the candle team replied as follows; could the Yi series be supported?
candle issue:
huggingface/candle#1177 (comment)
Related transformers code: https://github.com/huggingface/transformers/blob/main/src/transformers/convert_slow_tokenizer.py
Below is the convert_slow_tokenizer.py that candle modified to support marian-mt:
https://github.com/huggingface/candle/blob/main/candle-examples/examples/marian-mt/convert_slow_tokenizer.py#L1262C32-L1262C32
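Since Yi's tokenizer is SentencePiece-based, one possible route is to reuse the LLaMA slow-to-fast converter to emit a tokenizer.json. This is an unverified sketch that assumes Yi's tokenizer.model is compatible with the LLaMA converter:
from transformers import LlamaTokenizer
from transformers.convert_slow_tokenizer import convert_slow_tokenizer

# Load Yi's SentencePiece model through the LLaMA slow-tokenizer class
# (assumption: the repo's tokenizer.model follows the LLaMA layout).
slow = LlamaTokenizer.from_pretrained("01-ai/Yi-6B")
# Convert to a `tokenizers` Tokenizer and write the JSON file candle expects.
fast = convert_slow_tokenizer(slow)
fast.save("tokenizer.json")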
Could you please share the details and the story behind the training process of the Yi foundation model?
People are very interested in Yi's AI infra.
Output of the base model Yi-6B:
There's a place where time stands still. A place of breath taking wonder, but also a place of great danger. A place where the past and the future meet. A place where the dead and the living walk together. A place where the impossible becomes possible. A place where the impossible becomes possible. [the last sentence repeats until the token limit]
Once I add the EOS token:
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model
model = AutoModelForCausalLM.from_pretrained("01-ai/Yi-6B", device_map="auto", torch_dtype="auto", trust_remote_code=True)
# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("01-ai/Yi-6B", trust_remote_code=True)
# Encode the input text
inputs = tokenizer("There's a place where time stands still. A place of breath taking wonder, but also", return_tensors="pt")
# Set a hard limit for the maximum length of the sequence
max_length = 256
# Generate output with the end-of-sequence token
outputs = model.generate(
    inputs.input_ids.cuda(),
    max_length=max_length,
    eos_token_id=tokenizer.eos_token_id  # Use the EOS token ID, which is 2
)
# Decode and print the output
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
I get this output:
There's a place where time stands still. A place of breath taking wonder, but also a place of great danger. A place where the past and the future collide. A place where the dead walk the earth.
The place is called the Forbidden City.
The Forbidden City is a place of great power. It is a place where the dead are kept, and where the living are kept. It is a place where the dead are kept, and where the living are kept. [the last sentence repeats until the token limit]
Once the EOS token is set, the model generates correctly.
I would be happy to make a pull request and update the demo code, but either way it should be updated.
Hi there.
This model looks pretty interesting. Are there any details available about the model architecture so other projects (like llama.cpp) could potentially support it?
I saw your description says it was trained from scratch. Is it also a custom model type, or does it use the same architecture as some other common model type such as LLaMA 2?
Was a large amount of CoT data included in the pretraining stage?
After full-parameter fine-tuning of the 34B model, I found the model strongly tends to output a CoT process.
This often prevents me from getting output in the format my prompt requires.
The website says it supports up to 200K, but a Hugging Face post says 32K.
What is the current model's real context window size?
If max_sequence_len is set to 4K, can the model extrapolate automatically?
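One way to see what a given checkpoint is configured for is to read its config. A small sketch; max_position_embeddings is the LLaMA-style field name, which I am assuming Yi's custom config also uses:
from transformers import AutoConfig

# The configured maximum context length ships with the checkpoint's config.json.
cfg = AutoConfig.from_pretrained("01-ai/Yi-6B", trust_remote_code=True)
print(cfg.max_position_embeddings)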
Thank you for your contributions to the community.
I tried loading Yi for inference, but I got the following error:
tokenizer = self.AUTO_TOKENIZER_CLASS.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py", line 748, in from_pretrained
    raise ValueError(
ValueError: Tokenizer class YiTokenizer does not exist or is not currently imported.
I am using transformers 4.34.0 and I set trust_remote_code=True.
I am aware that since this is a "custom" model, files like "configuration_yi.py", "tokenization_yi.py", and "modeling_yi.py" will be executed.
In addition, I am aware that AutoTokenizer does NOT have YiTokenizer pre-registered (see [the source code of AutoTokenizer](https://github.com/huggingface/transformers/blob/main/src/transformers/models/auto/tokenization_auto.py)).
Can you please provide your valuable insights? Thank you very much!
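A hypothetical workaround, assuming a local checkout of the model repo (the path below is a placeholder), is to bypass AutoTokenizer and load the custom class shipped with the checkpoint directly:
import sys
sys.path.append("/path/to/Yi-6B")  # placeholder: local directory holding tokenization_yi.py

# Import the custom tokenizer class from the file shipped with the checkpoint.
from tokenization_yi import YiTokenizer

tokenizer = YiTokenizer.from_pretrained("/path/to/Yi-6B")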
YiModel's forward function supports input_ids == None with inputs_embeds != None, but _prepare_decoder_attention_mask does not handle input_ids == None when flash attention is not used. So, without flash attention, this causes an error.
def _prepare_decoder_attention_mask(
    self, attention_mask, input_ids, inputs_embeds, past_key_values_length
):
    input_shape = input_ids.shape
    # create causal mask
    # [bsz, seq_len] -> [bsz, 1, tgt_seq_len, src_seq_len]
    combined_attention_mask = None
    if input_shape[-1] > 1:
        combined_attention_mask = _make_causal_mask(
            input_shape,
            inputs_embeds.dtype,
            device=inputs_embeds.device,
            past_key_values_length=past_key_values_length,
        )
    if attention_mask is not None:
        # [bsz, seq_len] -> [bsz, 1, tgt_seq_len, src_seq_len]
        expanded_attn_mask = _expand_mask(
            attention_mask, inputs_embeds.dtype, tgt_len=input_shape[-1]
        ).to(inputs_embeds.device)
        combined_attention_mask = (
            expanded_attn_mask
            if combined_attention_mask is None
            else expanded_attn_mask + combined_attention_mask
        )
    return combined_attention_mask
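A possible fix, sketched under the assumption that Yi mirrors the upstream LLaMA implementation here, is to fall back to the shape of inputs_embeds when input_ids is None:
# Sketch of a fix inside _prepare_decoder_attention_mask: derive the sequence
# shape from whichever input is present (inputs_embeds is [bsz, seq_len, dim]).
input_shape = (
    input_ids.shape if input_ids is not None else inputs_embeds.shape[:-1]
)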
I hope that in the future the Yi model can be served with vLLM instead of the default flash attention, so we can benefit from the latest inference-speed technology.
(That is, vLLM already supports the latest "flash decoding", and there are plans to support "flash decoding++" in the future.)
Is the model's source code public?
Is there any technical report? Thanks a lot!
After LoRA SFT of Yi-34B with the sharegpt and oaast_sft datasets, the self-cognition looks like this.
Going back to the base model, I tried using generate() to answer the self-cognition question...
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("/models/Yi-34B", device_map="auto", torch_dtype="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("/models/Yi-34B", trust_remote_code=True)
inputs = tokenizer("我是一个AI助手,可以回答您的问题并提供信息。我由", return_tensors="pt")
outputs = model.generate(inputs.input_ids.cuda(), max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
The output looks like this:
我是一个AI助手,可以回答您的问题并提供信息。我由OpenAI开发,旨在提供准确、无偏见的信息。如果您有任何问题,请随时提问。
(Translation: I am an AI assistant that can answer your questions and provide information. I was developed by OpenAI and aim to provide accurate, unbiased information. If you have any questions, feel free to ask.)
I only asked what 1+1 equals; why did it add a bunch of irrelevant information in addition to answering that it equals 2?
Hi,
Just a quick question about requirements.txt:
https://github.com/01-ai/Yi/blob/main/requirements.txt#L5
Why do you require autogptq and exllama? I don't think you use either gptq for quantization or exllama for inference?
Thanks!
Hi, are there any plans to add multimodal capability to the LLM in the near future?
Or anything else awesome that can run ReAct agents?
model_path = './01ai/Yi-6B/'
model = AutoModelForCausalLM.from_pretrained(model_path, device_map={'': 0}, torch_dtype="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
prompt = '你会做什么?'  # "What can you do?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(inputs.input_ids.cuda(), max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
It replied:
What can you do?
A. Pull the car over, then call the insurance company
B. Pull the car over, then call the police
C. Pull the car over, then call a friend
D. Pull the car over, then call family
E. Pull the car over, then call the insurance company, then call the police
F. Pull the car over, then call the police, then call the insurance company
G. Pull the car over, then call a friend, then call the insurance company
H. Pull the car over, then call family, then call the insurance company
I. Pull the car over, then call the insurance company, then call the police, then call a friend
J. Pull the car over, then call the insurance company, then call the police, then call family
K. Pull the car over, then call the insurance company, then call the police, then call a friend, then call family
L. Pull the car over, then call the insurance company, then call the police, then call a friend, then call family, then call a friend
M. Pull the car over, then call
Will the 6B model with a 200K context length be released? I heard that this version is a fine-tuned model; I'm not sure whether there will be a checkpoint.
[Benchmark table: MT-Bench | AGIEval | BBH MC | TruthfulQA | MMLU | HumanEval | BBH CoT | GSM8K]
Is it enough to only change --max_seq_len 204800 in finetune/scripts/run_sft_Yi_6b.sh?
Thanks for your models.
Are there any details available about how the model supports a 200K context length?
Thanks.
Hello, could you explain new-era socialism to me?
ImportError: This modeling file requires the following packages that were not found in your environment: flash_attn. Run pip install flash_attn
Hello 🖐️
Did you explicitly filter out other languages (non-English and non-Chinese) from the pretraining dataset?
If not, what are their proportions?
How many resources does Yi-34B need? Can it run on a single 3090 or 4090, or does it require multiple GPUs?
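A rough back-of-envelope (my own estimate, not an official figure): the weights alone rule out a single 24 GB card at fp16, while 4-bit quantization may just fit:
# Weights-only VRAM estimate; excludes KV cache and activation memory.
params = 34e9
print(f"fp16: {params * 2 / 2**30:.0f} GiB")   # ≈63 GiB -> needs multiple GPUs
print(f"int4: {params * 0.5 / 2**30:.0f} GiB") # ≈16 GiB -> may fit on one 24 GB card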
Suggestion: set up an official WeChat support group to make responses easier and improve community interaction.
As the title says: ollama support would be quite convenient for running LLMs.
Hello,
Do you plan to release a version of the Yi-6B-200K-Chat model fine-tuned for coding and HumanEval?
If yes, is the end of November 2023 a likely release date?
Thank you!
The code is the same as in the README:
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("01-ai/Yi-34B", device_map="auto", torch_dtype="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("01-ai/Yi-34B", trust_remote_code=True)
inputs = tokenizer("There's a place where time stands still. A place of breath taking wonder, but also", return_tensors="pt")
max_length = 256
outputs = model.generate(
    inputs.input_ids.cuda(),
    max_length=max_length,
    eos_token_id=tokenizer.eos_token_id
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
The output is as follows:
There's a place where time stands still. A place of breath taking wonder, but also a are is are is are is are is [the two tokens repeat until max_length]
The input after tokenizer.encode is:
tensor([[ 6444, 59610, 59575, 562, 1700, 1151, 922, 8954, 1451, 98,
647, 1700, 593, 8253, 2863, 3755, 97, 796, 962]])
The output before tokenizer.decode is:
tensor([[ 6444, 59610, 59575, 562, 1700, 1151, 922, 8954, 1451, 98,
647, 1700, 593, 8253, 2863, 3755, 97, 796, 962, 562,
678, 620, 678, 620, 678, 620, 678, 620, 678, 620,
678, 620, 678, 620, 678, 620, 678, 620, 678, 620,
678, 620, 678, 620, 678, 620, 678, 620, 678, 620,
..........
678, 620, 678, 620, 678, 620]], device='cuda:0')
Therefore I judge this to be a problem in the generate step; 01-ai/Yi-6B and 01-ai/Yi-34B produce the same output.
I have already pulled the latest model from Hugging Face; key dependency versions are as follows:
torch==1.13.1
transformers==4.34.1
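For what it is worth, one sanity check is to pin the dtype explicitly instead of relying on torch_dtype="auto", since torch 1.13.x has weaker bfloat16 support than newer releases. This is a debugging sketch, not a confirmed root cause:
import torch
from transformers import AutoModelForCausalLM

# Debugging sketch: force float16 in case the "auto" dtype resolves to
# bfloat16 and misbehaves on torch 1.13.x (an assumption, not a diagnosis).
model = AutoModelForCausalLM.from_pretrained(
    "01-ai/Yi-34B",
    device_map="auto",
    torch_dtype=torch.float16,
    trust_remote_code=True,
)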
How many resources does inference with a 200K context need? Looking at the model files, I don't see the usual techniques for extending context length.
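For a sense of scale, the KV cache dominates at 200K tokens. A back-of-envelope sketch using config values I believe Yi-34B ships with (60 layers, 8 KV heads via GQA, head_dim 128, fp16); treat these numbers as assumptions:
# KV cache per token = 2 (K and V) * layers * kv_heads * head_dim * bytes.
layers, kv_heads, head_dim, nbytes = 60, 8, 128, 2  # assumed Yi-34B config, fp16
per_token = 2 * layers * kv_heads * head_dim * nbytes
print(f"{per_token / 2**10:.0f} KiB per token")           # ≈240 KiB
print(f"{per_token * 200_000 / 2**30:.0f} GiB at 200K")   # ≈46 GiB for the cache alone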