
alpaca_eval's People

Contributors

44670, actions-user, aligninc, c1rn09, gblazex, genezc, haniitani, hendrydong, hyperdrivehustle, imoneoi, inferllm, jdf-prog, jetrunner, jondurbin, kyleliang919, lxuechen, muennighoff, nbl97, reign12, rtaori, sanderland, tiiiger, victorsungo, vpeterv, winglian, xianxl, yanndubs, yuani114, yulinchen99, zfang

alpaca_eval's Issues

[style] fix ill-formatted logging message

Some of the log messages are single multi-line strings (for example this). These multi-line strings don't display nicely on the console because of the implicit indentation; they could be reformatted to use explicit newlines.
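A minimal sketch of the suggested reformatting (the message text and variable are made up for illustration):

import logging
import textwrap

n_missing = 3  # hypothetical value, just for the example

# An indented triple-quoted string carries the code's indentation into the log:
ugly = f"""Found {n_missing} missing annotations.
        They will be re-annotated on the next run."""

# Dedenting (or joining explicit newlines) keeps the console output aligned:
clean = textwrap.dedent(
    f"""\
    Found {n_missing} missing annotations.
    They will be re-annotated on the next run."""
)

logging.warning(ugly)   # second line shows up indented by the source indentation
logging.warning(clean)  # lines are flush with the left margin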

Post-analysis of the annotations

Since the two outputs are randomly ordered during evaluation, it is hard to conduct post-analysis of the evaluation results. Adding extra fields such as output_1_source, output_2_source, and data_source to the output JSON file would be great.
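A minimal sketch of what that could look like, assuming the annotation rows are plain dicts (the helper and its arguments are hypothetical):

def tag_sources(row: dict, source_1: str, source_2: str, data_source: str) -> dict:
    # Record where each output came from *before* the pair is randomly ordered,
    # so the saved annotations can be grouped and analysed afterwards.
    row["output_1_source"] = source_1
    row["output_2_source"] = source_2
    row["data_source"] = data_source
    return row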

[LOG] improve the logging for OpenAI maximum context length

I tried to run the chatgpt evaluator, but the OpenAI requester seems to go into an infinite retry loop:

WARNING:root:OpenAIError: This model's maximum context length is 4097 tokens. However, your messages resulted in 4452 tokens. Please reduce the length of the messages..
WARNING:root:Hit request rate limit; retrying...
(repeats forever)

Similar loops happen when trying GPT-4 if you do not have access to the model or use an invalid API key. These seem to be caused by treating any other error as a rate-limiting message.

I tried to fix the token limit loop by:

if "Please reduce your prompt" in str(e) or "This model's maximum context length" in str(e):

but it can still crash due to this line, probably because the combined length of the generated samples is too long to fit:

if kwargs["max_tokens"] == 0:
   raise e
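A sketch of the kind of error triage this suggests (illustrative only, matching on message strings as above rather than describing the package's actual exception handling):

def should_retry(e: Exception) -> bool:
    # Only transient rate-limit errors are worth retrying; context-length and
    # auth/model-access errors will never succeed on retry.
    msg = str(e)
    if "maximum context length" in msg or "Please reduce" in msg:
        return False  # shrink the prompt or max_tokens instead of retrying
    if "Rate limit" in msg or "rate_limit_exceeded" in msg:
        return True
    return False  # unknown errors (invalid key, no GPT-4 access) should surface


# inside the completion helper, roughly:
# except openai.error.OpenAIError as e:
#     if not should_retry(e):
#         raise
#     logging.warning("Hit request rate limit; retrying...")
#     time.sleep(sleep_time)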

GPT4 rate limit

Hi, I am trying to evaluate our model output using alpaca_eval.

Here is the command:

export OPENAI_API_KEY="sk-xxxxxx"
alpaca_eval --model_outputs 'output/alpaca_eval/outputs.json'

Problem: while running, I keep getting "Rate limit reached" errors.

INFO:openai:error_code=rate_limit_exceeded error_message='Rate limit reached for default-gpt-4 in organization org-0Ibh47ogWcbeM3DJsyr3EC29 on tokens per min. Limit: 40000 / min. Please try again in 1ms. Contact us through our help center at help.openai.com if you continue to have issues.' error_param=None error_type=tokens message='OpenAI API error received' stream_error=False
WARNING:root:OpenAIError: Rate limit reached for default-gpt-4 in organization org-0Ibh47ogWcbeM3DJsyr3EC29 on tokens per min. Limit: 40000 / min. Please try again in 1ms. Contact us through our help center at help.openai.com if you continue to have issues..
WARNING:root:Hit request rate limit; retrying...

Question:

  1. How could I control the GPT-4 access rate (e.g., add some delay in the code)? A throttling sketch is shown after these questions.
  2. After evaluation, I got n_total = 800. Does this mean that 5 tests failed? Are these failures related to the GPT-4 rate-limit error?
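On question 1, a minimal client-side throttle (purely illustrative, not part of the package) could look like this; it only spaces requests out so the server-side limit is hit less often:

import time


def throttled(fn, min_interval: float = 1.5):
    # Wrap an API call so consecutive requests are at least `min_interval`
    # seconds apart. The package already retries on rate-limit errors, so this
    # just reduces how often those retries are triggered.
    last_call = 0.0

    def wrapper(*args, **kwargs):
        nonlocal last_call
        wait = min_interval - (time.time() - last_call)
        if wait > 0:
            time.sleep(wait)
        last_call = time.time()
        return fn(*args, **kwargs)

    return wrapper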

API call fails with long outputs

When the evaluated model outputs long responses, the evaluation API call fails and keeps retrying. Consider counting tokens with tiktoken and truncating the trailing X tokens of the evaluated model's output to reduce the total length to 8192 tokens.

WARNING:root:Unknown error This model's maximum context length is 8192 tokens. However, your messages resulted in 8344 tokens. Please reduce the length of the messages..
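A minimal sketch of the suggested truncation with tiktoken; the 8192 limit comes from the error above, and the function name and budget handling are illustrative:

import tiktoken


def truncate_output(output: str, budget: int, model: str = "gpt-4") -> str:
    # Drop trailing tokens so the output fits in `budget` tokens; the budget
    # would be 8192 minus whatever the prompt template already consumes.
    enc = tiktoken.encoding_for_model(model)
    tokens = enc.encode(output)
    return output if len(tokens) <= budget else enc.decode(tokens[:budget])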

code review

  • code review. Doesn't have to be extremely thorough, but let's make sure there are no big issues or things that could be greatly simplified.
  • documentation review. Test the most important commands from the documentation, and update the documentation where it is unclear.

TypeError when trying to run alpaca_eval

Using Python 3.9. I get the following error when trying to run alpaca_eval with any model:

alpaca_eval evaluate_from_model --model_configs 'chatgpt' --annotators_config 'alpaca_eval_gpt4'

Traceback (most recent call last):
  File "/home/makeshn/ssd1/miniconda3/envs/alpaca_eval/bin/alpaca_eval", line 8, in <module>
    sys.exit(main())
  File "/home/makeshn/ssd1/miniconda3/envs/alpaca_eval/lib/python3.9/site-packages/alpaca_eval/main.py", line 468, in main
    fire.Fire(ALL_FUNCTIONS)
  File "/home/makeshn/ssd1/miniconda3/envs/alpaca_eval/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/makeshn/ssd1/miniconda3/envs/alpaca_eval/lib/python3.9/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/makeshn/ssd1/miniconda3/envs/alpaca_eval/lib/python3.9/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/makeshn/ssd1/miniconda3/envs/alpaca_eval/lib/python3.9/site-packages/alpaca_eval/main.py", line 214, in evaluate_from_model
    evaluation_dataset = utils.load_or_convert_to_dataframe(evaluation_dataset)
  File "/home/makeshn/ssd1/miniconda3/envs/alpaca_eval/lib/python3.9/site-packages/alpaca_eval/utils.py", line 264, in load_or_convert_to_dataframe
    if isinstance(df, AnyPath):
  File "/home/makeshn/ssd1/miniconda3/envs/alpaca_eval/lib/python3.9/typing.py", line 720, in __instancecheck__
    return self.__subclasscheck__(type(obj))
  File "/home/makeshn/ssd1/miniconda3/envs/alpaca_eval/lib/python3.9/typing.py", line 723, in __subclasscheck__
    raise TypeError("Subscripted generics cannot be used with"
TypeError: Subscripted generics cannot be used with class and instance checks

Dataset 'tatsu-lab/alpaca_eval' doesn't exist on the Hub

Hello,

I'm trying to evaluate my model using alpaca_eval, but I'm getting an error when loading the evaluation set: Dataset 'tatsu-lab/alpaca_eval' doesn't exist on the Hub. I checked the tatsu-lab Hugging Face account and the dataset is indeed not available. Can you please advise?

Best regards,
Hani

Use chatGPT as baseline?

The numbers are getting close to a 100% win rate; we should consider recalibrating win rates by comparing to ChatGPT.

add leaderboard of base models

Not needed for Monday.

  • simple script to perform SFT
  • create leaderboard of models

The benefit is that new base models will be evaluated directly on our leaderboard => no extra work for us to do.

Why does evaluate_from_model run so slowly on my side?

I am running with 8 A40 GPUs and I think it should be fast. I set up the environment and ran alpaca_eval evaluate_from_model --model_configs 'robin-v2-7b' --annotators_config 'claude' and alpaca_eval evaluate_from_model --model_configs 'robin-v2-7b' --annotators_config 'alpaca_eval_gpt4', but it takes a few days.
Also, it is surprising that I didn't provide any API key but it still runs. Why is that? Thank you so much for your help!

add configs for all models we tested

  • add configs of models we tested
  • add prompts of models we tested
  • add comments or mini readme in the config

These will be all the verified models on the leaderboard.

Question about 805 eval examples

Hi, thanks for your excellent work, especially the great effort put into designing and evaluating the evaluators.

Compared to these well-designed details, the 805 eval instructions do not seem to have much explanation. The AlpacaFarm paper only provides the root verb distribution and the source of the instructions. I would like to ask whether the topics of these instructions were carefully selected, e.g., whether they cover mathematics, coding, reasoning, etc., and could you share some principles for building the eval instruction set?

Thanks a lot!

Make PyPI package and test it

It would be good to test a couple of commands from the documentation; that way you can update the documentation if something is unclear.

chatgpt_fn returned json parsing error

It seems that OpenAI updated the results returned for ChatGPT queries when the server is overloaded (or the query limit is exceeded)? I am now getting a JSON parsing error after completing partial annotations with chatgpt_fn.

Error traceback attached below.

INFO:root:Creating the annotator from chatgpt_fn.
INFO:root:Saving annotations to /home/liu/.conda/envs/hao_alpaca_eval_py310/lib/python3.10/site-packages/alpaca_eval/evaluators_configs/chatgpt_fn/annotations_seed0_configs.json.
INFO:root:Loading all annotations from /home/liu/.conda/envs/hao_alpaca_eval_py310/lib/python3.10/site-packages/alpaca_eval/evaluators_configs/chatgpt_fn/annotations_seed0_configs.json.
WARNING:root:The length of outputs before and after merge are not the same. We have len(outputs_1)==
805, len(outputs_2)==657, and len(df_annotated)==657.
This means that there are missing examples or duplicates. We are taking a SQL inner join.

INFO:root:Annotating 640 examples with chatgpt_fn
INFO:root:Using openai_completions on 640 prompts using gpt-3.5-turbo-16k-0613.
INFO:root:Kwargs to completion: {'max_tokens': 50, 'temperature': 0, 'function_call': {'name': 'print_best_model'}, 'functions': [{'name': 'print_best_model', 'description': 'Print the best model given the preferred output.', 'parameters': {'type': 'object', 'properties': {'best_output': {'type': 'string', 'description': "Name of the best output, should be 'Output (a)' or 'Output (b)'"}}}, 'required': ['best_output']}]}
INFO:root:Kwargs to completion: {'n': 1, 'model': 'gpt-3.5-turbo-16k-0613', 'is_chat': True, 'max_tokens': 50, 'temperature': 0, 'function_call': {'name': 'print_best_model'}, 'functions': [{'name': 'print_best_model', 'description': 'Print the best model given the preferred output.', 'parameters': {'type': 'object', 'properties': {'best_output': {'type': 'string', 'description': "Name of the best output, should be 'Output (a)' or 'Output (b)'"}}}, 'required': ['best_output']}]}
prompt_batches: 15%|████████████████████▌ | 99/640 [00:12<01:08, 7.90it/s]
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/liu/.conda/envs/hao_alpaca_eval_py310/lib/python3.10/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/home/liu/.conda/envs/hao_alpaca_eval_py310/lib/python3.10/site-packages/alpaca_eval/decoders/openai.py", line 205, in _openai_completion_helper
all_args = json.loads(choice.message.function_call.arguments)

File "/home/liu/.conda/envs/hao_alpaca_eval_py310/lib/python3.10/json/init.py", line 346, in loads
return _default_decoder.decode(s)
File "/home/liu/.conda/envs/hao_alpaca_eval_py310/lib/python3.10/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/home/liu/.conda/envs/hao_alpaca_eval_py310/lib/python3.10/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/liu/.conda/envs/hao_alpaca_eval_py310/bin/alpaca_eval", line 8, in
sys.exit(main())
File "/home/liu/.conda/envs/hao_alpaca_eval_py310/lib/python3.10/site-packages/alpaca_eval/main.py", line 483, in main
fire.Fire(evaluate)
File "/home/liu/.conda/envs/hao_alpaca_eval_py310/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/home/liu/.conda/envs/hao_alpaca_eval_py310/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/home/liu/.conda/envs/hao_alpaca_eval_py310/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/liu/.conda/envs/hao_alpaca_eval_py310/lib/python3.10/site-packages/alpaca_eval/main.py", line 126, in evaluate
annotations = annotator.annotate_head2head(
File "/home/liu/.conda/envs/hao_alpaca_eval_py310/lib/python3.10/site-packages/alpaca_eval/annotators/pairwise_evaluator.py", line 316, in annotate_head2head
out = self.annotate_pairs(df_to_annotate, **decoding_kwargs)
File "/home/liu/.conda/envs/hao_alpaca_eval_py310/lib/python3.10/site-packages/alpaca_eval/annotators/pairwise_evaluator.py", line 346, in annotate_pairs
df_annotated = self._annotate(df_to_annotate, **decoding_kwargs)
File "/home/liu/.conda/envs/hao_alpaca_eval_py310/lib/python3.10/site-packages/alpaca_eval/annotators/pairwise_evaluator.py", line 437, in _annotate
curr_annotated = self.annotators[annotator](df_annotated.loc[curr_idcs, self.all_keys], **decoding_kwargs)
File "/home/liu/.conda/envs/hao_alpaca_eval_py310/lib/python3.10/site-packages/alpaca_eval/annotators/pairwise_evaluator.py", line 676, in call
completions = self.fn_completions(prompts=prompts, **self.completions_kwargs, **decoding_kwargs)
File "/home/liu/.conda/envs/hao_alpaca_eval_py310/lib/python3.10/site-packages/alpaca_eval/decoders/openai.py", line 140, in openai_completions
completions = list(
File "/home/liu/.conda/envs/hao_alpaca_eval_py310/lib/python3.10/site-packages/tqdm/std.py", line 1178, in iter
for obj in iterable:
File "/home/liu/.conda/envs/hao_alpaca_eval_py310/lib/python3.10/multiprocessing/pool.py", line 873, in next
raise value
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
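One way to guard the json.loads call in the traceback would be a defensive parse that falls back instead of crashing the whole run; this is just a sketch, not the package's actual code:

import json
import logging


def parse_function_call_arguments(raw: str) -> dict:
    # When the server is overloaded, `function_call.arguments` can come back
    # empty or truncated; returning an empty dict lets the caller mark the
    # example as un-annotated instead of aborting all remaining annotations.
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        logging.warning("Could not parse function_call arguments: %r", raw)
        return {}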

Pandas merge error

Trying to test the eval script:

alpaca_eval --model_outputs 'example/outputs.json'

I tried with Python 3.10 and Python 3.11, and got the same error both times:

  File "/export/home/cxia/congyingxia-scratchpad/alpaca_eval/envs/bin/alpaca_eval", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/export/home/cxia/congyingxia-scratchpad/alpaca_eval/envs/lib/python3.11/site-packages/alpaca_eval/main.py", line 483
, in main
    fire.Fire(evaluate)
  File "/export/home/cxia/congyingxia-scratchpad/alpaca_eval/envs/lib/python3.11/site-packages/fire/core.py", line 141, in Fi
re
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/export/home/cxia/congyingxia-scratchpad/alpaca_eval/envs/lib/python3.11/site-packages/fire/core.py", line 475, in _F
ire
    component, remaining_args = _CallAndUpdateTrace(
                                ^^^^^^^^^^^^^^^^^^^^
  File "/export/home/cxia/congyingxia-scratchpad/alpaca_eval/envs/lib/python3.11/site-packages/fire/core.py", line 691, in _C
allAndUpdateTrace
    component = fn(*varargs, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^
  File "/export/home/cxia/congyingxia-scratchpad/alpaca_eval/envs/lib/python3.11/site-packages/alpaca_eval/main.py", line 126
, in evaluate
    annotations = annotator.annotate_head2head(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/export/home/cxia/congyingxia-scratchpad/alpaca_eval/envs/lib/python3.11/site-packages/alpaca_eval/annotators/pairwis
e_evaluator.py", line 316, in annotate_head2head
    out = self.annotate_pairs(df_to_annotate, **decoding_kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/export/home/cxia/congyingxia-scratchpad/alpaca_eval/envs/lib/python3.11/site-packages/alpaca_eval/annotators/pairwise_evaluator.py", line 344, in annotate_pairs
    df_to_annotate = self._preprocess(to_annotate)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/export/home/cxia/congyingxia-scratchpad/alpaca_eval/envs/lib/python3.11/site-packages/alpaca_eval/annotators/pairwise_evaluator.py", line 387, in _preprocess
    df_to_annotate = self._merge_annotations(df_to_annotate, self.df_annotations)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/export/home/cxia/congyingxia-scratchpad/alpaca_eval/envs/lib/python3.11/site-packages/alpaca_eval/annotators/pairwise_evaluator.py", line 533, in _merge_annotations
    df_to_annotate = df_to_annotate.merge(
                     ^^^^^^^^^^^^^^^^^^^^^
  File "/export/home/cxia/congyingxia-scratchpad/alpaca_eval/envs/lib/python3.11/site-packages/pandas/core/frame.py", line 9843, in merge
    return merge(
           ^^^^^^
  File "/export/home/cxia/congyingxia-scratchpad/alpaca_eval/envs/lib/python3.11/site-packages/pandas/core/reshape/merge.py", line 148, in merge
    op = _MergeOperation(
         ^^^^^^^^^^^^^^^^
  File "/export/home/cxia/congyingxia-scratchpad/alpaca_eval/envs/lib/python3.11/site-packages/pandas/core/reshape/merge.py", line 741, in __init__
    self._maybe_coerce_merge_keys()
  File "/export/home/cxia/congyingxia-scratchpad/alpaca_eval/envs/lib/python3.11/site-packages/pandas/core/reshape/merge.py", line 1401, in _maybe_coerce_merge_keys
    raise ValueError(msg)
ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat
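For what it's worth, the error can be reproduced and worked around by casting the merge keys to a common dtype; the column name below is hypothetical, since the real key comes from the cached annotations file:

import pandas as pd

# Cached annotations store the key as int, the fresh dataframe as string (object).
cached = pd.DataFrame({"key": [0, 1], "preference": [1, 2]})
fresh = pd.DataFrame({"key": ["0", "1"], "instruction": ["a", "b"]})

# fresh.merge(cached, on="key")  # raises: merging on object and int64 columns

# Casting both sides to the same dtype before merging avoids the ValueError.
fresh["key"] = fresh["key"].astype(str)
cached["key"] = cached["key"].astype(str)
merged = fresh.merge(cached, on="key")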

Add documentation / process for contributing

  • discord
  • README (outputs / number)

Ideally, we would include the outputs, but I don't want the package to become too large, so the package should not contain them. One possibility is to make them optional but say that outputs need to be on the Hugging Face Hub. Then in the leaderboard we can say whether the data is there, which would make the results more believable.

add analysis of eval set

Use a pairwise t-test on the rankings of all leaderboards from the dataset (a sketch of the pairwise test follows the list below).

  • heatmap plotting
  • compute the number of samples to get statistical significance at a given rate
  • return mean and max p value
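A minimal sketch of the pairwise test, assuming each model's per-example preference scores are already available as arrays (the input format is hypothetical); the resulting p-values could feed the heatmap, and the mean and max of the values give the summary statistics above:

import itertools

import numpy as np
from scipy import stats


def pairwise_pvalues(per_example_scores: dict[str, np.ndarray]) -> dict[tuple[str, str], float]:
    # Paired t-test between every pair of models on their per-instruction scores.
    out = {}
    for a, b in itertools.combinations(per_example_scores, 2):
        _, p_value = stats.ttest_rel(per_example_scores[a], per_example_scores[b])
        out[(a, b)] = p_value
    return out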

Falcon support for generation

Running

alpaca_eval evaluate_from_model --model_configs 'falcon-7b-instruct'

Gives the following warning

The model 'RWForCausalLM' is not supported for text-generation. Supported models are ['BartForCausalLM', 'BertLMHeadModel', 'BertGenerationDecoder', 'BigBirdForCausalLM', 'BigBirdPegasusForCausalLM', 'BioGptForCausalLM', 'BlenderbotForCausalLM', 'BlenderbotSmallForCausalLM', 'BloomForCausalLM', 'CamembertForCausalLM', 'CodeGenForCausalLM', 'CpmAntForCausalLM', 'CTRLLMHeadModel', 'Data2VecTextForCausalLM', 'ElectraForCausalLM', 'ErnieForCausalLM', 'GitForCausalLM', 'GPT2LMHeadModel', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTNeoForCausalLM', 'GPTNeoXForCausalLM', 'GPTNeoXJapaneseForCausalLM', 'GPTJForCausalLM', 'LlamaForCausalLM', 'MarianForCausalLM', 'MBartForCausalLM', 'MegaForCausalLM', 'MegatronBertForCausalLM', 'MvpForCausalLM', 'OpenLlamaForCausalLM', 'OpenAIGPTLMHeadModel', 'OPTForCausalLM', 'PegasusForCausalLM', 'PLBartForCausalLM', 'ProphetNetForCausalLM', 'QDQBertLMHeadModel', 'ReformerModelWithLMHead', 'RemBertForCausalLM', 'RobertaForCausalLM', 'RobertaPreLayerNormForCausalLM', 'RoCBertForCausalLM', 'RoFormerForCausalLM', 'RwkvForCausalLM', 'Speech2Text2ForCausalLM', 'TransfoXLLMHeadModel', 'TrOCRForCausalLM', 'XGLMForCausalLM', 'XLMWithLMHeadModel', 'XLMProphetNetForCausalLM', 'XLMRobertaForCausalLM', 'XLMRobertaXLForCausalLM', 'XLNetLMHeadModel', 'XmodForCausalLM'].

It then proceeds with generation. We need to investigate whether this is a bug. We should probably rewrite the inference code to avoid the HF generation pipeline and roll our own loop, as sketched below.
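A minimal sketch of what a hand-rolled loop could look like; the model name, dtype, and decoding settings are illustrative, and trust_remote_code=True is what lets the custom RWForCausalLM class load outside the pipeline registry:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "tiiuae/falcon-7b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, trust_remote_code=True
).to("cuda")

prompts = ["What is the capital of France?"]  # placeholder for the eval set prompts
for prompt in prompts:
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    completion = tokenizer.decode(
        output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    print(completion)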

LLaMA 13B with evaluate_from_model does not use the GPU

The code is running on 1 × A100; I am using nvidia-smi to check and find that the GPU is not used. Running alpaca_eval evaluate_from_model --model_outputs $PWD/qa.json --annotators_config 'alpaca_eval_gpt4' --model_configs $PWD/llama for a few minutes then reports an error. May I ask what the cause is? Where is my setup wrong?

  File "/root/miniconda3/envs/eval/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 257, in _convert_token_to_id_with_added_voc
    return self.unk_token_id
  File "/root/miniconda3/envs/eval/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1155, in unk_token_id
    return self.convert_tokens_to_ids(self.unk_token)
  File "/root/miniconda3/envs/eval/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 250, in convert_tokens_to_ids
    return self._convert_token_to_id_with_added_voc(tokens)
  [the three frames above repeat until the recursion limit is reached]
  File "/root/miniconda3/envs/eval/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1155, in unk_token_id
    return self.convert_tokens_to_ids(self.unk_token)
  File "/root/miniconda3/envs/eval/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1035, in unk_token
    return str(self._unk_token)
RecursionError: maximum recursion depth exceeded while calling a Python object

Run command

alpaca_eval evaluate_from_model --model_outputs $PWD/qa.json --annotators_config 'alpaca_eval_gpt4' --model_configs $PWD/llama

model config

# configs.yaml
knowlm-13b:
  prompt_template: "/root/model-eval/llama/prompt.txt"
  fn_completions: "huggingface_local_completions"
  completions_kwargs:
    model_name: "/root/.cache/LLAMA/" # LLAMA 13B
    model_kwargs:
      torch_dtype: 'float32'
    max_new_tokens: 2000
    temperature: 0.7
    top_p: 1.0
    do_sample: True
  pretty_name: "LLAMA 13B"
  link: "https://example.com/"

env

> python --version
Python 3.10.11

>>> import torch
>>> print(torch.cuda.is_available())
True
>>> print(torch.__version__)
2.0.1+cu117

> tree /root/.cache/LLAMA/
|-- config.json
|-- generation_config.json
|-- model-00002-of-00006.safetensors
|-- pytorch_model-00001-of-00006.bin
|-- pytorch_model-00002-of-00006.bin
|-- pytorch_model-00003-of-00006.bin
|-- pytorch_model-00004-of-00006.bin
|-- pytorch_model-00005-of-00006.bin
|-- pytorch_model-00006-of-00006.bin
|-- pytorch_model.bin.index.json
|-- special_tokens_map.json
|-- tokenizer.model
|-- tokenizer_config.json

Separate files per provider?

hey @YannDubs - we should have palm-2-chat-bison in by the end of this week, will add it then.

Y'all have an interesting approach of an individual file per provider

Why do it this way, vs. making the completion call inside the __init__.py?

I also noticed you're calculating cost per provider <- why is that?

Strange prompt(s)

  1. Investigating some results, I came across this prompt, which is very strange.
    It is formatted as a conversation, with no particular instruction about it.
  {
    "instruction":"User : Hi dear \nAgent : helo , can ai help you \nUser : plaes tell me about what frequency conscious of ai\nAgent : the conscious ai is a complex actually need specific information to provide about conscious.\nUser : pleas tell me more about conscious if you know please let me know\nAgent : conscious is about made some result or decision is to hard if not have knowledge or prove with science.\nUser : Please tell more about that.\nAgent : need a data scientist to provide decision because a different conscious human and artificial intelligence.",
    "output":"The conscious AI requires data scientists to make decisions because the conscious of humans and artificial intelligence are different. This requires extensive knowledge and proof from science in order to make the correct decisions or achieve the desired results.",
    "generator":"text_davinci_003",
    "dataset":"oasst"
  },
  2. Can you elaborate on the prompt "templates" which use "Instruction: ... Output: ..." on models that are already instruction-finetuned? What is the best way to just have them use the instructions directly?

[DOC] Update Anthropic docstring

Anthropic changed their Python SDK, making this code line outdated:

Additional kwargs to pass to `anthropic.Client.completion`.
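For reference, a minimal sketch of the call under the rewritten SDK (assuming anthropic>=0.3, where Client.completion was replaced by Anthropic().completions.create); the model name and prompt are illustrative:

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
response = client.completions.create(
    model="claude-2",
    max_tokens_to_sample=300,
    prompt=f"{anthropic.HUMAN_PROMPT} Hello, how are you?{anthropic.AI_PROMPT}",
)
print(response.completion)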


Would love to know if this might help - https://github.com/BerriAI/litellm

Roughly: a simple I/O library that standardizes all the LLM API calls to the OpenAI call format.

import os

from litellm import completion

## set ENV variables
# ENV variables can be set in a .env file, too. Example in .env.example
os.environ["OPENAI_API_KEY"] = "openai key"
os.environ["ANTHROPIC_API_KEY"] = "anthropic key"

messages = [{"content": "Hello, how are you?", "role": "user"}]

# openai call
response = completion(model="gpt-3.5-turbo", messages=messages)

# anthropic call
response = completion("claude-v-2", messages)
