
gorilla's Introduction

Gorilla: Large Language Model Connected with Massive APIs [Project Website]

🚒 GoEx: a runtime for executing LLM-generated actions like code & API calls. GoEx presents "undo" and "damage confinement" abstractions for mitigating the risk of unintended actions taken in LLM-powered systems. Release blog Paper.

🎉 Berkeley Function Calling Leaderboard: how do models stack up for function calling? 🎯 Releasing the Berkeley Function Calling Leaderboard. Read more in our Release Blog.

🏆 Gorilla OpenFunctions v2 sets a new SoTA for open-source LLMs 💪 On par with GPT-4 🙌 Supports more languages 👌 Blog.

🔥 Gorilla OpenFunctions is a drop-in alternative for function calling! Release Blog

🟢 Gorilla is Apache 2.0! With Gorilla fine-tuned on MPT and Falcon, you can use Gorilla commercially with no obligations! ⛳

🚀 Try Gorilla in 60s: Colab

💻 Use Gorilla in your CLI with pip install gorilla-cli

📠 Check out our blogs for all things tool-use/function-calling!

🗞️ Check out our paper! arXiv

👋 Join our Discord! Discord

Gorilla enables LLMs to use tools by invoking APIs. Given a natural-language query, Gorilla comes up with the semantically and syntactically correct API to invoke. With Gorilla, we are the first to demonstrate how to use LLMs to invoke 1,600+ (and growing) API calls accurately while reducing hallucination. We also release APIBench, the largest collection of APIs, curated and easy to train on! Join us as we try to expand the largest API store and teach LLMs how to write them! Hop on our Discord, open a PR, or email us if you would like to have your API incorporated as well.
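To make this concrete, here is a minimal sketch of querying a Gorilla endpoint through its OpenAI-compatible chat-completions interface, the same protocol our hosted server speaks. It assumes the pre-1.0 openai Python client; the endpoint URL and model name below are placeholders for the hosted server from the Colab or your own deployment.

import openai

# Point the pre-1.0 openai client at a Gorilla server instead of OpenAI.
openai.api_key = "EMPTY"                      # the Gorilla server ignores the key
openai.api_base = "http://localhost:8000/v1"  # placeholder: hosted or self-hosted endpoint

completion = openai.ChatCompletion.create(
    model="gorilla-7b-hf-v1",
    messages=[{"role": "user", "content": "I would like to translate from English to Chinese."}],
)
print(completion.choices[0].message.content)  # the generated API call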

News

  • ⏰ [04/01] Introducing cost and latency metrics into the Berkeley Function Calling Leaderboard!
  • 🚀 [03/15] RAFT: Adapting Language Model to Domain Specific RAG is live! [MSFT-Meta blog] [Berkeley Blog]
  • 🏆 [02/26] Berkeley Function Calling Leaderboard is live!
  • 🎯 [02/25] OpenFunctions v2 sets new SoTA for open-source LLMs!
  • 🔥 [11/16] Excited to release Gorilla OpenFunctions!
  • 💻 [06/29] Released gorilla-cli, LLMs for your CLI!
  • 🟢 [06/06] Released commercially usable, Apache 2.0 licensed Gorilla models!
  • 🚀 [05/30] Provided the CLI interface to chat with Gorilla!
  • 🚀 [05/28] Released Torch Hub and TensorFlow Hub models!
  • 🚀 [05/27] Released the first Gorilla model! Colab or 🤗!
  • 🔥 [05/27] We released the APIZoo contribution guide for community API contributions!
  • 🔥 [05/25] We released the APIBench dataset and the evaluation code of Gorilla!

Gorilla Gradio

Try Gorilla LLM models in HF Spaces or the Gradio Colab.

Get Started

Inference: Run Gorilla locally (inference/README.md).

Evaluation: We have included prompts and responses for APIBench, with and without retrievers, along with the Abstract Syntax Tree (AST) matching evaluation script at evaluation.

Repository Organization

Our repository organization is shown below.

  • The berkeley-function-call-leaderboard folder contains scripts for evaluating the function-calling ability of models.
  • The data folder contains all the evaluation APIs (APIBench) and the community contributed APIs.
  • The eval folder contains all our evaluation code as well as the Gorilla outputs.
  • The inference folder contains all the inference code for running Gorilla locally.
  • The openfunctions folder contains the inference code for the OpenFunctions model(s).

For our dataset collection, documentation for all 1,640 APIs is in data/api. We also include the APIBench dataset, created via self-instruct, in data/apibench. For evaluation, we convert this into an LLM-friendly chat format: the questions are in eval/eval-data/questions and the corresponding responses are in eval/eval-data/responses. The evaluation scripts are in eval/eval-scripts. This is entirely sufficient to train Gorilla yourself and reproduce our results. Please see evaluation for details on how to use our evaluation pipeline.
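As a quick illustration, here is a sketch of loading these files. The torchhub file names follow the {api_name} patterns shown in the tree below; the questions file name, with 0_shot as the eval_metric value, is an assumption for illustration.

import json

def load_jsonl(path):
    # One JSON object per line; skip blank lines.
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]

apis = load_jsonl("data/api/torchhub_api.jsonl")          # API documentation
train = load_jsonl("data/apibench/torchhub_train.jsonl")  # self-instruct training pairs
questions = load_jsonl("eval/eval-data/questions/torchhub/questions_torchhub_0_shot.jsonl")
print(len(apis), len(train), len(questions))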

Additionally, we have released all the model weights. gorilla-7b-hf-v0 lets you invoke over 925 Hugging Face APIs. Similarly, gorilla-7b-tf-v0 and gorilla-7b-th-v0 cover 626 (exhaustive) TensorFlow v2 and 94 (exhaustive) Torch Hub APIs, respectively. gorilla-mpt-7b-hf-v0 and gorilla-falcon-7b-hf-v0 are Apache 2.0 licensed models (commercially usable) fine-tuned on MPT-7B and Falcon-7B, respectively. We will release a model combining all three, with generic chat capability and community-contributed APIs, as soon as we can scale our serving infrastructure. You can run Gorilla locally following the instructions in the inference/ sub-directory, or use our hosted Gorilla chat completion API (see Colab)! If you have any suggestions, or if you run into any issues, please feel free to reach out to us through Discord or email, or raise a GitHub issue.
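For example, a hedged sketch of running one of the Apache 2.0 checkpoints with Hugging Face transformers (the Hub id is assumed from the gorilla-llm organization and the model names above; the prompt and generation settings are illustrative):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gorilla-llm/gorilla-falcon-7b-hf-v0"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,  # Falcon/MPT checkpoints ship custom modeling code
)

prompt = "I want to classify the sentiment of a sentence using a Hugging Face API."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))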

gorilla
├── berkeley-function-call-leaderboard (data and scripts to eval models' function-calling ability)
├── data
│   ├── api (TF/HF/TH APIs used in generating apibench)
│   │   ├── {api_name}_api.jsonl
│   ├── apibench (Evaluating LLM models) v-1.0
│   │   ├── {api_name}_train.jsonl, {api_name}_eval.jsonl
│   ├── apizoo (Contributed by the community - evolving)
│   │   ├── username1.json
│   │   ├── username2.json
│   │   ├── ...
├── eval
│   ├── README.md
│   ├── get_llm_responses.py
│   ├── eval-scripts
│   │   ├── ast_eval_{api_name}.py
│   ├── eval-data
│   │   ├── questions
│   │   │   ├── {api_name}
│   │   │   │   ├── questions_{api_name}_{eval_metric}.jsonl
│   │   ├── responses
│   │   │   ├── {api_name}
│   │   │   │   ├── responses_{api_name}_Gorilla_FT_{eval_metric}.jsonl
│   │   │   │   ├── responses_{api_name}_Gorilla_RT_{eval_metric}.jsonl
├── inference
│   ├── README.md
│   ├── serve
│   │   ├── gorilla_cli.py
│   │   ├── conv_template.py
├── openfunctions
│   ├── openfunctions-v1 (data and scripts for openfunctions-v0 and v1)
│   ├── utils (parsing script for openfunctions-v2)
│   ├── inference_* (openfunctions-v2 hosted/local inference code)

Contributing Your API

We aim to build an open-source, one-stop shop for all the APIs that LLMs can interact with! Any suggestions and contributions are welcome! Please see the details on how to contribute. THIS WILL ALWAYS REMAIN OPEN SOURCE.

FAQ(s)

  1. I would like to use Gorilla commercially. Is there going to be an Apache 2.0 licensed version?

Yes! We now have models that you can use commercially without any obligations.

  2. Can we use Gorilla with other tools like LangChain, etc.?

Absolutely! You've highlighted a great aspect of our tools. Gorilla is an end-to-end model, specifically tailored to serve correct API calls (tools) without requiring any additional coding. It's designed to work as part of a wider ecosystem and can be flexibly integrated within agentic frameworks and other tools.

LangChain is a versatile developer tool. Its "agents" can efficiently swap in any LLM, Gorilla included, making it a highly adaptable solution for various needs.

The beauty of these tools truly shines when they collaborate, complementing each other's strengths and capabilities to create an even more powerful and comprehensive solution. This is where your contribution can make a difference. We enthusiastically welcome any inputs to further refine and enhance these tools.

Check out our blog on How to Use Gorilla: A Step-by-Step Walkthrough to see all the different ways you can integrate Gorilla in your projects.
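As one illustration of such an integration, here is a minimal sketch of pointing LangChain's OpenAI-compatible chat wrapper at a Gorilla endpoint. It assumes the langchain-openai package is installed; the URL and model name are placeholders.

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gorilla-7b-hf-v1",
    base_url="http://localhost:8000/v1",  # placeholder Gorilla endpoint
    api_key="EMPTY",                      # the server ignores the key
)
reply = llm.invoke("I want to generate an image from a text prompt.")
print(reply.content)  # the suggested API call, usable inside any LangChain agent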

Project Roadmap

In the immediate future, we plan to release the following:

  • BFCL metrics to evaluate contamination
  • BFCL systems metrics including cost and latency
  • BFCL update with "live" data and user-votes
  • Openfunctions-v3 model to support more languages and multi-turn capability
  • Berkeley Function Calling leaderboard (BFCL) for evaluating tool-calling/function-calling models [Feb 26, 2024]
  • Openfunctions-v2 with more languages (Java, JS, Python), relevance detection [Feb 26, 2024]
  • API Zoo Index for easy access to all APIs [Feb 16, 2024]
  • Openfunctions-v1, Apache 2.0, with parallel and multiple function calling [Nov 16, 2023]
  • Openfunctions-v0, Apache 2.0 function calling model [Nov 16, 2023]
  • Release a commercially usable, Apache 2.0 licensed Gorilla model [Jun 5, 2023]
  • Release weights for all APIs from APIBench [May 28, 2023]
  • Run Gorilla LLM locally [May 28, 2023]
  • Release weights for HF model APIs [May 27, 2023]
  • Hosted Gorilla LLM chat for HF model APIs [May 27, 2023]
  • Opening up the APIZoo for contributions from the community
  • Dataset and Eval Code

Propose a new task you would like to work on 🤩

Citation

If you use Gorilla or APIBench, please cite our paper:

@article{patil2023gorilla,
  title={Gorilla: Large Language Model Connected with Massive APIs},
  author={Shishir G. Patil and Tianjun Zhang and Xin Wang and Joseph E. Gonzalez},
  year={2023},
  journal={arXiv preprint arXiv:2305.15334},
} 

gorilla's People

Contributors

amiraflak, aryanvichare, benjaminhuo, cedricvidal, charliejcj, dangeo773, danielfleischer, danielskry, eitanturok, eltociear, fanjia-yan, hannesgith, huanzhimao, jasonzhu1313, joedevon, kaiwen129, meenakshi-mittal, morganmcg1, mzamini92, noppapon, rajveer43, ramanv0, ricklamers, royh02, saikolasani, shawnharmsen, shishirpatil, tanmaydoesai, tianjunz, viniciuslazzari

gorilla's Issues

The bm25 and gpt-index scripts?

          For the different retrievers, we use bm25 (https://en.wikipedia.org/wiki/Okapi_BM25); gpt-index simply uses `Davinci v1` from OpenAI to embed all the documents and does a simple cosine-similarity match at inference time. For oracle, we just provide the ground-truth answer to Gorilla. Hope this helps, and let me know if there are any further questions!

Originally posted by @tianjunz in #21 (comment)

Would you be willing to release the bm25 and gpt-index scripts to help the community reproduce the experimental results?
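For readers who want to experiment before any official release, here is a rough BM25 sketch using the rank_bm25 package. This is an assumption: the original scripts may differ in tokenization and ranking details, and the toy corpus stands in for the real API docs in data/api/*.jsonl.

from rank_bm25 import BM25Okapi

# Toy corpus standing in for the API documentation.
api_docs = [
    "torchhub resnet50 pretrained model for image classification",
    "huggingface translation pipeline english to chinese",
]
bm25 = BM25Okapi([doc.lower().split() for doc in api_docs])

query = "translate a sentence from english to chinese"
best = bm25.get_top_n(query.lower().split(), api_docs, n=1)[0]
# Prepend the retrieved document to the prompt, as in retriever-aware evaluation.
print(f"Use this API documentation for reference: {best}\n{query}")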

[bug] Hosted Gorilla: <Issue>

Exception: Invalid response object from API: '{"object":"error","message":"","code":50001}' (HTTP response code was 400)
Failed model: gorilla-7b-hf-v1, for prompt: What is the ...

What are the document retrievers mentioned in your paper?

Hi!

thanks for the wonderful work! While reading your paper, I was confused by the document retrievers it mentions. You mention several of them, such as gpt and oracle, but I cannot find more specific references or hyperlinks in the paper. I'm wondering where I can find websites or descriptions of these retrievers?

Thank you.

Encountered 1 file(s) that may not have been copied correctly on Windows

I encountered this problem while downloading model weights. It seems weights larger than 4 GB are not correctly handled on Windows. Did you upload the models from a Windows system?

root@4bd793bb2ded:/workspace/gorilla# git lfs install
Updated git hooks.
Git LFS initialized.

root@4bd793bb2ded:/workspace/gorilla# git clone https://huggingface.co/gorilla-llm/gorilla-mpt-7b-hf-v0
Cloning into 'gorilla-mpt-7b-hf-v0'...
remote: Enumerating objects: 35, done.
remote: Counting objects: 100% (35/35), done.
remote: Compressing objects: 100% (34/34), done.
remote: Total 35 (delta 5), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (35/35), 621.68 KiB | 1.84 MiB/s, done.
Filtering content: 100% (2/2), 4.38 GiB | 57.36 MiB/s, done.
Encountered 1 file(s) that may not have been copied correctly on Windows:
        pytorch_model-00001-of-00002.bin

See: `git lfs help smudge` for more details.
root@4bd793bb2ded:/workspace/gorilla/gorilla-mpt-7b-hf-v0# ls -al
total 12989212
drwxr-xr-x 3 root root       4096 Jun  7 00:17 .
drwxr-xr-x 8 root root        161 Jun  7 00:16 ..
drwxr-xr-x 9 root root        174 Jun  7 00:18 .git
-rw-r--r-- 1 root root       1477 Jun  7 00:16 .gitattributes
-rw-r--r-- 1 root root       2068 Jun  7 00:16 README.md
-rw-r--r-- 1 root root       1752 Jun  7 00:16 adapt_tokenizer.py
-rw-r--r-- 1 root root      16818 Jun  7 00:16 attention.py
-rw-r--r-- 1 root root       2493 Jun  7 00:16 blocks.py
-rw-r--r-- 1 root root       1284 Jun  7 00:16 config.json
-rw-r--r-- 1 root root       9080 Jun  7 00:16 configuration_mpt.py
-rw-r--r-- 1 root root      28182 Jun  7 00:16 flash_attn_triton.py
-rw-r--r-- 1 root root        112 Jun  7 00:16 generation_config.json
-rw-r--r-- 1 root root      27219 Jun  7 00:16 hf_prefixlm_converter.py
-rw-r--r-- 1 root root       3639 Jun  7 00:16 meta_init_context.py
-rw-r--r-- 1 root root      17406 Jun  7 00:16 modeling_mpt.py
-rw-r--r-- 1 root root       2563 Jun  7 00:16 norm.py
-rw-r--r-- 1 root root      12558 Jun  7 00:16 param_init_fns.py
-rw-r--r-- 1 root root 9943040275 Jun  7 00:18 pytorch_model-00001-of-00002.bin
-rw-r--r-- 1 root root 3355599187 Jun  7 00:17 pytorch_model-00002-of-00002.bin
-rw-r--r-- 1 root root      16023 Jun  7 00:16 pytorch_model.bin.index.json
-rw-r--r-- 1 root root        129 Jun  7 00:16 special_tokens_map.json
-rw-r--r-- 1 root root    2113738 Jun  7 00:16 tokenizer.json
-rw-r--r-- 1 root root        264 Jun  7 00:16 tokenizer_config.json

Deploying to Replicate

Describe the solution you'd like
I would love to see a Gorilla model hosted on Replicate; it would be nice to be able to utilize their API and hosting.
Additional context
Had a blast playing with the Colab.

[bug] Hosted Gorilla: <Issue>

Exception: Error communicating with OpenAI: HTTPConnectionPool(host='34.132.127.197', port=8000): Max retries exceeded with url: /v1/chat/completions (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f0ec18dabf0>: Failed to establish a new connection: [Errno 111] Connection refused'))
Failed model: gorilla-7b-hf-v0, for prompt: I would like to translate from English to Chinese

[bug] Hosted Gorilla: <Issue>

Exception: Invalid response object from API: '{"object":"error","message":"This model's maximum context length is 2048 tokens. However, you requested 2302 tokens (1790 in the messages, 512 in the completion). Please reduce the length of the messages or completion.","code":40303}' (HTTP response code was 400)

Is there any way to just cut the completion/request to the first 2048 tokens?
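One client-side workaround (a sketch, not an official fix): tokenize the prompt yourself and keep prompt plus completion under the 2048-token window before sending the request. The checkpoint path is a placeholder for a local copy of the model's tokenizer.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("path/to/gorilla-7b-hf-v1")  # placeholder path

MAX_CONTEXT, MAX_COMPLETION = 2048, 512
prompt = "..."  # the over-long prompt
ids = tokenizer(prompt)["input_ids"][: MAX_CONTEXT - MAX_COMPLETION]
safe_prompt = tokenizer.decode(ids, skip_special_tokens=True)  # now fits the window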

[feature] FOOM detection.

This seems like the sort of project that could accidentally produce a self-improving superhuman system. Does anyone on the project have an understanding of AI Alignment? Are there efforts to measure the potential for systems built with gorilla to FOOM?

Adding additional APIs to Gorilla LLM

I hope you are doing well; a great thanks for this work.
Is it possible to add additional (private) APIs to Gorilla? We have a large database of APIs and we need to add them to Gorilla. How can we do this? Should we fine-tune the Gorilla LLM, or something along those lines?

Eval results

Hi, thanks for your excellent work.

I ran the eval script

python ast_eval_th.py --api_dataset ../../data/api/torchhub_api.jsonl --apibench ../../data/apibench/torchhub_eval.json --llm_responses ../eval-data/responses/torchhub/response_torchhub_Gorilla_FT_0_shot.jsonl

and get the results:

Final Functionality accuracy:  0.7580645161290323
Final hallucination:  0.16129032258064516

I find these results are inconsistent with the results reported in the paper.


I would like to ask where I got it wrong.

Thanks.

Gorilla Self-Hosted

Hi,

Is it also possible to self-host Gorilla with an API that is compatible with the OpenAI chat completion API?
So, essentially the same as depicted in the Colab?

De-duplicate APIBench eval data (?)

The evaluation data for APIBench is duplicated between data/apibench/*_eval.json and eval/eval-data/questions/. I think the only difference is formatting. Maybe we should just keep eval/eval-data/responses and have data/apibench hold only the data used to train the model.

Initially we made two copies with the following rationale:
  • apibench should have all the data self-contained, since the community is using it to train/benchmark their LLMs.
  • eval/ would have the eval data in a format that is easy to eyeball, to understand what is going on.

Maybe this is one of those few cases where it might be ok to have the same data twice in the repository in different formats?

Starting this issue in case anyone has comments on this.

[bug] The provided response file's test results are not consistent with the paper

Describe the bug

We used the file /eval/eval-data/responses/torchhub/response_torchhub_Gorilla_FT_0_shot.jsonl and the script /eval/eval-scripts/ast_eval_th.py to calculate the metrics. The final calculated result is Final Functionality accuracy: 75.80, Final hallucination: 16.12, which is a big difference from the zero-shot TorchHub numbers published in Table 1 of the paper (Functionality accuracy: 59.13, hallucination: 6.98).

To Reproduce
Steps to reproduce the behavior:

  1. Use the file /eval/eval-data/responses/torchhub/response_torchhub_Gorilla_FT_0_shot.jsonl and then run /eval/eval-scripts/ast_eval_th.py to calculate the metrics.

Screenshots
None

Proposed Solution
None

Additional context
We would like to know why there is a large discrepancy with the original published results, whether it is because an update was made or we compared the wrong table.

[feature] Run Gorilla locally without GPUs 🦍

Today, Gorilla endpoints run on UC Berkeley-hosted servers 🐻. When you try our Colab, our chat completion API, or the CLI tool, it hits our GPUs for inference. A popular ask among our users is to run Gorilla locally on MacBooks/Linux/WSL.

Describe the solution you'd like:
Have the model(s) running locally on MPS/CPU/GPU and listening on a port. All the current Gorilla endpoints can then just hit localhost to get the response to any given prompt.

Additional context:
Here is an application that would immediately use it: https://github.com/gorilla-llm/gorilla-cli
Given we have LLaMA models, these should be plug-and-play: ggerganov/llama.cpp and karpathy/llama2.c
Also relevant: https://huggingface.co/TheBloke/gorilla-7B-GPTQ

Update 1: If you happen to have an RTX, V100, A100, or H100, you can use Gorilla today without any latency hit. The goal of this enhancement is to help those who may not have access to the latest and greatest GPUs.
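As a sketch of what the llama.cpp route could look like via llama-cpp-python (the GGUF file name is hypothetical; use whichever quantized Gorilla conversion you have on hand):

from llama_cpp import Llama

# Runs on CPU (or Metal on Macs) with a quantized model file.
llm = Llama(model_path="gorilla-7b-hf-v1.Q4_K_M.gguf", n_ctx=2048)  # hypothetical file
out = llm("I want to detect objects in an image.", max_tokens=256)
print(out["choices"][0]["text"])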

License?

Hello, thanks for making your work available! Have you chosen a license yet?

Train with MPT 8k

Is the feature request related to a problem?

Would it be expensive to train with MPT 8k? Can you provide an MPT 8k model?

Describe the solution you'd like
When I run Gorilla, I want to see an 8k context window.

I'd prefer to keep Apache 2.0 licensing.

Additional context

https://huggingface.co/mosaicml/mpt-7b-8k

Leveraging Llama 2

I don't see any existing discussion about leveraging Meta's new Llama 2 model. Curious if you have any plans in the works for using this new base model in Gorilla.

RuntimeError: Internal: src/sentencepiece_processor.cc(1101) [model_proto->ParseFromArray(serialized.data(), serialized.size())]

When applying these deltas to the base weights, I get the following error:

$ python apply_delta.py --base-model-path ../../llama-7b-hf/ --target-model-path ../../gorilla-7b-hf-v0/ --delta-path ../../gorilla-7b-hf-delta-v0/
Loading the delta weights from ../../gorilla-7b-hf-delta-v0/
Traceback (most recent call last):
  File "/home/paperspace/projects/gorilla/gorilla/inference/apply_delta.py", line 167, in <module>
    apply_delta(args.base_model_path, args.target_model_path, args.delta_path)
  File "/home/paperspace/projects/gorilla/gorilla/inference/apply_delta.py", line 129, in apply_delta
    delta_tokenizer = AutoTokenizer.from_pretrained(delta_path, use_fast=False)
  File "/home/paperspace/.local/lib/python3.9/site-packages/transformers/models/auto/tokenization_auto.py", line 702, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/home/paperspace/.local/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1811, in from_pretrained
    return cls._from_pretrained(
  File "/home/paperspace/.local/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1965, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/home/paperspace/.local/lib/python3.9/site-packages/transformers/models/llama/tokenization_llama.py", line 96, in __init__
    self.sp_model.Load(vocab_file)
  File "/home/paperspace/.local/lib/python3.9/site-packages/sentencepiece/__init__.py", line 905, in Load
    return self.LoadFromFile(model_file)
  File "/home/paperspace/.local/lib/python3.9/site-packages/sentencepiece/__init__.py", line 310, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
RuntimeError: Internal: src/sentencepiece_processor.cc(1101) [model_proto->ParseFromArray(serialized.data(), serialized.size())] 

Specs:

$ nvidia-smi
Thu Jun  1 17:50:22 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.105.01   Driver Version: 515.105.01   CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro M4000        Off  | 00000000:00:05.0  On |                  N/A |
| 46%   32C    P8    16W / 120W |    189MiB /  8192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1532      G   /usr/lib/xorg/Xorg                121MiB |
|    0   N/A  N/A      2011      G   /usr/bin/gnome-shell               59MiB |
|    0   N/A  N/A      2571      G   ...bexec/gnome-initial-setup        2MiB |
+-----------------------------------------------------------------------------+
$ LC_ALL=C lspci -v | grep -EA10 "3D|VGA" | grep 'prefetchable' 
	Memory at f4000000 (32-bit, prefetchable) [size=8M]
	Memory at f3000000 (32-bit, non-prefetchable) [size=16M]
	Memory at e0000000 (64-bit, prefetchable) [size=256M]
	Memory at f0000000 (64-bit, prefetchable) [size=32M]
$ free -h
              total        used        free      shared  buff/cache   available
Mem:           29Gi       1.2Gi       5.6Gi        13Mi        22Gi        27Gi
Swap:            0B          0B          0B
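This error usually surfaces when tokenizer.model is truncated or is still a git-lfs pointer stub rather than the real binary (an assumption worth checking before anything else). A quick sanity check:

import os
import sentencepiece as spm

path = "../../gorilla-7b-hf-delta-v0/tokenizer.model"
print("size on disk:", os.path.getsize(path))  # an lfs pointer stub is only ~130 bytes

sp = spm.SentencePieceProcessor()
sp.Load(path)  # raises the same ParseFromArray error if the file is bad
print("vocab size:", sp.GetPieceSize())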

load-8bit flag doesn't work

Describe the issue
When I use the --load-8bit flag, it returns a load_compress_model that's not imported anywhere (and for that reason, I guess, it's failing?).

Any ideas on how to go about this issue? I've searched for this object in the code itself and in Hugging Face's API but couldn't find it, so I'm kind of clueless about what to do.

I'm running this on a single-GPU machine. It's an old T420 with Arch Linux.

Thanks!
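A possible workaround until the missing import is fixed (a sketch, assuming bitsandbytes is installed and a CUDA-capable GPU is available): skip the CLI flag and let transformers quantize the weights at load time.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/gorilla-7b-hf-v0"  # placeholder for your local weights
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",
    load_in_8bit=True,  # requires bitsandbytes
)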

[bug] Testing Gorilla: <Issue>

Exception: Error communicating with OpenAI: HTTPConnectionPool(host='34.132.127.197', port=8000): Max retries exceeded with url: /v1/chat/completions (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7bba974da140>: Failed to establish a new connection: [Errno 111] Connection refused'))
Failed model: gorilla-7b-tf-v0, for prompt: I want to build a robot that can detecting objects in an image

The returned results show garbled content?

The running command used is:
python3 serve/gorilla_cli.py --model-path model/gorilla-7b-th-v0/

But the returned results show garbled content.

How did this problem arise and how should it be resolved?

[bug] Hosted Gorilla: <Issue>

Exception: Invalid response object from API: '{"object":"error","message":"NETWORK ERROR DUE TO HIGH TRAFFIC. PLEASE REGENERATE OR REFRESH THIS PAGE.\n\n(CUDA error: uncorrectable ECC error encountered\nCUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.\nFor debugging consider passing CUDA_LAUNCH_BLOCKING=1.\nCompile with TORCH_USE_CUDA_DSA to enable device-side assertions.\n)","code":50001}' (HTTP response code was 400)
Failed model: gorilla-7b-hf-v1, for prompt: I would like to translate 'I feel very good today.' from English to Chinese

[bug] Hosted Gorilla: <Issue>

Exception: Invalid response object from API: '{"object":"error","message":"","code":50001}' (HTTP response code was 400)
Failed model: gorilla-7b-hf-v1, for prompt: I would like to translate 'I feel very good today.' from English to Chinese

How to run this project?

Describe the issue

I saw the demo in the video, which seems to run on the command line and obtain API calls through dialogue. But I didn't find where to run it to get such results. Do I need to train first, or do I need to run a specific Python file? Please advise.

GPT-4 cutoff date is September 2021 - how did this impact evals?

Any new API info would not be in GPT-4's training data.

How much impact do you think this has on the relative performance between GPT-4 and Gorilla?

Did you do any eval on APIs that existed prior to 09/21 versus those introduced after?

I reviewed the paper but could not find any discussion of this. https://arxiv.org/abs/2305.15334

To be clear, I am not saying this invalidates the ideas, which I think were a fantastic contribution to open-source LLMs, but rather that it would be good to understand the precise reason for the superior performance.
