
Invoker

The one who calls upon... Functions!

Invoker is a suite of large language models based on Llama-2, fine-tuned to plan between calling functions and responding directly. We have released the 13B version; 7B and 34B versions are planned to be trained and released in the future.

News

  • [2023/09] We released Invoker-13B-GPTQ, a 4-bit GPTQ-quantized version of Invoker-13B. Download weights. We also added ExLlamaV2 integration!
  • [2023/09] We released Invoker-13B, a model trained on function-calling and multi-turn conversation datasets. Download weights

Installation & Usage

Invoker is used exactly like OpenAI's function calling. First, install the required dependencies:

pip install -r requirements.txt

Launching the Server

Start the FastAPI server. Model details are set via environment variables. The Invoker server currently supports two ways of loading the model. To load the full fp16 model with HuggingFace Transformers, run the following commands:

export INVOKER_MODEL_TYPE=hf
export INVOKER_MODEL_NAME_OR_PATH=jeffrey-fong/invoker-13b
uvicorn server_fastapi:app

If you would like to load a 4-bit quantized Invoker GPTQ model using ExLlamaV2, clone the model repository to your local machine. Then, run the following commands:

export INVOKER_MODEL_TYPE=exllamav2
export INVOKER_MODEL_NAME_OR_PATH=path_to_downloaded_invoker-13b-GPTQ-model_dir
uvicorn server_fastapi:app
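
Either way, the server listens on port 8000 by default (uvicorn's default) and exposes an OpenAI-compatible endpoint; the path below matches what the client example in the next section calls. A quick sanity check from the shell:

curl http://localhost:8000/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "jeffrey-fong/invoker-13b", "messages": [{"role": "user", "content": "Hello"}]}'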

The full list of models is indicated here.

Inference

Inference can then be performed exactly as with OpenAI function calling. Provide the chat history and the functions in the messages and functions arguments respectively. Invoker also supports the following generation hyperparameters:

  • temperature: float = 0.5. Accepts values between 0.0 and 1.0; defaults to 0.5 if not provided.
  • top_p: float = 1.0. Accepts values between 0.0 and 1.0; defaults to 1.0 if not provided.

For example:
import json  # used below to parse the generated function-call arguments
import openai

openai.api_base = "http://localhost:8000"
openai.api_key = "test"

messages = [{"role": "user", "content": "Can you check what is the time in Singapore?"}]
response = openai.ChatCompletion.create(
    model="jeffrey-fong/invoker-13b",
    messages=messages,
    functions=[
        {
            "name": "get_time",
            "description": "Get the current time",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. New York City, NY"
                    },
                    "format": {
                    "type": "string",
                    "enum": ["12-hour", "24-hour"]
                    }
                },
                "required": ["location"]
            }
        }
    ]
)
response_message = response["choices"][0]["message"]
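
Both hyperparameters can be passed through the same client call; for instance, to sample more conservatively (the values here are illustrative):

response = openai.ChatCompletion.create(
    model="jeffrey-fong/invoker-13b",
    messages=messages,
    temperature=0.2,  # lower temperature for more deterministic outputs
    top_p=0.9,
)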

The model can choose to call a function. If it does, the response message will contain a function_call object with the function name and a stringified JSON object of arguments generated by the model (note: the model may generate invalid JSON or hallucinate parameters). To let the model summarize the result of the function call, parse the arguments string into JSON in your code, call your function with the provided arguments if they exist, then append the function response as a new message and perform another inference.

Using the above example again,

if response_message.get("function_call"):
    # Map function names the model may call to local implementations
    available_functions = {"get_time": get_time}
    function_name = response_message["function_call"]["name"]
    function_to_call = available_functions[function_name]
    # The arguments arrive as a JSON string generated by the model
    function_args = json.loads(response_message["function_call"]["arguments"])
    function_response = function_to_call(
        location=function_args.get("location"),
        format=function_args.get("format"),
    )
    # Append the assistant's function call and the function's result,
    # then ask the model to summarize the result for the user
    messages.append(response_message)
    messages.append(
        {
            "role": "function",
            "name": function_name,
            "content": function_response,
        }
    )
    second_response = openai.ChatCompletion.create(
        model="jeffrey-fong/invoker-13b",
        messages=messages,
    )
    print(second_response["choices"][0]["message"])
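
The snippet above assumes a local get_time implementation matching the function schema, which returns a string (function message content must be a string). A minimal, hypothetical sketch; a real implementation would map arbitrary locations to time zones:

from datetime import datetime
from zoneinfo import ZoneInfo

def get_time(location: str, format: str = "24-hour") -> str:
    # Hypothetical helper; hard-codes Singapore's time zone for this example
    now = datetime.now(ZoneInfo("Asia/Singapore"))
    fmt = "%I:%M %p" if format == "12-hour" else "%H:%M"
    return f"The current time in {location} is {now.strftime(fmt)}"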

Refer to the client code here for a more detailed example.

Using the model directly

Please refer to the model card on HuggingFace for how to use the model directly, including the prompt format, etc.
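
For reference, a minimal sketch of loading the model with HuggingFace Transformers. The prompt string below is a placeholder; substitute the prompt format from the model card:

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("jeffrey-fong/invoker-13b")
model = AutoModelForCausalLM.from_pretrained(
    "jeffrey-fong/invoker-13b", device_map="auto"  # device_map requires accelerate
)

# Placeholder prompt; follow the model card's prompt format for real use
inputs = tokenizer("...", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.5)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))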

Model Download

Model             Link              Version
Invoker-13B       Huggingface Repo  v1.0
Invoker-13B-GPTQ  Huggingface Repo  v1.0
Invoker-7B        Coming Soon       v1.0
Invoker-34B       Coming Soon       v1.0

Training

Training was performed using QLoRA, which significantly reduces the computational resources required to train the models. Similar to FastChat, we only consider the gradients for the assistant responses when computing the loss for backpropagation and ignore all other outputs and responses.
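
As a sketch of this masking idea (a hypothetical helper, not the project's released training code): every non-assistant token gets the label -100, which PyTorch's cross-entropy loss ignores.

import torch

IGNORE_INDEX = -100  # ignored by torch.nn.CrossEntropyLoss

def mask_non_assistant(input_ids: torch.Tensor, assistant_spans: list) -> torch.Tensor:
    # assistant_spans: (start, end) token ranges covering assistant responses;
    # computing them depends on the prompt template, so it is assumed given here
    labels = torch.full_like(input_ids, IGNORE_INDEX)
    for start, end in assistant_spans:
        labels[start:end] = input_ids[start:end]
    return labels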

We accelerated training with DeepSpeed ZeRO Stage 2 for fast data parallelism. QLoRA is currently not compatible with DeepSpeed ZeRO Stage 3, which shards the model across multiple GPUs.

Training code will be released in the future.

Training hyperparameters

Hyperparameter    Value
Total batch size  192
Epochs            2
Learning rate     2e-05
LoRA r            64
LoRA alpha        16
LoRA dropout      0.05
Weight decay      0.0
Warmup ratio      0.03
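
The LoRA settings above map directly onto a PEFT configuration; a minimal sketch (target modules are omitted, as they are not specified here):

from peft import LoraConfig

lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)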

Training Data

We use a variety of sources when building our training dataset. All the datasets are carefully chosen to improve both the conversational and function-calling capabilities of the model.

  • ToolBench (0830 updated) ToolBench is an open-source, large-scale, high-quality instruction-tuning SFT dataset created to facilitate training LLMs with general tool-use capability. It consists of multi-turn conversations in which the assistant, presented with several candidate functions, calls one or more of them before returning its response to the user. We rigorously cleaned the data (see the filtering sketch after this list):

    1. Removed all datapoints that do not end with the assistant returning a summarized response
    2. Cleaned datapoints containing unnecessary repeated calls to the same function
    3. Changed all function names and descriptions to not include the domain name, so the functions feel more generic
  • ShareGPT-34K ShareGPT-34K is a filtered dataset containing high-quality multi-turn conversations between a user and an assistant. Some of the assistant responses are generated by OpenAI's GPT-3.5-Turbo, while others are from GPT-4.

  • OASST1 OASST1 is a human-generated and human-annotated assistant-style conversation corpus. We kept only the English conversations.
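
As an illustration of the first ToolBench cleaning step, a hypothetical filter (assuming conversations are lists of OpenAI-style message dicts, and toolbench_data is the raw dataset):

def ends_with_summary(conversation: list) -> bool:
    # Keep only datapoints whose final turn is a plain assistant reply,
    # i.e. a summarized response rather than another function call
    last = conversation[-1]
    return last["role"] == "assistant" and "function_call" not in last

cleaned = [conv for conv in toolbench_data if ends_with_summary(conv)]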

All the datasets used are under the Apache-2.0 license; therefore, our training dataset is released under the same license.

To-Dos

  • Quantize 13B model
  • Work on GPTQ-based servers (ExLlama and/or ExLlamaV2)
  • Work on validating function names, descriptions, etc., just like OpenAI's function calling
  • Convert Invoker to other formats like:
    • GGUF
    • AWQ
  • Train 7B Llama-2 model and 34B CodeLlama model
  • Investigate ways to evaluate function calling

Citation

If this work is helpful, please cite it as:

@Misc{invoker-function-calling,
  title = {Invoker: The one who calls upon functions - Function-Calling Language Model},
  author = {jeffrey-fong},
  howpublished = {\url{https://github.com/jeffrey-fong/Invoker}},
  year = {2023}
}
