
Comments (12)

letmefocus avatar letmefocus commented on July 24, 2024

By the way, function calling supports both prompt-turn and chat-turn models.

from m.i.l.e.s.


letmefocus avatar letmefocus commented on July 24, 2024

If it's not possible, I'd be willing to open a different fork and help out with the code for it.
I'll try to provide a quick sample of the function definitions.


letmefocus avatar letmefocus commented on July 24, 2024

From what the Curl and Python implementations show, you can pass the functions as an array. If that's not possible with the existing Python package, you can always write a separate package for Gemini based on how the Curl implementation works.

https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/function-calling#curl_1
https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/function-calling#python_1

Going by the Curl example, the payload is similar between the Python tools format for the OpenAI API and the request body for the Gemini API:

OpenAI:

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    messages = [{"role": "user", "content": "What is the weather in Boston?"}]

    tools = [
        {
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Get the current weather in a given location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g. San Francisco, CA",
                        },
                        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                    },
                    "required": ["location"],
                },
            },
        }
    ]
    response = client.chat.completions.create(
        model="gpt-3.5-turbo-0125",
        messages=messages,
        tools=tools,
        tool_choice="auto",  # auto is default, but we'll be explicit
    )
Gemini:

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${REGION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${REGION}/publishers/google/models/${MODEL_ID}:generateContent \
  -d '{
    "contents": [{
      "role": "user",
      "parts": [{
        "text": "What is the weather in Boston?"
      }]
    }],
    "tools": [{
      "function_declarations": [
        {
          "name": "get_current_weather",
          "description": "Get the current weather in a given location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, CA or a zip code e.g. 95616"
              }
            },
            "required": [
              "location"
            ]
          }
        }
      ]
    }]
  }'

Reference for the OpenAI Function Calling example:
https://platform.openai.com/docs/guides/function-calling
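
The similarity is easy to check programmatically. A minimal sketch (standard library only; the schema is copied from the examples above) that builds both payload shapes around the same function schema:

```python
import json

# The inner function schema is identical between the two APIs; only the
# wrapping differs: OpenAI nests it under {"type": "function", "function": ...},
# while the Gemini/Vertex endpoint expects a "function_declarations" array.
function_schema = {
    "name": "get_current_weather",
    "description": "Get the current weather in a given location",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, CA",
            }
        },
        "required": ["location"],
    },
}

openai_tools = [{"type": "function", "function": function_schema}]
gemini_tools = [{"function_declarations": [function_schema]}]

# The shared schema is byte-for-byte the same in both payloads.
assert openai_tools[0]["function"] == gemini_tools[0]["function_declarations"][0]
print(json.dumps(gemini_tools, indent=2))
```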


letmefocus avatar letmefocus commented on July 24, 2024

You could also very possibly use Litellm: https://docs.litellm.ai/docs/providers/gemini


small-cactus avatar small-cactus commented on July 24, 2024

Adding that support would be out of the scope of my skills. I've played around with Gemini as a voice assistant and it's not viable, so I don't see value in adding it. There are many reasons; here are some I can name off the top of my head:

All Gemini models tend to over-explain and don't know when to "shut up": they add extra sentences when they aren't necessary. They'll give you the answer in the first sentence and then keep talking for four more paragraphs.

Gemini models don't follow instructions very well; they behave non-deterministically.

Gemini models are trained on vastly different data sets than ChatGPT models, so my entire system prompt would be useless for ideal performance.

Response times are very slow. They can sometimes be fast, but again, the models over-explain and run the time up; a one-second delay is the difference between a slow reply and thinking it stopped working.

Believe it or not, OpenAI models are trained to be spoken to; the response format fits perfectly with spoken language. Gemini models are fine in this regard, but they're really not the best. They also don't pay enough attention to the system instructions to register that I put "You are talking to the user in a voice conversation" into the prompt.

But you are extremely welcome to fork and try to add Gemini support; I can help with and provide any details you need. Here's some stuff to get you started: all tools are defined in tools.json, and the actual functions corresponding to those entries are scattered through main.py (bad, I know). The only part you should have to modify is the change-AI-model function, and maybe the change-personality function, because it changes the system prompt and I don't know if the format is the same.

Other than that, all OpenAI code (besides webcam recognition) is within the ask function. Let me know if you need anything else!
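
If tools.json follows OpenAI's tool schema (an assumption based on the OpenAI examples in this thread; the set_volume tool below is purely hypothetical, not taken from the repo), an entry might look like:

```json
{
  "type": "function",
  "function": {
    "name": "set_volume",
    "description": "Set the assistant's output volume",
    "parameters": {
      "type": "object",
      "properties": {
        "level": {
          "type": "integer",
          "description": "Volume level from 0 to 100"
        }
      },
      "required": ["level"]
    }
  }
}
```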


letmefocus avatar letmefocus commented on July 24, 2024

From what I've seen, LiteLLM supports using the Gemini model through an OpenAI API structure. It also supports the OpenAI Python package (by changing the model and base URL) as well as function calling. I haven't tested function calling through LiteLLM yet, but it looks promising.
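
A sketch of what that untested function-calling path might look like through the proxy. The model name and proxy URL are assumptions, and only the request body is built here (standard library only); the commented-out lines show where the OpenAI client call would go:

```python
import json

# Hypothetical LiteLLM proxy settings (assumptions, not tested):
BASE_URL = "http://0.0.0.0:4000"  # local litellm proxy
MODEL = "gemini/gemini-pro"       # litellm's provider-prefixed model name

# The same OpenAI-style tools array from earlier in the thread should pass
# through unchanged; LiteLLM translates it for the Gemini backend.
request_body = {
    "model": MODEL,
    "messages": [{"role": "user", "content": "What is the weather in Boston?"}],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Get the current weather in a given location",
                "parameters": {
                    "type": "object",
                    "properties": {"location": {"type": "string"}},
                    "required": ["location"],
                },
            },
        }
    ],
    "tool_choice": "auto",
}

# With the openai package this would be sent as:
#   client = openai.OpenAI(api_key="sk-1234", base_url=BASE_URL)
#   response = client.chat.completions.create(**request_body)
print(json.dumps(request_body, indent=2))
```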


letmefocus avatar letmefocus commented on July 24, 2024

https://github.com/BerriAI/litellm/blob/7ffd3d40fa0338f2cb1e7bae9e5b608dde7862ee/model_prices_and_context_window.json#L979

https://docs.litellm.ai/docs/providers/vertex

References are above.

Code sample:

import openai
client = openai.OpenAI(
    api_key="sk-1234",             # pass litellm proxy key, if you're using virtual keys
    base_url="http://0.0.0.0:4000" # litellm-proxy-base url
)

response = client.chat.completions.create(
    model="team1-gemini-pro",
    messages = [
        {
            "role": "user",
            "content": "what llm are you"
        }
    ],
)

print(response)


letmefocus avatar letmefocus commented on July 24, 2024

[sample image]


letmefocus avatar letmefocus commented on July 24, 2024

Also, if we do go through with the plan of integrating Gemini into a "standalone app" or "bundle", I'd be willing to lend out API keys for Gemini behind the LiteLLM proxy, since it supports tracking how many credits a user can use.


letmefocus avatar letmefocus commented on July 24, 2024

[image: second example of Gemini working on an OpenAI API proxy using LiteLLM]


small-cactus avatar small-cactus commented on July 24, 2024

I think I'm gonna use Groq's API, as they have native function-calling support using OpenAI's schema for other models.
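
For reference, a sketch of what that switch might look like. Groq's OpenAI-compatible base URL and the model id below are assumptions (not taken from this thread; check Groq's docs before relying on them). Only a config dict is built here; the commented-out lines show the unchanged OpenAI-style call:

```python
import json

# Assumed Groq settings -- Groq exposes an OpenAI-compatible endpoint, so the
# only change from the existing OpenAI code should be the base URL, key, and
# model name.
GROQ_BASE_URL = "https://api.groq.com/openai/v1"
GROQ_MODEL = "llama3-70b-8192"  # example model id; may differ

client_config = {
    "api_key": "GROQ_API_KEY",  # placeholder, not a real key
    "base_url": GROQ_BASE_URL,
}

# The tools array and tool_choice="auto" from the OpenAI example earlier in
# the thread would pass through unchanged:
#   client = openai.OpenAI(**client_config)
#   response = client.chat.completions.create(
#       model=GROQ_MODEL, messages=messages, tools=tools, tool_choice="auto")
print(json.dumps(client_config, indent=2))
```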

