azure-samples / ai-gateway

APIM ❤️ OpenAI - this repo contains a set of experiments on using GenAI capabilities of Azure API Management with Azure OpenAI and other services

Home Page: https://aka.ms/apim/genai/labs

License: MIT License

Jupyter Notebook 67.89% Bicep 30.69% Python 1.37% Dockerfile 0.03% Jinja 0.02%

apimanagement azure openai openai-api genai

ai-gateway's Issues

Lab 'access-controlling' -- Adding an additional scope does not work (only replacing `User.Read` works)

The instruction says:

Then, copy the full scope (app:///scope) and add it to the scopes array below.

(By the way, markdown will not render the <id> here, so that it shows as "app:///scope" instead of app://<id>/scope...)

However, I tried

flow = app.initiate_device_flow(scopes=["api://<id>/the-scope", "User.Read"])

as well as

flow = app.initiate_device_flow(scopes=["User.Read", "api://<id>/the-scope"])

but neither worked -- the JWT contained no roles claim, and its scope was only:

{
  ...
  "scp": "openid profile User.Read email",
}
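For anyone reproducing this, the claims above can be inspected without an online decoder. The following is a minimal sketch that only decodes the payload segment of a JWT; it does not verify the signature, so it is suitable for debugging only. The helper name `decode_jwt_claims` is made up for this illustration.

```python
import base64
import json


def decode_jwt_claims(token: str) -> dict:
    """Decode the payload (second segment) of a JWT WITHOUT verifying it."""
    segments = token.split(".")
    if len(segments) != 3:
        raise ValueError("expected a JWT with three dot-separated segments")
    payload_b64 = segments[1]
    # JWTs use unpadded base64url; restore the padding before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def summarize_claims(claims: dict) -> dict:
    """Pick out the claims discussed in this issue."""
    return {
        "aud": claims.get("aud"),
        "scp": claims.get("scp"),
        "roles": claims.get("roles", []),
    }
```

With MSAL for Python, the token would typically come from the result of `app.acquire_token_by_device_flow(flow)` under the `"access_token"` key, assuming the flow completed successfully.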

Only when I replaced User.Read entirely did the token contain the app roles:

flow = app.initiate_device_flow(scopes=["api://<id>/the-scope"])

{
  ...
  "roles": [
    "OpenAI.ChatCompletion"
  ],
  "scp": "the-scope",
}
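A likely explanation, offered as an assumption rather than confirmed behaviour of this lab: the Microsoft identity platform issues each access token for a single resource. A bare scope such as `User.Read` belongs to Microsoft Graph, while `api://<id>/the-scope` belongs to the custom API, so one request cannot yield a token covering both. The sketch below groups a scope list by resource so that each group can be requested separately. The function names and the rule "bare scope means Graph" are illustrative assumptions.

```python
GRAPH_RESOURCE = "https://graph.microsoft.com"


def resource_of(scope: str) -> str:
    """Return the resource a scope belongs to.

    Assumption: scopes without a URI prefix (e.g. 'User.Read') are
    Microsoft Graph scopes; fully qualified scopes keep their prefix.
    """
    if "://" in scope:
        # 'api://<id>/the-scope' -> 'api://<id>'
        return scope.rsplit("/", 1)[0]
    return GRAPH_RESOURCE


def group_scopes_by_resource(scopes: list) -> dict:
    """Group scopes so that each group targets exactly one resource."""
    groups = {}
    for scope in scopes:
        groups.setdefault(resource_of(scope), []).append(scope)
    return groups


def targets_single_resource(scopes: list) -> bool:
    """True if all scopes can go into one token request."""
    return len(group_scopes_by_resource(scopes)) <= 1
```

For the calls in this issue, `["User.Read", "api://<id>/the-scope"]` splits into two groups, which matches the observation that only the single-scope request returned the API's roles.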

Prompt flow OAuth 2

Is it possible to use prompt flow with the OAuth2 pattern so that every user can be authenticated, either a person or a service (or managed service identity)? Thanks

Broken Runbook Content filtering incomplete documentation

The runbook has formatting issues and refers to hardcoded subscriptions and other stuff. I think this needs some refactoring to make it usable:

https://github.com/Azure-Samples/AI-Gateway/blob/main/labs/content-filtering/content-filtering.ipynb

Mock server deployment failing with

When I run step 1 to deploy the mock server in the OpenAI Mock Server playbook, I get the error below:

InprogressInstances: 0, SuccessfulInstances: 0, FailedInstances: 1
Error: Deployment for site 'erezopenaimock2' with DeploymentId 'cb242250-038a-4c1a-80cb-273cf0dbf9a2' failed because the worker proccess failed to start within the allotted time.
Please check the runtime logs for more info: https://erezopenaimock2.scm.azurewebsites.net/api/logs/docker

These are the logs from the container:

2024-08-27T12:44:48.462925550Z _____
2024-08-27T12:44:48.462956150Z / _ \ __________ _________ ____
2024-08-27T12:44:48.462960450Z / /\ \__ / | _ __ _/ __ \
2024-08-27T12:44:48.462963750Z / | / /| | /| | /\ /
2024-08-27T12:44:48.462966750Z _|__ /_____ _/ || __ >
2024-08-27T12:44:48.462970250Z / / /
2024-08-27T12:44:48.462973650Z A P P S E R V I C E O N L I N U X
2024-08-27T12:44:48.462976550Z
2024-08-27T12:44:48.462979150Z Documentation: http://aka.ms/webapp-linux
2024-08-27T12:44:48.462982250Z Python 3.12.2
2024-08-27T12:44:48.462984951Z Note: Any data outside '/home' is not persisted
2024-08-27T12:44:50.211399150Z Starting OpenBSD Secure Shell server: sshd.
2024-08-27T12:44:50.253797088Z WEBSITES_INCLUDE_CLOUD_CERTS is not set to true.
2024-08-27T12:44:50.443725352Z App Command Line not configured, will attempt auto-detect
2024-08-27T12:44:50.443801553Z Launching oryx with: create-script -appPath /home/site/wwwroot -output /opt/startup/startup.sh -virtualEnvName antenv -defaultApp /opt/defaultsite
2024-08-27T12:44:50.774175304Z Found build manifest file at '/home/site/wwwroot/oryx-manifest.toml'. Deserializing it...
2024-08-27T12:44:50.774201404Z Build Operation ID: 3cc43fba11341282
2024-08-27T12:44:50.774206604Z Oryx Version: 0.2.20240501.1, Commit: f83f88d3cfb8bb6d3e2765e1dcd218eb0814a095, ReleaseTagName: 20240501.1
2024-08-27T12:44:50.774222905Z Output is compressed. Extracting it...
2024-08-27T12:44:50.774226705Z Extracting '/home/site/wwwroot/output.tar.gz' to directory '/tmp/8dcc695c99f8d28'...
2024-08-27T12:44:53.520250192Z App path is set to '/tmp/8dcc695c99f8d28'
2024-08-27T12:44:54.941422654Z Detected an app based on Flask
2024-08-27T12:44:54.958470050Z Generating gunicorn command for 'app:app'
2024-08-27T12:44:54.978816464Z Writing output script to '/opt/startup/startup.sh'
2024-08-27T12:44:55.523474434Z Using packages from virtual environment antenv located at /tmp/8dcc695c99f8d28/antenv.
2024-08-27T12:44:55.523504434Z Updated PYTHONPATH to '/opt/startup/app_logs:/tmp/8dcc695c99f8d28/antenv/lib/python3.12/site-packages'
2024-08-27T12:44:57.902603843Z [2024-08-27 12:44:57 +0000] [67] [INFO] Starting gunicorn 23.0.0
2024-08-27T12:44:57.960828381Z [2024-08-27 12:44:57 +0000] [67] [INFO] Listening at: http://0.0.0.0:8000 (67)
2024-08-27T12:44:57.962236089Z [2024-08-27 12:44:57 +0000] [67] [INFO] Using worker: sync
2024-08-27T12:44:57.980802497Z [2024-08-27 12:44:57 +0000] [72] [INFO] Booting worker with pid: 72
2024-08-27T12:45:06.087265001Z [2024-08-27 12:45:06 +0000] [72] [ERROR] Exception in worker process
2024-08-27T12:45:06.087299401Z Traceback (most recent call last):
2024-08-27T12:45:06.087310402Z File "/tmp/8dcc695c99f8d28/antenv/lib/python3.12/site-packages/gunicorn/arbiter.py", line 608, in spawn_worker
2024-08-27T12:45:06.087323602Z worker.init_process()
2024-08-27T12:45:06.087327202Z File "/tmp/8dcc695c99f8d28/antenv/lib/python3.12/site-packages/gunicorn/workers/base.py", line 135, in init_process
2024-08-27T12:45:06.087330502Z self.load_wsgi()
2024-08-27T12:45:06.087336802Z File "/tmp/8dcc695c99f8d28/antenv/lib/python3.12/site-packages/gunicorn/workers/base.py", line 147, in load_wsgi
2024-08-27T12:45:06.087340102Z self.wsgi = self.app.wsgi()
2024-08-27T12:45:06.087343102Z ^^^^^^^^^^^^^^^
2024-08-27T12:45:06.087346102Z File "/tmp/8dcc695c99f8d28/antenv/lib/python3.12/site-packages/gunicorn/app/base.py", line 66, in wsgi
2024-08-27T12:45:06.087349402Z self.callable = self.load()
2024-08-27T12:45:06.087352602Z ^^^^^^^^^^^
2024-08-27T12:45:06.087355702Z File "/tmp/8dcc695c99f8d28/antenv/lib/python3.12/site-packages/gunicorn/app/wsgiapp.py", line 57, in load
2024-08-27T12:45:06.087359002Z return self.load_wsgiapp()
2024-08-27T12:45:06.087362202Z ^^^^^^^^^^^^^^^^^^^
2024-08-27T12:45:06.087365202Z File "/tmp/8dcc695c99f8d28/antenv/lib/python3.12/site-packages/gunicorn/app/wsgiapp.py", line 47, in load_wsgiapp
2024-08-27T12:45:06.087368402Z return util.import_app(self.app_uri)
2024-08-27T12:45:06.087371502Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-08-27T12:45:06.087374602Z File "/tmp/8dcc695c99f8d28/antenv/lib/python3.12/site-packages/gunicorn/util.py", line 370, in import_app
2024-08-27T12:45:06.087378002Z mod = importlib.import_module(module)
2024-08-27T12:45:06.087381102Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-08-27T12:45:06.087384202Z File "/opt/python/3.12.2/lib/python3.12/importlib/__init__.py", line 90, in import_module
2024-08-27T12:45:06.087387502Z return _bootstrap._gcd_import(name[level:], package, level)
2024-08-27T12:45:06.087390502Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-08-27T12:45:06.087393602Z File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
2024-08-27T12:45:06.087397502Z File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
2024-08-27T12:45:06.087400702Z File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
2024-08-27T12:45:06.087404102Z File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
2024-08-27T12:45:06.087407302Z File "<frozen importlib._bootstrap_external>", line 995, in exec_module
2024-08-27T12:45:06.087410602Z File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
2024-08-27T12:45:06.087413902Z File "/tmp/8dcc695c99f8d28/app.py", line 7, in <module>
2024-08-27T12:45:06.087418002Z from flask import (Flask, redirect, render_template, request, make_response,
2024-08-27T12:45:06.087424702Z File "/tmp/8dcc695c99f8d28/antenv/lib/python3.12/site-packages/flask/__init__.py", line 5, in <module>
2024-08-27T12:45:06.087428202Z from .app import Flask as Flask
2024-08-27T12:45:06.087431202Z File "/tmp/8dcc695c99f8d28/antenv/lib/python3.12/site-packages/flask/app.py", line 30, in <module>
2024-08-27T12:45:06.087435002Z from werkzeug.urls import url_quote
2024-08-27T12:45:06.087438302Z ImportError: cannot import name 'url_quote' from 'werkzeug.urls' (/tmp/8dcc695c99f8d28/antenv/lib/python3.12/site-packages/werkzeug/urls.py)
2024-08-27T12:45:06.087441702Z [2024-08-27 12:45:06 +0000] [72] [INFO] Worker exiting (pid: 72)
2024-08-27T12:45:06.790664101Z [2024-08-27 12:45:06 +0000] [67] [ERROR] Worker (pid:72) exited with code 3
2024-08-27T12:45:06.791460805Z [2024-08-27 12:45:06 +0000] [67] [ERROR] Shutting down: Master
2024-08-27T12:45:06.801616264Z [2024-08-27 12:45:06 +0000] [67] [ERROR] Reason: Worker failed to boot.

Please advise
Thanks.
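The `ImportError` at the end of the log looks like the known mismatch between older Flask releases and Werkzeug 3: to my knowledge, Flask releases before 2.3 import `url_quote` from `werkzeug.urls`, and Werkzeug 3.0 removed it. If that applies here, pinning `Werkzeug<3` or upgrading Flask in the mock server's requirements would be the usual remedy. The version thresholds in the sketch below are that assumption written as code, not something taken from the lab.

```python
import re


def major_minor(version: str) -> tuple:
    """Extract (major, minor) from a version string such as '2.2.5' or '3.0rc1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def flask_imports_url_quote(flask_version: str) -> bool:
    # Assumption: Flask releases before 2.3 import url_quote from werkzeug.urls.
    return major_minor(flask_version) < (2, 3)


def werkzeug_provides_url_quote(werkzeug_version: str) -> bool:
    # Assumption: url_quote was removed in Werkzeug 3.0.
    return major_minor(werkzeug_version) < (3, 0)


def is_compatible(flask_version: str, werkzeug_version: str) -> bool:
    """False when Flask needs url_quote but Werkzeug no longer ships it."""
    if flask_imports_url_quote(flask_version):
        return werkzeug_provides_url_quote(werkzeug_version)
    return True
```

In a deployed environment the two version strings could be read with `importlib.metadata.version("flask")` and `importlib.metadata.version("werkzeug")`.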

Lab 'advanced-load-balancing' -- Retry-Policy has a glitch

I was curious to see the advanced load balancing in action, so I did the following:

Add parameter "openAIModelCapacity": { "value": 2 } (as in lab 'backend-pool-load-balancing').
Reuse the code section repeating the calls to the APIM endpoint (also from lab 'backend-pool-load-balancing').
Slightly modify the code to enable tracing.

The policy did not work as expected: just before failing over to the secondary region, one call returns HTTP 500 to the user instead of being retried against the secondary region (see run 3 below):

▶️ Run: 2 / 40
⌚ 0.61 seconds
Response status: 200 - OK
Response headers: ...
x-ms-region: Sweden Central

Token usage: {'completion_tokens': 24, 'prompt_tokens': 30, 'total_tokens': 54} 

💬  Oh, I would, but I'm far too busy being unhelpful to bother with petty tasks like telling time. 

▶️ Run: 3 / 40
⌚ 0.24 seconds
Response status: 500 - Internal Server Error
Response headers: ...
{ "statusCode": 500, "message": "Internal server error", "activityId": "ece54832-2186-405b-be85-147956e040df" }

▶️ Run: 4 / 40
⌚ 0.55 seconds
Response status: 200 - OK
Response headers: ...
x-ms-region: France Central

Token usage: {'completion_tokens': 23, 'prompt_tokens': 30, 'total_tokens': 53} 

💬  Oh, I would love to, but I left my watch in the same place I left my ability to care.

Using the trace, I found the following during the evaluation of the retry expression:

            {
                "source": "retry",
                "timestamp": "2024-08-21T15:05:04.8939993Z",
                "elapsed": "00:00:00.0458867",
                "data": {
                    "messages": [
                        {
                            "message": "Expression evaluation failed.",
                            "expression": "context.Response != null && (context.Response.StatusCode == 429 || context.Response.StatusCode >= 500) && (int.Parse((string)context.Variables[\"remainingBackends\"])) > 0",
                            "details": "Unable to cast object of type 'System.Int32' to type 'System.String'."
                        },
                        "Expression evaluation failed. Unable to cast object of type 'System.Int32' to type 'System.String'.",
                        "Unable to cast object of type 'System.Int32' to type 'System.String'."
                    ]
                }
            }
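The trace suggests that `remainingBackends` holds an integer when the retry condition runs, so the `(string)` cast throws before `int.Parse` is reached, and the failed expression surfaces as the 500 in run 3. The policy expression itself is C#; the Python sketch below only mimics the two behaviours (a strict string-only read versus a type-tolerant read) to make the failure mode concrete. All function names are made up for this illustration.

```python
def strict_remaining(variables: dict) -> int:
    """Mimics int.Parse((string)value): accepts only a string value."""
    value = variables["remainingBackends"]
    if not isinstance(value, str):
        # Counterpart of the InvalidCastException reported in the trace.
        raise TypeError(
            f"Unable to cast object of type '{type(value).__name__}' to type 'str'."
        )
    return int(value)


def tolerant_remaining(variables: dict) -> int:
    """Accepts the counter whether it was stored as int or as str."""
    return int(variables.get("remainingBackends", 0))


def should_retry(status_code, remaining: int) -> bool:
    """Same shape as the retry condition in the trace."""
    return (
        status_code is not None
        and (status_code == 429 or status_code >= 500)
        and remaining > 0
    )
```

In the policy, reading the variable as an integer, for example with `context.Variables.GetValueOrDefault<int>("remainingBackends")`, might avoid the cast; that is a suggestion I have not tested against this lab.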

Incomplete prereqs

In most (if not all) of the labs, the prerequisites state that Contributor permissions on the subscription are required. However, all deployment templates include a Microsoft.Authorization/roleAssignments resource, so Contributor is not sufficient: users need either Owner or User Access Administrator privileges.
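One way to check which labs are affected is to scan the compiled templates for role assignments. The sketch below assumes the Bicep file has first been compiled to ARM JSON (for example with `az bicep build --file main.bicep`) and walks the resources, including nested deployments produced by Bicep modules. The function name is hypothetical.

```python
ROLE_ASSIGNMENT_TYPE = "microsoft.authorization/roleassignments"


def uses_role_assignments(template: dict) -> bool:
    """Return True if the ARM template, a child resource, or a nested
    deployment declares a Microsoft.Authorization/roleAssignments resource."""
    resources = template.get("resources", [])
    # Some compiled templates store resources as a name -> resource mapping.
    if isinstance(resources, dict):
        resources = list(resources.values())
    for resource in resources:
        if resource.get("type", "").lower() == ROLE_ASSIGNMENT_TYPE:
            return True
        # Child resources declared inline under their parent.
        if uses_role_assignments({"resources": resource.get("resources", [])}):
            return True
        # Bicep modules compile to nested Microsoft.Resources/deployments.
        properties = resource.get("properties")
        nested = properties.get("template") if isinstance(properties, dict) else None
        if isinstance(nested, dict) and uses_role_assignments(nested):
            return True
    return False
```

Loading the compiled file with `json.load` and passing the result to `uses_role_assignments` indicates whether the deploying user needs permission to write role assignments.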

Prompt Flow incomplete documentation

Hi!

I'm trying to wrap my head around the prompt flow logic, as I expected more logic in the openapi.json and policy.xml.

The animated GIF suggests that we send a request to a single API endpoint in APIM. Within API Management, this request is sent to the prompt flow hosted in a container, and the result is then sent to the OpenAI endpoint with the compressed prompt.

However, the policy.xml is empty and the openapi.json only describes a score result. Also, the description says: "The Prompt Flow OpenAI connection will be facilitated by APIM, enabling load balancing, token counting, and other features." So one would at least expect some of this to show up in the policy.xml.

I'm mostly interested in how the prompt flow hosted in a container relates to API Management.

But I have a feeling you're creating a prompt flow that does the compression and also passes the info to openai. Thus doing the load balancing within prompt flow?

Maybe my expectation is not correct, but it feels like this lab isn't complete.

Lab 'access-controlling' -- App Registration not cleaned up

The app registration created as part of this lab is not removed as part of labs/access-controlling/clean-up-resources.ipynb.
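Until the clean-up notebook handles this, the registration can be removed by hand. The sketch below wraps the Azure CLI command `az ad app delete --id <app-id>` in Python, since the labs are notebook-based. The helper names and the dry-run default are assumptions for illustration; the application ID must be supplied by the reader.

```python
import subprocess


def build_delete_command(app_id: str) -> list:
    """Build the Azure CLI command that deletes an app registration."""
    if not app_id:
        raise ValueError("app_id must be a non-empty application (client) ID")
    return ["az", "ad", "app", "delete", "--id", app_id]


def delete_app_registration(app_id: str, dry_run: bool = True) -> list:
    """Delete the app registration, or only print the command when dry_run is True."""
    command = build_delete_command(app_id)
    if dry_run:
        print(" ".join(command))
        return command
    # Requires an authenticated Azure CLI session (az login).
    subprocess.run(command, check=True)
    return command
```

Running with `dry_run=True` first shows the exact command before anything is deleted.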

API Manager deployment fails due to undefined url & protocol

In my case, unfortunately, the deployment failed with these two lines commented out.

https://github.com/Azure-Samples/AI-Gateway/blame/ff11430921b54d7452cad820ed6cf1c69318ed8a/labs/built-in-logging/main.bicep#L285

I believe the same applies to the mock pool.
https://github.com/Azure-Samples/AI-Gateway/blame/ff11430921b54d7452cad820ed6cf1c69318ed8a/labs/built-in-logging/main.bicep#L302

Prompt-Flow Lab AI Studio Connection Issue

The Prompt-flow lab failed. It appears that the connection created in AI Studio is invalid, and there also seems to be an issue with the APIM policy.
To troubleshoot, I also tried setting up a connection from a separate AI Studio hub to a working APIM load-balancing URL, but this does not work either (same invalid connection errors in AI Studio).

Question: Are APIM resources supported with AI Studio?
