Code Monkey home page Code Monkey logo

ai-gateway's Introduction

APIM ❤️ OpenAI

Open Source Love

Contents

  1. 🧠 AI Gateway
  2. 🧪 Labs
  3. 🚀 Getting started
  4. 🏛️ Well Architected Framework
  5. 🪞 Mock Server
  6. 🎒 Show and tell
  7. 🥇 Other Resources

The rapid pace of AI advances demands experimentation-driven approaches for organizations to remain at the forefront of the industry. With AI steadily becoming a game-changer for an array of sectors, maintaining a fast-paced innovation trajectory is crucial for businesses aiming to leverage its full potential.

AI services are predominantly accessed via APIs, underscoring the essential need for a robust and efficient API management strategy. This strategy is instrumental for maintaining control and governance over the consumption of AI services.

With the expanding horizons of AI services and their seamless integration with APIs, there is a considerable demand for a comprehensive AI Gateway pattern, which broadens the core principles of API management. Aiming to accelerate the experimentation of advanced use cases and pave the road for further innovation in this rapidly evolving field. The well-architected principles of the AI Gateway provides a framework for the confident deployment of Intelligent Apps into production..

🧠 AI Gateway

AI-Gateway flow

This repo explores the AI Gateway pattern through a series of experimental labs. Azure API Management plays a crucial role within these labs, handling AI services APIs, with security, reliability, performance, overall operational efficiency and cost controls. The primary focus is on Azure OpenAI, which sets the standard reference for Large Language Models (LLM). However, the same principles and design patterns could potentially be applied to any LLM.

🧪 Labs

Acknowledging the rising dominance of Python, particularly in the realm of AI, along with the powerful experimental capabilities of Jupyter notebooks, the following labs are structured around Jupyter notebooks, with step-by-step instructions with Python scripts, Bicep files and APIM policies:

Request forwarding flow Playground to try forwarding requests to either an Azure OpenAI endpoint or a mock server. APIM uses the system managed identity to authenticate into the Azure OpenAI service. 💬
Backend circuit breaking flow Playground to try the built-in backend circuit breaker functionality of APIM to either an Azure OpenAI endpoints or a mock server. 💬
Backend pool load balancing flow Playground to try the built-in load balancing backend pool functionality of APIM to either a list of Azure OpenAI endpoints or mock servers. 💬
Advanced load balancing flow Playground to try the advanced load balancing (based on a custom APIM policy) to either a list of Azure OpenAI endpoints or mock servers. 💬
Response streaming flow Playground to try response streaming with APIM and Azure OpenAI endpoints to explore the advantages and shortcomings associated with streaming. 💬
Vector searching flow Playground to try the Retrieval Augmented Generation (RAG) pattern with Azure AI Search, Azure OpenAI embeddings and Azure OpenAI completions. All the endpoints are managed via APIM. 💬
Built-in logging flow Playground to try the buil-in logging capabilities of API Management. The requests are logged into Application Insights and it's easy to track request/response details and token usage with provided notebook. 💬
SLM self-hosting flow Playground to try the self-hosted phy-3 Small Language Model (SLM) trough the APIM self-hosted gateway with OpenAI API compatibility. 💬
Access controlling flow Playground to try the OAuth 2.0 authorization feature using identity provider to enable more fine-grained access to OpenAPI APIs by particular users or client. 💬
Token rate limiting flow Playground to try the token rate limiting policy to either a list of Azure OpenAI endpoints or mock servers. 💬
Semantic caching flow Playground to try the sementic caching policy. 💬
Token metrics emitting flow Playground to try the emit token metric policy. The policy sends metrics to Application Insights about consumption of large language model tokens through Azure OpenAI Service APIs. 💬
GPT-4o inferencing flow Playground to try the new GPT-4o model. GPT-4o ("o" for "omni") is designed to handle a combination of text, audio, and video inputs, and can generate outputs in text, audio, and image formats. 💬

Backlog of experiments

  • Developer tooling
  • App building
  • Token counting
  • Function calling
  • Assistants load balancing
  • Semantic Kernel plugin
  • Cost tracking
  • Content filtering
  • PII handling
  • Prompt storing
  • Prompt guarding
  • Prompt model routing
  • Llama inferencing

Tip

Kindly use the feedback discussion so that we can continuously improve with your experiences, suggestions, ideas or lab requests.

🚀 Getting Started

Prerequisites

Quickstart

  1. Clone this repo and configure your local machine with the prerequisites. Or just create a GitHub Codespace and run it on the browser or in VS Code.
  2. Navigate through the available labs and select one that best suits your needs. For starters we recommend the request forwarding with just the Azure CLI or the backend pool load balancing with Bicep.
  3. Open the notebook and run the provided steps.
  4. Tailor the experiment according to your requirements. If you wish to contribute to our collective work, we would appreciate your submission of a pull request.

Note

🪲 Please feel free to open a new issue if you find something that should be fixed or enhanced.

🏛️ Well-Architected Framework

The Azure Well-Architected Framework is a design framework that can improve the quality of a workload. The following table maps labs with the Well-Architected Framework pillars to set you up for success through architectural experimentation.

Lab Security Reliability Performance Operations Costs
Request forwarding
Backend circuit breaking
Backend pool load balancing
Advanced load balancing
Response streaming
Vector searching
Built-in logging
SLM self-hosting

🪞 Mock Server

The AI-Gateway Mock server is designed to mimic the behavior and responses of the OpenAI API, thereby creating an efficient simulation environment suitable for testing and development purposes on the integration with APIM and other use cases. The app.py can be customized to tailor the Mock server to specific use cases.

🎒 Show and tell

Tip

Install the VS Code Reveal extension, open AI-GATEWAY.md and click on 'slides' at the botton to present the AI Gateway without leaving VS Code. Or just open the AI-GATEWAY.pptx for a plain old PowerPoint experience.

🥇 Other resources

Numerous reference architectures, best practices and starter kits are available on this topic. Please refer to the resources provided if you need comprehensive solutions or a landing zone to initiate your project. We suggest leveraging the AI-Gateway labs to discover additional capabilities that can be integrated into the reference architectures.

We believe that there may be valuable content that we are currently unaware of. We would greatly appreciate any suggestions or recommendations to enhance this list.

🌐 WW GBB initiative

GBB

Disclaimer

Important

This software is provided for demonstration purposes only. It is not intended to be relied upon for any purpose. The creators of this software make no representations or warranties of any kind, express or implied, about the completeness, accuracy, reliability, suitability or availability with respect to the software or the information, products, services, or related graphics contained in the software for any purpose. Any reliance you place on such information is therefore strictly at your own risk.

ai-gateway's People

Contributors

vieiraae avatar microsoftopensource avatar odaibert avatar

Stargazers

Julien S avatar  avatar Maximus avatar Jan Wächter avatar Marten Sjo avatar Hugo Girard avatar Ryland DeGregory avatar Ying Xiang avatar Joshua Thomas avatar h.shinbo avatar Pankaj Agrawal avatar Spiros Konstantopoulos avatar Jason Leong avatar Sayan Ghosh avatar ks6088ts avatar Yasuki Takami avatar  avatar Alex Khaerov avatar Luke Murray avatar Renato Ribeiro avatar  avatar  avatar  avatar  avatar James Croft avatar  avatar Sean Keegan avatar Elena Neroslavskaya avatar David Navalho avatar George Luiz Bittencourt avatar  avatar Johnny Harbieh avatar Sunil Sattiraju avatar Stephane Eyskens avatar Luca Milan avatar Miguel P Z avatar SB avatar Brad Stevens avatar Gavita Regunath avatar ChrisNGP avatar Sébastien JULIEN avatar Richard Chong avatar SHAFFIULLAH  avatar Kim avatar SouthPaw avatar  avatar Menno Laan avatar  avatar Pamela Fox avatar André Ribeiro avatar Rodrigo Groener avatar Shashi Kumar Nagulakonda avatar Chris King avatar Mohamed Chorfa avatar Emre Sebat avatar Bassam Al-Mahamid avatar Abozar avatar CS Detective avatar Ahmed Magdy avatar Gerrit Toxopeus avatar Justin  avatar Julio avatar Sadha avatar Unai Huete Beloki avatar Carlos Mendible avatar Fernando Cortés Hierro avatar  avatar Raphael Bickel avatar Aymen avatar  avatar

Watchers

Erik St. Martin avatar Christian Eduardo Palomares Peralta avatar  avatar John Scott avatar Rafał Mielowski avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar Maciej Treder avatar Pascal van der Heiden avatar  avatar

ai-gateway's Issues

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.