ai-dial-sdk's Introduction

AI DIAL Python SDK

Overview

A framework for creating applications and model adapters for AI DIAL.

Applications and model adapters implemented with this framework are compatible with the AI DIAL API, which is based on the Azure OpenAI API.

Usage

Install the library using pip:

pip install aidial-sdk

Echo application example

The echo application example replies to the user by repeating their last message:

# Save this as app.py
import uvicorn

from aidial_sdk import DIALApp
from aidial_sdk.chat_completion import ChatCompletion, Request, Response


# ChatCompletion is an abstract class for applications and model adapters
class EchoApplication(ChatCompletion):
    async def chat_completion(
        self, request: Request, response: Response
    ) -> None:
        # Get last message (the newest) from the history
        last_user_message = request.messages[-1]

        # Generate response with a single choice
        with response.create_single_choice() as choice:
            # Fill the content of the response with the last user's content
            choice.append_content(last_user_message.content or "")


# DIALApp extends FastAPI to provide a user-friendly interface for routing requests to your applications
app = DIALApp()
app.add_chat_completion("echo", EchoApplication())

# Run the app
if __name__ == "__main__":
    uvicorn.run(app, port=5000)

Run

python3 app.py

Check

Send the following request:

curl http://127.0.0.1:5000/openai/deployments/echo/chat/completions \
  -H "Content-Type: application/json" \
  -H "Api-Key: DIAL_API_KEY" \
  -d '{
    "messages": [{"role": "user", "content": "Repeat me!"}]
  }'

You will receive a JSON response like this:

{
    "choices":[
        {
            "index": 0,
            "finish_reason": "stop",
            "message": {
                "role": "assistant",
                "content": "Repeat me!"
            }
        }
    ],
    "usage": null,
    "id": "d08cfda2-d7c8-476f-8b95-424195fcdafe",
    "created": 1695298034,
    "object": "chat.completion"
}
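
Since the API follows the Azure OpenAI chat completions contract, a streaming variant of the same request should also work (a sketch; the response arrives as server-sent event chunks):

curl http://127.0.0.1:5000/openai/deployments/echo/chat/completions \
  -H "Content-Type: application/json" \
  -H "Api-Key: DIAL_API_KEY" \
  -d '{
    "messages": [{"role": "user", "content": "Repeat me!"}],
    "stream": true
  }'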

Developer environment

This project uses Python>=3.8 and Poetry>=1.6.1 as the dependency manager.

Check out Poetry's documentation on how to install it on your system before proceeding.

To install requirements:

poetry install

This will install all requirements for running the package, linting, formatting and tests.

IDE configuration

The recommended IDE is VSCode. Open the project in VSCode and install the recommended extensions.

VSCode is configured to use Black, a PEP 8-compliant formatter.

Alternatively, you can use PyCharm.

Set up the Black formatter for PyCharm manually, or install PyCharm>=2023.2, which has built-in Black support.

Environment Variables

Variable     | Default | Description
-------------|---------|-------------------
DIAL_SDK_LOG | WARNING | DIAL SDK log level
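
For example, to run the echo application with debug-level SDK logging (assuming standard Python log level names):

DIAL_SDK_LOG=DEBUG python3 app.py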

Lint

Run the linting before committing:

make lint

To auto-fix formatting issues run:

make format

Test

Run unit tests locally for all available Python versions:

make test

Run unit tests for a specific Python version:

make test PYTHON=3.11

Clean

To remove the virtual environment and build artifacts run:

make clean

Build

To build the package run:

make build

Publish

To publish the package to PyPI run:

make publish

ai-dial-sdk's Issues

Allow custom chat/completions endpoints

  1. Currently it's impossible to declare a chat/completions route other than via the DIALApp.add_chat_completion method.

This leads to the following server returning 404 when the my-deployment chat completions endpoint is called:

app = DIALApp()

@app.post("/openai/deployments/my-deployment/chat/completions")
async def chat_completion(deployment_id: str, request: Request):
    pass

The same goes for the /rate, /tokenize and /truncate_prompt endpoints.

  2. It's impossible to implement a /chat/completions endpoint for an arbitrary deployment id, meaning that the list of deployment names must be known beforehand. This is not very convenient. A sketch of the desired usage is shown below.
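
A hypothetical sketch of what this asks to make possible: a single handler serving any deployment id via a path parameter (not supported by DIALApp today):

@app.post("/openai/deployments/{deployment_id}/chat/completions")
async def chat_completion(deployment_id: str, request: Request):
    ...  # dispatch on the arbitrary deployment_id here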

Extend chat completion request class with the recently supported fields

As per the documentation, the following fields are currently missing in the SDK:

  • seed
  • logprobs
  • top_logprobs
  • response_format

Note that it also solves the issue recently introduced in langchain-openai==0.1.17, where logprobs defaults to False instead of None, so that the following code:

from langchain_openai import AzureChatOpenAI

llm = AzureChatOpenAI(
    openai_api_version="2023-12-01-preview",
    azure_deployment="gemini-1.5-flash-001", # or any other vertexai/bedrock model
)
llm.invoke("2+3=?")

fails with the error:

BadRequestError: Error code: 400 - {'error': {'message': 'Your request contained invalid structure on path logprobs. extra fields not permitted', 'type': 'invalid_request_error'}}

Thus, the issue is partially caused by langchain-ai/langchain#23691 and partially by the need to sync the DIAL SDK with the Azure OpenAI API.

Add default health check

Add the ability to optionally activate a /health endpoint that returns status_code=200 and the body {"status": "ok"}
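
A minimal sketch of the requested behavior, assuming it is registered as a plain FastAPI route on DIALApp (the opt-in mechanism is left out):

@app.get("/health")
async def health():
    # Returns 200 with the requested body
    return {"status": "ok"}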

Share code as open source

As agreed, we want to share the internal toolset for the DIAL platform with the public.

The codebase should be prepared, and the code history should be squashed.

TextIO-compatible interface for append_content

It would be nice to have an io.TextIO-compatible interface for Choice.append_content and Stage.append_content to support interoperability with existing Python libraries.

Examples of the use-cases:

  1. Print:
print("Hello", file=choice.content_stream)
  2. Progress with tqdm:
for item in tqdm(items, file=stage.content_stream):
    process(item)
  3. Stage logging:
logging_handler = logging.StreamHandler(stream=stage.content_stream)
  4. Writers for some format:
csv_writer = csv.writer(choice.content_stream)
csv_writer.writerows(data)

And any other library which accepts file-like object as argument.
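
For reference, a minimal sketch of such an adapter built on top of the existing append_content (the ContentStream name and wiring are assumptions, not the SDK's API):

import io

class ContentStream(io.TextIOBase):
    # Hypothetical adapter exposing append_content as a writable text stream
    def __init__(self, choice):
        self._choice = choice

    def writable(self) -> bool:
        return True

    def write(self, s: str) -> int:
        self._choice.append_content(s)
        return len(s)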

Support stages in request messages

Currently, the request message datatype doesn't include a field for stages:

class CustomContent(ExtraForbidModel):
    attachments: Optional[List[Attachment]] = None
    state: Optional[Any] = None

This is in line with the DIAL API (as of 6 Jun 2024), which states that only the chat/completions response may contain stages; request messages may not.

However, this breaks a typical pattern of chat completion usage.

Suppose this is a program one uses with a regular GPT-4 chat completions endpoint:

messages = []
while True:
    user = input("User: ")
    messages.append({"role": "user", "content": user})
    response = client.chat.completions.create(messages=messages)
    new_message = response.choices[0].message.dict()
    messages.append(new_message)

Then one decides to switch to a DIAL application (written using the DIAL SDK) that also produces stages.
This code will break inside the application, unless one explicitly removes the stages:

messages = []
while True:
    user = input("User: ")
    messages.append({"role": "user", "content": user})
    response = client.chat.completions.create(messages=messages)
    new_message = response.choices[0].message.dict()
+    if "stages" in new_message and "custom_content" in new_message["stages"]:
+        del new_message["custom_content"]["stages"]
    messages.append(new_message)

This subverts the claim that the DIAL API is backward compatible with the OpenAI API.

A way to fail the stage without failing the request

Need an easy way to fail a stage without failing the whole request.
The application logic may have alternative ways to do something; if one stage of the request fails, it does not always mean that the whole request should fail too.

Currently you have to write code like this to fail the stage with an expected error, handle the alternative approach, and still have the request fail on an unexpected error:

class MyFailStageException(Exception):
    pass

...

try:
    with choice.create_stage("stage1") as stage1:
        has_error = do_something()
        if has_error:
            raise MyFailStageException()
except MyFailStageException:
    pass

with choice.create_stage("stage2") as stage2:
    do_something_alternative()

It would be easier if you could do something like stage.fail() without raising an exception that fails the whole request:

	with choice.create_stage("stage1") as stage1:
		has_error = do_something()
		if has_error:
			stage1.fail()

	# request execution continues here
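
One way this could work, as a rough sketch (all internal names here are hypothetical): fail() raises a private marker exception which the stage's __exit__ swallows after closing the stage as failed, so nothing propagates to the request.

class _StageFailed(Exception):
    pass  # hypothetical private marker exception

class Stage:
    def __enter__(self):
        return self

    def fail(self) -> None:
        raise _StageFailed()

    def __exit__(self, exc_type, exc, tb):
        if exc_type is _StageFailed:
            self._mark_failed()  # hypothetical: close the stage as failed
            return True  # swallow the marker; the request continues
        return False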

Lifespan Events don't work with propagation_auth_headers=True argument

If I set the propagation_auth_headers argument in the DIALApp constructor and add one of the FastAPI lifespan events, the application fails on startup.

Code:

app = DIALApp('...', propagation_auth_headers=True)  # lifespan=lifespan)
app.add_chat_completion("echo", EchoApplication())

@app.on_event("startup")
async def startup_event():
    print('!!! Startup event !!!')

# Run the built app
if __name__ == "__main__":
    uvicorn.run(app, port=5000, host="0.0.0.0", lifespan='on')

Error message:

INFO: Started server process [4112]
INFO: Waiting for application startup.
ERROR: Exception in 'lifespan' protocol
Traceback (most recent call last):
  File "...\venv\lib\site-packages\uvicorn\lifespan\on.py", line 86, in main
    await app(scope, self.receive, self.send)
  File "...\venv\lib\site-packages\uvicorn\middleware\proxy_headers.py", line 84, in __call__
    return await self.app(scope, receive, send)
  File "...\venv\lib\site-packages\fastapi\applications.py", line 1106, in __call__
    await super().__call__(scope, receive, send)
  File "...\venv\lib\site-packages\starlette\applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "...\venv\lib\site-packages\starlette\middleware\errors.py", line 149, in __call__
    await self.app(scope, receive, send)
  File "...\venv\lib\site-packages\aidial_sdk\header_propagator.py", line 26, in __call__
    for header in scope["headers"]:
KeyError: 'headers'
ERROR: Application startup failed. Exiting.

Process finished with exit code 3
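
The root cause is that ASGI lifespan scopes carry no "headers" key; only HTTP scopes do. A likely fix, sketched against a hypothetical shape of the header propagation middleware:

async def __call__(self, scope, receive, send):
    # Lifespan (and websocket) scopes have no "headers": pass them through
    if scope["type"] != "http":
        return await self.app(scope, receive, send)
    for header in scope["headers"]:
        ...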

Use SecretStr type for api_key and jwt in Request

The Pydantic library has a SecretStr type for storing secrets. A SecretStr value is rendered as '**********' if it is accidentally printed to logs.
https://docs.pydantic.dev/1.10/usage/types/#secret-types

It would be great to use SecretStr instead of StrictStr for the api_key and jwt fields of aidial_sdk.chat_completion.Request.

This would also help with langchain interoperability; langchain already uses SecretStr for its api_key parameter:
https://github.com/langchain-ai/langchain/blob/acc8fb3ead6092685947a56d83b745cfda70c970/libs/partners/openai/langchain_openai/llms/azure.py#L48
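
For illustration, how SecretStr behaves in Pydantic (a standalone sketch, not the SDK's actual Request model):

from pydantic import BaseModel, SecretStr

class ExampleRequest(BaseModel):
    api_key: SecretStr

req = ExampleRequest(api_key="sk-very-secret")
print(req.api_key)                     # **********
print(req.api_key.get_secret_value())  # sk-very-secret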

Strictly define the scope of the public interface of the library

Currently, pretty much every single module in the DIAL SDK is publicly available.

E.g. the utils/*.py modules could potentially be used by a library user; however, we do not really expect this to happen.

Our expectations should be expressed explicitly in the library code.

The modules/classes/methods which we consider private to the library (i.e. pertaining to its implementation details) should be prefixed with an underscore, so that they are hidden from the public interface.

Telemetry: support conditional loading of instrumentors

None of the OpenTelemetry instrumentors declare dependencies on the libraries they instrument.

E.g. HTTPXClientInstrumentor doesn't depend on the httpx library.

So if a DIAL SDK client doesn't have httpx as a dependency and enables telemetry, it fails with an import error during telemetry initialization:

  File "./.venv/lib/python3.11/site-packages/opentelemetry/instrumentation/httpx/__init__.py", line 167, in <module>
    import httpx
ModuleNotFoundError: No module named 'httpx'

It happens because the instrumentor is loaded unconditionally:

from opentelemetry.instrumentation.httpx import HTTPXClientInstrumentor

def init_telemetry(
    app: FastAPI,
    config: TelemetryConfig,
):
    ###
    HTTPXClientInstrumentor().instrument()
    ###

Instead, we should load it lazily:

def init_telemetry(
    app: FastAPI,
    config: TelemetryConfig,
):
    ###
    try:
        from opentelemetry.instrumentation.httpx import HTTPXClientInstrumentor

        HTTPXClientInstrumentor().instrument()
    except ImportError:
        pass  # the httpx lib is not installed, no need to load the instrumentor
    ###

The same applies to the header propagation logic.

Wrong error format

Actual error response body:

{
  "detail": {
    "error": {
      "message": "Error during processing the request",
      "type": "runtime_error",
      "param": null,
      "code": null
    }
  }
}

Expected:

{
  "error": {
    "message": "Error during processing the request",
    "type": "runtime_error",
    "param": null,
    "code": null
  }
}

Support tools

Add tool-related fields to the request/response schemas.
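
For context, the kind of fields meant here, mirroring the OpenAI schema (a sketch, not the SDK's final design):

from typing import Any, Dict, List, Literal, Optional, Union
from pydantic import BaseModel

class Function(BaseModel):
    name: str
    description: Optional[str] = None
    parameters: Optional[Dict[str, Any]] = None  # JSON Schema of the arguments

class Tool(BaseModel):
    type: Literal["function"]
    function: Function

class ChatCompletionRequest(BaseModel):
    # ...existing fields...
    tools: Optional[List[Tool]] = None
    tool_choice: Optional[Union[str, Dict[str, Any]]] = None  # "auto" | "none" | {...}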

Support header propagation for httpx

Header propagation doesn't work for openai>=1.0 because it uses the httpx lib.

Note

Header propagation does work for openai<1.0, because it uses the aiohttp lib, which is supported by the SDK.
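
One plausible approach, sketched with httpx's request event hooks (the contextvar holding the key would have to be set by the SDK per incoming request; all names here are hypothetical):

import contextvars
import httpx

# Hypothetical contextvar populated by the SDK for each incoming request
_current_api_key: contextvars.ContextVar = contextvars.ContextVar(
    "api_key", default=None
)

async def _propagate_api_key(request: httpx.Request) -> None:
    api_key = _current_api_key.get()
    if api_key:
        request.headers["Api-Key"] = api_key

client = httpx.AsyncClient(event_hooks={"request": [_propagate_api_key]})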

Support rate response API

Add support for the rate response API.

Scenario: a user asks the application a question, and the application responds with a message. The user may then react to the quality of the response with a thumbs up/down.

Processing flow

  • Accept a request with deploymentId in the request path and a JSON body including responseId (the ID of the response produced by the application) and rate (the user's reaction to the response quality). The response id is an arbitrary string; rate is a boolean (true if the user likes the response, false otherwise).
  • Find the registered chat completion handler by deploymentId. If the handler is not found, return 404.
  • Validate the request: the JSON structure should be valid and contain the required fields. Return 404 if the validation fails.
  • Run the handler to process the request.
  • Respond: the API should return 200 (OK). A request sketch is shown below.

Note: the chat completion handler has a default implementation of rate response: it does nothing.
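
A request sketch against the echo deployment from the README example (the /rate path matches the endpoint list mentioned earlier; the exact body schema is an assumption based on this issue):

curl http://127.0.0.1:5000/openai/deployments/echo/rate \
  -H "Content-Type: application/json" \
  -H "Api-Key: DIAL_API_KEY" \
  -d '{"responseId": "d08cfda2-d7c8-476f-8b95-424195fcdafe", "rate": true}'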

Telemetry: use correct version for opentelemetry-exporter-prometheus

Adding aidial-sdk[telemetry] as a dependency leads to the warning:

Warning: The file chosen for install of opentelemetry-exporter-prometheus 1.12.0rc1 (opentelemetry_exporter_prometheus-1.12.0rc1-py3-none-any.whl) is yanked. Reason for being yanked: Version is deprecated.

See https://pypi.org/project/opentelemetry-exporter-prometheus/#history

A non-deprecated version of opentelemetry-exporter-prometheus should be used; most likely 0.41b0, the same as for the other instrumentations.

Add_attachment does not support Attachment object

The Choice.add_attachment and Stage.add_attachment methods do not accept an instance of the Attachment class.
You have to write code like this:

choice.add_attachment(
    type=attachment.type,
    title=attachment.title,
    data=attachment.data,
    url=attachment.url,
    reference_url=attachment.reference_url,
    reference_type=attachment.reference_type,
)

which looks odd.

I expect the following code to work:

choice.add_attachment(attachment)
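
A rough sketch of how the method could accept both calling styles (a hypothetical signature, not the SDK's current one):

from typing import Optional

def add_attachment(self, attachment: Optional[Attachment] = None, **fields) -> None:
    # Hypothetical: accept a ready-made Attachment or build one from kwargs
    if attachment is None:
        attachment = Attachment(**fields)
    self._add_attachment(attachment)  # hypothetical internal method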
