
aws-genai-llm-chatbot's Introduction

Deploy a multi LLM and multi RAG powered chatbot using AWS CDK on AWS

sample

Table of contents

Features

Modular, comprehensive and ready to use

This sample provides ready-to-use code so you can start experimenting with different LLMs and prompts.

Supported models providers:

Multiple RAG sources

This sample comes with CDK constructs that allow you to optionally deploy one or more of:

Example with Kendra as RAG source Example with Amazon OpenSearch Vector Search as RAG source

Full-fledged User Interface

The repository includes a CDK construct to deploy a full-fledged UI built with React to interact with the deployed LLMs as chatbots. It is hosted on Amazon S3 and distributed with Amazon CloudFront, and protected with Amazon Cognito authentication to help you interact and experiment with multiple LLMs, multiple RAG sources, conversational history support and document upload. The interface layer between the UI and the backend is built on top of Amazon API Gateway WebSocket APIs.

Built on top of the AWS Cloudscape Design System.

Architecture

This repository comes with several reusable CDK constructs, giving you the freedom to decide what to deploy and what not.

Here's an overview:

sample

Available CDK Constructs

Authentication

This CDK construct provides the necessary Amazon Cognito resources to support user authentication.

Websocket Interface

This CDK construct deploys a WebSocket-based interface layer that allows two-way communication between the user interface and the model interface.

Main Topic and Queues - FIFO

This is not a CDK construct, but it's important to note that messages are delivered via Amazon SQS FIFO queues and routed via an Amazon SNS FIFO topic.

FIFO is used to ensure the correct order of message inflow/outflow, keeping a "chatbot conversation" consistent for both user and LLM. It also ensures that, where token streaming is used, the order of tokens is always respected.
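The ordering guarantee above can be illustrated with a small sketch: FIFO topics and queues deliver messages with the same message group ID strictly in publication order, so using one group per conversation keeps each chat (and its streamed tokens) ordered while separate conversations stay independent. This is a conceptual illustration only, not the repository's actual code.

```python
from collections import defaultdict, deque

class FifoTopic:
    """Toy model of FIFO delivery: strict per-group ordering, as with
    an SNS FIFO topic routing to SQS FIFO queues by MessageGroupId."""

    def __init__(self):
        # One in-order queue per message group (e.g. per conversation id).
        self._groups = defaultdict(deque)

    def publish(self, group_id, message):
        self._groups[group_id].append(message)

    def drain(self, group_id):
        """Consume a group's messages in the exact order published."""
        out = []
        while self._groups[group_id]:
            out.append(self._groups[group_id].popleft())
        return out
```

For example, streamed tokens published to group `"conv-1"` are always drained in the order they were published, regardless of traffic on other conversations.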

Model Interface

CDK constructs which deploy resources, dependencies and data storage to integrate with multiple LLM sources and providers. To facilitate further integrations and future updates, and to reduce the amount of customization required, we provide code built with known existing LLM-oriented frameworks.

Pre-built model interfaces:

Model Adapters

The model interface carries a concept of ModelAdapter with it. It's a class that you can inherit and override specific methods of to integrate with models that might have different requirements in terms of prompt structure or parameters.

It also natively supports subscription to LangChain Callback Handlers.

This repository provides some sample adapters that you can take inspiration from to integrate with other models. Read more about it here.
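The pattern can be sketched as a base class whose prompt-formatting and parameter methods are overridden per model. All class and method names below are illustrative assumptions, not the repository's actual API; see the sample adapters in the repo for the real interface.

```python
class ModelAdapter:
    """Hypothetical base adapter: subclass and override methods to match
    a model's expected prompt structure or inference parameters."""

    def format_prompt(self, prompt, history):
        # history: list of (user_message, assistant_reply) pairs.
        # Default: plain role-prefixed concatenation.
        turns = [f"user: {u}\nassistant: {a}" for u, a in history]
        return "\n".join(turns + [f"user: {prompt}"])

    def get_model_kwargs(self):
        # Default inference parameters; override per model.
        return {"temperature": 0.7, "max_tokens": 512}


class Llama2ChatAdapter(ModelAdapter):
    """Example override for a model expecting [INST] ... [/INST] markers."""

    def format_prompt(self, prompt, history):
        past = "".join(f"[INST] {u} [/INST] {a} " for u, a in history)
        return f"{past}[INST] {prompt} [/INST]"
```

A model with a different chat template only needs its own subclass; the rest of the model interface stays unchanged.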

SageMaker Model

A purpose-built CDK construct, SageMakerModel, helps facilitate the deployment of models to SageMaker. You can use this layer to deploy:

Layer

The Layer construct in CDK provides an easier mechanism to manage and deploy AWS Lambda layers. You can specify dependencies and requirements in a local folder, and the construct will pack, zip and upload the dependencies autonomously to S3 and generate the Lambda layer.

VPC

This CDK construct simply deploys public, private, and isolated subnets. Additionally, this stack deploys VPC endpoints for SageMaker endpoints, AWS Secrets Manager, S3, and Amazon DynamoDB, ensuring that traffic stays within the VPC when appropriate.

Retrieval Augmented Generation (RAG) CDK Constructs

This repo also comes with CDK constructs to help you get started with pre-built RAG sources.

All RAG constructs leverage the same pattern of implementing:

  • An ingestion queue to receive upload/delete S3 events for documents
  • An ingestion, conversion and storage mechanism which is specific to the RAG source
  • An API endpoint to expose RAG results to consumers, in our case the model interface.

In this sample, each RAG source exposes endpoints and formats results so that it can be used as a LangChain RemoteRetriever by the Model Interface as part of a ConversationalRetrievalChain.

This aims to allow seamless integration with LangChain chains and workflows.
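The contract a RAG endpoint must honor for this to work can be sketched as a simple JSON request/response shape. The key names below ("message", "response", "page_content", "metadata") follow the remote retriever's configurable defaults as an assumption; verify them against the LangChain version you deploy.

```python
import json

def build_retriever_request(query):
    """Body the remote retriever POSTs to the RAG endpoint (assumed schema)."""
    return json.dumps({"message": query})

def build_retriever_response(documents):
    """Body the RAG endpoint returns: a list of LangChain-style documents,
    each with page content and source metadata (assumed schema)."""
    return json.dumps({
        "response": [
            {"page_content": text, "metadata": meta}
            for text, meta in documents
        ]
    })
```

Keeping every RAG source behind this one shape is what lets the model interface swap Aurora, OpenSearch or Kendra results into the same ConversationalRetrievalChain.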

sample

Amazon Aurora with pgvector

This CDK construct deploys a vector database on Amazon Aurora PostgreSQL with pgvector and embeddings.

Hybrid search is performed with a combination of:

  • Similarity search
  • Full-text search
  • Reranking of results
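One common way to fuse the two ranked lists (vector similarity hits and full-text hits) before reranking is reciprocal rank fusion; the sketch below is an illustration of that general technique, not necessarily the algorithm this construct uses.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores 1/(k + rank) in every list it appears in;
    documents ranked highly by both similarity and full-text search
    rise to the top. k=60 is a conventional smoothing constant.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```

For example, fusing `["a", "b", "c"]` (similarity) with `["b", "a", "d"]` (full text) ranks `a` and `b` above documents found by only one method.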

sample

Check here to learn how to enable it in the stack.

Amazon OpenSearch VectorSearch (requires Bedrock Preview Access)

This CDK construct deploys Amazon OpenSearch Serverless (AOSS) vector database capabilities with the required collection, VPC endpoints, data access and encryption policies, and an index that can be used with embeddings produced by Amazon Titan Embeddings.

Check here to learn how to enable it in the stack.

Amazon Kendra

This CDK construct deploys an Amazon Kendra index and the necessary resources to ingest documents and search them via the LangChain Amazon Kendra Index Retriever.

Make sure to review Amazon Kendra Pricing before deploying it.

Check here to learn how to enable it in the stack.

⚠️ Precautions ⚠️

Before you begin using the sample, there are certain precautions you must take into account:

  • Cost Management with self hosted models: Be mindful of the costs associated with AWS resources, especially with SageMaker models which are billed by the hour. While the sample is designed to be cost-effective, leaving serverful resources running for extended periods or deploying numerous LLMs can quickly lead to increased costs.

  • Licensing obligations: If you choose to use any datasets or models alongside the provided samples, ensure you check the model's code and comply with all licensing obligations attached to them.

  • This is a sample: the code provided as part of this repository shouldn't be used for production workloads without further reviews and adaptation.

Preview Access and Service Quotas

  • Amazon Bedrock If you are looking to interact with Amazon Bedrock FMs, you need to request preview access from the AWS console. Furthermore, verify which regions are currently supported for Amazon Bedrock.

  • Instance type quota increase You might consider requesting an increase in service quota for specific SageMaker instance types, such as the ml.g5 instance type. This will give access to the latest generation of GPU/multi-GPU instance types. You can do this from the AWS console.

  • Foundation Models Preview Access If you are looking to deploy models from SageMaker foundation models, you need to request preview access from the AWS console. Furthermore, verify which regions are currently supported for SageMaker foundation models.

Providers

Amazon Bedrock (Preview)

Amazon Bedrock is a fully managed service that makes foundation models (FMs) from Amazon and leading AI startups available through an API, so you can choose from various FMs to find the model that's best suited for your use case. With the Amazon Bedrock serverless experience, you can quickly get started, easily experiment with FMs, privately customize FMs with your own data, and seamlessly integrate and deploy them into your applications using AWS tools and capabilities.

If your account has access to Amazon Bedrock, there's no additional action required and you can deploy this sample as it is and Bedrock models will appear in your model list.

Self Hosted Models on SageMaker

This sample comes with a purpose-built CDK construct, SageMakerModel, which helps abstract three different types of model deployments:

  • Models from SageMaker Foundation Models/Jumpstart.
  • Models supported by the HuggingFace LLM Inference container.
  • Models from HuggingFace with custom inference code.

Read more details here.

3P Model Providers

You can also interact with external providers via their API such as AI21 Labs, Cohere, OpenAI, etc.

The provider must be supported in the Model Interface; see the available LangChain integrations for a comprehensive list of providers.

Usually an API_KEY is required to integrate with 3P models. To do so, the Model Interface deploys a secret in AWS Secrets Manager, initially with an empty JSON {}, where you can add your API keys for one or more providers.

These keys will be injected at runtime into the Lambda function's environment variables; they won't be visible in the AWS Lambda console.

For example, if you wish to be able to interact with AI21 Labs, OpenAI and Cohere endpoints:

  • Open the Model Interface Keys Secret in Secrets Manager. You can find the secret name in the stack output too.
  • Update the secret by adding a key to the JSON:
{
  "AI21_API_KEY": "xxxxx",
  "OPENAI_API_KEY": "sk-xxxxxxxxxxxxxxx",
  "COHERE_API_KEY": "xxxxx"
}

N.B.: If no keys are needed, the secret value must be an empty JSON {}, NOT an empty string ''.

Make sure that the environment variable name matches what is expected by the framework in use, like LangChain (see the available LangChain integrations).
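A minimal sketch of preparing that secret value, enforcing the "empty JSON object, never an empty string" rule before it is uploaded (e.g. with boto3's Secrets Manager put_secret_value, call not shown to keep this self-contained). The helper name is hypothetical.

```python
import json

def build_secret_string(api_keys):
    """api_keys: dict like {"OPENAI_API_KEY": "sk-..."}; may be empty.

    Returns the JSON string to store as the secret value. An empty
    dict serializes to "{}", matching the required default; passing
    a string (e.g. '') is rejected, since the secret must be a JSON
    object, not an empty string.
    """
    if not isinstance(api_keys, dict):
        raise TypeError("secret value must be a JSON object, not a string")
    return json.dumps(api_keys)
```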

Deploy

1. IMPORTANT Prerequisites for models providers

⚠️ IMPORTANT: Depending on the Model Provider you want to use there are different prerequisites. ⚠️

WITH Amazon Bedrock

If you want to use Amazon Bedrock you must sign up for preview access from the AWS console.

If access is granted you need to add the region and endpoint_url provided as part of the preview access in lib/aws-genai-llm-chatbot-stack.ts

const bedrockRegion = 'region';
const bedrockEndpointUrl = 'https://endpoint-url';

After this you can jump to the next step: Environment.

WITHOUT Amazon Bedrock

If you don't have access to Amazon Bedrock you can choose to:

a. Deploy a self-hosted model on SageMaker.

To facilitate this step, there are 2 commented examples of how to deploy:

More instructions on how to deploy other models here.

b. Interact with 3P model providers

You can find how here.

(Optional) If using AWS Cloud9

If you'd like to use AWS Cloud9 to deploy the solution from, you will need the following before proceeding:

  • at least m5.large as Instance type.
  • use Amazon Linux 2 as the platform.
  • increase the instance's EBS volume size to at least 100GB. To do this, run the following command from the Cloud9 terminal. See the documentation for more details here.
./assets/cloud9-resize.sh 100

2. Environment setup

Verify that your environment satisfies the following prerequisites:

You have:

  1. An AWS account
  2. AdministratorAccess policy granted to your AWS account (for production, we recommend restricting access as needed)
  3. Both console and programmatic access
  4. NodeJS 16 or 18 installed
    • If you are using nvm you can run the following before proceeding
    • nvm install 16 && nvm use 16
      
      or
      
      nvm install 18 && nvm use 18
      
  5. AWS CLI installed and configured to use with your AWS account
  6. Typescript 3.8+ installed
  7. AWS CDK CLI installed
  8. Docker installed
  9. Python 3+ installed

3. Prepare CDK

The solution will be deployed into your AWS account using infrastructure as code with the AWS Cloud Development Kit (CDK).

  1. Clone the repository:
git clone https://github.com/aws-samples/aws-genai-llm-chatbot.git
  2. Navigate to this project on your computer using your terminal:
cd aws-genai-llm-chatbot
  3. Install the project dependencies by running this command:
npm install
  4. (Optional) Bootstrap AWS CDK on the target account and region

Note: This is required if you have never used AWS CDK before on this account and region combination. (More information on CDK bootstrapping).

npx cdk bootstrap aws://{targetAccountId}/{targetRegion}

4. Deploy the solution to your AWS Account

  1. Verify that Docker is running with the following command:
docker version

Note: If you get an error like the one below, then Docker is not running and needs to be restarted:

Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
  2. Deploy the sample using the following CDK command:
npx cdk deploy

Note: The duration of this step can vary greatly depending on the constructs you are deploying: from about 6 minutes for basic usage with Amazon Bedrock to about 40 minutes when deploying all RAG sources and self-hosted models.

  3. You can view the progress of your CDK deployment in the CloudFormation console in the selected region.

  4. Once deployed, take note of the User Interface URL, the User Pool and, if you want to interact with 3P model providers, the secret that will hold the various API keys.

...
Outputs:
AwsGenaiLllmChatbotStack.WebInterfaceUserInterfaceUrlXXXXX = dxxxxxxxxxxxxx.cloudfront.net
AwsGenaiLllmChatbotStack.AuthenticationUserPoolLinkXXXXX = https://xxxxx.console.aws.amazon.com/cognito/v2/idp/user-pools/xxxxx_XXXXX/users?region=xxxxx
AwsGenaiLllmChatbotStack1.LangchainInterfaceKeysSecretsNameXXXX = LangchainInterfaceKeySecret-xxxxxx
...
  5. Open the generated Cognito User Pool link from the outputs above, i.e. https://xxxxx.console.aws.amazon.com/cognito/v2/idp/user-pools/xxxxx_XXXXX/users?region=xxxxx

  6. Add a user that will be used to log in to the web interface.

  7. Open the User Interface URL from the outputs above, i.e. dxxxxxxxxxxxxx.cloudfront.net

  8. Log in with the user created in step 6; you will be asked to change the password, and you'll then be logged in to the main page.

Clean up

You can remove the stacks and all the associated resources created in your AWS account by running the following command:

npx cdk destroy

Authors

Credits

This sample was made possible thanks to the following libraries:

License

This library is licensed under the MIT-0 License. See the LICENSE file.
