API for converting user searches to robust LLM prompts and returning a response.
This project is developed in collaboration with the Centre for Advanced Research Computing (ARC), University College London and the Centre for Digital Innovation (CDI), University College London.
Harry Moss ([email protected])
Sanaz Jabbari ([email protected])
Nik Khadijah Nik Aznan ([email protected])
Centre for Advanced Research Computing, University College London ([email protected])
`llm-api` requires Python 3.11 or newer.
We recommend installing into a project-specific virtual environment created using an environment management tool such as Conda. To install the latest development version of `llm-api` using `pip` in the currently active environment, run

```sh
pip install git+https://github.com/UCL-ARC/graffinity-cdi-llm-api.git
```
Alternatively, create a local clone of the repository with

```sh
git clone https://github.com/UCL-ARC/graffinity-cdi-llm-api.git
```

and then install in editable mode by running

```sh
pip install -e .
```
This is a crucial step in running the application and should not be skipped! We use Pydantic settings management to configure and verify settings such as API keys, LLM model choice and, when running via Docker Compose, port choice. Pydantic will preferentially set the variables defined in `config.py` from existing environment variables, before reading from a `.env` file in the root directory of the repository. An example `.env.example` file is provided showing the correct naming scheme for all required settings variables, with a prefix defined in `config.py`. Model names are defined as `StrEnum`s to show developers the possible values each variable may take.
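As an illustration, a settings class following this pattern might look like the sketch below. The field names, enum values and prefix here are illustrative assumptions; the real definitions live in `config.py`.

```python
# Minimal sketch of the pattern described above, assuming pydantic-settings.
# Field names, enum values and the env prefix are illustrative only.
from enum import StrEnum

from pydantic_settings import BaseSettings, SettingsConfigDict


class OpenAIModelName(StrEnum):
    GPT_4 = "gpt-4"
    GPT_35_TURBO = "gpt-3.5-turbo"


class Settings(BaseSettings):
    # Environment variables take precedence over values read from .env
    model_config = SettingsConfigDict(env_prefix="LLM_API_", env_file=".env")

    openai_api_key: str
    openai_llm_name: OpenAIModelName
```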
Before running the application (either locally or in a container), rename `.env.example` to `.env` and provide a value for each variable. Particular attention should be paid to API keys and model names.
A description of each variable is provided below:
- `LLM_API_OPENAI_API_KEY` is your OpenAI API key. You must have completed billing details and preloaded credit to your account before models are callable.
- `LLM_API_OPENAI_LLM_NAME` is set with a prefilled value in `.env.example` and is the recommended OpenAI model for use.
- `LLM_API_AWS_ACCESS_KEY_ID` is the Access Key ID from an AWS account with the `Allow` permission set on the `bedrock:InvokeModel` action on the resource `arn:aws:bedrock:*::foundation-model/*`.
- `LLM_API_AWS_SECRET_ACCESS_KEY` is the corresponding secret key to the above Access Key ID.
- `LLM_API_AWS_BEDROCK_MODEL_ID` is the Bedrock model ID string. As a default, this is set to `anthropic.claude-v2`.
- `API_PORT` is set to `9000` as a default. Feel free to change this as required.
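To check that your configuration loads as expected, you can exercise the precedence rule described above from a Python shell. The import path, field name and model value below are assumptions, not taken from `config.py`:

```python
# Hypothetical check of settings precedence; run from the repository root
# with a populated .env in place. Names here are illustrative assumptions.
import os

os.environ["LLM_API_OPENAI_LLM_NAME"] = "gpt-4"  # environment beats .env

from llm_api.config import Settings  # assumed location of the settings class

settings = Settings()
print(settings.openai_llm_name)  # "gpt-4", regardless of the .env value
```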
The FastAPI application can be run locally with

```sh
python src/llm_api/main.py
```
This runs the application via the Uvicorn ASGI server on http://localhost:8000, with automatically generated OpenAPI Swagger documentation available at http://localhost:8000/docs. Any changes to the code are immediately reflected in the running application.
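For reference, the hot-reload behaviour described above corresponds to running Uvicorn programmatically, roughly as sketched below; the exact contents of `main.py` may differ:

```python
# Rough sketch of how main.py is assumed to start the server. reload=True
# gives the described behaviour of code changes being picked up immediately.
import uvicorn

if __name__ == "__main__":
    uvicorn.run("llm_api.main:app", host="127.0.0.1", port=8000, reload=True)
```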
This default behaviour may be changed by running the Uvicorn server directly via

```sh
uvicorn llm_api.main:app --host {your host here} --port {your port here}
```
Running the application with a Uvicorn server is intended only for local testing and is not recommended for use in production. For production deployment, please see Docker deployment via Docker Compose.
`Dockerfile` contains instructions for building a Docker image that runs this application with a Gunicorn server. Gunicorn configuration can be found in `src/llm_api/gunicorn_conf.py`. Ensure the `API_PORT` variable is defined in the `.env` file.
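As a rough guide, a Gunicorn configuration for a FastAPI (ASGI) app typically looks something like the sketch below; the actual values live in `src/llm_api/gunicorn_conf.py` and may well differ:

```python
# Illustrative Gunicorn configuration for serving a FastAPI (ASGI) app.
# The real settings live in src/llm_api/gunicorn_conf.py and may differ.
import os

bind = f"0.0.0.0:{os.environ.get('API_PORT', '9000')}"
workers = 2
worker_class = "uvicorn.workers.UvicornWorker"  # ASGI worker needed for FastAPI
```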
Container orchestration is performed by Docker Compose, allowing for multiple networked containers. You may prefer Kubernetes or similar; Docker Compose is shown here only as an example. The `compose.yml` file defines the service name, the env file to use, methods of determining container health, port mappings and internal network names. We currently deploy a single service, though this can easily be extended using Docker Compose.
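As an example of what a container health check might look like (the actual mechanism in `compose.yml` may differ), a small script such as the following could be used, exiting non-zero when the API is unreachable:

```python
# Hypothetical health check: exit 0 if the API answers, 1 otherwise.
# The endpoint and port are assumptions; see compose.yml for the real check.
import sys
import urllib.request

try:
    urllib.request.urlopen("http://localhost:9000/docs", timeout=5)
except OSError:
    sys.exit(1)
sys.exit(0)
```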
Build and deploy the application on `http://localhost:{API_PORT}` with the following command

```sh
docker compose --project-name ${PROJECT_NAME} up --build
```

The application can be taken down via

```sh
docker compose --project-name ${PROJECT_NAME} down
```
Any additional running containers associated with `${PROJECT_NAME}` but not defined in `compose.yml` (for whatever reason) can be stopped with

```sh
docker compose --project-name llm_api down --remove-orphans
```
Tests can be run across all compatible Python versions in isolated environments using `tox` by running

```sh
tox
```
To run tests manually in a Python environment with `pytest` installed, run

```sh
pytest tests
```
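For orientation, a minimal test in the style the suite might use could look like this; the import path of the app and the endpoint checked are assumptions, not taken from the existing tests:

```python
# Hypothetical test using FastAPI's TestClient; import path and endpoint
# are assumptions rather than details of the actual test suite.
from fastapi.testclient import TestClient

from llm_api.main import app  # assumed import path


def test_docs_available() -> None:
    client = TestClient(app)
    assert client.get("/docs").status_code == 200
```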
To contribute to the project as a developer, use the following as a guide. These are based on ARC Collaborations group practices and code review documentation.
Install the project and development dependencies via `pip` with

```sh
pip install -e ".[dev,tests]"
```
Install pre-commit hooks with

```sh
pre-commit install
```

Future `git commit` operations will now run the pre-commit hooks to ensure code style and typing conventions are followed. Please remember to do this!
To make explicit some of the potentially implicit:
- We will target Python versions `>= 3.11`
- We will use ruff for linting and code formatting, to standardise code, improve legibility and speed up code reviews
- Function arguments and return types will be annotated, with type checking via mypy
- We will use docstrings to annotate classes, class methods and functions (see the sketch after this list)
  - If you use Visual Studio Code, autoDocstring is recommended to speed this along.
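The following hypothetical function illustrates these conventions (annotated arguments and return type, plus a docstring); it is not taken from the codebase:

```python
# Hypothetical example of the conventions above; not from the codebase.
def build_prompt(search_terms: list[str], max_length: int = 512) -> str:
    """Combine user search terms into a single LLM prompt.

    Args:
        search_terms: Raw search terms supplied by the user.
        max_length: Maximum number of characters in the returned prompt.

    Returns:
        The search terms joined into one string, truncated to max_length.
    """
    return " ".join(search_terms)[:max_length]
```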
We use a secret-detection pre-commit hook to ensure that no passwords, API keys or similarly sensitive credentials are committed to the repository. If you add fake credentials (for testing purposes or similar), please update the `.secrets.baseline` file so that CI checks on any resulting pull requests pass. You can update this file by running

```sh
detect-secrets scan > .secrets.baseline
```

from the root directory of the repository. The `detect-secrets` dependency is installed via `pip` if you select the `dev` optional dependencies.
- Create a branch for each new piece of work, with a suitably descriptive name such as `feature-newgui` or `adding-scaffold`
- Do all work on this branch
- Open a new PR for that branch to contain discussion about your changes
- Do this early, and mark it as a 'Draft PR' (on GitHub) until you are ready to merge, so that your work is visible to other developers
- Make sure the repository has CI configured, so tests are run on every push (ideally both on the branch and on the result of merging the PR).
- If you need advice, mention @reviewer and ask questions in a PR comment.
- When ready for merge, request a review from the "Reviewer" menu on the PR.
- All work must go through a pull-request review before reaching `main`
- Never commit or push directly to `main`
- The `main` branch is for ready-to-deploy, release-quality code
- Any team member can review (but not the PR author)
- Try to cycle reviewers around the project's team, so that all members become familiar with all of the work
- Once a reviewer approves your PR, you can hit the merge button
- Default to a 'Squash Merge', adding your changes to the `main` branch as a single commit that can easily be rolled back if need be
The Turing Way provides an overview of best practices and comes recommended as reading; it includes some possible workflows for code review, which are helpful if you're unsure what to look for during a review.
- Initial Research
- Minimum viable product <-- You are Here
- Alpha Release
- Feature-Complete Release
This work was funded through the UCL CDI.