API for converting user searches to robust LLM prompts and returning a response.
This project is developed in collaboration with the Centre for Advanced Research Computing (ARC), University College London and the Centre for Digital Innovation (CDI), University College London.
Harry Moss ([email protected])
Sanaz Jabbari ([email protected])
Nik Khadijah Nik Aznan ([email protected])
Centre for Advanced Research Computing, University College London ([email protected])
`llm-api` requires Python 3.11 or newer.
We recommend installing into a project-specific virtual environment created using an environment management tool such as Conda. To install the latest development version of `llm-api` using `pip` in the currently active environment, run

```sh
pip install git+https://github.com/UCL-ARC/graffinity-cdi-llm-api.git
```
Alternatively, create a local clone of the repository with

```sh
git clone https://github.com/UCL-ARC/graffinity-cdi-llm-api.git
```

and then install in editable mode by running

```sh
pip install -e .
```
This is a crucial step in running the application and should not be skipped! We use Pydantic settings management to configure and verify settings such as API keys, LLM model choice and, when running via Docker Compose, port choice. Pydantic will preferentially set the variables defined in `config.py` from existing environment variables, before reading from a `.env` file in the root directory of the repository. An example `.env.example` file is provided showing the correct naming scheme for all required settings variables, with a prefix defined in `config.py`. Model names are defined as `StrEnum`s to show developers the possible values each variable may take.
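As an illustration, a settings class following this pattern might look like the sketch below. The field names, enum values and prefix here are illustrative assumptions; the real definitions live in `config.py`.

```python
# Minimal sketch of the pattern described above, assuming pydantic-settings.
# Field names, enum values and the env prefix are illustrative only.
from enum import StrEnum

from pydantic_settings import BaseSettings, SettingsConfigDict


class OpenAIModelName(StrEnum):
    GPT_4 = "gpt-4"
    GPT_35_TURBO = "gpt-3.5-turbo"


class Settings(BaseSettings):
    # Environment variables take precedence over values read from .env
    model_config = SettingsConfigDict(env_prefix="LLM_API_", env_file=".env")

    openai_api_key: str
    openai_llm_name: OpenAIModelName
```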
Before running the application (either locally or in a container), rename `.env.example` to `.env` and provide a value for each variable. Particular attention should be paid to API keys and model names.
A description of each variable is provided below:
- `LLM_API_OPENAI_API_KEY` is your OpenAI API key. You must have completed billing details and preloaded credit to your account before models are callable.
- `LLM_API_OPENAI_LLM_NAME` is set with a prefilled value in `.env.example` and is the recommended OpenAI model for use.
- `LLM_API_AWS_ACCESS_KEY_ID` is the Access Key ID from an AWS account with the `Allow` permission set on the `bedrock:InvokeModel` action on the resource `arn:aws:bedrock:*::foundation-model/*`.
- `LLM_API_AWS_SECRET_ACCESS_KEY` is the corresponding secret key to the above Access Key ID.
- `LLM_API_AWS_BEDROCK_MODEL_ID` is the Bedrock model ID string. As a default, this is set to `anthropic.claude-v2`.
- `API_PORT` is set to `9000` as a default. Feel free to change this as required.
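To check that your configuration loads as expected, you can exercise the precedence rule described above from a Python shell. The import path, field name and model value below are assumptions, not taken from `config.py`:

```python
# Hypothetical check of settings precedence; run from the repository root
# with a populated .env in place. Names here are illustrative assumptions.
import os

os.environ["LLM_API_OPENAI_LLM_NAME"] = "gpt-4"  # environment beats .env

from llm_api.config import Settings  # assumed location of the settings class

settings = Settings()
print(settings.openai_llm_name)  # "gpt-4", regardless of the .env value
```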
The FastAPI application can be run locally with

```sh
python src/llm_api/main.py
```
This runs the application via the Uvicorn ASGI server on http://localhost:8000, with automatically generated OpenAPI Swagger documentation available at http://localhost:8000/docs. Any changes to the code are immediately reflected in the running application.
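For reference, the hot-reload behaviour described above corresponds to running Uvicorn programmatically, roughly as sketched below; the exact contents of `main.py` may differ:

```python
# Rough sketch of how main.py is assumed to start the server. reload=True
# gives the described behaviour of code changes being picked up immediately.
import uvicorn

if __name__ == "__main__":
    uvicorn.run("llm_api.main:app", host="127.0.0.1", port=8000, reload=True)
```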
This default behaviour may be changed by running the Uvicorn server directly via

```sh
uvicorn llm_api.main:app --host {your host here} --port {your port here}
```
Running the application with a Uvicorn server is intended only for local testing and is not recommended for use in production. For production deployment, please see Docker deployment via Docker Compose.
`Dockerfile` contains instructions for building a Docker image that runs this application with a Gunicorn server. Gunicorn configuration can be found in `src/llm_api/gunicorn_conf.py`. Ensure the `API_PORT` variable is defined in the `.env` file.
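As a rough guide, a Gunicorn configuration for a FastAPI (ASGI) app typically looks something like the sketch below; the actual values live in `src/llm_api/gunicorn_conf.py` and may well differ:

```python
# Illustrative Gunicorn configuration for serving a FastAPI (ASGI) app.
# The real settings live in src/llm_api/gunicorn_conf.py and may differ.
import os

bind = f"0.0.0.0:{os.environ.get('API_PORT', '9000')}"
workers = 2
worker_class = "uvicorn.workers.UvicornWorker"  # ASGI worker needed for FastAPI
```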
Container orchestration is performed by Docker Compose, allowing for multiple networked containers. You may prefer Kubernetes or similar; Docker Compose is shown here only as an example. The `compose.yml` file defines the service name, the env file to use, methods of determining container health, port mappings and internal network names. We currently deploy a single service, though this can easily be extended using Docker Compose.
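As an example of what a container health check might look like (the actual mechanism in `compose.yml` may differ), a small script such as the following could be used, exiting non-zero when the API is unreachable:

```python
# Hypothetical health check: exit 0 if the API answers, 1 otherwise.
# The endpoint and port are assumptions; see compose.yml for the real check.
import sys
import urllib.request

try:
    urllib.request.urlopen("http://localhost:9000/docs", timeout=5)
except OSError:
    sys.exit(1)
sys.exit(0)
```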
Build and deploy the application on `http://localhost:{API_PORT}` with the following command

```sh
docker compose --project-name ${PROJECT_NAME} up --build
```

The application can be taken down via

```sh
docker compose --project-name ${PROJECT_NAME} down
```
Any additional running containers associated with `${PROJECT_NAME}` but not defined in `compose.yml` (for whatever reason) can be stopped with

```sh
docker compose --project-name llm_api down --remove-orphans
```
Tests can be run across all compatible Python versions in isolated environments using `tox` by running

```sh
tox
```
To run tests manually in a Python environment with `pytest` installed, run

```sh
pytest tests
```
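For orientation, a minimal test in the style the suite might use could look like this; the import path of the app and the endpoint checked are assumptions, not taken from the existing tests:

```python
# Hypothetical test using FastAPI's TestClient; import path and endpoint
# are assumptions rather than details of the actual test suite.
from fastapi.testclient import TestClient

from llm_api.main import app  # assumed import path


def test_docs_available() -> None:
    client = TestClient(app)
    assert client.get("/docs").status_code == 200
```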
To contribute to the project as a developer, use the following as a guide. These are based on ARC Collaborations group practices and code review documentation.
Install the project and development dependencies via `pip` with

```sh
pip install -e ".[dev,tests]"
```
Install pre-commit hooks with

```sh
pre-commit install
```

Future `git commit` operations will now run the pre-commit hooks to ensure code style and typing conventions are followed. Please remember to do this!
To make explicit some of the potentially implicit:
- We will target Python versions `>= 3.11`
- We will use ruff for linting and code formatting, to standardise code, improve legibility and speed up code reviews
- Function arguments and return types will be annotated, with type checking via mypy
- We will use docstrings to annotate classes, class methods and functions (see the sketch after this list)
  - If you use Visual Studio Code, autoDocstring is recommended to speed this along.
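The following hypothetical function illustrates these conventions (annotated arguments and return type, plus a docstring); it is not taken from the codebase:

```python
# Hypothetical example of the conventions above; not from the codebase.
def build_prompt(search_terms: list[str], max_length: int = 512) -> str:
    """Combine user search terms into a single LLM prompt.

    Args:
        search_terms: Raw search terms supplied by the user.
        max_length: Maximum number of characters in the returned prompt.

    Returns:
        The search terms joined into one string, truncated to max_length.
    """
    return " ".join(search_terms)[:max_length]
```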
We use a secret-detection pre-commit hook to ensure that no passwords, API keys or similarly sensitive credentials are committed to the repository. If you add fake credentials (for testing purposes or similar), please update the `.secrets.baseline` file so that CI checks on any resulting pull requests pass. You can update this file by running

```sh
detect-secrets scan > .secrets.baseline
```

from the root directory of the repository. The `detect-secrets` dependency is installed via `pip` if you select the `dev` optional dependencies.
- Create a branch for each new piece of work, with a suitably descriptive name such as `feature-newgui` or `adding-scaffold`
- Do all work on this branch
- Open a new PR for that branch to contain discussion about your changes
- Do this early, and mark it as a 'Draft PR' (on GitHub) until you are ready to merge, so that your work is visible to other developers
- Make sure the repository has CI configured, so tests are run on every push (ideally both on the branch and on the result of merging the PR).
- If you need advice, mention @reviewer and ask questions in a PR comment.
- When ready for merge, request a review from the "Reviewer" menu on the PR.
- All work must go through a pull-request review before reaching `main`
- Never commit or push directly to `main`
- The `main` branch is for ready-to-deploy, release-quality code
- Any team member can review (but not the PR author)
- Try to cycle reviewers around the project's team, so that all members become familiar with all of the work
- Once a reviewer approves your PR, you can hit the merge button
- Default to a 'Squash Merge', adding your changes to the `main` branch as a single commit that can easily be rolled back if need be
The Turing Way provides an overview of best practices and comes recommended as reading; it includes some possible workflows for code review, which are helpful if you're unsure what to look for during a review.
- Initial Research
- Minimum viable product <-- You are Here
- Alpha Release
- Feature-Complete Release
This work was funded through the UCL CDI.