AIKit is a one-stop shop for quickly getting started with hosting, deploying, building, and fine-tuning large language models (LLMs).
AIKit offers two main capabilities:
- Inference: AIKit uses LocalAI, which supports a wide range of inference capabilities and formats. LocalAI provides a drop-in replacement REST API that is OpenAI API compatible, so you can use any OpenAI API compatible client, such as Kubectl AI, Chatbot-UI, and many more, to send requests to open-source LLMs!
- Fine-tuning: AIKit offers an extensible fine-tuning interface. It supports Unsloth for a fast, memory-efficient, and easy fine-tuning experience.
👉 For full documentation, please see the AIKit website!
- 🐳 No GPU, Internet access, or additional tools needed except for Docker!
- 🤏 Minimal image size, resulting in fewer vulnerabilities and a smaller attack surface, with a custom distroless-based image
- 🎵 Fine-tuning support
- 🚀 Easy-to-use declarative configuration for inference and fine-tuning
- ✨ OpenAI API compatible, for use with any OpenAI API compatible client
- 📸 Multi-modal model support
- 🖼️ Image generation support with Stable Diffusion
- 🦙 Support for GGUF (`llama`), GPTQ (`exllama` or `exllama2`), EXL2 (`exllama2`), GGML (`llama-ggml`), and Mamba models
- 🚢 Kubernetes deployment ready
- 📦 Supports multiple models with a single image
- 🖥️ Supports GPU-accelerated inference with NVIDIA GPUs
- 🔐 Signed images for `aikit` and pre-made models
- 🌈 Support for non-proprietary and self-hosted container registries to store model images
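When one image serves several models, the `model` field of each request selects which one answers. OpenAI-compatible servers conventionally list the models they serve at `GET /v1/models`. The sketch below assumes AIKit (through LocalAI) returns the usual OpenAI shape, `{"object": "list", "data": [{"id": ...}, ...]}`, and that a container is listening on `localhost:8080`; `parse_model_ids` and `list_models` are hypothetical helper names, not part of AIKit.

```python
from __future__ import annotations

import json
import urllib.request

# Assumption: an AIKit container is reachable at this address, as in the quick start.
BASE_URL = "http://localhost:8080"


def parse_model_ids(raw: str | bytes) -> list[str]:
    """Extract the model ids from an OpenAI-style GET /v1/models response body."""
    payload = json.loads(raw)
    return [entry["id"] for entry in payload.get("data", [])]


def list_models(base_url: str = BASE_URL, timeout: float = 10.0) -> list[str]:
    """Ask the server which models it serves (hypothetical helper, assumes /v1/models exists)."""
    with urllib.request.urlopen(f"{base_url}/v1/models", timeout=timeout) as resp:
        return parse_model_ids(resp.read())
```

Each returned id is a value you can pass back as the `model` field of a chat request; keeping the parsing separate from the HTTP call makes it easy to check against a saved response.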
You can get started with AIKit quickly on your local machine without a GPU!
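```bash
# Start the Llama 3 8B image in the background, exposing the API on port 8080.
docker run -d --rm -p 8080:8080 ghcr.io/sozercan/llama3:8b

# Send a chat request (the model may need a moment to load before it answers).
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
    "model": "llama-3-8b-instruct",
    "messages": [{"role": "user", "content": "explain kubernetes in a sentence"}]
  }'
```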
Output should be similar to:
```json
{
  "created": 1713494426,
  "object": "chat.completion",
  "id": "fce01ee0-7b5a-452d-8f98-b6cb406a1067",
  "model": "llama-3-8b-instruct",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "message": {
        "role": "assistant",
        "content": "Kubernetes is an open-source container orchestration system that automates the deployment, scaling, and management of applications and services, allowing developers to focus on writing code rather than managing infrastructure."
      }
    }
  ],
  "usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}
}
```
That's it! 🎉 The API is OpenAI compatible, so this is a drop-in replacement for any OpenAI API compatible client.
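Because the endpoint speaks the OpenAI chat completions format, any HTTP client can call it. The following is a minimal sketch that reproduces the quick-start `curl` request with only the Python standard library. It assumes the container from the quick start is listening on `localhost:8080`; `build_chat_request` and `chat` are hypothetical helper names, not part of AIKit or LocalAI.

```python
import json
import urllib.request

# Assumption: the quick-start container is listening on this address.
BASE_URL = "http://localhost:8080"


def build_chat_request(model: str, prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible /v1/chat/completions endpoint."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(model: str, prompt: str, base_url: str = BASE_URL, timeout: float = 120.0) -> str:
    """Send one user message and return the assistant's reply text."""
    req = build_chat_request(model, prompt, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.load(resp)
    # The reply lives at choices[0].message.content, as in the sample output.
    return payload["choices"][0]["message"]["content"]
```

With the container running, `chat("llama-3-8b-instruct", "explain kubernetes in a sentence")` would return just the `content` string from a response shaped like the sample output. Building the request separately from sending it keeps the payload easy to inspect or to reuse with a different HTTP library.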
AIKit comes with pre-made models that you can use out of the box!
If it doesn't include a specific model, you can always create your own images and host them in a container registry of your choice!

CPU:

Model | Optimization | Parameters | Command | Model Name | License |
---|---|---|---|---|---|
🦙 Llama 3 | Instruct | 8B | `docker run -d --rm -p 8080:8080 ghcr.io/sozercan/llama3:8b` | `llama-3-8b-instruct` | Llama |
🦙 Llama 3 | Instruct | 70B | `docker run -d --rm -p 8080:8080 ghcr.io/sozercan/llama3:70b` | `llama-3-70b-instruct` | Llama |
🦙 Llama 2 | Chat | 7B | `docker run -d --rm -p 8080:8080 ghcr.io/sozercan/llama2:7b` | `llama-2-7b-chat` | Llama |
🦙 Llama 2 | Chat | 13B | `docker run -d --rm -p 8080:8080 ghcr.io/sozercan/llama2:13b` | `llama-2-13b-chat` | Llama |
Mixtral | Instruct | 8x7B | `docker run -d --rm -p 8080:8080 ghcr.io/sozercan/mixtral:8x7b` | `mixtral-8x7b-instruct` | Apache |
Phi 3 | Instruct | 3.8B | `docker run -d --rm -p 8080:8080 ghcr.io/sozercan/phi3:3.8b` | `phi-3-3.8b` | MIT |


NVIDIA CUDA:

Model | Optimization | Parameters | Command | Model Name | License |
---|---|---|---|---|---|
🦙 Llama 3 | Instruct | 8B | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/llama3:8b-cuda` | `llama-3-8b-instruct` | Llama |
🦙 Llama 3 | Instruct | 70B | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/llama3:70b-cuda` | `llama-3-70b-instruct` | Llama |
🦙 Llama 2 | Chat | 7B | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/llama2:7b-cuda` | `llama-2-7b-chat` | Llama |
🦙 Llama 2 | Chat | 13B | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/llama2:13b-cuda` | `llama-2-13b-chat` | Llama |
Mixtral | Instruct | 8x7B | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/mixtral:8x7b-cuda` | `mixtral-8x7b-instruct` | Apache |
Phi 3 | Instruct | 3.8B | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/phi3:3.8b-cuda` | `phi-3-3.8b` | MIT |

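The CPU and NVIDIA CUDA commands in these tables differ in only two places: the CUDA variant adds `--gpus all` and appends `-cuda` to the image tag. A small sketch, assuming that pattern holds for every pre-made image, can generate either form; `docker_run_args` is a hypothetical helper, not an AIKit tool.

```python
from __future__ import annotations

import shlex

# Registry prefix used by the pre-made images in the tables.
REGISTRY = "ghcr.io/sozercan"


def docker_run_args(image: str, tag: str, gpu: bool = False, host_port: int = 8080) -> list[str]:
    """Build a `docker run` argument list for a pre-made AIKit image.

    Assumption: CUDA variants add `--gpus all` and a `-cuda` tag suffix, as in the tables.
    The container side of the port mapping stays 8080, matching the table commands.
    """
    args = ["docker", "run", "-d", "--rm"]
    if gpu:
        args += ["--gpus", "all"]
        tag = f"{tag}-cuda"
    args += ["-p", f"{host_port}:8080", f"{REGISTRY}/{image}:{tag}"]
    return args


# Prints: docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/llama3:8b-cuda
print(shlex.join(docker_run_args("llama3", "8b", gpu=True)))
```

Returning an argument list rather than a single string means it can be passed straight to `subprocess.run` without shell quoting issues; `shlex.join` is only used to print a copy-pasteable command.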
👉 For more information, including how to fine-tune models or create your own images, please see the AIKit website!