ChatGPT Like Experience Offline

Motivation: One year later, what is like be able run chatgpt like capable model locally / offline

mimic chatgpt like experience locally using latest open source LLM models for free. in 3 easy steps

step-1. select the model server you like based on your hardware
step-2. start chatbot UI
step-3. launch browser http://localhost:3000

Open Source Models

Benchmark

OpenChat claims "The first 7B model that Achieves Comparable Results with ChatGPT (March)!"
Zephyr claims the highest ranked 7B chat model on the MT-Bench and AlpacaEval benchmarks:
Mistral-7B claims outperforms Llama 2 13B across all evaluated benchmarks and Llama 1 34B in reasoning, mathematics, and code generation.

There are so many opensource LLM out there, which one is good? see compare the performance of different LLM that can be deployed locally on consumer hardware. try it yourself in google colab include free GPU Local-LLM-Comparison-Colab-UI

1.Run Full Model Server

Setup Full Openchat Model Server vLLM (click to expand)

Requirement: must have a GPU with 24GB vram Experience: ChatGPT 3.5 like fast inference speed

## create environment
conda create -y --name openchat
conda activate openchat
conda install -y python=3.11
pip3 install torch torchvision torchaudio
pip3 install ochat

## run openchat server 
python -m ochat.serving.openai_api_server --model openchat/openchat_3.5 --engine-use-ray --worker-use-ray

1.Run Quantized Model Server in GGUF format

here's some example models, it can be any other open source models in GGUF format.

Note: adjust OFFLOAD_GPU_LAYERS value based on your GPU ram

OFFLOAD_GPU_LAYERS=0   # use CPU only 
OFFLOAD_GPU_LAYERS=16  # move 16 layers to GPU and rest in CPU
OFFLOAD_GPU_LAYERS=35  # move 35 layers to GPU

Setup Llama_cpp_python GGUF Model Server (click to expand)

Requirement: flexible GPU or CPU only or mixed offloading
Experience: varies depends on the type of model

## create environment
virtualenv venv --python=3.10

pip3 install torch torchvision torchaudio

## install llama-cpp-python (for your environment)
(for example in linux with cuda support )
### Install Server with OpenAI Compatible API - with CUDA GPU support
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python[server]

CMAKE_ARGS="-DLLAMA_CUBLAS=on -DBUILD_SHARED_LIBS=ON" FORCE_CMAKE=1 pip install llama-cpp-python[server] --force-reinstall --upgrade --no-cache-dir

Using Zephyr-7B

run server with Zephyr-7B (click to expand)

echo "serving [Zerphyr] - multilingual model"

export MODEL_FILE="./models/zephyr-7b-beta.Q5_K_M.gguf" export MODEL_ID="TheBloke/zephyr-7B-beta.Q5_K_M.gguf" export OFFLOAD_GPU_LAYERS=35
export HOST=0.0.0.0 export PORT=8000 export CHAT_FORMAT="chatml" export CONTEXT_SIZE=4096

python3 -m llama_cpp.server
--n_gpu_layers $OFFLOAD_GPU_LAYERS
--model $MODEL_FILE
--model_alias $MODEL_ID
--chat_format $CHAT_FORMAT
--n_ctx $CONTEXT_SIZE
--host $HOST
--port $PORT
--seed 123

Using Mistral

run server with Mistral-7B (click to expand)

echo "serving [mistral 7b]"

export MODEL_FILE="./models/mistral-7b-instruct-v0.1.Q5_K_M.gguf" export MODEL_ID="TheBloke/mistral-7b-instruct-v0.1.Q5_K_M.GGUF" export OFFLOAD_GPU_LAYERS=35 export HOST=0.0.0.0 export PORT=8000 export CHAT_FORMAT="vicuna" export CONTEXT_SIZE=4096

python3 -m llama_cpp.server
--n_gpu_layers $OFFLOAD_GPU_LAYERS
--model $MODEL_FILE
--model_alias $MODEL_ID
--chat_format $CHAT_FORMAT
--n_ctx $CONTEXT_SIZE
--host $HOST
--port $PORT
--seed 123

Using openchat-3.5

run server with Openchat-3.5 (click to expand)

echo "serving [openchat 3.5]"

export MODEL_FILE="./models/openchat_3.5.Q5_K_M.gguf" export MODEL_ID="TheBloke/openchat_3.5.Q5_K_M.GGUF" export OFFLOAD_GPU_LAYERS=35 export HOST=0.0.0.0 export PORT=8000 #export CHAT_FORMAT="llama-2" export CHAT_FORMAT="vicuna" export CONTEXT_SIZE=4096

python3 -m llama_cpp.server
--n_gpu_layers $OFFLOAD_GPU_LAYERS
--model $MODEL_FILE
--model_alias $MODEL_ID
--chat_format $CHAT_FORMAT
--n_ctx $CONTEXT_SIZE
--host $HOST
--port $PORT
--seed 123

2.Running Chatbot Web UI

Setup ChatBot UI (click to expand)

Requirement: nodejs 18.x Experience: ChatGPT 3.5 like User Experience in chatting

## create environment
curl -fsSL https://deb.nodesource.com/setup_18.x | sudo -E bash -
sudo apt install -y nodejs
node --version
npm --version

## run chatbotui server 
echo "GGUF Chatbot UI"
export NEXT_PUBLIC_DEFAULT_SYSTEM_PROMPT="You are ChatGPT, a large language model trained by OpenAI. Follow the user's instructions carefully. Respond using markdown"
export NEXT_PUBLIC_DEFAULT_TEMPERATURE=0.5
export DEFAULT_MODEL=gpt-3.5-turbo
export OPENAI_API_KEY=EMPTY
export OPENAI_API_TYPE=openai

## openchat vllm server
#export OPENAI_API_HOST=http://localhost:18888
## llama_cpp_python GGUF file server
export OPENAI_API_HOST=http://127.0.0.1:8000
cd gguf-chatbot-ui
npm install
npm run dev

Credits:

All contributor in open source projects including llama.cpp, llama_cpp_python and chatbot.ui for their awesome work.

minyang-chen / chatgpt_like_experience_locally Goto Github PK

chatgpt_like_experience_locally's Introduction

ChatGPT Like Experience Offline

Open Source Models

1.Run Full Model Server

1.Run Quantized Model Server in GGUF format

Using Zephyr-7B

Using Mistral

Using openchat-3.5

2.Running Chatbot Web UI

Credits:

chatgpt_like_experience_locally's People

Contributors

Stargazers

Watchers

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent