Code Monkey home page Code Monkey logo

chatgpt_like_experience_locally's Introduction

ChatGPT Like Experience Offline

Motivation: One year later, what is like be able run chatgpt like capable model locally / offline

mimic chatgpt like experience locally using latest open source LLM models for free. in 3 easy steps

  • step-1. select the model server you like based on your hardware
  • step-2. start chatbot UI
  • step-3. launch browser http://localhost:3000

Chatbot UI multi-lingual

Chatbot UI

Open Source Models

Benchmark

  • OpenChat claims "The first 7B model that Achieves Comparable Results with ChatGPT (March)!"
  • Zephyr claims the highest ranked 7B chat model on the MT-Bench and AlpacaEval benchmarks:
  • Mistral-7B claims outperforms Llama 2 13B across all evaluated benchmarks and Llama 1 34B in reasoning, mathematics, and code generation.

There are so many opensource LLM out there, which one is good? see compare the performance of different LLM that can be deployed locally on consumer hardware. try it yourself in google colab include free GPU Local-LLM-Comparison-Colab-UI

1.Run Full Model Server

Setup Full Openchat Model Server vLLM (click to expand)

Requirement: must have a GPU with 24GB vram Experience: ChatGPT 3.5 like fast inference speed

## create environment
conda create -y --name openchat
conda activate openchat
conda install -y python=3.11
pip3 install torch torchvision torchaudio
pip3 install ochat

## run openchat server 
python -m ochat.serving.openai_api_server --model openchat/openchat_3.5 --engine-use-ray --worker-use-ray

1.Run Quantized Model Server in GGUF format

here's some example models, it can be any other open source models in GGUF format.

Note: adjust OFFLOAD_GPU_LAYERS value based on your GPU ram

OFFLOAD_GPU_LAYERS=0   # use CPU only 
OFFLOAD_GPU_LAYERS=16  # move 16 layers to GPU and rest in CPU
OFFLOAD_GPU_LAYERS=35  # move 35 layers to GPU  
Setup Llama_cpp_python GGUF Model Server (click to expand)
  • Requirement: flexible GPU or CPU only or mixed offloading
  • Experience: varies depends on the type of model
## create environment
virtualenv venv --python=3.10

pip3 install torch torchvision torchaudio

## install llama-cpp-python (for your environment)
(for example in linux with cuda support )
### Install Server with OpenAI Compatible API - with CUDA GPU support
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python[server]

CMAKE_ARGS="-DLLAMA_CUBLAS=on -DBUILD_SHARED_LIBS=ON" FORCE_CMAKE=1 pip install llama-cpp-python[server] --force-reinstall --upgrade --no-cache-dir

Using Zephyr-7B

run server with Zephyr-7B (click to expand) echo "serving [Zerphyr] - multilingual model"

export MODEL_FILE="./models/zephyr-7b-beta.Q5_K_M.gguf" export MODEL_ID="TheBloke/zephyr-7B-beta.Q5_K_M.gguf" export OFFLOAD_GPU_LAYERS=35
export HOST=0.0.0.0 export PORT=8000 export CHAT_FORMAT="chatml" export CONTEXT_SIZE=4096

python3 -m llama_cpp.server
--n_gpu_layers $OFFLOAD_GPU_LAYERS
--model $MODEL_FILE
--model_alias $MODEL_ID
--chat_format $CHAT_FORMAT
--n_ctx $CONTEXT_SIZE
--host $HOST
--port $PORT
--seed 123

Using Mistral

run server with Mistral-7B (click to expand) echo "serving [mistral 7b]"

export MODEL_FILE="./models/mistral-7b-instruct-v0.1.Q5_K_M.gguf" export MODEL_ID="TheBloke/mistral-7b-instruct-v0.1.Q5_K_M.GGUF" export OFFLOAD_GPU_LAYERS=35 export HOST=0.0.0.0 export PORT=8000 export CHAT_FORMAT="vicuna" export CONTEXT_SIZE=4096

python3 -m llama_cpp.server
--n_gpu_layers $OFFLOAD_GPU_LAYERS
--model $MODEL_FILE
--model_alias $MODEL_ID
--chat_format $CHAT_FORMAT
--n_ctx $CONTEXT_SIZE
--host $HOST
--port $PORT
--seed 123

Using openchat-3.5

run server with Openchat-3.5 (click to expand) echo "serving [openchat 3.5]"

export MODEL_FILE="./models/openchat_3.5.Q5_K_M.gguf" export MODEL_ID="TheBloke/openchat_3.5.Q5_K_M.GGUF" export OFFLOAD_GPU_LAYERS=35 export HOST=0.0.0.0 export PORT=8000 #export CHAT_FORMAT="llama-2" export CHAT_FORMAT="vicuna" export CONTEXT_SIZE=4096

python3 -m llama_cpp.server
--n_gpu_layers $OFFLOAD_GPU_LAYERS
--model $MODEL_FILE
--model_alias $MODEL_ID
--chat_format $CHAT_FORMAT
--n_ctx $CONTEXT_SIZE
--host $HOST
--port $PORT
--seed 123

2.Running Chatbot Web UI

Setup ChatBot UI (click to expand)

Requirement: nodejs 18.x Experience: ChatGPT 3.5 like User Experience in chatting

## create environment
curl -fsSL https://deb.nodesource.com/setup_18.x | sudo -E bash -
sudo apt install -y nodejs
node --version
npm --version

## run chatbotui server 
echo "GGUF Chatbot UI"
export NEXT_PUBLIC_DEFAULT_SYSTEM_PROMPT="You are ChatGPT, a large language model trained by OpenAI. Follow the user's instructions carefully. Respond using markdown"
export NEXT_PUBLIC_DEFAULT_TEMPERATURE=0.5
export DEFAULT_MODEL=gpt-3.5-turbo
export OPENAI_API_KEY=EMPTY
export OPENAI_API_TYPE=openai

## openchat vllm server
#export OPENAI_API_HOST=http://localhost:18888
## llama_cpp_python GGUF file server
export OPENAI_API_HOST=http://127.0.0.1:8000
cd gguf-chatbot-ui
npm install
npm run dev

Credits:

All contributor in open source projects including llama.cpp, llama_cpp_python and chatbot.ui for their awesome work.

chatgpt_like_experience_locally's People

Contributors

minyang-chen avatar

Stargazers

 avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.