
localllm

Run LLMs locally on Cloud Workstations.

Running as a Cloud Workstation

This repository includes a Dockerfile for building a custom Cloud Workstation base image that includes the llm tool.

To get started, you'll need to have a GCP Project and have the gcloud CLI installed.

  1. Set environment variables

    1. Set the PROJECT_ID and PROJECT_NUM environment variables for your GCP project. You must replace the placeholder values below with your own.

      export PROJECT_ID=<project-id>
      export PROJECT_NUM=<project-num>
    2. Set other needed environment variables. You can modify the values.

      export REGION=us-central1
      export LOCALLLM_REGISTRY=localllm-registry
      export LOCALLLM_IMAGE_NAME=localllm
      export LOCALLLM_CLUSTER=localllm-cluster
      export LOCALLLM_WORKSTATION=localllm-workstation
      export LOCALLLM_PORT=8000
  2. Set the default project.

    gcloud config set project $PROJECT_ID
  3. Enable needed services.

    gcloud services enable \
      cloudbuild.googleapis.com \
      workstations.googleapis.com \
      container.googleapis.com \
      containeranalysis.googleapis.com \
      containerscanning.googleapis.com \
      artifactregistry.googleapis.com
  4. Create an Artifact Registry repository for Docker images.

    gcloud artifacts repositories create $LOCALLLM_REGISTRY \
      --location=$REGION \
      --repository-format=docker
  5. Build and push the image to Artifact Registry using Cloud Build. Details are in cloudbuild.yaml.

    gcloud builds submit . \
      --substitutions=_IMAGE_REGISTRY=$LOCALLLM_REGISTRY,_IMAGE_NAME=$LOCALLLM_IMAGE_NAME
  6. Configure a Cloud Workstation cluster.

    gcloud workstations clusters create $LOCALLLM_CLUSTER \
      --region=$REGION
  7. Create a Cloud Workstation configuration. We suggest the e2-standard-32 machine type, which provides 32 vCPUs (16 physical cores) and 128 GB of memory.

    gcloud workstations configs create $LOCALLLM_WORKSTATION \
      --region=$REGION \
      --cluster=$LOCALLLM_CLUSTER \
      --machine-type=e2-standard-32 \
      --container-custom-image=${REGION}-docker.pkg.dev/${PROJECT_ID}/${LOCALLLM_REGISTRY}/${LOCALLLM_IMAGE_NAME}:latest
  8. Create a Cloud Workstation.

    gcloud workstations create $LOCALLLM_WORKSTATION \
      --cluster=$LOCALLLM_CLUSTER \
      --config=$LOCALLLM_WORKSTATION \
      --region=$REGION
  9. Grant access to the default Cloud Workstation service account.

    gcloud artifacts repositories add-iam-policy-binding $LOCALLLM_REGISTRY \
      --location=$REGION \
      --member=serviceAccount:service-$PROJECT_NUM@gcp-sa-workstationsvm.iam.gserviceaccount.com \
      --role=roles/artifactregistry.reader
  10. Start the workstation.

    gcloud workstations start $LOCALLLM_WORKSTATION \
      --cluster=$LOCALLLM_CLUSTER \
      --config=$LOCALLLM_WORKSTATION \
      --region=$REGION
  11. Connect to the workstation using ssh. Alternatively, you can connect to the workstation interactively in the browser.

    gcloud workstations ssh $LOCALLLM_WORKSTATION \
      --cluster=$LOCALLLM_CLUSTER \
      --config=$LOCALLLM_WORKSTATION \
      --region=$REGION
  12. Start serving the default model from the repo.

    llm run TheBloke/Llama-2-13B-Ensemble-v5-GGUF $LOCALLLM_PORT
  13. Get the hostname of the workstation using:

    gcloud workstations describe $LOCALLLM_WORKSTATION \
      --cluster=$LOCALLLM_CLUSTER \
      --config=$LOCALLLM_WORKSTATION \
      --region=$REGION
  14. Interact with the model by visiting the live OpenAPI documentation page at https://$LOCALLLM_PORT-$LLM_HOSTNAME/docs, where $LLM_HOSTNAME is the host returned by the previous step. You can also query the model programmatically, as in the sketch below.
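
A minimal way to query the served model programmatically is shown below, run from inside the workstation (for example, the SSH session from step 11) so no extra authentication is needed. The /v1/completions path and request shape are assumptions based on the server's OpenAI-compatible API documented at /docs; adjust them to match what that page shows.

    # Minimal sketch using only the Python standard library.
    # Assumes an OpenAI-compatible completion endpoint at /v1/completions.
    import json
    import urllib.request

    req = urllib.request.Request(
        "http://localhost:8000/v1/completions",  # port from $LOCALLLM_PORT
        data=json.dumps({"prompt": "Write a haiku about cats.", "max_tokens": 64}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    print(body["choices"][0]["text"])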

llm commands

Assumes that models are downloaded to ~/.cache/huggingface/hub/, the default cache path used by the Hugging Face Hub library. Only .gguf model files are supported.
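
As an illustration of that layout, a short Python sketch (assuming the standard huggingface_hub cache structure) that lists any .gguf files already downloaded:

    # Minimal sketch: list .gguf model files in the default Hugging Face cache.
    # Assumes the standard cache layout under ~/.cache/huggingface/hub/.
    from pathlib import Path

    cache = Path.home() / ".cache" / "huggingface" / "hub"
    for gguf in sorted(cache.rglob("*.gguf")):
        print(f"{gguf.relative_to(cache)}  ({gguf.stat().st_size / 1e9:.1f} GB)")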

If you're using models from TheBloke and you don't specify a filename, we'll attempt to use the model file with 4-bit medium quantization, or you can specify a filename explicitly.
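
In TheBloke's naming convention, 4-bit medium quantization typically appears as a Q4_K_M suffix. As an illustrative sketch of how that default selection could work (not the llm tool's actual implementation):

    # Illustrative sketch of default-file selection; not the llm tool's actual code.
    # Assumes TheBloke's convention of naming 4-bit medium files "*Q4_K_M.gguf".
    from huggingface_hub import list_repo_files

    def pick_default_gguf(repo_id: str) -> str:
        files = [f for f in list_repo_files(repo_id) if f.endswith(".gguf")]
        for f in files:
            if "Q4_K_M" in f:  # prefer 4-bit medium quantization
                return f
        return files[0]  # otherwise fall back to the first .gguf in the repo

    print(pick_default_gguf("TheBloke/Llama-2-13B-Ensemble-v5-GGUF"))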

  1. List downloaded models.

    llm list
  2. List running models.

    llm ps
  3. Start serving models.

    1. Start serving the default model from the repo. Download if not present.

      llm run TheBloke/Llama-2-13B-Ensemble-v5-GGUF 8000
    2. Start serving a specific model. Download if not present.

      llm run TheBloke/Llama-2-13B-Ensemble-v5-GGUF --filename llama-2-13b-ensemble-v5.Q4_K_S.gguf 8000
  4. Stop serving models.

    1. Stop serving all models from the repo.

      llm kill TheBloke/Llama-2-13B-Ensemble-v5-GGUF
    2. Stop serving a specific model.

      llm kill TheBloke/Llama-2-13B-Ensemble-v5-GGUF --filename llama-2-13b-ensemble-v5.Q4_K_S.gguf
  5. Download models.

    1. Download the default model from the repo.

      llm pull TheBloke/Llama-2-13B-Ensemble-v5-GGUF
    2. Download a specific model from the repo.

      llm pull TheBloke/Llama-2-13B-Ensemble-v5-GGUF --filename llama-2-13b-ensemble-v5.Q4_K_S.gguf
  6. Remove models.

    1. Remove all models downloaded from the repo.

      llm rm TheBloke/Llama-2-13B-Ensemble-v5-GGUF
    2. Remove a specific model from the repo.

      llm rm TheBloke/Llama-2-13B-Ensemble-v5-GGUF --filename llama-2-13b-ensemble-v5.Q4_K_S.gguf

Running locally

  1. Install the tools.

    # Install the tools
    pip3 install openai
    pip3 install ./llm-tool/.
  2. Download and run a model.

    llm run TheBloke/Llama-2-13B-Ensemble-v5-GGUF 8000
  3. Try out a query. The default query is for a haiku about cats.

    python3 querylocal.py
  4. Interact with the OpenAPI interface via the /docs endpoint. For the above, visit http://localhost:8000/docs. A sketch of a query like querylocal.py's appears below.
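
The querylocal.py script itself isn't reproduced here, but a minimal equivalent (an approximation, assuming the server exposes an OpenAI-compatible /v1 API on the port passed to llm run) might look like:

    # Approximate sketch of a query like querylocal.py's; the repo's actual
    # script may differ. Uses the openai package installed in step 1.
    from openai import OpenAI

    # A local server typically ignores the API key, but the client requires one.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

    completion = client.completions.create(
        model="TheBloke/Llama-2-13B-Ensemble-v5-GGUF",  # may be ignored by a single-model server
        prompt="Write a haiku about cats.",
        max_tokens=64,
    )
    print(completion.choices[0].text)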

LLM Disclaimer

This project imports freely available LLMs and makes them available from Cloud Workstations. We recommend independently verifying any content generated by the models. We do not assume any responsibility or liability for the use or interpretation of generated content.
