llama-utils

This project shows how to run LLM inference and build OpenAI-compatible API services for the Llama2 series of LLMs with Rust and WasmEdge.

How to use?

  • The folder api-server includes the source code and instructions to create an OpenAI-compatible API service for your llama2 model or the Llama2 model itself.
  • The folder chat includes the source code and instructions to run llama2 models that can have continuous conversations.
  • The folder simple includes the source code and instructions to run llama2 models that can answer one question.

Why use Rust + Wasm

The Rust+Wasm stack provides a strong alternative to Python in AI inference.

  • Lightweight. The total runtime size is 30MB as opposed to 4GB for Python and 350MB for Ollama.
  • Fast. Full native speed on GPUs.
  • Portable. Single cross-platform binary on different CPUs, GPUs, and OSes.
  • Secure. Sandboxed and isolated execution on untrusted devices.
  • Container-ready. Supported in Docker, containerd, Podman, and Kubernetes.

For more information, please check out Fast and Portable Llama2 Inference on the Heterogeneous Edge.

Supported Models

The llama-utils project, in theory, supports all Large Language Models (LLMs) based on the llama2 framework in GGUF format. Below is a list of models that have been verified to work on both Mac and Jetson Orin platforms. We are continuously expanding this list by verifying additional models. If you have successfully run other LLMs, please contribute by creating a Pull Request (PR) to extend this list.

Requirements

For macOS (Apple silicon)

Install WasmEdge 0.13.5+WASI-NN ggml plugin (Metal enabled on Apple silicon) via installer

curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install.sh | bash -s -- --plugin wasi_nn-ggml
# After installing WasmEdge, you have to activate the environment.
# Assuming you use zsh (the default shell on macOS), you will need to run the following command
source $HOME/.zshenv

For Ubuntu (>= 20.04)

CUDA enabled

The installer from WasmEdge 0.13.5 will detect CUDA automatically.

If CUDA is detected, the installer will always attempt to install a CUDA-enabled version of the plugin.

Install WasmEdge 0.13.5+WASI-NN ggml plugin via installer

curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install.sh | bash -s -- --plugin wasi_nn-ggml
# After installing WasmEdge, you have to activate the environment.
# Assuming you use bash (the default shell on Ubuntu), you will need to run the following command
source $HOME/.bashrc

This version is verified on the following platforms:

  1. Nvidia Jetson AGX Orin 64GB developer kit
  2. Intel i7-10700 + Nvidia GTX 1080 8G GPU
  3. AWS EC2 g5.xlarge + Nvidia A10G 24G GPU + Amazon deep learning base Ubuntu 20.04

CPU only

If the CPU is the only available hardware on your machine, the installer will install the OpenBLAS version of the plugin instead.

You may need to install libopenblas-dev first by running apt update && apt install -y libopenblas-dev.
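
Before installing, it can help to check whether OpenBLAS is already visible to the dynamic linker. The following is a minimal sketch, assuming a glibc-based Linux where ldconfig -p lists the cached shared libraries; the helper name lib_listed is hypothetical and not part of WasmEdge.

```shell
# Succeeds when the library name given as $1 appears in an
# `ldconfig -p` style listing read from standard input.
# (Hypothetical helper, not part of WasmEdge or its installer.)
lib_listed() {
    grep -F -q -- "$1"
}

# Usage on a real machine (commented out so sourcing has no side effects):
#   ldconfig -p | lib_listed libopenblas.so && echo "OpenBLAS found"
#   ldconfig -p | lib_listed libblas.so.3  || echo "libblas.so.3 missing"
```

If the library is not listed, installing libopenblas-dev as described above is the likely next step.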

Install WasmEdge 0.13.5+WASI-NN ggml plugin via installer

apt update && apt install -y libopenblas-dev # You may need sudo if the user is not root.
curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install.sh | bash -s -- --plugin wasi_nn-ggml
# After installing WasmEdge, you have to activate the environment.
# Assuming you use bash (the default shell on Ubuntu), you will need to run the following command
source $HOME/.bashrc
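
The note about sudo above can be handled in a script that runs both as root (for example in a container) and as a regular user. This is a small sketch under that assumption; sudo_prefix is a hypothetical helper name, and it takes the user id as an optional argument only to make it easy to test.

```shell
# Print "sudo" when the given (or current) user id is not 0 (root),
# and print nothing otherwise.
# (Hypothetical helper for illustration.)
sudo_prefix() {
    uid="${1:-$(id -u)}"
    if [ "$uid" -ne 0 ]; then
        printf 'sudo'
    fi
}

# Usage (commented out so sourcing this file installs nothing):
#   $(sudo_prefix) apt update && $(sudo_prefix) apt install -y libopenblas-dev
```

When the helper prints nothing, the unquoted command substitution expands to no word at all, so the command runs as plain apt.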

For General Linux

Install WasmEdge 0.13.5+WASI-NN ggml plugin via installer

curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install.sh | bash -s -- --plugin wasi_nn-ggml
# After installing WasmEdge, you have to activate the environment.
# Assuming you use bash (the default shell on many Linux distributions), you will need to run the following command
source $HOME/.bashrc
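
After activating the environment on any of the platforms above, a quick check can confirm that both the runtime and the plugin are in place. The sketch below assumes the installer's default location, $HOME/.wasmedge/plugin, and a plugin file name starting with libwasmedgePluginWasiNN; both are assumptions not stated in this document, so adjust them if your installation differs. The function name find_wasi_nn_plugin is hypothetical.

```shell
# Print the first regular file in directory $1 whose name starts with
# "libwasmedgePluginWasiNN" (assumed plugin file name); fail if none exists.
# (Hypothetical helper for illustration.)
find_wasi_nn_plugin() {
    for f in "$1"/libwasmedgePluginWasiNN*; do
        if [ -f "$f" ]; then
            printf '%s\n' "$f"
            return 0
        fi
    done
    return 1
}

# Usage (commented out so sourcing this file has no side effects):
#   command -v wasmedge && wasmedge --version
#   find_wasi_nn_plugin "$HOME/.wasmedge/plugin" || echo "WASI-NN plugin not found"
```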

Troubleshooting

  • After running apt update && apt install -y libopenblas-dev, you may encounter the following error:

    ...
    E: Could not open lock file /var/lib/dpkg/lock-frontend - open (13: Permission denied)
    E: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), are you root?

    This indicates that you are not logged in as root. Please try installing again using the sudo command:

    sudo apt update && sudo apt install -y libopenblas-dev
  • After running the wasmedge command, you may receive the following error:

    [2023-10-02 14:30:31.227] [error] loading failed: invalid path, Code: 0x20
    [2023-10-02 14:30:31.227] [error]     load library failed:libblas.so.3: cannot open shared object file: No such file or directory
    [2023-10-02 14:30:31.227] [error] loading failed: invalid path, Code: 0x20
    [2023-10-02 14:30:31.227] [error]     load library failed:libblas.so.3: cannot open shared object file: No such file or directory
    unknown option: nn-preload

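    This suggests that the plugin was not installed or loaded successfully. In the log above, the shared library libblas.so.3 could not be found, so the plugin failed to load and the nn-preload option is not recognized. To resolve this issue, make sure libopenblas-dev is installed (see the CPU only section), then install your desired plugin again.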

  • After executing the wasmedge command, you might encounter the error message: [WASI-NN] GGML backend: Error: unable to init model. This error signifies that the model setup was not successful. To resolve this issue, please verify the following:

    1. Check if your model file and the WASM application are located in the same directory. The WasmEdge runtime requires them to be in the same location to locate the model file correctly.

    2. Ensure that the model has been downloaded successfully. You can use the command shasum -a 256 <gguf-filename> to verify the model's sha256sum. Compare your result with the correct sha256sum available on the Hugging Face page for the model.
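
The two checks above can be combined into a short pre-flight script. This is a sketch only: the helper names sha256_of and check_model are hypothetical, the file names in the usage line are placeholders, and the expected digest must be copied from the Hugging Face page for your model.

```shell
# Print the SHA-256 digest of file $1, using sha256sum if present
# (typical on Linux) and falling back to shasum -a 256 (typical on macOS).
sha256_of() {
    if command -v sha256sum >/dev/null 2>&1; then
        sha256sum "$1" | cut -d ' ' -f 1
    else
        shasum -a 256 "$1" | cut -d ' ' -f 1
    fi
}

# Succeed only if model file $1 and Wasm application $2 both exist in the
# current directory and the model's digest equals the expected value $3.
check_model() {
    [ -f "./$1" ] || { echo "model not found in $(pwd): $1" >&2; return 1; }
    [ -f "./$2" ] || { echo "wasm app not found in $(pwd): $2" >&2; return 1; }
    actual="$(sha256_of "./$1")"
    [ "$actual" = "$3" ] || { echo "checksum mismatch, got: $actual" >&2; return 1; }
}

# Usage (placeholder names; run it from the directory you start wasmedge in):
#   check_model <gguf-filename> <wasm-filename> <expected-sha256> && echo "ready"
```

A mismatch usually means the download was interrupted or corrupted, in which case downloading the model again is the simplest fix.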

Credits

The WASI-NN ggml plugin embeds llama.cpp as its backend.
