Comments (13)
Does this help? It seems that you can access CUDA devices from Podman + crun.
NVIDIA/nvidia-container-toolkit#46
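For reference, with a recent nvidia-container-toolkit the usual way to expose GPUs to Podman + crun is CDI; a hedged sketch (the device names come from the generated spec and may differ on your system):

```shell
# Generate a CDI spec describing the installed GPUs, then request one at run time.
nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
podman run --rm --device nvidia.com/gpu=all ubuntu nvidia-smi
```

This keeps crun as the runtime; the GPU wiring comes from the CDI spec rather than from swapping in the NVIDIA runtime.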
from wasmedge.
I am not sure. IIUC, this involves taking a different approach with the NVIDIA runtime.
Running an application with the NVIDIA runtime doesn't actually coexist with the other runtime (crun + the wasmedge handler).
The way I understand it is that there is an "NVIDIA runtime" that acts as a runc/crun replacement.
But there is also an NVIDIA library that works with CRI-O / containers + runc (i.e., a prestart hook, etc.) to accomplish the same. I guess we could explore this approach for crun as well?
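The prestart-hook approach could be sketched as an OCI hooks entry in the container's config.json. This is only a sketch, assuming the standard nvidia-container-runtime-hook binary shipped by the toolkit (path and arguments may differ by distribution):

```json
{
  "hooks": {
    "prestart": [
      {
        "path": "/usr/bin/nvidia-container-runtime-hook",
        "args": ["nvidia-container-runtime-hook", "prestart"]
      }
    ]
  }
}
```

Because the hook runs before the container process starts, crun (with the wasmedge handler) would remain the runtime.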
GPU (CUDA) support: if we execute the above command on a machine with an NVIDIA graphics card, the error below is encountered, and it falls back to CPU mode.
I'm interested in running these models in CPU mode. Can I execute them on my local machine? What adjustments must I make to ensure it works with the CRI-O + crun (wasm) combination?
@CaptainVincent Could you provide more information on the above question?
Certainly, it can be run on a local machine. Later, I will add a CI workflow using crun to run GGML as a demo. Once I've completed it, I will provide the link here.
I added a pure crun + llama example here.
The steps align with the description in this issue. It's worth noting that this experimental feature is available only in our forked version of crun; the modifications we made differ from the official version exclusively in the wasmedge handler. If you wish to try a newer version of crun, migrating the changes is easily achievable. The decision to maintain this specific version is partly influenced by compatibility constraints with k8s.
Here is my test WASM image source. You can build your own image containing your WASM application, too. There are various tools for building Docker images; the most straightforward approach is to refer to our example, write a Dockerfile to compile your application, and copy it into the image.
If you have any questions, feel free to bring them up.
@CaptainVincent Thanks for the information. I was wondering what needs to be changed to follow a similar workflow for CRI-O.
If you are still using our fork of crun as the runtime for CRI-O, you should pay attention to the following modifications.
- You need to bind the plugin folder to a specific path so that the CRI-O service can look up the location of the dynamic libraries (following the instructions in the CI action; at this point, the folder will contain the plugin library and its dependencies). Alternatively, you can modify the LD_LIBRARY_PATH of the CRI-O service.
The path below is for containerd only:
--mount type=bind,src=$HOME/.wasmedge/plugin/,dst=/opt/containerd/lib,options=bind:ro
- Additionally, set the following environment variable to let WasmEdge know where to load our plugins from:
--env WASMEDGE_PLUGIN_PATH=/opt/containerd/lib
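For plain crun (no containerd), the equivalent of the two flags above would be a mounts entry plus an env entry in the container's config.json; a sketch, with the plugin source path assumed to be /root/.wasmedge/plugin (adjust to your install):

```json
{
  "process": {
    "env": ["WASMEDGE_PLUGIN_PATH=/opt/containerd/lib"]
  },
  "mounts": [
    {
      "destination": "/opt/containerd/lib",
      "type": "bind",
      "source": "/root/.wasmedge/plugin",
      "options": ["rbind", "ro"]
    }
  ]
}
```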
@CaptainVincent Thanks for your pointers. I have made a few changes to crun to load the wasmedge plugin from /usr/lib/wasmedge. Here's the PR for it. This works fine with crun-wasm. I'm using this config.json to create a container.
Config.json
cat config.json
{
    "ociVersion": "1.0.2-dev",
    "process": {
        "terminal": true,
        "user": {
            "uid": 0,
            "gid": 0
        },
        "args": [
            "/llama-chat.wasm"
        ],
        "env": [
            "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
            "WASMEDGE_PLUGIN_PATH=/usr/lib/wasmedge",
            "WASMEDGE_WASINN_PRELOAD=default:GGML:AUTO:llama-2-7b-chat-q5_k_m.gguf",
            "TERM=xterm"
        ],
        "cwd": "/",
        "capabilities": {
            "bounding": [
                "CAP_AUDIT_WRITE",
                "CAP_KILL",
                "CAP_NET_BIND_SERVICE"
            ],
            "effective": [
                "CAP_AUDIT_WRITE",
                "CAP_KILL",
                "CAP_NET_BIND_SERVICE"
            ],
            "permitted": [
                "CAP_AUDIT_WRITE",
                "CAP_KILL",
                "CAP_NET_BIND_SERVICE"
            ],
            "ambient": [
                "CAP_AUDIT_WRITE",
                "CAP_KILL",
                "CAP_NET_BIND_SERVICE"
            ]
        },
        "rlimits": [
            {
                "type": "RLIMIT_NOFILE",
                "hard": 1024,
                "soft": 1024
            }
        ],
        "noNewPrivileges": true
    },
    "root": {
        "path": "rootfs",
        "readonly": false
    },
    "hostname": "wasm-test",
    "mounts": [
        {
            "destination": "/proc",
            "type": "proc",
            "source": "proc"
        },
        {
            "destination": "/dev",
            "type": "tmpfs",
            "source": "tmpfs",
            "options": [
                "nosuid",
                "strictatime",
                "mode=755",
                "size=65536k"
            ]
        },
        {
            "destination": "/dev/pts",
            "type": "devpts",
            "source": "devpts",
            "options": [
                "nosuid",
                "noexec",
                "newinstance",
                "ptmxmode=0666",
                "mode=0620",
                "gid=5"
            ]
        },
        {
            "destination": "/dev/shm",
            "type": "tmpfs",
            "source": "shm",
            "options": [
                "nosuid",
                "noexec",
                "nodev",
                "mode=1777",
                "size=65536k"
            ]
        },
        {
            "destination": "/dev/mqueue",
            "type": "mqueue",
            "source": "mqueue",
            "options": [
                "nosuid",
                "noexec",
                "nodev"
            ]
        },
        {
            "destination": "/sys",
            "type": "sysfs",
            "source": "sysfs",
            "options": [
                "nosuid",
                "noexec",
                "nodev",
                "ro"
            ]
        },
        {
            "destination": "/sys/fs/cgroup",
            "type": "cgroup",
            "source": "cgroup",
            "options": [
                "nosuid",
                "noexec",
                "nodev",
                "relatime",
                "ro"
            ]
        }
    ],
    "linux": {
        "resources": {
            "devices": [
                {
                    "allow": false,
                    "access": "rwm"
                }
            ]
        },
        "namespaces": [
            {
                "type": "pid"
            },
            {
                "type": "network"
            },
            {
                "type": "ipc"
            },
            {
                "type": "uts"
            },
            {
                "type": "mount"
            },
            {
                "type": "cgroup"
            }
        ],
        "maskedPaths": [
            "/proc/acpi",
            "/proc/asound",
            "/proc/kcore",
            "/proc/keys",
            "/proc/latency_stats",
            "/proc/timer_list",
            "/proc/timer_stats",
            "/proc/sched_debug",
            "/sys/firmware",
            "/proc/scsi"
        ],
        "readonlyPaths": [
            "/proc/bus",
            "/proc/fs",
            "/proc/irq",
            "/proc/sys",
            "/proc/sysrq-trigger"
        ]
    }
}
$ ls -ltr rootfs/
total 4673668
-rw-r--r--. 1 skunkerk skunkerk 4783156800 Mar 8 13:54 llama-2-7b-chat-q5_k_m.gguf
-rwxr-xr-x. 1 skunkerk skunkerk 2675405 Mar 8 13:54 llama-chat.wasm
drwxr-xr-x. 1 skunkerk skunkerk 0 Mar 8 13:54 sys
drwxr-xr-x. 1 skunkerk skunkerk 0 Mar 8 13:54 proc
drwxr-xr-t. 1 skunkerk skunkerk 0 Mar 8 13:54 dev
drwxr-xr-t. 1 skunkerk skunkerk 6 Mar 8 13:54 usr
$ WASMEDGE_PLUGIN_MOUNT=/home/skunkerk/.wasmedge/plugin crun-wasm run wasm-test
RECEIVED WASMEDGE_PLUGIN_MOUNT value: /home/skunkerk/.wasmedge/plugin
[INFO] Model alias: default
[INFO] Prompt context size: 512
[INFO] Number of tokens to predict: 1024
[INFO] Number of layers to run on the GPU: 100
[INFO] Batch size for prompt processing: 512
[INFO] Temperature for sampling: 0.8
[INFO] Top-p sampling (1.0 = disabled): 0.9
[INFO] Penalize repeat sequence of tokens: 1.1
[INFO] presence penalty (0.0 = disabled): 0
[INFO] frequency penalty (0.0 = disabled): 0
[INFO] Use default system prompt
[INFO] Prompt template: Llama2Chat
[INFO] Log prompts: false
[INFO] Log statistics: false
[INFO] Log all information: false
[INFO] Plugin version: b2230 (commit 89febfed)
================================== Running in interactive mode. ===================================
- Press [Ctrl+C] to interject at any time.
- Press [Return] to end the input.
- For multi-line inputs, end each line with '\' and press [Return] to get another line.
[You]:
Now, I tried integrating it with CRI-O and Kubernetes. I made a few changes to CRI-O to pass the env variable WASMEDGE_PLUGIN_MOUNT via the CRI-O config.
$ cat llm-wasm.yml
apiVersion: v1
kind: Pod
metadata:
  name: llama-pod
spec:
  containers:
  - name: llama-container
    image: quay.io/sohankunkerkar/llama-crun:v1
    command: ["./llama-chat.wasm"]
    securityContext:
      privileged: true
  restartPolicy: Always
$ cat Containerfile
# Use a base image
FROM scratch
ENV WASMEDGE_WASINN_PRELOAD=default:GGML:AUTO:/app/model.gguf
ENV WASMEDGE_PLUGIN_PATH=/usr/lib/wasmedge
WORKDIR /app
# Copy wasm files to the directory
COPY llama-chat.wasm /app
COPY model.gguf /app
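Assuming this Containerfile sits next to the model and wasm files, the image referenced by the Pod spec above could be built and pushed with podman (the registry and tag here are just the ones from this thread; substitute your own):

```shell
# Build the image and push it to the registry referenced by the Pod spec.
podman build -t quay.io/sohankunkerkar/llama-crun:v1 -f Containerfile .
podman push quay.io/sohankunkerkar/llama-crun:v1
```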
$ k get po
NAME READY STATUS RESTARTS AGE
llama-pod 0/1 Completed 0 3h58m
$ k logs llama-pod
[INFO] Model alias: default
[INFO] Prompt context size: 512
[INFO] Number of tokens to predict: 1024
[INFO] Number of layers to run on the GPU: 100
[INFO] Batch size for prompt processing: 512
[INFO] Temperature for sampling: 0.8
[INFO] Top-p sampling (1.0 = disabled): 0.9
[INFO] Penalize repeat sequence of tokens: 1.1
[INFO] presence penalty (0.0 = disabled): 0
[INFO] frequency penalty (0.0 = disabled): 0
[INFO] Use default system prompt
[INFO] Prompt template: Llama2Chat
[INFO] Log prompts: false
[INFO] Log statistics: false
[INFO] Log all information: false
Error: "Fail to load model into wasi-nn: Backend Error: WASI-NN Backend Error: Not Found"
RECEIVED WASMEDGE_PLUGIN_MOUNT value: /home/skunkerk/.wasmedge/plugin
I added the debug statement RECEIVED WASMEDGE_PLUGIN_MOUNT value: /home/skunkerk/.wasmedge/plugin to crun to investigate potential issues. Upon inspection, it appears to be populating the correct value.
The /app directory does contain both files:
root:/var/lib/containers/storage/overlay/323126b6306097007bf68ea1957508ca275cf3a3ee361cd3860a1bc7243986a4/merged# ls
app dev etc proc run sys usr var
root:/var/lib/containers/storage/overlay/323126b6306097007bf68ea1957508ca275cf3a3ee361cd3860a1bc7243986a4/merged# ls -ltr app
total 4673668
-rw-r--r--. 1 root root 4783156800 Feb 29 00:13 model.gguf
-rwxr-xr-x. 1 root root 2675405 Mar 2 09:47 llama-chat.wasm
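One sanity check worth running inside the container's mount namespace (a sketch; /usr/lib/wasmedge is the plugin path from the config.json above, and the fallback message is just a marker, not crun output):

```shell
# List the directory WasmEdge searches for plugins; print a marker instead of
# failing if the directory is absent.
ls -l /usr/lib/wasmedge 2>/dev/null || echo "plugin dir missing"
```

If that directory is empty or missing inside the container, the wasi_nn plugin never loads, which could plausibly be why the GGML backend reports Not Found.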
I'm still puzzled by this failure: Error: "Fail to load model into wasi-nn: Backend Error: WASI-NN Backend Error: Not Found".
Any idea what could be the problem here?
I'm not entirely sure, but I'll give it a try on my machine as well.
I have a question: is the order of environment variables the same in crun and CRI-O? I noticed that the current implementation relies on the order of WASMEDGE_WASINN_PRELOAD and WASMEDGE_PLUGIN_PATH in the environment variables to determine the flow. However, WasmEdge has a strong limitation: nn_preload must precede plugin loading, because the plugin initialization process handles preloading the model. I'm concerned that the ordering here might cause nn_preload to occur after the plugin loading process.
Closing this item because all the related issues it tracked have been closed; the goal is considered accomplished.