Comments (13)

juntao commented on May 28, 2024

Does this help? It seems that you can access CUDA devices from Podman + crun.

NVIDIA/nvidia-container-toolkit#46

from wasmedge.

CaptainVincent commented on May 28, 2024

I am not sure.
IIUC, this involves taking a different approach with the NVIDIA runtime.
Running an application with the NVIDIA runtime doesn't actually coexist with the other runtime (crun + the wasmedge handler).

juntao commented on May 28, 2024

The way I understand this is that there is an "Nvidia runtime" that acts as a runc / crun replacement.

But there is also an Nvidia library that works with CRI-O / containers + runc (i.e., a prestart hook, etc.) to accomplish the same. I guess we could explore this approach for crun as well?
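
The hook-based route would mean injecting NVIDIA's prestart hook into the OCI spec that crun receives, rather than swapping the runtime out. A minimal sketch of the relevant `hooks` section is below; the binary path and args are assumptions based on how nvidia-container-toolkit is commonly packaged, not something confirmed in this thread:

```json
{
  "hooks": {
    "prestart": [
      {
        "path": "/usr/bin/nvidia-container-runtime-hook",
        "args": ["nvidia-container-runtime-hook", "prestart"]
      }
    ]
  }
}
```

With a hook like this, the container engine keeps using crun as the runtime; the hook only mutates the container (device nodes, driver libraries) before the workload starts.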

sohankunkerkar commented on May 28, 2024

Regarding the note "GPU (CUDA) support: if we execute the above command on a machine with an NVIDIA graphics card, the below error will be encountered, and it will switch back to CPU mode":

I'm interested in running these models in CPU mode. Can I execute them on my local machine? What adjustments must I make to ensure it works with the CRI-O + crun (wasm) combination?

sohankunkerkar commented on May 28, 2024

@CaptainVincent Could you provide more information on the above question?

CaptainVincent commented on May 28, 2024

Certainly, it can be run on a local machine. Later, I will add a CI workflow using crun to run the GGML model as a demo. Once I've completed it, I will provide the link here.

CaptainVincent commented on May 28, 2024

I added a pure crun + llama example here.

The steps align with the description in this issue. It's worth noting that this experimental feature is available only in our forked version of crun; the modifications we made differ from the official version exclusively in the wasmedge handler. If you wish to try a newer version of crun, migrating the changes is easily achievable. The decision to maintain this specific version is partly influenced by compatibility constraints with k8s.

Here is my test WASM image source. You can build your own image containing your WASM application, too. There are various tools for building Docker images; the most straightforward approach is to refer to our example, write a Dockerfile that compiles your application, and copy the result into the image.

If you have any questions, feel free to bring them up.

sohankunkerkar commented on May 28, 2024

@CaptainVincent Thanks for the information. I was wondering what needs to be changed to follow a similar workflow for CRI-O.

CaptainVincent commented on May 28, 2024

If you are still using our fork of crun as the runtime for CRI-O, you should pay attention to the following modifications.

  1. Bind the plugin folder to a specific path so that the CRI-O service can look up the location of the dynamic libraries (following the instructions in the CI action; at this point, the folder will contain the plugin library and its dependencies). Alternatively, you can modify the LD_LIBRARY_PATH of the CRI-O service.
    The path below is for containerd only:
    --mount type=bind,src=$HOME/.wasmedge/plugin/,dst=/opt/containerd/lib,options=bind:ro
  2. Additionally, update the following environment variable so WasmEdge knows where to load the plugin from:
    --env WASMEDGE_PLUGIN_PATH=/opt/containerd/lib
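
Putting the two adjustments together, the flags can be assembled as in the sketch below. This is only an illustrative dry run that prints the flags instead of invoking any runtime; the paths mirror the containerd example above and may differ on your setup:

```shell
#!/bin/sh
# Illustrative only: assemble the bind mount and env flag from the two steps above.
PLUGIN_SRC="$HOME/.wasmedge/plugin"   # host folder holding the plugin library + deps
PLUGIN_DST="/opt/containerd/lib"      # destination path (containerd-specific example)

set -- \
  --mount "type=bind,src=${PLUGIN_SRC},dst=${PLUGIN_DST},options=bind:ro" \
  --env "WASMEDGE_PLUGIN_PATH=${PLUGIN_DST}"

# Dry run: print the assembled flags instead of executing a container.
echo "$@"
```

The key point is that the `dst` of the bind mount and the value of `WASMEDGE_PLUGIN_PATH` must agree, otherwise WasmEdge will look for the plugin in a path where nothing was mounted.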

sohankunkerkar commented on May 28, 2024

@CaptainVincent Thanks for your pointers. I have made a few changes to crun to load the wasmedge plugin to /usr/lib/wasmedge. Here's the PR for it. This works fine with crun-wasm. I'm using this config.json to create a container.

Config.json
cat config.json 
{
  "ociVersion": "1.0.2-dev",
  "process": {
  	"terminal": true,
  	"user": {
  		"uid": 0,
  		"gid": 0
  	},
  	"args": [
  		"/llama-chat.wasm"
  	],
  	"env": [
  		"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
  		"WASMEDGE_PLUGIN_PATH=/usr/lib/wasmedge",
  		"WASMEDGE_WASINN_PRELOAD=default:GGML:AUTO:llama-2-7b-chat-q5_k_m.gguf",
  		"TERM=xterm"
  	],
  	"cwd": "/",
  	"capabilities": {
  		"bounding": [
  			"CAP_AUDIT_WRITE",
  			"CAP_KILL",
  			"CAP_NET_BIND_SERVICE"
  		],
  		"effective": [
  			"CAP_AUDIT_WRITE",
  			"CAP_KILL",
  			"CAP_NET_BIND_SERVICE"
  		],
  		"permitted": [
  			"CAP_AUDIT_WRITE",
  			"CAP_KILL",
  			"CAP_NET_BIND_SERVICE"
  		],
  		"ambient": [
  			"CAP_AUDIT_WRITE",
  			"CAP_KILL",
  			"CAP_NET_BIND_SERVICE"
  		]
  	},
  	"rlimits": [
  		{
  			"type": "RLIMIT_NOFILE",
  			"hard": 1024,
  			"soft": 1024
  		}
  	],
  	"noNewPrivileges": true
  },
  "root": {
  	"path": "rootfs",
  	"readonly": false
  },
  "hostname": "wasm-test",
  "mounts": [
  	{
  		"destination": "/proc",
  		"type": "proc",
  		"source": "proc"
  	},
  	{
  		"destination": "/dev",
  		"type": "tmpfs",
  		"source": "tmpfs",
  		"options": [
  			"nosuid",
  			"strictatime",
  			"mode=755",
  			"size=65536k"
  		]
  	},
  	{
  		"destination": "/dev/pts",
  		"type": "devpts",
  		"source": "devpts",
  		"options": [
  			"nosuid",
  			"noexec",
  			"newinstance",
  			"ptmxmode=0666",
  			"mode=0620",
  			"gid=5"
  		]
  	},
  	{
  		"destination": "/dev/shm",
  		"type": "tmpfs",
  		"source": "shm",
  		"options": [
  			"nosuid",
  			"noexec",
  			"nodev",
  			"mode=1777",
  			"size=65536k"
  		]
  	},
  	{
  		"destination": "/dev/mqueue",
  		"type": "mqueue",
  		"source": "mqueue",
  		"options": [
  			"nosuid",
  			"noexec",
  			"nodev"
  		]
  	},
  	{
  		"destination": "/sys",
  		"type": "sysfs",
  		"source": "sysfs",
  		"options": [
  			"nosuid",
  			"noexec",
  			"nodev",
  			"ro"
  		]
  	},
  	{
  		"destination": "/sys/fs/cgroup",
  		"type": "cgroup",
  		"source": "cgroup",
  		"options": [
  			"nosuid",
  			"noexec",
  			"nodev",
  			"relatime",
  			"ro"
  		]
  	}
  ],
  "linux": {
  	"resources": {
  		"devices": [
  			{
  				"allow": false,
  				"access": "rwm"
  			}
  		]
  	},
  	"namespaces": [
  		{
  			"type": "pid"
  		},
  		{
  			"type": "network"
  		},
  		{
  			"type": "ipc"
  		},
  		{
  			"type": "uts"
  		},
  		{
  			"type": "mount"
  		},
  		{
  			"type": "cgroup"
  		}
  	],
  	"maskedPaths": [
  		"/proc/acpi",
  		"/proc/asound",
  		"/proc/kcore",
  		"/proc/keys",
  		"/proc/latency_stats",
  		"/proc/timer_list",
  		"/proc/timer_stats",
  		"/proc/sched_debug",
  		"/sys/firmware",
  		"/proc/scsi"
  	],
  	"readonlyPaths": [
  		"/proc/bus",
  		"/proc/fs",
  		"/proc/irq",
  		"/proc/sys",
  		"/proc/sysrq-trigger"
  	]
  }
}
$ ls -ltr rootfs/
total 4673668
-rw-r--r--. 1 skunkerk skunkerk 4783156800 Mar  8 13:54 llama-2-7b-chat-q5_k_m.gguf
-rwxr-xr-x. 1 skunkerk skunkerk    2675405 Mar  8 13:54 llama-chat.wasm
drwxr-xr-x. 1 skunkerk skunkerk          0 Mar  8 13:54 sys
drwxr-xr-x. 1 skunkerk skunkerk          0 Mar  8 13:54 proc
drwxr-xr-t. 1 skunkerk skunkerk          0 Mar  8 13:54 dev
drwxr-xr-t. 1 skunkerk skunkerk          6 Mar  8 13:54 usr

$ WASMEDGE_PLUGIN_MOUNT=/home/skunkerk/.wasmedge/plugin crun-wasm run wasm-test
RECEIVED WASMEDGE_PLUGIN_MOUNT value: /home/skunkerk/.wasmedge/plugin
[INFO] Model alias: default
[INFO] Prompt context size: 512
[INFO] Number of tokens to predict: 1024
[INFO] Number of layers to run on the GPU: 100
[INFO] Batch size for prompt processing: 512
[INFO] Temperature for sampling: 0.8
[INFO] Top-p sampling (1.0 = disabled): 0.9
[INFO] Penalize repeat sequence of tokens: 1.1
[INFO] presence penalty (0.0 = disabled): 0
[INFO] frequency penalty (0.0 = disabled): 0
[INFO] Use default system prompt
[INFO] Prompt template: Llama2Chat
[INFO] Log prompts: false
[INFO] Log statistics: false
[INFO] Log all information: false
[INFO] Plugin version: b2230 (commit 89febfed)

================================== Running in interactive mode. ===================================

    - Press [Ctrl+C] to interject at any time.
    - Press [Return] to end the input.
    - For multi-line inputs, end each line with '\' and press [Return] to get another line.


[You]:

Now, I tried integrating it with CRI-O and Kubernetes. I made a few changes to CRI-O to pass the env variable WASMEDGE_PLUGIN_MOUNT via the crio config.
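
One hedged way to pass such a variable to every container without patching each pod spec (assuming the patched crun reads WASMEDGE_PLUGIN_MOUNT from the container's environment) is CRI-O's `default_env` option. A sketch of a drop-in config fragment; the file name and host plugin path are illustrative:

```toml
# Hypothetical /etc/crio/crio.conf.d/10-wasmedge.conf fragment.
[crio.runtime]
default_env = [
    "WASMEDGE_PLUGIN_MOUNT=/home/skunkerk/.wasmedge/plugin",
]
```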

$ cat llm-wasm.yml
apiVersion: v1
kind: Pod
metadata:
  name: llama-pod
spec:
  containers:
  - name: llama-container
    image: quay.io/sohankunkerkar/llama-crun:v1
    command: ["./llama-chat.wasm"]
    securityContext:
      privileged: true
  restartPolicy: Always
  
$ cat Containerfile
  # Use a base image
FROM scratch

ENV WASMEDGE_WASINN_PRELOAD default:GGML:AUTO:/app/model.gguf
ENV WASMEDGE_PLUGIN_PATH=/usr/lib/wasmedge

WORKDIR /app

# Copy wasm files to the directory
COPY llama-chat.wasm /app

COPY model.gguf /app

$ k get po
NAME        READY   STATUS      RESTARTS   AGE
llama-pod   0/1     Completed   0          3h58m

$  k logs llama-pod
[INFO] Model alias: default
[INFO] Prompt context size: 512
[INFO] Number of tokens to predict: 1024
[INFO] Number of layers to run on the GPU: 100
[INFO] Batch size for prompt processing: 512
[INFO] Temperature for sampling: 0.8
[INFO] Top-p sampling (1.0 = disabled): 0.9
[INFO] Penalize repeat sequence of tokens: 1.1
[INFO] presence penalty (0.0 = disabled): 0
[INFO] frequency penalty (0.0 = disabled): 0
[INFO] Use default system prompt
[INFO] Prompt template: Llama2Chat
[INFO] Log prompts: false
[INFO] Log statistics: false
[INFO] Log all information: false
Error: "Fail to load model into wasi-nn: Backend Error: WASI-NN Backend Error: Not Found"
RECEIVED WASMEDGE_PLUGIN_MOUNT value: /home/skunkerk/.wasmedge/plugin

I added the debug statement RECEIVED WASMEDGE_PLUGIN_MOUNT value: /home/skunkerk/.wasmedge/plugin to crun to investigate potential issues. Upon inspection, it appears to be populating the correct value.
The /app directory does contain both files:

root:/var/lib/containers/storage/overlay/323126b6306097007bf68ea1957508ca275cf3a3ee361cd3860a1bc7243986a4/merged# ls
app  dev  etc  proc  run  sys  usr  var

root:/var/lib/containers/storage/overlay/323126b6306097007bf68ea1957508ca275cf3a3ee361cd3860a1bc7243986a4/merged# ls -ltr app
total 4673668
-rw-r--r--. 1 root root 4783156800 Feb 29 00:13 model.gguf
-rwxr-xr-x. 1 root root    2675405 Mar  2 09:47 llama-chat.wasm

I'm still puzzled by this failure: Error: "Fail to load model into wasi-nn: Backend Error: WASI-NN Backend Error: Not Found".
Any idea what could be the problem here?

from wasmedge.

CaptainVincent commented on May 28, 2024

I'm not entirely sure, but I'll give it a try on my machine as well.

CaptainVincent commented on May 28, 2024

I have a question: is the order of environment variables the same in crun and CRI-O? I noticed that the current implementation relies on the order of WASMEDGE_WASINN_PRELOAD and WASMEDGE_PLUGIN_PATH in the environment variables to determine the flow. However, WasmEdge has a strong limitation: nn_preload must precede plugin loading, because the plugin initialization process handles preloading the model. I'm concerned that the order here might cause nn_preload to occur later than plugin loading.
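
One quick way to inspect this constraint is to check which of the two variables appears first in a container's env list. The sketch below is a hypothetical check, not part of crun or WasmEdge; the sample env list reproduces the order from the config.json earlier in this thread:

```shell
#!/bin/sh
# Hypothetical ordering check: WasmEdge expects nn_preload to be handled before
# the plugin is loaded, so the preload variable should appear first in the env list.
check_order() {
    # $1: file containing the env entries, one per line
    preload=$(grep -n 'WASMEDGE_WASINN_PRELOAD' "$1" | cut -d: -f1)
    plugin=$(grep -n 'WASMEDGE_PLUGIN_PATH' "$1" | cut -d: -f1)
    if [ -n "$preload" ] && [ -n "$plugin" ] && [ "$preload" -lt "$plugin" ]; then
        echo "ok: nn_preload precedes plugin path"
    else
        echo "suspect: plugin path precedes nn_preload"
    fi
}

# Sample env list in the same order as the config.json shown earlier.
cat > /tmp/env-order.txt <<'EOF'
WASMEDGE_PLUGIN_PATH=/usr/lib/wasmedge
WASMEDGE_WASINN_PRELOAD=default:GGML:AUTO:llama-2-7b-chat-q5_k_m.gguf
EOF
check_order /tmp/env-order.txt
```

Run against that sample order, the check reports the suspect case, which matches the concern above.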

CaptainVincent commented on May 28, 2024

Closing this item because all related issues it tracked have been closed; the goal is considered accomplished.
