Comments (13)
Does this help? It seems that you can access CUDA devices from Podman + crun.
NVIDIA/nvidia-container-toolkit#46
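For reference, with a recent nvidia-container-toolkit the usual way to expose GPUs to Podman + crun is CDI; a hedged sketch (the device names come from the generated spec and may differ on your system):

```shell
# Generate a CDI spec describing the installed GPUs, then request one at run time.
nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
podman run --rm --device nvidia.com/gpu=all ubuntu nvidia-smi
```

This keeps crun as the runtime; the GPU wiring comes from the CDI spec rather than from swapping in the NVIDIA runtime.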
from wasmedge.
I am not sure. IIUC, this involves taking a different approach with the NVIDIA runtime.
Running an application with the NVIDIA runtime doesn't actually coexist with the other runtime (crun + the wasmedge handler).
The way I understand it is that there is an "NVIDIA runtime" that acts as a runc/crun replacement.
But there is also an NVIDIA library that works with CRI-O / containers + runc (i.e., a prestart hook, etc.) to accomplish the same. I guess we could explore this approach for crun as well?
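The prestart-hook approach could be sketched as an OCI hooks entry in the container's config.json. This is only a sketch, assuming the standard nvidia-container-runtime-hook binary shipped by the toolkit (path and arguments may differ by distribution):

```json
{
  "hooks": {
    "prestart": [
      {
        "path": "/usr/bin/nvidia-container-runtime-hook",
        "args": ["nvidia-container-runtime-hook", "prestart"]
      }
    ]
  }
}
```

Because the hook runs before the container process starts, crun (with the wasmedge handler) would remain the runtime.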
GPU (CUDA) support: if we execute the above command on a machine with an NVIDIA graphics card, the error below is encountered, and it falls back to CPU mode.
I'm interested in running these models in CPU mode. Can I execute them on my local machine? What adjustments must I make to ensure it works with the CRI-O + crun (wasm) combination?
@CaptainVincent Could you provide more information on the above question?
Certainly, it can be run on a local machine. Later, I will add a CI workflow using crun to run GGML as a demo. Once I've completed it, I will provide the link here.
I added a pure crun + llama example here.
The steps align with the description in this issue. It's worth noting that this experimental feature is available only in our forked version of crun; the modifications we made differ from the official version exclusively in the wasmedge handler. If you wish to try a newer version of crun, migrating the changes is easily achievable. The decision to maintain this specific version is partly influenced by compatibility constraints with k8s.
Here is my test WASM image source. You can build your own image containing your WASM application, too. There are various tools for building Docker images; the most straightforward approach is to refer to our example, write a Dockerfile to compile your application, and copy it into the image.
If you have any questions, feel free to bring them up.
@CaptainVincent Thanks for the information. I was wondering what needs to be changed to follow a similar workflow for CRI-O.
If you are still using our fork of crun as the runtime for CRI-O, you should pay attention to the following modifications.
- You need to bind the plugin folder to a specific path so that the CRI-O service can look up the location of the dynamic libraries (following the instructions in the CI action; at this point, the folder will contain the plugin library and its dependencies). Alternatively, you can modify the LD_LIBRARY_PATH of the CRI-O service.
The path below is for containerd only:
--mount type=bind,src=$HOME/.wasmedge/plugin/,dst=/opt/containerd/lib,options=bind:ro
- Additionally, set the following environment variable to let WasmEdge know where to load our plugins from:
--env WASMEDGE_PLUGIN_PATH=/opt/containerd/lib
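For plain crun (no containerd), the equivalent of the two flags above would be a mounts entry plus an env entry in the container's config.json; a sketch, with the plugin source path assumed to be /root/.wasmedge/plugin (adjust to your install):

```json
{
  "process": {
    "env": ["WASMEDGE_PLUGIN_PATH=/opt/containerd/lib"]
  },
  "mounts": [
    {
      "destination": "/opt/containerd/lib",
      "type": "bind",
      "source": "/root/.wasmedge/plugin",
      "options": ["rbind", "ro"]
    }
  ]
}
```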
@CaptainVincent Thanks for your pointers. I have made a few changes to crun to load the wasmedge plugin from /usr/lib/wasmedge. Here's the PR for it. This works fine with crun-wasm. I'm using this config.json to create a container.
Config.json
cat config.json
{
    "ociVersion": "1.0.2-dev",
    "process": {
        "terminal": true,
        "user": {
            "uid": 0,
            "gid": 0
        },
        "args": [
            "/llama-chat.wasm"
        ],
        "env": [
            "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
            "WASMEDGE_PLUGIN_PATH=/usr/lib/wasmedge",
            "WASMEDGE_WASINN_PRELOAD=default:GGML:AUTO:llama-2-7b-chat-q5_k_m.gguf",
            "TERM=xterm"
        ],
        "cwd": "/",
        "capabilities": {
            "bounding": [
                "CAP_AUDIT_WRITE",
                "CAP_KILL",
                "CAP_NET_BIND_SERVICE"
            ],
            "effective": [
                "CAP_AUDIT_WRITE",
                "CAP_KILL",
                "CAP_NET_BIND_SERVICE"
            ],
            "permitted": [
                "CAP_AUDIT_WRITE",
                "CAP_KILL",
                "CAP_NET_BIND_SERVICE"
            ],
            "ambient": [
                "CAP_AUDIT_WRITE",
                "CAP_KILL",
                "CAP_NET_BIND_SERVICE"
            ]
        },
        "rlimits": [
            {
                "type": "RLIMIT_NOFILE",
                "hard": 1024,
                "soft": 1024
            }
        ],
        "noNewPrivileges": true
    },
    "root": {
        "path": "rootfs",
        "readonly": false
    },
    "hostname": "wasm-test",
    "mounts": [
        {
            "destination": "/proc",
            "type": "proc",
            "source": "proc"
        },
        {
            "destination": "/dev",
            "type": "tmpfs",
            "source": "tmpfs",
            "options": [
                "nosuid",
                "strictatime",
                "mode=755",
                "size=65536k"
            ]
        },
        {
            "destination": "/dev/pts",
            "type": "devpts",
            "source": "devpts",
            "options": [
                "nosuid",
                "noexec",
                "newinstance",
                "ptmxmode=0666",
                "mode=0620",
                "gid=5"
            ]
        },
        {
            "destination": "/dev/shm",
            "type": "tmpfs",
            "source": "shm",
            "options": [
                "nosuid",
                "noexec",
                "nodev",
                "mode=1777",
                "size=65536k"
            ]
        },
        {
            "destination": "/dev/mqueue",
            "type": "mqueue",
            "source": "mqueue",
            "options": [
                "nosuid",
                "noexec",
                "nodev"
            ]
        },
        {
            "destination": "/sys",
            "type": "sysfs",
            "source": "sysfs",
            "options": [
                "nosuid",
                "noexec",
                "nodev",
                "ro"
            ]
        },
        {
            "destination": "/sys/fs/cgroup",
            "type": "cgroup",
            "source": "cgroup",
            "options": [
                "nosuid",
                "noexec",
                "nodev",
                "relatime",
                "ro"
            ]
        }
    ],
    "linux": {
        "resources": {
            "devices": [
                {
                    "allow": false,
                    "access": "rwm"
                }
            ]
        },
        "namespaces": [
            {
                "type": "pid"
            },
            {
                "type": "network"
            },
            {
                "type": "ipc"
            },
            {
                "type": "uts"
            },
            {
                "type": "mount"
            },
            {
                "type": "cgroup"
            }
        ],
        "maskedPaths": [
            "/proc/acpi",
            "/proc/asound",
            "/proc/kcore",
            "/proc/keys",
            "/proc/latency_stats",
            "/proc/timer_list",
            "/proc/timer_stats",
            "/proc/sched_debug",
            "/sys/firmware",
            "/proc/scsi"
        ],
        "readonlyPaths": [
            "/proc/bus",
            "/proc/fs",
            "/proc/irq",
            "/proc/sys",
            "/proc/sysrq-trigger"
        ]
    }
}
$ ls -ltr rootfs/
total 4673668
-rw-r--r--. 1 skunkerk skunkerk 4783156800 Mar 8 13:54 llama-2-7b-chat-q5_k_m.gguf
-rwxr-xr-x. 1 skunkerk skunkerk 2675405 Mar 8 13:54 llama-chat.wasm
drwxr-xr-x. 1 skunkerk skunkerk 0 Mar 8 13:54 sys
drwxr-xr-x. 1 skunkerk skunkerk 0 Mar 8 13:54 proc
drwxr-xr-t. 1 skunkerk skunkerk 0 Mar 8 13:54 dev
drwxr-xr-t. 1 skunkerk skunkerk 6 Mar 8 13:54 usr
$ WASMEDGE_PLUGIN_MOUNT=/home/skunkerk/.wasmedge/plugin crun-wasm run wasm-test
RECEIVED WASMEDGE_PLUGIN_MOUNT value: /home/skunkerk/.wasmedge/plugin
[INFO] Model alias: default
[INFO] Prompt context size: 512
[INFO] Number of tokens to predict: 1024
[INFO] Number of layers to run on the GPU: 100
[INFO] Batch size for prompt processing: 512
[INFO] Temperature for sampling: 0.8
[INFO] Top-p sampling (1.0 = disabled): 0.9
[INFO] Penalize repeat sequence of tokens: 1.1
[INFO] presence penalty (0.0 = disabled): 0
[INFO] frequency penalty (0.0 = disabled): 0
[INFO] Use default system prompt
[INFO] Prompt template: Llama2Chat
[INFO] Log prompts: false
[INFO] Log statistics: false
[INFO] Log all information: false
[INFO] Plugin version: b2230 (commit 89febfed)
================================== Running in interactive mode. ===================================
- Press [Ctrl+C] to interject at any time.
- Press [Return] to end the input.
- For multi-line inputs, end each line with '\' and press [Return] to get another line.
[You]:
Now, I tried integrating it with CRI-O and Kubernetes. I made a few changes to CRI-O to pass the env variable WASMEDGE_PLUGIN_MOUNT via the CRI-O config.
$ cat llm-wasm.yml
apiVersion: v1
kind: Pod
metadata:
  name: llama-pod
spec:
  containers:
  - name: llama-container
    image: quay.io/sohankunkerkar/llama-crun:v1
    command: ["./llama-chat.wasm"]
    securityContext:
      privileged: true
  restartPolicy: Always
$ cat Containerfile
# Use a base image
FROM scratch
ENV WASMEDGE_WASINN_PRELOAD=default:GGML:AUTO:/app/model.gguf
ENV WASMEDGE_PLUGIN_PATH=/usr/lib/wasmedge
WORKDIR /app
# Copy wasm files to the directory
COPY llama-chat.wasm /app
COPY model.gguf /app
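Assuming this Containerfile sits next to the model and wasm files, the image referenced by the Pod spec above could be built and pushed with podman (the registry and tag here are just the ones from this thread; substitute your own):

```shell
# Build the image and push it to the registry referenced by the Pod spec.
podman build -t quay.io/sohankunkerkar/llama-crun:v1 -f Containerfile .
podman push quay.io/sohankunkerkar/llama-crun:v1
```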
$ k get po
NAME READY STATUS RESTARTS AGE
llama-pod 0/1 Completed 0 3h58m
$ k logs llama-pod
[INFO] Model alias: default
[INFO] Prompt context size: 512
[INFO] Number of tokens to predict: 1024
[INFO] Number of layers to run on the GPU: 100
[INFO] Batch size for prompt processing: 512
[INFO] Temperature for sampling: 0.8
[INFO] Top-p sampling (1.0 = disabled): 0.9
[INFO] Penalize repeat sequence of tokens: 1.1
[INFO] presence penalty (0.0 = disabled): 0
[INFO] frequency penalty (0.0 = disabled): 0
[INFO] Use default system prompt
[INFO] Prompt template: Llama2Chat
[INFO] Log prompts: false
[INFO] Log statistics: false
[INFO] Log all information: false
Error: "Fail to load model into wasi-nn: Backend Error: WASI-NN Backend Error: Not Found"
RECEIVED WASMEDGE_PLUGIN_MOUNT value: /home/skunkerk/.wasmedge/plugin
I added the debug statement RECEIVED WASMEDGE_PLUGIN_MOUNT value: /home/skunkerk/.wasmedge/plugin to crun to investigate potential issues. Upon inspection, it appears to be populating the correct value.
The /app directory does contain both files:
root:/var/lib/containers/storage/overlay/323126b6306097007bf68ea1957508ca275cf3a3ee361cd3860a1bc7243986a4/merged# ls
app dev etc proc run sys usr var
root:/var/lib/containers/storage/overlay/323126b6306097007bf68ea1957508ca275cf3a3ee361cd3860a1bc7243986a4/merged# ls -ltr app
total 4673668
-rw-r--r--. 1 root root 4783156800 Feb 29 00:13 model.gguf
-rwxr-xr-x. 1 root root 2675405 Mar 2 09:47 llama-chat.wasm
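One sanity check worth running inside the container's mount namespace (a sketch; /usr/lib/wasmedge is the plugin path from the config.json above, and the fallback message is just a marker, not crun output):

```shell
# List the directory WasmEdge searches for plugins; print a marker instead of
# failing if the directory is absent.
ls -l /usr/lib/wasmedge 2>/dev/null || echo "plugin dir missing"
```

If that directory is empty or missing inside the container, the wasi_nn plugin never loads, which could plausibly be why the GGML backend reports Not Found.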
I'm still puzzled by this failure: Error: "Fail to load model into wasi-nn: Backend Error: WASI-NN Backend Error: Not Found".
Any idea what could be the problem here?
I'm not entirely sure, but I'll give it a try on my machine as well.
I have a question: is the order of environment variables the same in crun and CRI-O? I noticed that the current implementation relies on the order of WASMEDGE_WASINN_PRELOAD and WASMEDGE_PLUGIN_PATH in the environment variables to determine the flow. However, WasmEdge has a strong limitation: nn_preload must precede plugin loading, because the plugin initialization process handles preloading the model. I'm concerned that the ordering here might cause nn_preload to occur after the plugin loading process.
Closing this item because all the related issues it tracked have been closed; the goal is considered accomplished.