Summary When attempting to run the Llama 2 inference model locally

bug: `--nn-preload` option doesn't work with the latest version of WasmEdge (v0.13.5)

sohankunkerkar commented on September 26, 2024

bug: `--nn-preload` option doesn't work with the latest version of WasmEdge (v0.13.5)

from wasmedge.

Comments (8)

sohankunkerkar commented on September 26, 2024

Hi @sohankunkerkar, thanks for reporting this; I just updated the LLM document. Please let me know whether the latest link works for you.

@hydai Yup, this looks good now. Thanks for the quick fix.

hydai commented on September 26, 2024

Hi @sohankunkerkar
Have you installed the ggml plugin?
Ref: https://wasmedge.org/docs/develop/rust/wasinn/llm_inference/#prerequisite

sohankunkerkar commented on September 26, 2024

@hydai Yeah, it looks like I needed to source the .bashrc before running that command.
BTW, have you seen this error before?

$ wasmedge --dir .:. --nn-preload default:GGML:AUTO:llama-2-7b-chat-q5_k_m.gguf llama-chat.wasm
[2024-02-02 14:50:09.897] [error] loading failed: magic header not detected, Code: 0x23
[2024-02-02 14:50:09.897] [error]     Bytecode offset: 0x00000000
[2024-02-02 14:50:09.897] [error]     At AST node: component
[2024-02-02 14:50:09.897] [error]     File name: "/tmp/test/llama-chat.wasm"

hydai commented on September 26, 2024

Could you check if the llama-chat.wasm is downloaded completely? If the magic header is not detected, the wasm file itself may be broken.
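
One way to run that check is to look at the first four bytes of the file: a WebAssembly binary starts with the bytes 00 61 73 6d (`\0asm`). The following is a minimal sketch, not something posted in the thread; the helper names `magic_hex` and `check_wasm` are hypothetical and only standard `head`, `od`, and `tr` are assumed.

```shell
# Hypothetical helper: print the first four bytes of a file as lowercase hex.
magic_hex() {
  head -c 4 "$1" | od -An -tx1 | tr -d ' \n'
}

# Hypothetical helper: succeed only if the file starts with "\0asm".
check_wasm() {
  if [ "$(magic_hex "$1")" = "0061736d" ]; then
    echo "$1: WebAssembly magic header found"
  else
    echo "$1: magic header missing; the file is probably not a wasm binary" >&2
    return 1
  fi
}
```

With these defined, `check_wasm llama-chat.wasm` returns a non-zero status when the file is truncated, empty, or something other than a wasm binary.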

hydai commented on September 26, 2024

Oh, I think I know the reason.

We no longer ship the wasm inside the repo; instead, we have moved the wasm files into the release assets: https://github.com/second-state/LlamaEdge/releases/tag/0.2.12

cc @alabulei1 The document is totally out of date. The WASM file is no longer shipped at the link given in its quick start: https://wasmedge.org/docs/develop/rust/wasinn/llm_inference/#quick-start
llama-utils has also been renamed to LlamaEdge, so the page should be updated.

sohankunkerkar commented on September 26, 2024

Ah, I see. I managed to get past that error. Now I'm seeing this error:

$ wasmedge --dir .:. --nn-preload default:GGML:AUTO:llama-2-7b-chat-q5_k_m.gguf llama-chat.wasm
[INFO] Model alias: default
[INFO] Prompt context size: 512
[INFO] Number of tokens to predict: 1024
[INFO] Number of layers to run on the GPU: 100
[INFO] Batch size for prompt processing: 512
[INFO] Temperature for sampling: 0.8
[INFO] Top-p sampling (1.0 = disabled): 0.9
[INFO] Penalize repeat sequence of tokens: 1.1
[INFO] presence penalty (0.0 = disabled): 0
[INFO] frequency penalty (0.0 = disabled): 0
[INFO] Use default system prompt
[INFO] Prompt template: Llama2Chat
[INFO] Log prompts: false
[INFO] Log statistics: false
[INFO] Log all information: false
gguf_init_from_file: invalid magic characters '<!DO'
[2024-02-02 14:58:50.883] [error] [WASI-NN] GGML backend: Error: unable to init model.
Error: "Fail to load model into wasi-nn: Backend Error: WASI-NN Backend Error: Caller module passed an invalid argument"

hydai commented on September 26, 2024

I am sorry about that.

The '<!DO' in the error shows that the model was not downloaded correctly; it looks like an HTML page was fetched instead of the model file.

Please try `curl -LO https://huggingface.co/wasmedge/llama2/resolve/main/llama-2-7b-chat-q5_k_m.gguf` to download the model again.

Cc @alabulei1, even the model link needs to be updated. Please check them at the same time.
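
The same kind of check works for the model file: a GGUF file starts with the four ASCII characters `GGUF`, while a saved HTML page typically starts with `<!DO` (from `<!DOCTYPE`). This is a sketch under those assumptions, not a command from the thread; `magic_hex` and `check_gguf` are hypothetical helper names.

```shell
# Hypothetical helper: print the first four bytes of a file as lowercase hex.
magic_hex() {
  head -c 4 "$1" | od -An -tx1 | tr -d ' \n'
}

# Hypothetical helper: classify a downloaded model by its first four bytes.
check_gguf() {
  case "$(magic_hex "$1")" in
    47475546)  # "GGUF"
      echo "$1: GGUF magic found" ;;
    3c21444f)  # "<!DO", the start of "<!DOCTYPE"
      echo "$1: starts with '<!DO', looks like an HTML page" >&2
      return 1 ;;
    *)
      echo "$1: unrecognized header" >&2
      return 1 ;;
  esac
}
```

Running `check_gguf llama-2-7b-chat-q5_k_m.gguf` after the download catches the HTML-page case before the file is passed to `--nn-preload`.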

hydai commented on September 26, 2024

Hi @sohankunkerkar
Thanks for reporting this; I just updated the LLM document. Please let me know whether the latest link works for you.
