binary-mlc-llm-libs's People
Forkers
cyd3nt petercao sanyaade-teachings xdomain1212 rctry hzfengsy charliefruan pure-water nice-repo s3714110 davidpissarra dala-ai sangelone open-runtime r2d4 david-sharma barfinglemurs sing-li acalatrava cagataycali rickzx sust4in wenshengcheung aicrazyguy k2m5t2 cachengo clankpan sudeepag gaecom kartik14 karayakar mayneyao xhcom-ui mengshyu gxmlfx flatsiedatsie diegocao yiyanzhai icursor mr0001000 nanidao
binary-mlc-llm-libs's Issues
Android APK run error
Installed https://github.com/mlc-ai/binary-mlc-llm-libs/raw/main/mlc-chat.apk, 33.3 MB (35,003,075 bytes).
The app shows 'add model failed: xxx/xx/ open failed: EACCES (Permission denied)'
[Request] Please generate `metal.so` for WizardCoder and WizardMath
Currently missing support for M1/M2.
[Request] Please generate Llama-2-7b-chat-hf-q4f16_1-cuda.dll
I want to use mlc-llm on Windows with CUDA. I have compiled mlc_chat_cli.exe with CUDA enabled, but I still need this DLL to run Llama.
💸 This repository is over its data quota.
Time to pull the credit card!
(env) louisbeaumont@louis030195com-third-brain:~/Documents/assistants$ git clone https://github.com/mlc-ai/binary-mlc-llm-libs.git dist/prebuilt_libs
Cloning into 'dist/prebuilt_libs'...
remote: Enumerating objects: 689, done.
remote: Counting objects: 100% (220/220), done.
remote: Compressing objects: 100% (65/65), done.
remote: Total 689 (delta 170), reused 197 (delta 155), pack-reused 469
Receiving objects: 100% (689/689), 184.05 MiB | 1.01 MiB/s, done.
Resolving deltas: 100% (495/495), done.
Updating files: 100% (197/197), done.
Downloading mlc-chat.apk (124 MB)
Error downloading object: mlc-chat.apk (b7b937c): Smudge error: Error downloading mlc-chat.apk (b7b937c7be3b7e5f8164f0f1ef58c9e1df15fd0f08721fbf7fe16d058ef09c6e): batch response: This repository is over its data quota. Account responsible for LFS bandwidth should purchase more data packs to restore access.
Errors logged to '/Users/louisbeaumont/Documents/assistants/dist/prebuilt_libs/.git/lfs/logs/20240213T142150.445523.log'.
Use `git lfs logs last` to view the log.
error: external filter 'git-lfs filter-process' failed
fatal: mlc-chat.apk: smudge filter lfs failed
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry with 'git restore --source=HEAD :/'
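A possible workaround sketch (not from the issue thread): skip the LFS smudge step during clone so checkout does not abort, then pull only the library files you actually need. This reduces LFS traffic but does not lift the repository-side quota. It assumes git-lfs's GIT_LFS_SKIP_SMUDGE environment variable and git lfs pull --include, driven here via Python's subprocess:
# Hypothetical workaround: clone without downloading LFS objects, then fetch
# only the libraries needed. Assumes git and git-lfs are installed.
import os
import subprocess
env = dict(os.environ, GIT_LFS_SKIP_SMUDGE="1")  # leave LFS files as pointer stubs
subprocess.run(
    ["git", "clone",
     "https://github.com/mlc-ai/binary-mlc-llm-libs.git", "dist/prebuilt_libs"],
    check=True, env=env,
)
# Later, fetch selected LFS objects (e.g. only the .so libraries, not the APK):
subprocess.run(
    ["git", "lfs", "pull", "--include", "*.so"],
    cwd="dist/prebuilt_libs", check=True,
)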
[Request] Please generate a .wasm for Qwen-7B-Chat for use on the web
The .wasm is missing in the Qwen-7B-Chat dir; I can only see .so files. Could you please generate the .wasm?
Missing lib for Android
I didn't find any lib for Android.
[Bug] [Stack trace] RedPajama doesn't work
The latest APK (from 5 days ago) crashes while using RedPajama, but Llama 2-based models seem to work (I tried the uncensored one). RedPajama gives this error:
MLCChat failed
Stack trace:
org.apache.tvm.Base$TVMError: ValueError: Check failed: shard_rec.nbytes == raw_data.length() (29583360 vs. 23663914) : Parameters are not loaded properly. Please check your parameter shards and git lfs installation
Stack trace:
File "/Users/houbohan/tvm/src/runtime/relax_vm/ndarray_cache_support.cc", line 219
at org.apache.tvm.Base.checkCall(Base.java:173)
at org.apache.tvm.Function.invoke(Function.java:130)
at ai.mlc.mlcllm.ChatModule.reload(ChatModule.java:43)
at ai.mlc.mlcchat.AppViewModel$ChatState$mainReloadChat$1$2.invoke(AppViewModel.kt:636)
at ai.mlc.mlcchat.AppViewModel$ChatState$mainReloadChat$1$2.invoke(AppViewModel.kt:634)
at ai.mlc.mlcchat.AppViewModel$ChatState.callBackend(AppViewModel.kt:537)
at ai.mlc.mlcchat.AppViewModel$ChatState.mainReloadChat$lambda$3(AppViewModel.kt:634)
at ai.mlc.mlcchat.AppViewModel$ChatState.$r8$lambda$JJKpoRMMpp77FzXKA0o00i8lgRA(Unknown Source:0)
at ai.mlc.mlcchat.AppViewModel$ChatState$$ExternalSyntheticLambda3.run(Unknown Source:8)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:463)
at java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1137)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:637)
at java.lang.Thread.run(Thread.java:1012)
Error message:
ValueError: Check failed: shard_rec.nbytes == raw_data.length() (29583360 vs. 23663914) : Parameters are not loaded properly. Please check your parameter shards and git lfs installation
Stack trace:
File "/Users/houbohan/tvm/src/runtime/relax_vm/ndarray_cache_support.cc", line 219
I had RedPajama working on older versions of the apk.
Error When Implementing Mali GPU Acceleration on OrangePi5 with mlc-llm
Following the tutorial, I set up mlc-llm on my OrangePi5 with Mali GPU acceleration via OpenCL. Everything was smooth until I encountered an error. I've re-downloaded the Mali libraries (versions below) multiple times, but the error persists. Could the libraries be corrupted?
Library versions in use:
- RedPajama-INCITE-Chat-3B-v1-q4f16_1
- RedPajama-INCITE-Chat-3B-v1-q4f16_1-mali.so
Any advice on resolving this would be appreciated.
arm_release_ver: g13p0-01eac0, rk_so_ver: 3
arm_release_ver of this libmali is 'g6p0-01eac0', rk_so_ver is '7'.
Traceback (most recent call last):
File "/home/yusepp/Desktop/test.py", line 6, in <module>
cm = ChatModule(model=models+"/RedPajama-INCITE-Chat-3B-v1-q4f16_1",
File "/home/yusepp/mlc-llm/python/mlc_chat/chat_module.py", line 842, in __init__
self._reload(self.model_lib_path, self.model_path, user_chat_config_json_str)
File "/home/yusepp/mlc-llm/python/mlc_chat/chat_module.py", line 1056, in _reload
self._reload_func(lib, model_path, app_config_json)
File "/home/yusepp/tvm_unity/python/tvm/_ffi/_ctypes/packed_func.py", line 239, in __call__
raise_last_ffi_error()
File "/home/yusepp/tvm_unity/python/tvm/_ffi/base.py", line 481, in raise_last_ffi_error
raise py_err
File "/home/yusepp/Desktop/tvm_unity/src/runtime/relax_vm/ndarray_cache_support.cc", line 255, in tvm::runtime::relax_vm::NDArrayCache::Load(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, int)
ValueError: Traceback (most recent call last):
3: 0x0000ffff63d3ae9b
2: 0x0000ffff63d3ac23
1: 0x0000ffff63d392bf
0: tvm::runtime::relax_vm::NDArrayCache::Load(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, int)
at /home/yusepp/Desktop/tvm_unity/src/runtime/relax_vm/ndarray_cache_support.cc:255
4: 0x0000ffff63d3ae9b
3: 0x0000ffff63d3ac23
2: 0x0000ffff63d392bf
1: tvm::runtime::relax_vm::NDArrayCache::Load(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, int)
at /home/yusepp/Desktop/tvm_unity/src/runtime/relax_vm/ndarray_cache_support.cc:253
0: tvm::runtime::relax_vm::NDArrayCacheMetadata::FileRecord::Load(DLDevice, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, tvm::runtime::Optional<tvm::runtime::NDArray>*) const
at /home/yusepp/Desktop/tvm_unity/src/runtime/relax_vm/ndarray_cache_support.cc:193
File "/home/yusepp/Desktop/tvm_unity/src/runtime/relax_vm/ndarray_cache_support.cc", line 255
ValueError: Error when loading parameters from params_shard_0.bin: [20:19:57] /home/yusepp/Desktop/tvm_unity/src/runtime/relax_vm/ndarray_cache_support.cc:193: Check failed: this->nbytes == raw_data_buffer->length() (64552960 vs. 133) : ValueError: Encountered an corrupted parameter shard. It means it is not downloaded completely or downloading is interrupted. Please try to download again.
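The failing check compares each shard's expected nbytes, recorded in ndarray-cache.json, with the bytes actually found on disk, so a quick local integrity check can tell you which shard is truncated before reloading the model. A minimal sketch, assuming the ndarray-cache.json layout with a top-level "records" list whose entries carry "dataPath" and "nbytes" (verify against your copy of the file), and a hypothetical model directory path:
# Sketch: flag parameter shards whose on-disk size differs from the size
# recorded in ndarray-cache.json (the same comparison the loader performs).
import json
import os
model_dir = "dist/prebuilt/mlc-chat-RedPajama-INCITE-Chat-3B-v1-q4f16_1"  # assumed path
with open(os.path.join(model_dir, "ndarray-cache.json")) as f:
    cache = json.load(f)
for rec in cache["records"]:
    shard = os.path.join(model_dir, rec["dataPath"])
    actual = os.path.getsize(shard) if os.path.exists(shard) else -1
    if actual != rec["nbytes"]:
        print(f"{rec['dataPath']}: expected {rec['nbytes']} bytes, found {actual}")
A shard reported at 133 bytes, as in the traceback above, is almost certainly an un-smudged git-lfs pointer file rather than the real weights.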
[Request] Please generate CodeLlama-7b-Python-hf-q4f16_1-metal.so
It's missing in the latest update. Can you please add it so that M1/M2 Macs can run CodeLlama? Thanks.
Are there no prebuilt libs for Windows anymore?
[Request] Please add usage instructions to the README
WizardCoder-15B-V1.0-q4f16_1 failing to load on WebLLM
Following the available examples in the WebLLM repo, such as next-simple-chat, I added the model URL and ID:
{ model_url: "https://huggingface.co/mlc-ai/mlc-chat-WizardCoder-15B-V1.0-q4f32_1/resolve/main/", local_id: "WizardCoder-15B-V1.0-q4f32_1", }
then added the libmap
"WizardCoder-15B-V1.0-q4f32_1": "https://raw.githubusercontent.com/mlc-ai/binary-mlc-llm-libs/main/WizardCoder-15B-V1.0-q4f16_1-webgpu.wasm",
but I end up getting this error immediately after loading the model on the browser:
Init error, Error: Unknown conv template wizard_coder_or_math
Android app crashes after the latest model updates
MLCChat failed
Stack trace:
org.apache.tvm.Base$TVMError: ValueError: Error when loading parameters from params_shard_66.bin: [22:39:36] /Users/kartik/mlc/mlc-llm/3rdparty/tvm/src/runtime/relax_vm/ndarray_cache_support.cc:193: Check failed: this->nbytes == raw_data_buffer->length() (45088768 vs. 21734731) : ValueError: Encountered an corrupted parameter shard. It means it is not downloaded completely or downloading is interrupted. Please try to download again.
Stack trace:
File "/Users/kartik/mlc/mlc-llm/3rdparty/tvm/src/runtime/relax_vm/ndarray_cache_support.cc", line 255
at org.apache.tvm.Base.checkCall(Base.java:173)
at org.apache.tvm.Function.invoke(Function.java:130)
at ai.mlc.mlcllm.ChatModule.reload(ChatModule.java:43)
at ai.mlc.mlcchat.AppViewModel$ChatState$mainReloadChat$1$2.invoke(AppViewModel.kt:633)
at ai.mlc.mlcchat.AppViewModel$ChatState$mainReloadChat$1$2.invoke(AppViewModel.kt:631)
at ai.mlc.mlcchat.AppViewModel$ChatState.callBackend(AppViewModel.kt:534)
at ai.mlc.mlcchat.AppViewModel$ChatState.mainReloadChat$lambda$3(AppViewModel.kt:631)
at ai.mlc.mlcchat.AppViewModel$ChatState.$r8$lambda$JJKpoRMMpp77FzXKA0o00i8lgRA(Unknown Source:0)
at ai.mlc.mlcchat.AppViewModel$ChatState$$ExternalSyntheticLambda3.run(Unknown Source:8)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:487)
at java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:644)
at java.lang.Thread.run(Thread.java:1012)
Error message:
ValueError: Error when loading parameters from params_shard_66.bin: [22:39:36] /Users/kartik/mlc/mlc-llm/3rdparty/tvm/src/runtime/relax_vm/ndarray_cache_support.cc:193: Check failed: this->nbytes == raw_data_buffer->length() (45088768 vs. 21734731) : ValueError: Encountered an corrupted parameter shard. It means it is not downloaded completely or downloading is interrupted. Please try to download again.
Stack trace:
File "/Users/kartik/mlc/mlc-llm/3rdparty/tvm/src/runtime/relax_vm/ndarray_cache_support.cc", line 255
Missing library files for Llama-2-70b-chat-hf-q4f16_1 model
I downloaded the 70b model and encountered an error when running it. The command I used was:
mlc_chat_cli --local-id Llama-2-70b-chat-hf-q4f16_1
I received the following error message:
WARNING: lavapipe is not a conformant vulkan implementation, testing use only.
Use MLC config: "/home/lxr/software/dist/prebuilt/mlc-chat-Llama-2-70b-chat-hf-q4f16_1/mlc-chat-config.json"
Use model weights: "/home/lxr/software/dist/prebuilt/mlc-chat-Llama-2-70b-chat-hf-q4f16_1/ndarray-cache.json"
Cannot find library "Llama-2-70b-chat-hf-q4f16_1-vulkan.so" in "dist/prebuilt/lib" or other search paths.
However, in the same directory, I have successfully deployed the 13b model. Could you please provide further guidance on how to resolve this issue?
Source code or how to compile these
Thanks for making these available.
How can they be built manually? Can we include that in the README?
Is it working? Just closing on S23.
Hi!
Is it working? I downloaded Llama-2-7b, clicked the Chat button, it showed some messages, then "Ready to chat".
And then the application closed, probably crashed. That's it.
Samsung S23
Intel Mac shared library files
Which files here should users on Intel Mac machines be using? It looks like the metal.so files are all built for the arm64 architecture. What about x86?
TinyLlama is missing the wasm
Unfortunately, TinyLlama is missing its .wasm library.
[Request] Please update the Android APK
The current one is already 2 months old. It seems the CI/build process doesn't update it automatically.
Llama2 70b is not working
Hi,
I just downloaded the Colab you provide at this link: https://github.com/mlc-ai/notebooks/blob/main/mlc-llm/tutorial_chat_module_getting_started.ipynb.
It works properly if I use the 7B model; however, if I change the settings to use the 70B model, I receive the following error:
InternalError Traceback (most recent call last)
in <cell line: 4>()
2 from mlc_chat.callback import StreamToStdout
3
----> 4 cm = ChatModule(
5 model="dist/Llama-2-70b-chat-hf-q4f16_1-MLC",
6 model_lib_path="dist/prebuilt_libs/Llama-2-70b-chat-hf/Llama-2-70b-chat-hf-q4f16_1-cuda.so"
5 frames
tvm/_ffi/_cython/./packed_func.pxi in tvm._ffi._cy3.core.PackedFuncBase.call()
tvm/_ffi/_cython/./packed_func.pxi in tvm._ffi._cy3.core.FuncCall()
tvm/_ffi/_cython/./packed_func.pxi in tvm._ffi._cy3.core.FuncCall3()
tvm/_ffi/_cython/./base.pxi in tvm._ffi._cy3.core.CHECK_CALL()
/workspace/mlc-llm/cpp/llm_chat.cc in LoadParams()
InternalError: Traceback (most recent call last):
7: mlc::llm::LLMChatModule::GetFunction(tvm::runtime::String const&, tvm::runtime::ObjectPtrtvm::runtime::Object const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#1}::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
at /workspace/mlc-llm/cpp/llm_chat.cc:1633
6: mlc::llm::LLMChat::Reload(tvm::runtime::TVMArgValue, tvm::runtime::String, tvm::runtime::String)
at /workspace/mlc-llm/cpp/llm_chat.cc:631
5: LoadParams
at /workspace/mlc-llm/cpp/llm_chat.cc:219
4: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::TypedPackedFunc<void (std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, int, int)>::AssignTypedLambda<void ()(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, int, int)>(void ()(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, int, int), std::__cxx11::basic_string<char, std::char_traits, std::allocator >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
3: tvm::runtime::relax_vm::NDArrayCache::Load(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, int, int)
2: tvm::runtime::relax_vm::NDArrayCacheMetadata::Load(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&)
1: tvm::runtime::LoadBinaryFromFile(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator >*)
0: _ZN3tvm7runtime6deta
File "/workspace/tvm/src/runtime/file_utils.cc", line 121
InternalError: Check failed: (!fs.fail()) is false: Cannot open dist/Llama-2-70b-chat-hf-q4f16_1-MLC/ndarray-cache.json
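The final error says ndarray-cache.json cannot be opened in the 70B weight directory, which points to missing or incomplete weights rather than a problem with the prebuilt library. A small sketch that reuses the paths from the notebook cell above and fails early with a clearer message; the import and constructor follow the mlc_chat usage visible in this traceback, but treat the exact paths as assumptions:
# Sketch: verify the weight directory is populated before constructing ChatModule,
# so a missing or partial download fails with an actionable message.
import os
from mlc_chat import ChatModule
model_dir = "dist/Llama-2-70b-chat-hf-q4f16_1-MLC"  # from the notebook cell
lib_path = "dist/prebuilt_libs/Llama-2-70b-chat-hf/Llama-2-70b-chat-hf-q4f16_1-cuda.so"
if not os.path.isfile(os.path.join(model_dir, "ndarray-cache.json")):
    raise FileNotFoundError(
        f"{model_dir} has no ndarray-cache.json; the 70B weights are missing or "
        "only partially downloaded. Re-download the -MLC weight repo into that "
        "directory before creating ChatModule."
    )
cm = ChatModule(model=model_dir, model_lib_path=lib_path)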
Resource consumption degradation
Hi. I try this app from time to time to check on the progress of mobile LLMs. In one of the previous versions of MLCChat (a6b0a4c from 19.09.2023) my device with 8 GB RAM managed to run a 7B model, but now none of the 7B models present in the app work, even if I clean up the RAM completely. The error is CL_OUT_OF_RESOURCES in opencl_device_api.cc:246.
Llama prints the error message in the chat; Mistral loads the model successfully, but once generation starts the app crashes with the same error.
Snapdragon 860
How to create these .wasm / -mali.so files?
Some manual or documentation on this is needed.
[request] Add Microsoft Phi 3 SLM
https://azure.microsoft.com/en-us/blog/introducing-phi-3-redefining-whats-possible-with-slms/
Phi-3 models are the most capable and cost-effective small language models (SLMs) available, outperforming models of the same size and next size up across a variety of language, reasoning, coding, and math benchmarks. This release expands the selection of high-quality models for customers, offering more practical choices as they compose and build generative AI applications.
Please support Llama 3
I downloaded the model from https://huggingface.co/mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC, but the APK here can't identify this model.
pre-built error raise FileNotFoundError(err_msg) FileNotFoundError: Cannot find the model library that corresponds to `None`.
[Request] Please update rwkv-raven-{1b5, 3b, 7b}-q8f16_0 to _1
As far as I understand, the latest APK requires _1 models? I'd like to try RWKV, because it didn't work with the original APK. What's the difference between the _0 and _1 versions; are they incompatible?
Gemma isn't working
It doesn't reply, and after a few retries it throws:
MLCChat failed
Stack trace:
org.apache.tvm.Base$TVMError: InternalError: Check failed: (e == CL_SUCCESS) is false: OpenCL Error, code=-54: CL_INVALID_WORK_GROUP_SIZE
Stack trace:
File "/Users/kartik/mlc/mlc-llm/3rdparty/tvm/src/runtime/opencl/opencl_module.cc", line 90
at org.apache.tvm.Base.checkCall(Base.java:173)
at org.apache.tvm.Function.invoke(Function.java:130)
at ai.mlc.mlcllm.ChatModule.decode(ChatModule.java:74)
at ai.mlc.mlcchat.AppViewModel$ChatState$requestGenerate$1$2.invoke(AppViewModel.kt:669)
at ai.mlc.mlcchat.AppViewModel$ChatState$requestGenerate$1$2.invoke(AppViewModel.kt:668)
at ai.mlc.mlcchat.AppViewModel$ChatState.callBackend(AppViewModel.kt:548)
at ai.mlc.mlcchat.AppViewModel$ChatState.requestGenerate$lambda$4(AppViewModel.kt:668)
at ai.mlc.mlcchat.AppViewModel$ChatState.$r8$lambda$lluIrcsPALEW5nCb2tohZYadhTY(Unknown Source:0)
at ai.mlc.mlcchat.AppViewModel$ChatState$$ExternalSyntheticLambda3.run(Unknown Source:6)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:487)
at java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:644)
at java.lang.Thread.run(Thread.java:1012)
Error message:
InternalError: Check failed: (e == CL_SUCCESS) is false: OpenCL Error, code=-54: CL_INVALID_WORK_GROUP_SIZE
Stack trace:
File "/Users/kartik/mlc/mlc-llm/3rdparty/tvm/src/runtime/opencl/opencl_module.cc", line 90
[request] Please generate the .wasm files for Zephyr 1.6B
Zephyr 1.6B shards exist on Huggingface, and seem to be diligently updated:
https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/tree/main
However, the corresponding .wasm files are not available in this repository?
[Error] Mistral-7B-Instruct-v0.2 Model is creating garbage responses to prompts
I'm trying to build the MLC Chat Android app using the prebuilt models provided in this repository, as per the instructions in https://llm.mlc.ai/docs/prebuilt_models.html#overview. All LLMs are working fine and provide responses as expected, but the Mistral-7B-Instruct-v0.2 model alone generates garbage responses to the prompts.
This is happening for all kinds of prompts sent to the model.
Please help me with this error.
Uncaught Error: Cannot find model_url for Mistral-7B-Instruct-v0.1-q4f32_1 when running Chrome Extension example