binary-mlc-llm-libs's People
Forkers
cyd3nt petercao sanyaade-teachings xdomain1212 rctry hzfengsy charliefruan pure-water nice-repo s3714110 davidpissarra dala-ai sangelone open-runtime r2d4 david-sharma barfinglemurs sing-li acalatrava cagataycali rickzx sust4in wenshengcheung aicrazyguy k2m5t2 cachengo clankpan sudeepag gaecom kartik14 karayakar mayneyao xhcom-ui mengshyu gxmlfx flatsiedatsie diegocao yiyanzhai icursor mr0001000 nanidao
binary-mlc-llm-libs's Issues
Android APK run error
Installed https://github.com/mlc-ai/binary-mlc-llm-libs/raw/main/mlc-chat.apk, 33.3 MB (35,003,075 bytes).
The app shows 'add model failed: xxx/xx/ open failed: EACCES (Permission denied)'
[Request] Please generate `metal.so` for WizardCoder and WizardMath
Currently missing support for M1/M2.
[Request] Please generate Llama-2-7b-chat-hf-q4f16_1-cuda.dll
I want to use mlc-llm on Windows with CUDA. I have compiled mlc_chat_cli.exe with CUDA enabled, but I still need this DLL to run Llama.
💸 This repository is over its data quota.
Time to pull the credit card!
(env) louisbeaumont@louis030195com-third-brain:~/Documents/assistants$ git clone https://github.com/mlc-ai/binary-mlc-llm-libs.git dist/prebuilt_libs
Cloning into 'dist/prebuilt_libs'...
remote: Enumerating objects: 689, done.
remote: Counting objects: 100% (220/220), done.
remote: Compressing objects: 100% (65/65), done.
remote: Total 689 (delta 170), reused 197 (delta 155), pack-reused 469
Receiving objects: 100% (689/689), 184.05 MiB | 1.01 MiB/s, done.
Resolving deltas: 100% (495/495), done.
Updating files: 100% (197/197), done.
Downloading mlc-chat.apk (124 MB)
Error downloading object: mlc-chat.apk (b7b937c): Smudge error: Error downloading mlc-chat.apk (b7b937c7be3b7e5f8164f0f1ef58c9e1df15fd0f08721fbf7fe16d058ef09c6e): batch response: This repository is over its data quota. Account responsible for LFS bandwidth should purchase more data packs to restore access.
Errors logged to '/Users/louisbeaumont/Documents/assistants/dist/prebuilt_libs/.git/lfs/logs/20240213T142150.445523.log'.
Use `git lfs logs last` to view the log.
error: external filter 'git-lfs filter-process' failed
fatal: mlc-chat.apk: smudge filter lfs failed
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry with 'git restore --source=HEAD :/'
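A possible workaround sketch (not from the issue thread): skip the LFS smudge step during clone so checkout does not abort, then pull only the library files you actually need. This reduces LFS traffic but does not lift the repository-side quota. It assumes git-lfs's GIT_LFS_SKIP_SMUDGE environment variable and git lfs pull --include, driven here via Python's subprocess:
# Hypothetical workaround: clone without downloading LFS objects, then fetch
# only the libraries needed. Assumes git and git-lfs are installed.
import os
import subprocess
env = dict(os.environ, GIT_LFS_SKIP_SMUDGE="1")  # leave LFS files as pointer stubs
subprocess.run(
    ["git", "clone",
     "https://github.com/mlc-ai/binary-mlc-llm-libs.git", "dist/prebuilt_libs"],
    check=True, env=env,
)
# Later, fetch selected LFS objects (e.g. only the .so libraries, not the APK):
subprocess.run(
    ["git", "lfs", "pull", "--include", "*.so"],
    cwd="dist/prebuilt_libs", check=True,
)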
[Request] Please generate a .wasm for Qwen-7B-Chat for use on the web
The .wasm is missing in the Qwen-7B-Chat dir; I can only see .so files. Could you please generate the .wasm?
Missing lib for Android
I didn't find any lib for Android.
[Bug] [Stack trace] RedPajama doesn't work
The latest APK (from 5 days ago) crashes while using RedPajama, but Llama 2-based models seem to work (I tried the uncensored one). RedPajama gives this error:
MLCChat failed
Stack trace:
org.apache.tvm.Base$TVMError: ValueError: Check failed: shard_rec.nbytes == raw_data.length() (29583360 vs. 23663914) : Parameters are not loaded properly. Please check your parameter shards and git lfs installation
Stack trace:
File "/Users/houbohan/tvm/src/runtime/relax_vm/ndarray_cache_support.cc", line 219
at org.apache.tvm.Base.checkCall(Base.java:173)
at org.apache.tvm.Function.invoke(Function.java:130)
at ai.mlc.mlcllm.ChatModule.reload(ChatModule.java:43)
at ai.mlc.mlcchat.AppViewModel$ChatState$mainReloadChat$1$2.invoke(AppViewModel.kt:636)
at ai.mlc.mlcchat.AppViewModel$ChatState$mainReloadChat$1$2.invoke(AppViewModel.kt:634)
at ai.mlc.mlcchat.AppViewModel$ChatState.callBackend(AppViewModel.kt:537)
at ai.mlc.mlcchat.AppViewModel$ChatState.mainReloadChat$lambda$3(AppViewModel.kt:634)
at ai.mlc.mlcchat.AppViewModel$ChatState.$r8$lambda$JJKpoRMMpp77FzXKA0o00i8lgRA(Unknown Source:0)
at ai.mlc.mlcchat.AppViewModel$ChatState$$ExternalSyntheticLambda3.run(Unknown Source:8)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:463)
at java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1137)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:637)
at java.lang.Thread.run(Thread.java:1012)
Error message:
ValueError: Check failed: shard_rec.nbytes == raw_data.length() (29583360 vs. 23663914) : Parameters are not loaded properly. Please check your parameter shards and git lfs installation
Stack trace:
File "/Users/houbohan/tvm/src/runtime/relax_vm/ndarray_cache_support.cc", line 219
I had RedPajama working on older versions of the apk.
Error When Implementing Mali GPU Acceleration on OrangePi5 with mlc-llm
Following the tutorial, I set up mlc-llm on my OrangePi5 with Mali GPU acceleration via OpenCL. Everything was smooth until I encountered an error. I've re-downloaded the Mali libraries (versions below) multiple times, but the error persists. Could the libraries be corrupted?
Library versions in use:
- RedPajama-INCITE-Chat-3B-v1-q4f16_1
- RedPajama-INCITE-Chat-3B-v1-q4f16_1-mali.so
Any advice on resolving this would be appreciated.
arm_release_ver: g13p0-01eac0, rk_so_ver: 3
arm_release_ver of this libmali is 'g6p0-01eac0', rk_so_ver is '7'.
Traceback (most recent call last):
File "/home/yusepp/Desktop/test.py", line 6, in <module>
cm = ChatModule(model=models+"/RedPajama-INCITE-Chat-3B-v1-q4f16_1",
File "/home/yusepp/mlc-llm/python/mlc_chat/chat_module.py", line 842, in __init__
self._reload(self.model_lib_path, self.model_path, user_chat_config_json_str)
File "/home/yusepp/mlc-llm/python/mlc_chat/chat_module.py", line 1056, in _reload
self._reload_func(lib, model_path, app_config_json)
File "/home/yusepp/tvm_unity/python/tvm/_ffi/_ctypes/packed_func.py", line 239, in __call__
raise_last_ffi_error()
File "/home/yusepp/tvm_unity/python/tvm/_ffi/base.py", line 481, in raise_last_ffi_error
raise py_err
File "/home/yusepp/Desktop/tvm_unity/src/runtime/relax_vm/ndarray_cache_support.cc", line 255, in tvm::runtime::relax_vm::NDArrayCache::Load(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, int)
ValueError: Traceback (most recent call last):
3: 0x0000ffff63d3ae9b
2: 0x0000ffff63d3ac23
1: 0x0000ffff63d392bf
0: tvm::runtime::relax_vm::NDArrayCache::Load(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, int)
at /home/yusepp/Desktop/tvm_unity/src/runtime/relax_vm/ndarray_cache_support.cc:255
4: 0x0000ffff63d3ae9b
3: 0x0000ffff63d3ac23
2: 0x0000ffff63d392bf
1: tvm::runtime::relax_vm::NDArrayCache::Load(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, int)
at /home/yusepp/Desktop/tvm_unity/src/runtime/relax_vm/ndarray_cache_support.cc:253
0: tvm::runtime::relax_vm::NDArrayCacheMetadata::FileRecord::Load(DLDevice, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, tvm::runtime::Optional<tvm::runtime::NDArray>*) const
at /home/yusepp/Desktop/tvm_unity/src/runtime/relax_vm/ndarray_cache_support.cc:193
File "/home/yusepp/Desktop/tvm_unity/src/runtime/relax_vm/ndarray_cache_support.cc", line 255
ValueError: Error when loading parameters from params_shard_0.bin: [20:19:57] /home/yusepp/Desktop/tvm_unity/src/runtime/relax_vm/ndarray_cache_support.cc:193: Check failed: this->nbytes == raw_data_buffer->length() (64552960 vs. 133) : ValueError: Encountered an corrupted parameter shard. It means it is not downloaded completely or downloading is interrupted. Please try to download again.
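The failing check compares each shard's expected nbytes, recorded in ndarray-cache.json, with the bytes actually found on disk, so a quick local integrity check can tell you which shard is truncated before reloading the model. A minimal sketch, assuming the ndarray-cache.json layout with a top-level "records" list whose entries carry "dataPath" and "nbytes" (verify against your copy of the file), and a hypothetical model directory path:
# Sketch: flag parameter shards whose on-disk size differs from the size
# recorded in ndarray-cache.json (the same comparison the loader performs).
import json
import os
model_dir = "dist/prebuilt/mlc-chat-RedPajama-INCITE-Chat-3B-v1-q4f16_1"  # assumed path
with open(os.path.join(model_dir, "ndarray-cache.json")) as f:
    cache = json.load(f)
for rec in cache["records"]:
    shard = os.path.join(model_dir, rec["dataPath"])
    actual = os.path.getsize(shard) if os.path.exists(shard) else -1
    if actual != rec["nbytes"]:
        print(f"{rec['dataPath']}: expected {rec['nbytes']} bytes, found {actual}")
A shard reported at 133 bytes, as in the traceback above, is almost certainly an un-smudged git-lfs pointer file rather than the real weights.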
[Request] Please generate CodeLlama-7b-Python-hf-q4f16_1-metal.so
It's missing in the latest update. Can you please add it so that M1/M2 Macs can run CodeLlama? Thanks.
Are there no prebuilt libs for Windows anymore?
[Request] Please add usage instructions to the README
WizardCoder-15B-V1.0-q4f16_1 failing to load on WebLLM
Following the available examples in the WebLLM repo, such as next-simple-chat, I added the model URL and ID:
{ model_url: "https://huggingface.co/mlc-ai/mlc-chat-WizardCoder-15B-V1.0-q4f32_1/resolve/main/", local_id: "WizardCoder-15B-V1.0-q4f32_1", }
then added the libmap
"WizardCoder-15B-V1.0-q4f32_1": "https://raw.githubusercontent.com/mlc-ai/binary-mlc-llm-libs/main/WizardCoder-15B-V1.0-q4f16_1-webgpu.wasm",
but I end up getting this error immediately after loading the model on the browser:
Init error, Error: Unknown conv template wizard_coder_or_math
Android app crashes after the latest model updates
MLCChat failed
Stack trace:
org.apache.tvm.Base$TVMError: ValueError: Error when loading parameters from params_shard_66.bin: [22:39:36] /Users/kartik/mlc/mlc-llm/3rdparty/tvm/src/runtime/relax_vm/ndarray_cache_support.cc:193: Check failed: this->nbytes == raw_data_buffer->length() (45088768 vs. 21734731) : ValueError: Encountered an corrupted parameter shard. It means it is not downloaded completely or downloading is interrupted. Please try to download again.
Stack trace:
File "/Users/kartik/mlc/mlc-llm/3rdparty/tvm/src/runtime/relax_vm/ndarray_cache_support.cc", line 255
at org.apache.tvm.Base.checkCall(Base.java:173)
at org.apache.tvm.Function.invoke(Function.java:130)
at ai.mlc.mlcllm.ChatModule.reload(ChatModule.java:43)
at ai.mlc.mlcchat.AppViewModel$ChatState$mainReloadChat$1$2.invoke(AppViewModel.kt:633)
at ai.mlc.mlcchat.AppViewModel$ChatState$mainReloadChat$1$2.invoke(AppViewModel.kt:631)
at ai.mlc.mlcchat.AppViewModel$ChatState.callBackend(AppViewModel.kt:534)
at ai.mlc.mlcchat.AppViewModel$ChatState.mainReloadChat$lambda$3(AppViewModel.kt:631)
at ai.mlc.mlcchat.AppViewModel$ChatState.$r8$lambda$JJKpoRMMpp77FzXKA0o00i8lgRA(Unknown Source:0)
at ai.mlc.mlcchat.AppViewModel$ChatState$$ExternalSyntheticLambda3.run(Unknown Source:8)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:487)
at java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:644)
at java.lang.Thread.run(Thread.java:1012)
Error message:
ValueError: Error when loading parameters from params_shard_66.bin: [22:39:36] /Users/kartik/mlc/mlc-llm/3rdparty/tvm/src/runtime/relax_vm/ndarray_cache_support.cc:193: Check failed: this->nbytes == raw_data_buffer->length() (45088768 vs. 21734731) : ValueError: Encountered an corrupted parameter shard. It means it is not downloaded completely or downloading is interrupted. Please try to download again.
Stack trace:
File "/Users/kartik/mlc/mlc-llm/3rdparty/tvm/src/runtime/relax_vm/ndarray_cache_support.cc", line 255
Missing library files for Llama-2-70b-chat-hf-q4f16_1 model
I downloaded the 70b model and encountered an error when running it. The command I used was:
mlc_chat_cli --local-id Llama-2-70b-chat-hf-q4f16_1
I received the following error message:
WARNING: lavapipe is not a conformant vulkan implementation, testing use only.
Use MLC config: "/home/lxr/software/dist/prebuilt/mlc-chat-Llama-2-70b-chat-hf-q4f16_1/mlc-chat-config.json"
Use model weights: "/home/lxr/software/dist/prebuilt/mlc-chat-Llama-2-70b-chat-hf-q4f16_1/ndarray-cache.json"
Cannot find library "Llama-2-70b-chat-hf-q4f16_1-vulkan.so" in "dist/prebuilt/lib" or other search paths.
However, in the same directory, I have successfully deployed the 13b model. Could you please provide further guidance on how to resolve this issue?
Source code or how to compile these
Thanks for making these available.
How can they be built manually? Can we include that in the README?
Is it working? Just closing on S23.
Hi!
Is it working? I downloaded Llama-2-7b, clicked the Chat button, it showed some messages, then "Ready to chat".
And then the application closed, probably crashed. That's it.
Samsung S23
Intel Mac shared library files
Which files here should users on Intel Mac machines be using? It looks like the metal.so files are all built for the arm64 architecture. What about x86?
TinyLlama is missing the wasm
Unfortunately, TinyLlama is missing its .wasm library.
[Request] Please update the Android APK
The current one is already 2 months old. It seems the CI/build process doesn't update it automatically.
Llama2 70b is not working
Hi,
I just downloaded the Colab you provide at this link: https://github.com/mlc-ai/notebooks/blob/main/mlc-llm/tutorial_chat_module_getting_started.ipynb.
It works properly if I use the 7B model; however, if I change the settings to use the 70B model, I receive the following error:
InternalError Traceback (most recent call last)
in <cell line: 4>()
2 from mlc_chat.callback import StreamToStdout
3
----> 4 cm = ChatModule(
5 model="dist/Llama-2-70b-chat-hf-q4f16_1-MLC",
6 model_lib_path="dist/prebuilt_libs/Llama-2-70b-chat-hf/Llama-2-70b-chat-hf-q4f16_1-cuda.so"
5 frames
tvm/_ffi/_cython/./packed_func.pxi in tvm._ffi._cy3.core.PackedFuncBase.call()
tvm/_ffi/_cython/./packed_func.pxi in tvm._ffi._cy3.core.FuncCall()
tvm/_ffi/_cython/./packed_func.pxi in tvm._ffi._cy3.core.FuncCall3()
tvm/_ffi/_cython/./base.pxi in tvm._ffi._cy3.core.CHECK_CALL()
/workspace/mlc-llm/cpp/llm_chat.cc in LoadParams()
InternalError: Traceback (most recent call last):
7: mlc::llm::LLMChatModule::GetFunction(tvm::runtime::String const&, tvm::runtime::ObjectPtrtvm::runtime::Object const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#1}::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
at /workspace/mlc-llm/cpp/llm_chat.cc:1633
6: mlc::llm::LLMChat::Reload(tvm::runtime::TVMArgValue, tvm::runtime::String, tvm::runtime::String)
at /workspace/mlc-llm/cpp/llm_chat.cc:631
5: LoadParams
at /workspace/mlc-llm/cpp/llm_chat.cc:219
4: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::TypedPackedFunc<void (std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, int, int)>::AssignTypedLambda<void ()(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, int, int)>(void ()(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, int, int), std::__cxx11::basic_string<char, std::char_traits, std::allocator >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
3: tvm::runtime::relax_vm::NDArrayCache::Load(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, int, int)
2: tvm::runtime::relax_vm::NDArrayCacheMetadata::Load(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&)
1: tvm::runtime::LoadBinaryFromFile(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator >*)
0: _ZN3tvm7runtime6deta
File "/workspace/tvm/src/runtime/file_utils.cc", line 121
InternalError: Check failed: (!fs.fail()) is false: Cannot open dist/Llama-2-70b-chat-hf-q4f16_1-MLC/ndarray-cache.json
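The final error says ndarray-cache.json cannot be opened in the 70B weight directory, which points to missing or incomplete weights rather than a problem with the prebuilt library. A small sketch that reuses the paths from the notebook cell above and fails early with a clearer message; the import and constructor follow the mlc_chat usage visible in this traceback, but treat the exact paths as assumptions:
# Sketch: verify the weight directory is populated before constructing ChatModule,
# so a missing or partial download fails with an actionable message.
import os
from mlc_chat import ChatModule
model_dir = "dist/Llama-2-70b-chat-hf-q4f16_1-MLC"  # from the notebook cell
lib_path = "dist/prebuilt_libs/Llama-2-70b-chat-hf/Llama-2-70b-chat-hf-q4f16_1-cuda.so"
if not os.path.isfile(os.path.join(model_dir, "ndarray-cache.json")):
    raise FileNotFoundError(
        f"{model_dir} has no ndarray-cache.json; the 70B weights are missing or "
        "only partially downloaded. Re-download the -MLC weight repo into that "
        "directory before creating ChatModule."
    )
cm = ChatModule(model=model_dir, model_lib_path=lib_path)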
Resource consumption degradation
Hi. I try this app from time to time to check on the progress of mobile LLMs. In one of the previous versions of MLCChat (a6b0a4c from 19.09.2023) my device with 8 GB RAM managed to run a 7B model, but now none of the 7B models present in the app work, even if I clean up the RAM completely. The error is CL_OUT_OF_RESOURCES in opencl_device_api.cc:246.
Llama prints the error message in the chat; Mistral loads the model successfully, but once generation starts the app crashes with the same error.
Snapdragon 860
How to create these .wasm / -mali.so files?
Some manual or documentation on this is needed.
[request] Add Microsoft Phi 3 SLM
https://azure.microsoft.com/en-us/blog/introducing-phi-3-redefining-whats-possible-with-slms/
Phi-3 models are the most capable and cost-effective small language models (SLMs) available, outperforming models of the same size and next size up across a variety of language, reasoning, coding, and math benchmarks. This release expands the selection of high-quality models for customers, offering more practical choices as they compose and build generative AI applications.
Please support Llama 3
I downloaded the model from https://huggingface.co/mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC, but the APK here can't identify this model.
pre-built error raise FileNotFoundError(err_msg) FileNotFoundError: Cannot find the model library that corresponds to `None`.
[Request] Please update rwkv-raven-{1b5, 3b, 7b}-q8f16_0 to _1
As far as I understand, the latest APK requires _1 models? I'd like to try RWKV, because it didn't work with the original APK. What's the difference between the _0 and _1 versions; are they incompatible?
Gemma isn't working
It doesn't reply, and after a few retries it throws:
MLCChat failed
Stack trace:
org.apache.tvm.Base$TVMError: InternalError: Check failed: (e == CL_SUCCESS) is false: OpenCL Error, code=-54: CL_INVALID_WORK_GROUP_SIZE
Stack trace:
File "/Users/kartik/mlc/mlc-llm/3rdparty/tvm/src/runtime/opencl/opencl_module.cc", line 90
at org.apache.tvm.Base.checkCall(Base.java:173)
at org.apache.tvm.Function.invoke(Function.java:130)
at ai.mlc.mlcllm.ChatModule.decode(ChatModule.java:74)
at ai.mlc.mlcchat.AppViewModel$ChatState$requestGenerate$1$2.invoke(AppViewModel.kt:669)
at ai.mlc.mlcchat.AppViewModel$ChatState$requestGenerate$1$2.invoke(AppViewModel.kt:668)
at ai.mlc.mlcchat.AppViewModel$ChatState.callBackend(AppViewModel.kt:548)
at ai.mlc.mlcchat.AppViewModel$ChatState.requestGenerate$lambda$4(AppViewModel.kt:668)
at ai.mlc.mlcchat.AppViewModel$ChatState.$r8$lambda$lluIrcsPALEW5nCb2tohZYadhTY(Unknown Source:0)
at ai.mlc.mlcchat.AppViewModel$ChatState$$ExternalSyntheticLambda3.run(Unknown Source:6)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:487)
at java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:644)
at java.lang.Thread.run(Thread.java:1012)
Error message:
InternalError: Check failed: (e == CL_SUCCESS) is false: OpenCL Error, code=-54: CL_INVALID_WORK_GROUP_SIZE
Stack trace:
File "/Users/kartik/mlc/mlc-llm/3rdparty/tvm/src/runtime/opencl/opencl_module.cc", line 90
[request] Please generate the .wasm files for Zephyr 1.6B
Zephyr 1.6B shards exist on Huggingface, and seem to be diligently updated:
https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/tree/main
However, the corresponding .wasm files are not available in this repository?
[Error] Mistral-7B-Instruct-v0.2 Model is creating garbage responses to prompts
I'm trying to build the MLC Chat Android app using the prebuilt models provided in this repository, as per the instructions in https://llm.mlc.ai/docs/prebuilt_models.html#overview. All LLMs are working fine and provide responses as expected, but the Mistral-7B-Instruct-v0.2 model alone generates garbage responses to the prompts.
This is happening for all kinds of prompts sent to the model.
Please help me with this error.
Uncaught Error: Cannot find model_url for Mistral-7B-Instruct-v0.1-q4f32_1 when running Chrome Extension example