Comments (4)
I'm hitting the above error when self.decode_func() is called:
mlc-llm/python/mlc_llm/testing/debug_chat.py
Lines 316 to 323 in 437166a
In the provided tokenizer.json, decoder is set to 'null'. We prompt these models with tokens directly and do not use BPE, so I want a simple passthrough here.
Another clue is that when I initialize DebugChat, I'm greeted with "Warning: Decoder field is not found in tokenizer.json. Use ByteFallback as default."
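For anyone checking the same thing, here is a minimal sketch of how the decoder field could be inspected with the standard library. It assumes the usual Hugging Face tokenizer.json layout, where "decoder" is a top-level key that is either null or an object with a "type" entry. The helper names decoder_type and load_decoder_type are hypothetical and are not part of mlc-llm.

```python
import json
from pathlib import Path


def decoder_type(config):
    """Return the decoder "type" from a parsed tokenizer.json dict.

    Returns None when the "decoder" key is missing or set to null,
    which is the case described in this comment.
    """
    decoder = config.get("decoder")
    if decoder is None:
        return None
    return decoder.get("type")


def load_decoder_type(tokenizer_json_path):
    """Read tokenizer.json from disk and report its decoder type (hypothetical helper)."""
    text = Path(tokenizer_json_path).read_text(encoding="utf-8")
    return decoder_type(json.loads(text))
```

Calling load_decoder_type("tokenizer.json") on the file in question should return None if the decoder really is null. This only confirms what the warning already reports; it does not change how DebugChat decodes.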
Any ideas? Could this be a config issue related to tokenizer.json?
from mlc-llm.
It's not a tokenizer issue but a KVCache kernel issue. My guess is that the kernel is not fully compatible with fp32.
from mlc-llm.
That was it, thank you! The original problem is resolved by using a different quantization.
The docs are a bit unclear: does the "0" in q0f** mean no weight quantization?
from mlc-llm.
does the "0" in q0f** mean no weight quantization?
@caenopy Yes, that's true. Thanks for pointing this out; we'll try to update the docs for better clarity.
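To make the naming concrete, here is a rough sketch of how a quantization name such as q0f32 or q4f16_1 could be split into its parts. Only the meaning of the "0" is confirmed in this thread. Reading the digits after "f" as the floating-point width (as in the fp32 mentioned above) and the optional suffix as a variant id are assumptions. The function parse_quantization_name is hypothetical and not an mlc-llm API.

```python
import re

# Assumed pattern: q<weight bits>f<float bits>, optionally followed by _<variant>.
_QUANT_NAME = re.compile(
    r"^q(?P<weight_bits>\d+)f(?P<float_bits>\d+)(?:_(?P<variant>\w+))?$"
)


def parse_quantization_name(name):
    """Split a name like "q4f16_1" into its numeric parts (hypothetical helper)."""
    match = _QUANT_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized quantization name: {name!r}")
    weight_bits = int(match["weight_bits"])
    return {
        "weight_bits": weight_bits,
        "float_bits": int(match["float_bits"]),
        "variant": match["variant"],  # None when there is no suffix
        # Per this thread, a weight width of 0 means the weights are not quantized.
        "weights_quantized": weight_bits != 0,
    }
```

Under these assumptions, parse_quantization_name("q0f32") reports weights_quantized as False, while parse_quantization_name("q4f16_1") reports 4-bit weights with variant "1". Names that do not follow this pattern raise ValueError rather than being guessed at.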
from mlc-llm.