[Bug] Threadgroup memory size exceeds the maximum threadgroup memory allowed (mlc-llm, closed)

caenopy commented on August 16, 2024

[Bug] Threadgroup memory size exceeds the maximum threadgroup memory allowed

from mlc-llm.

Comments (4)

caenopy commented on August 16, 2024

I'm hitting the above error when self.decode_func() is called:

mlc-llm/python/mlc_llm/testing/debug_chat.py

Lines 316 to 323 in 437166a

    
    def _decode(self, token: int, kv_caches: Object):
        embedding, _ = self._embed(
            tvm.nd.array(np.array([token]).astype("int32"), device=self.device)
        )
        self.begin_forward_func(kv_caches, ShapeTuple([0]), ShapeTuple([1]))
        logits, kv_caches = self.decode_func(embedding, kv_caches, self.params)
        self.end_forward_func(kv_caches)
        return logits

In the provided tokenizer.json, decoder is set to 'null'. We prompt these models with tokens directly and are not doing BPE, so I want some simple passthrough here.

Another clue is that when I initialize DebugChat, I'm greeted with "Warning: Decoder field is not found in tokenizer.json. Use ByteFallback as default."

Any ideas? Could this be a config issue related to tokenizer.json?
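For anyone checking the same thing, here is a minimal sketch for seeing whether the `decoder` field in a tokenizer.json is absent, explicitly null, or set. The helper name `decoder_status` is hypothetical and not part of mlc-llm; it uses only the standard `json` module and assumes the file is a single top-level JSON object.

```python
import json


def decoder_status(tokenizer_json_text: str) -> str:
    """Classify the top-level "decoder" field of a tokenizer.json document.

    Hypothetical helper for illustration; not an mlc-llm API.
    """
    config = json.loads(tokenizer_json_text)
    if "decoder" not in config:
        return "absent"  # the key does not appear at all
    if config["decoder"] is None:
        return "null"    # the key is present but set to JSON null
    return "set"         # some decoder configuration is present


if __name__ == "__main__":
    # Assumes tokenizer.json is in the current working directory.
    with open("tokenizer.json", encoding="utf-8") as f:
        print(decoder_status(f.read()))
```

A result of "null" or "absent" matches the situation described above, where DebugChat warns and falls back to ByteFallback.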


Hzfengsy commented on August 16, 2024

It's not a tokenizer issue but a KVCache kernel issue. I guess that's because the kernel is not fully compatible with fp32.


caenopy commented on August 16, 2024

That was it, thank you! The original problem is resolved by using a different quantization.

The docs are a bit unclear: does the "0" in q0f** mean no weight quantization?


MasterJH5574 commented on August 16, 2024

does the "0" in q0f** mean no weight quantization?

@caenopy Yes, that's true. Thanks for pointing this out; we'll try to update the docs for better clarity.
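To make the naming concrete, here is a small sketch that splits a quantization code such as q0f32 or q4f16_1 into its parts. It assumes the pattern `q<weight bits>f<float bits>` with an optional `_<variant>` suffix, and treats 0 weight bits as "no weight quantization", as confirmed above. `QuantCode` and `parse_quant_code` are hypothetical names for illustration, not mlc-llm APIs.

```python
import re
from typing import NamedTuple, Optional


class QuantCode(NamedTuple):
    weight_bits: int        # digits after "q"; 0 means weights are not quantized
    float_bits: int         # digits after "f", e.g. 16 or 32
    variant: Optional[str]  # optional suffix after "_", e.g. "1"

    @property
    def weights_quantized(self) -> bool:
        return self.weight_bits != 0


# Assumed naming pattern: q<bits>f<bits>, optionally followed by _<variant>.
_PATTERN = re.compile(r"^q(\d+)f(\d+)(?:_(\w+))?$")


def parse_quant_code(code: str) -> QuantCode:
    """Parse a quantization code string; raise ValueError if it does not match."""
    match = _PATTERN.match(code)
    if match is None:
        raise ValueError(f"unrecognized quantization code: {code!r}")
    return QuantCode(int(match.group(1)), int(match.group(2)), match.group(3))


if __name__ == "__main__":
    for name in ("q0f16", "q0f32", "q4f16_1"):
        parsed = parse_quant_code(name)
        print(name, parsed, "weights quantized:", parsed.weights_quantized)
```

Under this reading, q0f32 keeps unquantized fp32 weights, which is the configuration that triggered the threadgroup memory error in this thread.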

