
Comments (22)

bigcat88 commented on August 21, 2024

Also, in ComfyUI the device can be detected like this (from model_management.py):

import comfy.model_management as mm

device = mm.get_torch_device()

if mm.is_device_mps(device):
    pass
if mm.is_device_cuda(device):  
    pass

I can make a PR for this and other parts related to MPS a little bit later if you wish :)


heshengtao commented on August 21, 2024

auto-gptq is a library used to invoke the Qwen model. It’s quite late here now, and in the next few days, I will adjust the way Qwen is invoked and try to find a way around this auto-gptq library.


bigcat88 commented on August 21, 2024

I'm not in a rush at all, so it's okay.

As far as I remember, Qwen is almost no different from the other models, and I was already able to load Llama without auto-gptq into the node on macOS (I had to change a few small things for this).

I do not see any use of auto-gptq; maybe I am just too tired already, or blind :)

comfyui_LLM_party/llm.py

Lines 736 to 760 in b71c58f

elif model_type=="Qwen":
qwen_device = "cuda" if torch.cuda.is_available() else "cpu"
if qwen_tokenizer=="":
qwen_tokenizer = AutoTokenizer.from_pretrained(tokenizer_path, revision='master', trust_remote_code=True)
if qwen_model=="":
if device=="cuda":
qwen_model = AutoModelForCausalLM.from_pretrained(model_path, device_map="cuda", trust_remote_code=True, fp32=True,fp16=False,bf16=False).eval()
elif device=="cpu":
qwen_model = AutoModelForCausalLM.from_pretrained(model_path, device_map="cpu", trust_remote_code=True, fp32=True,fp16=False,bf16=False).eval()
else:
qwen_model = AutoModelForCausalLM.from_pretrained(model_path, device_map="cuda", trust_remote_code=True, fp16=True,fp32=False,bf16=False).eval()
qwen_model.eval()
qwen_model.generation_config = GenerationConfig.from_pretrained(model_path, trust_remote_code=True)
response, history = llm_chat(qwen_model,qwen_tokenizer,user_prompt,history,qwen_device,max_length)
while "Action Input:" in response:
print(response)
pattern_A = r"Action: (.*?)\n"
pattern_B = r"Action Input: (.*?)\n"
Action = re.search(pattern_A, response).group(1)
ActionInput = re.search(pattern_B, response).group(1)
ActionInput=json.loads(ActionInput.replace("'", '"'))
result = dispatch_tool(Action,ActionInput)
print(result)
response, history = llm_chat(qwen_model,qwen_tokenizer,result,history,qwen_device,max_length,role="observation")


heshengtao commented on August 21, 2024

I found that when I invoked Qwen's GPTQ model, there was an error due to a missing third-party library, so I added that library to the requirements at the time. If AutoGPTQ is removed entirely, I believe the unquantized Qwen should still run (though I don't have enough VRAM to verify, haha). I assumed that most users might not have enough VRAM and would therefore need the quantized model, so I included AutoGPTQ in the requirements. I am willing to sacrifice support for the quantized model for the sake of cross-platform requirements, so I will temporarily remove the AutoGPTQ library; the code should still run correctly. As for how to invoke the quantized model, I will think of a solution.


heshengtao commented on August 21, 2024

I have already removed AutoGPTQ from the requirements. Since you mentioned you had to make some minor adjustments, I would like to know whether removing AutoGPTQ makes those changes unnecessary. Are there any other adjustments needed?


bigcat88 commented on August 21, 2024

As for how to invoke the quantized model, I will think of a solution.

You can probably mention this in the readme, something like:
"if you want to support quantized models, you need to install auto-gptq"

Similar to how ComfyUI mentions different packages for different video cards/OSes in its readme file.

If needed, it is simple to check whether auto-gptq is installed and can be imported.
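For example, a minimal sketch of such a check (the variable name and message are illustrative, not part of the project):

import importlib.util

# Optional GPTQ support: only enabled when the auto-gptq package can be found.
GPTQ_AVAILABLE = importlib.util.find_spec("auto_gptq") is not None
if not GPTQ_AVAILABLE:
    print("auto-gptq is not installed; quantized (GPTQ) Qwen models will be unavailable.")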


Are there any other adjustments needed?

In the LLM_local class, mps needs to be added to the device options.

The autodetection of the default device can be changed to this:

"default": "cuda" if torch.cuda.is_available() else ("mps" if torch.backends.mps.is_available() else "cpu"),

MPS is used just like CUDA as a device, but the latest PyTorch (2.3) only supports fp32/fp16, and maybe int8, on it.
IMHO, supporting only fp32 from the start will be enough, as MacBooks usually have plenty of RAM; support for fp16/int8 can be added later.
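A minimal sketch of what that could look like together with the fp32-first restriction (function names are illustrative, not the project's actual code):

import torch

def detect_default_device() -> str:
    # Prefer CUDA, then Apple's MPS backend, then fall back to CPU.
    if torch.cuda.is_available():
        return "cuda"
    if torch.backends.mps.is_available():
        return "mps"
    return "cpu"

def pick_dtype(device: str) -> torch.dtype:
    # Start conservatively: fp16 on CUDA, fp32 everywhere else (including MPS).
    return torch.float16 if device == "cuda" else torch.float32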


bigcat88 commented on August 21, 2024

Also, in ComfyUI the device can be detected like this (from model_management.py):

import comfy.model_management as mm

device = mm.get_torch_device()

if mm.is_device_mps(device):
    pass
if mm.is_device_cuda(device):  
    pass


heshengtao commented on August 21, 2024

Thank you for your contribution! I have just freed myself from the tedious work, and I will take the time to modify the issues you mentioned, although there might be a bit of procrastination haha. Thank you again for your dedication to this project!


heshengtao commented on August 21, 2024

Also, in ComfyUI the device can be detected like this (from model_management.py):

import comfy.model_management as mm

device = mm.get_torch_device()

if mm.is_device_mps(device):
    pass
if mm.is_device_cuda(device):  
    pass

If you wish, I can make a PR for this and the other parts related to MPS a little later :)

I missed this message earlier, but if you could provide a PR for these parts, that would be really great. Thank you for your help and support!


heshengtao commented on August 21, 2024

I have already added the code related to MPS, but I do not have the relevant equipment on hand to test it. If you find any issues with this part of the code, please let me know at any time.


bigcat88 commented on August 21, 2024

AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True).mps()  # there is no ".mps()" function

should be

llama_model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True).to(device)

or

llama_model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True).to("mps")

After that it partially works with the TinyLlama model for one run (when I just open the flow and execute, it works).

But when I press "Queue prompt" a second time I get:
cannot access local variable 'llama_device' where it is not associated with a value

I'm currently looking into this and will report back soon.


bigcat88 commented on August 21, 2024

When is_reload is False, llama_device is undefined on the second run.
So it is not related to MPS; the same error would occur for CUDA too.

I have a question related to this: why are glm_tokenizer, llama_tokenizer, and qwen_tokenizer not a single variable (model_tokenizer)?
The same question applies to glm_model, llama_model, and so on.

Also, the local variable llama_device (glm_device) could be unified into one variable too, and it needs to be global.

Or was there a specific reason for this?
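As an illustration of what that unification might look like (a rough sketch with hypothetical names, not a proposal for the exact implementation):

# One shared cache for all local backends instead of per-model globals.
model_cache = {
    "model": None,       # replaces glm_model / llama_model / qwen_model
    "tokenizer": None,   # replaces glm_tokenizer / llama_tokenizer / qwen_tokenizer
    "device": None,      # replaces glm_device / llama_device / qwen_device
    "key": None,         # (model_path, device, dtype), used to decide when to reload
}

def needs_reload(model_path, device, dtype):
    # Reload only when the user changed any of the parameters since the last run.
    return model_cache["key"] != (model_path, device, dtype)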


heshengtao commented on August 21, 2024

fix this bug!


heshengtao commented on August 21, 2024

Today, while modifying the code, I put the assignment of llama_device inside the if llama_model == "": block, so when the model was still loaded on the second run, that block was skipped and llama_device was never assigned.
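In sketch form, the fix amounts to moving the device assignment out of the reload branch (the loader stub and paths below are illustrative, not the actual diff):

import torch

def load_llama(model_path: str, device: str):
    """Hypothetical loader stub, shown only to illustrate the control flow."""
    ...

llama_model = ""  # "" means "not loaded yet", matching the convention in the snippets above

# Buggy shape: the device assignment lived inside the reload branch,
# so on a second run with the model still loaded, llama_device was never set.
#
#     if llama_model == "":
#         llama_device = "cuda" if torch.cuda.is_available() else "cpu"
#         llama_model = load_llama("path/to/model", llama_device)
#
# Fixed shape: assign the device unconditionally, before the reload check.
llama_device = "cuda" if torch.cuda.is_available() else "cpu"
if llama_model == "":
    llama_model = load_llama("path/to/model", llama_device)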


heshengtao commented on August 21, 2024

I've made modifications to the code related to MPS compatibility. A new issue has arisen: when I introduce int8 and int4 precision, I have to use the bitsandbytes library, which is not adapted for MPS. Currently, I prevent the bitsandbytes library from being imported when the device does not have CUDA. However, I'm still concerned that macOS users might encounter environment dependency errors when bitsandbytes is installed. Could you please help me test it? Thank you very much!
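A minimal sketch of that kind of guard (variable names are illustrative; the point is that bitsandbytes is only touched when CUDA is present):

import torch

BNB_AVAILABLE = False
if torch.cuda.is_available():
    try:
        import bitsandbytes  # noqa: F401  only meaningful on CUDA systems
        BNB_AVAILABLE = True
    except ImportError:
        pass

# int8/int4 options are offered only when bitsandbytes could actually be used.
dtype_choices = ["fp32", "fp16"] + (["int8", "int4"] if BNB_AVAILABLE else [])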


bigcat88 commented on August 21, 2024

There was no time during the week, there was too much work.
I will check all of this over the weekend :)


heshengtao commented on August 21, 2024

I really, really appreciate you! There's nothing more joyful than completing an interesting project with like-minded individuals. For this reason, I've even hidden an easter egg in the project as a little surprise, haha!


bigcat88 commented on August 21, 2024

Tested the repo with the latest commits; on MPS it works.

But now on the cpu I get Placeholder storage has not been allocated on MPS device!.

Shall we close this issue as being about MPS, and create a new one for the CPU problem?

Testing on AMD (CUDA version) is also expected today.


bigcat88 commented on August 21, 2024

But now on the cpu I get Placeholder storage has not been allocated on MPS device!.

This happens only if I generate on mps the first time and then switch to cpu,
or the opposite: generate on cpu the first time and then switch to mps - the same error appears.

Error/Warning:

You are calling .generate() with the input_ids being on a device type different than your model's device. input_ids is on mps, whereas the model is on cpu. You may experience unexpected behaviors or slower generation. Please make sure that you have put input_ids to the correct device by calling for example input_ids = input_ids.to('cpu') before running .generate()
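That warning points to the usual fix of moving the tokenized inputs onto the model's current device before generating; a minimal sketch (the helper name and parameters are illustrative, not the project's code):

def generate_on_model_device(model, tokenizer, user_prompt: str, max_new_tokens: int = 256) -> str:
    # Move the tokenized inputs to whatever device the model currently lives on (cpu, cuda or mps).
    inputs = tokenizer(user_prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)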


heshengtao commented on August 21, 2024

Yes, actually there is a problem with the code I wrote. I only accounted for loading the model onto the corresponding device when it’s not loaded, but I didn’t write the code to switch devices after it’s loaded. I admit I was being lazy because I originally thought no one would load a model and then switch devices, haha. I’ll immediately add the code for this part.
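A rough sketch of that missing piece (the helper name and arguments are hypothetical, not the actual commit): if the model is already loaded and the user picked a different device, move it instead of reloading.

def ensure_model_on_device(model, current_device: str, target_device: str):
    # Move an already loaded model when the user switches devices after loading.
    if model is not None and current_device != target_device:
        model = model.to(target_device)
        current_device = target_device
    return model, current_device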


heshengtao commented on August 21, 2024

Fixed a bug where switching devices, dtypes, or model types would cause the model to throw an error. Now, when users switch these parameters, the model will reload.


bigcat88 commented on August 21, 2024

Thank you, tested and it works now on macOS on both cpu and mps :)

