Comments (7)
in localGPT/run_localGPT.py
- Add import torch and from transformers import AutoTokenizer, AutoModelForCausalLM at the beginning
- In the load_model() function, change LlamaTokenizer to AutoTokenizer
- Change LlamaForCausalLM to AutoModelForCausalLM
- Add the following options to the AutoModelForCausalLM.from_pretrained() call: device_map='auto', torch_dtype=torch.float16

Tested on model TheBloke/Wizard-Vicuna-13B-Uncensored-HF · Hugging Face
from localgpt.
I'm also interested in this. I can't get it on the GPU for some reason.
from localgpt.
Will test it later today, I'll keep you guys updated!
from localgpt.
Would you mind posting the functions?
I tried to do that and it returns an error for me...
from localgpt.
This is what I ended up doing (imports shown for completeness):

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, LlamaTokenizer, LlamaForCausalLM, pipeline
from langchain.llms import HuggingFacePipeline

gpu = True

def load_model():
    model_id = "TheBloke/vicuna-7B-1.1-HF"
    # model_id = "mayaeary/pygmalion-6b_dev-4bit-128g"
    # model_id = "TheBloke/wizardLM-7B-GPTQ"
    if gpu:
        # device_map='auto' spreads the fp16 weights across available GPUs
        tokenizer = AutoTokenizer.from_pretrained(model_id)
        model = AutoModelForCausalLM.from_pretrained(
            model_id,
            device_map='auto',
            torch_dtype=torch.float16,
        )
    else:
        tokenizer = LlamaTokenizer.from_pretrained(model_id)
        model = LlamaForCausalLM.from_pretrained(model_id)
    pipe = pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer,
        max_length=2048,
        temperature=0,
        top_p=0.95,
        repetition_penalty=1.15,
    )
    local_llm = HuggingFacePipeline(pipeline=pipe)
    return local_llm

You will probably need a 24GB GPU to run that model though.
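The 24GB figure can be sanity-checked from the weight size alone; this back-of-the-envelope sketch (not from the thread) counts only the fp16 weights, and activations plus the KV cache come on top:

```python
# Rough VRAM needed just to hold the weights, assuming float16 (2 bytes/parameter).
# Activations, the KV cache, and CUDA overhead are not included.
def weight_vram_gb(n_params_billion, bytes_per_param=2):
    return n_params_billion * 1e9 * bytes_per_param / 1024**3

print(f"7B  fp16: {weight_vram_gb(7):.1f} GB")   # ~13 GB
print(f"13B fp16: {weight_vram_gb(13):.1f} GB")  # ~24 GB, hence the 24GB card
```

This is also why the 7B variant is a reasonable fallback on a 16GB card.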
from localgpt.
I solved a similar (possibly the same) issue by reinstalling torch with the CUDA 11.8 wheel:
pip install torch --index-url https://download.pytorch.org/whl/cu118 --upgrade --force-reinstall
Source: adapted from https://stackoverflow.com/a/76144354/885761
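After the reinstall it's worth confirming that the wheel you got is actually a CUDA build; a quick check (nothing localGPT-specific):

```python
import torch

# A CPU-only wheel reports torch.version.cuda as None; the cu118 wheel reports '11.8'.
print("CUDA build:", torch.version.cuda)
# True only if the build has CUDA support AND a compatible GPU/driver is visible.
print("CUDA available:", torch.cuda.is_available())
```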
from localgpt.
None of these solutions works for me; it's still running on CPU. :(
Edit: Sorry, I was a noob. The model I ran doesn't work on GPU, so I changed to a different model and now my GPU is running at 100% from both Anaconda and WSL2.
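If you're unsure whether a model actually landed on the GPU, you can inspect the devices its parameters live on; a sketch (not from the thread), where `model` stands in for whatever load_model() loaded:

```python
import torch

# Collect the set of devices a model's parameters are placed on.
# With device_map='auto' on a working CUDA setup you'd expect {'cuda:0'}
# (or several cuda devices); a silent CPU fallback shows up as {'cpu'}.
def model_devices(model):
    return {str(p.device) for p in model.parameters()}

# Example with a tiny stand-in module, so no big download is needed:
print(model_devices(torch.nn.Linear(4, 4)))  # modules default to {'cpu'}
```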
from localgpt.