Comments (6)
I couldn't reproduce this on my end, but after sleeping on it I think it might have to do with Hugging Face Accelerate. Will investigate today.
What hardware are you running, and do you have Accelerate installed?
from alpaca-lora.
If you are using a V100 this might be of interest: huggingface/transformers#21955 (comment). Tweaking the llm_int8_threshold should maybe help. Also make sure you are using one of the latest bitsandbytes versions (at least 0.37.0).
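For concreteness, a minimal sketch of what tweaking that threshold looks like (assuming a transformers version recent enough to have BitsAndBytesConfig, plus bitsandbytes >= 0.37.0; the model id and the 5.0 value are just examples):

import torch
from transformers import BitsAndBytesConfig, LLaMAForCausalLM

# llm_int8_threshold controls which activation outliers stay in fp16 instead
# of int8; the default is 6.0, and lowering it routes more dimensions
# through fp16.
model = LLaMAForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True, llm_int8_threshold=5.0),
    torch_dtype=torch.float16,
    device_map={"": 0},
)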
from alpaca-lora.
I'm using a V100 and have installed the latest Accelerate.
from alpaca-lora.
I will try it
from alpaca-lora.
Unfortunately, I have forgotten the parameter settings from when my problem occurred, because I tried some alternatives, such as modifying num_beams. I'm sure your solution works because it's similar to mine. Previously, I also observed that some answers could not be generated and null responses were returned, which I hope will be resolved; I will keep testing.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch
from peft import PeftModel
from transformers import LLaMATokenizer, LLaMAForCausalLM, GenerationConfig, BitsAndBytesConfig

tokenizer = LLaMATokenizer.from_pretrained("decapoda-research/llama-7b-hf", cache_dir="./cache/")

# llm_int8_threshold defaults to 6.0; 5.0 keeps more outlier dimensions in
# fp16, per the transformers issue linked above. load_in_8bit is passed only
# via the quantization_config (passing it twice is redundant, and newer
# transformers versions reject the duplication).
model = LLaMAForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True, llm_int8_threshold=5.0),
    torch_dtype=torch.float16,
    device_map={'': 0},
    cache_dir="./cache/",
)
model = PeftModel.from_pretrained(
    model, "tloen/alpaca-lora-7b", torch_dtype=torch.float16, cache_dir="./cache/", device_map={'': 0}
)

# generate_prompt is alpaca-lora's prompt builder from generate.py; a sketch
# of it appears at the end of this thread.
def evaluate(instruction, input=None, **kwargs):
    prompt = generate_prompt(instruction, input)
    inputs = tokenizer(prompt, return_tensors="pt")
    input_ids = inputs["input_ids"].cuda()
    generation_config = GenerationConfig(
        temperature=0.7,
        top_p=1.0,
        num_beams=5,
        **kwargs,
    )
    generation_output = model.generate(
        input_ids=input_ids,
        generation_config=generation_config,
        return_dict_in_generate=True,
        output_scores=True,
        max_new_tokens=1024,
    )
    s = generation_output.sequences[0]
    output = tokenizer.decode(s)
    return output.split("### Response:")[1].strip()

if __name__ == "__main__":
    # testing code for readme
    for instruction in [
        "Tell me about alpacas.",
        "Tell me about the president of Mexico in 2019.",
        "Tell me about the king of France in 2019.",
        "List all Canadian provinces in alphabetical order.",
        "Write a Python program that prints the first 10 Fibonacci numbers.",
        "Write a program that prints the numbers from 1 to 100. But for multiples of three print 'Fizz' instead of the number and for the multiples of five print 'Buzz'. For numbers which are multiples of both three and five print 'FizzBuzz'.",
        "Tell me five words that rhyme with 'shock'.",
        "Translate the sentence 'I have no mouth but I must scream' into Spanish.",
        "Count up from 1 to 500.",
    ]:
        print("Instruction:", instruction)
        print("Response:", evaluate(instruction))
        print()
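Note that temperature, top_p, and num_beams are fixed inside evaluate, so only other GenerationConfig fields can be overridden through **kwargs; a quick usage sketch (repetition_penalty is just an example):

# Extra generation options can be forwarded through **kwargs:
print(evaluate("Tell me about alpacas.", repetition_penalty=1.3))

# Passing num_beams, temperature, or top_p this way would raise
# "TypeError: got multiple values for keyword argument", since they are
# already set explicitly inside evaluate().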
from alpaca-lora.
In your code:
def evaluate(instruction, input=None, **kwargs):
    prompt = generate_prompt(instruction, input)
    inputs = tokenizer(prompt, return_tensors="pt")
    input_ids = inputs["input_ids"].cuda()
    generation_config = GenerationConfig(
        temperature=0.7,
        top_p=1.0,
        num_beams=5,
        **kwargs,
    )
    generation_output = model.generate(
        input_ids=input_ids,
        generation_config=generation_config,
        return_dict_in_generate=True,
        output_scores=True,
        max_new_tokens=1024,
    )
    s = generation_output.sequences[0]
    output = tokenizer.decode(s)
    return output.split("### Response:")[1].strip()
- What is the function generate_prompt?
- input is None, so shouldn't inputs["input_ids"] return an error?
from alpaca-lora.
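For reference, generate_prompt is defined in alpaca-lora's generate.py and just builds the Alpaca instruction template; input=None is handled explicitly there, so inputs["input_ids"] does not error. A close sketch:

def generate_prompt(instruction, input=None):
    # Alpaca template, as in alpaca-lora's generate.py: the "### Input:"
    # section is simply omitted when input is None.
    if input:
        return f"""Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{input}

### Response:"""
    return f"""Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:"""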