Comments (6)
I couldn't reproduce this on my end, but after sleeping on it I think it might have to do with Hugging Face Accelerate. Will investigate today.
What hardware are you running, and do you have Accelerate installed?
from alpaca-lora.
If you are using a V100 this might be of interest: huggingface/transformers#21955 (comment). Tweaking the llm_int8_threshold should maybe help. Also make sure you are using one of the latest bitsandbytes versions (at least 0.37.0).
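For concreteness, a minimal sketch of what tweaking that threshold looks like (assuming a transformers version recent enough to have BitsAndBytesConfig, plus bitsandbytes >= 0.37.0; the model id and the 5.0 value are just examples):

import torch
from transformers import BitsAndBytesConfig, LLaMAForCausalLM

# llm_int8_threshold controls which activation outliers stay in fp16 instead
# of int8; the default is 6.0, and lowering it routes more dimensions
# through fp16.
model = LLaMAForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True, llm_int8_threshold=5.0),
    torch_dtype=torch.float16,
    device_map={"": 0},
)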
from alpaca-lora.
I'm using a V100 and have installed the latest Accelerate.
from alpaca-lora.
I will try it
from alpaca-lora.
Unfortunately, I have forgotten the parameter settings from when my problem occurred, because I tried some alternatives, such as modifying num_beams. I'm sure your solution works because it's similar to mine. Previously, I also observed that some answers could not be generated and null responses were returned, which I hope will be resolved; I will keep testing.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch
from peft import PeftModel
from transformers import LLaMATokenizer, LLaMAForCausalLM, GenerationConfig, BitsAndBytesConfig

tokenizer = LLaMATokenizer.from_pretrained("decapoda-research/llama-7b-hf", cache_dir="./cache/")

# llm_int8_threshold defaults to 6.0; 5.0 keeps more outlier dimensions in
# fp16, per the transformers issue linked above. load_in_8bit is passed only
# via the quantization_config (passing it twice is redundant, and newer
# transformers versions reject the duplication).
model = LLaMAForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True, llm_int8_threshold=5.0),
    torch_dtype=torch.float16,
    device_map={'': 0},
    cache_dir="./cache/",
)
model = PeftModel.from_pretrained(
    model, "tloen/alpaca-lora-7b", torch_dtype=torch.float16, cache_dir="./cache/", device_map={'': 0}
)

# generate_prompt is alpaca-lora's prompt builder from generate.py; a sketch
# of it appears at the end of this thread.
def evaluate(instruction, input=None, **kwargs):
    prompt = generate_prompt(instruction, input)
    inputs = tokenizer(prompt, return_tensors="pt")
    input_ids = inputs["input_ids"].cuda()
    generation_config = GenerationConfig(
        temperature=0.7,
        top_p=1.0,
        num_beams=5,
        **kwargs,
    )
    generation_output = model.generate(
        input_ids=input_ids,
        generation_config=generation_config,
        return_dict_in_generate=True,
        output_scores=True,
        max_new_tokens=1024,
    )
    s = generation_output.sequences[0]
    output = tokenizer.decode(s)
    return output.split("### Response:")[1].strip()

if __name__ == "__main__":
    # testing code for readme
    for instruction in [
        "Tell me about alpacas.",
        "Tell me about the president of Mexico in 2019.",
        "Tell me about the king of France in 2019.",
        "List all Canadian provinces in alphabetical order.",
        "Write a Python program that prints the first 10 Fibonacci numbers.",
        "Write a program that prints the numbers from 1 to 100. But for multiples of three print 'Fizz' instead of the number and for the multiples of five print 'Buzz'. For numbers which are multiples of both three and five print 'FizzBuzz'.",
        "Tell me five words that rhyme with 'shock'.",
        "Translate the sentence 'I have no mouth but I must scream' into Spanish.",
        "Count up from 1 to 500.",
    ]:
        print("Instruction:", instruction)
        print("Response:", evaluate(instruction))
        print()
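Note that temperature, top_p, and num_beams are fixed inside evaluate, so only other GenerationConfig fields can be overridden through **kwargs; a quick usage sketch (repetition_penalty is just an example):

# Extra generation options can be forwarded through **kwargs:
print(evaluate("Tell me about alpacas.", repetition_penalty=1.3))

# Passing num_beams, temperature, or top_p this way would raise
# "TypeError: got multiple values for keyword argument", since they are
# already set explicitly inside evaluate().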
from alpaca-lora.
In your code:
def evaluate(instruction, input=None, **kwargs):
    prompt = generate_prompt(instruction, input)
    inputs = tokenizer(prompt, return_tensors="pt")
    input_ids = inputs["input_ids"].cuda()
    generation_config = GenerationConfig(
        temperature=0.7,
        top_p=1.0,
        num_beams=5,
        **kwargs,
    )
    generation_output = model.generate(
        input_ids=input_ids,
        generation_config=generation_config,
        return_dict_in_generate=True,
        output_scores=True,
        max_new_tokens=1024,
    )
    s = generation_output.sequences[0]
    output = tokenizer.decode(s)
    return output.split("### Response:")[1].strip()
- What is the function generate_prompt?
- input is None, so shouldn't inputs["input_ids"] return an error?
from alpaca-lora.
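For reference, generate_prompt is defined in alpaca-lora's generate.py and just builds the Alpaca instruction template; input=None is handled explicitly there, so inputs["input_ids"] does not error. A close sketch:

def generate_prompt(instruction, input=None):
    # Alpaca template, as in alpaca-lora's generate.py: the "### Input:"
    # section is simply omitted when input is None.
    if input:
        return f"""Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{input}

### Response:"""
    return f"""Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:"""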