sail-sg / lorahub
[COLM 2024] LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition
License: MIT License
Hi, thank you for your good work!
I was wondering if you could provide the code used to fine-tune these LoRA modules on the FLAN datasets, along with the FLAN data loading scripts and the validation data used to tune hyperparameters and perform model selection. It would be very useful to everyone in general.
Thanks,
Prateek
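Until the official scripts land, here is a minimal sketch of what such fine-tuning might look like with peft and transformers; the toy task data, target modules, and hyperparameters below are illustrative assumptions, not the authors' actual train_model.py configuration.

# Minimal sketch (NOT the authors' train_model.py): LoRA fine-tuning of
# flan-t5-large on one task with peft + transformers. Dataset contents,
# target modules, and hyperparameters are illustrative assumptions.
from datasets import Dataset
from peft import LoraConfig, TaskType, get_peft_model
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

base = "google/flan-t5-large"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSeq2SeqLM.from_pretrained(base)

# Wrap the base model with a rank-16 LoRA on the attention projections.
lora_config = LoraConfig(task_type=TaskType.SEQ_2_SEQ_LM, r=16,
                         lora_alpha=32, lora_dropout=0.05,
                         target_modules=["q", "v"])
model = get_peft_model(model, lora_config)

# Toy stand-in for one FLAN task: (instruction input, target output) pairs.
raw = Dataset.from_dict({
    "input": ["Is the following sentence acceptable? 'They drank the pub.'"],
    "output": ["no"],
})

def tokenize(example):
    tokens = tokenizer(example["input"], truncation=True, max_length=512)
    tokens["labels"] = tokenizer(example["output"], truncation=True,
                                 max_length=32)["input_ids"]
    return tokens

train_set = raw.map(tokenize, remove_columns=raw.column_names)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(output_dir="lora_out", num_train_epochs=1,
                                  per_device_train_batch_size=8,
                                  learning_rate=3e-4),
    train_dataset=train_set,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
model.save_pretrained("lora_out")  # writes only the adapter weights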
Hi! I am trying to use the LoRAHub module trained on the GLUE CoLA dataset, but it does not seem to produce the correct output. Running the following code gives the output "The next one is more accurate." rather than the expected label "1".
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from peft import PeftModel

# Load the base model and attach the CoLA adapter.
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-large")
model = PeftModel.from_pretrained(model, "lorahub/flan_t5_large-glue_cola")
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-large")

input_text = "Our friends won't buy this analysis, let alone the next one we propose."
input_ids = tokenizer(input_text, truncation=True, padding=True, return_tensors="pt").input_ids
outputs = model.generate(inputs=input_ids)
print(tokenizer.decode(outputs[0]))
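A likely cause (an assumption, not confirmed by the maintainers): these adapters were trained on FLAN-style instruction prompts, so feeding the bare sentence leaves the model without a task description or answer options. A prompt shaped like the FLAN CoLA templates may behave better:

# Hypothetical FLAN-style CoLA prompt; the exact template used in training may differ.
input_text = (
    "Our friends won't buy this analysis, let alone the next one we propose.\n"
    "Is this sentence grammatically acceptable?\n"
    "OPTIONS:\n- no\n- yes"
)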
Could you please provide an example of how to load a model, instantiate it, and apply a LoRA (Low-Rank Adaptation) using Hugging Face? Specifically, I'm interested in seeing how to download the LoRA and integrate it with a model instance, similar to the examples provided by Hugging Face.
I would like to reproduce the GLUE datasets' performance by just doing inference, to see it with my own eyes.
Thank you for the great job!
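A minimal end-to-end sketch using the standard transformers + peft APIs; the adapter ID below is one of the published LoRAHub modules, and the prompt is a placeholder:

# Sketch: download a LoRAHub adapter from the Hub and run inference with it.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from peft import PeftModel

base_model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-large")
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-large")

# PeftModel.from_pretrained downloads the adapter weights and attaches them
# to the base model.
model = PeftModel.from_pretrained(base_model, "lorahub/flan_t5_large-glue_cola")

inputs = tokenizer("Your task prompt here", return_tensors="pt")
outputs = model.generate(input_ids=inputs["input_ids"], max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))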
I tested almost all the LoRA models on the Hugging Face LoRAHub, but no single LoRA can score on the Disambiguation subset. Is it really possible that merging weights makes this miracle happen? And without the Disambiguation subset, all the LoRA models are pretty similar to the flan-t5-large result on BBH.
Some LoRAs cannot be loaded because the filename differs from the Hugging Face model ID. I will fix that problem later, but the current result is strange.
import threading
import os

from lorahub.constant import LORA_MODULE_NAMES


def run(device: int):
    for i, lora in enumerate(LORA_MODULE_NAMES):
        if i % 8 != device:
            continue
        loraname = lora.split('/')[-1]
        print(f'thread:{device} is testing {loraname}')
        os.system(f'CUDA_VISIBLE_DEVICES={device} python reproduce_bbh.py >log/{loraname}.log 2>log/{loraname}.err '
                  f'--lora_model /data/sdb/sunyuhan/exp/lorahub_models/{loraname}')
    print(device)


threads = [threading.Thread(target=run, args=[i]) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("Thread has finished.")
Looking at the radar chart, all the models are similar and none score on the Disambiguation subset.
Hi team!
I trained my own LoRA with train_model.py, and I now have a folder called 'demo' that contains my LoRA model (located in /lorahub/demo). If I would like to use it in the lorahub_learning process, should I add 'lorahub/demo' to LORA_MODULE_NAMES? I have tried but somehow failed.
Hope to get your guidance.
Thanks!
Fanjunduo Wei
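A minimal sketch of one way this might work, assuming lorahub_learning accepts a local adapter directory anywhere it accepts a Hub model ID (an assumption, not a documented guarantee); the few-shot examples are placeholders:

# Sketch: mix a locally trained adapter into the candidate pool.
import random
from lorahub.algorithm import lorahub_learning
from lorahub.constant import LORA_MODULE_NAMES

example_inputs = ["Translate to French: hello"]   # your few-shot inputs (placeholder)
example_outputs = ["bonjour"]                     # your few-shot targets (placeholder)

modules = random.sample(LORA_MODULE_NAMES, 19)
modules.append("lorahub/demo")  # path to the folder holding adapter_config.json etc.

module_weights, model, tokenizer = lorahub_learning(
    lora_module_list=modules,
    example_inputs=example_inputs,
    example_outputs=example_outputs,
    max_inference_step=40,
    batch_size=5,
)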
I am currently attempting to reproduce the experiment results outlined in your paper using the Flan-t5-large model on the BBH task. I've encountered a deviation in the results and I'm seeking assistance in addressing this issue.
I utilized the script available at this GitHub repository to conduct the evaluation.
My obtained score for Flan-t5-large on the BBH task was 31.82, whereas your paper reports a score of 27 for the same model-task combination.
To ensure consistency, I made sure to set ntrain=0 as instructed.
I kindly ask for your guidance in reproducing the experiment results as presented in your paper. It would be immensely helpful if you could provide some insights, suggestions, or clarifications to assist me in achieving consistent results.
I greatly appreciate your time and consideration in helping me address this matter. Your expertise will go a long way in resolving this discrepancy and advancing my understanding of the experiment.
Thank you in advance for your assistance!
[2023-08-15 14:55:45,077] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
{'data_dir': 'lukaemon/bbh', 'ntrain': 0, 'kwargs': {'model_name': 'seq_to_seq', 'model_path': '/data/sdb/sunyuhan/exp/flan-t5-large'}, 'args': Namespace(data_dir='lukaemon/bbh', ntrain=0, kwargs={'model_name': 'seq_to_seq', 'model_path': '/data/sdb/sunyuhan/exp/flan-t5-large'}), 'model': SeqToSeqModel(model_path='/data/sdb/sunyuhan/exp/flan-t5-large', max_input_length=2048, max_output_length=32, model=None, tokenizer=None, lora_path='', device='cuda', load_8bit=False, do_sample=False)}
{'prompt': 'not ( True ) and ( True ) is\nAnswer:', 'label': 'False', 'pred': 'not'}
{'name': 'boolean_expressions', 'score': 0.504}
{'prompt': 'How would a typical person answer each of the following questions about causation?\nA machine is set up in such a way that it will short circuit if both the black wire and the red wire touch the battery at the same time. The machine will not short circuit if just one of these wires touches the battery. The black wire is designated as the one that is supposed to touch the battery, while the red wire is supposed to remain in some other part of the machine. One day, the black wire and the red wire both end up touching the battery at the same time. There is a short circuit. Did the black wire cause the short circuit?\nOptions:\n- Yes\n- No\nAnswer:', 'label': 'No', 'pred': 'Yes'}
{'name': 'causal_judgement', 'score': 0.5989304812834224}
{'prompt': 'Today is Christmas Eve of 1937. What is the date tomorrow in MM/DD/YYYY?\nOptions:\n(A) 12/11/1937\n(B) 12/25/1937\n(C) 01/04/1938\n(D) 12/04/1937\n(E) 12/25/2006\n(F) 07/25/1937\nAnswer:', 'label': '(B)', 'pred': '(A)'}
{'name': 'date_understanding', 'score': 0.22}
{'prompt': 'In the following sentences, explain the antecedent of the pronoun (which thing the pronoun refers to), or state that it is ambiguous.\nSentence: The patient was referred to the specialist because he had a rare skin condition.\nOptions:\n(A) The patient had a skin condition\n(B) The specialist had a skin condition\n(C) Ambiguous\nAnswer:', 'label': '(A)', 'pred': 'The patient had a skin condition'}
{'name': 'disambiguation_qa', 'score': 0.0}
{'prompt': 'Complete the rest of the sequence, making sure that the parentheses are closed properly. Input: [ [\nAnswer:', 'label': '] ]', 'pred': ']'}
{'name': 'dyck_languages', 'score': 0.0}
{'prompt': '"Here comes a perfectly valid argument: First of all, whoever is a schoolmate of Sondra is not a stepsister of Pricilla. In consequence, whoever is not a stepsister of Pricilla is a schoolmate of Sondra."\nIs the argument, given the explicitly stated premises, deductively valid or invalid?\nOptions:\n- valid \n- invalid\nAnswer:', 'label': 'invalid', 'pred': 'valid'}
{'name': 'formal_fallacies', 'score': 0.572}
{'prompt': 'This SVG path element <path d="M 55.57,80.69 L 57.38,65.80 M 57.38,65.80 L 48.90,57.46 M 48.90,57.46 L 45.58,47.78 M 45.58,47.78 L 53.25,36.07 L 66.29,48.90 L 78.69,61.09 L 55.57,80.69"/> draws a\nOptions:\n(A) circle\n(B) heptagon\n(C) hexagon\n(D) kite\n(E) line\n(F) octagon\n(G) pentagon\n(H) rectangle\n(I) sector\n(J) triangle\nAnswer:', 'label': '(B)', 'pred': '(A)'}
{'name': 'geometric_shapes', 'score': 0.172}
{'prompt': 'Which sentence has the correct adjective order:\nOptions:\n(A) midsize old grey Brazilian sweater\n(B) midsize grey Brazilian old sweater\nAnswer:', 'label': '(A)', 'pred': '(A) old, grey, sweater'}
{'name': 'hyperbaton', 'score': 0.596}
{'prompt': 'The following paragraphs each describe a set of five objects arranged in a fixed order. The statements are logically consistent within each paragraph. On a branch, there are five birds: a quail, an owl, a raven, a falcon, and a robin. The owl is the leftmost. The robin is to the left of the raven. The quail is the rightmost. The raven is the third from the left.\nOptions:\n(A) The quail is the rightmost\n(B) The owl is the rightmost\n(C) The raven is the rightmost\n(D) The falcon is the rightmost\n(E) The robin is the rightmost\nAnswer:', 'label': '(A)', 'pred': '(A)'}
{'name': 'logical_deduction_five_objects', 'score': 0.452}
{'prompt': 'The following paragraphs each describe a set of seven objects arranged in a fixed order. The statements are logically consistent within each paragraph. In a golf tournament, there were seven golfers: Ana, Eve, Ada, Dan, Rob, Amy, and Joe. Dan finished third. Ana finished above Ada. Amy finished last. Dan finished below Rob. Eve finished below Ada. Rob finished below Joe.\nOptions:\n(A) Ana finished third\n(B) Eve finished third\n(C) Ada finished third\n(D) Dan finished third\n(E) Rob finished third\n(F) Amy finished third\n(G) Joe finished third\nAnswer:', 'label': '(D)', 'pred': '(D)'}
{'name': 'logical_deduction_seven_objects', 'score': 0.456}
{'prompt': 'The following paragraphs each describe a set of three objects arranged in a fixed order. The statements are logically consistent within each paragraph. On a branch, there are three birds: a blue jay, a quail, and a falcon. The falcon is to the right of the blue jay. The blue jay is to the right of the quail.\nOptions:\n(A) The blue jay is the second from the left\n(B) The quail is the second from the left\n(C) The falcon is the second from the left\nAnswer:', 'label': '(A)', 'pred': '(A)'}
{'name': 'logical_deduction_three_objects', 'score': 0.516}
{'prompt': 'Find a movie similar to Batman, The Mask, The Fugitive, Pretty Woman:\nOptions:\n(A) The Front Page\n(B) Maelstrom\n(C) The Lion King\n(D) Lamerica\nAnswer:', 'label': '(C)', 'pred': '(A)'}
{'name': 'movie_recommendation', 'score': 0.56}
{'prompt': '((-1 + 2 + 9 * 5) - (-2 + -4 + -4 * -7)) =\nAnswer:', 'label': '24', 'pred': '((-1 + 3) + 9 * 5) - (-4 + -6) + -23 * -7))'}
{'name': 'multistep_arithmetic_two', 'score': 0.0}
{'prompt': 'If you follow these instructions, do you return to the starting point? Always face forward. Take 1 step backward. Take 9 steps left. Take 2 steps backward. Take 6 steps forward. Take 4 steps forward. Take 4 steps backward. Take 3 steps right.\nOptions:\n- Yes\n- No\nAnswer:', 'label': 'No', 'pred': 'No'}
{'name': 'navigate', 'score': 0.576}
{'prompt': 'I have a flute, a piano, a trombone, four stoves, a violin, an accordion, a clarinet, a drum, two lamps, and a trumpet. How many musical instruments do I have?\nAnswer:', 'label': '8', 'pred': '20'}
{'name': 'object_counting', 'score': 0.288}
{'prompt': 'Here is a table where the first line is a header and each subsequent line is a penguin: name, age, height (cm), weight (kg) Louis, 7, 50, 11 Bernard, 5, 80, 13 Vincent, 9, 60, 11 Gwen, 8, 70, 15 For example: the age of Louis is 7, the weight of Gwen is 15 kg, the height of Bernard is 80 cm. We now add a penguin to the table:\nJames, 12, 90, 12\nAnd here is a similar table, but listing giraffes:\nname, age, height (cm), weight (kg)\nJody, 5, 430, 620\nGladys, 10, 420, 590\nMarian, 2, 310, 410\nDonna, 9, 440, 650\nWhich is the oldest penguin?\nOptions:\n(A) Louis\n(B) Bernard\n(C) Vincent\n(D) Gwen\n(E) James\nAnswer:', 'label': '(E)', 'pred': '(E)'}
{'name': 'penguins_in_a_table', 'score': 0.3424657534246575}
{'prompt': 'On the floor, there is one mauve cat toy, two purple cat toys, three grey cat toys, two mauve notebooks, three grey notebooks, three burgundy cat toys, and one purple notebook. If I remove all the notebooks from the floor, how many grey objects remain on it?\nOptions:\n(A) zero\n(B) one\n(C) two\n(D) three\n(E) four\n(F) five\n(G) six\n(H) seven\n(I) eight\n(J) nine\n(K) ten\n(L) eleven\n(M) twelve\n(N) thirteen\n(O) fourteen\n(P) fifteen\n(Q) sixteen\nAnswer:', 'label': '(D)', 'pred': 'There are 1 mauve cat toy, 2 purple cat toys, 3 grey cat toys, 2 grey notebooks, 3 burgundy cat'}
{'name': 'reasoning_about_colored_objects', 'score': 0.292}
{'prompt': "Which of the following is a humorous edit of this artist or movie name: 'rain man'?\nOptions:\n(A) ruin man\n(B) rains man\n(C) rain men\n(D) rainmman\nAnswer:", 'label': '(A)', 'pred': '(D)'}
{'name': 'ruin_names', 'score': 0.236}
{'prompt': 'The following translations from German to English contain a particular error. That error will be one of the following types: Named Entities: An entity (names, places, locations, etc.) is changed to a different entity. Numerical Values: Numerical values (ordinals or cardinals), dates, and/or units are changed. Modifiers or Adjectives: The modifiers and adjectives pertaining to a noun are changed. Negation or Antonyms: Introduce or remove a negation or change comparatives to their antonyms. Facts: Trivial factual errors not pertaining to the above classes are introduced in the translations. Dropped Content: A significant clause in the translation is removed. Please identify that error. Source: Karl Borromäus Joseph Fürst von Liechtenstein war ein kaiserlicher Feldmarschall.\nTranslation: Charles Borromeo Joseph Prince of Liechtenstein was an judicial field marshal.\nThe translation contains an error pertaining to\nOptions:\n(A) Modifiers or Adjectives\n(B) Numerical Values\n(C) Negation or Antonyms\n(D) Named Entities\n(E) Dropped Content\n(F) Facts\nAnswer:', 'label': '(A)', 'pred': '(C)'}
{'name': 'salient_translation_error_detection', 'score': 0.352}
{'prompt': "Which statement is sarcastic?\nOptions:\n(A) He's a generous person, trying to promote a charity stream that has raised millions to help kids in need\n(B) He's a terrible person, trying to promote a charity stream that has raised millions to help kids in need\nAnswer:", 'label': '(B)', 'pred': '(B)'}
{'name': 'snarks', 'score': 0.550561797752809}
{'prompt': 'Is the following sentence plausible? "Elias Lindholm beat the buzzer."\nAnswer:', 'label': 'no', 'pred': 'yes'}
{'name': 'sports_understanding', 'score': 0.552}
{'prompt': 'Today, Susan went to the coffee shop. Between what times could they have gone?\nWe know that:\nSusan woke up at 7am.\nLinda saw Susan driving to the water park from 7am to 11am.\nJohn saw Susan buying clothes at the mall from 11am to 12pm.\nJessica saw Susan taking photos near the Eiffel Tower from 12pm to 1pm.\nSteven saw Susan buying lunch at the deli from 1pm to 2pm.\nThomas saw Susan reading at the library from 2pm to 6pm.\nThe coffee shop was closed after 9pm.\nBetween what times could Susan have gone to the coffee shop?\nOptions:\n(A) 6pm to 9pm\n(B) 7am to 11am\n(C) 1pm to 2pm\n(D) 2pm to 6pm\nAnswer:', 'label': '(A)', 'pred': '(A)'}
{'name': 'temporal_sequences', 'score': 0.22}
{'prompt': 'Alice, Bob, Claire, Dave, and Eve are dancers at a square dance. At the start of a song, they each have a partner: Alice is dancing with Patrick, Bob is dancing with Sam, Claire is dancing with Jamie, Dave is dancing with Lola, and Eve is dancing with Melissa.\nThroughout the song, the dancers often trade partners. First, Dave and Eve switch partners. Then, Dave and Alice switch partners. Then, Eve and Alice switch partners. Then, Claire and Bob switch partners. Finally, Dave and Alice switch partners. At the end of the dance, Alice is dancing with\nOptions:\n(A) Patrick\n(B) Sam\n(C) Jamie\n(D) Lola\n(E) Melissa\nAnswer:', 'label': '(A)', 'pred': '(A)'}
{'name': 'tracking_shuffled_objects_five_objects', 'score': 0.12}
{'prompt': 'Alice, Bob, Claire, Dave, Eve, Fred, and Gertrude are on the same team in a soccer match. At the start of the match, they are each assigned to a position: Alice is playing striker, Bob is playing right winger, Claire is playing left winger, Dave is playing benchwarmer, Eve is playing goalkeeper, Fred is playing center midfielder, and Gertrude is playing cheerleader.\nAs the game progresses, pairs of players occasionally swap positions. First, Eve and Claire trade positions. Then, Gertrude and Alice trade positions. Then, Fred and Bob trade positions. Then, Dave and Fred trade positions. Then, Fred and Bob trade positions. Then, Bob and Eve trade positions. Finally, Claire and Alice trade positions. At the end of the match, Gertrude is playing\nOptions:\n(A) striker\n(B) right winger\n(C) left winger\n(D) benchwarmer\n(E) goalkeeper\n(F) center midfielder\n(G) cheerleader\nAnswer:', 'label': '(A)', 'pred': '(G)'}
{'name': 'tracking_shuffled_objects_seven_objects', 'score': 0.084}
{'prompt': 'Alice, Bob, and Claire are friends and avid readers who occasionally trade books. At the start of the semester, they each buy one new book: Alice gets Ulysses, Bob gets Frankenstein, and Claire gets Lolita.\nAs the semester proceeds, they start trading around the new books. First, Claire and Bob swap books. Then, Bob and Alice swap books. Finally, Claire and Bob swap books. At the end of the semester, Bob has\nOptions:\n(A) Ulysses\n(B) Frankenstein\n(C) Lolita\nAnswer:', 'label': '(B)', 'pred': '(B)'}
{'name': 'tracking_shuffled_objects_three_objects', 'score': 0.332}
{'prompt': 'Question: Sherrie tells the truth. Vernell says Sherrie tells the truth. Alexis says Vernell lies. Michaela says Alexis tells the truth. Elanor says Michaela tells the truth. Does Elanor tell the truth?\nAnswer:', 'label': 'No', 'pred': 'yes'}
{'name': 'web_of_lies', 'score': 0.0}
{'prompt': 'Sort the following words alphabetically: List: syndrome therefrom\nAnswer:', 'label': 'syndrome therefrom', 'pred': 'therefrom'}
{'name': 'word_sorting', 'score': 0.0}
{'average': 0.31822066786892184}
{'bbh': 31.82}
bbh: 31.82
Hello Author,
I just wanted to ask if LoRAHub can be used on non-LLM tasks, for example text-to-speech with VITS (https://github.com/jaywalnut310/vits).
Thanks in advance.
Great work!
I wonder where the instruction text for each LoRA in LoRAHub can be found?
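One place to look (a guess, not an official pointer): each name in LORA_MODULE_NAMES encodes the FLAN task its adapter was trained on, and the corresponding model card lives on the Hub:

# List the published adapters; each name encodes its FLAN training task,
# e.g. "lorahub/flan_t5_large-glue_cola".
from lorahub.constant import LORA_MODULE_NAMES

for name in LORA_MODULE_NAMES[:10]:
    print(name, "->", "https://huggingface.co/" + name)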
Hi @SivilTaram, thank you for sharing the training scripts!
I was trying to download the FLAN datasets using the script. It seems the dataset tag "conceptofmind/FLAN_2022" is no longer available on Hugging Face. Would it be possible for you to upload the data you have to Hugging Face, or share a copy privately with me? This is critical for reproducing some of the experiments on my end.
Thanks,
Prateek
In the code train_model.py:
model = AutoModelForSeq2SeqLM.from_pretrained(
    model_args.model_name_or_path,
    from_tf=bool(".ckpt" in model_args.model_name_or_path),
    config=config,
    cache_dir=model_args.cache_dir,
    revision=model_args.model_revision,
    use_auth_token=True if model_args.use_auth_token else None,
)
If anyone cannot use this to initialize a flan-t5 model from AutoModelForSeq2SeqLM, you need to add the following params:

unk_token="<unk>",
bos_token="<s>",
eos_token="</s>",

Thanks!
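For reference, a sketch of where those tokens would plausibly go; special tokens are normally tokenizer arguments rather than AutoModelForSeq2SeqLM arguments, and the token strings below are the standard sentencepiece defaults (an assumption):

# Sketch: pass the special tokens to the tokenizer; the exact token strings
# are assumed to be the standard sentencepiece ones.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "google/flan-t5-large",
    unk_token="<unk>",
    bos_token="<s>",
    eos_token="</s>",
)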
Nice work!
I needed clarification about the adapters you released on HF: the base model is flan-t5, and the adapters are trained on some FLAN tasks! Am I misunderstanding something, or are they trained on seen data? E.g., the ANLI task is a Flan2021 task (which flan-t5 has probably seen) and is included in your hub.
Also, are you planning to release the code to reproduce the numbers in your paper in the near future?
Thanks.
Hi, if using Chinese, how many examples are needed to train this LoRA?
Also, will the LoRA heavily affect the original model's ability?
@mavenlin @SivilTaram @P2333 @chenxwh @Boyu-Mi I want to evaluate my pruned llama-7b model (saved by torch.save), but the following error happened:
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/iotsc01/xinpengq/LLM-Pruner-main/lorahub-main/reproduce_bbh.py:181 in │
│ │
│ 178 │ │ # unzip │
│ 179 │ │ os.system("unzip data_bbh.zip") │
│ 180 │ # evaluate the model │
│ ❱ 181 │ evaluate_flan_results_zero_shot("data_bbh", args.ckpt) │
│ 182 │ # # five shot for flan models │
│ 183 │ # evaluate_flan_results_few_shot("data_bbh", "google/flan-t5-large") │
│ 184 │ # # five shot for lorahub models │
│ │
│ /home/iotsc01/xinpengq/LLM-Pruner-main/lorahub-main/reproduce_bbh.py:63 in │
│ evaluate_flan_results_zero_shot │
│ │
│ 60 │ │ │ │ return_tensors="pt", │
│ 61 │ │ │ │ padding=True, │
│ 62 │ │ │ ).to(model.device) │
│ ❱ 63 │ │ │ outputs = model.generate( │
│ 64 │ │ │ │ input_ids=inputs["input_ids"], max_new_tokens=5 │
│ 65 │ │ │ ) │
│ 66 │ │ │ outputs = tokenizer.batch_decode( │
│ │
│ /home/iotsc01/anaconda3/envs/xinpengq_env/lib/python3.10/site-packages/torch/utils/_contextlib.p │
│ y:115 in decorate_context │
│ │
│ 112 │ @functools.wraps(func) │
│ 113 │ def decorate_context(*args, **kwargs): │
│ 114 │ │ with ctx_factory(): │
│ ❱ 115 │ │ │ return func(*args, **kwargs) │
│ 116 │ │
│ 117 │ return decorate_context │
│ 118 │
│ │
│ /home/iotsc01/anaconda3/envs/xinpengq_env/lib/python3.10/site-packages/transformers/generation/u │
│ tils.py:1456 in generate │
│ │
│ 1453 │ │ │
│ 1454 │ │ # if we don't pass past_key_values and a cache_implementation is specified │
│ 1455 │ │ │
│ ❱ 1456 │ │ if generation_config.cache_implementation in NEED_SETUP_CACHE_CLASSES_MAPPING an │
│ 1457 │ │ │ "past_key_values", False │
│ 1458 │ │ ): │
│ 1459 │ │ │ cache_cls = NEED_SETUP_CACHE_CLASSES_MAPPING[generation_config.cache_impleme │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
AttributeError: 'GenerationConfig' object has no attribute 'cache_implementation'
How can I solve it? The following is my code:
import argparse
import sys
sys.path.append("../LLaMA-Factory-main/src")
import os
import json
import random
import torch
from LLMPruner.peft import PeftModel
from lorahub.algorithm import load_dataset, lorahub_learning, lorahub_inference
from lorahub.constant import LORA_MODULE_NAMES
from transformers import GenerationConfig, LlamaForCausalLM, LlamaTokenizer, AutoModel, AutoTokenizer


def evaluate_flan_results_zero_shot(folder, model_path, **kwargs):
    sub_dirs = os.listdir(folder)
    # load the pruned model saved by torch.save
    pruned_dict = torch.load(model_path, map_location='cpu')
    tokenizer, model = pruned_dict['tokenizer'], pruned_dict['model']
    model.config.pad_token_id = tokenizer.pad_token_id = 0
    model.config.bos_token_id = 1
    model.config.eos_token_id = 2
    res = [0, 0]
    for sub_dir in sub_dirs:
        test_file_path = os.path.join(folder, sub_dir, "zero_shot.jsonl")
        task_inputs, task_outputs = [], []
        for line in open(test_file_path, "r", encoding="utf-8"):
            example = json.loads(line)
            task_inputs.append(example["context"])
            task_outputs.append(example["completion"])
        print("Evaluating on task (zero shot): ", sub_dir)
        # _, task_perf = lorahub_inference(task_inputs,
        #                                  model,
        #                                  tokenizer,
        #                                  16,
        #                                  task_outputs)

        def accuracy_score(outputs, ground_truths):
            correct = 0
            total = 0
            for output, truth in zip(outputs, ground_truths):
                # if output.strip().lower().replace(".", "") == truth.strip().lower().replace(".", ""):
                if truth.strip().replace(".", "") in output.strip().replace(".", ""):
                    correct += 1
                total += 1
            return correct / total * 100

        example_predictions = []
        # process dataset
        dataset = load_dataset(task_inputs, task_outputs, tokenizer)
        # use gpu if available
        device = "cuda" if torch.cuda.is_available() else "cpu"
        model = model.to(device)
        for i in range(0, len(dataset["input"]), 4):
            inputs = tokenizer(
                dataset["input"][i : i + 4],
                max_length=2048,
                return_tensors="pt",
                padding=True,
            ).to(model.device)
            outputs = model.generate(
                input_ids=inputs["input_ids"], max_new_tokens=5
            )
            outputs = tokenizer.batch_decode(
                outputs.to("cpu"), skip_special_tokens=True
            )
            example_predictions.extend(outputs)
        # keep only the text after an "A:" marker in each prediction, if present
        for i in range(len(example_predictions)):
            index = example_predictions[i].find("A:")
            if index != -1:
                example_predictions[i] = example_predictions[i][index + 2:]
        if task_outputs is not None:
            task_perf = accuracy_score(example_predictions, task_outputs)
        else:
            task_perf = None
        print("***", task_perf, "***")
        print(example_predictions)
        res[0] += 1
        res[1] += task_perf
        torch.cuda.empty_cache()
    print("result avg:", res[1] / res[0])


def evaluate_flan_results_few_shot(folder, flan_model_name):
    sub_dirs = os.listdir(folder)
    for sub_dir in sub_dirs:
        test_file_path = os.path.join(folder, sub_dir, "few_shot.jsonl")
        task_inputs, task_outputs = [], []
        for line in open(test_file_path, "r", encoding="utf-8"):
            example = json.loads(line)
            task_inputs.append(example["context"])
            task_outputs.append(example["completion"])
        print("Evaluating on task (five shot): ", sub_dir)
        lorahub_inference(task_inputs,
                          flan_model_name,
                          flan_model_name,
                          16,
                          task_outputs)


def evaluate_lorahub_results_few_shot(folder, flan_model_name):
    sub_dirs = os.listdir(folder)
    # 5 seeds used in our experiments
    for sub_dir in sub_dirs:
        # construct the few-shot examples for lorahub learning
        example_inputs, examples_outputs = [], []
        example_file_path = os.path.join(folder, sub_dir, "example.jsonl")
        for line in open(example_file_path, "r", encoding="utf-8"):
            example = json.loads(line)
            example_inputs.append(example["context"])
            examples_outputs.append(example["completion"])
        # randomly select 5 examples for each task
        random.seed(42)
        shuffled_set = list(zip(example_inputs, examples_outputs))
        random.shuffle(shuffled_set)
        example_inputs, examples_outputs = zip(*shuffled_set)
        # take the first 5 examples
        example_inputs, examples_outputs = example_inputs[:5], examples_outputs[:5]
        # load the zero-shot examples for evaluation
        test_file_path = os.path.join(folder, sub_dir, "zero_shot.jsonl")
        task_inputs, task_outputs = [], []
        for line in open(test_file_path, "r", encoding="utf-8"):
            example = json.loads(line)
            task_inputs.append(example["context"])
            task_outputs.append(example["completion"])
        task_perf_list = []
        for seed in range(1, 6):
            random.seed(seed)

            def get_lora_module_list():
                return random.sample(LORA_MODULE_NAMES, 20)

            # get a list of modules to be used in the composition
            modules = get_lora_module_list()
            # perform LoRAHub learning
            module_weights, model, tokenizer = lorahub_learning(lora_module_list=modules,
                                                                example_inputs=example_inputs,
                                                                example_outputs=examples_outputs,
                                                                max_inference_step=40,
                                                                batch_size=5)
            print("module_weights:", module_weights)
            # perform inference to get predictions
            _, task_acc = lorahub_inference(example_inputs=task_inputs,
                                            model_or_name_path=model,
                                            tokenizer_or_tokenizer_path=tokenizer,
                                            batch_size=10,
                                            # can be set to None if you do not have the ground truth
                                            example_outputs=task_outputs)
            task_perf_list.append(task_acc)
        avg_perf, max_perf = sum(task_perf_list) / len(task_perf_list), max(task_perf_list)
        print("average perf:", avg_perf, "best perf:", max_perf)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Tuning Pruned LLaMA (huggingface version)')
    parser.add_argument('--base_model', type=str, default="/home/iotsc01/LLM-Pruner-main/llama-7b-hf", help='base model name')
    parser.add_argument('--ckpt', type=str, default=None, help='pruned model path')
    parser.add_argument('--lora_ckpt', type=str, default=None)
    parser.add_argument('--max_seq_len', type=int, default=128, help='max sequence length')
    parser.add_argument('--test_mod', type=str, default="pruned", help='choose from [pruned, tuned, base]')
    args = parser.parse_args()
    if not os.path.exists("data_bbh"):
        # download dataset
        os.system("wget https://github.com/sail-sg/lorahub/releases/download/0.1/data_bbh.zip")
        # unzip
        os.system("unzip data_bbh.zip")
    # evaluate the model
    evaluate_flan_results_zero_shot("data_bbh", args.ckpt)
    # # five shot for flan models
    # evaluate_flan_results_few_shot("data_bbh", "google/flan-t5-large")
    # # five shot for lorahub models
    # evaluate_lorahub_results_few_shot("data_bbh", "google/flan-t5-large")
Very interesting work! I have two questions and hope to get answers:
LoraHub is a really great idea, similar to a few ideas I thought of yesterday.
I am having a hard time downloading the flanv2 dataset provided in this repo.
The git clone command just downloads the metadata for each dataset, as shown below.
Using Hugging Face load_dataset downloads a single unified dataset that comes from the P3 dataset (source).
The Hugging Face CLI method seems to have authentication issues.
I could manually go through the Hugging Face flanv2 dataset repo and download each dataset's LFS files, but that is time-consuming.
Is there any graceful way to download the datasets used?
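One option that may help (a sketch; the dataset repo ID below is a placeholder, substitute the flanv2 repo this project points to): huggingface_hub's snapshot_download pulls a repo's LFS files without git and supports file filters.

# Sketch: bulk-download a dataset repo's LFS files without git clone.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="your-org/flan_v2",               # placeholder repo ID
    repo_type="dataset",
    allow_patterns=["*.json*", "*.parquet"],  # skip files you don't need
)
print("downloaded to", local_dir)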
Hi there, I found I could not use shuffle with a seed. If anyone meets this problem, you can use the following code to replace the original one, which is around line 62 of reproduce_bbh.py:
random.seed(42)
shuffled_set = list(zip(example_inputs, examples_outputs))
random.shuffle(shuffled_set)
example_inputs, examples_outputs = zip(*shuffled_set)
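For context: random.shuffle's optional second argument was deprecated and removed in Python 3.11, so seeding the module-level RNG first, as above, is the portable way to get a reproducible shuffle.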
I have met some bugs while running python example.py.
1. With batched=True set at algorithm.py:105 (the default in your code), the exception below is raised:
Traceback (most recent call last):
File "/data/sdb/sunyuhan/exp/lorahub/example.py", line 239, in <module>
main()
File "/data/sdb/sunyuhan/exp/lorahub/example.py", line 211, in main
module_weights, model, tokenizer = lorahub_learning(lora_module_list=modules,
File "/data/sdb/sunyuhan/exp/lorahub/lorahub/algorithm.py", line 264, in lorahub_learning
dataset = load_dataset(example_inputs, example_outputs, tokenizer)
File "/data/sdb/sunyuhan/exp/lorahub/lorahub/algorithm.py", line 103, in load_dataset
processed_datasets = dataset.map(
File "/root/miniconda3/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 2092, in map
return self._map_single(
File "/root/miniconda3/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 518, in wrapper
out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
File "/root/miniconda3/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 485, in wrapper
out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
File "/root/miniconda3/lib/python3.10/site-packages/datasets/fingerprint.py", line 411, in wrapper
out = func(self, *args, **kwargs)
File "/root/miniconda3/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 2469, in _map_single
batch = apply_function_on_filtered_inputs(
File "/root/miniconda3/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 2361, in apply_function_on_filtered_inputs
validate_function_output(processed_inputs, indices)
File "/root/miniconda3/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 2340, in validate_function_output
raise TypeError(
TypeError: Provided `function` which is applied to all elements of table returns a `dict` of types [<class 'torch.Tensor'>, <class 'torch.Tensor'>, <class 'torch.Tensor'>]. When using `batched=True`, make sure provided `function` returns a `dict` of types like `(<class 'list'>, <class 'numpy.ndarray'>)`.
Solution: set batched=False:
processed_datasets = dataset.map(
    preprocess_func_with_tokenizer,
->  batched=False,
    num_proc=1,
    desc="Running tokenizer on dataset",
)
2. All the padding arguments are currently set to True, and the exception below is raised:
Traceback (most recent call last):
File "/data/sdb/sunyuhan/exp/lorahub/example.py", line 239, in <module>
main()
File "/data/sdb/sunyuhan/exp/lorahub/example.py", line 211, in main
module_weights, model, tokenizer = lorahub_learning(lora_module_list=modules,
File "/data/sdb/sunyuhan/exp/lorahub/lorahub/algorithm.py", line 279, in lorahub_learning
recommendation = optimizer.minimize(get_score_partial, verbosity=1)
File "/root/miniconda3/lib/python3.10/site-packages/nevergrad/optimization/base.py", line 671, in minimize
result = job.result()
File "/root/miniconda3/lib/python3.10/site-packages/nevergrad/optimization/utils.py", line 137, in result
self._result = self.func(*self.args, **self.kwargs)
File "/data/sdb/sunyuhan/exp/lorahub/lorahub/algorithm.py", line 165, in get_score
loss = get_loss(example_dataset, model, batch_size)
File "/data/sdb/sunyuhan/exp/lorahub/lorahub/algorithm.py", line 127, in default_get_loss
for _, batch in enumerate(train_dataloader):
File "/root/miniconda3/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 633, in __next__
data = self._next_data()
File "/root/miniconda3/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 677, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "/root/miniconda3/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 54, in fetch
return self.collate_fn(data)
File "/root/miniconda3/lib/python3.10/site-packages/transformers/data/data_collator.py", line 70, in default_data_collator
return torch_default_data_collator(features)
File "/root/miniconda3/lib/python3.10/site-packages/transformers/data/data_collator.py", line 136, in torch_default_data_collator
batch[k] = torch.tensor([f[k] for f in features])
ValueError: expected sequence of length 81 at dim 2 (got 90)
Solution: set all paddings in algorithm.py to 'max_length':
inputs = tokenizer(
    dataset["input"][i : i + batch_size],
    max_length=2048,
    return_tensors="pt",
->  padding='max_length',
).to(device)
3. The shape of the input has now become [5, 1, 2048] due to the modifications above:
(Pdb) p batch['input_ids'].shape
torch.Size([5, 1, 2048])
And it will cause a shape error:
Traceback (most recent call last):
File "/data/sdb/sunyuhan/exp/lorahub/example.py", line 239, in <module>
main()
File "/data/sdb/sunyuhan/exp/lorahub/example.py", line 211, in main
module_weights, model, tokenizer = lorahub_learning(lora_module_list=modules,
File "/data/sdb/sunyuhan/exp/lorahub/lorahub/algorithm.py", line 280, in lorahub_learning
recommendation = optimizer.minimize(get_score_partial, verbosity=1)
File "/root/miniconda3/lib/python3.10/site-packages/nevergrad/optimization/base.py", line 671, in minimize
result = job.result()
File "/root/miniconda3/lib/python3.10/site-packages/nevergrad/optimization/utils.py", line 137, in result
self._result = self.func(*self.args, **self.kwargs)
File "/data/sdb/sunyuhan/exp/lorahub/lorahub/algorithm.py", line 166, in get_score
loss = get_loss(example_dataset, model, batch_size)
File "/data/sdb/sunyuhan/exp/lorahub/lorahub/algorithm.py", line 131, in default_get_loss
outputs = model(**batch)
File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/lib/python3.10/site-packages/peft/peft_model.py", line 1080, in forward
return self.base_model(
File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py", line 1678, in forward
encoder_outputs = self.encoder(
File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py", line 986, in forward
batch_size, seq_length = input_shape
ValueError: too many values to unpack (expected 2)
Solution: squeeze the unused dimension at algorithm.py:128:
with torch.no_grad():
    device = "cuda" if torch.cuda.is_available() else "cpu"
    for _, batch in enumerate(train_dataloader):
->      batch = {k: v.squeeze(1).to(device) for k, v in batch.items()}
        with torch.no_grad():
            outputs = model(**batch)
        loss = outputs.loss
        train_loss += loss.detach().float()
All these bugs may be caused by setting batched=False, but I cannot set batched=True for an unknown reason. After the above 3 modifications, I can successfully run example.py.
Hi, thanks for the great library! Can the LoRA composition and weight-learning approach be used with LoRA modules trained from AutoModelForCausalLM? If so, do you have any supporting scripts, etc.?
Thanks!
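For the loading half of the question, a sketch of attaching a causal-LM LoRA with peft; the model and adapter IDs are placeholders, and whether lorahub's weight-learning loop, which is written around seq2seq models, accepts such modules is a question for the maintainers.

# Sketch: load a LoRA trained on a causal LM. IDs are placeholders;
# compatibility with lorahub's composition loop is not guaranteed.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("your-org/base-causal-lm")
tokenizer = AutoTokenizer.from_pretrained("your-org/base-causal-lm")
model = PeftModel.from_pretrained(base, "your-org/causal-lora-adapter")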
Hello, LoRAHub is a very interesting work!
I have the following two questions about the LoRA modules mentioned in the paper:
(1) According to the original paper, the FLAN-2022 Collection used for Flan-T5 instruction tuning consists of as many as 1836 tasks. However, in LoRAHub, it seems that only 196 tasks were used to train the LoRA modules. Can you please clarify if there is any misunderstanding here or if you selected these 196 tasks from the pool of 1836 tasks?
(2) LoRAHub trained each LoRA module using the same tasks as Flan-T5 for fairness. However, it seems that Flan-T5 sampled training examples from different tasks during training and did not use all the samples. Does this mean that even though LoRAHub's LoRA modules use the same tasks as Flan-T5, they may still use samples that Flan-T5 has not seen?
I have been using the flan-t5-large LoRA model math_dataset_algebra__linear_1d, which is designed for mathematical tasks, to process and generate results on the MMLU valid dataset. However, the model's performance on this dataset has been disappointing, and the results obtained are not up to the expected standard. Here is the URL of the LoRA: https://huggingface.co/lorahub/flan_t5_large-math_dataset_algebra__linear_1d
The main issue I'm facing is that despite the LoRA being specifically designed for mathematical tasks, its performance on the MMLU valid dataset is not as good as expected.
As the LoRA is optimized for mathematical tasks, I expected it to perform well on the MMLU valid dataset: the outputs generated by the model should align with the ground truth data from the dataset, demonstrating accurate and meaningful results.
Contrary to expectations, the model's performance on the MMLU valid dataset has been disappointing. The generated outputs often contain errors and inaccuracies, making them unsuitable for my intended use case.
original flan_t5

| Subject | Average Accuracy |
|---|---|
| Abstract Algebra | 0.250 |
| College Mathematics | 0.280 |
| Elementary Mathematics | 0.280 |
| High School Mathematics | 0.296 |
| High School Statistics | 0.319 |
| Math | 0.289 |

math_dataset_algebra__linear_1d

| Subject | Average Accuracy |
|---|---|
| Abstract Algebra | 0.240 |
| College Mathematics | 0.300 |
| Elementary Mathematics | 0.267 |
| High School Mathematics | 0.289 |
| High School Statistics | 0.329 |
| Math | 0.286 |