
KoFalcontune

This repository shares resources for finetuning Falcon, the model that set a new LLM SOTA, on Korean data.

Falcon

| Model | License | Commercial use? | Pretraining length [tokens] | Pretraining compute [PF-days] | Leaderboard score | K,V-cache size for a 2,048 context |
|---|---|---|---|---|---|---|
| StableLM-Alpha-7B | CC-BY-SA-4.0 | ✅ | 1,500B | 700 | 38.3* | 800MB |
| LLaMA-7B | LLaMA license | ❌ | 1,000B | 500 | 47.6 | 1,100MB |
| MPT-7B | Apache 2.0 | ✅ | 1,000B | 500 | 48.6 | 1,100MB |
| Falcon-7B | Apache 2.0 | ✅ | 1,500B | 700 | 48.8 | 20MB |
| LLaMA-33B | LLaMA license | ❌ | 1,500B | 3200 | 56.9 | 3,300MB |
| LLaMA-65B | LLaMA license | ❌ | 1,500B | 6300 | 58.3 | 5,400MB |
| Falcon-40B | Apache 2.0 | ✅ | 1,000B | 2800 | 60.4 | 240MB |
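
The small K,V-cache numbers for the Falcon models come from multi-query attention, which keeps a single K/V head per layer. A rough back-of-the-envelope check for the Falcon-7B entry (assuming a bf16 cache and Falcon-7B's published config: 32 layers, head dimension 64, one K/V head):

# Sanity check of the K,V-cache column for Falcon-7B.
# Assumptions: bf16 cache (2 bytes), 32 layers, head_dim 64,
# multi-query attention -> 1 K/V head per layer.
n_layers, n_kv_heads, head_dim, seq_len, bytes_per_value = 32, 1, 64, 2048, 2
kv_cache_bytes = 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value  # K and V
print(f"{kv_cache_bytes / 1e6:.1f} MB")  # ~16.8 MB, in line with the ~20MB listed above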

ENV

Ubuntu, Kubeflow
A100 80GB

Finetune Info

Data

KoAlpaca-v1.1

Origin Model : ybelkada/falcon-7b-sharded-bf16
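
The exact training script is not included in this README; the sketch below shows one common way to set up 4-bit LoRA finetuning on this base model with transformers, peft, and bitsandbytes. The NF4 quantization settings, the LoRA hyperparameters, and the Hugging Face dataset id beomi/KoAlpaca-v1.1a are illustrative assumptions, not the exact configuration used here.

import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_model = "ybelkada/falcon-7b-sharded-bf16"

# Load the base model in 4-bit (NF4) so it fits comfortably on a single A100 80GB.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
model = prepare_model_for_kbit_training(model)

# Attach trainable LoRA adapters to the attention projections.
lora_config = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["query_key_value"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# KoAlpaca-v1.1 instruction/output pairs (dataset id assumed).
dataset = load_dataset("beomi/KoAlpaca-v1.1a", split="train")
# ...format the records into prompts and train with a standard
# transformers Trainer / SFTTrainer loop for ~10,000 steps.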

The finetuned model is shared here:

KoAlpaca Finetune Model
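
The generation snippets below assume tokenizer and model are already loaded. A minimal sketch of how that could look, assuming the shared weights are a PEFT LoRA adapter on top of the base model (the adapter repository name below is a placeholder, not the actual upload):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model = "ybelkada/falcon-7b-sharded-bf16"
adapter_repo = "your-username/kofalcontune-koalpaca"  # placeholder for the shared adapter

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)
model = PeftModel.from_pretrained(model, adapter_repo)  # apply the Korean LoRA adapter
model.eval()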

inputs = tokenizer("광해군은 폭군이었나요 ?", return_tensors="pt")

with torch.no_grad():
    outputs = model.generate(input_ids=inputs["input_ids"].to("cuda"), max_new_tokens=512)
    print(tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True)[0])

# output

광해군은 폭군이었나요?
광해군은 조선시대에 수많은 외교군들이었습니다. 
 중에서도 가장 유명한 것은 폭군으로, 그들은 외교정책을 선포하고 신라의 지배를 부담하는 전략을 담당했습니다. 
그러나 광해군은 폭군이지만, 계보형의 외교도 많이 하고 있었습니다. 
그들은 외교문화의 중요성을 인식하고 있었기 때문입니다. 
이들은 외교문화에 맞서 전승전쟁을 전개하였으며, 이란에서 벌였던 외교군의 위업을 대신했습니다. 
그러나 이들은 외교문화에
inputs = tokenizer("기계식 키보드 청소방법", return_tensors="pt")

with torch.no_grad():
    outputs = model.generate(input_ids=inputs["input_ids"].to("cuda"), max_new_tokens=512)
    print(tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True)[0])

# output

기계식 키보드 청소방법은 다음과 같습니다. 
1. 먼저, 키보드에 접착된 물은 무조건 닦아줘야 합니다.
2. 그리고 키보드에 묻은 이쑤시개가 있는 경우, 먼저 진공접근으로 녹인  스프레이를 제거합니다.
3. 스프레이를 제거하면  깨끗해진다면, 청소기를 사용하는 것이 좋습니다. 청소기를 사용하면 일반적으로 키보드를 청소할  있습니다.
4. 마지막으로, 키보드를 뒤집어서 공기가 남지 않게 보충합니다.

그리고 청소기를 사용하는 방법은 다음과

wandb

Training ran for 10,000 steps.

(wandb training curves)

Origin Git

falcontune: 4-Bit Finetuning of FALCONs on a Consumer GPU

falcontune allows finetuning FALCONs (e.g., falcon-40b-4bit) on as little as a single A100 40GB GPU.

It features a tiny and easy-to-use codebase.

One benefit of being able to finetune larger LLMs on one GPU is the ability to easily leverage data parallelism for large models.

Underneath the hood, falcontune implements the LoRA algorithm over an LLM compressed using the GPTQ algorithm, which requires implementing a backward pass for the quantized LLM.
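
A toy PyTorch sketch of that idea (illustrative only, not falcontune's actual kernels): the quantized base weight is a frozen buffer that is dequantized on the fly in the forward pass, and gradients flow only to the LoRA matrices and to the layer input.

import torch
import torch.nn as nn
import torch.nn.functional as F

class QuantLinearWithLoRA(nn.Module):
    # Frozen (fake-)quantized base weight plus a trainable LoRA adapter.
    # Per-output-channel int8 stands in for GPTQ 4-bit here; the real
    # falcontune kernels pack 4-bit weights and dequantize in CUDA/triton.
    def __init__(self, in_features, out_features, r=8, lora_alpha=16):
        super().__init__()
        w = torch.randn(out_features, in_features) * 0.02
        scale = w.abs().amax(dim=1, keepdim=True) / 127.0
        self.register_buffer("qweight", torch.round(w / scale).to(torch.int8))
        self.register_buffer("scale", scale)
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)  # trainable
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))        # trainable
        self.scaling = lora_alpha / r

    def forward(self, x):
        w = self.qweight.float() * self.scale                # dequantize frozen base weight
        base = F.linear(x, w)                                # no grad reaches qweight (buffer)
        lora = F.linear(F.linear(x, self.lora_A), self.lora_B)
        return base + self.scaling * lora

layer = QuantLinearWithLoRA(64, 64)
out = layer(torch.randn(2, 64, requires_grad=True))
out.sum().backward()  # gradients reach lora_A, lora_B and the input only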

falcontune can generate a 50-token recipe on an A100 40GB in ~10 seconds using the triton backend:

$ falcontune generate --interactive --model falcon-40b-instruct-4bit --weights gptq_model-4bit--1g.safetensors --max_new_tokens=50 --use_cache --do_sample --prompt "How to prepare pasta?"


How to prepare pasta?
Here's a simple recipe to prepare pasta:

Ingredients:
- 1 pound of dry pasta
- 4-6 cups of water
- Salt (optional)

Instructions:
1. Boil the water

Took 10.042 s

This example is based on the model: TheBloke/falcon-40b-instruct-GPTQ.

Here is a Google Colab. You will need an A100 40GB to finetune the model.

Installation

Setup

pip install -r requirements.txt 
python setup.py install         

The default backend is triton, which is the fastest. For CUDA support, also install the CUDA kernels:

python setup_cuda.py install         

Running falcontune

The above process installs a falcontune command in your environment.

Download Models

First, start by downloading the weights of a FALCON model:

$ wget https://huggingface.co/TheBloke/falcon-40b-instruct-GPTQ/resolve/main/gptq_model-4bit--1g.safetensors

Generate Text

You can generate text directly from the command line. This generates text from the base model:

$ falcontune generate \
    --interactive \
    --model falcon-40b-instruct-4bit \
    --weights gptq_model-4bit--1g.safetensors \
    --max_new_tokens=50 \
    --use_cache \
    --do_sample \
    --instruction "Who was the first person on the moon?"

Finetune A Base Model

You may also finetune a base model yourself. First, you need to download a dataset:

$ wget https://github.com/gururise/AlpacaDataCleaned/raw/main/alpaca_data_cleaned.json
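
For reference, --data_type=alpaca corresponds to the Alpaca instruction format: the file is a JSON array of records with instruction, input, and output fields (as I understand the format). A quick sanity check of the downloaded file:

import json

# alpaca_data_cleaned.json is a JSON array of records shaped roughly like
# {"instruction": "...", "input": "", "output": "..."}
with open("alpaca_data_cleaned.json") as f:
    records = json.load(f)
print(len(records), sorted(records[0]))  # record count and field names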

You can finetune any model of the FALCON family:

FALCON-7B
$ falcontune finetune \
    --model=falcon-7b \
    --weights=tiiuae/falcon-7b \
    --dataset=./alpaca_data_cleaned.json \
    --data_type=alpaca \
    --lora_out_dir=./falcon-7b-alpaca/ \
    --mbatch_size=1 \
    --batch_size=2 \
    --epochs=3 \
    --lr=3e-4 \
    --cutoff_len=256 \
    --lora_r=8 \
    --lora_alpha=16 \
    --lora_dropout=0.05 \
    --warmup_steps=5 \
    --save_steps=50 \
    --save_total_limit=3 \
    --logging_steps=5 \
    --target_modules='["query_key_value"]'

The above commands will download the model and use LoRA to finetune the quantized model. The final adapters and the checkpoints will be saved in `falcon-7b-alpaca` and available for generation as follows:

$ falcontune generate \
    --interactive \
    --model falcon-7b \
    --weights tiiuae/falcon-7b \
    --lora_apply_dir falcon-7b-alpaca \
    --max_new_tokens 50 \
    --use_cache \
    --do_sample \
    --instruction "How to prepare pasta?"

FALCON-7B-INSTRUCT
$ falcontune finetune \
    --model=falcon-7b-instruct \
    --weights=tiiuae/falcon-7b-instruct \
    --dataset=./alpaca_data_cleaned.json \
    --data_type=alpaca \
    --lora_out_dir=./falcon-7b-instruct-alpaca/ \
    --mbatch_size=1 \
    --batch_size=2 \
    --epochs=3 \
    --lr=3e-4 \
    --cutoff_len=256 \
    --lora_r=8 \
    --lora_alpha=16 \
    --lora_dropout=0.05 \
    --warmup_steps=5 \
    --save_steps=50 \
    --save_total_limit=3 \
    --logging_steps=5 \
    --target_modules='["query_key_value"]'

The above commands will download the model and use LoRA to finetune the quantized model. The final adapters and the checkpoints will be saved in `falcon-7b-instruct-alpaca` and available for generation as follows:

$ falcontune generate \
    --interactive \
    --model falcon-7b-instruct \
    --weights tiiuae/falcon-7b-instruct \
    --lora_apply_dir falcon-7b-instruct-alpaca \
    --max_new_tokens 50 \
    --use_cache \
    --do_sample \
    --instruction "How to prepare pasta?"

FALCON-40B
$ falcontune finetune \
    --model=falcon-40b \
    --weights=tiiuae/falcon-40b \
    --dataset=./alpaca_data_cleaned.json \
    --data_type=alpaca \
    --lora_out_dir=./falcon-40b-alpaca/ \
    --mbatch_size=1 \
    --batch_size=2 \
    --epochs=3 \
    --lr=3e-4 \
    --cutoff_len=256 \
    --lora_r=8 \
    --lora_alpha=16 \
    --lora_dropout=0.05 \
    --warmup_steps=5 \
    --save_steps=50 \
    --save_total_limit=3 \
    --logging_steps=5 \
    --target_modules='["query_key_value"]'

The above commands will download the model and use LoRA to finetune the quantized model. The final adapters and the checkpoints will be saved in `falcon-40b-alpaca` and available for generation as follows:

$ falcontune generate \
    --interactive \
    --model falcon-40b \
    --weights tiiuae/falcon-40b \
    --lora_apply_dir falcon-40b-alpaca \
    --max_new_tokens 50 \
    --use_cache \
    --do_sample \
    --instruction "How to prepare pasta?"

FALCON-40B-INSTRUCT
$ falcontune finetune \
    --model=falcon-40b-instruct \
    --weights=tiiuae/falcon-40b-instruct \
    --dataset=./alpaca_data_cleaned.json \
    --data_type=alpaca \
    --lora_out_dir=./falcon-40b-instruct-alpaca/ \
    --mbatch_size=1 \
    --batch_size=2 \
    --epochs=3 \
    --lr=3e-4 \
    --cutoff_len=256 \
    --lora_r=8 \
    --lora_alpha=16 \
    --lora_dropout=0.05 \
    --warmup_steps=5 \
    --save_steps=50 \
    --save_total_limit=3 \
    --logging_steps=5 \
    --target_modules='["query_key_value"]'

The above commands will download the model and use LoRA to finetune the quantized model. The final adapters and the checkpoints will be saved in `falcon-40b-instruct-alpaca` and available for generation as follows:

$ falcontune generate \
    --interactive \
    --model falcon-40b-instruct \
    --weights tiiuae/falcon-40b-instruct \
    --lora_apply_dir falcon-40b-instruct-alpaca \
    --max_new_tokens 50 \
    --use_cache \
    --do_sample \
    --instruction "How to prepare pasta?"

FALCON-7B-INSTRUCT-4BIT
$ wget https://huggingface.co/TheBloke/falcon-7b-instruct-GPTQ/resolve/main/gptq_model-4bit-64g.safetensors

$ falcontune finetune \
    --model=falcon-7b-instruct-4bit \
    --weights=gptq_model-4bit-64g.safetensors \
    --dataset=./alpaca_data_cleaned.json \
    --data_type=alpaca \
    --lora_out_dir=./falcon-7b-instruct-4bit-alpaca/ \
    --mbatch_size=1 \
    --batch_size=2 \
    --epochs=3 \
    --lr=3e-4 \
    --cutoff_len=256 \
    --lora_r=8 \
    --lora_alpha=16 \
    --lora_dropout=0.05 \
    --warmup_steps=5 \
    --save_steps=50 \
    --save_total_limit=3 \
    --logging_steps=5 \
    --target_modules='["query_key_value"]'

The above commands will download the model and use LoRA to finetune the quantized model. The final adapters and the checkpoints will be saved in `falcon-7b-instruct-4bit-alpaca` and available for generation as follows:

$ falcontune generate \
    --interactive \
    --model falcon-7b-instruct-4bit \
    --weights gptq_model-4bit-64g.safetensors \
    --lora_apply_dir falcon-7b-instruct-4bit-alpaca \
    --max_new_tokens 50 \
    --use_cache \
    --do_sample \
    --instruction "How to prepare pasta?"

FALCON-40B-INSTRUCT-4BIT
$ wget https://huggingface.co/TheBloke/falcon-40b-instruct-GPTQ/resolve/main/gptq_model-4bit--1g.safetensors

$ falcontune finetune \
    --model=falcon-40b-instruct-4bit \
    --weights=gptq_model-4bit--1g.safetensors \
    --dataset=./alpaca_data_cleaned.json \
    --data_type=alpaca \
    --lora_out_dir=./falcon-40b-instruct-4bit-alpaca/ \
    --mbatch_size=1 \
    --batch_size=2 \
    --epochs=3 \
    --lr=3e-4 \
    --cutoff_len=256 \
    --lora_r=8 \
    --lora_alpha=16 \
    --lora_dropout=0.05 \
    --warmup_steps=5 \
    --save_steps=50 \
    --save_total_limit=3 \
    --logging_steps=5 \
    --target_modules='["query_key_value"]'

The above commands will download the model and use LoRA to finetune the quantized model. The final adapters and the checkpoints will be saved in `falcon-40b-instruct-4bit-alpaca` and available for generation as follows:

$ falcontune generate \
    --interactive \
    --model falcon-40b-instruct-4bit \
    --weights gptq_model-4bit--1g.safetensors \
    --lora_apply_dir falcon-40b-instruct-4bit-alpaca \
    --max_new_tokens 50 \
    --use_cache \
    --do_sample \
    --instruction "How to prepare pasta?"

Acknowledgements

falcontune is based on the following projects:

  • The GPTQ algorithm and codebase by the IST-DASLAB with modifications by @qwopqwop200
  • The alpaca_lora_4bit repo by johnsmith0031
  • The PEFT repo and its implementation of LoRA
  • The LLAMA, OPT, and BLOOM models by META FAIR and the BigScience consortium
  • The llmtune repo by kuleshov-group

Consultations

Need a custom solution? Let me know: [email protected]

