
kor_llama's Introduction

Kor_llama is a mini project that aims to train Llama on Korean.

| | Training time | GPU |
|---|---|---|
| Colab | 60 hours | RTX 4090 |

EVAL

[image: evaluation results]

The scores are not high, but they clearly show that the model was fine-tuned.

Data

The data comes from the customer loan-grade classification hackathon hosted on Dacon.

Model

base model: Llama 2

Dataset


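| Column name | Type | Preprocessing |
|---|---|---|
| Loan amount (대출금액) | int | |
| Loan term (대출기간) | object | remove "month" |
| Employment length (근로기간) | object | remove "year" |
| Home ownership status (주택소유상태) | object | |
| Annual income (연간소득) | int | |
| Debt/income (부채/소득) | float | |
| Total number of accounts (총계좌수) | int | |
| Loan purpose (대출목적) | object | translate to English |
| Delinquencies in the last 2 years (최근 2년 연체) | int | |
| Total principal repaid (총 상환 원금) | float | |
| Total waived amount (총 면제 금액) | float | |
| Number of delinquent accounts (연체 계좌수) | int | |
| Loan grade (대출 등급) | object | build int2label / label2int dicts |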
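The preprocessing column of the table can be sketched roughly as follows. This is only an illustration: the helper name `strip_unit` and the raw string formats in the usage lines are assumptions, not taken from the project code. The grade list A to G comes from the prompt used later in this README.

```python
import re

# Grades as listed in the prompt ("one of A,B,C,D,E,F,G").
GRADES = ["A", "B", "C", "D", "E", "F", "G"]

# label2int / int2label dictionaries for the loan grade column.
label2int = {grade: idx for idx, grade in enumerate(GRADES)}
int2label = {idx: grade for grade, idx in label2int.items()}


def strip_unit(text):
    """Hypothetical helper: drop a unit such as "months" or "years"
    by returning the first integer found in the text, or None."""
    match = re.search(r"\d+", text)
    return int(match.group()) if match else None


# Assumed raw formats, for illustration only.
print(strip_unit(" 36 months"), strip_unit("10 years"))
print(label2int["B"], int2label[6])
```

Values without any digits return `None`, so they can be handled explicitly instead of being silently turned into 0.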

FineTuning

Fine-tuning was done with the Hugging Face Transformers library and related tools (PEFT, SFT, Trainer).

We performed instruction tuning and trained the 7B model with PEFT.

Peft Parameter

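from peft import LoraConfig, TaskType

peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    inference_mode=False,  # False while training
    r=8,                   # LoRA rank; a smaller r means fewer trainable parameters
    lora_alpha=16,         # scaling factor
    lora_dropout=0.1,      # dropout applied to the LoRA layers
)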


Trainable: 4194304 | total: 6860050432 | Percentage: 0.0611%
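The trainable count above can be reproduced by hand. The sketch below assumes that LoRA is attached to two 4096x4096 projections (for example the query and value projections) in each of 32 decoder layers; those shapes are assumptions about the 7B model and the adapted modules, not something stated in this README. Each adapted weight of shape (d_in, d_out) adds r * (d_in + d_out) parameters.

```python
def lora_param_count(num_layers, adapted_shapes, r):
    """Number of LoRA parameters: each adapted (d_in, d_out) weight adds
    an A matrix (d_in x r) and a B matrix (r x d_out)."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in adapted_shapes)
    return num_layers * per_layer


HIDDEN = 4096      # assumed hidden size of the 7B model
NUM_LAYERS = 32    # assumed number of decoder layers

trainable = lora_param_count(NUM_LAYERS, [(HIDDEN, HIDDEN), (HIDDEN, HIDDEN)], r=8)
total = 6860050432  # total reported above
percentage = 100 * trainable / total

print(f"Trainable: {trainable} | total: {total} | Percentage: {percentage:.4f}%")
```

Under these assumptions the result matches the reported 4194304 trainable parameters, which is consistent with `r=8` on two projections per layer.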

huggingface Trainer

from transformers import Trainer, TrainingArguments

args = TrainingArguments(
    output_dir="peftllama0116",
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    evaluation_strategy="steps",
    eval_steps=3000,
    logging_steps=100,
    gradient_accumulation_steps=4,
    num_train_epochs=3,
    weight_decay=0.1,
    warmup_steps=1_000,
    lr_scheduler_type="cosine",
    learning_rate=5e-4,
    fp16=True,
    push_to_hub=False,
    optim="adamw_torch",
    save_strategy="steps",
    save_steps=1000,
    save_total_limit=2,
)

trainer = Trainer(
    model=model,
    tokenizer=tokenizer,
    args=args,
    data_collator=data_collator,
    train_dataset=tokenized_datasets,
    eval_dataset=val_tokenized_datasets
)
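To make the schedule arguments concrete, here is a small sketch of the effective batch size and of a linear-warmup plus cosine-decay learning rate, which is the usual shape behind `lr_scheduler_type="cosine"` with `warmup_steps`. It is a standalone approximation, not the library code; `TOTAL_STEPS` is a made-up value because the dataset size is not given here, and a single GPU is assumed.

```python
import math


def lr_at_step(step, base_lr, warmup_steps, total_steps):
    """Linear warmup to base_lr, then cosine decay to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Values taken from TrainingArguments above; one GPU is assumed.
effective_batch = 4 * 4  # per_device_train_batch_size * gradient_accumulation_steps

TOTAL_STEPS = 10_000  # hypothetical; depends on dataset size and epochs
for step in (0, 500, 1_000, 5_500, 10_000):
    print(step, lr_at_step(step, 5e-4, 1_000, TOTAL_STEPS))
```

With `warmup_steps=1_000`, the peak learning rate of 5e-4 is only reached after 1,000 optimizer steps, each of which consumes 16 samples under the single-GPU assumption.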

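Generation config

from transformers import GenerationConfig

generate_config = GenerationConfig(
    pad_token_id=tokenizer.eos_token_id,
    do_sample=True,
    top_k=1,  # only the most likely token is kept, so sampling is effectively greedy
    top_p=0.9,
    num_return_sequences=1,
    repetition_penalty=1.1,
    max_new_tokens=100,
    temperature=0.8,
)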
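One consequence of this configuration is worth spelling out: with `top_k=1` only the single highest-scoring token survives filtering, so `do_sample=True`, `temperature` and `top_p` no longer change which token is picked. The toy sampler below illustrates this on a made-up logit vector. It is a simplified sketch, not the Transformers implementation, and it leaves out `top_p` and `repetition_penalty`.

```python
import math
import random


def sample_token(logits, top_k, temperature, rng):
    """Toy temperature + top-k sampling over a list of logits."""
    scaled = [x / temperature for x in logits]
    # Keep the indices of the top_k highest scores.
    keep = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]
    peak = max(scaled[i] for i in keep)
    weights = [math.exp(scaled[i] - peak) for i in keep]  # unnormalized softmax
    return rng.choices(keep, weights=weights, k=1)[0]


rng = random.Random(0)
toy_logits = [0.1, 2.0, 1.5]  # hypothetical scores for three tokens

top1_draws = {sample_token(toy_logits, 1, 0.8, rng) for _ in range(200)}
top3_draws = {sample_token(toy_logits, 3, 0.8, rng) for _ in range(200)}
print(top1_draws, top3_draws)
```

With `top_k=1` every draw returns the index of the largest logit, while a larger `top_k` lets the other tokens appear. If varied outputs are wanted, `top_k` would need to be raised.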

Result

before tuning

you are financial speciallist. And you will see ###condition of one person. and judge his/her Credit rating
response about ###instruction 

###Condition
he/she owe $ 16800000 , take out a loan during 60 months. 
He/she works for 10 years. Status of home ownership is MORTGAGE
His/her annual income was about 82680000. His/her total debt/income ratio was 21.960000 
He/she have total 27 accounts, and he/she take out a loan for debt consolidation.
He/she overdue of interest 0 times over last 2 year.
His/Her total redemption principal $ 199284. His/Her total interst pay $ 159924.000000 
His/Her total overdue payment $ 0.000000. He/She have 0.000000 overdue account for total 

###instruction: with those conditions, Guess His Credit Rating in one of A,B,C,D,E,F,G 


###Credit Rating : 

Guess the credit rating of A person

I think his credit rating is CCC+ 

###Condition:
A person owe $ 16800000 , take out a loan during 60 months. 
He/she works for 10 years. Status of home ownership is MORTGAGE
His/her annual income was about 82680000. His/his total debt/income ratio was 21.960000 
He/she have total 27 accounts, and he/she take out a loan for debt consolidation.
He/she overdue of interest 0 times over last 2 year.
His/Her total redemption principal $ 199284. His/Her total interst pay $ 159924.000000 
His/Her total overdue payment $ 0.000000. He/She have 0.000000 overdue account for total 

###Credit Rating : 

Guess the credit rating of B person

I think his credit rating is CCC 

###Condition:
B person owe $ 16800000 , take out a loan during 60 months. 
He/she works


you are financial speciallist. And you will see ###condition of one person. and judge his/her Credit rating
response about ###instruction 

###Condition
he/she owe $ 16800000 , take out a loan during 36 months. 
He/she works for 8 years. Status of home ownership is MORTGAGE
His/her annual income was about 132000000. His/her total debt/income ratio was 19.640000 
He/she have total 12 accounts, and he/she take out a loan for housing improvement.
He/she overdue of interest 0 times over last 2 year.
His/Her total redemption principal $ 394692. His/Her total interst pay $ 146604.000000 
His/Her total overdue payment $ 0.000000. He/She have 0.000000 overdue account for total 

###instruction: with those conditions, Guess His Credit Rating in one of A,B,C,D,E,F,G 


###Credit Rating : 

#@시스템#사진#
#@이모티콘#
오케이~~^^❤️


## Unrelated, previously learned text comes out instead. The model was probably trained on SNS data.


after tuning

you are financial speciallist. And you will see ###condition of one person. and judge his/her Credit rating
response about ###instruction 

###Condition
he/she owe $ 16800000 , take out a loan during 36 months. 
He/she works for 8 years. Status of home ownership is MORTGAGE
His/her annual income was about 132000000. His/her total debt/income ratio was 19.640000 
He/she have total 12 accounts, and he/she take out a loan for housing improvement.
He/she overdue of interest 0 times over last 2 year.
His/Her total redemption principal $ 394692. His/Her total interst pay $ 146604.000000 
His/Her total overdue payment $ 0.000000. He/She have 0.000000 overdue account for total 

###instruction: with those conditions, Guess His Credit Rating in one of A,B,C,D,E,F,G 


###Credit Rating : 

###Rating Description : 

###Total Debt : 

###Total Income : 

###Total Debt/Income Ratio : 

###Other Conditions : 

###Credit Rating : B 

###Rating Description : 

###Total Debt : 

###Total Income : 

###Total Debt/Income Ratio :

you are financial speciallist. And you will see ###condition of one person. and judge his/her Credit rating
response about ###instruction 

###Condition
he/she owe $ 16800000 , take out a loan during 60 months. 
He/she works for 10 years. Status of home ownership is MORTGAGE
His/her annual income was about 82680000. His/her total debt/income ratio was 21.960000 
He/she have total 27 accounts, and he/she take out a loan for debt consolidation.
He/she overdue of interest 0 times over last 2 year.
His/Her total redemption principal $ 199284. His/Her total interst pay $ 159924.000000 
His/Her total overdue payment $ 0.000000. He/She have 0.000000 overdue account for total 

###instruction: with those conditions, Guess His Credit Rating in one of A,B,C,D,E,F,G 


###Credit Rating : 

###Rating Description : 

###Total Debt : 

###Total Annual Interest Payment : 

###Overdue Payment : 

###Debt Consolidation : 

###Other Condition : 

###Credit Rating : B
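Because the tuned model repeats section headers and leaves the first `###Credit Rating :` empty, the grade has to be pulled out of the generated text. A minimal sketch of one way to do that with a regular expression is shown below; the function name `extract_rating` is hypothetical, and the pattern assumes the grade is a single letter from A to G on the same line as the header.

```python
import re

# Match "###Credit Rating : X" where X is a single grade letter on the same line.
RATING_PATTERN = re.compile(r"###\s*Credit Rating\s*:[ \t]*([A-G])\b")


def extract_rating(generated_text):
    """Return the first single-letter grade found after a
    '###Credit Rating :' header, or None if there is none."""
    match = RATING_PATTERN.search(generated_text)
    return match.group(1) if match else None


sample = (
    "###Credit Rating : \n\n"
    "###Rating Description : \n\n"
    "###Credit Rating : B \n\n"
    "###Rating Description : \n"
)
print(extract_rating(sample))
```

Empty headers are skipped, and free-form answers such as "CCC+" from the untuned model return `None` rather than being mistaken for grade C.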

Prompt Engineering

When a clearer instruction is given

you are financial speciallist. And you will see ###condition of one person. and judge his/her Credit rating
response about ###instruction 

###Condition
he/she owe $ 16800000 , take out a loan during 60 months. 
He/she works for 10 years. Status of home ownership is MORTGAGE
His/her annual income was about 82680000. His/her total debt/income ratio was 21.960000 
He/she have total 27 accounts, and he/she take out a loan for debt consolidation.
He/she overdue of interest 0 times over last 2 year.
His/Her total redemption principal $ 199284. His/Her total interst pay $ 159924.000000 
His/Her total overdue payment $ 0.000000. He/She have 0.000000 overdue account for total 

with those conditions, Guess His Credit Rating in one of A,B,C,D,E,F,G 


###instruction: with those conditions, Guess His Credit Rating in one of A,B,C,D,E,F,G 


###Credit Rating : C 


###justification : overdue payment was zero 


###derivation : see attached file 


###recommendation : no need to take any action 


###footnotes : see attached


you are financial speciallist. And you will see ###condition of one person. and judge his/her Credit rating
response about ###instruction 

###Condition
he/she owe $ 7200000 , take out a loan during 36 months. 
He/she works for 1 years. Status of home ownership is MORTGAGE
His/her annual income was about 102000000. His/her total debt/income ratio was 19.910000 
He/she have total 34 accounts, and he/she take out a loan for debt consolidation.
He/she overdue of interest 1 times over last 2 year.
His/Her total redemption principal $ 550020. His/Her total interst pay $ 106968.000000 
His/Her total overdue payment $ 0.000000. He/She have 0.000000 overdue account for total 

###instruction: with those conditions, Guess His Credit Rating in one of A,B,C,D,E,F,G 


###Credit Rating : A 


###Justification For The Credit Rating : overdue payment was not made within 30 days


###Debtors should not be given credit if they have been overdue payment more than once in last 2 year.


###If there was no late payment, give them Credit Rating : A 


###Other conditions : overdue payment was not made within
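All prompts shown in the Result and Prompt Engineering sections follow the same template, filled in from one row of the dataset. The sketch below shows how such a prompt could be assembled. The field names in `record` are hypothetical, the template wording is copied from the examples above (including its original spelling, since that is what the model saw), and trailing whitespace may differ from the real prompts.

```python
PROMPT_TEMPLATE = (
    "you are financial speciallist. And you will see ###condition of one person. "
    "and judge his/her Credit rating\n"
    "response about ###instruction\n\n"
    "###Condition\n"
    "he/she owe $ {loan_amount} , take out a loan during {loan_term} months.\n"
    "He/she works for {years_employed} years. "
    "Status of home ownership is {home_ownership}\n"
    "His/her annual income was about {annual_income}. "
    "His/her total debt/income ratio was {dti:f}\n"
    "He/she have total {total_accounts} accounts, "
    "and he/she take out a loan for {purpose}.\n"
    "He/she overdue of interest {delinquencies_2y} times over last 2 year.\n"
    "His/Her total redemption principal $ {principal_repaid}. "
    "His/Her total interst pay $ {interest_paid:f}\n"
    "His/Her total overdue payment $ {overdue_amount:f}. "
    "He/She have {overdue_accounts:f} overdue account for total\n\n"
    "###instruction: with those conditions, "
    "Guess His Credit Rating in one of A,B,C,D,E,F,G\n\n\n"
    "###Credit Rating : "
)


def build_prompt(record):
    """Fill the template from a dict of (hypothetically named) fields."""
    return PROMPT_TEMPLATE.format(**record)


# Values taken from the first example in the Result section.
record = {
    "loan_amount": 16800000, "loan_term": 60, "years_employed": 10,
    "home_ownership": "MORTGAGE", "annual_income": 82680000, "dti": 21.96,
    "total_accounts": 27, "purpose": "debt consolidation",
    "delinquencies_2y": 0, "principal_repaid": 199284,
    "interest_paid": 159924.0, "overdue_amount": 0.0, "overdue_accounts": 0.0,
}
prompt = build_prompt(record)
print(prompt)
```

Keeping the template in one place makes it easier to try the clearer-instruction variants discussed above without touching the data code.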

INSIGHT

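  1. max_token_length: the output differs depending on the number of tokens generated (presumably because, with top-k, generation is affected by how many tokens are still to be produced).

     max_token 20: [image]

     max_token 100: [image]

  2. Even when a step checkpoint is saved during PEFT training, what is stored is the configuration rather than the adapter, so the checkpoint does not yield the model fine-tuned as intended.

  3. If students' scores, ability, and participation were defined by self-established criteria and an LLM were trained on that basis, I think good results could be obtained.

    • Could it serve as an assistant teacher, a helper, or a fair grading standard? (its reliability cannot be verified)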
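For the checkpoint problem described in the insights, one quick sanity check is to look inside a checkpoint directory and confirm that adapter weights were written next to the adapter configuration. The sketch below assumes the file names PEFT normally uses (`adapter_config.json` plus `adapter_model.safetensors` or `adapter_model.bin`); the function name and the temporary directory are only for illustration.

```python
import tempfile
from pathlib import Path

# File names PEFT normally writes for adapter weights (assumed here).
ADAPTER_WEIGHT_FILES = ("adapter_model.safetensors", "adapter_model.bin")


def checkpoint_has_adapter(checkpoint_dir):
    """True only if both the adapter config and an adapter weight file exist."""
    path = Path(checkpoint_dir)
    has_config = (path / "adapter_config.json").is_file()
    has_weights = any((path / name).is_file() for name in ADAPTER_WEIGHT_FILES)
    return has_config and has_weights


# Demo on a throwaway directory standing in for a checkpoint folder.
with tempfile.TemporaryDirectory() as tmp:
    ckpt = Path(tmp)
    (ckpt / "adapter_config.json").write_text("{}")
    config_only = checkpoint_has_adapter(ckpt)           # config without weights
    (ckpt / "adapter_model.safetensors").write_bytes(b"")
    config_and_weights = checkpoint_has_adapter(ckpt)    # config plus weights

print(config_only, config_and_weights)
```

If a checkpoint only contains the configuration, saving the adapter explicitly at the end of training is one possible workaround.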

