
Text-to-SQL Generation Using Fine-tuned LLMs on Intel GPUs (XPUs) and QLoRA

This repository includes code for fine-tuning a language model on text-to-SQL tasks and for generating SQL queries with the fine-tuned model. Both fine-tuning and generation use QLoRA, a quantized low-rank adaptation method for parameter-efficient fine-tuning, enabled by Intel's BigDL-LLM library on Intel GPUs.

[Figure: LoRA adapters]

Prerequisites

  • Python 3.x
  • PyTorch
  • Transformers library
  • Datasets library
  • Intel Extension for PyTorch (IPEX)
  • Intel BigDL-LLM[XPU]

Installation

  1. Clone this repo:
git clone https://github.com/your_username/your_repository.git
  2. Install the required Python packages:
pip install -r requirements.txt
  3. Install the Intel BigDL-LLM package:
pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu
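
After installation, a quick way to confirm that the Intel GPU (XPU) is visible to PyTorch is a check along these lines (not part of this repo; assumes IPEX is installed as above):

import torch
import intel_extension_for_pytorch as ipex  # registers the 'xpu' device with PyTorch

print(torch.xpu.is_available())       # True if an Intel GPU (XPU) is usable
print(torch.xpu.get_device_name(0))   # e.g. Intel(R) Data Center GPU Max 1100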

File Descriptions

  • finetune.py : Contains code for fine-tuning a pre-trained Language Model on text-to-SQL tasks.
  • generate.py : Contains code for generating SQL queries using a fine-tuned model.

Fine-Tuning a Model (finetune.py)

To fine-tune a model, run the finetune.py script; it prints the training configuration at startup:

python finetune.py
============================================================
Training Parameters:
Foundation model:         NousResearch/CodeLlama-7b-hf
Model save path:          ./final_model
Device used:              xpu
Intel GPU:                Intel(R) Data Center GPU Max 1100
Batch size per device:    32
Gradient accum. steps:    4
Warmup steps:             100
Save steps:               20
Evaluation steps:         20
Max steps:                300
Learning rate:            0.0003
Max gradient norm:        0.3
Save total limit:         3
Logging steps:            20
============================================================
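
For reference, the logged values above map roughly onto Hugging Face TrainingArguments as sketched below; the actual argument names and output directory used inside finetune.py may differ:

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./saved_model",        # checkpoint directory (assumed)
    per_device_train_batch_size=32,
    gradient_accumulation_steps=4,
    warmup_steps=100,
    save_steps=20,
    evaluation_strategy="steps",
    eval_steps=20,
    max_steps=300,
    learning_rate=3e-4,
    max_grad_norm=0.3,
    save_total_limit=3,
    logging_steps=20,
)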

Here is how the loss chart looks at the end of 300 steps of fine-tuning: the loss drops sharply in the initial steps, and the training loss then gradually tapers to around 0.6.

[Figure: training loss chart]

Key Features:

  • Downloads a pre-trained model based on the given base model ID.
  • Tokenizes the input questions, context, and answers.
  • Fine-tunes the model using the tokenized data and qLoRA (see the sketch after this list).
  • Saves the fine-tuned model.
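
The qLoRA setup typically looks like the sketch below, based on BigDL-LLM's standard QLoRA API; the exact code and LoRA hyperparameters (r, alpha, target modules) in finetune.py may differ:

import torch
from bigdl.llm.transformers import AutoModelForCausalLM
from bigdl.llm.transformers.qlora import get_peft_model, prepare_model_for_kbit_training
from peft import LoraConfig

# Load the base model with 4-bit NF4 weights; keep lm_head in full precision.
model = AutoModelForCausalLM.from_pretrained(
    "NousResearch/CodeLlama-7b-hf",
    load_in_low_bit="nf4",
    optimize_model=False,
    torch_dtype=torch.float16,
    modules_to_not_convert=["lm_head"],
)
model = model.to("xpu")

# Attach trainable low-rank adapters; the quantized base weights stay frozen.
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)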

Configuration:

  • BASE_MODEL: The pre-trained model to use for fine-tuning.
  • MODEL_PATH: Path to save the fine-tuned model.
  • DEVICE: Device to run the model on.
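
The tokenization step listed under Key Features combines each sample's question, context, and answer into a single training prompt. A minimal sketch, assuming the b-mc2/sql-create-context field names and an illustrative template (the exact wording in finetune.py may differ):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("NousResearch/CodeLlama-7b-hf")

def build_prompt(sample):
    # Hypothetical template; only the question/context/answer fields
    # come from the dataset itself.
    return (
        "### Context:\n" + sample["context"] + "\n"
        "### Question:\n" + sample["question"] + "\n"
        "### Answer:\n" + sample["answer"]
    )

sample = {
    "context": "CREATE TABLE head (age INTEGER)",
    "question": "How many heads of the departments are older than 56?",
    "answer": "SELECT COUNT(*) FROM head WHERE age > 56",
}
tokens = tokenizer(build_prompt(sample), truncation=True, max_length=512)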

SQL Query Generation (generate.py)

To generate SQL queries using the fine-tuned model, run the generate.py script.

Key Features:

  • Uses either the base model or a fine-tuned model for SQL query generation.
  • Loads sample data and generates SQL queries for each sample.

Configuration:

  • BASE_MODEL: The base model to use for inference.
  • MODEL_PATH: Path to the fine-tuned model.
  • LORA_CHECKPOINT: Latest checkpoint for the fine-tuned model.
  • TEST_DATA: Path to the test data file.
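
Putting those pieces together, inference with the LoRA adapter applied on top of the quantized base model might look like this sketch (the checkpoint path and prompt template are illustrative, not the exact generate.py code):

import torch
from transformers import AutoTokenizer
from bigdl.llm.transformers import AutoModelForCausalLM
from bigdl.llm.transformers.qlora import PeftModel

BASE_MODEL = "NousResearch/CodeLlama-7b-hf"
LORA_CHECKPOINT = "./saved_model/checkpoint-300"  # hypothetical checkpoint path

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL, load_in_4bit=True)
model = PeftModel.from_pretrained(model, LORA_CHECKPOINT)  # apply LoRA weights
model = model.to("xpu")

prompt = (
    "### Context:\nCREATE TABLE head (age INTEGER)\n"
    "### Question:\nHow many heads of the departments are older than 56?\n"
    "### Answer:\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to("xpu")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))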

After just a 15-minute training session, the fine-tuned model generates SQL queries that reflect the given questions more accurately than the base model does. With additional training steps, we can expect the model's responses to improve further:

[Figure: fine-tuned model generation]

[Figure: base model generation]

Default Configurations

Model

  • Default base model for fine-tuning: openlm-research/open_llama_3b
  • Model path for saving the fine-tuned LoRA adaptor (in case of interruption): ./saved_model
  • Path for saving the task-specific (here, text-to-SQL) LoRA adaptors: ./lora_models

Dataset

  • Default dataset for fine-tuning: b-mc2/sql-create-context
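
For a quick look at what the model is trained on: each record in this dataset pairs a natural-language question with a CREATE TABLE context and the target SQL query. A sketch using the standard datasets API:

from datasets import load_dataset

data = load_dataset("b-mc2/sql-create-context")
print(data["train"][0])
# -> {'question': ..., 'context': 'CREATE TABLE ...', 'answer': 'SELECT ...'}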

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

