LLM_Text2SQL

Performance evaluation of LLMs on the Text-to-SQL task

Below are the LLMs tested on the Spider dataset for the Text-to-SQL problem.

Model Name - Parameter size

Mistral - 7B
LLaMA 2 - 7B
WizardLM - 7B
Flan-T5 - 11B
PaLM - 540B

HOW TO RUN THE LLM

All open-source models were run locally with the llama-cpp-python package, using GGUF files downloaded from Hugging Face repositories. PaLM was accessed through the PaLM API, which was used to send the requests and receive the generated queries.

Once the GGUF files are downloaded, place them in a directory named models. The test set is the dev set of the Spider dataset, located at "spider/dev.json". The spider directory must also contain the databases, which are needed when the database schema is provided along with the user query.
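The test set can be read with a few lines of Python. The sketch below is an illustration, not this repository's code: it assumes the standard Spider format, in which dev.json is a JSON list of examples with "db_id", "question" and "query" fields, and the standard database layout spider/database/<db_id>/<db_id>.sqlite. The function and class names are hypothetical.

```python
import json
from pathlib import Path
from typing import Iterable, Iterator, List, NamedTuple


class SpiderExample(NamedTuple):
    db_id: str     # database the question is asked against
    question: str  # natural-language question
    gold_sql: str  # reference (gold) SQL query


def parse_spider_examples(data: Iterable[dict]) -> Iterator[SpiderExample]:
    """Turn parsed Spider JSON records into (db_id, question, gold SQL) triples."""
    for item in data:
        yield SpiderExample(item["db_id"], item["question"], item["query"])


def load_spider_examples(json_path: str) -> List[SpiderExample]:
    """Read a Spider-style file such as spider/dev.json."""
    with open(json_path, encoding="utf-8") as f:
        return list(parse_spider_examples(json.load(f)))


def database_path(spider_dir: str, db_id: str) -> Path:
    """Assumed Spider layout: <spider_dir>/database/<db_id>/<db_id>.sqlite."""
    return Path(spider_dir) / "database" / db_id / f"{db_id}.sqlite"
```

For example, load_spider_examples("spider/dev.json") would return one triple per dev question, and database_path("spider", ex.db_id) points at the SQLite file to read the schema from.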

For PaLM, run the following command:

python main_Palm.py --test "test_file" --schema 1

--schema 1 queries the database for its schema and appends it to the model prompt as additional context. Set it to 0 to leave the schema out of the prompt.
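One way to obtain the schema is to read the CREATE TABLE statements that SQLite stores in its sqlite_master table. The sketch below assumes that approach; the prompt template in build_prompt is hypothetical, since the exact prompt used by main.py and main_Palm.py is not documented here.

```python
import sqlite3


def schema_from_connection(conn: sqlite3.Connection) -> str:
    """Return the CREATE TABLE statements of an open SQLite database, one per line."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master "
        "WHERE type = 'table' AND sql IS NOT NULL AND name NOT LIKE 'sqlite_%' "
        "ORDER BY name"
    ).fetchall()
    return "\n".join(row[0] for row in rows)


def get_schema(db_path: str) -> str:
    """Open a SQLite file (e.g. a Spider database) and read its schema."""
    conn = sqlite3.connect(db_path)
    try:
        return schema_from_connection(conn)
    finally:
        conn.close()


def build_prompt(question: str, schema: str = "") -> str:
    """Hypothetical zero-shot prompt: the schema block is added only when given (--schema 1)."""
    parts = []
    if schema:
        parts.append("Database schema:\n" + schema)
    parts.append("Question: " + question)
    parts.append("Write one SQLite query that answers the question.\nSQL:")
    return "\n\n".join(parts)
```

With --schema 0 the prompt would simply be build_prompt(question); with --schema 1 it would be build_prompt(question, get_schema(path_to_db)).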

For the GGUF models, main.py uses llama-cpp-python internally to run inference locally:

python main.py --model_name "model_name" --test "test_file" --schema 1
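A minimal sketch of the local inference path. It assumes that --model_name is the name of a GGUF file inside models/ and that decoding is greedy; neither detail is confirmed by this README, and parse_args, load_model and generate_sql are hypothetical names. llama_cpp.Llama and its completion call are the real llama-cpp-python API; the import is done lazily so the other helpers work without the package installed.

```python
import argparse
import os


def parse_args(argv=None) -> argparse.Namespace:
    """Hypothetical parser mirroring the flags documented for main.py."""
    parser = argparse.ArgumentParser(description="Zero-shot Text-to-SQL with a local GGUF model")
    parser.add_argument("--model_name", required=True, help="GGUF file name inside ./models (assumed)")
    parser.add_argument("--test", required=True, help="test set, e.g. spider/dev.json")
    parser.add_argument("--schema", type=int, choices=[0, 1], default=0,
                        help="1 = append the database schema to the prompt, 0 = question only")
    return parser.parse_args(argv)


def load_model(model_name: str, n_ctx: int = 2048):
    """Load models/<model_name> with llama-cpp-python (imported lazily)."""
    from llama_cpp import Llama
    return Llama(model_path=os.path.join("models", model_name), n_ctx=n_ctx, verbose=False)


def generate_sql(llm, prompt: str, max_tokens: int = 256) -> str:
    """Run one greedy completion and return the SQL flattened onto a single line."""
    # Calling a Llama object returns an OpenAI-style completion dict.
    out = llm(prompt, max_tokens=max_tokens, temperature=0.0, stop=["\n\n", ";"])
    text = out["choices"][0]["text"]
    return " ".join(text.split())  # one query per line for the results file
```

In a full run, load_model(args.model_name) would be called once and generate_sql once per test question. Flattening each prediction onto one line keeps it compatible with the one-query-per-line files the evaluator expects.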

Create a folder named "results" to store the inference results. Each result file is a CSV containing:

The question asked
The gold query
The predicted query

The file is saved as "model_name/with_schema" if --schema is 1, and as "model_name/without_schema" otherwise.
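A sketch of this naming scheme. It assumes the files live under results/, that they carry a .csv extension, and that the columns are named question, gold_query and predicted_query; the README specifies none of these details, so treat them as placeholders.

```python
import csv
from pathlib import Path
from typing import Iterable, Sequence


def results_path(model_name: str, with_schema: bool, root: str = "results") -> Path:
    """results/<model_name>/with_schema.csv or results/<model_name>/without_schema.csv."""
    stem = "with_schema" if with_schema else "without_schema"
    return Path(root) / model_name / f"{stem}.csv"  # .csv extension is an assumption


def save_results(path: Path, rows: Iterable[Sequence[str]]) -> None:
    """Write (question, gold query, predicted query) rows, creating folders as needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["question", "gold_query", "predicted_query"])  # assumed header
        writer.writerows(rows)
```

Using the csv module rather than joining strings by hand keeps questions and queries that contain commas or quotes intact.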

The Spider dataset provides an evaluator to measure the accuracy of the predictions. Run it with the following command:

python3 evaluators/evaluation.py --gold "Gold_file" --pred "Pred_file" --etype all

The Gold_file must contain only the gold queries, one per line, and the Pred_file must contain only the predicted queries, one per line. Line i of both files must refer to the same example, because the evaluator pairs the two files line by line.
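The two files can be produced from a results CSV. This sketch assumes the column names gold_query and predicted_query (hypothetical). It flattens each query onto one line and substitutes a placeholder for empty predictions, because the upstream Spider evaluator skips blank lines when reading, which would misalign every later pair. The upstream evaluator also reads each gold line as the query followed by a tab and the db_id, so the optional db_ids argument writes that format; drop it if this repository's evaluators/evaluation.py reads plain queries.

```python
import csv
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical placeholder for empty predictions; it keeps the line (and the
# alignment) and can never match a gold query.
EMPTY_PREDICTION = "SELECT"


def to_eval_lines(rows: Sequence[Dict[str, str]],
                  db_ids: Optional[Sequence[str]] = None) -> Tuple[List[str], List[str]]:
    """Build aligned gold/prediction lines from result rows (assumed column names)."""
    gold_lines, pred_lines = [], []
    for i, row in enumerate(rows):
        gold = " ".join(row["gold_query"].split())  # flatten multi-line SQL
        pred = " ".join(row["predicted_query"].split()) or EMPTY_PREDICTION
        if db_ids is not None:
            gold = f"{gold}\t{db_ids[i]}"  # upstream Spider gold format: query<TAB>db_id
        gold_lines.append(gold)
        pred_lines.append(pred)
    return gold_lines, pred_lines


def export_for_evaluation(results_csv: str, gold_out: str, pred_out: str,
                          db_ids: Optional[Sequence[str]] = None) -> None:
    """Write Gold_file and Pred_file from a results CSV, one query per line."""
    with open(results_csv, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    gold_lines, pred_lines = to_eval_lines(rows, db_ids)
    with open(gold_out, "w", encoding="utf-8") as f:
        f.write("\n".join(gold_lines) + "\n")
    with open(pred_out, "w", encoding="utf-8") as f:
        f.write("\n".join(pred_lines) + "\n")
```

The db_ids would come from the same test set, in the same order as the rows of the results file.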

Results on Zero-shot performance of models

CONSISTENCY METRIC

MISTRAL 7B

LLaMA 2 7B

Description of each model

1. Mistral LLM

HIGHLIGHTS

Uses grouped-query attention (GQA)
- Speeds up inference of the model
- Reduces memory requirements during decoding
Uses sliding window attention (SWA)
- Handles longer sequences with a reduced computation cost
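The two mechanisms above can be illustrated with a toy sketch. This is not Mistral's implementation (which uses a rolling key/value cache rather than an explicit mask); it only shows which keys each position may attend to under a causal sliding window, and how GQA maps several query heads onto one key/value head. For Mistral 7B the published configuration is 32 query heads, 8 key/value heads and a 4096-token window.

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """mask[i][j] is True if query position i may attend to key position j.

    Causal sliding window: position i sees itself and the previous window - 1
    positions, i.e. at most `window` keys, so attention cost grows as
    O(seq_len * window) instead of O(seq_len ** 2).
    """
    return [[i - window < j <= i for j in range(seq_len)] for i in range(seq_len)]


def kv_head_for(query_head: int, n_query_heads: int, n_kv_heads: int) -> int:
    """Grouped-query attention: consecutive query heads share one key/value head."""
    assert n_query_heads % n_kv_heads == 0, "query heads must split evenly into groups"
    group_size = n_query_heads // n_kv_heads
    return query_head // group_size


def kv_cache_reduction(n_query_heads: int, n_kv_heads: int) -> int:
    """Factor by which GQA shrinks the KV cache compared with one KV head per query head."""
    return n_query_heads // n_kv_heads
```

With 32 query heads and 8 key/value heads, kv_cache_reduction(32, 8) is 4: the decoder caches a quarter of the keys and values that standard multi-head attention would, which is where the memory saving during decoding comes from.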

CAPABILITIES

Code generation
Reasoning
Mathematics

LIMITATIONS

Prone to hallucination
Prone to prompt injections
Limited stored knowledge due to its small parameter count

2. WizardLM LLM

HIGHLIGHTS

A LLaMA model fine-tuned with the Evol-Instruct method
- Trained on fully evolved instructions
Optimized to follow highly complex instructions
Reported by its authors to outperform Vicuna and Alpaca
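The Evol-Instruct idea can be sketched as a loop in which an LLM repeatedly rewrites instructions into harder ("in-depth") or new ("in-breadth") ones, and failed rewrites are discarded. This toy version assumes rewrite is any LLM completion function; the operation prompts are paraphrases rather than the original WizardLM prompts, and the elimination step is simplified to dropping empty or unchanged outputs.

```python
import random
from typing import Callable, List, Optional

# Paraphrased evolution operations (not the original WizardLM prompts).
IN_DEPTH_OPS = [
    "Add one more constraint or requirement to the instruction.",
    "Make the instruction deeper, asking about the topic in more detail.",
    "Replace general concepts in the instruction with more specific ones.",
    "Rewrite the instruction so that it explicitly requires multi-step reasoning.",
]
IN_BREADTH_OP = "Write a new instruction in the same domain but on a rarer topic."


def evolve_instructions(seed: List[str], rewrite: Callable[[str], str], rounds: int = 3,
                        rng: Optional[random.Random] = None) -> List[str]:
    """Grow an instruction pool by repeatedly asking an LLM (`rewrite`) to evolve each instruction."""
    rng = rng or random.Random(0)
    pool, current = list(seed), list(seed)
    for _ in range(rounds):
        evolved = []
        for instr in current:
            op = rng.choice(IN_DEPTH_OPS + [IN_BREADTH_OP])
            prompt = f"{op}\n\n#Given instruction#:\n{instr}\n\n#Rewritten instruction#:"
            new = rewrite(prompt).strip()
            # Simplified elimination step: discard evolutions that are empty or unchanged.
            if new and new != instr:
                evolved.append(new)
        pool.extend(evolved)
        current = evolved
    return pool
```

The resulting pool (paired with model-generated answers) forms the fine-tuning data; each round builds on the previous round's outputs, which is how instruction complexity increases step by step.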

CAPABILITIES

Instruction following
Code Generation

LIMITATIONS

Prone to hallucination
Limited stored knowledge due to its small parameter count

3. LLaMA 2 LLM by Meta

HIGHLIGHTS

Pre-trained on 2 trillion tokens of publicly available online data
Iteratively refined with reinforcement learning from human feedback (RLHF), including rejection sampling and proximal policy optimization (PPO)
Reported as an open-source model competitive with closed models from OpenAI (ChatGPT), Anthropic, and Google (PaLM) on general NLP tasks
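The two RLHF steps named above can be sketched in a few lines. In rejection sampling, several responses are drawn for a prompt and the one with the highest reward is kept for further fine-tuning; PPO then maximizes a clipped surrogate objective so that each update stays close to the previous policy. This is a simplified illustration: sample and reward are assumed callables standing in for the model and the reward model, and the full Llama 2 setup adds details (such as a KL penalty) that are omitted here.

```python
from typing import Callable, Sequence


def best_of_k(prompt: str, sample: Callable[[str], str],
              reward: Callable[[str, str], float], k: int = 4) -> str:
    """Rejection sampling: draw k responses and keep the one with the highest reward."""
    candidates = [sample(prompt) for _ in range(k)]
    return max(candidates, key=lambda response: reward(prompt, response))


def ppo_clip_objective(ratios: Sequence[float], advantages: Sequence[float],
                       eps: float = 0.2) -> float:
    """PPO clipped surrogate objective (to be maximized), averaged over samples.

    ratio = pi_new(a|s) / pi_old(a|s). Clipping the ratio to [1 - eps, 1 + eps]
    and taking the minimum removes any incentive to move the policy further
    than that range in a single update.
    """
    total = 0.0
    for r, a in zip(ratios, advantages):
        clipped = min(max(r, 1.0 - eps), 1.0 + eps)
        total += min(r * a, clipped * a)
    return total / len(ratios)
```

For a ratio of 1.5 with advantage 1.0 and eps = 0.2, the clipped term 1.2 is used instead of 1.5, so pushing that action's probability higher yields no extra objective.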

CAPABILITIES

Applicable to many use cases, for example:
Code Generation
Sentence completion
Summarization
Sentiment analysis

LIMITATIONS

Prone to hallucination
Inappropriate content (if not used responsibly)
Potential for bias

4. Flan-T5 LLM

HIGHLIGHTS

Enhanced T5: builds on the T5 model with additional instruction fine-tuning
Multi-task learning: Trained on diverse tasks, making it versatile for various NLP applications.
Five sizes: small, base, large, XL, and XXL for different performance and resource requirements.
Open-sourced: Accessible through Hugging Face and can be fine-tuned for specific tasks.

CAPABILITIES

Text summarization
Question answering
Text generation
Language Translation

LIMITATIONS

Potential for bias
Inappropriate content (if not used responsibly)
Requires significant computational resources for training and inference, especially in the larger sizes

5. PaLM

HIGHLIGHTS

Massive parameter size (540B), enabling advanced reasoning and understanding capabilities
Multi-task learning: Trained on a diverse set of tasks
Improved zero-shot and few-shot learning.
Handles multiple languages with fluency and accuracy.

CAPABILITIES

Advanced reasoning tasks: Solves complex problems, comprehends riddles
Question answering
Natural language generation: creative text formats like poems, scripts, emails
Code understanding and generation: Analyzes existing code, generates new code snippets, and helps with code completion.

LIMITATIONS

Potential for bias: Trained on a massive dataset that may contain inherent biases, reflected in its outputs.
Ethical considerations: Can generate inappropriate content if not used responsibly.
Demands significant computational resources for training and inference.

saivignesh-05 / llm_text2sql