
GAR: A Generate-and-Rank Approach for Natural Language to SQL Translation

NL2SQL by generating candidate SQL queries and ranking them.

This is the official repository containing the code and pre-trained models for our paper GAR: A Generate-and-Rank Approach for Natural Language to SQL Translation, published at the 2023 IEEE 39th International Conference on Data Engineering (ICDE).


If you use our code in your study, or find GAR useful, please cite it as follows:

@inproceedings{Yuankai2023:GAR,
  author = {Yuankai Fan and Zhenying He and Tonghui Ren and Dianjun Guo and Lin Chen and Ruisi Zhu and Guanduo Chen and Yinan Jing and Kai Zhang and X. Sean Wang},
  title = {{GAR}: A Generate-and-Rank Approach for Natural Language to SQL Translation},
  booktitle = {39th {IEEE} International Conference on Data Engineering, {ICDE} 2023, Anaheim, CA, USA, April 3-7, 2023},
  pages = {110--122},
  publisher = {{IEEE}},
  year = {2023},
  url = {https://doi.org/10.1109/ICDE55515.2023.00016},
  doi = {10.1109/ICDE55515.2023.00016},
  timestamp = {Fri, 28 Jul 2023 08:30:20 +0200},
  biburl = {https://dblp.org/rec/conf/icde/FanHRGCZCJZW23.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

Watch The Video

Overview

This code implements:

  • The GAR method for generating and ranking candidate SQL queries.
  • An NL2SQL model under a few-shot learning setting, which achieves significant improvements on several benchmarks.

About GAR

TL;DR: We introduce GAR -- a novel generate-and-rank approach to tackle the NL2SQL translation problem. GAR assumes that a set of sample SQL queries on a database is given, and it answers NL queries using SQL queries that are "component-similar" to the given samples.

The objective of NL2SQL translation is to convert a natural language query into an SQL query.

Although seq2seq-based approaches have shown good results on standard benchmarks, they may not perform well on more complex queries that demand an understanding of the database's specific structure and semantics. The main issue is that such complex queries require more training data on the target database, which is not generally provided in the benchmarks.

A more effective strategy would be to move away from the conventional seq2seq framework and aim to address the shortage of training data for the NL2SQL task. This is the approach taken by the GAR method.

How it works

Given a set of sample SQL queries, GAR performs the translation in the following three steps:

  1. Generalization: Apply a set of generalization rules to the sample queries to provide good coverage of component-similar queries.
  2. SQL2NL: Translate the sample and generalized SQL queries into NL expressions called dialects.
  3. Learning-to-rank: Rank the dialect expressions by their semantic similarity to the given NL query; the closest dialect identifies the SQL query returned as the translation result (see the sketch below).

This process is illustrated in the overview diagram in the repository.
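
To make step 3 concrete, here is a minimal sketch of ranking dialects by embedding similarity, assuming a sentence-transformers bi-encoder; the model name and function below are illustrative placeholders, not GAR's actual trained retrieval and re-ranking models:

# Sketch only: rank candidate dialects by cosine similarity to the NL query
# and return the SQL paired with the best-matching dialect.
from typing import List
from sentence_transformers import SentenceTransformer, util

def rank_candidates(nl_query: str, dialects: List[str], sqls: List[str]) -> str:
    model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder model choice
    query_emb = model.encode(nl_query, convert_to_tensor=True)
    dialect_embs = model.encode(dialects, convert_to_tensor=True)
    scores = util.cos_sim(query_emb, dialect_embs)[0]  # one score per dialect
    return sqls[int(scores.argmax())]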

Quick Start

Prerequisites

First, set up a Python environment. This code base has been tested under Python 3.7.

  1. Install the required packages:
pip install -r requirements.txt --no-deps
  2. Download the Spider and GEO datasets and put the data into the datasets folder (the QBEN data is published here). Unpack the datasets to create the following directory structure:
/datasets
├── spider
│   ├── database
│   │   └── ...
│   ├── dev.json
│   ├── dev_gold.sql
│   ├── tables.json
│   ├── train_gold.sql
│   ├── train_others.json
│   └── train_spider.json
└── geo
    ├── database
    │   └── ...
    ├── dev.json
    ├── test.json
    ├── train.json
    └── tables.json

Training

The training script, train_pipeline.sh, is located in the root directory. You can run it with:

$ bash train_pipeline.sh <dataset_name> <train_data_path> <dev_data_path> <table_path> <db_dir>
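
For example, on Spider (using the dataset layout from the Prerequisites section):

$ bash train_pipeline.sh spider datasets/spider/train_spider.json datasets/spider/dev.json datasets/spider/tables.json datasets/spider/database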

The training script will create the directory saved_models in the current directory. Training artifacts like checkpoints will be stored in this directory.

The training includes four phases:

  1. Retrieval model training data generation. Please note that this phase is expected to take some time, as it generates a large set of SQL-dialect pairs for each training database.
  2. Retrieval model training.
  3. Re-ranking model training data generation.
  4. Re-ranking model training.

The default configuration is stored in configs/config.py. You can use this configuration to reproduce our results.

Download the checkpoints

We provide ranking-model checkpoints for the different datasets below. In addition, since the generalization process is time-consuming, we also provide the generalized SQL queries for reproducing our experimental results.

Model        | Dataset | Download Queries | Download Model
-------------|---------|------------------|-------------------
gar.geo      | GEO     | gar.geo.zip      | gar.geo.tar.gz
gar-j.geo    | GEO     | gar-j.geo.zip    | gar-j.geo.tar.gz
gar.spider   | Spider  | gar.spider.zip   | gar.spider.tar.gz
gar-j.spider | Spider  | gar-j.spider.zip | gar-j.spider.tar.gz

Unpack the model checkpoints and the corresponding generalized queries into the following directory structure:

/saved_models
├── gar.spider
│   └── ...
├── gar-j.spider
│   └── ...
├── gar.geo
│   └── ...
└── gar-j.geo
    └── ...
/serialization
├── gar.spider
├── gar-j.spider
├── gar.geo
└── gar-j.geo

Evaluation

The evaluation script, test_pipeline.sh, is located in the root directory. You can run it with:

$ bash test_pipeline.sh <dataset_name> <test_file_path> <test_gold_sql_file> <table_path> <db_dir>
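
For example, on the Spider development set:

$ bash test_pipeline.sh spider datasets/spider/dev.json datasets/spider/dev_gold.sql datasets/spider/tables.json datasets/spider/database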

The evaluation script will create the directory output in the current directory. The evaluation results will be stored there.

Demonstration

GAR comes with a demonstration as well! See our demonstration paper GENSQL: A Generative Natural Language Interface to Database Systems, published at the 2023 IEEE 39th International Conference on Data Engineering (ICDE).

Application Demo

http://47.116.100.156:30279

If you want to try it on your own data, please contact us for permission to access the Admin Interface. 😊

Please cite it if you use GenSQL in your work:

@inproceedings{Yuankai2023:GenSQL,
  author = {Yuankai Fan and Tonghui Ren and Zhenying He and X. Sean Wang and Ye Zhang and Xingang Li},
  title = {{GenSQL}: A Generative Natural Language Interface to Database Systems},
  booktitle = {39th {IEEE} International Conference on Data Engineering, {ICDE} 2023, Anaheim, CA, USA, April 3-7, 2023},
  pages = {3603--3606},
  publisher = {{IEEE}},
  year = {2023},
  url = {https://doi.org/10.1109/ICDE55515.2023.00278},
  doi = {10.1109/ICDE55515.2023.00278},
  timestamp = {Thu, 27 Jul 2023 17:17:25 +0200},
  biburl = {https://dblp.org/rec/conf/icde/FanRHWZL23.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

Contributing

This project welcomes contributions and suggestions 👍.

If you find bugs in our code, encounter problems when running the code, or have suggestions for GAR, please submit an issue or reach out to me ([email protected])!


Issues

Erroneous SQL statements in the training steps

Thank you for your excellent work. However, when I trained the model on Spider with the script bash train_pipeline.sh spider datasets/spider/train_spider.json datasets/spider/dev.json datasets/spider/tables.json datasets/spider/database, an AssertionError occurred when sql_str was select count ( * ) from follows group by "value": Error col: "value".

I wonder what I should do to solve this problem. Should I skip all the SQL strings that cause such an error by using a simple try-except statement?

The traceback is:

  4%|█████▊                                            | 278/7000 [00:00<00:18, 371.50it/s]
Traceback (most recent call last):
  File "/f/.conda/envs/SQL_GAR/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/f/.conda/envs/SQL_GAR/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/f/code/SQL/GAR/scripts/retrieval_model_train_script.py", line 271, in <module>
    main()
  File "/f/.conda/envs/SQL_GAR/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/f/.conda/envs/SQL_GAR/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/f/.conda/envs/SQL_GAR/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/f/.conda/envs/SQL_GAR/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/f/code/SQL/GAR/scripts/retrieval_model_train_script.py", line 39, in main
    dataset_name, train_file, tables_file, db_dir, output_dir
  File "/f/code/SQL/GAR/scripts/retrieval_model_train_script.py", line 118, in generate_triples_for_retrieval_model
    checker = RecallChecker(data_file, tables_file, db_dir, output_dir)
  File "/f/code/SQL/GAR/utils/recall_checker_utils.py", line 31, in __init__
    self.initialize(dataset_file, tables_file)
  File "/f/code/SQL/GAR/utils/recall_checker_utils.py", line 42, in initialize
    g_sql = rebuild_sql(db_id, self.db_dir, sql_nested_query_tmp_name_convert(query), self.kmaps, tables_file)
  File "/f/code/SQL/GAR/utils/evaluation/evaluate.py", line 537, in rebuild_sql
    sql = get_sql(schema, sql_str)
  File "/f/code/SQL/GAR/utils/evaluation/process_sql.py", line 630, in get_sql
    _, sql = parse_sql(toks, 0, tables_with_alias, schema)
  File "/f/code/SQL/GAR/utils/evaluation/process_sql.py", line 566, in parse_sql
    idx, group_col_units = parse_group_by(toks, idx, tables_with_alias, schema, default_tables)
  File "/f/code/SQL/GAR/utils/evaluation/process_sql.py", line 477, in parse_group_by
    idx, col_unit = parse_col_unit(toks, idx, tables_with_alias, schema, default_tables)
  File "/f/code/SQL/GAR/utils/evaluation/process_sql.py", line 241, in parse_col_unit
    idx, col_id = parse_col(toks, idx, tables_with_alias, schema, default_tables)
  File "/f/code/SQL/GAR/utils/evaluation/process_sql.py", line 209, in parse_col
    assert False, "Error col: {}".format(tok)
AssertionError: Error col: "value"
RESULT REPORT: Retrieval model fine-tune data failed!

(Because I added print(sql_str) at line 536 of utils/evaluation/evaluate.py, the frame File "/f/code/SQL/GAR/utils/evaluation/evaluate.py", line 537, in rebuild_sql should read line 536.)
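
For reference, the try-except guard proposed above might look like the following around the rebuild_sql call in utils/recall_checker_utils.py (a sketch only, modifying the call site named in the traceback; whether silently skipping unparsable samples hurts training data quality is for the maintainers to confirm):

# Hypothetical modification of the call site from the traceback above;
# it skips training samples whose SQL the Spider parser cannot rebuild.
try:
    g_sql = rebuild_sql(db_id, self.db_dir,
                        sql_nested_query_tmp_name_convert(query),
                        self.kmaps, tables_file)
except AssertionError as e:
    print("Skipping unparsable SQL: {} ({})".format(query, e))
    continue  # assumes this call sits inside the per-sample loop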

Erroneous SQL statements in the test steps

It seems that the value-filter stage failed because of an unresolved table name events in the SQL statement.

For the SQL SELECT EVENTS.EVENT_DETAILS , PARTICIPANTS_IN_EVENTS.EVENT_ID FROM EVENTS AS T1 JOIN PARTICIPANTS_IN_EVENTS AS T2 ON T1.EVENT_ID = T2.EVENT_ID GROUP BY PARTICIPANTS_IN_EVENTS.EVENT_ID HAVING COUNT ( * ) > 'VALUE', there is an error:

Traceback (most recent call last):
  File "/f/.conda/envs/SQL_GAR2/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/f/.conda/envs/SQL_GAR2/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)                                                                                                                       
  File "/f/code/SQL/GAR/scripts/value_postprocessing/candidate_filter_top10.py", line 187, in <module>                                            
    main()                                                                                                                                        
  File "/f/.conda/envs/SQL_GAR2/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/f/.conda/envs/SQL_GAR2/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/f/.conda/envs/SQL_GAR2/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/f/.conda/envs/SQL_GAR2/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/f/code/SQL/GAR/scripts/value_postprocessing/candidate_filter_top10.py", line 158, in main
    tmp = candidate_filter(candidates, db_id, db_context, tables_file, database_dir)
  File "/f/code/SQL/GAR/scripts/value_postprocessing/candidate_filter_top10.py", line 97, in candidate_filter
    sql_dict = rebuild_sql(db_id, dataset_path, sql_nested_query_tmp_name_convert(candidate), kmaps)
  File "/f/code/SQL/GAR/utils/evaluation/evaluate.py", line 536, in rebuild_sql
    sql = get_sql(schema, sql_str)
  File "/f/code/SQL/GAR/utils/evaluation/process_sql.py", line 630, in get_sql
    _, sql = parse_sql(toks, 0, tables_with_alias, schema)
  File "/f/code/SQL/GAR/utils/evaluation/process_sql.py", line 556, in parse_sql
    from_end_idx, table_units, conds, default_tables = parse_from(toks, start_idx, tables_with_alias, schema)
  File "/f/code/SQL/GAR/utils/evaluation/process_sql.py", line 433, in parse_from
    idx, table_unit, table_name = parse_table_unit(toks, idx, tables_with_alias, schema)
  File "/f/code/SQL/GAR/utils/evaluation/process_sql.py", line 281, in parse_table_unit
    key = tables_with_alias[toks[idx]]
KeyError: 'events'
RESULT REPORT: Value filter result failed!

Thank you in advance! Maybe there is something that I need to modify in the datasets.

Problems when self.skeletons is an empty list

I suggest adding an assert statement, assert self.skeletons, "self.skeletons is []", before the loop

for skeleton in self.combinatorial_rule_base["skeleton"].copy():

This is because when self.skeletons is [], self.combinatorial_rule_base["skeleton"] becomes [] as well, and the functions _default_choice(self, node_name, state_machine) and choice(self, node_name, state_machine), which are called via expression = handler.choice(node_name, state_machine), return None.

If the value returned by dfs_random is None, then ret = dfs_random("skeleton", state_machine) always sets ret = None, turning the while True: loop into an infinite loop.

This problem occurred for me because the file serialization/gar.spider/qunits.json contained {'global_syntactic': {'max_projection_num': 0, 'max_where_predicate_num': 0, 'max_having_predicate_num': 0, 'has_output': False, 'has_iue': False, 'has_group': False, 'max_group_num': 0, 'max_where_nested_num': 0, 'max_having_nested_num': 0, 'max_iue_num': 0, 'max_order_num': 0, 'has_where': False}, 'units': [], 'skeleton': []} for concert_singer. After deleting this file, the code works.
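
A minimal sketch of the suggested guard, with class and attribute names taken from the excerpts above rather than the actual GAR source:

# Hypothetical context for the suggested assertion; only self.skeletons
# and self.combinatorial_rule_base come from the issue's excerpts.
class SkeletonExpander:
    def __init__(self, skeletons, combinatorial_rule_base):
        self.skeletons = skeletons
        self.combinatorial_rule_base = combinatorial_rule_base

    def expand(self):
        # Fail fast: an empty skeleton list otherwise propagates None out of
        # dfs_random and turns the `while True:` retry loop into a hang.
        assert self.skeletons, "self.skeletons is []"
        for skeleton in self.combinatorial_rule_base["skeleton"].copy():
            pass  # generalization logic elided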

AttributeError: module 'torch' has no attribute 'frombuffer'

While running the test script I am encountering the following error:

(GAR) [husainmalwat@localhost GAR]$ bash test_pipeline.sh "spider" "datasets/spider/dev.json" "datasets/spider/dev_gold.sql" "datasets/spider/tables.json" "datasets/spider/database"
Dataset name: spider
Test JSON file: datasets/spider/dev.json
Gold SQL TXT file: datasets/spider/dev_gold.sql
Schema file of the dataset: datasets/spider/tables.json
Databases directory of the dataset: datasets/spider/database
==================================================================
ACTION REPORT: Testing pipeline starts ......
Re-ranker test data exists!
==================================================================
ACTION REPORT: Start to test re-ranker model /home/husainmalwat/GAR/saved_models/gar.spider/reranker/bertpooler+roberta-base/model.tar.gz
/home/husainmalwat/anaconda3/envs/GAR/lib/python3.8/site-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.16.5 and <1.23.0 is required for this version of SciPy (detected version 1.24.3
  warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
2023-06-19 20:09:10,725 - INFO - allennlp.models.archival - loading archive file /home/husainmalwat/GAR/saved_models/gar.spider/reranker/bertpooler+roberta-base/model.tar.gz
2023-06-19 20:09:10,725 - INFO - allennlp.models.archival - extracting archive file /home/husainmalwat/GAR/saved_models/gar.spider/reranker/bertpooler+roberta-base/model.tar.gz to temp dir /tmp/tmpwtncs7m0
2023-06-19 20:09:14,081 - WARNING - allennlp.common.params - error loading _jsonnet (this is expected on Windows), treating /tmp/tmpwtncs7m0/config.json as plain json
2023-06-19 20:09:14,082 - INFO - allennlp.common.params - dataset_reader.type = listwise_pair_ranker_reader
2023-06-19 20:09:14,083 - INFO - allennlp.common.params - dataset_reader.lazy = False
2023-06-19 20:09:14,083 - INFO - allennlp.common.params - dataset_reader.cache_directory = None
2023-06-19 20:09:14,083 - INFO - allennlp.common.params - dataset_reader.max_instances = 100000
2023-06-19 20:09:14,083 - INFO - allennlp.common.params - dataset_reader.manual_distributed_sharding = False
2023-06-19 20:09:14,083 - INFO - allennlp.common.params - dataset_reader.manual_multi_process_sharding = False
2023-06-19 20:09:14,083 - INFO - allennlp.common.params - dataset_reader.tokenizer.type = pretrained_transformer
2023-06-19 20:09:14,083 - INFO - allennlp.common.params - dataset_reader.tokenizer.model_name = roberta-base
2023-06-19 20:09:14,083 - INFO - allennlp.common.params - dataset_reader.tokenizer.add_special_tokens = False
2023-06-19 20:09:14,083 - INFO - allennlp.common.params - dataset_reader.tokenizer.max_length = None
2023-06-19 20:09:14,083 - INFO - allennlp.common.params - dataset_reader.tokenizer.stride = 0
2023-06-19 20:09:14,084 - INFO - allennlp.common.params - dataset_reader.tokenizer.tokenizer_kwargs = None
2023-06-19 20:09:20,318 - INFO - allennlp.common.params - dataset_reader.token_indexers.bert.type = pretrained_transformer
2023-06-19 20:09:20,318 - INFO - allennlp.common.params - dataset_reader.token_indexers.bert.token_min_padding_length = 0
2023-06-19 20:09:20,318 - INFO - allennlp.common.params - dataset_reader.token_indexers.bert.model_name = roberta-base
2023-06-19 20:09:20,318 - INFO - allennlp.common.params - dataset_reader.token_indexers.bert.namespace = tags
2023-06-19 20:09:20,319 - INFO - allennlp.common.params - dataset_reader.token_indexers.bert.max_length = None
2023-06-19 20:09:20,319 - INFO - allennlp.common.params - dataset_reader.token_indexers.bert.tokenizer_kwargs = None
2023-06-19 20:09:20,320 - INFO - allennlp.common.params - dataset_reader.max_tokens = None
2023-06-19 20:09:20,321 - INFO - allennlp.common.params - validation_dataset_reader.type = listwise_pair_ranker_reader
2023-06-19 20:09:20,321 - INFO - allennlp.common.params - validation_dataset_reader.lazy = False
2023-06-19 20:09:20,321 - INFO - allennlp.common.params - validation_dataset_reader.cache_directory = None
2023-06-19 20:09:20,322 - INFO - allennlp.common.params - validation_dataset_reader.max_instances = None
2023-06-19 20:09:20,322 - INFO - allennlp.common.params - validation_dataset_reader.manual_distributed_sharding = False
2023-06-19 20:09:20,322 - INFO - allennlp.common.params - validation_dataset_reader.manual_multi_process_sharding = False
2023-06-19 20:09:20,322 - INFO - allennlp.common.params - validation_dataset_reader.tokenizer.type = pretrained_transformer
2023-06-19 20:09:20,322 - INFO - allennlp.common.params - validation_dataset_reader.tokenizer.model_name = roberta-base
2023-06-19 20:09:20,323 - INFO - allennlp.common.params - validation_dataset_reader.tokenizer.add_special_tokens = False
2023-06-19 20:09:20,323 - INFO - allennlp.common.params - validation_dataset_reader.tokenizer.max_length = None
2023-06-19 20:09:20,323 - INFO - allennlp.common.params - validation_dataset_reader.tokenizer.stride = 0
2023-06-19 20:09:20,323 - INFO - allennlp.common.params - validation_dataset_reader.tokenizer.tokenizer_kwargs = None
2023-06-19 20:09:20,324 - INFO - allennlp.common.params - validation_dataset_reader.token_indexers.bert.type = pretrained_transformer
2023-06-19 20:09:20,324 - INFO - allennlp.common.params - validation_dataset_reader.token_indexers.bert.token_min_padding_length = 0
2023-06-19 20:09:20,325 - INFO - allennlp.common.params - validation_dataset_reader.token_indexers.bert.model_name = roberta-base
2023-06-19 20:09:20,325 - INFO - allennlp.common.params - validation_dataset_reader.token_indexers.bert.namespace = tags
2023-06-19 20:09:20,325 - INFO - allennlp.common.params - validation_dataset_reader.token_indexers.bert.max_length = None
2023-06-19 20:09:20,325 - INFO - allennlp.common.params - validation_dataset_reader.token_indexers.bert.tokenizer_kwargs = None
2023-06-19 20:09:20,326 - INFO - allennlp.common.params - validation_dataset_reader.max_tokens = None
2023-06-19 20:09:20,326 - INFO - allennlp.common.params - type = from_instances
2023-06-19 20:09:20,326 - INFO - allennlp.data.vocabulary - Loading token dictionary from /tmp/tmpwtncs7m0/vocabulary.
2023-06-19 20:09:20,328 - INFO - allennlp.common.params - model.type = listwise_pair_ranker
2023-06-19 20:09:20,329 - INFO - allennlp.common.params - model.regularizer = None
2023-06-19 20:09:20,329 - INFO - allennlp.common.params - model.text_field_embedder.type = basic
2023-06-19 20:09:20,330 - INFO - allennlp.common.params - model.text_field_embedder.token_embedders.bert.type = pretrained_transformer
2023-06-19 20:09:20,330 - INFO - allennlp.common.params - model.text_field_embedder.token_embedders.bert.model_name = roberta-base
2023-06-19 20:09:20,330 - INFO - allennlp.common.params - model.text_field_embedder.token_embedders.bert.max_length = None
2023-06-19 20:09:20,330 - INFO - allennlp.common.params - model.text_field_embedder.token_embedders.bert.sub_module = None
2023-06-19 20:09:20,330 - INFO - allennlp.common.params - model.text_field_embedder.token_embedders.bert.train_parameters = True
2023-06-19 20:09:20,331 - INFO - allennlp.common.params - model.text_field_embedder.token_embedders.bert.last_layer_only = True
2023-06-19 20:09:20,331 - INFO - allennlp.common.params - model.text_field_embedder.token_embedders.bert.override_weights_file = None
2023-06-19 20:09:20,331 - INFO - allennlp.common.params - model.text_field_embedder.token_embedders.bert.override_weights_strip_prefix = None
2023-06-19 20:09:20,331 - INFO - allennlp.common.params - model.text_field_embedder.token_embedders.bert.gradient_checkpointing = None
2023-06-19 20:09:20,331 - INFO - allennlp.common.params - model.text_field_embedder.token_embedders.bert.tokenizer_kwargs = None
2023-06-19 20:09:20,331 - INFO - allennlp.common.params - model.text_field_embedder.token_embedders.bert.transformer_kwargs = None
2023-06-19 20:09:21,282 - INFO - allennlp.models.archival - removing temporary unarchived model dir at /tmp/tmpwtncs7m0
Traceback (most recent call last):
  File "/home/husainmalwat/anaconda3/envs/GAR/bin/allennlp", line 8, in <module>
    sys.exit(run())
  File "/home/husainmalwat/anaconda3/envs/GAR/lib/python3.8/site-packages/allennlp/__main__.py", line 34, in run
    main(prog="allennlp")
  File "/home/husainmalwat/anaconda3/envs/GAR/lib/python3.8/site-packages/allennlp/commands/__init__.py", line 118, in main
    args.func(args)
  File "/home/husainmalwat/anaconda3/envs/GAR/lib/python3.8/site-packages/allennlp/commands/predict.py", line 205, in _predict
    predictor = _get_predictor(args)
  File "/home/husainmalwat/anaconda3/envs/GAR/lib/python3.8/site-packages/allennlp/commands/predict.py", line 106, in _get_predictor
    archive = load_archive(
  File "/home/husainmalwat/anaconda3/envs/GAR/lib/python3.8/site-packages/allennlp/models/archival.py", line 208, in load_archive
    model = _load_model(config.duplicate(), weights_path, serialization_dir, cuda_device)
  File "/home/husainmalwat/anaconda3/envs/GAR/lib/python3.8/site-packages/allennlp/models/archival.py", line 242, in _load_model
    return Model.load(
  File "/home/husainmalwat/anaconda3/envs/GAR/lib/python3.8/site-packages/allennlp/models/model.py", line 406, in load
    return model_class._load(config, serialization_dir, weights_file, cuda_device)
  File "/home/husainmalwat/anaconda3/envs/GAR/lib/python3.8/site-packages/allennlp/models/model.py", line 304, in _load
    model = Model.from_params(
  File "/home/husainmalwat/anaconda3/envs/GAR/lib/python3.8/site-packages/allennlp/common/from_params.py", line 595, in from_params
    return retyped_subclass.from_params(
  File "/home/husainmalwat/anaconda3/envs/GAR/lib/python3.8/site-packages/allennlp/common/from_params.py", line 627, in from_params
    kwargs = create_kwargs(constructor_to_inspect, cls, params, **extras)
  File "/home/husainmalwat/anaconda3/envs/GAR/lib/python3.8/site-packages/allennlp/common/from_params.py", line 198, in create_kwargs
    constructed_arg = pop_and_construct_arg(
  File "/home/husainmalwat/anaconda3/envs/GAR/lib/python3.8/site-packages/allennlp/common/from_params.py", line 305, in pop_and_construct_arg
    return construct_arg(class_name, name, popped_params, annotation, default, **extras)
  File "/home/husainmalwat/anaconda3/envs/GAR/lib/python3.8/site-packages/allennlp/common/from_params.py", line 339, in construct_arg
    return annotation.from_params(params=popped_params, **subextras)
  File "/home/husainmalwat/anaconda3/envs/GAR/lib/python3.8/site-packages/allennlp/common/from_params.py", line 595, in from_params
    return retyped_subclass.from_params(
  File "/home/husainmalwat/anaconda3/envs/GAR/lib/python3.8/site-packages/allennlp/common/from_params.py", line 627, in from_params
    kwargs = create_kwargs(constructor_to_inspect, cls, params, **extras)
  File "/home/husainmalwat/anaconda3/envs/GAR/lib/python3.8/site-packages/allennlp/common/from_params.py", line 198, in create_kwargs
    constructed_arg = pop_and_construct_arg(
  File "/home/husainmalwat/anaconda3/envs/GAR/lib/python3.8/site-packages/allennlp/common/from_params.py", line 305, in pop_and_construct_arg
    return construct_arg(class_name, name, popped_params, annotation, default, **extras)
  File "/home/husainmalwat/anaconda3/envs/GAR/lib/python3.8/site-packages/allennlp/common/from_params.py", line 380, in construct_arg
    value_dict[key] = construct_arg(
  File "/home/husainmalwat/anaconda3/envs/GAR/lib/python3.8/site-packages/allennlp/common/from_params.py", line 339, in construct_arg
    return annotation.from_params(params=popped_params, **subextras)
  File "/home/husainmalwat/anaconda3/envs/GAR/lib/python3.8/site-packages/allennlp/common/from_params.py", line 595, in from_params
    return retyped_subclass.from_params(
  File "/home/husainmalwat/anaconda3/envs/GAR/lib/python3.8/site-packages/allennlp/common/from_params.py", line 629, in from_params
    return constructor_to_call(**kwargs)  # type: ignore
  File "/home/husainmalwat/anaconda3/envs/GAR/lib/python3.8/site-packages/allennlp/modules/token_embedders/pretrained_transformer_embedder.py", line 74, in __init__
    self.transformer_model = cached_transformers.get(
  File "/home/husainmalwat/anaconda3/envs/GAR/lib/python3.8/site-packages/allennlp/common/cached_transformers.py", line 84, in get
    transformer = AutoModel.from_pretrained(
  File "/home/husainmalwat/anaconda3/envs/GAR/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py", line 471, in from_pretrained
    return model_class.from_pretrained(
  File "/home/husainmalwat/anaconda3/envs/GAR/lib/python3.8/site-packages/transformers/modeling_utils.py", line 2429, in from_pretrained
    state_dict = load_state_dict(resolved_archive_file)
  File "/home/husainmalwat/anaconda3/envs/GAR/lib/python3.8/site-packages/transformers/modeling_utils.py", line 413, in load_state_dict
    return safe_load_file(checkpoint_file)
  File "/home/husainmalwat/anaconda3/envs/GAR/lib/python3.8/site-packages/safetensors/torch.py", line 261, in load_file
    result[k] = f.get_tensor(k)
AttributeError: module 'torch' has no attribute 'frombuffer'
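
A possible cause (an assumption on my part, not confirmed in this thread): torch.frombuffer was introduced in PyTorch 1.10, and the safetensors loader shown at the bottom of the traceback relies on it, so an older torch build cannot read .safetensors checkpoints. One thing to try is upgrading torch in the GAR environment:

pip install "torch>=1.10"

Alternatively, installing the exact versions pinned in requirements.txt (see the Prerequisites section) may avoid pulling in a safetensors-based transformers release in the first place.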

Issue in the code to run the Train script and evaluation script

I am not sure what arguments should be given when running the training script and the evaluation script.
Would you please mention the local path that you are using for each of these arguments?

$ bash train_pipeline.sh <dataset_name> <train_data_path> <dev_data_path> <table_path> <db_dir>

$ bash test_pipeline.sh <dataset_name> <test_file_path> <test_gold_sql_file> <table_path> <db_dir>

Package conflicts

I found the following error:

ERROR: Cannot install allennlp==1.2.0 and sentence-transformers==2.2.2 because these package versions have conflicting dependencies.

The conflict is caused by:
    sentence-transformers 2.2.2 depends on transformers<5.0.0 and >=4.6.0
    allennlp 1.2.0 depends on transformers<3.5 and >=3.1

However, these are the versions pinned in requirements.txt. I wonder what the suitable versions of these packages are?
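
One observation (not a maintainer answer): the Prerequisites section installs with pip's dependency resolver disabled, which may be the intended way to install these mutually conflicting pins side by side:

pip install -r requirements.txt --no-deps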

Can I directly run the evaluation script on my system without training?

Would you explain each term in the command to run the training script as well as the evaluation script?
$ bash train_pipeline.sh <dataset_name> <train_data_path> <dev_data_path> <table_path> <db_dir>

Would you please verify:
Dataset name: spider
train_data_path: /c/Users/Lenovo/GAR/datasets/train_spider.json
dev_data_path: /c/Users/Lenovo/GAR/datasets/dev.json
table_path: /c/Users/Lenovo/GAR/datasets/spider/tables.json
db_dir: /c/Users/Lenovo/GAR/datasets/spider/database
