kaimary / gar
ICDE 2023 Paper, GAR: A Generate-and-Rank Approach for Natural Language to SQL Translation
License: Apache License 2.0
It seems that the value filter stage failed because there is a wrong table name, events, in the SQL statement.
For the SQL SELECT EVENTS.EVENT_DETAILS , PARTICIPANTS_IN_EVENTS.EVENT_ID FROM EVENTS AS T1 JOIN PARTICIPANTS_IN_EVENTS AS T2 ON T1.EVENT_ID = T2.EVENT_ID GROUP BY PARTICIPANTS_IN_EVENTS.EVENT_ID HAVING COUNT ( * ) > 'VALUE', there is an error:
Traceback (most recent call last):
File "/f/.conda/envs/SQL_GAR2/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/f/.conda/envs/SQL_GAR2/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/f/code/SQL/GAR/scripts/value_postprocessing/candidate_filter_top10.py", line 187, in <module>
main()
File "/f/.conda/envs/SQL_GAR2/lib/python3.7/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/f/.conda/envs/SQL_GAR2/lib/python3.7/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/f/.conda/envs/SQL_GAR2/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/f/.conda/envs/SQL_GAR2/lib/python3.7/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/f/code/SQL/GAR/scripts/value_postprocessing/candidate_filter_top10.py", line 158, in main
tmp = candidate_filter(candidates, db_id, db_context, tables_file, database_dir)
File "/f/code/SQL/GAR/scripts/value_postprocessing/candidate_filter_top10.py", line 97, in candidate_filter
sql_dict = rebuild_sql(db_id, dataset_path, sql_nested_query_tmp_name_convert(candidate), kmaps)
File "/f/code/SQL/GAR/utils/evaluation/evaluate.py", line 536, in rebuild_sql
sql = get_sql(schema, sql_str)
File "/f/code/SQL/GAR/utils/evaluation/process_sql.py", line 630, in get_sql
_, sql = parse_sql(toks, 0, tables_with_alias, schema)
File "/f/code/SQL/GAR/utils/evaluation/process_sql.py", line 556, in parse_sql
from_end_idx, table_units, conds, default_tables = parse_from(toks, start_idx, tables_with_alias, schema)
File "/f/code/SQL/GAR/utils/evaluation/process_sql.py", line 433, in parse_from
idx, table_unit, table_name = parse_table_unit(toks, idx, tables_with_alias, schema)
File "/f/code/SQL/GAR/utils/evaluation/process_sql.py", line 281, in parse_table_unit
key = tables_with_alias[toks[idx]]
KeyError: 'events'
RESULT REPORT: Value filter result failed!
Thank you in advance! Maybe there is something that I need to modify in the datasets.
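A quick way to see which referenced table trips the tables_with_alias lookup is to compare the tables mentioned after FROM/JOIN against the schema entry in tables.json. The sketch below assumes the standard Spider-style tables.json layout (db_id, table_names_original) and uses a placeholder db_id; it is only a diagnostic aid, not part of the GAR pipeline:

```python
# Illustrative sketch only: report which tables referenced in a candidate SQL
# are missing from the Spider-style tables.json schema for a given db_id.
import json
import re

def missing_tables(sql: str, tables_file: str, db_id: str):
    with open(tables_file) as f:
        schemas = {db["db_id"]: db for db in json.load(f)}
    known = {t.lower() for t in schemas[db_id]["table_names_original"]}
    # Grab identifiers that follow FROM or JOIN (good enough for flat SELECTs).
    referenced = {m.lower() for m in re.findall(r"\b(?:from|join)\s+([A-Za-z_]\w*)", sql, re.I)}
    return referenced - known

sql = ("SELECT EVENTS.EVENT_DETAILS , PARTICIPANTS_IN_EVENTS.EVENT_ID "
       "FROM EVENTS AS T1 JOIN PARTICIPANTS_IN_EVENTS AS T2 ON T1.EVENT_ID = T2.EVENT_ID "
       "GROUP BY PARTICIPANTS_IN_EVENTS.EVENT_ID HAVING COUNT ( * ) > 'VALUE'")
# "<db_id>" is a placeholder; substitute the database id of the failing example.
print(missing_tables(sql, "datasets/spider/tables.json", "<db_id>"))  # e.g. {'events'} if that table is absent
```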
I am trying to train the model but I am getting "AssertionError: Error col: "value"".
I have attached the entire output log.
I do not understand what arguments should be given when running the training script and the evaluation script.
Would you please mention the local path that you are using for each of these arguments?
$ bash train_pipeline.sh <dataset_name> <train_data_path> <dev_data_path> <table_path> <db_dir>
$ bash test_pipeline.sh <dataset_name> <test_file_path> <test_gold_sql_file> <table_path> <db_dir>
I found the following error:
ERROR: Cannot install allennlp==1.2.0 and sentence-transformers==2.2.2 because these package versions have conflicting dependencies.
The conflict is caused by:
sentence-transformers 2.2.2 depends on transformers<5.0.0 and >=4.6.0
allennlp 1.2.0 depends on transformers<3.5 and >=3.1
However, these are the versions listed in requirements.txt. I wonder what the suitable versions of these packages are?
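For illustration, the two transformers pins quoted above have no common version, which is why pip's resolver gives up. A small check (assuming the packaging library is available in the environment) makes this concrete:

```python
# Illustrative check that the two transformers pins quoted above cannot overlap.
from packaging.specifiers import SpecifierSet

allennlp_pin = SpecifierSet(">=3.1,<3.5")    # transformers pin from allennlp 1.2.0
st_pin = SpecifierSet(">=4.6.0,<5.0.0")      # transformers pin from sentence-transformers 2.2.2

candidates = ["3.1.0", "3.4.0", "4.6.0", "4.30.2"]
print([v for v in candidates if v in allennlp_pin and v in st_pin])  # [] -> no version satisfies both
```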
Thank you for your excellent work. However, when I used Spider to train the model (with the script bash train_pipeline.sh spider datasets/spider/train_spider.json datasets/spider/dev.json datasets/spider/tables.json datasets/spider/database) and sql_str is select count ( * ) from follows group by "value", an AssertionError occurs: Error col: "value".
I wonder what I should do to solve this problem. Should I skip all the SQL strings that cause such an error by using a simple try-except statement?
The traceback is:
4%|█████▊ | 278/7000 [00:00<00:18, 371.50it/s]
Traceback (most recent call last):
File "/f/.conda/envs/SQL_GAR/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/f/.conda/envs/SQL_GAR/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/f/code/SQL/GAR/scripts/retrieval_model_train_script.py", line 271, in <module>
main()
File "/f/.conda/envs/SQL_GAR/lib/python3.7/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/f/.conda/envs/SQL_GAR/lib/python3.7/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/f/.conda/envs/SQL_GAR/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/f/.conda/envs/SQL_GAR/lib/python3.7/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/f/code/SQL/GAR/scripts/retrieval_model_train_script.py", line 39, in main
dataset_name, train_file, tables_file, db_dir, output_dir
File "/f/code/SQL/GAR/scripts/retrieval_model_train_script.py", line 118, in generate_triples_for_retrieval_model
checker = RecallChecker(data_file, tables_file, db_dir, output_dir)
File "/f/code/SQL/GAR/utils/recall_checker_utils.py", line 31, in __init__
self.initialize(dataset_file, tables_file)
File "/f/code/SQL/GAR/utils/recall_checker_utils.py", line 42, in initialize
g_sql = rebuild_sql(db_id, self.db_dir, sql_nested_query_tmp_name_convert(query), self.kmaps, tables_file)
File "/f/code/SQL/GAR/utils/evaluation/evaluate.py", line 537, in rebuild_sql
sql = get_sql(schema, sql_str)
File "/f/code/SQL/GAR/utils/evaluation/process_sql.py", line 630, in get_sql
_, sql = parse_sql(toks, 0, tables_with_alias, schema)
File "/f/code/SQL/GAR/utils/evaluation/process_sql.py", line 566, in parse_sql
idx, group_col_units = parse_group_by(toks, idx, tables_with_alias, schema, default_tables)
File "/f/code/SQL/GAR/utils/evaluation/process_sql.py", line 477, in parse_group_by
idx, col_unit = parse_col_unit(toks, idx, tables_with_alias, schema, default_tables)
File "/f/code/SQL/GAR/utils/evaluation/process_sql.py", line 241, in parse_col_unit
idx, col_id = parse_col(toks, idx, tables_with_alias, schema, default_tables)
File "/f/code/SQL/GAR/utils/evaluation/process_sql.py", line 209, in parse_col
assert False, "Error col: {}".format(tok)
AssertionError: Error col: "value"
RESULT REPORT: Retrieval model fine-tune data failed!
(Because I added print(sql_str) at line 536 of utils/evaluation/evaluate.py, the frame File "/f/code/SQL/GAR/utils/evaluation/evaluate.py", line 537, in rebuild_sql should actually read line 536.)
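Regarding the try-except idea raised above, a minimal sketch of skipping unparsable gold queries could look as follows. The import path and the rebuild_sql signature are copied from the traceback, and the query is assumed to have already gone through sql_nested_query_tmp_name_convert; whether silently dropping such examples is acceptable for training quality is a separate question:

```python
# Sketch only: skip gold queries that the Spider-style parser rejects instead of
# letting the AssertionError abort fine-tune data generation.
from utils.evaluation.evaluate import rebuild_sql  # module path taken from the traceback above

def safe_rebuild_sql(db_id, db_dir, converted_query, kmaps, tables_file):
    """Return the parsed SQL dict, or None when parsing fails."""
    try:
        return rebuild_sql(db_id, db_dir, converted_query, kmaps, tables_file)
    except (AssertionError, KeyError) as exc:
        print(f"Skipping unparsable query for {db_id}: {converted_query!r} ({exc})")
        return None
```

Any caller would then need to drop the examples for which this returns None.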
I suggest adding an assert statement assert self.skeletons, "self.skeletons is []" before this point.
This is because when self.skeletons is [], we make self.combinatorial_rule_base["skeleton"] be [], and _default_choice, which is called from choice(self, node_name, state_machine), will return None.
If the returned value of dfs_random is None, then ret = None and the while True: loop becomes a dead cycle.
This problem occurs for me because the file serialization/gar.spider/qunits.json
contains {'global_syntactic': {'max_projection_num': 0, 'max_where_predicate_num': 0, 'max_having_predicate_num': 0, 'has_output': False, 'has_iue': False, 'has_group': False, 'max_group_num': 0, 'max_where_nested_num': 0, 'max_having_nested_num': 0, 'max_iue_num': 0, 'max_order_num': 0, 'has_where': False}, 'units': [], 'skeleton': []}
for concert_singer
. After deleting this file, the code works.
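Besides the one-line assert self.skeletons, "self.skeletons is []" proposed above, a standalone sanity check over the serialized file could catch such entries before generation starts. The sketch assumes qunits.json maps each db_id to the structure quoted above; that layout is inferred from the excerpt and may differ from the actual file:

```python
# Illustrative, standalone sanity check over serialization/gar.spider/qunits.json.
# Assumes the file maps db_id -> {'global_syntactic': ..., 'units': [...], 'skeleton': [...]}.
import json

def find_empty_skeletons(qunits_path: str):
    with open(qunits_path) as f:
        qunits = json.load(f)
    return [db_id for db_id, entry in qunits.items() if not entry.get("skeleton")]

bad = find_empty_skeletons("serialization/gar.spider/qunits.json")
print("Databases with empty skeletons:", bad)  # e.g. ['concert_singer'] in the case above
```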
While running the test script, I am encountering the following error:
(GAR) [husainmalwat@localhost GAR]$ bash test_pipeline.sh "spider" "datasets/spider/dev.json" "datasets/spider/dev_gold.sql" "datasets/spider/tables.json" "datasets/spider/database"
Dataset name: spider
Test JSON file: datasets/spider/dev.json
Gold SQL TXT file: datasets/spider/dev_gold.sql
Schema file of the dataset: datasets/spider/tables.json
Databases directory of the dataset: datasets/spider/database
==================================================================
ACTION REPORT: Testing pipeline starts ......
Re-ranker test data exists!
==================================================================
ACTION REPORT: Start to test re-ranker model /home/husainmalwat/GAR/saved_models/gar.spider/reranker/bertpooler+roberta-base/model.tar.gz
/home/husainmalwat/anaconda3/envs/GAR/lib/python3.8/site-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.16.5 and <1.23.0 is required for this version of SciPy (detected version 1.24.3
warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
2023-06-19 20:09:10,725 - INFO - allennlp.models.archival - loading archive file /home/husainmalwat/GAR/saved_models/gar.spider/reranker/bertpooler+roberta-base/model.tar.gz
2023-06-19 20:09:10,725 - INFO - allennlp.models.archival - extracting archive file /home/husainmalwat/GAR/saved_models/gar.spider/reranker/bertpooler+roberta-base/model.tar.gz to temp dir /tmp/tmpwtncs7m0
2023-06-19 20:09:14,081 - WARNING - allennlp.common.params - error loading _jsonnet (this is expected on Windows), treating /tmp/tmpwtncs7m0/config.json as plain json
2023-06-19 20:09:14,082 - INFO - allennlp.common.params - dataset_reader.type = listwise_pair_ranker_reader
2023-06-19 20:09:14,083 - INFO - allennlp.common.params - dataset_reader.lazy = False
2023-06-19 20:09:14,083 - INFO - allennlp.common.params - dataset_reader.cache_directory = None
2023-06-19 20:09:14,083 - INFO - allennlp.common.params - dataset_reader.max_instances = 100000
2023-06-19 20:09:14,083 - INFO - allennlp.common.params - dataset_reader.manual_distributed_sharding = False
2023-06-19 20:09:14,083 - INFO - allennlp.common.params - dataset_reader.manual_multi_process_sharding = False
2023-06-19 20:09:14,083 - INFO - allennlp.common.params - dataset_reader.tokenizer.type = pretrained_transformer
2023-06-19 20:09:14,083 - INFO - allennlp.common.params - dataset_reader.tokenizer.model_name = roberta-base
2023-06-19 20:09:14,083 - INFO - allennlp.common.params - dataset_reader.tokenizer.add_special_tokens = False
2023-06-19 20:09:14,083 - INFO - allennlp.common.params - dataset_reader.tokenizer.max_length = None
2023-06-19 20:09:14,083 - INFO - allennlp.common.params - dataset_reader.tokenizer.stride = 0
2023-06-19 20:09:14,084 - INFO - allennlp.common.params - dataset_reader.tokenizer.tokenizer_kwargs = None
2023-06-19 20:09:20,318 - INFO - allennlp.common.params - dataset_reader.token_indexers.bert.type = pretrained_transformer
2023-06-19 20:09:20,318 - INFO - allennlp.common.params - dataset_reader.token_indexers.bert.token_min_padding_length = 0
2023-06-19 20:09:20,318 - INFO - allennlp.common.params - dataset_reader.token_indexers.bert.model_name = roberta-base
2023-06-19 20:09:20,318 - INFO - allennlp.common.params - dataset_reader.token_indexers.bert.namespace = tags
2023-06-19 20:09:20,319 - INFO - allennlp.common.params - dataset_reader.token_indexers.bert.max_length = None
2023-06-19 20:09:20,319 - INFO - allennlp.common.params - dataset_reader.token_indexers.bert.tokenizer_kwargs = None
2023-06-19 20:09:20,320 - INFO - allennlp.common.params - dataset_reader.max_tokens = None
2023-06-19 20:09:20,321 - INFO - allennlp.common.params - validation_dataset_reader.type = listwise_pair_ranker_reader
2023-06-19 20:09:20,321 - INFO - allennlp.common.params - validation_dataset_reader.lazy = False
2023-06-19 20:09:20,321 - INFO - allennlp.common.params - validation_dataset_reader.cache_directory = None
2023-06-19 20:09:20,322 - INFO - allennlp.common.params - validation_dataset_reader.max_instances = None
2023-06-19 20:09:20,322 - INFO - allennlp.common.params - validation_dataset_reader.manual_distributed_sharding = False
2023-06-19 20:09:20,322 - INFO - allennlp.common.params - validation_dataset_reader.manual_multi_process_sharding = False
2023-06-19 20:09:20,322 - INFO - allennlp.common.params - validation_dataset_reader.tokenizer.type = pretrained_transformer
2023-06-19 20:09:20,322 - INFO - allennlp.common.params - validation_dataset_reader.tokenizer.model_name = roberta-base
2023-06-19 20:09:20,323 - INFO - allennlp.common.params - validation_dataset_reader.tokenizer.add_special_tokens = False
2023-06-19 20:09:20,323 - INFO - allennlp.common.params - validation_dataset_reader.tokenizer.max_length = None
2023-06-19 20:09:20,323 - INFO - allennlp.common.params - validation_dataset_reader.tokenizer.stride = 0
2023-06-19 20:09:20,323 - INFO - allennlp.common.params - validation_dataset_reader.tokenizer.tokenizer_kwargs = None
2023-06-19 20:09:20,324 - INFO - allennlp.common.params - validation_dataset_reader.token_indexers.bert.type = pretrained_transformer
2023-06-19 20:09:20,324 - INFO - allennlp.common.params - validation_dataset_reader.token_indexers.bert.token_min_padding_length = 0
2023-06-19 20:09:20,325 - INFO - allennlp.common.params - validation_dataset_reader.token_indexers.bert.model_name = roberta-base
2023-06-19 20:09:20,325 - INFO - allennlp.common.params - validation_dataset_reader.token_indexers.bert.namespace = tags
2023-06-19 20:09:20,325 - INFO - allennlp.common.params - validation_dataset_reader.token_indexers.bert.max_length = None
2023-06-19 20:09:20,325 - INFO - allennlp.common.params - validation_dataset_reader.token_indexers.bert.tokenizer_kwargs = None
2023-06-19 20:09:20,326 - INFO - allennlp.common.params - validation_dataset_reader.max_tokens = None
2023-06-19 20:09:20,326 - INFO - allennlp.common.params - type = from_instances
2023-06-19 20:09:20,326 - INFO - allennlp.data.vocabulary - Loading token dictionary from /tmp/tmpwtncs7m0/vocabulary.
2023-06-19 20:09:20,328 - INFO - allennlp.common.params - model.type = listwise_pair_ranker
2023-06-19 20:09:20,329 - INFO - allennlp.common.params - model.regularizer = None
2023-06-19 20:09:20,329 - INFO - allennlp.common.params - model.text_field_embedder.type = basic
2023-06-19 20:09:20,330 - INFO - allennlp.common.params - model.text_field_embedder.token_embedders.bert.type = pretrained_transformer
2023-06-19 20:09:20,330 - INFO - allennlp.common.params - model.text_field_embedder.token_embedders.bert.model_name = roberta-base
2023-06-19 20:09:20,330 - INFO - allennlp.common.params - model.text_field_embedder.token_embedders.bert.max_length = None
2023-06-19 20:09:20,330 - INFO - allennlp.common.params - model.text_field_embedder.token_embedders.bert.sub_module = None
2023-06-19 20:09:20,330 - INFO - allennlp.common.params - model.text_field_embedder.token_embedders.bert.train_parameters = True
2023-06-19 20:09:20,331 - INFO - allennlp.common.params - model.text_field_embedder.token_embedders.bert.last_layer_only = True
2023-06-19 20:09:20,331 - INFO - allennlp.common.params - model.text_field_embedder.token_embedders.bert.override_weights_file = None
2023-06-19 20:09:20,331 - INFO - allennlp.common.params - model.text_field_embedder.token_embedders.bert.override_weights_strip_prefix = None
2023-06-19 20:09:20,331 - INFO - allennlp.common.params - model.text_field_embedder.token_embedders.bert.gradient_checkpointing = None
2023-06-19 20:09:20,331 - INFO - allennlp.common.params - model.text_field_embedder.token_embedders.bert.tokenizer_kwargs = None
2023-06-19 20:09:20,331 - INFO - allennlp.common.params - model.text_field_embedder.token_embedders.bert.transformer_kwargs = None
2023-06-19 20:09:21,282 - INFO - allennlp.models.archival - removing temporary unarchived model dir at /tmp/tmpwtncs7m0
Traceback (most recent call last):
File "/home/husainmalwat/anaconda3/envs/GAR/bin/allennlp", line 8, in <module>
sys.exit(run())
File "/home/husainmalwat/anaconda3/envs/GAR/lib/python3.8/site-packages/allennlp/__main__.py", line 34, in run
main(prog="allennlp")
File "/home/husainmalwat/anaconda3/envs/GAR/lib/python3.8/site-packages/allennlp/commands/__init__.py", line 118, in main
args.func(args)
File "/home/husainmalwat/anaconda3/envs/GAR/lib/python3.8/site-packages/allennlp/commands/predict.py", line 205, in _predict
predictor = _get_predictor(args)
File "/home/husainmalwat/anaconda3/envs/GAR/lib/python3.8/site-packages/allennlp/commands/predict.py", line 106, in _get_predictor
archive = load_archive(
File "/home/husainmalwat/anaconda3/envs/GAR/lib/python3.8/site-packages/allennlp/models/archival.py", line 208, in load_archive
model = _load_model(config.duplicate(), weights_path, serialization_dir, cuda_device)
File "/home/husainmalwat/anaconda3/envs/GAR/lib/python3.8/site-packages/allennlp/models/archival.py", line 242, in _load_model
return Model.load(
File "/home/husainmalwat/anaconda3/envs/GAR/lib/python3.8/site-packages/allennlp/models/model.py", line 406, in load
return model_class._load(config, serialization_dir, weights_file, cuda_device)
File "/home/husainmalwat/anaconda3/envs/GAR/lib/python3.8/site-packages/allennlp/models/model.py", line 304, in _load
model = Model.from_params(
File "/home/husainmalwat/anaconda3/envs/GAR/lib/python3.8/site-packages/allennlp/common/from_params.py", line 595, in from_params
return retyped_subclass.from_params(
File "/home/husainmalwat/anaconda3/envs/GAR/lib/python3.8/site-packages/allennlp/common/from_params.py", line 627, in from_params
kwargs = create_kwargs(constructor_to_inspect, cls, params, **extras)
File "/home/husainmalwat/anaconda3/envs/GAR/lib/python3.8/site-packages/allennlp/common/from_params.py", line 198, in create_kwargs
constructed_arg = pop_and_construct_arg(
File "/home/husainmalwat/anaconda3/envs/GAR/lib/python3.8/site-packages/allennlp/common/from_params.py", line 305, in pop_and_construct_arg
return construct_arg(class_name, name, popped_params, annotation, default, **extras)
File "/home/husainmalwat/anaconda3/envs/GAR/lib/python3.8/site-packages/allennlp/common/from_params.py", line 339, in construct_arg
return annotation.from_params(params=popped_params, **subextras)
File "/home/husainmalwat/anaconda3/envs/GAR/lib/python3.8/site-packages/allennlp/common/from_params.py", line 595, in from_params
return retyped_subclass.from_params(
File "/home/husainmalwat/anaconda3/envs/GAR/lib/python3.8/site-packages/allennlp/common/from_params.py", line 627, in from_params
kwargs = create_kwargs(constructor_to_inspect, cls, params, **extras)
File "/home/husainmalwat/anaconda3/envs/GAR/lib/python3.8/site-packages/allennlp/common/from_params.py", line 198, in create_kwargs
constructed_arg = pop_and_construct_arg(
File "/home/husainmalwat/anaconda3/envs/GAR/lib/python3.8/site-packages/allennlp/common/from_params.py", line 305, in pop_and_construct_arg
return construct_arg(class_name, name, popped_params, annotation, default, **extras)
File "/home/husainmalwat/anaconda3/envs/GAR/lib/python3.8/site-packages/allennlp/common/from_params.py", line 380, in construct_arg
value_dict[key] = construct_arg(
File "/home/husainmalwat/anaconda3/envs/GAR/lib/python3.8/site-packages/allennlp/common/from_params.py", line 339, in construct_arg
return annotation.from_params(params=popped_params, **subextras)
File "/home/husainmalwat/anaconda3/envs/GAR/lib/python3.8/site-packages/allennlp/common/from_params.py", line 595, in from_params
return retyped_subclass.from_params(
File "/home/husainmalwat/anaconda3/envs/GAR/lib/python3.8/site-packages/allennlp/common/from_params.py", line 629, in from_params
return constructor_to_call(**kwargs) # type: ignore
File "/home/husainmalwat/anaconda3/envs/GAR/lib/python3.8/site-packages/allennlp/modules/token_embedders/pretrained_transformer_embedder.py", line 74, in __init__
self.transformer_model = cached_transformers.get(
File "/home/husainmalwat/anaconda3/envs/GAR/lib/python3.8/site-packages/allennlp/common/cached_transformers.py", line 84, in get
transformer = AutoModel.from_pretrained(
File "/home/husainmalwat/anaconda3/envs/GAR/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py", line 471, in from_pretrained
return model_class.from_pretrained(
File "/home/husainmalwat/anaconda3/envs/GAR/lib/python3.8/site-packages/transformers/modeling_utils.py", line 2429, in from_pretrained
state_dict = load_state_dict(resolved_archive_file)
File "/home/husainmalwat/anaconda3/envs/GAR/lib/python3.8/site-packages/transformers/modeling_utils.py", line 413, in load_state_dict
return safe_load_file(checkpoint_file)
File "/home/husainmalwat/anaconda3/envs/GAR/lib/python3.8/site-packages/safetensors/torch.py", line 261, in load_file
result[k] = f.get_tensor(k)
AttributeError: module 'torch' has no attribute 'frombuffer'
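torch.frombuffer only exists in PyTorch 1.10 and later, and the safetensors loader in this traceback calls it, so the error usually points to an older torch build in the environment. A quick diagnostic (a sketch, not a fix):

```python
# Quick diagnostic: the safetensors code path in the traceback above calls torch.frombuffer,
# which was added in PyTorch 1.10, so an older torch build raises this AttributeError.
import torch

print(torch.__version__)
print(hasattr(torch, "frombuffer"))  # False -> the installed torch predates 1.10
```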
Would you explain each argument needed to run the training script as well as the evaluation script:
$ bash train_pipeline.sh <dataset_name> <train_data_path> <dev_data_path> <table_path> <db_dir>
Would you please verify:
Dataset name: spider
train_data_path: /c/Users/Lenovo/GAR/datasets/train_spider.json
dev_data_path: /c/Users/Lenovo/GAR/datasets/dev.json
table_path: /c/Users/Lenovo/GAR/datasets/spider/tables.json
db_dir: /c/Users/Lenovo/GAR/datasets/spider/database
In http://47.116.100.156:30279/#/chat, when I click "Choose", there is no database in the list.
There is also an error: 未查找到可用表 ("no available tables found").