
resdsql's People

Contributors: lihaoyang-ruc, wikty


resdsql's Issues

How can the XLM-ROBERTA-LARGE classification model run on multiple GPUs?

I tried running it on multiple GPUs:

model = nn.DataParallel(model, device_ids=devices)
model.to(device)

but it raises an error:
Traceback (most recent call last):
File "schema_item_classifier_gpus.py", line 470, in
_train(opt)
File "schema_item_classifier_gpus.py", line 287, in _train
loss = encoder_loss_func.compute_loss(
File "/workspace/RESDSQL/utils/classifier_loss.py", line 60, in compute_loss
table_loss = self.compute_batch_loss(batch_table_name_cls_logits, batch_table_labels, batch_size)
File "/workspace/RESDSQL/utils/classifier_loss.py", line 47, in compute_batch_loss
loss += self.focal_loss(logits, labels)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/workspace/RESDSQL/utils/classifier_loss.py", line 16, in forward
assert input_tensor.shape[0] == target_tensor.shape[0]
Is it the structural design of this classification model that makes multi-GPU training impossible?
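
Not an official answer, but the failing assertion suggests that the per-replica logits and the unsplit labels end up with different batch sizes under nn.DataParallel, which scatters only the tensors passed into forward(). A common workaround is to move the loss computation inside forward() so each replica pairs its own logits with its own labels; the wrapper below is an illustrative sketch, not the repo's code:

import torch.nn as nn

class ClassifierWithLoss(nn.Module):
    """Compute the loss inside forward() so DataParallel scatters
    inputs and labels together onto each GPU."""
    def __init__(self, classifier, loss_fn):
        super().__init__()
        self.classifier = classifier
        self.loss_fn = loss_fn

    def forward(self, inputs, labels):
        logits = self.classifier(inputs)
        # both tensors carry the same per-replica batch size here
        return self.loss_fn(logits, labels)

# model = nn.DataParallel(ClassifierWithLoss(classifier, loss_fn), device_ids=devices)
# loss = model(batch_inputs, batch_labels).mean()  # average the per-GPU losses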

Dataset used for finetuning mt5 model

Hi,
First of all, thank you for your great work on this project. You've achieved some of the best results on the Spider benchmark, and your clear and complete readme allowed me to run your code very easily.

I want to see whether I can fine-tune a text2natsql model on mT5, as you did for CSpider. Since I want to build a dataset like CSpider but in Persian, I was wondering how much data I would need to create.

Was CSpider the only dataset used for fine-tuning the mT5 backbone, or were other datasets used as well?

NatSQL-Parser

Hey,
I'm very interested in your work. I want to train RESDSQL+NatSQL on my own dataset. I had no problems training plain RESDSQL, but I don't know how to create the NatSQL JSON file. Is there a script available for parsing SQL queries into NatSQL?
Thanks in advance!

Obtaining query_toks_no_value from query

I'm attempting to train on another dataset by appending its train.json, dev.json, and tables.json to Spider's (and adding its databases to Spider's as well). I'm stumped on how to generate query_toks_no_value from a query. Is there a script for this, or do you have any advice on how to write one?
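
Not the official Spider tooling, but a rough sketch of the transformation: lower-case the tokens and replace every string/number literal with the placeholder token "value":

import re

def query_toks_no_value(query: str):
    """Hedged approximation of Spider's query_toks_no_value field."""
    # mask quoted string literals first so their contents don't get split
    query = re.sub(r"'[^']*'|\"[^\"]*\"", " value ", query)
    # pad punctuation so it becomes separate tokens
    query = re.sub(r"([(),;])", r" \1 ", query)
    out = []
    for tok in query.split():
        # numeric literals become "value"; everything else is lower-cased
        if re.fullmatch(r"\d+(\.\d+)?", tok):
            out.append("value")
        else:
            out.append(tok.lower())
    return out

print(query_toks_no_value("SELECT name FROM singer WHERE age > 50"))
# ['select', 'name', 'from', 'singer', 'where', 'age', '>', 'value']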

Low accuracy in predicting SQL using RESDSQL on my dataset

Hello everyone,

I hope you're doing well. I encountered an issue while using RESDSQL to predict SQL on my dataset: despite following all the recommended steps, I'm only seeing 30-40% accuracy. I would greatly appreciate any suggestions or insights on improving the predictions' accuracy.

Thank you in advance for your assistance!

The SQL skeleton is a too easy objective

I trained a seq-to-seq model on Spider for 10 epochs. With plain SQL as the target, accuracy is around 60%+, but after switching to skeleton+SQL the performance is very poor.

After a manual check, I found that the model's output contains only the skeleton, with no SQL at all. Have you encountered this problem?
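
For reference, the skeleton-first objective concatenates the two parts into a single target sequence, roughly like the sketch below (the exact delimiter token is an assumption, not necessarily what the repo uses). If the model emits only skeletons, one thing worth checking is whether the targets are being truncated at the delimiter by a too-small maximum output length:

# Illustrative skeleton+SQL target; the real delimiter may differ.
skeleton = "select _ from _ where _ > _"
sql = "select name from singer where age > 50"
target = skeleton + " | " + sql
print(target)  # select _ from _ where _ > _ | select name from singer where age > 50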

Training CSpider without NatSQL: step two fails with RuntimeError: input must have 3 dimensions, got 2

The error message is below. I downloaded xlm-roberta-large from https://huggingface.co/xlm-roberta-large/tree/main and placed it under the base_models directory.
Namespace(add_fk_info=False, alpha=0.75, batch_size=4, dev_filepath='./data/preprocessed_data/preprocessed_dev_cspider_natsql.json', device='0', epochs=128, gamma=2.0, gradient_descent_step=2, learning_rate=1e-05, mode='train', model_name_or_path='./base_models/xlm-roberta-large', output_filepath='data/pre-processing/dataset_with_pred_probs.json', patience=4, save_path='./models/xlm_roberta_text2natsql_schema_item_classifier', seed=42, tensorboard_save_path='./tensorboard_log/xlm_roberta_text2natsql_schema_item_classifier', train_filepath='./data/preprocessed_data/preprocessed_train_cspider_natsql.json', use_contents=True)
Some weights of the model checkpoint at ./base_models/xlm-roberta-large were not used when initializing XLMRobertaModel: ['lm_head.layer_norm.bias', 'lm_head.decoder.weight', 'lm_head.dense.weight', 'lm_head.bias', 'lm_head.layer_norm.weight', 'lm_head.dense.bias']

  • This IS expected if you are initializing XLMRobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
  • This IS NOT expected if you are initializing XLMRobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
This is epoch 1.

Traceback (most recent call last):
File "schema_item_classifier.py", line 463, in <module>
_train(opt)
File "schema_item_classifier.py", line 277, in _train
batch_column_number_in_each_table
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/workspace/RESDSQL/utils/classifier_model.py", line 191, in forward
batch_column_number_in_each_table
File "/workspace/RESDSQL/utils/classifier_model.py", line 134, in table_column_cls
output_t, (hidden_state_t, cell_state_t) = self.table_name_bilstm(table_name_embeddings)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 677, in forward
self.check_forward_args(input, hx, batch_sizes)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 620, in check_forward_args
self.check_input(input, batch_sizes)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 203, in check_input
expected_input_dim, input.dim()))
RuntimeError: input must have 3 dimensions, got 2

How are the values in a SQL query determined?

Take Spider and CSpider as examples: a generated SQL statement contains the SQL syntax (skeleton), schema items, and values. For instance, in "count the male population older than 50", 50 is a value.
My question is: how are values determined? Specifically, is picking out which tokens of the natural-language question are values (e.g., recognizing 50 as a value in "the male population older than 50") an ability the model acquires through training, or do Spider/CSpider annotate the values in advance? If it is learned through training, could you briefly outline the approach?

The CSpider training bash scripts seem to be wrong and incomplete

./scripts/train/cspider_text2natsql/generate_text2natsql_dataset.sh has the following two problems (the same applies to cspider_text2sql); a sketch of the corrected order follows the list:

  1. On line 4, the input_dataset_path of text2sql_data_generator.py should be train_cspider_with_probs_natsql.json, the file that carries the table/column probabilities;
  2. The script never runs schema_item_classifier.py over the training data; preprocessed_train_cspider_natsql.json, referenced on line 4, should be the input to that model instead.
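
A hedged sketch of the corrected order described above. The file paths come from the report, and the flags appear in the repo's argument lists, but the exact --mode value should be checked against the repo's other scripts:

# 1. run the schema item classifier over the preprocessed training data
#    to produce the file annotated with table/column probabilities
python schema_item_classifier.py \
    --mode eval \
    --dev_filepath ./data/preprocessed_data/preprocessed_train_cspider_natsql.json \
    --output_filepath ./data/preprocessed_data/train_cspider_with_probs_natsql.json
# 2. only then feed the probability-annotated file to the generator
python text2sql_data_generator.py \
    --input_dataset_path ./data/preprocessed_data/train_cspider_with_probs_natsql.json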

Timestamp Functionality

Hey, I would like to know what to do to train the model on data so that it supports timestamp functionality, as in the Druid SQL interface for example. Is there a way to add timestamp support as well?

Requirements

When I run the requirements file, I get this error:
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for jarowinkler
Failed to build rapidfuzz tokenizers jarowinkler
ERROR: Could not build wheels for rapidfuzz, tokenizers, jarowinkler, which is required to install pyproject.toml-based projects

Can you please tell me how to install the packages on the requirements list?
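
A hedged suggestion, not from the repo: wheel-build failures for rapidfuzz, tokenizers, and jarowinkler are often resolved by upgrading the build toolchain before re-running the install (tokenizers may additionally need a Rust compiler on the machine):

pip install --upgrade pip setuptools wheel
pip install -r requirements.txt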

How is NatSQL converted? How is the skeleton extracted?

I noticed that in another issue the author mentioned that the SQL-to-NatSQL conversion code is not open-sourced. Does that mean you used already-converted datasets directly? (Is there a ready-made one for CSpider as well?)
Also, in the skeleton-aware decoder the first half of the output is the SQL skeleton and the second half is the SQL. For NatSQL, is it correct to understand that the first half is the NatSQL skeleton and the second half is the NatSQL? If so, how is the NatSQL skeleton produced?
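
A rough sketch of how such a skeleton can be extracted, assuming the NatSQL variant mirrors the SQL one: keep keywords and operators, collapse everything else into "_". The keyword list here is illustrative and smaller than whatever the repo actually uses:

KEYWORDS = {
    "select", "from", "where", "group", "by", "order", "limit", "having",
    "and", "or", "not", "in", "like", "between", "join", "on", "distinct",
    "count", "avg", "min", "max", "sum", ">", "<", ">=", "<=", "=", "!=",
}

def extract_skeleton(sql: str) -> str:
    out = []
    for tok in sql.lower().split():
        tok = tok if tok in KEYWORDS else "_"
        if tok == "_" and out and out[-1] == "_":
            continue  # merge consecutive placeholders into one
        out.append(tok)
    return " ".join(out)

print(extract_skeleton("select name from singer where age > 50"))
# select _ from _ where _ > _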

ValueError: Only one class present in y_true. ROC AUC score is not defined in that case.

Why does the label list only ever take the first few entries (in the prepare_batch_inputs_and_labels method)? For example, my tables.json defines ten tables in total; the first four get label 0 and the remaining six get label 1, but after the loop only the first four labels (all 0) are appended each time, which triggers this error. Reordering the tables in tables.json made training work once, but the same error came back when evaluating schema_item_classifier. Is this a class-imbalance problem? Or should tables.json contain only the tables actually used by the SQL queries in train.json, leaving unused tables out? Has anyone run into this? I'm using my own dataset, and the training set already follows the required structure.
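
One defensive workaround (not from the repo, and it only masks the symptom) is to skip the AUC computation whenever the labels contain a single class:

from sklearn.metrics import roc_auc_score

def safe_auc(y_true, y_score):
    # roc_auc_score is undefined when y_true holds only one class
    if len(set(y_true)) < 2:
        return None
    return roc_auc_score(y_true, y_score)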

The repository does not have a license

Thank you so much for your wonderful work.

Currently, the repository does not have a license. According to the GitHub documentation:

You're under no obligation to choose a license. However, without a license, the default copyright laws apply, meaning that you retain all rights to your source code and no one may reproduce, distribute, or create derivative works from your work. If you're creating an open source project, we strongly encourage you to include an open source license.

Do you think you could add an open source license to the repository, so that other people are legally allowed to reproduce, distribute, or create derivative works from it?

More discussion on this matter

Inference script

Hi!

I would like to try your repo with my own queries. I can see that there are inference scripts for the datasets you support, but I can't see a script that accepts queries from users. Is there something like this, or should I write it myself?

Thanks

How I can do inference on the model only with a question?

Hi, I want to know whether it is possible to use the model to get the SQL statement directly after giving it a question in natural language. For example, when I use the attached dev.json (a modified version of the Spider dataset), I get no result in pred.sql.
Thank you for your help.
(screenshot attached)
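
A minimal sketch of wrapping a single question in a dev-style record, so the existing inference scripts can be reused. Field values are placeholders, and the gold "query" is a dummy since only the prediction matters here:

import json

record = {
    "db_id": "concert_singer",               # must name a database on disk
    "question": "How many singers do we have?",
    "query": "SELECT count(*) FROM singer",  # dummy gold SQL
}
with open("data/my_question.json", "w") as f:
    json.dump([record], f, indent=2)

Then point the inference script's dev file path at data/my_question.json.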

What post-processing is done during decoding?

I would like to know what post-processing is applied during decoding. Are there concrete steps?

pred_natsql = fix_fatal_errors_in_natsql(pred_natsql, batch_tc_original[batch_id])
if old_pred_natsql != pred_natsql:
    print("Before fix:", old_pred_natsql)
    print("After fix:", pred_natsql)
    print("---------------")
pred_sql = natsql_to_sql(pred_natsql, db_id, db_file_path, table_dict[db_id]).strip()

I ask because decoding directly, without the post-processing, performs very poorly.

Chinese language support

The models and datasets in the example code are all English, and my own tests confirm that Chinese support is still poor. If I want to port this to a Chinese setting, I assume I need to swap the RoBERTa model, the T5 model, and the training data for Chinese versions. I found a few candidate models and datasets online. Has the team attempted anything similar, and did you run into any difficulties or obstacles?

Here are the Chinese models and datasets I found:
https://github.com/brightmart/roberta_zh
https://github.com/SunnyGJing/t5-pegasus-chinese
https://taolusi.github.io/CSpider-explorer/

Checkpoint download failure

Hello, how do I download the T5 checkpoints? The two Google Drive links in the table download as two folders:
text2sql_schema_item_classifier
text2natsql_schema_item_classifier

I don't see directories such as text2natsql-t5-3b or text2natsql-t5-base. Where do these come from? Did I download the wrong files?
(screenshot attached)

How can we optimize the model inference time? A single NLQ takes more than 45 seconds.

Hello everyone,

I hope you're doing well. I encountered an issue while using fine-tuned RESDSQL on my (Spider-like) dataset to predict SQL: inference takes around a minute per query. Profiling the steps shows that schema_item_classifier.py and text2sql.py take the majority of the time. I would greatly appreciate any suggestions or insights on optimizing/minimizing the prediction time.
Thank you in advance for your assistance!
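
Not advice from the maintainers, but the usual levers for cutting T5 generation latency are half precision, inference mode, and a smaller beam width. A sketch with the generic t5-base checkpoint standing in for the fine-tuned one:

import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tok = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base").half().eval().cuda()

inputs = tok("how many singers do we have?", return_tensors="pt").to("cuda")
with torch.inference_mode():                 # disables autograd bookkeeping
    out = model.generate(
        **inputs,
        num_beams=4,      # smaller beams trade a little accuracy for speed
        max_length=256,
    )
print(tok.decode(out[0], skip_special_tokens=True))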

Definition of column_number_in_each_table in schema_item_classifier.py

Hello, while reading your code I noticed that lines 156-162 of schema_item_classifier.py update batch_column_number_in_each_table using information from table_labels and column_labels, and that later code passes batch_column_number_in_each_table into the model as an input for inference. How should this parameter be defined when no labels are available?
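
For what it's worth, the quantity itself does not depend on labels: it is just the number of columns in each table, which can be recovered from a Spider-format tables.json. A sketch:

import json

def column_number_in_each_table(tables_json_path, db_id):
    schema = {t["db_id"]: t for t in json.load(open(tables_json_path))}[db_id]
    counts = [0] * len(schema["table_names_original"])
    for table_idx, _name in schema["column_names_original"]:
        if table_idx >= 0:  # index -1 is the special "*" column
            counts[table_idx] += 1
    return counts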

Inference Run Killed

When I run inference, the process is killed, but I don't know why.
(screenshot attached)
What stops the process, and what can I do to fix it?
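
A silent "Killed" is typically the Linux out-of-memory killer (e.g., the 3B checkpoint not fitting in RAM); the kernel log can confirm it:

dmesg | grep -iE "killed process|out of memory"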

Can't find the file nltk_downloader.py

Thanks for your nice work! I can't find the file nltk_downloader.py that you mention in readme.md. Could you please provide it? Thank you.
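
In case it helps, a hedged stand-in for the missing helper; the exact corpus list the preprocessing needs is a guess:

import nltk

# download tokenizer and stopword data commonly needed for preprocessing
nltk.download("punkt")
nltk.download("stopwords")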

Comparison with GPT-4

Judging by the benchmark metrics alone, your approach even outperforms GPT-4 (which has not been specifically tuned for SQL), with far fewer parameters. I'd like to learn more about how its performance and accuracy compare with GPT-4.

Error in Running Inference script

 raise HFValidationError(
huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': './models/text2sql-t5-base/checkpoint-39312'. Use `repo_type` argument if needed.

I am also attaching the entire output log:

RESDSQL.txt
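
This HFValidationError usually means transformers could not find the path locally and fell back to treating it as a Hub repo id. A quick sanity check, with the path copied from the error:

import os

path = "./models/text2sql-t5-base/checkpoint-39312"
# if this prints False, the checkpoint was not extracted to the location
# the script expects relative to the current working directory
print(os.path.isdir(path))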

No matching distribution found for spacy==2.2.3

I set up the environment as follows:

conda create -n your_env_name python=3.8.5
conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3 -c pytorch
pip install -r requirements.txt

Installing requirements.txt prints this error message:

Collecting spacy==2.2.3
  Using cached spacy-2.2.3.tar.gz (5.9 MB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  ERROR: Command errored out with exit status 1:
   command: /home/studio-lab-user/.conda/envs/studiolab/bin/python3.9 /home/studio-lab-user/.conda/envs/studiolab/lib/python3.9/site-packages/pip/_vendor/pep517/in_process/_in_process.py get_requires_for_build_wheel /tmp/tmpnbsyo1tb
       cwd: /tmp/pip-install-5hq5ozgn/spacy_c569f2d7ab7a48d689e1bd1e3adaedf5
  Complete output (49 lines):
  Traceback (most recent call last):
    File "/tmp/pip-build-env-oijh_314/overlay/lib/python3.9/site-packages/setuptools/_vendor/packaging/requirements.py", line 35, in __init__
      parsed = _parse_requirement(requirement_string)
    File "/tmp/pip-build-env-oijh_314/overlay/lib/python3.9/site-packages/setuptools/_vendor/packaging/_parser.py", line 64, in parse_requirement
      return _parse_requirement(Tokenizer(source, rules=DEFAULT_RULES))
    File "/tmp/pip-build-env-oijh_314/overlay/lib/python3.9/site-packages/setuptools/_vendor/packaging/_parser.py", line 82, in _parse_requirement
      url, specifier, marker = _parse_requirement_details(tokenizer)
    File "/tmp/pip-build-env-oijh_314/overlay/lib/python3.9/site-packages/setuptools/_vendor/packaging/_parser.py", line 126, in _parse_requirement_details
      marker = _parse_requirement_marker(
    File "/tmp/pip-build-env-oijh_314/overlay/lib/python3.9/site-packages/setuptools/_vendor/packaging/_parser.py", line 147, in _parse_requirement_marker
      tokenizer.raise_syntax_error(
    File "/tmp/pip-build-env-oijh_314/overlay/lib/python3.9/site-packages/setuptools/_vendor/packaging/_tokenizer.py", line 165, in raise_syntax_error
      raise ParserSyntaxError(
  setuptools.extern.packaging._tokenizer.ParserSyntaxError: Expected end or semicolon (after version specifier)
      spacy_lookups_data>=0.0.5<0.2.0
                        ~~~~~~~^
  
  The above exception was the direct cause of the following exception:
  
  Traceback (most recent call last):
    File "/home/studio-lab-user/.conda/envs/studiolab/lib/python3.9/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 349, in <module>
      main()
    File "/home/studio-lab-user/.conda/envs/studiolab/lib/python3.9/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 331, in main
      json_out['return_val'] = hook(**hook_input['kwargs'])
    File "/home/studio-lab-user/.conda/envs/studiolab/lib/python3.9/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 117, in get_requires_for_build_wheel
      return hook(config_settings)
    File "/tmp/pip-build-env-oijh_314/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 341, in get_requires_for_build_wheel
      return self._get_build_requires(config_settings, requirements=['wheel'])
    File "/tmp/pip-build-env-oijh_314/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 323, in _get_build_requires
      self.run_setup()
    File "/tmp/pip-build-env-oijh_314/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 338, in run_setup
      exec(code, locals())
    File "<string>", line 200, in <module>
    File "<string>", line 190, in setup_package
    File "/tmp/pip-build-env-oijh_314/overlay/lib/python3.9/site-packages/setuptools/__init__.py", line 106, in setup
      _install_setup_requires(attrs)
    File "/tmp/pip-build-env-oijh_314/overlay/lib/python3.9/site-packages/setuptools/__init__.py", line 77, in _install_setup_requires
      dist.parse_config_files(ignore_option_errors=True)
    File "/tmp/pip-build-env-oijh_314/overlay/lib/python3.9/site-packages/setuptools/dist.py", line 900, in parse_config_files
      self._finalize_requires()
    File "/tmp/pip-build-env-oijh_314/overlay/lib/python3.9/site-packages/setuptools/dist.py", line 596, in _finalize_requires
      self._convert_extras_requirements()
    File "/tmp/pip-build-env-oijh_314/overlay/lib/python3.9/site-packages/setuptools/dist.py", line 611, in _convert_extras_requirements
      for r in _reqs.parse(v):
    File "/tmp/pip-build-env-oijh_314/overlay/lib/python3.9/site-packages/setuptools/_vendor/packaging/requirements.py", line 37, in __init__
      raise InvalidRequirement(str(e)) from e
  setuptools.extern.packaging.requirements.InvalidRequirement: Expected end or semicolon (after version specifier)
      spacy_lookups_data>=0.0.5<0.2.0
                        ~~~~~~~^
  ----------------------------------------
WARNING: Discarding https://files.pythonhosted.org/packages/b7/f2/052bfe5861761599b5421916aba3eb0064d83145ff3072390ecdc5a836de/spacy-2.2.3.tar.gz#sha256=1d14c9e7d65b2cecd56c566d9ffac8adbcb9ce2cff2274cbfdcf5468cd940e6a (from https://pypi.org/simple/spacy/) (requires-python:!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*,>=2.7). Command errored out with exit status 1: /home/studio-lab-user/.conda/envs/studiolab/bin/python3.9 /home/studio-lab-user/.conda/envs/studiolab/lib/python3.9/site-packages/pip/_vendor/pep517/in_process/_in_process.py get_requires_for_build_wheel /tmp/tmpnbsyo1tb Check the logs for full command output.
ERROR: Could not find a version that satisfies the requirement spacy==2.2.3 (from versions: 0.31, 0.32, 0.33, 0.40, 0.51, 0.52, 0.60, 0.61, 0.62, 0.63, 0.64, 0.65, 0.67, 0.68, 0.70, 0.80, 0.81, 0.82, 0.83, 0.84, 0.85, 0.86, 0.87, 0.88, 0.89, 0.90, 0.91, 0.92, 0.93, 0.94, 0.95, 0.97, 0.98, 0.99, 0.100.0, 0.100.1, 0.100.2, 0.100.3, 0.100.4, 0.100.5, 0.100.6, 0.100.7, 0.101.0, 1.0.1, 1.0.2, 1.0.3, 1.0.4, 1.0.5, 1.1.0, 1.1.1, 1.1.2, 1.2.0, 1.3.0, 1.4.0, 1.5.0, 1.5.1, 1.6.0, 1.7.0, 1.7.1, 1.7.2, 1.7.3, 1.7.5, 1.8.0, 1.8.1, 1.8.2, 1.9.0, 1.10.0, 1.10.1, 2.0.0, 2.0.1.dev0, 2.0.1, 2.0.2.dev0, 2.0.2, 2.0.3.dev0, 2.0.3, 2.0.4.dev0, 2.0.4, 2.0.5.dev0, 2.0.5, 2.0.6.dev0, 2.0.6, 2.0.7, 2.0.8, 2.0.9, 2.0.10.dev0, 2.0.10, 2.0.11.dev0, 2.0.11, 2.0.12.dev0, 2.0.12.dev1, 2.0.12, 2.0.13.dev0, 2.0.13.dev1, 2.0.13.dev2, 2.0.13.dev4, 2.0.13, 2.0.14.dev0, 2.0.14.dev1, 2.0.15, 2.0.16.dev0, 2.0.16, 2.0.17.dev0, 2.0.17.dev1, 2.0.17, 2.0.18.dev0, 2.0.18.dev1, 2.0.18, 2.1.0, 2.1.1.dev0, 2.1.1, 2.1.2, 2.1.3, 2.1.4, 2.1.5, 2.1.6, 2.1.7.dev0, 2.1.7, 2.1.8, 2.1.9, 2.2.0.dev10, 2.2.0.dev11, 2.2.0.dev13, 2.2.0.dev15, 2.2.0.dev17, 2.2.0.dev18, 2.2.0.dev19, 2.2.0, 2.2.1, 2.2.2.dev0, 2.2.2.dev4, 2.2.2, 2.2.3.dev0, 2.2.3, 2.2.4, 2.3.0.dev1, 2.3.0, 2.3.1, 2.3.2, 2.3.3.dev0, 2.3.3, 2.3.4, 2.3.5, 2.3.6, 2.3.7, 2.3.8, 2.3.9, 3.0.0, 3.0.1.dev0, 3.0.1, 3.0.2, 3.0.3, 3.0.4, 3.0.5, 3.0.6, 3.0.7, 3.0.8, 3.0.9, 3.1.0, 3.1.1, 3.1.2, 3.1.3, 3.1.4, 3.1.5, 3.1.6, 3.1.7, 3.2.0, 3.2.1, 3.2.2, 3.2.3, 3.2.4, 3.2.5, 3.2.6, 3.3.0.dev0, 3.3.0, 3.3.1, 3.3.2, 3.3.3, 3.4.0, 3.4.1, 3.4.2, 3.4.3, 3.4.4, 3.5.0, 3.5.1, 3.5.2, 3.5.3, 3.5.4, 3.6.0.dev0, 3.6.0.dev1, 3.6.0, 3.7.0.dev0, 4.0.0.dev0, 4.0.0.dev1)
ERROR: No matching distribution found for spacy==2.2.3

I first tried installing on Windows 10, then on AWS SageMaker Studio Lab (which is similar to Google Colab), and hit the same problem in both.

Installing spacy 3.0.0 instead succeeds, but executing the shell script infer_text2natsql.sh then fails with another problem:

(studiolab) studio-lab-user@default:~/sagemaker-studiolab-notebooks/RESDSQL$ sh scripts/inference/infer_text2natsql.sh 3b spider
/home/studio-lab-user/.conda/envs/studiolab/lib/python3.9/site-packages/spacy/util.py:715: UserWarning: [W094] Model 'en_core_web_sm' (2.2.0) specifies an under-constrained spaCy version requirement: >=2.2.0. This can lead to compatibility problems with older versions, or as new spaCy versions are released, because the model may say it's compatible when it's not. Consider changing the "spacy_version" in your meta.json to a version range, with a lower and upper pin. For example: >=3.0.0,<3.1.0
  warnings.warn(warn_msg)
Traceback (most recent call last):
  File "/home/studio-lab-user/sagemaker-studiolab-notebooks/RESDSQL/NatSQL/table_transform.py", line 885, in <module>
    _tokenizer = get_spacy_tokenizer()
  File "/home/studio-lab-user/sagemaker-studiolab-notebooks/RESDSQL/NatSQL/natsql2sql/preprocess/TokenString.py", line 249, in get_spacy_tokenizer
    nlp = spacy.load("en_core_web_sm")
  File "/home/studio-lab-user/.conda/envs/studiolab/lib/python3.9/site-packages/spacy/__init__.py", line 47, in load
    return util.load_model(name, disable=disable, exclude=exclude, config=config)
  File "/home/studio-lab-user/.conda/envs/studiolab/lib/python3.9/site-packages/spacy/util.py", line 322, in load_model
    return load_model_from_package(name, **kwargs)
  File "/home/studio-lab-user/.conda/envs/studiolab/lib/python3.9/site-packages/spacy/util.py", line 355, in load_model_from_package
    return cls.load(vocab=vocab, disable=disable, exclude=exclude, config=config)
  File "/home/studio-lab-user/.conda/envs/studiolab/lib/python3.9/site-packages/en_core_web_sm/__init__.py", line 12, in load
    return load_model_from_init_py(__file__, **overrides)
  File "/home/studio-lab-user/.conda/envs/studiolab/lib/python3.9/site-packages/spacy/util.py", line 514, in load_model_from_init_py
    return load_model_from_path(
  File "/home/studio-lab-user/.conda/envs/studiolab/lib/python3.9/site-packages/spacy/util.py", line 388, in load_model_from_path
    config = load_config(config_path, overrides=dict_to_dot(config))
  File "/home/studio-lab-user/.conda/envs/studiolab/lib/python3.9/site-packages/spacy/util.py", line 545, in load_config
    raise IOError(Errors.E053.format(path=config_path, name="config.cfg"))
OSError: [E053] Could not read config.cfg from /home/studio-lab-user/.conda/envs/studiolab/lib/python3.9/site-packages/en_core_web_sm/en_core_web_sm-2.2.0/config.cfg

Has anyone met the same problem, or is something wrong with my environment?
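
For anyone hitting the same pair of errors, two hedged workarounds (not from the repo). The spacy 2.2.3 build failure comes from newer setuptools rejecting the malformed specifier spacy_lookups_data>=0.0.5<0.2.0 in its metadata; the second error comes from pairing a spaCy 2.x model with spaCy 3.x:

# build spacy 2.2.3 against an older setuptools that tolerates the
# malformed requirement string
pip install "setuptools<66" wheel
pip install --no-build-isolation spacy==2.2.3

# alternatively, if staying on spaCy 3.x, install a matching model
# instead of the 2.2.0 en_core_web_sm
python -m spacy download en_core_web_sm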

Dev.json file

Hi,
I want to train the model on my own dataset, and I saw in another thread that a dev.json file is required for this. Could you elaborate on how the dev.json file should be formatted, given a query and a database schema?

Best,
Adam
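
Not an authoritative answer, but Spider's own dev.json entries carry the fields below, and matching that format is usually the safest bet. The "sql" tree can be generated with Spider's process_sql.py:

# Hedged sketch of one Spider-format dev.json entry
entry = {
    "db_id": "concert_singer",
    "question": "How many singers do we have?",
    "question_toks": ["How", "many", "singers", "do", "we", "have", "?"],
    "query": "SELECT count(*) FROM singer",
    "query_toks": ["SELECT", "count", "(", "*", ")", "FROM", "singer"],
    "query_toks_no_value": ["select", "count", "(", "*", ")", "from", "singer"],
    "sql": {},  # parsed SQL structure, left empty here
}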

Running evaluate_robustness returns nothing

Hello, I've attached a screenshot below to better highlight this issue.

For some reason, running sh scripts/evaluate_robustness/evaluate_on_spider_realistic.sh generates nothing in the eval_results directory. I can see the folders and the .txt file being created, but nothing is ever appended to the file. Note that I ran every pre-processing step in advance, including sh scripts/evaluate_robustness/preprocess_spider_realistic.sh.

(screenshot attached)

inference scripts error

Hello, I ran into some problems when trying to run model inference.
I'm using RESDSQL-base. After finishing the preparation steps, I ran sh scripts/inference/infer_text2sql.sh base spider and got the following error:

Traceback (most recent call last):
File "schema_item_classifier.py", line 463, in
total_table_pred_probs, total_column_pred_probs = _test(opt)
File "schema_item_classifier.py", line 428, in _test
batch_column_number_in_each_table
File "/root/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/data/tt/RESDSQL/utils/classifier_model.py", line 191, in forward
batch_column_number_in_each_table
File "/mnt/data/tt/RESDSQL/utils/classifier_model.py", line 134, in table_column_cls
output_t, (hidden_state_t, cell_state_t) = self.table_name_bilstm(table_name_embeddings)
File "/root/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/root/miniconda3/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 689, in forward
self.check_forward_args(input, hx, batch_sizes)
File "/root/miniconda3/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 632, in check_forward_args
self.check_input(input, batch_sizes)
File "/root/miniconda3/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 203, in check_input
expected_input_dim, input.dim()))
RuntimeError: input must have 3 dimensions, got 2

So at line 134 of ./utils/classifier_model.py I added:

print(table_name_embeddings.size(),table_name_embeddings)
table_name_embeddings = table_name_embeddings.unsqueeze(0)
print(table_name_embeddings.size(),table_name_embeddings)

which then produced this error:

torch.Size([1, 1024]) tensor([[-0.3795, -0.9529, 0.9007, ..., -0.6501, -2.1801, 0.9587]],
device='cuda:0')
torch.Size([1, 1, 1024]) tensor([[[-0.3795, -0.9529, 0.9007, ..., -0.6501, -2.1801, 0.9587]]],
device='cuda:0')
torch.Size([1, 1024]) tensor([[-0.7597, -0.5682, -0.4270, ..., 0.3219, 1.5417, 0.3518]],
device='cuda:0')
torch.Size([1, 1, 1024]) tensor([[[-0.7597, -0.5682, -0.4270, ..., 0.3219, 1.5417, 0.3518]]],
device='cuda:0')
torch.Size([1, 1024]) tensor([[-0.4921, -1.1286, 0.9307, ..., -0.5373, -2.0887, 0.9216]],
device='cuda:0')
torch.Size([1, 1, 1024]) tensor([[[-0.4921, -1.1286, 0.9307, ..., -0.5373, -2.0887, 0.9216]]],
device='cuda:0')
torch.Size([3, 1024]) tensor([[-0.5896, -1.3575, 1.1120, ..., -0.6104, -1.9414, 0.6679],
[-0.6831, -1.3711, 1.1447, ..., -0.5117, -2.0709, 0.8956],
[-0.6337, -1.3548, 1.2228, ..., -0.4896, -2.0505, 0.8417]],
device='cuda:0')
torch.Size([1, 3, 1024]) tensor([[[-0.5896, -1.3575, 1.1120, ..., -0.6104, -1.9414, 0.6679],
[-0.6831, -1.3711, 1.1447, ..., -0.5117, -2.0709, 0.8956],
[-0.6337, -1.3548, 1.2228, ..., -0.4896, -2.0505, 0.8417]]],
device='cuda:0')
0%| | 0/33 [00:01<?, ?it/s]
Traceback (most recent call last):
File "schema_item_classifier.py", line 462, in
total_table_pred_probs, total_column_pred_probs = _test(opt)
File "schema_item_classifier.py", line 427, in _test
batch_column_number_in_each_table
File "/root/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/data/tt/RESDSQL/utils/classifier_model.py", line 194, in forward
batch_column_number_in_each_table
File "/mnt/data/tt/RESDSQL/utils/classifier_model.py", line 138, in table_column_cls
table_name_embedding = hidden_state_t[-2:, :].view(1, 1024)
RuntimeError: shape '[1, 1024]' is invalid for input of size 3072

I could use some help fixing these errors. Thanks 🙏
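
In case it helps, the size-3072 failure is consistent with the LSTM treating dim 0 as the sequence dimension: for a bidirectional LSTM with hidden size 512, hidden_state_t has shape (num_layers * 2, batch, 512), so view(1, 1024) only works when batch == 1. With the unsqueeze(0) patch above, the lone embedding likely became the sequence dimension rather than the batch dimension (unless the LSTM was built with batch_first=True), which would explain the drift to batch 3. A hedged sketch that handles any batch size by concatenating the two directions explicitly:

import torch

def pool_bilstm_hidden(hidden_state_t: torch.Tensor) -> torch.Tensor:
    # the last two slices are the forward/backward states of the top layer
    fwd, bwd = hidden_state_t[-2], hidden_state_t[-1]  # each (batch, 512)
    return torch.cat([fwd, bwd], dim=-1)               # (batch, 1024)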

Why does every dev-set example come out as ["sql placeholder"] at the text2sql.py stage?

sql placeholder
near "sql": syntax error
(the same two lines repeat for every example)
100%|█████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:05<00:00, 5.33s/it]
2023-03-01 14:17:33,138 INFO root output:
['sql placeholder']

This was a single-example test on a 32 GB V100.

TypeError: 'datetime.datetime' object is not subscriptable

File "F:\python_project\RESDSQL\NatSQL\natsql2sql\preprocess\db_match.py", line 194, in db_col_type_check
if skip_once and len(values) > 7 and not v[0][0].isdigit():
TypeError: 'datetime.datetime' object is not subscriptable

In other words, when query = "select distinct "+col[1]+" from " + self.table_list[table_idx] + " order by "+col[1]+" limit 500" returns rows whose values are of type datetime.datetime, the check crashes. Does that mean only databases like SQLite, which lack a native datetime column type, are supported?
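
A hedged local patch: coerce the fetched cell to str before character-indexing, so datetime (and other non-text) values no longer raise TypeError:

def first_char_is_digit(cell) -> bool:
    # str() makes the check safe for datetime.datetime and any other
    # non-string column type returned by the DB driver
    return str(cell)[:1].isdigit()

# in db_match.py the failing condition would then become:
# if skip_once and len(values) > 7 and not first_char_is_digit(v[0][0]):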
