
Bridging Textual and Tabular Data for Cross-Domain Text-to-SQL Semantic Parsing

This is the official code release of the following paper:

Xi Victoria Lin, Richard Socher and Caiming Xiong. Bridging Textual and Tabular Data for Cross-Domain Text-to-SQL Semantic Parsing. Findings of EMNLP 2020.

Overview

Cross-domain tabular semantic parsing (X-TSP) is the task of predicting an executable structured query (e.g., SQL) given a natural language question issued to some database. The model may or may not have seen the target database during training.
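
For concreteness, an illustrative input/output pair (the question, schema, and query below are invented for this example):

# Hypothetical X-TSP example: a natural language question plus a database
# schema in, an executable SQL query out.
question = "How many singers are older than 30?"
database = "concert_singer"  # contains a table singer(name, age, country, ...)
predicted_sql = "SELECT count(*) FROM singer WHERE age > 30"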

This library implements

  • A strong sequence-to-sequence based cross-domain text-to-SQL semantic parser that achieved state-of-the-art performance on two widely used benchmark datasets: Spider and WikiSQL.
  • A set of SQL processing tools for parsing, tokenizing and validating SQL queries, adapted from the Moz SQL Parser.

The parser can be adapted to learn mappings from text to other structured query languages such as SOQL by modifying the formal language pre-processing and post-processing modules.

Model

BRIDGE architecture

Our model takes a natural language utterance and a database (schema + field picklists) as input, and generates SQL queries as token sequences. We apply schema-guided decoding and post-processing to make sure the final output is executable.

  • Preprocessing: We concatenate the serialized database schema with the utterance to form a tagged sequence. A fuzzy string matching algorithm identifies picklist items mentioned in the utterance, and the mentioned items are appended to the corresponding field names in the tagged sequence (a minimal sketch of this serialization follows the list).
  • Translating: The hybrid sequence is passed through the BRIDGE model, which outputs raw program sequences with probability scores via beam search.
  • Postprocessing: The raw program sequences are passed through a SQL checker, which verifies their syntactic correctness and schema consistency. Sequences that fail the check are discarded from the output.
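
A minimal sketch of the tagged-sequence serialization, assuming paper-style [T]/[C]/[V] marker tokens (the helper below is illustrative; the actual implementation lives in the repo's data processor):

# Illustrative sketch of the BRIDGE input serialization; marker-token names
# follow the paper, not necessarily the code.
def serialize(question, schema, matched_values):
    # schema: list of (table_name, [column_names]) pairs.
    # matched_values: dict mapping (table, column) to picklist items found
    # in the question by fuzzy string matching.
    tokens = ['[CLS]'] + question.split() + ['[SEP]']
    for table, columns in schema:
        tokens += ['[T]', table]
        for column in columns:
            tokens += ['[C]', column]
            for value in matched_values.get((table, column), []):
                tokens += ['[V]', value]  # anchor the mentioned value to its field
    return ' '.join(tokens + ['[SEP]'])

print(serialize('How many singers are older than 30?',
                [('singer', ['name', 'age'])], {}))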

Quick Start

Install Dependencies

Our implementation has been tested using PyTorch 1.7 and CUDA 11.0 with a single GPU.

git clone https://github.com/salesforce/TabularSemanticParsing
cd TabularSemanticParsing

pip install torch torchvision
python3 -m pip install -r requirements.txt

Set up Environment

export PYTHONPATH=`pwd` && python -m nltk.downloader punkt

Process Data

Spider

Download the official data release and unzip the folder. Manually merge spider/train_spider.json with spider/train_others.json into a single file spider/train.json.
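
The manual merge can also be scripted; a minimal sketch in Python (file names as in the Spider release):

import json

# Both files hold JSON lists of training examples; concatenate them.
with open('spider/train_spider.json') as f1, open('spider/train_others.json') as f2:
    merged = json.load(f1) + json.load(f2)

with open('spider/train.json', 'w') as f:
    json.dump(merged, f)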

mv spider data/ 

# Data Repair (more details in section 4.3 of paper)
python3 data/spider/scripts/amend_missing_foreign_keys.py data/spider

./experiment-bridge.sh configs/bridge/spider-bridge-bert-large.sh --process_data 0

WikiSQL

Download the official data release.

wget https://github.com/salesforce/WikiSQL/raw/master/data.tar.bz2
tar xf data.tar.bz2 -C data && mv data/data data/wikisql1.1
./experiment-bridge.sh configs/bridge/wikisql-bridge-bert-large.sh --process_data 0

The processed data will be stored in a separate pickle file.
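
To sanity-check preprocessing, the pickle can be loaded directly. The file name below is taken from a training log quoted later on this page; the exact name depends on your preprocessing hyperparameters, and unpickling may require the repo's classes to be importable (run with PYTHONPATH set as above):

import pickle

path = 'data/spider/spider.bridge.question-split.ppl-0.85.2.dn.eo.bert.pkl'
with open(path, 'rb') as f:
    data = pickle.load(f)
print(type(data))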

Train

Train the model using the following commands. The checkpoint of the best model will be stored in a directory specified by the hyperparameters in the configuration file.

Spider

./experiment-bridge.sh configs/bridge/spider-bridge-bert-large.sh --train 0

WikiSQL

./experiment-bridge.sh configs/bridge/wikisql-bridge-bert-large.sh --train 0

Inference

Decode SQL predictions from pre-trained models. The following commands run inference with the checkpoints stored in the directory specified by the hyperparameters in the configuration file.

Spider

./experiment-bridge.sh configs/bridge/spider-bridge-bert-large.sh --inference 0

WikiSQL

./experiment-bridge.sh configs/bridge/wikisql-bridge-bert-large.sh --inference 0

Note:

  1. Add the --test flag to the above commands to obtain the test set evaluation results on the corresponding dataset. This flag is invalid for Spider, as its test set is hidden.
  2. Add the --checkpoint_path [path_to_checkpoint_tar_file] flag to decode using a checkpoint that's not stored in the default location.
  3. Evaluation metrics will be printed out at the end of decoding. The WikiSQL evaluation takes some time because it computes execution accuracy.

Inference with Model Ensemble

To decode with model ensemble, first list the checkpoint directories of the individual models in the ensemble model configuration file, then run the following command(s).

Spider

./experiment-bridge.sh configs/bridge/spider-bridge-bert-large.sh --ensemble_inference 0

Commandline Demo

You can interact with a pre-trained checkpoint through the commandline using the following commands:

Spider

./experiment-bridge.sh configs/bridge/spider-bridge-bert-large.sh --demo 0 --demo_db [db_name] --checkpoint_path [path_to_checkpoint_tar_file]

Hyperparameter Changes

To change the hyperparameters and other experiment setup, start from the configuration files.

Pre-trained Checkpoints

Spider

Download pre-trained checkpoints here:

URL: https://drive.google.com/file/d/1dlrUdGMLvvvfR3kNVy76H12rR7gr4DXI/view?usp=sharing (E-SM: 70.1, EXE: 68.2)
mv bridge-spider-bert-large-ems-70-1-exe-68-2.tar.gz model
gunzip model/bridge-spider-bert-large-ems-70-1-exe-68-2.tar.gz
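
To verify the download, the checkpoint can be opened directly; the load call mirrors the torch.load(...) used in src/common/learn_framework.py (quoted in the issues below). Note the pitfall reported there: checkpoints saved with PyTorch's newer zip-based serialization cannot be read by older PyTorch versions.

import torch

# Despite the .tar suffix, the gunzipped file is a regular PyTorch checkpoint.
ckpt = torch.load('model/bridge-spider-bert-large-ems-70-1-exe-68-2.tar',
                  map_location='cpu')
print(type(ckpt))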

Download cached SQL execution order to normal order mappings:

URL: https://drive.google.com/file/d/1vk14iR4V_f5x4e17MAaL_L8T9wgjcKCy/view?usp=sharing

Why this cache? The overhead of converting thousands of SQL queries from execution order to normal order is large, so we cached the conversions for the Spider dev set in our experiments. Without the cache, inference on the dev set will be slow. The model still runs fast on individual queries without a cache.

mv dev.eo.pred.restored.pkl.gz data/spider
gunzip data/spider/dev.eo.pred.restored.pkl.gz
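
Judging from the cache-loading line quoted in the issues below (pred_restored_cache[db_name][pred_sql]), the cache appears to be a dict keyed by database name and predicted SQL; a hedged sketch of inspecting it:

import pickle

with open('data/spider/dev.eo.pred.restored.pkl', 'rb') as f:
    pred_restored_cache = pickle.load(f)

# Entry layout inferred from src/semantic_parser/learn_framework.py:
#   restored_pred, grammatical, schema_consistent = cache[db_name][pred_sql]
for db_name, entries in pred_restored_cache.items():
    print(db_name, len(entries))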

Citation

If you find the resource in this repository helpful, please cite

@inproceedings{LinRX2020:BRIDGE, 
  author = {Xi Victoria Lin and Richard Socher and Caiming Xiong}, 
  title = {Bridging Textual and Tabular Data for Cross-Domain Text-to-SQL Semantic Parsing}, 
  booktitle = {Proceedings of the 2020 Conference on Empirical Methods in Natural
               Language Processing: Findings, {EMNLP} 2020, November 16-20, 2020},
  year = {2020} 
}

Related Links

The parser has been integrated into the Photon web demo: http://naturalsql.com/. Please visit our website to test it live and try it on your own databases!

Issues

Failed to load other language data.

This is a Chinese NL2SQL dataset with the same format as WikiSQL: https://github.com/ZhuiyiTechnology/TableQA
The only small difference: "sql": [2] wraps the value in a list.
When I load this dataset, I get an error (screenshot omitted).
I changed the tokenizer to bert_base_chinese; it still doesn't work.
So, what can I do to fine-tune your model on a Chinese NL2SQL dataset?
Thank you very much!!!

Issue loading pre-trained model.

Hi

As mentioned in the readme, I downloaded and gunzipped the .tar.gz file, obtaining a .tar file.

mv bridge-spider-bert-large-ems-70-1-exe-68-2.tar.gz model
gunzip model/bridge-spider-bert-large-ems-70-1-exe-68-2.tar.gz
mv bridge-spider-bert-large-ems-70-1-exe-68-2.tar.gz model-best.tar

However, when performing inference, using the following line:

./experiment-bridge.sh configs/bridge/spider-bridge-bert-large.sh --inference 0 --checkpoint_path /app/TabularSemanticParsing/model/model-best.tar

I obtain this error:

Traceback (most recent call last):
File "/opt/conda/lib/python3.7/tarfile.py", line 187, in nti
n = int(s.strip() or "0", 8)
ValueError: invalid literal for int() with base 8: 'ings.tra'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/conda/lib/python3.7/tarfile.py", line 2287, in next
tarinfo = self.tarinfo.fromtarfile(self)
File "/opt/conda/lib/python3.7/tarfile.py", line 1095, in fromtarfile
obj = cls.frombuf(buf, tarfile.encoding, tarfile.errors)
File "/opt/conda/lib/python3.7/tarfile.py", line 1037, in frombuf
chksum = nti(buf[148:156])
File "/opt/conda/lib/python3.7/tarfile.py", line 189, in nti
raise InvalidHeaderError("invalid header")
tarfile.InvalidHeaderError: invalid header

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/root/.local/lib/python3.7/site-packages/torch/serialization.py", line 595, in _load
return legacy_load(f)
File "/root/.local/lib/python3.7/site-packages/torch/serialization.py", line 506, in legacy_load
with closing(tarfile.open(fileobj=f, mode='r:', format=tarfile.PAX_FORMAT)) as tar,
File "/opt/conda/lib/python3.7/tarfile.py", line 1591, in open
return func(name, filemode, fileobj, **kwargs)
File "/opt/conda/lib/python3.7/tarfile.py", line 1621, in taropen
return cls(name, mode, fileobj, **kwargs)
File "/opt/conda/lib/python3.7/tarfile.py", line 1484, in init
self.firstmember = self.next()
File "/opt/conda/lib/python3.7/tarfile.py", line 2299, in next
raise ReadError(str(e))
tarfile.ReadError: invalid header

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/conda/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"main", mod_spec)
File "/opt/conda/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/app/TabularSemanticParsing/src/experiments.py", line 407, in
run_experiment(args)
File "/app/TabularSemanticParsing/src/experiments.py", line 394, in run_experiment
inference(sp)
File "/app/TabularSemanticParsing/src/experiments.py", line 104, in inference
sp.load_checkpoint(get_checkpoint_path(args))
File "/app/TabularSemanticParsing/src/common/learn_framework.py", line 423, in load_checkpoint
checkpoint = torch.load(input_file)
File "/root/.local/lib/python3.7/site-packages/torch/serialization.py", line 426, in load
return _load(f, map_location, pickle_module, **pickle_load_args)
File "/root/.local/lib/python3.7/site-packages/torch/serialization.py", line 599, in _load
raise RuntimeError("{} is a zip archive (did you mean to use torch.jit.load()?)".format(f.name))
RuntimeError: /app/TabularSemanticParsing/model/model-best.tar is a zip archive (did you mean to use torch.jit.load()?)

Does anyone have a clue what might be going on?

RuntimeError: CUDA out of memory. Error while training in Google colab

Traceback (most recent call last):
File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"main", mod_spec)
File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/content/drive/My Drive/TabularSemanticParsing/src/experiments.py", line 407, in
run_experiment(args)
File "/content/drive/My Drive/TabularSemanticParsing/src/experiments.py", line 392, in run_experiment
train(sp)
File "/content/drive/My Drive/TabularSemanticParsing/src/experiments.py", line 63, in train
sp.run_train(train_data, dev_data)
File "/content/drive/My Drive/TabularSemanticParsing/src/common/learn_framework.py", line 208, in run_train
loss = self.loss(formatted_batch)
File "/content/drive/My Drive/TabularSemanticParsing/src/semantic_parser/learn_framework.py", line 94, in loss
outputs = self.forward(formatted_batch)
File "/content/drive/My Drive/TabularSemanticParsing/src/semantic_parser/learn_framework.py", line 129, in forward
decoder_ptr_value_ids=decoder_ptr_value_ids)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1071, in _call_impl
result = forward_call(*input, **kwargs)
File "/content/drive/My Drive/TabularSemanticParsing/src/semantic_parser/bridge.py", line 50, in forward
inputs, input_masks, segments=segment_ids, position_ids=position_ids)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/content/drive/My Drive/TabularSemanticParsing/src/common/nn_modules.py", line 67, in forward
inputs, token_type_ids=segments, position_ids=position_ids, attention_mask=(~input_masks)))
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/transformers/modeling_bert.py", line 790, in forward
encoder_attention_mask=encoder_extended_attention_mask,
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/transformers/modeling_bert.py", line 407, in forward
hidden_states, attention_mask, head_mask[i], encoder_hidden_states, encoder_attention_mask
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/transformers/modeling_bert.py", line 368, in forward
self_attention_outputs = self.attention(hidden_states, attention_mask, head_mask)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/transformers/modeling_bert.py", line 314, in forward
hidden_states, attention_mask, head_mask, encoder_hidden_states, encoder_attention_mask
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/transformers/modeling_bert.py", line 234, in forward
attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
RuntimeError: CUDA out of memory. Tried to allocate 54.00 MiB (GPU 0; 11.17 GiB total capacity; 10.35 GiB already allocated; 33.81 MiB free; 10.68 GiB reserved in total by PyTorch)

Python Version

Hello, can you share the specific Python version you used during development? 3.4, 3.6, 3.7, or something else? I ran into a problem that I suspect is caused by the Python version.
Here is the detailed error:

 WARNING: Building wheel for cryptacular failed: [Errno 2] No such file or directory: '/tmp/pip-wheel-zlvdpm39/cryptacular-1.5.5-cp37-abi3-manylinux2010_x86_64.whl'
Failed to build cryptacular
ERROR: Could not build wheels for cryptacular which use PEP 517 and cannot be installed directly

Maybe cryptacular conflicts with the Python version and can't be installed directly, so I'm guessing that changing the Python version might solve it.
I'm new to Python and AI learning; looking forward to your reply.
Thanks!

Is there a successful example in Chinese dataset?

I tried to use this model on DuSQL, a Chinese dataset with a format similar to Spider's. I changed BERT to bert-base-multilingual-uncased, but training failed. Has anyone succeeded on a Chinese dataset?

CUDA out of memory on Spider

Hi,

Thanks for sharing the code. I found that when training on Spider, a CUDA out of memory error keeps occurring even though I have set the batch size of both train and dev in parse_args.py to 1. Any idea on that?

(screenshot omitted)

TypeError when training on Spider

Training on the Spider dataset fails with TypeError: '<' not supported between instances of 'NoneType' and 'int'.

The only modification made was using a smaller batch size, 8 instead of 16, to avoid memory issues.
I have not tried to debug vectorizers.py.

Training on WikiSQL worked fine.

The full error:

Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/hjalmar/Python/TabularSemanticParsing/src/experiments.py", line 407, in <module>
    run_experiment(args)
  File "/home/hjalmar/Python/TabularSemanticParsing/src/experiments.py", line 392, in run_experiment
    train(sp)
  File "/home/hjalmar/Python/TabularSemanticParsing/src/experiments.py", line 63, in train
    sp.run_train(train_data, dev_data)
  File "/home/hjalmar/Python/TabularSemanticParsing/src/common/learn_framework.py", line 207, in run_train
    formatted_batch = self.format_batch(mini_batch)
  File "/home/hjalmar/Python/TabularSemanticParsing/src/semantic_parser/learn_framework.py", line 425, in format_batch
    vec.vectorize_field_ptr_out(exp.program_singleton_field_tokens,
  File "/home/hjalmar/Python/TabularSemanticParsing/src/data_processor/vectorizers.py", line 102, in vectorize_field_ptr_out
    if schema_pos < num_included_nodes:
TypeError: '<' not supported between instances of 'NoneType' and 'int'
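
Until the root cause is found, a hypothetical defensive guard around the failing comparison (treating a None schema position as excluded is an assumption, not a confirmed fix):

def include_schema_pos(schema_pos, num_included_nodes):
    # Treat a missing (None) schema position as out of range rather than
    # raising a TypeError; whether silently skipping such fields is correct
    # for training is an open question.
    return schema_pos is not None and schema_pos < num_included_nodes

print(include_schema_pos(None, 10))  # False instead of a TypeError
print(include_schema_pos(3, 10))     # True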

RuntimeError when training models on WikiSQL

Hello, I was able to finish training following the commands in the README and the fix in #17, but I still got the following RuntimeError during evaluation.

Could you please give some suggestions on how I can fix it? Thank you!


/home/.local/lib/python3.8/site-packages/torch/nn/modules/module.py:795: UserWarning: Using a non-full backward hook when the forward contains multiple autograd Nodes is deprecated and will be removed in future versions. This hook will be missing some grad_input. Please use register_full_backward_hook to get the documented behavior.
  warnings.warn("Using a non-full backward hook when the forward contains multiple autograd Nodes "
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 19200/19200 [30:00<00:00, 10.67it/s]
Step 399.9791666666667: average training loss = 0.7160661821833583
  0%|                                                                                                                                                                                                  | 0/4211 [00:00<?, ?it/s]

wandb: Waiting for W&B process to finish, PID 54022
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/Code/TabularSemanticParsing/src/experiments.py", line 407, in <module>
    run_experiment(args)
  File "/home/Code/TabularSemanticParsing/src/experiments.py", line 392, in run_experiment
    train(sp)
  File "/home/Code/TabularSemanticParsing/src/experiments.py", line 63, in train
    sp.run_train(train_data, dev_data)
  File "/home/Code/TabularSemanticParsing/src/common/learn_framework.py", line 241, in run_train
    output_dict = self.inference(dev_data, restore_clause_order=self.args.process_sql_in_execution_order,
  File "/home/Code/TabularSemanticParsing/src/semantic_parser/learn_framework.py", line 157, in inference
    outputs = self.forward(formatted_batch, model_ensemble)
  File "/home/Code/TabularSemanticParsing/src/semantic_parser/learn_framework.py", line 124, in forward
    outputs = self.mdl(encoder_ptr_input_ids, encoder_ptr_value_ids,
  File "/home/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/Code/TabularSemanticParsing/src/semantic_parser/bridge.py", line 54, in forward
    self.encoder(inputs_embedded,
  File "/home/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/Code/TabularSemanticParsing/src/semantic_parser/bridge.py", line 263, in forward
    schema_hiddens = self.schema_encoder(schema_hiddens, feature_ids)
  File "/home/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/Code/TabularSemanticParsing/src/semantic_parser/brwandb: Program failed with code 1. Press ctrl-c to abort syncing.
idge.py", line 166, in forward
    schema_hiddens = self.feature_fusion_layer(torch.cat([input_hiddens,
RuntimeError: Sizes of tensors must match except in dimension 1. Got 8 and 7 (The offending index is 0)
wandb: You can sync this run to the cloud by running: 
wandb: wandb sync wandb/run-20211001_050816-evuizgcy
Exception ignored in: <function _ConnectionRecord.checkout.<locals>.<lambda> at 0x7f6966c72160>
Traceback (most recent call last):
  File "/home/Code/TabularSemanticParsing/venv/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 506, in <lambda>
  File "/home/Code/TabularSemanticParsing/venv/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 714, in _finalize_fairy
  File "/home/Code/TabularSemanticParsing/venv/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 531, in checkin
  File "/home/Code/TabularSemanticParsing/venv/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 388, in _return_conn
  File "/home/Code/TabularSemanticParsing/venv/lib/python3.8/site-packages/sqlalchemy/pool/impl.py", line 238, in _do_return_conn
  File "/home/Code/TabularSemanticParsing/venv/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 543, in close
  File "/home/Code/TabularSemanticParsing/venv/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 645, in __close
  File "/home/Code/TabularSemanticParsing/venv/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 267, in _close_connection
  File "/usr/lib/python3.8/logging/__init__.py", line 1433, in debug
  File "/usr/lib/python3.8/logging/__init__.py", line 1699, in isEnabledFor
TypeError: 'NoneType' object is not callable

MatchFirst not iterable during data processing

Hello,

When preprocessing the data via ./experiment-bridge.sh configs/bridge/spider-bridge-bert-large.sh --process_data 0

I hit this error:
TypeError: argument of type 'MatchFirst' is not iterable

It comes from moz_sp/formatting.py, which attempts to check the incoming identifier against the set of RESERVED names via 'identifier not in RESERVED'; since these are in fact pyparsing objects, they aren't iterable.
https://pythonhosted.org/pyparsing/pyparsing.MatchFirst-class.html

It appears that 'not RESERVED.searchString(identifier)' may solve this issue (see the sketch below).

I'm using python 3.8
pyparsing==2.4.7
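
A standalone illustration of the proposed fix (the keyword list here is invented; the repo's RESERVED set is much larger):

from pyparsing import CaselessKeyword, MatchFirst

# A MatchFirst is a parser expression, not a container, so
# `identifier not in RESERVED` raises TypeError; searchString returns a
# (possibly empty) list of matches and can serve as a truth test instead.
RESERVED = MatchFirst([CaselessKeyword(k) for k in ('SELECT', 'FROM', 'WHERE')])

def is_reserved(identifier):
    return bool(RESERVED.searchString(identifier))

print(is_reserved('select'))  # True
print(is_reserved('singer'))  # False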

ValueError when running inference

Hi,
I have two problems.

First, in the Process Spider Data part, "mv spider data/" fails to execute; I think it is because data/spider/scripts already exists. Does it mean I should move the scripts folder into the spider folder and then move the spider folder under the data folder?

Second, in the inference part, after running "./experiment-bridge.sh configs/bridge/spider-bridge-bert-large.sh --inference 0 --checkpoint_path /home/guest31/spiderModel/TabularSemanticParsing/model/bridge-spider-bert-large-ems-70-1-exe-68-2.tar", the output shows "ValueError: too many values to unpack (expected 3)".
The details are as follows:
Traceback (most recent call last):
File "/home/guest31/anaconda3/envs/bridge/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"main", mod_spec)
File "/home/guest31/anaconda3/envs/bridge/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/guest31/spiderModel/TabularSemanticParsing/src/experiments.py", line 407, in
run_experiment(args)
File "/home/guest31/spiderModel/TabularSemanticParsing/src/experiments.py", line 394, in run_experiment
inference(sp)
File "/home/guest31/spiderModel/TabularSemanticParsing/src/experiments.py", line 122, in inference
engine=engine, inline_eval=True, verbose=True)
File "/home/guest31/spiderModel/TabularSemanticParsing/src/semantic_parser/learn_framework.py", line 209, in inference
restored_pred, grammatical, schema_consistent = pred_restored_cache[db_name][pred_sql]
ValueError: too many values to unpack (expected 3)

Could you please give me some suggestions?
Thank you.
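
One hedged workaround, pending a code-level fix (the reproduction issue below suggests the cache format changed across code versions; see also #14): unpack the cache entry defensively, assuming the first three fields keep their original order.

def read_cache_entry(entry):
    # Tolerate cache entries that carry extra fields; taking the first
    # three values is an assumption about field order, not a confirmed fix.
    if len(entry) < 3:
        raise ValueError('unexpected cache entry: %r' % (entry,))
    restored_pred, grammatical, schema_consistent = entry[:3]
    return restored_pred, grammatical, schema_consistent

print(read_cache_entry(('SELECT count(*) FROM singer', True, True, 'extra')))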

cuDNN error: CUDNN_STATUS_MAPPING_ERROR

Hi there,

Thanks for releasing the code. I encountered a cuDNN error when training the model on the Spider dataset. I am using PyTorch 1.7.0 with CUDA 11.0 and Python 3.7 on a GeForce RTX 3090.

I only changed the batch size and deleted some samples from the original Spider dataset. The log is shown below. Any idea on that?

Model initialization (xavier)
encoder_embeddings.trans_parameters.embeddings.word_embeddings.weight (skipped)
encoder_embeddings.trans_parameters.embeddings.position_embeddings.weight (skipped)
encoder_embeddings.trans_parameters.embeddings.token_type_embeddings.weight (skipped)
encoder_embeddings.trans_parameters.embeddings.LayerNorm.weight (skipped)
encoder_embeddings.trans_parameters.embeddings.LayerNorm.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.0.attention.self.query.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.0.attention.self.query.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.0.attention.self.key.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.0.attention.self.key.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.0.attention.self.value.weight (skipped)
.......

Model Parameters
.....
mdl.encoder.text_encoder.rnn.rnn.rnn.bias_ih_l0 800 requires_grad=True
mdl.encoder.text_encoder.rnn.rnn.rnn.bias_hh_l0 800 requires_grad=True
mdl.encoder.text_encoder.rnn.rnn.rnn.weight_ih_l0_reverse 320000 requires_grad=True
Total # parameters = 342157588

wandb: Tracking run with wandb version 0.8.30
wandb: Wandb version 0.10.33 is available! To upgrade, please run:
wandb: $ pip install wandb --upgrade
wandb: Run data is saved locally in wandb/run-20210706_030553-1b8pui3b
wandb: Syncing run spider.bridge.lstm.meta.ts.ppl-0.85.2.dn.eo.feat.bert-large-uncased.xavier-1024-400-400-8-2-0.0005-inv-sqr-0.0005-4000-6e-05-inv-sqr-3e-05-4000-0.3-0.3-0
.0-0.0-1-8-0.0-0.0-res-0.2-0.0-ff-0.4-0.0.210706-030553.daud
wandb: ⭐️ View project at https://app.wandb.ai/ningzheng/smore-spider-group--final
wandb: 🚀 View run at https://app.wandb.ai/ningzheng/smore-spider-group--final/runs/1b8pui3b
wandb: Run wandb off to turn off syncing.
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2000/2000 [19:18<00:00, 1.73it/s]
Step 999.5: average training loss = 1.4097217868864536
0 pre-computed prediction order reconstruction cached
10%|█████████████▎ | 30/302 [05:23<48:53, 10.79s/it]
Traceback (most recent call last):
File "/home/ningzheng/miniconda3/envs/tsp/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"main", mod_spec)
File "/home/ningzheng/miniconda3/envs/tsp/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/ningzheng/TabularSemanticParsing/src/experiments.py", line 407, in
run_experiment(args)
File "/home/ningzheng/TabularSemanticParsing/src/experiments.py", line 392, in run_experiment
train(sp)
File "/home/ningzheng/TabularSemanticParsing/src/experiments.py", line 63, in train
sp.run_train(train_data, dev_data)

wandb: Waiting for W&B process to finish, PID 1765293
File "/home/ningzheng/TabularSemanticParsing/src/common/learn_framework.py", line 250, in run_train
engine=engine, inline_eval=True, verbose=False)
File "/home/ningzheng/TabularSemanticParsing/src/semantic_parser/learn_framework.py", line 157, in inference
outputs = self.forward(formatted_batch, model_ensemble)
File "/home/ningzheng/TabularSemanticParsing/src/semantic_parser/learn_framework.py", line 129, in forward
decoder_ptr_value_ids=decoder_ptr_value_ids)
File "/home/ningzheng/miniconda3/envs/tsp/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/ningzheng/TabularSemanticParsing/src/semantic_parser/bridge.py", line 100, in forward
no_from=(self.dataset_name == 'wikisql'))
File "/home/ningzheng/TabularSemanticParsing/src/semantic_parser/decoding_algorithms.py", line 251, in beam_search
last_output=input)
File "/home/ningzheng/miniconda3/envs/tsp/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/ningzheng/TabularSemanticParsing/src/semantic_parser/bridge.py", line 349, in forward
output, hidden = self.rnn(input_sa, hidden)
File "/home/ningzheng/miniconda3/envs/tsp/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/ningzheng/TabularSemanticParsing/src/common/nn_modules.py", line 98, in forward
return self.rnn(inputs, hidden)
File "/home/ningzheng/miniconda3/envs/tsp/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/ningzheng/miniconda3/envs/tsp/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 582, in forward
self.dropout, self.training, self.bidirectional, self.batch_first)
RuntimeError: cuDNN error: CUDNN_STATUS_MAPPING_ERROR
/opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [15,0,0], thread: [4,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
(the same assertion repeats for threads [5,0,0] through [9,0,0] of block [15,0,0] and threads [3,0,0] through [9,0,0] of block [18,0,0])
wandb: Program failed with code 1. Press ctrl-c to abort syncing.
wandb: Run summary:
wandb: _step 4
wandb: _timestamp 1625541921.4755883
wandb: learning_rate/spider 0.0005
wandb: _runtime 1190.0860652923584
wandb: fine_tuning_rate/spider 3.37575e-05
wandb: cross_entropy_loss/spider 1.4097217868864536
wandb: Syncing files in wandb/run-20210706_030553-1b8pui3b:
wandb: code/src/experiments.py
wandb: plus 7 W&B file(s) and 1 media file(s)
wandb:
wandb: Synced spider.bridge.lstm.meta.ts.ppl-0.85.2.dn.eo.feat.bert-large-uncased.xavier-1024-400-400-8-2-0.0005-inv-sqr-0.0005-4000-6e-05-inv-sqr-3e-05-4000-0.3-0.3-0.0-0.
0-1-8-0.0-0.0-res-0.2-0.0-ff-0.4-0.0.210706-030553.daud: https://app.wandb.ai/ningzheng/smore-spider-group--final/runs/1b8pui3b

Multiple GPU computing

Hello, I want to use multiple GPUs for training, but I only see "GPU=$3" in the parameter settings. How can I switch the computation to multiple GPUs?
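
The README notes the implementation was tested with a single GPU, so multi-GPU training is not officially supported. The generic PyTorch pattern for experimenting with it is torch.nn.DataParallel; whether BRIDGE's forward signature and training loop are compatible with it is an assumption you would need to verify.

import torch

def maybe_parallelize(model):
    # Generic data-parallel wrapper; compatibility with this repo's model
    # and training loop is not guaranteed.
    if torch.cuda.device_count() > 1:
        model = torch.nn.DataParallel(model)
    return model.to('cuda')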

Unable to reproduce the results from pretrained checkpoint

After fixing the bugs mentioned in #14 (downgrade src/semantic_parser/learn_framework.py L206-225 and moz_sp/__init__.py restore_clause_order function), I got the following results from your pretrained checkpoints:

E-SM: 70.0, EXE: 67.9

Do you know the reason for this difference? Could you please make the code compatible with your checkpoints (fix #14)?

pre-processing

Hi, when I run the following command,

Data Repair (more details in section 4.3 of paper)

python3 data/spider/scripts/amend_missing_foreign_keys.py data/spider

I got the following error message:

for i, c in enumerate(table['column_names_original']):
KeyError: 'column_names_original'

I checked the Spider dataset; the tables there do not have the field 'column_names_original'.

Has anyone encountered the same problem?

Can't load checkpoint to continue training

Hi, I encountered a bug when the system finished an epoch from a checkpoint.

The original error message may be a little confusing, sorry about that. I found that when training reaches about step 9000, the loss becomes 'nan'. Compared with the original parameters, due to the 12GB memory limit, I changed 'train_batch_size' and 'num_accumulation_steps' to keep the full batch size equal to 32.


Supplement: I'm sorry I forgot to provide the changed parameters. I made two changes in total:

num_accumulation_steps: 2 -> 16
train_batch_size: 16 -> 2

(2 × 16 = 32, the same effective batch size as the original 16 × 2.)

In addition, I cloned the latest version of the code yesterday and once again ran into the loss becoming 'nan'. The error message is as follows:

Step 6999.9375: average training loss = 0.02193380185714989
0 pre-computed prediction order reconstruction cached
100%|██████████| 345/345 [28:50<00:00,  3.37s/it]
Dev set performance:
Top-1 exact match: 0.6334622823984526
Top-3 exact match: 0.7272727272727273
 73%|███████▎  | 11648/16000 [1:58:43<39:58,  1.81it/s]> /home/rxsun/TableQA/TabularSemanticParsing/src/common/nn_modules.py(695)forward()
-> if loss > 1e8:
(Pdb) l
690                 return 0
691             loss = F.nll_loss(masked_inputs, masked_targets)
692             if torch.isnan(loss):
693                 import pdb
694                 pdb.set_trace()
695  ->         if loss > 1e8:
696                 import pdb
697                 pdb.set_trace()
698             return loss
699
700
(Pdb) loss
tensor(nan, device='cuda:0', grad_fn=<NllLossBackward>)

Could you give me some advice about that? Thanks!

Continue training from the checkpoint

I trained my model and noticed that after each step the best model was saved. I accidentally stopped training.
How can I pick up where I left off?
Thanks in advance

Training error

Hi,

I followed the steps to train a model on the Spider dataset and hit the following error:

./experiment-bridge.sh configs/bridge/spider-bridge-bert-large.sh --train 0
run CUDA_VISIBLE_DEVICES=0 python3 -m src.experiments     --train     --data_dir data/spider     --db_dir data/spider/database     --dataset_name spider     --question_split          --question_only               --denormalize_sql               --table_shuffling     --use_lstm_encoder     --use_meta_data_encoding          --sql_consistency_check          --use_picklist     --anchor_text_match_threshold 0.85               --top_k_picklist_matches 2     --process_sql_in_execution_order               --num_random_tables_added 0                         --save_best_model_only     --schema_augmentation_factor 1          --data_augmentation_factor 1          --vocab_min_freq 0     --text_vocab_min_freq 0     --program_vocab_min_freq 0     --num_values_per_field 0     --max_in_seq_len 512     --max_out_seq_len 60     --model bridge     --num_steps 100000     --curriculum_interval 0     --num_peek_steps 1000     --num_accumulation_steps 2     --train_batch_size 16     --dev_batch_size 24     --encoder_input_dim 1024     --encoder_hidden_dim 400     --decoder_input_dim 400     --num_rnn_layers 1     --num_const_attn_layers 0     --emb_dropout_rate 0.3     --pretrained_lm_dropout_rate 0     --rnn_layer_dropout_rate 0     --rnn_weight_dropout_rate 0     --cross_attn_dropout_rate 0     --cross_attn_num_heads 8     --res_input_dropout_rate 0.2     --res_layer_dropout_rate 0     --ff_input_dropout_rate 0.4     --ff_hidden_dropout_rate 0.0     --pretrained_transformer bert-large-uncased          --bert_finetune_rate 0.00006     --learning_rate 0.0005     --learning_rate_scheduler inverse-square     --trans_learning_rate_scheduler inverse-square     --warmup_init_lr 0.0005     --warmup_init_ft_lr 0.00003     --num_warmup_steps 4000     --grad_norm 0.3     --decoding_algorithm beam-search     --beam_size 16     --bs_alpha 1.05     --gpu 0
./experiment-bridge.sh: line 205: CUDA_VISIBLE_DEVICES=0: command not found
(pyvenv37-oda-text2sql-tabular-semantic-parsing) [vchoang@oda2-vm-gpu3-4-ad3-03 TabularSemanticParsing]$ vim ./experiment-bridge.sh
(pyvenv37-oda-text2sql-tabular-semantic-parsing) [vchoang@oda2-vm-gpu3-4-ad3-03 TabularSemanticParsing]$ ./experiment-bridge.sh configs/bridge/spider-bridge-bert-large.sh --train 0
run python3 -m src.experiments     --train     --data_dir data/spider     --db_dir data/spider/database     --dataset_name spider     --question_split          --question_only               --denormalize_sql               --table_shuffling     --use_lstm_encoder     --use_meta_data_encoding          --sql_consistency_check          --use_picklist     --anchor_text_match_threshold 0.85               --top_k_picklist_matches 2     --process_sql_in_execution_order               --num_random_tables_added 0                         --save_best_model_only     --schema_augmentation_factor 1          --data_augmentation_factor 1          --vocab_min_freq 0     --text_vocab_min_freq 0     --program_vocab_min_freq 0     --num_values_per_field 0     --max_in_seq_len 512     --max_out_seq_len 60     --model bridge     --num_steps 100000     --curriculum_interval 0     --num_peek_steps 1000     --num_accumulation_steps 2     --train_batch_size 16     --dev_batch_size 24     --encoder_input_dim 1024     --encoder_hidden_dim 400     --decoder_input_dim 400     --num_rnn_layers 1     --num_const_attn_layers 0     --emb_dropout_rate 0.3     --pretrained_lm_dropout_rate 0     --rnn_layer_dropout_rate 0     --rnn_weight_dropout_rate 0     --cross_attn_dropout_rate 0     --cross_attn_num_heads 8     --res_input_dropout_rate 0.2     --res_layer_dropout_rate 0     --ff_input_dropout_rate 0.4     --ff_hidden_dropout_rate 0.0     --pretrained_transformer bert-large-uncased          --bert_finetune_rate 0.00006     --learning_rate 0.0005     --learning_rate_scheduler inverse-square     --trans_learning_rate_scheduler inverse-square     --warmup_init_lr 0.0005     --warmup_init_ft_lr 0.00003     --num_warmup_steps 4000     --grad_norm 0.3     --decoding_algorithm beam-search     --beam_size 16     --bs_alpha 1.05     --gpu 0
Model directory created: /mnt/shared_ad3_mt1/vchoang/works/projects/oda/text2sql/code/TabularSemanticParsing/model/spider.bridge.lstm.meta.ts.ppl-0.85.2.dn.eo.feat.bert-large-uncased.xavier-1024-400-400-16-2-0.0005-inv-sqr-0.0005-4000-6e-05-inv-sqr-3e-05-4000-0.3-0.3-0.0-0.0-1-8-0.0-0.0-res-0.2-0.0-ff-0.4-0.0.210111-233401.kvzx
Visualization directory created: /mnt/shared_ad3_mt1/vchoang/works/projects/oda/text2sql/code/TabularSemanticParsing/viz/spider.bridge.lstm.meta.ts.ppl-0.85.2.dn.eo.feat.bert-large-uncased.xavier-1024-400-400-16-2-0.0005-inv-sqr-0.0005-4000-6e-05-inv-sqr-3e-05-4000-0.3-0.3-0.0-0.0-1-8-0.0-0.0-res-0.2-0.0-ff-0.4-0.0.210111-233401.kvzx
* text vocab size = 30522
* program vocab size = 99

pretrained_transformer = bert-large-uncased
fix_pretrained_transformer_parameters = False

bridge module created
execution order restoration cache copied
source: data/spider/dev.eo.pred.restored.pkl
dest: /mnt/shared_ad3_mt1/vchoang/works/projects/oda/text2sql/code/TabularSemanticParsing/model/spider.bridge.lstm.meta.ts.ppl-0.85.2.dn.eo.feat.bert-large-uncased.xavier-1024-400-400-16-2-0.0005-inv-sqr-0.0005-4000-6e-05-inv-sqr-3e-05-4000-0.3-0.3-0.0-0.0-1-8-0.0-0.0-res-0.2-0.0-ff-0.4-0.0.210111-233401.kvzx/dev.eo.pred.restored.pkl

loading preprocessed data: data/spider/spider.bridge.question-split.ppl-0.85.2.dn.eo.bert.pkl
8659 training examples loaded
1034 dev examples loaded
Model initialization (xavier)
--------------------------
encoder_embeddings.trans_parameters.embeddings.word_embeddings.weight (skipped)
encoder_embeddings.trans_parameters.embeddings.position_embeddings.weight (skipped)
encoder_embeddings.trans_parameters.embeddings.token_type_embeddings.weight (skipped)
encoder_embeddings.trans_parameters.embeddings.LayerNorm.weight (skipped)
encoder_embeddings.trans_parameters.embeddings.LayerNorm.bias (skipped)
.......
(the log continues with every encoder_embeddings.trans_parameters.encoder.layer.* weight and bias marked "(skipped)"; the report breaks off partway through layer 12)
encoder_embeddings.trans_parameters.encoder.layer.12.attention.output.dense.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.12.attention.output.dense.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.12.attention.output.LayerNorm.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.12.attention.output.LayerNorm.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.12.intermediate.dense.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.12.intermediate.dense.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.12.output.dense.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.12.output.dense.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.12.output.LayerNorm.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.12.output.LayerNorm.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.13.attention.self.query.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.13.attention.self.query.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.13.attention.self.key.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.13.attention.self.key.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.13.attention.self.value.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.13.attention.self.value.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.13.attention.output.dense.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.13.attention.output.dense.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.13.attention.output.LayerNorm.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.13.attention.output.LayerNorm.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.13.intermediate.dense.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.13.intermediate.dense.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.13.output.dense.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.13.output.dense.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.13.output.LayerNorm.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.13.output.LayerNorm.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.14.attention.self.query.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.14.attention.self.query.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.14.attention.self.key.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.14.attention.self.key.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.14.attention.self.value.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.14.attention.self.value.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.14.attention.output.dense.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.14.attention.output.dense.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.14.attention.output.LayerNorm.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.14.attention.output.LayerNorm.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.14.intermediate.dense.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.14.intermediate.dense.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.14.output.dense.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.14.output.dense.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.14.output.LayerNorm.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.14.output.LayerNorm.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.15.attention.self.query.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.15.attention.self.query.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.15.attention.self.key.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.15.attention.self.key.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.15.attention.self.value.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.15.attention.self.value.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.15.attention.output.dense.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.15.attention.output.dense.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.15.attention.output.LayerNorm.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.15.attention.output.LayerNorm.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.15.intermediate.dense.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.15.intermediate.dense.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.15.output.dense.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.15.output.dense.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.15.output.LayerNorm.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.15.output.LayerNorm.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.16.attention.self.query.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.16.attention.self.query.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.16.attention.self.key.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.16.attention.self.key.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.16.attention.self.value.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.16.attention.self.value.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.16.attention.output.dense.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.16.attention.output.dense.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.16.attention.output.LayerNorm.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.16.attention.output.LayerNorm.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.16.intermediate.dense.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.16.intermediate.dense.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.16.output.dense.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.16.output.dense.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.16.output.LayerNorm.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.16.output.LayerNorm.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.17.attention.self.query.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.17.attention.self.query.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.17.attention.self.key.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.17.attention.self.key.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.17.attention.self.value.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.17.attention.self.value.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.17.attention.output.dense.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.17.attention.output.dense.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.17.attention.output.LayerNorm.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.17.attention.output.LayerNorm.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.17.intermediate.dense.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.17.intermediate.dense.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.17.output.dense.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.17.output.dense.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.17.output.LayerNorm.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.17.output.LayerNorm.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.18.attention.self.query.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.18.attention.self.query.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.18.attention.self.key.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.18.attention.self.key.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.18.attention.self.value.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.18.attention.self.value.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.18.attention.output.dense.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.18.attention.output.dense.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.18.attention.output.LayerNorm.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.18.attention.output.LayerNorm.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.18.intermediate.dense.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.18.intermediate.dense.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.18.output.dense.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.18.output.dense.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.18.output.LayerNorm.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.18.output.LayerNorm.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.19.attention.self.query.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.19.attention.self.query.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.19.attention.self.key.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.19.attention.self.key.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.19.attention.self.value.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.19.attention.self.value.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.19.attention.output.dense.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.19.attention.output.dense.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.19.attention.output.LayerNorm.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.19.attention.output.LayerNorm.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.19.intermediate.dense.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.19.intermediate.dense.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.19.output.dense.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.19.output.dense.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.19.output.LayerNorm.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.19.output.LayerNorm.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.20.attention.self.query.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.20.attention.self.query.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.20.attention.self.key.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.20.attention.self.key.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.20.attention.self.value.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.20.attention.self.value.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.20.attention.output.dense.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.20.attention.output.dense.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.20.attention.output.LayerNorm.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.20.attention.output.LayerNorm.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.20.intermediate.dense.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.20.intermediate.dense.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.20.output.dense.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.20.output.dense.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.20.output.LayerNorm.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.20.output.LayerNorm.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.21.attention.self.query.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.21.attention.self.query.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.21.attention.self.key.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.21.attention.self.key.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.21.attention.self.value.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.21.attention.self.value.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.21.attention.output.dense.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.21.attention.output.dense.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.21.attention.output.LayerNorm.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.21.attention.output.LayerNorm.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.21.intermediate.dense.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.21.intermediate.dense.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.21.output.dense.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.21.output.dense.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.21.output.LayerNorm.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.21.output.LayerNorm.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.22.attention.self.query.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.22.attention.self.query.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.22.attention.self.key.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.22.attention.self.key.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.22.attention.self.value.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.22.attention.self.value.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.22.attention.output.dense.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.22.attention.output.dense.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.22.attention.output.LayerNorm.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.22.attention.output.LayerNorm.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.22.intermediate.dense.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.22.intermediate.dense.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.22.output.dense.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.22.output.dense.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.22.output.LayerNorm.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.22.output.LayerNorm.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.23.attention.self.query.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.23.attention.self.query.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.23.attention.self.key.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.23.attention.self.key.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.23.attention.self.value.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.23.attention.self.value.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.23.attention.output.dense.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.23.attention.output.dense.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.23.attention.output.LayerNorm.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.23.attention.output.LayerNorm.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.23.intermediate.dense.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.23.intermediate.dense.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.23.output.dense.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.23.output.dense.bias (skipped)
encoder_embeddings.trans_parameters.encoder.layer.23.output.LayerNorm.weight (skipped)
encoder_embeddings.trans_parameters.encoder.layer.23.output.LayerNorm.bias (skipped)
encoder_embeddings.trans_parameters.pooler.dense.weight (skipped)
encoder_embeddings.trans_parameters.pooler.dense.bias (skipped)
decoder_embeddings.embeddings.weight done
encoder.bilstm_encoder.rnn.rnn.rnn.weight_ih_l0 done
encoder.bilstm_encoder.rnn.rnn.rnn.weight_hh_l0 done
encoder.bilstm_encoder.rnn.rnn.rnn.bias_ih_l0
encoder.bilstm_encoder.rnn.rnn.rnn.bias_hh_l0
encoder.bilstm_encoder.rnn.rnn.rnn.weight_ih_l0_reverse done
encoder.bilstm_encoder.rnn.rnn.rnn.weight_hh_l0_reverse done
encoder.bilstm_encoder.rnn.rnn.rnn.bias_ih_l0_reverse
encoder.bilstm_encoder.rnn.rnn.rnn.bias_hh_l0_reverse
encoder.text_encoder.rnn.rnn.rnn.weight_ih_l0 done
encoder.text_encoder.rnn.rnn.rnn.weight_hh_l0 done
encoder.text_encoder.rnn.rnn.rnn.bias_ih_l0
encoder.text_encoder.rnn.rnn.rnn.bias_hh_l0
encoder.text_encoder.rnn.rnn.rnn.weight_ih_l0_reverse done
encoder.text_encoder.rnn.rnn.rnn.weight_hh_l0_reverse done
encoder.text_encoder.rnn.rnn.rnn.bias_ih_l0_reverse
encoder.text_encoder.rnn.rnn.rnn.bias_hh_l0_reverse
encoder.schema_encoder.primary_key_embeddings.embeddings.weight done
encoder.schema_encoder.foreign_key_embeddings.embeddings.weight done
encoder.schema_encoder.field_type_embeddings.embeddings.weight done
encoder.schema_encoder.feature_fusion_layer.linear1.weight done
encoder.schema_encoder.feature_fusion_layer.linear1.bias
encoder.schema_encoder.feature_fusion_layer.linear2.weight done
encoder.schema_encoder.feature_fusion_layer.linear2.bias
encoder.schema_encoder.field_table_fusion_layer.res_feed_forward.mdl_with_residual_connection.mdl.linear1.weight done
encoder.schema_encoder.field_table_fusion_layer.res_feed_forward.mdl_with_residual_connection.mdl.linear1.bias
encoder.schema_encoder.field_table_fusion_layer.res_feed_forward.mdl_with_residual_connection.mdl.linear2.weight done
encoder.schema_encoder.field_table_fusion_layer.res_feed_forward.mdl_with_residual_connection.mdl.linear2.bias
decoder.out.linear.weight done
decoder.out.linear.bias
decoder.rnn.rnn.weight_ih_l0 done
decoder.rnn.rnn.weight_hh_l0 done
decoder.rnn.rnn.bias_ih_l0
decoder.rnn.rnn.bias_hh_l0
decoder.attn.wq.weight done
decoder.attn.wk.weight done
decoder.attn.wv.weight done
decoder.attn.wo.weight done
decoder.attn_combine.linear1.weight done
decoder.attn_combine.linear1.bias
decoder.pointer_switch.project.linear1.weight done
decoder.pointer_switch.project.linear1.bias
--------------------------

Model Parameters
--------------------------
mdl.decoder_embeddings.embeddings.weight 39600 requires_grad=True
mdl.encoder.bilstm_encoder.rnn.rnn.rnn.weight_ih_l0 819200 requires_grad=True
mdl.encoder.bilstm_encoder.rnn.rnn.rnn.weight_hh_l0 160000 requires_grad=True
mdl.encoder.bilstm_encoder.rnn.rnn.rnn.bias_ih_l0 800 requires_grad=True
mdl.encoder.bilstm_encoder.rnn.rnn.rnn.bias_hh_l0 800 requires_grad=True
mdl.encoder.bilstm_encoder.rnn.rnn.rnn.weight_ih_l0_reverse 819200 requires_grad=True
mdl.encoder.bilstm_encoder.rnn.rnn.rnn.weight_hh_l0_reverse 160000 requires_grad=True
mdl.encoder.bilstm_encoder.rnn.rnn.rnn.bias_ih_l0_reverse 800 requires_grad=True
mdl.encoder.bilstm_encoder.rnn.rnn.rnn.bias_hh_l0_reverse 800 requires_grad=True
mdl.encoder.text_encoder.rnn.rnn.rnn.weight_ih_l0 320000 requires_grad=True
mdl.encoder.text_encoder.rnn.rnn.rnn.weight_hh_l0 160000 requires_grad=True
mdl.encoder.text_encoder.rnn.rnn.rnn.bias_ih_l0 800 requires_grad=True
mdl.encoder.text_encoder.rnn.rnn.rnn.bias_hh_l0 800 requires_grad=True
mdl.encoder.text_encoder.rnn.rnn.rnn.weight_ih_l0_reverse 320000 requires_grad=True
mdl.encoder.text_encoder.rnn.rnn.rnn.weight_hh_l0_reverse 160000 requires_grad=True
mdl.encoder.text_encoder.rnn.rnn.rnn.bias_ih_l0_reverse 800 requires_grad=True
mdl.encoder.text_encoder.rnn.rnn.rnn.bias_hh_l0_reverse 800 requires_grad=True
mdl.encoder.schema_encoder.primary_key_embeddings.embeddings.weight 800 requires_grad=True
mdl.encoder.schema_encoder.foreign_key_embeddings.embeddings.weight 800 requires_grad=True
mdl.encoder.schema_encoder.field_type_embeddings.embeddings.weight 2400 requires_grad=True
mdl.encoder.schema_encoder.feature_fusion_layer.linear1.weight 640000 requires_grad=True
mdl.encoder.schema_encoder.feature_fusion_layer.linear1.bias 400 requires_grad=True
mdl.encoder.schema_encoder.feature_fusion_layer.linear2.weight 160000 requires_grad=True
mdl.encoder.schema_encoder.feature_fusion_layer.linear2.bias 400 requires_grad=True
mdl.encoder.schema_encoder.field_table_fusion_layer.layer_norm.gamma 400 requires_grad=True
mdl.encoder.schema_encoder.field_table_fusion_layer.layer_norm.beta 400 requires_grad=True
mdl.encoder.schema_encoder.field_table_fusion_layer.res_feed_forward.mdl_with_residual_connection.mdl.linear1.weight 160000 requires_grad=True
mdl.encoder.schema_encoder.field_table_fusion_layer.res_feed_forward.mdl_with_residual_connection.mdl.linear1.bias 400 requires_grad=True
mdl.encoder.schema_encoder.field_table_fusion_layer.res_feed_forward.mdl_with_residual_connection.mdl.linear2.weight 160000 requires_grad=True
mdl.encoder.schema_encoder.field_table_fusion_layer.res_feed_forward.mdl_with_residual_connection.mdl.linear2.bias 400 requires_grad=True
mdl.encoder.schema_encoder.field_table_fusion_layer.res_feed_forward.layer_norm.gamma 400 requires_grad=True
mdl.encoder.schema_encoder.field_table_fusion_layer.res_feed_forward.layer_norm.beta 400 requires_grad=True
mdl.decoder.out.linear.weight 39600 requires_grad=True
mdl.decoder.out.linear.bias 99 requires_grad=True
mdl.decoder.rnn.rnn.weight_ih_l0 1280000 requires_grad=True
mdl.decoder.rnn.rnn.weight_hh_l0 640000 requires_grad=True
mdl.decoder.rnn.rnn.bias_ih_l0 1600 requires_grad=True
mdl.decoder.rnn.rnn.bias_hh_l0 1600 requires_grad=True
mdl.decoder.attn.wq.weight 160000 requires_grad=True
mdl.decoder.attn.wk.weight 160000 requires_grad=True
mdl.decoder.attn.wv.weight 160000 requires_grad=True
mdl.decoder.attn.wo.weight 160000 requires_grad=True
mdl.decoder.attn_combine.linear1.weight 320000 requires_grad=True
mdl.decoder.attn_combine.linear1.bias 400 requires_grad=True
mdl.decoder.pointer_switch.project.linear1.weight 800 requires_grad=True
mdl.decoder.pointer_switch.project.linear1.bias 1 requires_grad=True
Total # parameters = 342157588
--------------------------

wandb: Tracking run with wandb version 0.8.30
wandb: Wandb version 0.10.13 is available!  To upgrade, please run:
wandb:  $ pip install wandb --upgrade
wandb: Run data is saved locally in wandb/run-20210111_233422-17hmk4jb
wandb: Syncing run spider.bridge.lstm.meta.ts.ppl-0.85.2.dn.eo.feat.bert-large-uncased.xavier-1024-400-400-16-2-0.0005-inv-sqr-0.0005-4000-6e-05-inv-sqr-3e-05-4000-0.3-0.3-0.0-0.0-1-8-0.0-0.0-res-0.2-0.0-ff-0.4-0.0.210111-233422.pomb
wandb: ⭐️ View project at https://app.wandb.ai/duyvuleo/smore-spider-group--final
wandb: 🚀 View run at https://app.wandb.ai/duyvuleo/smore-spider-group--final/runs/17hmk4jb
wandb: Run `wandb off` to turn off syncing.

  0%|                                                                                                                 | 1/2000 [00:01<45:32,  1.37s/it]
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/mnt/shared_ad3_mt1/vchoang/works/projects/oda/text2sql/code/TabularSemanticParsing/src/experiments.py", line 407, in <module>
    run_experiment(args)
  File "/mnt/shared_ad3_mt1/vchoang/works/projects/oda/text2sql/code/TabularSemanticParsing/src/experiments.py", line 392, in run_experiment
    train(sp)
  File "/mnt/shared_ad3_mt1/vchoang/works/projects/oda/text2sql/code/TabularSemanticParsing/src/experiments.py", line 63, in train
    sp.run_train(train_data, dev_data)
  File "/mnt/shared_ad3_mt1/vchoang/works/projects/oda/text2sql/code/TabularSemanticParsing/src/common/learn_framework.py", line 207, in run_train
    formatted_batch = self.format_batch(mini_batch)
  File "/mnt/shared_ad3_mt1/vchoang/works/projects/oda/text2sql/code/TabularSemanticParsing/src/semantic_parser/learn_framework.py", line 430, in format_batch
    num_included_nodes=num_included_nodes)
  File "/mnt/shared_ad3_mt1/vchoang/works/projects/oda/text2sql/code/TabularSemanticParsing/src/data_processor/vectorizers.py", line 102, in vectorize_field_ptr_out
    if schema_pos < num_included_nodes:
TypeError: '<' not supported between instances of 'NoneType' and 'int'

wandb: Waiting for W&B process to finish, PID 21092
wandb: Program failed with code 1. Press ctrl-c to abort syncing.
wandb: Run summary:
wandb:      learning_rate/spider 0.0005
wandb:                  _runtime 44.21112084388733
wandb:                _timestamp 1610408083.642099
wandb:                     _step 1
wandb:   fine_tuning_rate/spider 3.00075e-05
wandb: Syncing files in wandb/run-20210111_233422-17hmk4jb:
wandb:   code/src/experiments.py
wandb: plus 7 W&B file(s) and 1 media file(s)
wandb:
wandb: Synced spider.bridge.lstm.meta.ts.ppl-0.85.2.dn.eo.feat.bert-large-uncased.xavier-1024-400-400-16-2-0.0005-inv-sqr-0.0005-4000-6e-05-inv-sqr-3e-05-4000-0.3-0.3-0.0-0.0-1-8-0.0-0.0-res-0.2-0.0-ff-0.4-0.0.210111-233422.pomb: https://app.wandb.ai/duyvuleo/smore-spider-group--final/runs/17hmk4jb

Do you know the reason for this error? Thanks!
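For what it's worth, the traceback shows that schema_pos arrives as None, and in Python 3 None cannot be ordered against an int. Below is a minimal illustration of the failure and of the kind of guard that would avoid it; treating a None schema_pos as "exclude this field" is a guess at the intended semantics, not the authors' fix:

# Not the repo code: in Python 3, `None < 5` raises TypeError, so any
# schema_pos that can be None must be checked before the comparison.
def include_field(schema_pos, num_included_nodes):
    # Assumption: None means the field could not be located in the
    # schema graph and should simply be excluded.
    return schema_pos is not None and schema_pos < num_included_nodes

print(include_field(3, 10))     # True
print(include_field(None, 10))  # False instead of TypeError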

How to run inference on CPU?

In the command-line arguments, --gpu specifies the device ID of the GPU. How should this be set for a CPU-only instance?
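The repository does not document a CPU mode here, but as a hedged sketch of the generic PyTorch pattern, CPU inference usually comes down to remapping the checkpoint at load time (the path and the state-dict key below are placeholders, not the repo's actual names):

import torch

# Generic PyTorch pattern, not a documented option of this codebase:
# remap all CUDA tensors in the checkpoint to the CPU while loading.
checkpoint = torch.load("path/to/checkpoint.tar", map_location=torch.device("cpu"))

# A model restored from this checkpoint can then stay on the CPU, e.g.
# model.load_state_dict(checkpoint["model_state_dict"]); model.to("cpu")

Any .cuda() calls hard-coded in the inference path would also need to be made conditional.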

strange error when inferring on novel examples

I tried to run the pre-trained model interactively, with db_id=baseball_1.

When I run this question (from the training set):
Find the full name and id of the college that has the most baseball players.
it runs normally.

But when I change the question to
return college that has the most baseball players

something strange happens: beam search triggers a CUDA assertion error, because db_scope_update_idx.max() = 205 while m_field_masks.size(1) = 197, so the following scatter_add_ call cannot run.

m_field_masks.scatter_add_(index=db_scope_update_idx, src=db_scope_update_mask.long(), dim=1)

Any idea why this happens? Happy to provide more details for replicating this weird bug...
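The numbers in the report already pin down the assert: scatter_add_ along dim=1 requires every index to be strictly smaller than the destination's size in that dimension, and 205 >= 197. A minimal repro of the failure mode in plain PyTorch (stand-in tensors, not the repo's):

import torch

m_field_masks = torch.zeros(1, 197, dtype=torch.long)  # size(1) = 197
idx = torch.tensor([[205]])                            # max index 205 >= 197
src = torch.ones(1, 1, dtype=torch.long)

# On CPU this raises an "index out of bounds" error eagerly; on CUDA the
# same condition surfaces as the device-side assert described above.
m_field_masks.scatter_add_(dim=1, index=idx, src=src)

If that reading is right, the novel question enlarges the decoding scope past the precomputed mask size somewhere upstream of the scatter_add_ call, rather than in scatter_add_ itself.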

Plan to share the trained model weights?

Thank you for open-sourcing the related code.

I noticed that BRIDGE uses BERT-large to encode both the question and the tables. As mentioned in the paper, "The training time of our model on an NVIDIA A100 GPU is approximately 51.5h (including intermediate results verification time)." That is a lot of time and hardware cost.

Are you considering sharing the trained model weights? Looking forward to your reply.

Test on Custom Data

Hi Team,

Thanks for open-sourcing the project.
Could you guide me on how to use this framework on my custom data with the WikiSQL model?

Thank you.

RuntimeError (Sizes of tensors must match) when training on 'WikiSQL'

Hi,

I followed the steps to train on Spider and WikiSQL on a Tesla M40 (24 GB memory) with train_batch_size=4 (no other changes are made to the model configuration):

# wikisql-bridge-bert-large.sh
num_steps=30000
curriculum_interval=0
num_peek_steps=400
num_accumulation_steps=3
save_best_model_only="True"
train_batch_size=4  # from 16 to 4

It works well on the Spider dataset, but on WikiSQL I get the following error:

--------------------------

wandb: Tracking run with wandb version 0.8.30
wandb: Wandb version 0.10.21 is available!  To upgrade, please run:
wandb:  $ pip install wandb --upgrade
wandb: Run data is saved locally in wandb/run-20210303_163242-23o2pmxp
wandb: Syncing run wikisql.bridge.lstm.meta.ts.ppl-0.85.2.dn.no_from.feat.bert-large-uncased.xavier-1024-512-512-4-3-0.0003-inv-sqr-0.0003-3000-5e-05-inv-sqr-0.0-3000-0.3-0.3-0.0-0.0-1-8-0.1-0.0-res-0.2-0.0-ff-0.4-0.0.210304-003242.scz2
wandb: ⭐️ View project at https://app.wandb.ai/zjy/smore-wikisql-group--final
wandb: 🚀 View run at https://app.wandb.ai/zjy/smore-wikisql-group--final/runs/23o2pmxp
wandb: Run `wandb off` to turn off syncing.

  2%|█▉                                                              | 19/1200 [00:08<08:29,  2.32it/s]
Traceback (most recent call last):
  File "/data/users/zjy/anaconda3/envs/bridge/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/data/users/zjy/anaconda3/envs/bridge/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/data/users/zjy/TabularSemanticParsing/src/experiments.py", line 407, in <module>
    run_experiment(args)
  File "/data/users/zjy/TabularSemanticParsing/src/experiments.py", line 392, in run_experiment
    train(sp)
  File "/data/users/zjy/TabularSemanticParsing/src/experiments.py", line 63, in train
    sp.run_train(train_data, dev_data)
  File "/data/users/zjy/TabularSemanticParsing/src/common/learn_framework.py", line 208, in run_train
    loss = self.loss(formatted_batch)
  File "/data/users/zjy/TabularSemanticParsing/src/semantic_parser/learn_framework.py", line 94, in loss
    outputs = self.forward(formatted_batch)
  File "/data/users/zjy/TabularSemanticParsing/src/semantic_parser/learn_framework.py", line 129, in forward
    decoder_ptr_value_ids=decoder_ptr_value_ids)
  File "/data/users/zjy/anaconda3/envs/bridge/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data/users/zjy/TabularSemanticParsing/src/semantic_parser/bridge.py", line 59, in forward
    transformer_output_value_masks)
  File "/data/users/zjy/anaconda3/envs/bridge/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data/users/zjy/TabularSemanticParsing/src/semantic_parser/bridge.py", line 263, in forward
    schema_hiddens = self.schema_encoder(schema_hiddens, feature_ids)
  File "/data/users/zjy/anaconda3/envs/bridge/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data/users/zjy/TabularSemanticParsing/src/semantic_parser/bridge.py", line 169, in forward
    field_type_embeddings], dim=2))
RuntimeError: Sizes of tensors must match except in dimension 1. Got 9 and 11 (The offending index is 0)

wandb: Waiting for W&B process to finish, PID 957961
wandb: Program failed with code 1. Press ctrl-c to abort syncing.
wandb: Run summary:
wandb:                   _runtime 83.59444427490234
wandb:      learning_rate/wikisql 0.0003
wandb:                      _step 1
wandb:                 _timestamp 1614789212.7551596
wandb:   fine_tuning_rate/wikisql 1.6666666666666667e-08
wandb: Syncing files in wandb/run-20210303_163242-23o2pmxp:
wandb:   code/src/experiments.py
wandb: plus 8 W&B file(s) and 1 media file(s)
wandb:                                                                                
wandb: Synced wikisql.bridge.lstm.meta.ts.ppl-0.85.2.dn.no_from.feat.bert-large-uncased.xavier-1024-512-512-4-3-0.0003-inv-sqr-0.0003-3000-5e-05-inv-sqr-0.0-3000-0.3-0.3-0.0-0.0-1-8-0.1-0.0-res-0.2-0.0-ff-0.4-0.0.210304-003242.scz2: https://app.wandb.ai/zjy/smore-wikisql-group--final/runs/23o2pmxp

I also tried a train_batch_size of 2, to no avail; the same error occurred when switching to a GeForce GTX Titan Xp (12 GB) or a Tesla K80 (11 GB).
Any suggestion on the cause, or what I can try to get rid of it? Thank you!
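For reference, the failing call is the torch.cat inside the schema encoder's forward, and the message is the standard cat shape rule: all non-concatenated dimensions must agree. A minimal repro producing an error of the same shape in plain PyTorch (stand-in tensors; only the sizes 9 and 11 are taken from the log):

import torch

a = torch.zeros(9, 400)   # e.g. 9 schema-node encodings
b = torch.zeros(11, 400)  # e.g. 11 feature-embedding rows

# RuntimeError: Sizes of tensors must match except in dimension 1. Got 9 and 11
torch.cat([a, b], dim=1)

So some WikiSQL batch produces a different number of schema encodings than feature rows; whether the smaller batch size triggers that is an open question.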

NaN loss when training

Hi,
When the loss became NaN, execution stopped at pdb.set_trace():

(Pdb) l
697                 return 0
698             loss = F.nll_loss(masked_inputs, masked_targets)
699             if torch.isnan(loss):
700                 import pdb
701                 pdb.set_trace()
702  ->         if loss > 1e8:
703                 import pdb
704                 pdb.set_trace()
705             return loss

And the model output is NaN.

(Pdb) p masked_inputs
tensor([[ nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan,

Why is the loss NaN, and how should I deal with it?
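Without the root cause, the generic PyTorch tools for localizing a NaN loss are anomaly detection and a finiteness check on the loss inputs. A hedged sketch using only standard PyTorch APIs (nothing repo-specific):

import torch

# Re-run training with anomaly detection enabled: the backward pass will
# then name the forward op that first produced NaN/Inf, at some speed cost.
torch.autograd.set_detect_anomaly(True)

# A cheap check that could be dropped in just before F.nll_loss:
def assert_finite(t, name):
    if not torch.isfinite(t).all():
        raise RuntimeError(name + " contains NaN/Inf")

Common culprits in this setting are a too-aggressive learning rate or a log of zero inside the loss, but that is conjecture without the anomaly trace.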

Is repaired Spider dataset available?

In your paper (https://arxiv.org/pdf/2012.12627.pdf), on page 7 in the Data repair section, you write:

"... We manually corrected some errors in the train and dev examples. For comparison with other models in §5.1, we report metrics using the official dev/test sets. For our own ablation study and analysis, we report metrics using the corrected dev files ..."

I couldn't find any link to this corrected dataset. Is it available anywhere?

Run out of memory when training on Colab using GPU

The model runs out of memory on Google Colab when training on both Spider and WikiSQL:
1. while training with the default configuration, and
2. after reducing train_batch_size to 1.

It also runs out of memory on my machine, which has an NVIDIA GeForce GTX 1060 Max-Q with 6 GB of memory.

How can I run it on my machine or on Google Colab?

And how can I run it on another large database, such as the AdventureWorks database?

Not all modules are set in requirements.txt

After trying to preprocess the Spider dataset, I get:

Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/crafterkolyan/workspace/TabularSemanticParsing/src/experiments.py", line 15, in <module>
    import src.data_processor.data_loader as data_loader
  File "/home/crafterkolyan/workspace/TabularSemanticParsing/src/data_processor/data_loader.py", line 15, in <module>
    from src.data_processor.processor_utils import WIKISQL, SPIDER, OTHERS
  File "/home/crafterkolyan/workspace/TabularSemanticParsing/src/data_processor/processor_utils.py", line 13, in <module>
    from moz_sp import denormalize, parse
  File "/home/crafterkolyan/workspace/TabularSemanticParsing/moz_sp/__init__.py", line 25, in <module>
    from mo_future import text, number_types, binary_type, items, string_types
ModuleNotFoundError: No module named 'mo_future'
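For the record, the missing module is published on PyPI as mo-future (a dependency of the bundled Moz SQL Parser code that moz_sp is adapted from), so python3 -m pip install mo-future should resolve this particular import; the same pattern likely applies to any further missing mo_* packages.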

Training error with NaNs

Hello! Thanks for sharing the code.
I'm using a Tesla P100-PCIE-12GB GPU, which cannot hold the model with BERT-large (it raises a CUDA out-of-memory error), so I have to switch to BERT-base. I also need to decrease the batch size from 16 to 8 and increase the gradient-accumulation steps from 2 to 4. No other changes are made to the model configuration.
When I train the model, I get the following error:

Traceback (most recent call last):
  File "/vault/.pyenv/versions/3.7.5/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/vault/.pyenv/versions/3.7.5/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/users/yshao/TabularSemanticParsing/src/experiments.py", line 407, in <module>
    run_experiment(args)
  File "/users/yshao/TabularSemanticParsing/src/experiments.py", line 392, in run_experiment
    train(sp)
  File "/users/yshao/TabularSemanticParsing/src/experiments.py", line 63, in train
    sp.run_train(train_data, dev_data)
  File "/users/yshao/TabularSemanticParsing/src/common/learn_framework.py", line 209, in run_train
    loss.backward()
  File "/vault/.pyenv/versions/py3.7.5-bridge/lib/python3.7/site-packages/torch/tensor.py", line 221, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/vault/.pyenv/versions/py3.7.5-bridge/lib/python3.7/site-packages/torch/autograd/__init__.py", line 132, in backward
    allow_unreachable=True)  # allow_unreachable flag
  File "/vault/.pyenv/versions/py3.7.5-bridge/lib/python3.7/site-packages/wandb/wandb_torch.py", line 256, in <lambda>
    handle = var.register_hook(lambda grad: _callback(grad, log_track))
  File "/vault/.pyenv/versions/py3.7.5-bridge/lib/python3.7/site-packages/wandb/wandb_torch.py", line 254, in _callback
    self.log_tensor_stats(grad.data, name)
  File "/vault/.pyenv/versions/py3.7.5-bridge/lib/python3.7/site-packages/wandb/wandb_torch.py", line 207, in log_tensor_stats
    tensor = flat.histc(bins=self._num_bins, min=tmin, max=tmax)
RuntimeError: range of [nan, nan] is not finite

I also tried decreasing grad_norm from 0.3 to 0.1, but the same error occurred.
Any suggestion on the cause, or what I can try to get rid of it? Thank you!
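One observation: the stack trace dies inside wandb's gradient-histogram hook (log_tensor_stats), i.e. the NaN arises during the backward pass and wandb merely trips over it when building the histogram. A hedged sketch for taking wandb out of the loop and finding the real culprit (standard wandb/PyTorch APIs; whether this repo registers the hook via wandb.watch, and whether the pinned wandb 0.8.30 honors log=None, are assumptions):

import torch

# The hook in the traceback is the kind registered by wandb.watch(model);
# calling it as wandb.watch(model, log=None) keeps run tracking but skips
# the gradient histograms, so training fails (or succeeds) on the real error.

def first_nan_grad(model):
    # Generic diagnostic, not part of this repo: after loss.backward(),
    # report which parameter's gradient first contains NaN/Inf.
    for name, p in model.named_parameters():
        if p.grad is not None and not torch.isfinite(p.grad).all():
            return name
    return None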

TypeError: dropout(): argument 'input' (position 1) must be Tensor, not str

Hi Lin,

I keep getting the following error when trying to train on either Spider or WikiSQL. Any comment from you would be greatly appreciated.
P.S. I ran the code following the instructions in the README and fixed some bugs you've helped with in previous issue reports. Thanks a lot.

Best,
Zea
------------------------------ traceback ------------------------------

File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "TabularSemanticParsing/src/experiments.py", line 408, in
run_experiment(args)
File "TabularSemanticParsing/src/experiments.py", line 393, in run_experiment
train(sp)
File "TabularSemanticParsing/src/experiments.py", line 64, in train
sp.run_train(train_data, dev_data)
File "TabularSemanticParsing/src/common/learn_framework.py", line 208, in run_train
loss = self.loss(formatted_batch)
File "TabularSemanticParsing/src/semantic_parser/learn_framework.py", line 94, in loss
outputs = self.forward(formatted_batch)
File "TabularSemanticParsing/src/semantic_parser/learn_framework.py", line 124, in forward
outputs = self.mdl(encoder_ptr_input_ids, encoder_ptr_value_ids,
File "spenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "TabularSemanticParsing/src/semantic_parser/bridge.py", line 49, in forward
inputs_embedded, _ = self.encoder_embeddings(
File "spenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "TabularSemanticParsing/src/common/nn_modules.py", line 69, in forward
return self.dropout(last_hidden_states), pooler_output
File "spenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "spenv/lib/python3.10/site-packages/torch/nn/modules/dropout.py", line 59, in forward
return F.dropout(input, self.p, self.training, self.inplace)
File "spenv/lib/python3.10/site-packages/torch/nn/functional.py", line 1252, in dropout
return VF.dropout(input, p, training) if inplace else _VF.dropout(input, p, training)
TypeError: dropout(): argument 'input' (position 1) must be Tensor, not str
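This TypeError is the classic symptom of running code written against an older transformers release (where BERT returns a tuple) under a newer one that returns a ModelOutput: tuple-unpacking the output yields field-name strings, and one of them reaches nn.Dropout. A hedged sketch of the usual adaptation (standard Hugging Face API; the call site in nn_modules.py is paraphrased, not quoted):

def encode(trans_model, input_ids, attn_mask):
    # Assumption about the bug: transformers >= 4 returns a ModelOutput by
    # default, so `hidden, pooler = model(...)` unpacks to strings.
    # return_dict=False restores the old tuple behavior; alternatively,
    # address the fields by name (out.last_hidden_state, out.pooler_output).
    outputs = trans_model(input_ids, attention_mask=attn_mask, return_dict=False)
    last_hidden_states, pooler_output = outputs[:2]
    return last_hidden_states, pooler_output

Pinning transformers to the version the repo was developed against (if requirements.txt specifies one) would sidestep the change entirely.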

RuntimeError: CUDA error: device-side assert triggered

I am new to PyTorch. I have a problem with my model: when I run my code, I get this error.

/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [4,0,0], thread: [53,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
[... the same assertion repeated for the remaining threads of block [4,0,0] ...]

Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/content/gdrive/MyDrive/textTosql/spider_model/TabularSemanticParsing/src/experiments.py", line 414, in <module>
    run_experiment(args)
  File "/content/gdrive/MyDrive/textTosql/spider_model/TabularSemanticParsing/src/experiments.py", line 405, in run_experiment
    demo(args)
  File "/content/gdrive/MyDrive/textTosql/spider_model/TabularSemanticParsing/src/experiments.py", line 362, in demo
    output = t2sql.process(text, schema.name)
  File "/content/gdrive/MyDrive/textTosql/spider_model/TabularSemanticParsing/src/demos/demos.py", line 153, in process
    sql_query = self.translate(example)
  File "/content/gdrive/MyDrive/textTosql/spider_model/TabularSemanticParsing/src/demos/demos.py", line 125, in translate
    model_ensemble=self.model_ensemble, verbose=False)
  File "/content/gdrive/MyDrive/textTosql/spider_model/TabularSemanticParsing/src/semantic_parser/learn_framework.py", line 157, in inference
    outputs = self.forward(formatted_batch, model_ensemble)
  File "/content/gdrive/MyDrive/textTosql/spider_model/TabularSemanticParsing/src/semantic_parser/learn_framework.py", line 129, in forward
    decoder_ptr_value_ids=decoder_ptr_value_ids)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/gdrive/MyDrive/textTosql/spider_model/TabularSemanticParsing/src/semantic_parser/bridge.py", line 100, in forward
    no_from=(self.dataset_name == 'wikisql'))
  File "/content/gdrive/MyDrive/textTosql/spider_model/TabularSemanticParsing/src/semantic_parser/decoding_algorithms.py", line 218, in beam_search
    m_field_masks.scatter_add_(index=db_scope_update_idx, src=db_scope_update_mask.long(), dim=1)
RuntimeError: CUDA error: device-side assert triggered
I don't have very advanced knowledge of ML or PyTorch; some blog posts mention an inconsistency between the number of labels/classes and the number of output units as a cause of this kind of assert, but I am not sure where I could change that. Here is the configuration used:

#!/usr/bin/env bash

data_dir="/notebooks/TabularSemanticParsing/data/spider"
db_dir="/notebooks/TabularSemanticParsing/data/spider/database"
dataset_name="spider"
model="bridge"
question_split="True"
query_split="False"
question_only="True"
normalize_variables="False"
denormalize_sql="True"
omit_from_clause="False"
no_join_condition="False"
table_shuffling="True"
use_lstm_encoder="True"
use_meta_data_encoding="True"
use_graph_encoding="False"
use_typed_field_markers="False"
use_picklist="True"
anchor_text_match_threshold=0.85
no_anchor_text="False"
top_k_picklist_matches=2
sql_consistency_check="True"
atomic_value_copy="False"
process_sql_in_execution_order="True"
share_vocab="False"
sample_ground_truth="False"
save_nn_weights_for_visualizations="False"
vocab_min_freq=0
text_vocab_min_freq=0
program_vocab_min_freq=0
max_in_seq_len=512
max_out_seq_len=60

num_steps=100000
curriculum_interval=0
num_peek_steps=1000
num_accumulation_steps=2
save_best_model_only="True"
train_batch_size=8
dev_batch_size=8
encoder_input_dim=1024
encoder_hidden_dim=400
decoder_input_dim=400
num_rnn_layers=1
num_const_attn_layers=0

use_oracle_tables="False"
num_random_tables_added=0
use_additive_features="False"

schema_augmentation_factor=1
random_field_order="False"
data_augmentation_factor=1
augment_with_wikisql="False"
num_values_per_field=0
pretrained_transformer="bert-large-uncased"
fix_pretrained_transformer_parameters="False"
bert_finetune_rate=0.00006
learning_rate=0.0005
learning_rate_scheduler="inverse-square"
trans_learning_rate_scheduler="inverse-square"
warmup_init_lr=0.0005
warmup_init_ft_lr=0.00003
num_warmup_steps=4000
emb_dropout_rate=0.3
pretrained_lm_dropout_rate=0
rnn_layer_dropout_rate=0
rnn_weight_dropout_rate=0
cross_attn_dropout_rate=0
cross_attn_num_heads=8
res_input_dropout_rate=0.2
res_layer_dropout_rate=0
ff_input_dropout_rate=0.4
ff_hidden_dropout_rate=0.0

grad_norm=0.3
decoding_algorithm="beam-search"
beam_size=16
bs_alpha=1.05

data_parallel="False"
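
Because CUDA kernels launch asynchronously, the Python stack above points at scatter_add_ even though the bad index was produced earlier. A hedged, generic debugging sketch (standard PyTorch/CUDA facilities, not a repo option) for getting an accurate stack and validating the indices before the kernel runs:

import os

# Must be set before CUDA is initialized (i.e. before torch is imported in
# the training process) so kernel launches become synchronous and the
# traceback points at the truly failing op.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch

def check_scatter_indices(index, dest, dim=1):
    # Generic sanity check mirroring the db_scope_update_idx /
    # m_field_masks mismatch: scatter_add_ needs 0 <= index < dest.size(dim).
    lo, hi = int(index.min()), int(index.max())
    assert lo >= 0 and hi < dest.size(dim), (
        "index range [%d, %d] vs mask size %d" % (lo, hi, dest.size(dim)))

Running the same example on CPU (see the CPU-inference question above) would also replace the opaque device-side assert with an eager, readable indexing error.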

Trouble reproducing Bridge-Large 70.0% EM on Spider dev

First of all, thank you for sharing this terrific work. I found it really straightforward to plug in and start training.

However, when training BRIDGE-L (with BERT-large) on Spider, I'm unable to reach 70.0% EM on the dev set; my results keep peaking at around 66.7%.
I'm training on a GeForce RTX 3090, and the default setting with batch size 16 was too much for its 24 GB of memory. So I tried a few runs with batch sizes 8, 4, and 2 and accumulation steps 4, 8, and 16 respectively, which keeps the effective batch size at 32, matching the default of 16 × 2 (other hyperparameters are the defaults). All these runs capped at under 67% after more than 100K steps.

I was wondering whether this is simply the result of the different hardware and the smaller per-step batch sizes, or whether I am missing something.

Thanks!
