
picard's Introduction

ServiceNow completed its acquisition of Element AI on January 8, 2021. All references to Element AI in the materials that are part of this project should refer to ServiceNow.


make it parse


This is the official implementation of the following paper:

Torsten Scholak, Nathan Schucher, Dzmitry Bahdanau. PICARD - Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP).

If you use this code, please cite:

@inproceedings{Scholak2021:PICARD,
  author = {Torsten Scholak and Nathan Schucher and Dzmitry Bahdanau},
  title = "{PICARD}: Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models",
  booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
  month = nov,
  year = "2021",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2021.emnlp-main.779",
  pages = "9895--9901",
}

Watch The Video


Overview

This code implements:

  • The PICARD algorithm for constrained decoding from language models.
  • A text-to-SQL semantic parser based on pre-trained sequence-to-sequence models and PICARD achieving state-of-the-art performance on both the Spider and the CoSQL datasets.

About PICARD

TL;DR: We introduce PICARD -- a new method for simple and effective constrained decoding from large pre-trained language models. On the challenging Spider and CoSQL text-to-SQL datasets, PICARD significantly improves the performance of fine-tuned but otherwise unmodified T5 models. Using PICARD, our T5-3B models achieved state-of-the-art performance on both Spider and CoSQL.

In text-to-SQL translation, the goal is to translate a natural language question into a SQL query. There are two main challenges to this task:

  1. The generated SQL needs to be semantically correct, that is, correctly reflect the meaning of the question.
  2. The SQL also needs to be valid, that is, it must not result in an execution error.

So far, there has been a trade-off between these two goals: The second problem can be solved by using a special decoder architecture that -- by construction -- always produces valid SQL. This is the approach taken by most prior work. Those decoders are called "constrained decoders", and they need to be trained from scratch on the text-to-SQL dataset. However, this limits the generality of the decoders, which is a problem for the first goal.

A better approach would be to use a pre-trained encoder-decoder model and to constrain its decoder to produce valid SQL after fine-tuning the model on the text-to-SQL task. This is the approach taken by the PICARD algorithm.

How is PICARD different from existing constrained decoders?

  • It’s an incremental parsing algorithm that integrates with ordinary beam search.
  • It doesn’t require any training.
  • It doesn’t require modifying the model.
  • It works with any model that generates a sequence of tokens (including language models).
  • It doesn’t require a special vocabulary.
  • It works with character-, sub-word-, and word-level language models.

How does PICARD work?

The following picture shows how PICARD is integrated with beam search.



Decoding starts from the left and proceeds to the right. The algorithm begins with a single token (usually <s>), and then keeps expanding the beam with hypotheses generated token-by-token by the decoder. At each decoding step and for each hypothesis, PICARD checks whether the next top-k tokens are valid. In the image above, only 3 token predictions are shown, and k is set to 2. Valid tokens (☑) are added to the beam. Invalid ones (☒) are discarded. The k+1-th, k+2-th, ... tokens are discarded, too. Like in normal beam search, the beam is pruned to contain only the top-n hypotheses. n is the beam size, and in the image above it is set to 2 as well. Hypotheses that are terminated with the end-of-sentence token (usually </s>) are not expanded further. The algorithm stops when all hypotheses are terminated or when the maximum number of tokens has been reached.
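
The following sketch shows where this validity check sits relative to ordinary beam search. It is a simplified illustration in Python, not the actual implementation; the callbacks score_next_tokens (the model) and is_valid_prefix (the parser, i.e. PICARD's role) are hypothetical placeholders.

from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Hypothesis:
    tokens: List[int]
    score: float
    finished: bool = False

def constrained_beam_search(
    score_next_tokens: Callable[[List[int]], List[Tuple[int, float]]],  # model: prefix -> (token, log-prob) pairs, best first
    is_valid_prefix: Callable[[List[int]], bool],                       # parser check: is this token sequence a valid prefix?
    bos: int, eos: int, beam_size: int, top_k: int, max_len: int,
) -> List[Hypothesis]:
    beam = [Hypothesis(tokens=[bos], score=0.0)]
    for _ in range(max_len):
        candidates = [h for h in beam if h.finished]   # terminated hypotheses are not expanded further
        for hyp in beam:
            if hyp.finished:
                continue
            # Only the top-k token predictions are checked; the rest are discarded.
            for token, logprob in score_next_tokens(hyp.tokens)[:top_k]:
                new_tokens = hyp.tokens + [token]
                # (In PICARD, predicting EOS finalizes the parse instead of skipping the check.)
                if token != eos and not is_valid_prefix(new_tokens):
                    continue  # invalid continuation: never enters the beam
                candidates.append(Hypothesis(new_tokens, hyp.score + logprob, finished=(token == eos)))
        # Prune to the top-n hypotheses, exactly as in ordinary beam search.
        beam = sorted(candidates, key=lambda h: h.score, reverse=True)[:beam_size]
        if all(h.finished for h in beam):
            break
    return beam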

How does PICARD know whether a token is valid?

In PICARD, checking, accepting, and rejecting of tokens and token sequences is achieved through parsing. Parsing means that we attempt to assemble a data structure from the tokens that are currently in the beam or are about to be added to it. This data structure (and the parsing rules that are used to build it) encode the constraints we want to enforce.

In the case of SQL, the data structure we parse to is the abstract syntax tree (AST) of the SQL query. The parsing rules are defined in a computer program called a parser. Database engines, such as PostgreSQL, MySQL, and SQLite, have their own built-in parser that they use internally to process SQL queries. For Spider and CoSQL, we have implemented a parser that supports a subset of the SQLite syntax and that checks additional constraints on the AST. In our implementation, the parsing rules are made up from simpler rules and primitives that are provided by a third-party parsing library.

PICARD uses a parsing library called attoparsec that supports incremental input. This is a special capability that is not available in many other parsing libraries. You can feed attoparsec a string that represents only part of the expected input to parse. When parsing reaches the end of an input fragment, attoparsec will return a continuation function that can be used to continue parsing. Think of the continuation function as a suspended computation that can be resumed later. Input fragments can be parsed one after the other when they become available until the input is complete.

Herein lies the key to PICARD: Incremental parsing of input fragments is exactly what we need to check tokens one by one during decoding.

In PICARD, parsing is initialized with an empty string, and attoparsec will return the first continuation function. We then call that continuation function with all the token predictions we want to check in the first decoding step. For those tokens that are valid, the continuation function will return a new continuation function that we can use to continue parsing in the next decoding step. For those tokens that are invalid, the continuation function will return a failure value which cannot be used to continue parsing. Such failures are discarded and never end up in the beam. We repeat the process until the end of the input is reached. The input is complete once the model predicts the end-of-sentence token. When that happens, we finalize the parsing by calling the continuation function with an empty string. If the parsing is successful, it will return the final AST. If not, it will return a failure value.
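
To make this continuation-passing control flow concrete, here is a toy sketch in Python. It only mimics the pattern described above; the actual PICARD parser is written in Haskell on top of attoparsec and parses SQL, not a single keyword.

from typing import Callable, Union

FAILURE = "failure"
DONE = "done"

def make_keyword_parser(keyword: str) -> Callable[[str], Union[str, Callable]]:
    """Accept the input fragment by fragment; succeed once the full keyword has been seen."""
    def continuation(fragment: str, consumed: str = "") -> Union[str, Callable]:
        text = consumed + fragment
        if fragment == "":
            # An empty fragment finalizes parsing: succeed only if the input is complete.
            return DONE if text == keyword else FAILURE
        if keyword.startswith(text):
            # Still a valid prefix: return a new continuation for the next fragment.
            return lambda next_fragment: continuation(next_fragment, text)
        # Invalid continuation: this hypothesis would be discarded from the beam.
        return FAILURE
    return continuation

cont = make_keyword_parser("select")
cont = cont("sel")    # valid prefix, returns a new continuation
cont = cont("ect")    # valid prefix, returns a new continuation
print(cont(""))       # finalize with an empty string -> "done"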

The parsing rules are described at a high level in the PICARD paper. For details, see the PICARD code, specifically the Language.SQL.SpiderSQL.Parse module.

How well does PICARD work?

Let's look at the numbers:

On Spider

URL                            Based on              Exact-set Match Accuracy    Execution Accuracy
                                                     Dev        Test             Dev        Test
tscholak/cxmefzzi w PICARD     T5-3B                 75.5 %     71.9 %           79.3 %     75.1 %
tscholak/cxmefzzi w/o PICARD   T5-3B                 71.5 %     68.0 %           74.4 %     70.1 %
tscholak/3vnuv1vf w PICARD     t5.1.1.lm100k.large   74.8 %     -                79.2 %     -
tscholak/3vnuv1vf w/o PICARD   t5.1.1.lm100k.large   71.2 %     -                74.4 %     -
tscholak/1wnr382e w PICARD     T5-Large              69.1 %     -                72.9 %     -
tscholak/1wnr382e w/o PICARD   T5-Large              65.3 %     -                67.2 %     -
tscholak/1zha5ono w PICARD     t5.1.1.lm100k.base    66.6 %     -                68.4 %     -
tscholak/1zha5ono w/o PICARD   t5.1.1.lm100k.base    59.4 %     -                60.0 %     -

Click on the links to download the models. tscholak/cxmefzzi and tscholak/1wnr382e are the versions of the model that we used in our experiments for the paper, reported as T5-3B and T5-Large, respectively. tscholak/cxmefzzi, tscholak/3vnuv1vf, and tscholak/1zha5ono were trained to use database content, whereas tscholak/1wnr382e was not.

Note that, without PICARD, 12% of the SQL queries generated by tscholak/cxmefzzi on Spider’s development set resulted in an execution error. With PICARD, this number decreased to 2%.

On CoSQL Dialogue State Tracking

URL                            Based on              Question Match Accuracy     Interaction Match Accuracy
                                                     Dev        Test             Dev        Test
tscholak/2e826ioa w PICARD     T5-3B                 56.9 %     54.6 %           24.2 %     23.7 %
tscholak/2e826ioa w/o PICARD   T5-3B                 53.8 %     51.4 %           21.8 %     21.7 %
tscholak/2jrayxos w PICARD     t5.1.1.lm100k.large   54.2 %     -                -          -
tscholak/2jrayxos w/o PICARD   t5.1.1.lm100k.large   52.5 %     -                -          -

Click on the links to download the models. tscholak/2e826ioa is the version of the model that we used in our experiments for the paper, reported as T5-3B.

Quick Start

Prerequisites

This repository uses git submodules. Clone it like this:

$ git clone git@github.com:ElementAI/picard.git
$ cd picard
$ git submodule update --init --recursive

Training

The training script is located in seq2seq/run_seq2seq.py. You can run it with:

$ make train

The model will be trained on the Spider dataset by default. You can also train on CoSQL by running make train-cosql.

The training script will create the directory train in the current directory. Training artifacts like checkpoints will be stored in this directory.

The default configuration is stored in configs/train.json. The settings are optimized for a GPU with 40GB of memory.

These training settings should result in a model with at least 71% exact-set-match accuracy on the Spider development set. With PICARD, the accuracy should go up to at least 75%.

We have uploaded a model trained on the Spider dataset to the huggingface model hub, tscholak/cxmefzzi. A model trained on the CoSQL dialog state tracking dataset is available, too, tscholak/2e826ioa.
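
The released checkpoints can also be loaded directly with the transformers library, for example as shown below. Note that this runs plain beam search without PICARD's constrained decoding, and the input serialization is adapted from a usage example in the issues further down; it is an illustration, not the canonical format produced by seq2seq/run_seq2seq.py.

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

tokenizer = AutoTokenizer.from_pretrained("tscholak/cxmefzzi")
model = AutoModelForSeq2SeqLM.from_pretrained("tscholak/cxmefzzi")
pipe = pipeline("translation_xx_to_yy", model=model, tokenizer=tokenizer)

# question | db_id | table : column, ... | table : column, ...
question = (
    "How many customers bought a product with price greater than 100? | customer_data | "
    "customers : customer_id, name, product_id, quantity | products : product_id, price"
)
print(pipe(question)[0]["translation_text"])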

Evaluation

The evaluation script is located in seq2seq/run_seq2seq.py. You can run it with:

$ make eval

By default, the evaluation will be run on the Spider evaluation set. Evaluation on the CoSQL evaluation set can be run with make eval-cosql.

The evaluation script will create the directory eval in the current directory. The evaluation results will be stored there.

The default configuration is stored in configs/eval.json.

Serving

A trained model can be served using the seq2seq/serve_seq2seq.py script. The configuration file can be found in configs/serve.json. You can start serving with:

$ make serve

By default, the 800-million-parameter tscholak/3vnuv1vf model will be loaded. You can also load a different model by specifying the model name in the configuration file. The device to use can be specified as well. The default is to use the first available GPU. CPU can be used by specifying -1.

When the script is called, it uses the folder specified by the db_path option to look for SQL database files. The default folder is database, which will be created in the current directory. Initially, this folder will be empty, and you can add your own SQL files to it. The structure of the folder should be like this:

database/
  my_1st_database/
    my_1st_database.sqlite
  my_2nd_database/
    my_2nd_database.sqlite

where my_1st_database and my_2nd_database are the db_ids of the databases.
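
As an example, the following minimal Python sketch (standard library only) creates an SQLite file in the expected layout. The database name and the table schema are made-up placeholders taken from the listing above, not part of this project.

import sqlite3
from pathlib import Path

db_id = "my_1st_database"                    # the folder name doubles as the db_id
db_dir = Path("database") / db_id
db_dir.mkdir(parents=True, exist_ok=True)

with sqlite3.connect(db_dir / f"{db_id}.sqlite") as conn:
    # A toy schema so the server has something to serialize.
    conn.execute("CREATE TABLE IF NOT EXISTS customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO customers (name) VALUES ('Alice'), ('Bob')")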

Once the server is up and running, use the Swagger UI to test inference with the /ask endpoint. The server will be listening at http://localhost:8000/, and the Swagger UI will be available at http://localhost:8000/docs#/default/ask_ask__db_id___question__get.
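
The endpoint can also be queried programmatically, for example like this (a sketch; the db_id and question are placeholders, and the URL pattern follows the evaluation snippets shared in the issues below):

import requests

db_id = "my_1st_database"                    # must match a folder under database/
question = "How many customers are there?"
response = requests.get(f"http://localhost:8000/ask/{db_id}/{question}")
print(response.json())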

Docker

There are three docker images that can be used to run the code:

  • tscholak/text-to-sql-dev: Base image with development dependencies. Use this for development. Pull it with make pull-dev-image from the docker hub. Rebuild the image with make build-dev-image.
  • tscholak/text-to-sql-train: Training image with development dependencies but without Picard dependencies. Use this for fine-tuning a model. Pull it with make pull-train-image from the docker hub. Rebuild the image with make build-train-image.
  • tscholak/text-to-sql-eval: Training/evaluation image with all dependencies. Use this for evaluating a fine-tuned model with Picard. This image can also be used for training if you want to run evaluation during training with Picard. Pull it with make pull-eval-image from the docker hub. Rebuild the image with make build-eval-image.

All images are tagged with the current commit hash. The images are built with the buildx tool which is available in the latest docker-ce. Use make init-buildkit to initialize the buildx tool on your machine. You can then use make build-dev-image, make build-train-image, etc. to rebuild the images. Local changes to the code will not be reflected in the docker images unless they are committed to git.

picard's People

Contributors

danieltremblay, dependabot[bot], salokr, shreyas90999, tscholak


picard's Issues

How is the input sequence length determined?

Hi @tscholak !

I am wondering how you decided on the maximum src_length and tgt_length. It seems that several inputs are larger than the limit of 512. Does this truncation hurt performance, since the truncation always occurs in DB_Content and DB_Schema?

Thanks!

Question on reproducing t5.1.1.lm100k.large CoSQL model without using db_content

Hi, I'm trying to reproduce your CoSQL model trained based on t5.1.1.lm100k.large.

I trained the CoSQL model with a p3.16xlarge EC2 instance without using db_content (using 8 GPU, mini batch size per device = 1, gradient accumulation steps = 250, so that my batch_size is 2000). Given the model saved on /home/ubuntu/code/src/t5-v1_1-large is downloaded from gs://t5-data/pretrained_models/t5.1.1.lm100k.large, here is the config used in my model training:

{
    "run_name": "t5-cosql",
    "model_name_or_path": "/home/ubuntu/code/src/t5-v1_1-large",
    "dataset": "cosql+spider",
    "source_prefix": "",
    "schema_serialization_type": "peteshaw",
    "schema_serialization_randomized": false,
    "schema_serialization_with_db_id": true,
    "schema_serialization_with_db_content": false,
    "normalize_query": true,
    "target_with_db_id": true,
    "output_dir": "/home/ubuntu/code/src/code_train",
    "cache_dir": "/home/ubuntu/code/src/code_transformers_cache",
    "do_train": true,
    "do_eval": true,
    "fp16": false,
    "num_train_epochs": 250,
    "per_device_train_batch_size": 1,
    "per_device_eval_batch_size": 1,
    "gradient_accumulation_steps": 250,
    "label_smoothing_factor": 0.0,
    "learning_rate": 1e-4,
    "adafactor": true,
    "adam_eps": 1e-6,
    "lr_scheduler_type": "constant",
    "warmup_ratio": 0.0,
    "warmup_steps": 0,
    "seed": 1,
    "report_to": ["wandb"],
    "logging_strategy": "steps",
    "logging_first_step": true,
    "logging_steps": 4,
    "load_best_model_at_end": true,
    "metric_for_best_model": "exact_match",
    "greater_is_better": true,
    "save_total_limit": 64,
    "save_steps": 64,
    "evaluation_strategy": "steps",
    "eval_steps":64,
    "predict_with_generate": true,
    "num_beams": 1,
    "num_beam_groups": 1,
    "use_picard": false
}

For your CoSQL model (https://huggingface.co/tscholak/2jrayxos) and my model, I run evaluation on eval docker image with Picard enabled. Here's what I got:

Your model achieved

eval_exact_match = 0.5433 
eval_exec = 0.6324

while my model only obtained

eval_exact_match = 0.5069
eval_exec = 0.5935

For both metrics, I am 4 percentage points away from your model's performance. That seems like a big difference.
Does the config look good to you? Any tips on training t5.1.1.lm100k.large-based models? Is there anything I am missing in this reproduction experiment? Thank you.

The config parameters for fine-tune T5-base without using db_content.

Hi @tscholak, I'm trying to reproduce your experiment for fine-tuning t5-base and get a result of 56.96 EM, whereas I notice that in your paper you achieve a 57.2 EM score. My config/train.json is as follows (I run it on 4 3090 GPUs in DDP mode, and it takes about 34 hours):

{
    "run_name": "t5-spider",
    "model_name_or_path": "t5-base",
    "dataset": "spider",
    "source_prefix": "",
    "schema_serialization_type": "peteshaw",
    "schema_serialization_randomized": false,
    "schema_serialization_with_db_id": true,
    "schema_serialization_with_db_content": false,
    "normalize_query": true,
    "target_with_db_id": true,
    "output_dir": "./experiment/train_without_db_content",
    "cache_dir": "./transformers_cache",
    "do_train": true,
    "do_eval": true,
    "fp16": false,
    "num_train_epochs": 3072,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "gradient_accumulation_steps": 64,
    "label_smoothing_factor": 0.0,
    "learning_rate": 1e-4,
    "adafactor": true,
    "adam_eps": 1e-6,
    "lr_scheduler_type": "constant",
    "warmup_ratio": 0.0,
    "warmup_steps": 0,
    "seed": 1,
    "report_to": ["wandb"],
    "logging_strategy": "steps",
    "logging_first_step": true,
    "logging_steps": 4,
    "load_best_model_at_end": true,
    "metric_for_best_model": "exact_match",
    "greater_is_better": true,
    "save_total_limit": 128,
    "save_steps": 64,
    "evaluation_strategy": "steps",
    "eval_steps": 64,
    "predict_with_generate": true,
    "num_beams": 4,
    "num_beam_groups": 1,
    "use_picard": false
}

If it is convenient for you, could you please provide the parameter settings for the t5-base fine-tuning experiment that achieves the 57.2 EM score? Thank you.

Slow on training

Hi @tscholak

What's the approximate training time on your side?
We only take a t5-base model and use 2048 as the batch size. Using 4 P40 GPUs, it takes 80 seconds to train one step. Is that normal?

Training PICARD on different dataset

I'd like to train PICARD on a dataset other than Spider or CoSQL - what's the most straightforward way to do that? I'm guessing I need to make an equivalent to spider.py for my data - is there anything else that would need to be done?

Smaller T5 on CoSQL

This is really excellent work!

I'm wondering about the results on CoSQL using T5-base/T5-Large with and without PICARD. Only the T5-3B result has been reported in the paper. Would T5 (a big model) still be competitive with task-specific models if we decrease the model size?

Thanks in advance! :)

Very imbalanced memory usage.

Hi @tscholak, I'm fine-tuning t5-base on 8 GeForce RTX 3090 GPUs (each with about 24GB of memory), and I want to reproduce your t5-base experiment with an exact-set-match accuracy (EM) of 57.2.
My train.json setting is:

{
    "run_name": "t5-spider",
    "model_name_or_path": "t5-base",
    "dataset": "spider",
    "source_prefix": "",
    "schema_serialization_type": "peteshaw",
    "schema_serialization_randomized": false,
    "schema_serialization_with_db_id": true,
    "schema_serialization_with_db_content": false,
    "normalize_query": true,
    "target_with_db_id": true,
    "output_dir": "/train",
    "cache_dir": "/transformers_cache",
    "do_train": true,
    "do_eval": true,
    "fp16": false,
    "num_train_epochs": 3072,
    "per_device_train_batch_size": 2,
    "per_device_eval_batch_size": 2,
    "gradient_accumulation_steps": 128,
    "label_smoothing_factor": 0.0,
    "learning_rate": 1e-4,
    "adafactor": true,
    "adam_eps": 1e-6,
    "lr_scheduler_type": "constant",
    "warmup_ratio": 0.0,
    "warmup_steps": 0,
    "seed": 1,
    "report_to": ["wandb"],
    "logging_strategy": "steps",
    "logging_first_step": true,
    "logging_steps": 4,
    "load_best_model_at_end": true,
    "metric_for_best_model": "exact_match",
    "greater_is_better": true,
    "save_total_limit": 128,
    "save_steps": 64,
    "evaluation_strategy": "steps",
    "eval_steps": 64,
    "predict_with_generate": true,
    "num_beams": 4,
    "num_beam_groups": 1,
    "use_picard": false
}

and the GPU usage is :

(base) jxqi@server2:~$ nvidia-smi
Tue Oct 26 11:17:16 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0 Off |                  N/A |
| 55%   57C    P2   153W / 350W |   6194MiB / 24268MiB |     97%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  On   | 00000000:25:00.0 Off |                  N/A |
| 57%   58C    P2   143W / 350W |   6192MiB / 24268MiB |     91%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA GeForce ...  On   | 00000000:41:00.0 Off |                  N/A |
| 52%   55C    P2   138W / 350W |   6192MiB / 24268MiB |     86%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA GeForce ...  On   | 00000000:61:00.0 Off |                  N/A |
| 49%   54C    P2   156W / 350W |   6194MiB / 24268MiB |     95%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   4  NVIDIA GeForce ...  On   | 00000000:81:00.0 Off |                  N/A |
| 54%   56C    P2   144W / 350W |   6192MiB / 24268MiB |     93%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   5  NVIDIA GeForce ...  On   | 00000000:A1:00.0 Off |                  N/A |
| 49%   54C    P2   142W / 350W |   6192MiB / 24268MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   6  NVIDIA GeForce ...  On   | 00000000:C1:00.0 Off |                  N/A |
| 53%   55C    P2   159W / 350W |   6192MiB / 24268MiB |     57%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   7  NVIDIA GeForce ...  On   | 00000000:E1:00.0 Off |                  N/A |
| 49%   54C    P2   143W / 350W |  23060MiB / 24268MiB |     92%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A    356248      C   python                           6189MiB |
|    1   N/A  N/A    356248      C   python                           6189MiB |
|    2   N/A  N/A    356248      C   python                           6189MiB |
|    3   N/A  N/A    356248      C   python                           6189MiB |
|    4   N/A  N/A    356248      C   python                           6189MiB |
|    5   N/A  N/A    356248      C   python                           6189MiB |
|    6   N/A  N/A    356248      C   python                           6189MiB |
|    7   N/A  N/A    356248      C   python                          23057MiB |
+-----------------------------------------------------------------------------+

It seems that it uses DataParallel mode (because there is only one process), the memory usage is very imbalanced, and I have to set "per_device_train_batch_size": 2 to run it.

Could you tell me how to solve this problem? Thank you @tscholak

Executing 'make train'

Hi,
I have been working on a text2sql project, and after cloning the git repo I went to the 'make train' step, but I am unable to execute it.
The result of my execution is as follows: make: *** No rule to make target 'train'. Stop.
There is a possibility that I might be doing the execution wrong. Kindly help me with the same.

compiling picard

Hi,

I've gone through the process of installing all picard dependencies (folly, hsthrift, etc.), which I thought was required to have the environment ready to compile picard's code. However, when I reach the point where I submit the following command, I get an error (shown below). Do you know why this error occurs, or do you have an intuition of how to solve it? Thanks!

cabal install --overwrite-policy=always --install-method=copy exe:picard

Error:

...
dist/build/Foreign/CPP/Dynamic.hs
Foreign/CPP/Dynamic.hsc
....
....
error: Dynamic.hsc:196:10: fatal error: cpp/cdynamic.h: No such file or directory
compilation terminated.

I'm using a different OS base than the one in your docker images
OS: CentOS 7

Questions about the dev.json performance is lower than in the paper?

Hi, I love your excellent work. When I tried to check the performance of the model on the dev set (dev.json), I encountered a very strange problem: the performance of T5-large (tscholak/1wnr382e with picard) on the dev set was 69.3, which is much lower than the 72.9 in the paper.

My configs/serve.json is:
{ "model_path": "tscholak/1wnr382e", "source_prefix": "", "schema_serialization_type": "peteshaw", "schema_serialization_randomized": false, "schema_serialization_with_db_id": true, "schema_serialization_with_db_content": true, "normalize_query": true, "target_with_db_id": true, "db_path": "/database", "cache_dir": "/transformers_cache", "num_beams": 4, "use_picard": true, "launch_picard": true, "picard_mode": "parse_with_guards", "picard_schedule": "incremental", "picard_max_tokens_to_check": 2, "device": 7 }

and the Makefile is
.PHONY: serve
serve: pull-eval-image
	mkdir -p -m 777 database
	mkdir -p -m 777 transformers_cache
	docker run \
		-it \
		--rm \
		--user 13011:13011 \
		--runtime=nvidia -e NVIDIA_DRIVER_CAPABILITIES=compute,utility -e NVIDIA_VISIBLE_DEVICES=all \
		-p 8000:8000 \
		--mount type=bind,source=$(BASE_DIR)/database,target=/database \
		--mount type=bind,source=$(BASE_DIR)/transformers_cache,target=/transformers_cache \
		--mount type=bind,source=$(BASE_DIR)/configs,target=/app/configs \
		tscholak/$(EVAL_IMAGE_NAME):$(GIT_HEAD_REF) \
		/bin/bash -c "python seq2seq/serve_seq2seq.py configs/serve.json"

and my predict.py code is like this:

import json
import requests
import tqdm
import time

output = 'preds.sql'
net_error = 0
query_error = 0
with open(output, "w") as g:
    with open('./dev.json') as f:
        dev_json = json.load(f)
    for i, el in enumerate(tqdm.tqdm(dev_json)):
        try:
            r = requests.get(f'http://localhost:8000/ask/{el["db_id"]}/{el["question"]}')
            res = r.json()
            if 'query' in res:
                print(res['query'])
                g.write(f"{res['query']}\t{el['db_id']}\n")
                g.flush()
            else:
                query_error += 1
                print("query error:%d" % (query_error))
                g.write(f"select \t{el['db_id']}\n")
                g.flush()
        except:
            net_error += 1
            print("net error:%d" % (net_error))
            g.write(f"select \t{el['db_id']}\n")
            g.flush()
        time.sleep(1)

print(f"query_error:{query_error}")

When I run python predict.py, the performance is:

                       easy               medium             hard                extra                all            
count               248                  446                  174                 166                 1034              
=====================   EXECUTION ACCURACY     =====================
execution            0.871                0.742                0.580                0.416                0.693               

====================== EXACT MATCHING ACCURACY =====================
exact match          0.863                0.733                0.580                0.428                0.690 

and I have run the code twice; the result is the same.

I have checked that the dev.json is the official one from Spider; the performance is lower than expected (72.9).

I want to confirm: is there something I am doing wrong?

Using Picard with trained model from HuggingFace

Hello, thank you for providing a great repo!
Currently I am using the pretrained model provided on Hugging Face, "tscholak/3vnuv1vf", for inference in the following way:

#Text to SQL
from transformers import pipeline
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM,T5Tokenizer, T5ForConditionalGeneration

# tokenizer = AutoTokenizer.from_pretrained("tscholak/1wnr382e")
# tokenizer = AutoTokenizer.from_pretrained("tscholak/cxmefzzi")
tokenizer = AutoTokenizer.from_pretrained("tscholak/3vnuv1vf") 


# model = AutoModelForSeq2SeqLM.from_pretrained("tscholak/1wnr382e")
# model = AutoModelForSeq2SeqLM.from_pretrained("tscholak/cxmefzzi")
model = AutoModelForSeq2SeqLM.from_pretrained("tscholak/3vnuv1vf")

pipe = pipeline('translation_xx_to_yy', model=model, tokenizer=tokenizer)

#Inference
input_data = "how many customers bought a product with price greater than hundred and quantity greater than fight? | customer_data | customers : customer_id, name, contact, product_id, quantity | products : product_id, price"
output = pipe(input_data)[0]['translation_text']
print(output)

This works out fine for me. But, as you mentioned in issue #39, when we use this model from Hugging Face we are not using Picard along with it. Is there any way I could use Picard in the code above, for example by including it just before the tokenizer?
Or do I have to use the Docker approach you mentioned, and this code cannot be used with Picard?

[question] Do you renormalize each step's decoding score during inference?

Hi, I have a quick question regarding your constrained decoding process.

Based on my understanding, the core idea of Picard is to directly impose constraints at each decoding step (i.e., dynamically deriving a set of admissible tokens) based on the decoding history of T5. My question is, do you normalize the scores for these admissible tokens? Or do you use the original probabilities over the original vocabulary and simply ignore the invalid tokens? In terms of implementation, the former is basically passing a vocab mask with all invalid tokens masked out right before the softmax, while the latter passes the vocab mask right after the softmax. Which one exactly do you use? (sorry, I did not get enough time to check your code carefully 😕)

My guess is that you don't actually normalize the scores, because intuitively I feel there might be some problems if you normalized them. For example, if at a certain decoding step there is only one admissible token, then no matter whether it matches the question or not, it will get a probability of 1. This is just an extreme case to show the problem, but similar issues can happen in many general cases.
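
For concreteness, here is a toy numeric illustration of the two options (hypothetical logits, not taken from the model):

import math

logits = [2.0, 1.0, 0.5]      # hypothetical scores for tokens A, B, C
valid = [True, False, True]   # suppose only A and C are admissible

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

# Option 1: mask before the softmax -> probabilities renormalize over {A, C}.
masked_logits = [x if ok else float("-inf") for x, ok in zip(logits, valid)]
print(softmax(masked_logits))                               # ~[0.82, 0.00, 0.18]

# Option 2: mask after the softmax -> keep the original probabilities, zero out B.
probs = softmax(logits)
print([p if ok else 0.0 for p, ok in zip(probs, valid)])    # ~[0.63, 0.00, 0.14]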

Could you give me a hint on this? Many thanks!

Questions about Database access in Serve Mode

Hey @tscholak, I have been fiddling with your model for a while now, and I love the work you guys have done. I just wanted to ask a few questions about the files that go into your ./database folder when you deploy it in serving mode. As per the ReadMe, the format is supposed to be like the one shown below:

database/
  my_1st_database/
    my_1st_database.sqlite
  my_2nd_database/
    my_2nd_database.sqlite

I am just wondering about the content of each of these files: are they supposed to have both the schema and the rows of data?

And another thing: for the current use case that I want to try your model on, my data is stored on a Postgres AWS server. I can't really convert it to SQLite or even export this data to a local machine (company policy), so how would you suggest I get the model working with such a setup? What are some of the changes I would have to make?

Thank you for taking out your time

Error: Internal server error 500 while running make serve.

Hello @tscholak and the team. I am trying to use make serve for T5+PICARD and run into error 500 on the FastAPI page. Below are the crash trace and my configs. Thank you in advance.
[screenshots of the configs omitted]

INFO: 172.17.0.1:62060 - "GET /ask/transactions/what%20is%20category%20for%20Canada%3F HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/opt/conda/lib/python3.7/site-packages/uvicorn/protocols/http/h11_impl.py", line 373, in run_asgi
result = await app(self.scope, self.receive, self.send)
File "/opt/conda/lib/python3.7/site-packages/uvicorn/middleware/proxy_headers.py", line 75, in call
return await self.app(scope, receive, send)
File "/opt/conda/lib/python3.7/site-packages/fastapi/applications.py", line 208, in call
await super().call(scope, receive, send)
File "/opt/conda/lib/python3.7/site-packages/starlette/applications.py", line 112, in call
await self.middleware_stack(scope, receive, send)
File "/opt/conda/lib/python3.7/site-packages/starlette/middleware/errors.py", line 181, in call
raise exc
File "/opt/conda/lib/python3.7/site-packages/starlette/middleware/errors.py", line 159, in call
await self.app(scope, receive, _send)
File "/opt/conda/lib/python3.7/site-packages/starlette/exceptions.py", line 82, in call
raise exc
File "/opt/conda/lib/python3.7/site-packages/starlette/exceptions.py", line 71, in call
await self.app(scope, receive, sender)
File "/opt/conda/lib/python3.7/site-packages/starlette/routing.py", line 656, in call
await route.handle(scope, receive, send)
File "/opt/conda/lib/python3.7/site-packages/starlette/routing.py", line 259, in handle
await self.app(scope, receive, send)
File "/opt/conda/lib/python3.7/site-packages/starlette/routing.py", line 61, in app
response = await func(request)
File "/opt/conda/lib/python3.7/site-packages/fastapi/routing.py", line 227, in app
dependant=dependant, values=values, is_coroutine=is_coroutine
File "/opt/conda/lib/python3.7/site-packages/fastapi/routing.py", line 161, in run_endpoint_function
return await run_in_threadpool(dependant.call, **values)
File "/opt/conda/lib/python3.7/site-packages/starlette/concurrency.py", line 39, in run_in_threadpool
return await anyio.to_thread.run_sync(func, *args)
File "/opt/conda/lib/python3.7/site-packages/anyio/to_thread.py", line 29, in run_sync
limiter=limiter)
File "/opt/conda/lib/python3.7/site-packages/anyio/_backends/_asyncio.py", line 818, in run_sync_in_worker_thread
return await future
File "/opt/conda/lib/python3.7/site-packages/anyio/_backends/_asyncio.py", line 754, in run
result = context.run(func, *args)
File "seq2seq/serve_seq2seq.py", line 126, in ask
outputs = pipe(inputs=Text2SQLInput(utterance=question, db_id=db_id))
File "/app/seq2seq/utils/pipeline.py", line 76, in call
result = super().call(inputs, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/transformers/pipelines/text2text_generation.py", line 137, in call
result = super().call(*args, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/transformers/pipelines/base.py", line 1101, in call
return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)
File "/opt/conda/lib/python3.7/site-packages/transformers/pipelines/base.py", line 1108, in run_single
model_outputs = self.forward(model_inputs, **forward_params)
File "/opt/conda/lib/python3.7/site-packages/transformers/pipelines/base.py", line 1034, in forward
model_outputs = self._forward(model_inputs, **forward_params)
File "/opt/conda/lib/python3.7/site-packages/transformers/pipelines/text2text_generation.py", line 155, in _forward
output_ids = self.model.generate(**model_inputs, **generate_kwargs)
File "/opt/conda/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
return func(*args, **kwargs)
File "/app/seq2seq/utils/picard_model_wrapper.py", line 249, in _generate
input_ids, decoder_start_token_id=decoder_start_token_id, bos_token_id=bos_token_id
File "/opt/conda/lib/python3.7/site-packages/transformers/generation_utils.py", line 502, in _prepare_decoder_input_ids_for_generation
return torch.ones((batch_size, 1), dtype=torch.long, device=self.device) * decoder_start_token_id
TypeError: ones(): argument 'size' must be tuple of ints, but found element of type Tensor at pos 1

make init-buildkit

Hello,

Thank you for your work. I am working in a text2sql project, so I cloned your repo.
I followed your instructions:

$ git clone git@github.com:ElementAI/picard.git
$ cd picard
$ git submodule update --init --recursive

Up to this point, it worked very well. Later, I executed make pull-eval-image, and it also worked.
However, when I executed make init-buildkit, it returned the following error.

[screenshot of the error omitted]

My goal is to get the output file and reproduce your results to check the sql queries the model is generating. Besides, I would like to deploy your model.

Specifications:

I am using the Ubuntu terminal in Windows.
I am only using CPU.

Thank you so much in advance.

Out of memory with default configs/train.json on A100 GPU

Hi Torsten,

I just got a node with 8 A100 GPUs. When I run your code with the default settings (make train) on one A100 (40GB memory), I encounter an out-of-memory error. Did you set up other hyper-parameters (e.g. fp16, gradient checkpointing, DeepSpeed) to make it work? When I reduce the batch size to 1, it works. However, if I scale up to 8 A100 GPUs (DDP), even with a batch size of 1, there is still an out-of-memory error.

Best,
Wuwei

A quick question on Lexing (i.e., Section 2.1)

Hi,

I have a clarification question regarding the Lexing part of PICARD.
First of all, I understand that the purpose of Lexing is to get rid of invalid tokens (e.g., cell_phone in your paper) from the decoder's predictions. However, I am confused about how you generate tokens like cell_phone in the first place. Considering that cell_phone is not likely to be in T5's pretrained vocab, what you would probably have are just cell and phone. So do you predict a single schema item like cell_phone over multiple steps? Wouldn't it be nicer to predict these schema tokens in an atomic way (concretely, you would have a candidate which is cell_number, but you would not have cell_phone as a candidate at all)? That would save much computation on generating those invalid tokens. I might also have misunderstood your method. Could you clarify this part a bit for me? Many thanks!

Setting up PICARD without using the Docker images

I have been attempting to run the evaluation script without using the docker image. I am facing an error which says: picard is not available. Is there any specific set of instructions for setting up picard without using the docker images?

What is meant by the database content in model training.

Hi. I have a question regarding the models trained on database content. I am assuming this means the T5 model was trained with the schema as well as the database content, and that the models not trained with database content, just the database schema, are the other models.
I am trying to fine-tune on my own dataset and am confused about the training input, both with database content and without. If there are any tips you could provide, that would be great. Also, do you have any comments on what can change in accuracy and model output between the models trained using database content and the models trained just on the database schema? Thank you

Adapting to large database schema

Hey @tscholak, Thanks so much for this amazing work. It has been super helpful in my work.

I have been fiddling around with Picard for the NL2SQL use case, and I am able to successfully run it with databases of 10 tables.

Since we are encoding the database schema along with the user query, I am wondering how this will scale when there are 100s of tables and the schema is big.

  • Does it become a limitation, since language models can't process more than 512 tokens (T5 transformers)?
  • Would experimenting with a Longformer-style version of the T5 transformer help in this case?
  • Or is there any way of just passing the input text along with chunks of the schema, or passing the text alone?

Any suggestions on how we can modify Picard for this use case?

Can I just use the schema and not the entire database for serving?

Hi. First of all, great job on PICARD. I was trying to serve the model on my database schema. I am trying to understand: do we need the entire database with values, or can I just use the schema? Also, if I want to train or fine-tune using my data, do I need just the schema or the whole database with values? And is there an example dataset I can use to first run the model before trying my own data?
Thank you in advance

How to set wandb online?

Hi @tscholak, I'm facing a problem with how to set wandb online. When I first ran your code, it was set to online, and I could view the running process on the wandb web page. However, when I run your code on a different machine, sometimes it is set to offline. I modified the Makefile as:

.PHONY: train
train: pull-train-image
	mkdir -p -m 777 train
	mkdir -p -m 777 transformers_cache
	mkdir -p -m 777 wandb
	docker run \
		-it \
		--rm \
		--ipc=host \
		--gpus all \
		--user 13011:13011 \
		--mount type=bind,source=$(PWD)/train,target=/train \
		--mount type=bind,source=$(PWD)/transformers_cache,target=/transformers_cache \
		--mount type=bind,source=$(PWD)/configs,target=/app/configs \
		--mount type=bind,source=$(PWD)/wandb,target=/app/wandb \
		tscholak/$(TRAIN_IMAGE_NAME):$(GIT_HEAD_REF) \
		/bin/bash -c "WANDB_MODE=online python seq2seq/run_seq2seq.py configs/train.json"

And the output is:

(slurm) jxqi@main-9:~/Text-to-SQL/default_picard/picard/picard$ make train
docker pull tscholak/text-to-sql-train:dd6dc5851cc76c9012b45488b901574c8b7f4862
dd6dc5851cc76c9012b45488b901574c8b7f4862: Pulling from tscholak/text-to-sql-train
Digest: sha256:add40f7791979e3217a8e86ed5867cda1060b4a53db24cd15959fbc18f8b9269
Status: Image is up to date for tscholak/text-to-sql-train:dd6dc5851cc76c9012b45488b901574c8b7f4862
docker.io/tscholak/text-to-sql-train:dd6dc5851cc76c9012b45488b901574c8b7f4862
mkdir -p -m 777 train
mkdir -p -m 777 transformers_cache
mkdir -p -m 777 wandb
docker run \
-it \
--rm \
--ipc=host \
--gpus all \
--user 13011:13011 \
--mount type=bind,source=/home/jxqi/Text-to-SQL/default_picard/picard/picard/train,target=/train \
--mount type=bind,source=/home/jxqi/Text-to-SQL/default_picard/picard/picard/transformers_cache,target=/transformers_cache \
--mount type=bind,source=/home/jxqi/Text-to-SQL/default_picard/picard/picard/configs,target=/app/configs \
--mount type=bind,source=/home/jxqi/Text-to-SQL/default_picard/picard/picard/wandb,target=/app/wandb \
tscholak/text-to-sql-train:dd6dc5851cc76c9012b45488b901574c8b7f4862 \
/bin/bash -c "WANDB_MODE=online python seq2seq/run_seq2seq.py configs/train.json"
==============================this line================================
W&B online, running your script from this directory will now sync to the cloud.
==============================this line================================
10/23/2021 02:47:50 - WARNING - seq2seq.utils.picard_model_wrapper -   Picard is not available.
wandb: (1) Create a W&B account
wandb: (2) Use an existing W&B account
wandb: (3) Don't visualize my results
wandb: Enter your choice: 2
wandb: You chose 'Use an existing W&B account'
wandb: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter: 

CondaEnvException: Unable to determine environment

Please re-run this command with one of the following options:

* Provide an environment name via --name or -n
* Re-run this command inside an activated conda environment.

==============================this line================================
wandb: W&B syncing is set to `offline` in this directory.  Run `wandb online` or set WANDB_MODE=online to enable cloud syncing.
==============================this line================================
Downloading and preparing dataset spider/spider to /transformers_cache/spider/spider/1.0.0/3c571ccbabdc104a8d1f14edff4ce10d09d7724b0c3665fc42bec1ed51c84bf3...
0 examples [00:00, ? examples/s]

But sometimes it runs properly in online mode. How can I set wandb always online? Thank you.

Different results between training and eval

Sorry to bother you, but I found another interesting problem.
When I start training (with train.json) and get a result in the middle of the process, such as:

      "epoch": 2304.0,
      "eval_exact_match": 0.6460348162475822,
      "eval_exec": 0.6460348162475822,
      "eval_loss": 0.41825902462005615,
      "eval_runtime": 90.718,
      "eval_samples_per_second": 11.398,
      "step": 2304

It can be seen that eval_exact_match is around 0.64.

But if I run evaluation mode (with eval.json), I will get:

   "eval_exact_match": 0.6247582205029013,
    "eval_exec": 0.6431334622823984,
    "eval_loss": 0.41071268916130066,
    "eval_runtime": 244.047,
    "eval_samples": 1034,
    "eval_samples_per_second": 4.237

The eval_exact_match is around 0.62.
And the eval.json is:

    "run_name": "t5+picard-spider-eval",
    "model_name_or_path": "train/checkpoint-2304",
    "dataset": "spider",
    "source_prefix": "",
    "schema_serialization_type": "peteshaw",
    "schema_serialization_randomized": false,
    "schema_serialization_with_db_id": true,
    "schema_serialization_with_db_content": true,
    "normalize_query": true,
    "target_with_db_id": true,
    "output_dir": "/eval",
    "cache_dir": "/transformers_cache",
    "do_train": false,
    "do_eval": true,
    "fp16": false,
    "per_device_eval_batch_size": 5,
    "seed": 1,
    "report_to": ["tensorboard"],
    "predict_with_generate": true,
    "num_beams": 4,
    "num_beam_groups": 1,
    "diversity_penalty": 0.0,
    "max_val_samples": 1034,
    "use_picard": false,
    "launch_picard": false,
    "picard_mode": "parse_with_guards",
    "picard_schedule": "incremental",
    "picard_max_tokens_to_check": 2,
    "eval_accumulation_steps": 1,
    "metric_config": "both",
    "val_max_target_length": 512,
    "val_max_time": 1200

The difference is about 2%. Have you ever seen this problem?

Training fault

When I run the following command to start training:

python -m torch.distributed.launch --nnodes=1 --nproc_per_node=8 run_seq2seq.py configs/train.json

It comes to the following fault:
RuntimeError: Expected to mark a variable ready only once. This error is caused by one of the following reasons: 1) Use of a module parameter outside the forward function. Please make sure model parameters are not shared across multiple concurrent forward-backward passes. or try to use _set_static_graph() as a workaround if this module graph does not change during training loop.2) Reused parameters in multiple reentrant backward passes. For example, if you use multiple checkpoint functions to wrap the same part of your model, it would result in the same set of parameters been used by different reentrant backward passes multiple times, and hence marking a variable ready multiple times. DDP does not support such use cases in default. You can try to use _set_static_graph() as a workaround if your module graph does not change over iterations.

Have you ever seen this before?

Confusion about batch_size

Hello, I read your code and tried to fine-tune T5-base on Spider. I notice that in your configs/train.json file, some parameters are:

"per_device_train_batch_size": 5,
"per_device_eval_batch_size": 5,
"gradient_accumulation_steps": 410,

If I use N GPUs and pytorch DistributedDataParallel mode, the total train batch_size is
train_batch = 5*410*N = 2050*N
and if N is greater than 4, it will exceed the total number of examples in the Spider train dataset (7000), and then there is only one step per epoch (and the batch size can't even reach 2048, because the DistributedSampler in Pytorch will first split the data into N different parts per epoch). Is that right?

Thank you very much.

Training Process

Hi @tscholak,

Can you provide your loss, eval_exact_match, eval_exec for fine-tuning t5-3b and t5-base?

I would like to have a reference for my training process.

Thanks!

GPU Ram Consumption - How to Make Use Less RAM

Hello Folks,

I've tried running the make train command on an AWS p3.16xlarge instance (128GB GPU RAM), and PyTorch always dies claiming it can't allocate more RAM. What can I do to make it consume less GPU RAM? I'm using the Deep Learning AMI GPU TensorFlow 2.7.0 AMI, if that makes any difference.

Below is a screenshot of nvidia-smi and its output, where we can see that the RAM is being consumed.

Any help is greatly appreciated.

[nvidia-smi screenshot omitted]

make serve error

Hi, firstly thanks for your work!
I was trying to use the model directly, so I followed the steps git clone ... ---> git submodule ....
After that I went directly to make serve, so the image was being pulled and the container built. But there was an error:
command:
make serve

output:
process_begin: CreateProcess(NULL, pwd, ...) failed.
docker pull tscholak/text-to-sql-eval:e37020b6eee18bff865d9d2ba852bd636f3ed777
e37020b: Pulling from tscholak/text-to-sql-eval
f22ccc0b8772: Pulling fs layer
3cf8fb62ba5f: Pulling fs layer
e80c964ece6a: Pulling fs layer
8a451ac89a87: Pulling fs layer
c563160b1f64: Pulling fs layer
596a46902202: Pulling fs layer
aa0805983180: Pulling fs layer
5718c3da35a0: Pulling fs layer
aa0805983180: Waiting
6a0071256cdd: Pulling fs layer
66d4bed311a2: Pulling fs layer
5718c3da35a0: Waiting
09ae9ad954c3: Pulling fs layer
003637b0851a: Waiting
3a6abf4fc04a: Pulling fs layer
2bcd1f8c7d19: Pulling fs layer
7e8b8fc54fef: Pulling fs layer
66d4bed311a2: Waiting
e8f6692d1906: Pulling fs layer
90f0d3c89fde: Pulling fs layer
8a451ac89a87: Waiting
c21e997a86f5: Pulling fs layer
e6296527eee8: Waiting
7327077e072d: Pulling fs layer
596a46902202: Waiting
2bcd1f8c7d19: Waiting
c563160b1f64: Waiting
1ac7723b3070: Waiting
0f6675c72c1c: Waiting
82a5ac9ac621: Waiting
1c8883b7d00d: Waiting
f7b06626b422: Waiting
c21e997a86f5: Waiting
d87a57a397c3: Pulling fs layer
d87a57a397c3: Waiting
f9ee145f9ac7: Waiting
831f5e07bc3b: Pulling fs layer
a5765519477d: Waiting
0707da233112: Pulling fs layer
0707da233112: Waiting
61099012b9e7: Waiting
72e8e2f6f12d: Waiting
088c5dabef1c: Pulling fs layer
98d1b4aec65f: Pull complete
181b2e1c80bf: Pull complete
2b49c10ec790: Pull complete
2dbedd7a9ee9: Pull complete
3cd6fdac9d0e: Pull complete
42f4d9060d75: Pull complete
a76e717043f1: Pull complete
3984587abe06: Pull complete
b1a5349e2397: Pull complete
daade2fcc64c: Pull complete
d52e83ba373b: Pull complete
99c87951de19: Pull complete
c4c12d6fdb90: Pull complete
Digest: sha256:e4710b52c1c74397d87f3d8065630e7527c47014dbf8521ecc78bbc54a5f4cdd
Status: Downloaded newer image for tscholak/text-to-sql-eval:e37020b6eee18bff865d9d2ba852bd636f3ed777
docker.io/tscholak/text-to-sql-eval:e37020b6eee18bff865d9d2ba852bd636f3ed777
mkdir -p -m 777 database
mkdir -p -m 777 transformers_cache
A subdirectory or file -p already exists.
Error occurred while processing: -p.
A subdirectory or file -m already exists.
Error occurred while processing: -m.
A subdirectory or file 777 already exists.
Error occurred while processing: 777.
make: *** [serve] Error 1

Also, I already had a db that I added in a folder named databasee, with a subfolder containing a sqlite db in it.
Finally, I changed the db_path in serve.json to databasee. That's all I have changed from the original guidelines.

Can you help me solve the error ?
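A note on the mkdir errors above: mkdir -p -m 777 database is a GNU/POSIX invocation, and the messages "A subdirectory or file -p already exists" indicate that the Windows cmd.exe built-in mkdir picked up the call and treated -p, -m and 777 as directory names. Purely as an illustration (not part of the repo), the intended directory creation looks like this in Python:

    import os

    # What the Makefile's "mkdir -p -m 777 ..." lines are meant to do.
    for d in ("database", "transformers_cache"):
        os.makedirs(d, mode=0o777, exist_ok=True)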

Using the trained model "tscholak/3vnuv1vf" from Hugging Face

Hello, thanks for your effort.
I want to ask: is there any way to use the trained model "tscholak/3vnuv1vf" on Hugging Face in the default way models are used there?
I tried to download the trained model, but there is an issue when the tokenizer output is passed to the model ('list' object has no attribute 'size' on input_ids.size()). Or must I follow seq2seq/serve_seq2seq.py?
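For what it's worth, here is a minimal sketch of using the checkpoint with plain transformers and no PICARD constraints. The 'list' object has no attribute 'size' error typically means the tokenizer output was passed as Python lists, which return_tensors="pt" avoids. The input serialization shown here (question | db_id | table : columns) is an assumption based on the context format used elsewhere on this page, and the example schema is made up:

    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained("tscholak/3vnuv1vf")
    model = AutoModelForSeq2SeqLM.from_pretrained("tscholak/3vnuv1vf")

    # Assumed serialization: "question | db_id | table : col1, col2, ..."
    text = "How many singers do we have? | concert_singer | singer : singer_id, name, country, age"

    # return_tensors="pt" is the part that avoids the "'list' object has no attribute 'size'" error.
    inputs = tokenizer(text, return_tensors="pt")
    outputs = model.generate(**inputs, num_beams=4, max_length=256)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))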

Run PICARD without root privilege

I have been trying to run PICARD on a machine without root privileges. Since I can't get permission from the administrator to use Docker or apt-get, I tried to build the environment from scratch following the Dockerfile, and I found it really difficult. Are there any other ways to run PICARD on a machine without root privileges?

Using the provided eval code on a different dataset

Hey,

I have tried running "make eval" and was able to retrieve numbers on Spider-Dev.

Now, I would like to use the provided T5+PICARD for evaluation on other datasets like Spider-DK or Spider-realistic. Is there an easy and convenient way to use these datasets for eval only, e.g., via command-line arguments?

Thanks

Deployment questions

Hi, I've gone through previous issues and gleaned that the models were trained on a really beefed-up setup with A100s and V100s. I'm interested in deploying this for inference only on an AWS instance, and I'm wondering if an instance with a GPU is absolutely necessary if I'm doing only inference. If I can get away with CPU only, is there a minimum RAM you'd recommend?

Thanks for open sourcing this project!

Error: Could not open database file in serving mode

Hi @tscholak,
I'm trying to run make serve using my own database directory.
But when I try to get output from FastAPI by providing db_id and question, it raises the error:
Unable to open database file
fastapi_serving

My server terminal looks good:
fastapi_terminal

My config/serve.json file:

{
    "model_path": "tscholak/cxmefzzi",
    "source_prefix": "",
    "schema_serialization_type": "peteshaw",
    "schema_serialization_randomized": false,
    "schema_serialization_with_db_id": true,
    "schema_serialization_with_db_content": true,
    "normalize_query": true,
    "target_with_db_id": true,
    "db_path": "database",
    "cache_dir": "transformers_cache",
    "num_beams": 4,
    "use_picard": true,
    "launch_picard": true,
    "picard_mode": "parse_with_guards",
    "picard_schedule": "incremental",
    "picard_max_tokens_to_check": 2,
    "device": -1
}

Please suggest how I can fix this.
Thanks,
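One thing worth double-checking, assuming serving resolves databases the same way as the dataset loaders on this page (db_path/db_id/db_id.sqlite, as in the cosql.py snippet further down): with "db_path": "database", the SQLite file for a given db_id has to live at database/<db_id>/<db_id>.sqlite relative to the directory mounted into the container. A quick layout check, illustrative only:

    import os

    db_path = "database"      # matches "db_path" in the serve.json above
    db_id = "my_database"     # placeholder for the db_id sent to the FastAPI endpoint

    sqlite_file = os.path.join(db_path, db_id, f"{db_id}.sqlite")
    print(sqlite_file, "exists:", os.path.isfile(sqlite_file))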

Smaller fine-tuned models

Hello, great work! I am wondering if you plan to release some smaller fine-tuned models like T5-base and T5-large. Thanks!

make build-eval-image failing

Hi,

We are trying to run the Docker image for evaluation of the PICARD model. On executing "make build-eval-image", it throws the error "ERROR exporting to image, error: failed to solve: server message: insufficient_scope: authorization failed" after the build completes successfully.

Without building the image, "make eval" shows "picard not available".

Can someone point out what we are doing wrong here? We want to evaluate the T5 model with PICARD on Spider!

Thanks.

Docker memory keeps increasing in "serve" mode

Hello! I was trying to use your model to parse a reasonably large amount of text into SQL (similar in size to Spider). I used the "serve" mode and allowed 8 GB for the Docker container with 2 GB of swap. At the beginning everything ran fine, but as more and more text got parsed, the memory usage of the Docker container grew larger and larger, until the container finally exited with Error 137 (OOM).

What should we do to fix or avoid this issue? Any advice is appreciated. Thank you!

Are the table contents/values used during training?

There are two queries regarding the training process and inference, described below:

  1. Inside the config file, the flag "schema_serialization_with_db_content" is set to true; however, it is not clear from the code whether the table data values are used during training and evaluation.

  2. The predictions_eval.json file that gets generated at inference has the key "context" for each entry, with the following structure: question | db_id | table : columns. Now, when this field is inspected, some column names are modified with additional info appended from the question content. An example is provided below:
    "context": "What is the difference between the Number of passengers carried in Bolivia in 1998 and that in 1999? | 1_8655 | table_data : years, series **( bolivia )**, number_of_passengers_carried_by_air_carriers_registered_in_a_country_number_of_passengers_carried"
    The additional content is marked between double asterisks. Is this context fed to the model, and if not, what does it represent? (See the sketch after this list.)

Kindly clarify the above queries.
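Regarding the second query, here is a rough illustration of what the serialization with database content appears to do, judging from the example context above: cell values from a column that also occur in the question are appended in parentheses after that column's name. The helper below is purely illustrative and is not the repo's implementation:

    # Illustrative only; the repo's own serialization logic may differ in its details.
    def serialize_schema_sketch(question: str, db_id: str, table: str, columns: dict) -> str:
        """columns maps a column name to the cell values matched against the question."""
        parts = []
        for name, values in columns.items():
            matches = [v for v in values if v.lower() in question.lower()]
            parts.append(f"{name} ( {' , '.join(matches)} )" if matches else name)
        return f"{question} | {db_id} | {table} : {', '.join(parts)}"

    print(serialize_schema_sketch(
        "What is the difference between the Number of passengers carried in Bolivia in 1998 and that in 1999?",
        "1_8655",
        "table_data",
        {"years": [], "series": ["bolivia", "peru"], "number_of_passengers_carried": []},
    ))
    # -> "... | 1_8655 | table_data : years, series ( bolivia ), number_of_passengers_carried"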

Tips on the customization needed to make it work without a GPU

Hi,

IMHO this is one of the best models I've ever tried. But I can only test it with Hugging Face's transformers. Although it works pretty well, I'd like to be able to train/fine-tune it:

  • Could someone point out a few tips on what should be parameterized in the code in order to avoid the GPU (40 GB GPU) and use it with a CPU only (i7, 32 GB RAM)? I've been trying to make it work following some of the tips I've found in previous issues. I don't mind the latency for now, I just want to make it work.

Thank you in advance.

DuplicatedKeysError(key) when loading cosql dataset

Hi, great work!
I ran into DuplicatedKeysError(key) when loading the CoSQL dataset. It seems that different training examples can end up with the same 'key'. I fixed the _generate_examples function of [cosql.py](https://github.com/ElementAI/picard/blob/main/seq2seq/datasets/cosql/cosql.py) like this and it works:

    def _generate_examples(self, data_filepath, db_path):
        """This function returns the examples in the raw (text) form."""
        logger.info("generating examples from = %s", data_filepath)
        key = 0 # indexing each training instance
        with open(data_filepath, encoding="utf-8") as f:
            cosql = json.load(f)
            for idx, sample in enumerate(cosql):
                db_id = sample["database_id"]
                if db_id not in self.schema_cache:
                    self.schema_cache[db_id] = dump_db_json_schema(
                        db_path + "/" + db_id + "/" + db_id + ".sqlite", db_id
                    )
                schema = self.schema_cache[db_id]

                db_stuff = {
                    "db_id": db_id,
                    "db_path": db_path,
                    "db_table_names": schema["table_names_original"],
                    "db_column_names": [
                        {"table_id": table_id, "column_name": column_name}
                        for table_id, column_name in schema["column_names_original"]
                    ],
                    "db_column_types": schema["column_types"],
                    "db_primary_keys": [{"column_id": column_id} for column_id in schema["primary_keys"]],
                    "db_foreign_keys": [
                        {"column_id": column_id, "other_column_id": other_column_id}
                        for column_id, other_column_id in schema["foreign_keys"]
                    ],
                }

                yield key, {
                    "utterances": [sample["final"]["utterance"]],
                    "query": sample["final"]["query"],
                    "turn_idx": -1,
                    **db_stuff,
                }
                key += 1
                utterances = []
                for turn_idx, turn in enumerate(sample["interaction"]):
                    utterances.extend((utterance.strip() for utterance in turn["utterance"].split(sep="|")))
                    yield key, {
                        "utterances": list(utterances),
                        "query": turn["query"],
                        "turn_idx": turn_idx,
                        **db_stuff,
                    }
                    key += 1

Besides, have you ever tried other hyperparameters to speed up training, such as a smaller batch size? Any advice would be appreciated!
