nl2sql's Introduction

Natural Language to SQL

https://arxiv.org/abs/1801.00076

Installation

The data is in data.tar.bz2. Extract it by running

tar -xjvf data.tar.bz2

The code is written using PyTorch in Python 2.7. Install PyTorch by following its official installation instructions. You can install the other dependencies by running

pip install -r requirements.txt

Download the GloVe embedding

Download the pretrained GloVe embedding using

bash download_glove.sh

Extract the GloVe embedding for training

Run the following command to process the pretrained GloVe embedding for training the word embedding:

python extract_vocab.py
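
As a rough illustration of what this kind of processing involves, the sketch below reads text in the standard GloVe format (one token per line followed by space-separated floats) and keeps only the vectors for a given vocabulary. It is not the code of extract_vocab.py; the function name load_glove_subset and the sample data are hypothetical.

```python
import io


def load_glove_subset(handle, vocab):
    """Read GloVe-format text and keep only the vectors for words in `vocab`.

    Hypothetical helper for illustration; not taken from extract_vocab.py.
    """
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        word, values = parts[0], parts[1:]
        # Skip words outside the vocabulary and malformed lines with no values.
        if word in vocab and values:
            vectors[word] = [float(v) for v in values]
    return vectors


# Tiny in-memory stand-in for a GloVe file (real vectors have 300 dimensions).
sample = io.StringIO(u"the 0.1 0.2 0.3\ncat 0.4 0.5 0.6\ndog 0.7 0.8 0.9\n")
subset = load_glove_subset(sample, {"the", "dog"})
```

Restricting the embedding to the dataset vocabulary keeps the loaded matrix small compared with the full glove.42B.300d.txt file.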

Train

The training script is train.py. To see the detailed parameters, run:

python train.py -h

Some typical usages are listed below:

Train a model with bi-attention:

python train.py --ca

Train a model with column attention and trainable embedding (requires pretraining without training embedding, i.e., executing the command above):

python train.py --ca --train_emb

Test

The script test.py evaluates a model on the dev split and test split. The parameters for evaluation are roughly the same as those used for training. For example, the commands for evaluating the models trained by the commands above are:

Test a trained model with column attention:

python test.py --ca

Test a trained model with column attention and trainable embedding:

python test.py --ca --train_emb

Reference

https://github.com/xiaojunxu/SQLNet

nl2sql's People

Contributors

guotong1988

nl2sql's Issues

IndexError: too many indices for array

ub16hp@UB16HP:/ub16_prj/NL2SQL$ python train.py
Loading from original dataset
Loading data from data/train_tok.jsonl
Loading data from data/train_tok.tables.jsonl
Loading data from data/dev_tok.jsonl
Loading data from data/dev_tok.tables.jsonl
Loading data from data/test_tok.jsonl
Loading data from data/test_tok.tables.jsonl
Loading word embedding from glove/glove.42B.300d.txt
Using fixed embedding
Not using column attention on aggregator predicting
Not using column attention on selection predicting
Not using column attention on where predicting
/home/ub16hp/ub16_prj/NL2SQL/sqlnet/model/modules/aggregator_predict.py:55: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
att = self.softmax(att_val)
/home/ub16hp/ub16_prj/NL2SQL/sqlnet/model/modules/selection_predict.py:55: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
att = self.softmax(att_val)
/home/ub16hp/ub16_prj/NL2SQL/sqlnet/model/modules/sqlnet_condition_predict.py:123: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
num_col_att = self.softmax(num_col_att_val)
/home/ub16hp/ub16_prj/NL2SQL/sqlnet/model/modules/sqlnet_condition_predict.py:138: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
num_att = self.softmax(num_att_val)
/home/ub16hp/ub16_prj/NL2SQL/sqlnet/model/modules/sqlnet_condition_predict.py:163: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
col_att = self.softmax(col_att_val)
/home/ub16hp/ub16_prj/NL2SQL/sqlnet/model/modules/sqlnet_condition_predict.py:209: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
op_att = self.softmax(op_att_val)
Init dev acc_qm: 0.0
breakdown on (agg, sel, where): [0.05498159 0.16363852 0. ]
Epoch 1 @ 2018-10-01 12:27:07.696338
Traceback (most recent call last):
  File "train.py", line 128, in <module>
    sql_data, table_data, TRAIN_ENTRY)
  File "/home/ub16hp/ub16_prj/NL2SQL/sqlnet/utils.py", line 145, in epoch_train
    cum_loss += loss.data.cpu().numpy()[0]*(ed - st)
IndexError: too many indices for array
ub16hp@UB16HP:/ub16_prj/NL2SQL$
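
A likely cause, assuming the installed PyTorch is version 0.4 or later: a scalar loss is then a zero-dimensional tensor, so loss.data.cpu().numpy() yields a 0-d array and indexing it with [0] raises this IndexError. The sketch below reproduces the behaviour with NumPy alone; loss_to_float is a hypothetical helper, not part of the repository.

```python
import numpy as np


def loss_to_float(loss_array):
    """Return a Python float from a 0-d or 1-element NumPy array.

    Hypothetical helper for illustration; not part of sqlnet/utils.py.
    """
    return float(np.asarray(loss_array).reshape(-1)[0])


zero_dim = np.array(0.5)    # shape (), like a scalar loss after .numpy()
one_elem = np.array([0.5])  # shape (1,), the layout that [0] indexing expects

try:
    zero_dim[0]             # the same indexing pattern as in the traceback
    raised = False
except IndexError:
    raised = True
```

With a PyTorch version that provides Tensor.item() (0.4 and later), replacing loss.data.cpu().numpy()[0] with loss.item() in epoch_train would be the usual fix.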

KeyError: 'query_tok'

mldl@ub1604:/ub16_prj/NL2SQL$ python train.py --ca
Loading from original dataset
Loading data from data/train.jsonl
Loading data from data/train.tables.jsonl
Loading data from data/dev.jsonl
Loading data from data/dev.tables.jsonl
Loading data from data/test.jsonl
Loading data from data/test.tables.jsonl
Loading word embedding from glove/glove.42B.300d.txt
Using fixed embedding
Using column attention on aggregator predicting
Using column attention on selection predicting
Using column attention on where predicting
Traceback (most recent call last):
  File "train.py", line 103, in <module>
    val_sql_data, val_table_data, TRAIN_ENTRY)
  File "/home/mldl/ub16_prj/NL2SQL/sqlnet/utils.py", line 200, in epoch_acc
    q_seq, col_seq, col_num, ans_seq, query_seq, gt_cond_seq, raw_data = to_batch_seq(sql_data, table_data, perm, st, ed, ret_vis_data=True)
  File "/home/mldl/ub16_prj/NL2SQL/sqlnet/utils.py", line 113, in to_batch_seq
    query_seq.append(sql['query_tok'])
KeyError: 'query_tok'
mldl@ub1604:/ub16_prj/NL2SQL$
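
Note that this run loads data/train.jsonl and the other untokenized files, whereas the run in the previous issue loads data/train_tok.jsonl. One plausible explanation, stated here as an assumption, is that only the *_tok.jsonl files carry the query_tok field. A quick way to check a JSON-lines file is sketched below; find_missing_key and the sample records are hypothetical.

```python
import json


def find_missing_key(lines, key="query_tok"):
    """Return the indices of JSON-lines records that lack `key`.

    Hypothetical diagnostic helper; not part of the repository.
    """
    missing = []
    for index, line in enumerate(lines):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if key not in json.loads(line):
            missing.append(index)
    return missing


# Two made-up records: the second lacks the tokenized query.
sample = [
    '{"question": "a", "query_tok": ["SELECT"]}',
    '{"question": "b"}',
]
missing = find_missing_key(sample)
```

To check a real file, pass an open file handle, for example find_missing_key(open("data/train.jsonl")).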
