jaypipes / sqltoast Goto Github PK

View Code? Open in Web Editor NEW

27.0 7.0 11.0 976 KB

A SQL parser written in C++

License: Apache License 2.0

C++ 97.84% CMake 1.21% Python 0.95%

c-plus-plus sql parser recursive-descent-parser

sqltoast's People

Contributors

Stargazers

Watchers

Forkers

radtek c-jamie ramasai1 mikewarriner awfeequdng madmax43v3r wwwlkk monkcage honbinz v-for-vasiliev wngtk

sqltoast's Issues

Figure out why `std::unique_ptr<identifier_t>` doesn't provide "optional" elements

The sqltoast::statements::create_schema class is defined thusly:

typedef struct create_schema : statement_t {
    identifier_t schema_identifier;
    std::unique_ptr<identifier_t> authorization_identifier;
    std::unique_ptr<identifier_t> default_charset;
    create_schema(
            identifier_t& schema_id,
            std::unique_ptr<identifier_t>& authorization_id,
            std::unique_ptr<identifier_t>& default_charset_id) :
        statement_t(STATEMENT_TYPE_CREATE_SCHEMA),
        schema_identifier(schema_id),
        authorization_identifier(std::move(authorization_id)),
        default_charset(std::move(default_charset_id))
    {}
    virtual const std::string to_string();
} create_schema_t;

and its to_string() method is defined thusly:

const std::string create_schema::to_string() {
    std::stringstream ss;
    ss << "<statement: CREATE SCHEMA" << std::endl
       << "   schema identifier: " << schema_identifier;
    if (authorization_identifier.get()) {
       ss << std::endl
          << "   authorization identifier: " << *authorization_identifier;
    }
    if (default_charset.get()) {
       ss << std::endl
          << "   default charset: " << *default_charset;
    }
    ss << ">" << std::endl;

    return ss.str();
}

Unfortunately, the above always prints out the authorization element (as "") and the default character set element (also as "") regardless of whether the parser actually finds those clauses.

Figure out why.

All value expression primaries should be passable as part of INSERT <column list>

Currently, only numeric literals and identifiers are parsed as members of the clause of the INSERT INTO ... VALUES statement:

$ ./sqltoaster "INSERT INTO t1 VALUES (CURRENT_DATE)"
Syntax error.
Expected a value item, but got keyword[CURRENT_DATE].
INSERT INTO t1 VALUES (CURRENT_DATE)
                      ^^^^^^^^^^^^^^

the above should produce a columns list of one value expression of type VALUE_EXPRESSION_TYPE_DATETIME_EXPRESSION

Support EXISTS predicate

The EXISTS predicate is part of the ANSI SQL-02 standard and is fairly common to see in real life.

It's grammar looks like this:

<exists predicate>    ::=   EXISTS <table subquery>

Support CASE expressions

Left support for the CASE expressions out of 0.1 milestone/release. Would be good to support them:

 <case expression>    ::=   <case abbreviation> | <case specification>

<case abbreviation>    ::=
         NULLIF <left paren> <value expression> <comma> <value expression> <right paren>
     |     COALESCE <left paren> <value expression> { <comma> <value expression> }... <right paren>

<case specification>    ::=   <simple case> | <searched case>

<simple case>    ::=
         CASE <case operand>
             <simple when clause> ...
             [ <else clause> ]
         END

<case operand>    ::=   <value expression>

<simple when clause>    ::=   WHEN <when operand> THEN <result>

<when operand>    ::=   <value expression>

<result>    ::=   <result expression> | NULL

<result expression>    ::=   <value expression>

<else clause>    ::=   ELSE <result>

<searched case>    ::=
         CASE
         <searched when clause> ...
         [ <else clause> ]
         END

<searched when clause>    ::=   WHEN <search condition> THEN <result>

Support ANSI not equals operator (<>)

The ANSI not equals operator is actually "<>", not "!=". Support it as a single symbol just like the double-char concatenation operator ("||").

Allow fast DDL vs DML recognition

There is now a sqltoast::parse_options_t switch that disables the construction of the sqltoast::statement objects if the caller doesn't want or need these objects.

One reason the caller may not want the objects is because they actually aren't attempting to execute operations using the information in the objects but rather need a simple top-level routing parser that can indicate the "category" of SQL statement that has been sent. For example, a top-level proxy/router might want to simply know whether the successfully parsed statement is a DDL statement (CREATE, DROP, SET, etc) or a DML statement (INSERT, SELECT, DELETE, UPDATE etc).

Further, such a proxy might want to know a sub-category of the DDL or DML statement, for instance to direct read operations to different backing endpoints than write operations.

In these use cases, there's no point in constructing the sqltoast::statement objects or even validating much more than the rough tokens corresponding to the SQL statement are valid. Instead, there should be a sqltoast::parse_options_t field to trigger short-circuiting parsing when the caller is only interested in the category of statement or subcategory.

Support OVERLAPS predicate

The OVERLAPS predicate is also very rarely seen in real life, but I'd still like to support it considering it's part of the ANSI SQL-92 standard. Its grammar looks like this:

<overlaps predicate>    ::=   <row value constructor 1> OVERLAPS <row value constructor 2>

<row value constructor 1>    ::=   <row value constructor>

<row value constructor 2>    ::=   <row value constructor>

Robustify syntax error reporting

Due the various things, including how the lexer's cursor is moved over time, sometimes the syntax errors produced when a parsing problem happens can be confusing. Por ejemplo:

./sqltoaster/sqltoaster "SELECT NULLIF(a, b) FROM t1"
OK
statements[0]:
  <statement: SELECT
   selected columns:
     0: nullif[column-reference[a],column-reference[b]]
   referenced tables:
     0: t1>

(took 73006 nanoseconds)

If I remove the a,b arguments accidentally, I get the following parser syntax error:

./sqltoaster/sqltoaster "SELECT NULLIF() FROM t1"
Syntax error.
Expected to find one of ('*'|<< identifier >>) but found symbol[')']
SELECT NULLIF() FROM t1
             ^^^^^^^^^^
(took 78092 nanoseconds)

What should be there is a message about how a is expected as the first argument to the NULLIF() function.

Support quantified comparison predicates

Quantified comparison predicates look like this:

SELECT * FROM t2
WHERE t2.t1_a > ANY (SELECT t1.a FROM t1)

It's not particularly common to see it used any more versus the now-more-standard JOIN expression that would be functionally the same:

SELECT t2.* FROM t2
JOIN t1
ON t2.t1_a > t1.a

But nevertheless, it's in the ANSI SQL-92 standard and I'd like to support the majority of the ANSI SQL-92 grammar.

 <quantified comparison predicate>    ::=   <row value constructor> <comp op> <quantifier> <table subquery>

<quantifier>    ::=   <all> | <some>

<all>    ::=   ALL

<some>    ::=   SOME | ANY

Handle DEFAULT row value constructor

Often used in VALUES() lists:

INSERT INTO <table> VALUES (DEFAULT)

Support the SELECT statement

Obviously, this is a huge task, broken out into many subtasks/subissues.

Support DROP TABLE statement

The DROP TABLE statement is identical to the DROP SCHEMA SQL statement

Start docs for building and usage

A good developer reference would be great now that the code is a) consistent and b) relatively full-featured.

Compound predicates with parens to dictate precedence return syntax error

Syntax error.
Expected to find ')' but found punctuator['=']
SELECT * FROM t1 WHERE a = 1 AND (b = 2 OR c = 3)
                                   ^^^^^^^^^^^^^^
(took 76613 nanoseconds)

Data type parsing

Column definition elements have a data type component, which has the following grammar:

<data type>    ::=
         <character string type> [ CHARACTER SET <character set specification> ]
     |     <national character string type>
     |     <bit string type>
     |     <numeric type>
     |     <datetime type>
     |     <interval type>

<character string type>    ::=
         CHARACTER [ <left paren> <length> <right paren> ]
     |     CHAR [ <left paren> <length> <right paren> ]
     |     CHARACTER VARYING [ <left paren> <length> <right paren> ]
     |     CHAR VARYING [ <left paren> <length> <right paren> ]
     |     VARCHAR [ <left paren> <length> <right paren> ]

<length>    ::=   <unsigned integer>

<national character string type>    ::=
         NATIONAL CHARACTER [ <left paren> <length> <right paren> ]
     |     NATIONAL CHAR [ <left paren> <length> <right paren> ]
     |     NCHAR [ <left paren> <length> <right paren> ]
     |     NATIONAL CHARACTER VARYING [ <left paren> <length> <right paren> ]
     |     NATIONAL CHAR VARYING [ <left paren> <length> <right paren> ]
     |     NCHAR VARYING [ <left paren> <length> <right paren> ]

<bit string type>    ::=
         BIT [ <left paren> <length> <right paren> ]
     |     BIT VARYING [ <left paren> <length> <right paren> ]

<numeric type>    ::=
         <exact numeric type>
     |     <approximate numeric type>

<exact numeric type>    ::=
         NUMERIC [ <left paren> <precision> [ <comma> <scale> ] <right paren> ]
     |     DECIMAL [ <left paren> <precision> [ <comma> <scale> ] <right paren> ]
     |     DEC [ <left paren> <precision> [ <comma> <scale> ] <right paren> ]
     |     INTEGER
     |     INT
     |     SMALLINT

<precision>    ::=   <unsigned integer>

<scale>    ::=   <unsigned integer>

<approximate numeric type>    ::=
         FLOAT [ <left paren> <precision> <right paren> ]
     |     REAL
     |     DOUBLE PRECISION

<datetime type>    ::=
         DATE
     | TIME [ <left paren> <time precision> <right paren> ] [ WITH TIME ZONE ]
     | TIMESTAMP [ <left paren> <timestamp precision> <right paren> ] [ WITH TIME ZONE ]

<time precision>    ::=   <time fractional seconds precision>

<time fractional seconds precision>    ::=   <unsigned integer>

<timestamp precision>    ::=   <time fractional seconds precision>

<interval type>    ::=   INTERVAL <interval qualifier>

<interval qualifier>    ::=
         <start field> TO <end field>
     | <single datetime field>

<start field>    ::=
         <non-second datetime field> [ <left paren> <interval leading field precision> <right paren> ]

<non-second datetime field>    ::=   YEAR | MONTH | DAY | HOUR | MINUTE

<interval leading field precision>    ::=   <unsigned integer>

<end field>    ::=
         <non-second datetime field>
     | SECOND [ <left paren> <interval fractional seconds precision> <right paren> ]

<interval fractional seconds precision>    ::=   <unsigned integer>

<single datetime field>    ::=
         <non-second datetime field> [ <left paren> <interval leading field precision> <right paren> ]
     | SECOND [ <left paren> <interval leading field precision> [ <comma> <left paren> <interval fractional seconds precision> ] <right paren> ]

<domain name>    ::=   <qualified name>

<qualified name>    ::=   [ <schema name> <period> ] <qualified identifier>

<default clause>    ::=   DEFAULT <default option>

<default option>    ::=
         <literal>
     |     <datetime value function>
     |     USER
     |     CURRENT_USER
     |     SESSION_USER
     |     SYSTEM_USER
     |     NULL

Column definition parsing

Statements like CREATE TABLE and ALTER TABLE have sub-elements that represent column definitions. The grammar for a column definition looks like this:

 <column definition>    ::=
         <column name> { <data type> | <domain name> } [ <default clause> ] [ <column constraint definition> ... ] [ <collate clause> ]

<column name>    ::=   <identifier>

Support the DELETE statement

Parse the DELETE SQL statement

Numerous issues parsing DEFAULT clause

jaypipes@uberbox:~/src/github.com/jaypipes/sqltoast/_build$ ./sqltoaster "CREATE TABLE t1 (a INT DEFAULT CURRENT_USER)"
OK
statements[0]:
  <statement: CREATE TABLE
    table name: t1
    column definitions:
      a INT DEFAULT CURRENT_USER>

(took 27271 nanoseconds)
jaypipes@uberbox:~/src/github.com/jaypipes/sqltoast/_build$ ./sqltoaster "CREATE TABLE t1 (a CHAR DEFAULT CURRENT_USER)"
Syntax error.
Expected to find << unsigned integer >> but found keyword[DEFAULT]
CREATE TABLE t1 (a CHAR DEFAULT CURRENT_USER)
                       ^^^^^^^^^^^^^^^^^^^^^^
(took 32551 nanoseconds)
jaypipes@uberbox:~/src/github.com/jaypipes/sqltoast/_build$ ./sqltoaster "CREATE TABLE t1 (a CHAR(20) DEFAULT CURRENT_USER)"
Syntax error.
Expected to find << unsigned integer >> but found symbol['(']
CREATE TABLE t1 (a CHAR(20) DEFAULT CURRENT_USER)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(took 29094 nanoseconds)

From what I can tell, lex.next() isn't being called properly or at the right times after successfully parsing delimiter tokens.

Handle NULL row value constructor

Often used in VALUES lists:

INSERT INTO <table> VALUES (NULL)

Tokenizing keywords does not account for length of token

Demonstrating the bug here:

$ ./sqltoaster "create tablea"
OK
statements[0]:
  <statement: CREATE TABLE
   table identifier: a>

(took 17564 nanoseconds)

That should actually result in a parse failure because "tablea" is not actually a keyword. "table" is, not "tablea". However, the sqltoast::token_keyword() function is not properly picking up that the symbol matching should be exact-lengthed.

Support `GRANT` statement

The GRANT SQL statement has the following EBNF grammar:

 <grant statement>    ::=
         GRANT <privileges> ON <object name> TO <grantee> [ { <comma> <grantee> }... ] [ WITH GRANT OPTION ]

<privileges>    ::=   ALL PRIVILEGES | <action list>

<action list>    ::=   <action> [ { <comma> <action> }... ]

<action>    ::=
         SELECT
     |     DELETE
     |     INSERT [ <left paren> <privilege column list> <right paren> ]
     |     UPDATE [ <left paren> <privilege column list> <right paren> ]
     |     REFERENCES [ <left paren> <privilege column list> <right paren> ]
     |     USAGE

<privilege column list>    ::=   <column name list>

<object name>    ::=
         [ TABLE ] <table name>
     |     DOMAIN <domain name>
     |     COLLATION <collation name>
     |     CHARACTER SET <character set name>
     |     TRANSLATION <translation name>

<grantee>    ::=   PUBLIC | <authorization identifier>

Support `UNION`

Need to support the UNION operation which combines the results of two selections

convert statement parsers to accept std::unique_ptr<statement_t>

In order to support recursively calling certain statement parsers (such as parse_select()), we need to convert them to accepting a std::unique_ptr<statement_t> out target for the parser to hang the potential statement from instead of always appending the parsed statement_t to the ctx.statements vector.

Support numeric functions

Numeric factors can be either value expression primaries or one of the following numeric functions:

EXTRACT
POSITION
CHAR_LENGTH or CHARACTER_LENGTH
OCTET_LENGTH
BIT_LENGTH

Support ROLLBACK/COMMIT [WORK] statements

ANSI-92 does not have a BEGIN statement. Instead, it only has a COMMIT [WORK] and ROLLBACK [WORK] statement.

Setup TravisCI jobs

Set up some travis-ci.org jobs for building the library and sqltoaster binary and then executing the test suite.

Support the UPDATE statement

Parse the UPDATE SQL statement

Make `identifier_t` an on-demand string constructor

A lexeme_t struct, which is 16 bytes in size on 64-bit architecture (2 pointers to char) is really all we need to store when we come across identifiers in the tokenization process. Currently, however, we construct identifier_t structs from a lexeme_t struct and construct a std::string by copy constructor in the identifier_t constructor:

typedef struct identifier {
    const std::string name;
    identifier(lexeme_t& lexeme) : name(lexeme.start, lexeme.end)
    {}
} identifier_t;

I don't believe we should do this, however. Instead, we should just store the lexeme_t directly in the parse tree structures and create identifier_t objects on-demand if needed.

For example, if we take a look at the definition of the column_definition_t struct:

typedef struct column_definition {
    identifier_t id;
    bool is_nullable;
    std::unique_ptr<data_type_descriptor_t> data_type;
    std::unique_ptr<default_descriptor_t> default_descriptor;
    std::unique_ptr<identifier_t> collate;
    column_definition(identifier_t& id) :
        id(id),
        is_nullable(true)
    {}
} column_definition_t;

we see that there is an "id" field of type identifier_t. We construct this identifier by copy construction in the column_definition_t constructor, which will construct a new std::string object. We don't actually need to do this, though, and it's a waste of memory since all we really need to store in the id field is the demarcation points for where the column name is. The lexeme_t is the struct that has this demarcation information so we should just use a lexeme_t as the type of "id".

move to a stack-allocated tokenizer

Currently, the sqltoast::tokenize() function performs tokenization on the entire input subject, adding found tokens to a std::deque<token_t> that is embedded in the parse_context_t struct.

In doing some initial performance analysis of the sqltoast::parse() I noted that at least 70% of the time the function is spent in the sqltoast::tokenize() function. The act of pushing token_t objects into the ctx.tokens std::deque is time-consuming and consumes heap allocated memory when most of the parsing functions really just need to examine a small stack of tokens.

Support ALTER TABLE statement

The ALTER TABLE statement uses the following EBNF grammar:

 <alter table statement>    ::=   ALTER TABLE <table name> <alter table action>

<alter table action>    ::=
         <add column definition>
     |     <alter column definition>
     |     <drop column definition>
     |     <add table constraint definition>
     |     <drop table constraint definition>

<add column definition>    ::=   ADD [ COLUMN ] <column definition>

<alter column definition>    ::=   ALTER [ COLUMN ] <column name> <alter column action>

<alter column action>    ::=   <set column default clause> | <drop column default clause>

<set column default clause>    ::=   SET <default clause>

<drop column default clause>    ::=   DROP DEFAULT

<drop column definition>    ::=   DROP [ COLUMN ] <column name> <drop behaviour>

<add table constraint definition>    ::=   ADD <table constraint definition>

<drop table constraint definition>    ::=   DROP CONSTRAINT <constraint name> <drop behaviour>

underscore not supported in identifiers

jaypipes@uberbox:~/src/github.com/jaypipes/sqltoast/_build$ ./sqltoaster "select * from t1"
OK
statements[0]:
  <statement: SELECT
   selected columns:
     0: literal[*]
   referenced tables:
     0: t1>

(took 9172 nanoseconds)
jaypipes@uberbox:~/src/github.com/jaypipes/sqltoast/_build$ ./sqltoaster "select * from t1_under"
Syntax error.
Expected to find one of (EOS|';'|')') but found identifier[t1]
select * from t1_under
             ^^^^^^^^^
(took 28577 nanoseconds)

When >1 OR expression, list linking not working properly

This demonstrates the issue:

jaypipes@uberbox:~/src/github.com/jaypipes/sqltoast/_build$ ./sqltoaster "SELECT * FROM t1 WHERE a = 1 OR b = 2"
OK
statements[0]:
  <statement: SELECT
   selected columns: 
     0: *
   referenced tables: 
     0: t1
   where conditions: 
     column-reference[a] = literal[1]
     OR 
     column-reference[b] = literal[2]>

(took 14494 nanoseconds)
jaypipes@uberbox:~/src/github.com/jaypipes/sqltoast/_build$ ./sqltoaster "SELECT * FROM t1 WHERE a = 1 OR b = 2 OR c = 3"
OK
statements[0]:
  <statement: SELECT
   selected columns: 
     0: *
   referenced tables: 
     0: t1
   where conditions: 
     column-reference[a] = literal[1]
     OR 
     column-reference[b] = literal[2]
     OR 
     column-reference[c] = literal[3]
     OR 
     column-reference[c] = literal[3]>

(took 18825 nanoseconds)
jaypipes@uberbox:~/src/github.com/jaypipes/sqltoast/_build$ ./sqltoaster "SELECT * FROM t1 WHERE a = 1 OR b = 2 OR c = 3 OR d = 4"
OK
statements[0]:
  <statement: SELECT
   selected columns: 
     0: *
   referenced tables: 
     0: t1
   where conditions: 
     column-reference[a] = literal[1]
     OR 
     column-reference[b] = literal[2]
     OR 
     column-reference[c] = literal[3]
     OR 
     column-reference[d] = literal[4]
     OR 
     column-reference[d] = literal[4]
     OR 
     column-reference[c] = literal[3]
     OR 
     column-reference[d] = literal[4]
     OR 
     column-reference[d] = literal[4]>

(took 17427 nanoseconds)

Support JOIN parsing in SELECT

Part of Issue #30. Need to parse explicit JOIN clauses.

Support string value expressions

Currently, only unsigned numeric value expressions are being parsed.

<value expression>    ::=
         <numeric value expression>
     | <string value expression>
     | <datetime value expression>
     | <interval value expression>

Identifiers, keywords and literals have a weird off-by-one bug

jaypipes@uberbox:~/src/github.com/jaypipes/sqltoast/_build$ ./sqltoaster "select distinct t1.a FROM t1 WHERE a = 21"
Syntax error.
Expected to find << identifier >> but found literal[2]
select distinct t1.a FROM t1 WHERE a = 21
                                      ^^^
(took 199640 nanoseconds)
jaypipes@uberbox:~/src/github.com/jaypipes/sqltoast/_build$ ./sqltoaster "select distinct t1.a FROM t1 WHERE a = n"
OK
statements[0]:
  <statement: SELECT
   distinct: true
   selected columns: 
     0: t1.a
   referenced tables: 
     0: t1
   where conditions: 
     0: a = n1>

(took 280725 nanoseconds)
jaypipes@uberbox:~/src/github.com/jaypipes/sqltoast/_build$ ./sqltoaster "select distinct t1.a FROM t1 WHERE a = n1"
OK
statements[0]:
  <statement: SELECT
   distinct: true
   selected columns: 
     0: t1.a
   referenced tables: 
     0: t1
   where conditions: 
     0: a = n1>

(took 279234 nanoseconds)
jaypipes@uberbox:~/src/github.com/jaypipes/sqltoast/_build$ ./sqltoaster "select distinct t1.a FROM t1 WHERE a = n12"
OK
statements[0]:
  <statement: SELECT
   distinct: true
   selected columns: 
     0: t1.a
   referenced tables: 
     0: t1
   where conditions: 
     0: a = n12>

(took 281246 nanoseconds)

Support UNIQUE predicate

The UNIQUE predicate is part of the ANSI SQL-02 standard and is not particularly common to see in real life. However, it is part of the standard and would be good to support it, considering it has a simple grammar:

<unique predicate>    ::=   UNIQUE <table subquery>

Support non-numeric literals

Currently, support for literal tokens is limited to numerics (exact and approximate). We need to support string, datetime and interval literals as well.

support for string literals
support for datetime literals
support for interval literals

Support the INSERT statement

Support both the INSERT ... VALUES and INSERT SELECT variants.

Support `SET` statement

The SET SQL statement is a generic/administrative action statement to set variables and things like the transaction isolation level.

write a super simple test framework

tests should plaintext files in the form of:

# Comments begin with a hash, end in a newline
# Blank lines are ignored.

# SQL statements are placed directly into the test file.
# The string representation of the AST shall come
# directly after the SQL, delimited by the markers
# [AST] and [/AST] separated by newlines:

CREATE DATABASE mydb;
[AST]
<begin> -> <CREATE DATABASE statement> -> <identifier> -> <end>
[/AST]

A simple test runner should be able to be written quickly the reads the non-comment lines and passes the input string to sqltoast::parse(), verifies that sqltoast::parse_result::code == SUCCESS and then calls sqltoast::ast_to_string() on the sqltoast::parse_result.ast and does a simple string assertion of the difference, if any.

Unable to use signed integers for DEFAULT values

./sqltoaster "CREATE TABLE t1 (a INT DEFAULT -1)"
Syntax error.
Expected either a column definition or a constraint but found symbol['-']
CREATE TABLE t1 (a INT DEFAULT -1)
                              ^^^^
(took 34494 nanoseconds)

Implement CREATE TABLE parsing

The CREATE TABLE statement has the following EBNF grammar (for ANSI-92 SQL):

<table definition>    ::=
         CREATE [ { GLOBAL | LOCAL } TEMPORARY ] TABLE <table name> <table element list> [ ON COMMIT { DELETE | PRESERVE } ROWS ]

 <table element list>    ::=   <left paren> <table element> [ { <comma> <table element> }... ] <right paren>

<table element>    ::=   <column definition> | <table constraint definition>

<column definition>    ::=
         <column name> { <data type> | <domain name> } [ <default clause> ] [ <column constraint definition> ... ] [ <collate clause> ]

<column name>    ::=   <identifier>

 <column constraint definition>    ::=
         [ <constraint name definition> ] <column constraint> [ <constraint attributes> ]

<constraint name definition>    ::=   CONSTRAINT <constraint name>

<constraint name>    ::=   <qualified name>

<column constraint>    ::=
         NOT NULL
     |     <unique specification>
     |     <references specification>
     |     <check constraint definition>

<unique specification>    ::=   UNIQUE | PRIMARY KEY

<references specification>    ::=
         REFERENCES <referenced table and columns> [ MATCH <match type> ] [ <referential triggered action> ]

<referenced table and columns>    ::=   <table name> [ <left paren> <reference column list> <right paren> ]

<table name>    ::=   <qualified name> | <qualified local table name>

<reference column list>    ::=   <column name list>

<column name list>    ::=   <column name> [ { <comma> <column name> }... ]

<match type>    ::=   FULL | PARTIAL

<referential triggered action>    ::=
         <update rule> [ <delete rule> ]
     |     <delete rule> [ <update rule> ]

<update rule>    ::=   ON UPDATE <referential action>

<referential action>    ::=   CASCADE | SET NULL | SET DEFAULT | NO ACTION

<delete rule>    ::=   ON DELETE <referential action>

<check constraint definition>    ::=   CHECK <left paren> <search condition> <right paren>

Support interval value expressions

Currently, only unsigned numeric value expressions are being parsed.

<value expression>    ::=
         <numeric value expression>
     | <string value expression>
     | <datetime value expression>
     | <interval value expression>

allow full instrumentation of internal operations

In order to determine where sqltoast is spending its time during tokenization and parsing efforts, I'd like to provide full instrumentation of all primary functions. The instrumentation "interface" should look something like this:

#include <iostream>
#include <cctype>

#include <sqltoast.h>
sqltoast::parse_input_t subject("CREATE SCHEMA s1");
sqltoast::parse_options_t opts;
opts.enable_instrumentation = true;

auto res = sqltoast::parse(subject, opts);
for (auto instr = res.instrumentation) {
    std::cout << instr.block << ": " << instr.total_ns << " nanoseconds" << std::endl;    
}

which would output something like this:

tokenize(): 31880 nanoseconds
 -> token_comment(): 678 nanoseconds
 -> token_keyword(): 23449 nanoseconds
 -> token_identifier(): 6777 nanoseconds
parse_statement(): 1156 nanoseconds
 -> parse_create_schema(): 928 nanoseconds

Support MATCH predicate

The MATCH predicate is not commonly used any more but is part of the ANSI SQL-92 standard and so I'd like to support it, for competeness. Its grammar looks like so:

<match predicate>    ::=   <row value constructor> MATCH [ UNIQUE ] [ PARTIAL | FULL ] <table subquery>

Tokenization or lexer move not working for DEFAULT literal parsing

./sqltoaster "CREATE TABLE t1 (a INT DEFAULT 1)"
OK
statements[0]:
  <statement: CREATE TABLE
    table name: t1
    column definitions:
      a INT DEFAULT ')'>

(took 18826 nanoseconds)

Support CHECK constraints

Support CHECK constraints in both column and table definitions.

Handle explicit `<column> <data type> NULL` constraint specifier

The NULL constraint specifier is default for the CREATE TABLE statement, however we don't support it. We only support the standard NOT NULL constraint specifier:

jaypipes@uberbox:~/src/github.com/jaypipes/sqltoast/_build$ ./sqltoaster "create table t1 (c1 int not null primary key)"
OK
statements[0]:
  <statement: CREATE TABLE
    table identifier: t1
    column definitions:
      c1 INT NOT NULL
    constraints:
      PRIMARY KEY(c1)>

(took 28831 nanoseconds)
jaypipes@uberbox:~/src/github.com/jaypipes/sqltoast/_build$ ./sqltoaster "create table t1 (id int not null primary key, name varchar(200) null)"
Syntax error.
Expected to find one of (','|')') but found keyword[NULL]
create table t1 (id int not null primary key, name varchar(200) null)
                                                               ^^^^^^
(took 61148 nanoseconds)

Should be a simple fix to ignore the NULL constraint specifier when NOT isn't present.

Support numeric expressions involving operators

 <numeric value expression>    ::=
         <term>
     | <numeric value expression> <plus sign> <term>
     | <numeric value expression> <minus sign> <term>

<term>    ::=
         <factor>
     | <term> <asterisk> <factor>
     | <term> <solidus> <factor>

<factor>    ::=   [ <sign> ] <numeric primary>

<numeric primary>    ::=   <value expression primary> | <numeric value function>

Support datetime value expressions

Handle datetime value expressions, which are fairly complicated and related to interval value expressions and numeric value expressions.

 <datetime value expression>    ::=
         <datetime term>
     |     <interval value expression> <plus sign> <datetime term>
     |     <datetime value expression> <plus sign> <interval term>
     |     <datetime value expression> <minus sign> <interval term>

<interval term>    ::=
         <interval factor>
     |     <interval term 2> <asterisk> <factor>
     |     <interval term 2> <solidus> <factor>
     |     <term> <asterisk> <interval factor>