codeschool / sqlite-parser
JavaScript implementation of a SQLite 3 query parser
License: MIT License
CREATE TABLE unique_index_data (
index_id integer NOT NULL,
value blob NOT NULL,
object_store_id integer NOT NULL,
object_data_key blob NOT NULL,
FOREIGN KEY (index_id) REFERENCES object_store_index (id),
FOREIGN KEY (object_store_id, object_data_key) REFERENCES object_data (object_store_id, key),
PRIMARY KEY(index_id, value)
) WITHOUT ROWID
And this:
CREATE TABLE foo (id unique, text)
If it is possible to remove the performance penalty of using the --trace compile-time option of PEG.js, then I want to merge the #tracer branch and make that the official next version. The parse error messages of the current master branch don't give the user any useful information.
Where an AST is created from multiple statements, it would be useful to include the text of each statement in the AST, so that this parser could also be used to split multiple statements into individual statements.
That is, besides the AST of a statement such as "SELECT * FROM aaa WHERE bbb = 1", why not include the entire statement text under one of the keys?
So this is a bit of a weird one. If you take the following test:
select *
from hats
where
x || y BETWEEN x * 2 and x * 3
and add brackets around the expressions (keeping the operator precedence intact), then it refuses to parse the right-hand expression of the BETWEEN, ((x * 2) and (x * 3)):
select *
from hats
where
((x || y) BETWEEN ((x * 2) and (x * 3)));
The generator adds brackets to every grouping it generates (to make the intended order explicit). The first expectation in the suite is that the statement parses with no errors; the second compares the output SQL with the input.
I am struggling with the browserified version. When including sqlite-parser.js (taken from the dist directory) through the same local server as the HTML file, I get an error: Internal error: too much recursion (Firefox) or RangeError: Maximum call stack size exceeded (Chrome).
If I include the script externally (e.g. from rawgit.com), then it works.
The html I am using is the following:
<html>
<head>
<script src="sqlite-parser.js"></script>
</head>
<body>
<textarea id="in">SELECT a FROM b</textarea>
<button onclick="parse()">Parse</button>
<p></p>
<div id="out"></div>
<script>
function parse() {
  const query = document.getElementById('in').value;
  sqliteParser(query, function (err, ast) {
    if (err) {
      console.error(err);
      return;
    }
    document.getElementById('out').innerHTML =
      '<pre>' + JSON.stringify(ast, null, 2) + '</pre>';
  });
}
</script>
</body>
</html>
If I load the same sqlite-parser.js externally, e.g. through rawgit.com, then it works flawlessly: the string is parsed and the AST is printed.
<script src="https://rawgit.com/codeschool/sqlite-parser/1599a58dd08bbe55c4d14e1f6844ad13b7c9c421/dist/sqlite-parser.js"></script>
If I go back to older versions of sqlite-parser.js, the newest one that works is the version from commit 8559684 (2016-03-09). The commit after that is not available on GitHub, and the one after it throws the above errors.
Is this possible?
I would like to:
- remove the dist/ and lib/ folders from the git repo
- substitute the lib/ folder for the dist/ folder on npm
This will really help in testing different queries, and hence make it possible to write an extensive and exhaustive test suite with it.
It looks like identifiers like "date", "time", and "timestamp" are reserved.
For example, this will generate an error:
SELECT * FROM quakes WHERE date > 40998
Is there any way to work around that?
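In SQLite itself, the usual workaround is to quote the identifier: double quotes, square brackets, and backticks are all accepted, so `SELECT * FROM quakes WHERE "date" > 40998` should parse. A minimal sketch of a helper that applies double-quoting only when needed; the reserved-word list here is illustrative, not the parser's actual list:

```javascript
// Quote an identifier for SQLite by wrapping it in double quotes,
// doubling any embedded quote characters.
function quoteIdentifier(name) {
  return '"' + String(name).replace(/"/g, '""') + '"';
}

// Illustrative (not exhaustive) set of reserved words that trip up parsers.
const RESERVED = new Set(['date', 'time', 'timestamp']);

// Quote a name only when it collides with a reserved word.
function safeIdentifier(name) {
  return RESERVED.has(name.toLowerCase()) ? quoteIdentifier(name) : name;
}
```

With this, safeIdentifier('date') produces "date" while ordinary column names pass through unchanged.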
In this example there are two named constraints:
CREATE TABLE Bees (
wings integer PRIMARY KEY CONSTRAINT has_enough_wings CHECK (wings >= 2),
legs integer CONSTRAINT too_many_legs CHECK (legs <= 6)
);
The resulting AST has a name for the too_many_legs constraint, but the has_enough_wings constraint loses its name.
It seems that the parser drops the names from constraints whenever any other constraint, such as NOT NULL, DEFAULT, or PRIMARY KEY, is present on the column.
In the SQL below, you can see how I combine join methods in the FROM clause: ONT and ORD are joined in the WHERE, and ORD and ORD_LINE are joined in the FROM. I've tested this on SQL Fiddle (SQLite) and it works; however, sqlite-parser does not like it.
select
ord.ordnum
from
ord_note ont,
ord o
left outer join ord_line
on (ord_line.wh_id = o.wh_id and ord_line.ordnum = o.ordnum and ord_line.client_id = o.client_id)
where
ont.wh_id = o.wh_id
and ont.ordnum = o.ordnum
I'm learning this project, and I would like a document describing how the AST is defined, just like https://esprima.readthedocs.io/en/latest/syntax-tree-format.html
This statement throws an error, but it shouldn't:
SELECT * FROM sometable WHERE id <> 1
I was trying out your demo with the following SQL query and got '[2, 1] Syntax error found near INTEGER Datatype Name (FROM Clause)' on your demo site. When I tried it in a different project, it said something like "expecting newline, semi-colon, etc., but found I instead".
SELECT a, b FROM table1
INTERSECT
SELECT * FROM (SELECT a, b FROM table2
EXCEPT
SELECT a, b FROM table3)
however if I change intersect to union as so, then no errors are thrown
SELECT a, b FROM table1
union
SELECT * FROM (SELECT a, b FROM table2
EXCEPT
SELECT a, b FROM table3)
Is this a bug or is intersect something that hasn't been added quite yet?
Instead of publishing this module on npm as a browserified and minified bundle, I want to publish the ES2015 code in src/ and the transpiled ES5 code in lib/, leaving it up to the end user to decide whether to browserify or minify the library.
If the stream transform stuff makes it into v1.0.0 along with these changes, then this would also allow a shim file to be specified so that the node stream library isn't included in a browserified version of this module. The configuration would be defined in package.json as:
{
"map": {
"./lib/streaming.js": {
"browser": "./lib/streaming-shim.js"
}
}
}
Hi, is there any way to build the query from a generated AST?
I need to append a few values to insert and update scripts every time a query is submitted and obviously editing the AST is easier so is there any way to do that?
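sqlite-parser doesn't ship a generator, but for a constrained subset of node shapes a sketch is not much code. The type/variant/name fields below match the ASTs shown elsewhere in these issues; everything else (the set of supported nodes, the error handling) is an assumption:

```javascript
// Minimal AST-to-SQL sketch for a tiny subset of sqlite-parser-style nodes.
// Only identifiers, literals, binary operations, and simple SELECTs are
// handled; anything else throws so gaps are visible.
function toSql(node) {
  switch (node.type) {
    case 'identifier':
      return node.name;
    case 'literal':
      return node.variant === 'text' ? "'" + node.value + "'" : String(node.value);
    case 'expression':
      return toSql(node.left) + ' ' + node.operation.toUpperCase() + ' ' + toSql(node.right);
    case 'statement':
      if (node.variant === 'select') {
        let sql = 'SELECT ' + node.result.map(toSql).join(', ') +
                  ' FROM ' + toSql(node.from);
        if (node.where) sql += ' WHERE ' + node.where.map(toSql).join(' AND ');
        return sql;
      }
      throw new Error('unsupported statement: ' + node.variant);
    default:
      throw new Error('unsupported node: ' + node.type);
  }
}
```

With something like this, you could append values to the AST of an insert or update and regenerate the SQL, though a production generator would need to cover many more node variants.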
Adding location data to nodes in the AST would allow 'possibly' new languages to be built using the AST as a target (stage 2 of my master plan: think SASS for SQL). (This is similar to how ES6 -> ES5 transpilers work, and indeed Java, Scala, etc.)
The parser in this case could be thought of as a reference spec for an SQL AST.
Hello, If I execute the following in the sqlite3 CLI -
sqlite> select 1,2 union select 3,4,5;
I receive an error, as expected -
Error: SELECTs to the left and right of UNION do not have the same number of result columns
Why does sqlite-parser not recognize it as invalid?
Any suggestions on how I can have sqlite-parser allow a "/" in table names, just like "."? DB2 has libraries that can contain '/' in the table name, and below is a valid statement.
I attempted to adjust the .pegjs; however, after adjusting it, statements that were working stopped working. I might be adjusting the wrong item. I was updating:
name_char = [a-z0-9\$\_\/]i
Thanks
Add a command line version of the parser when it is installed as a global module, e.g.:
sqlite-parser --output foo.json input.sql
Given statement:
Select column1 from table1 where field1 between 10 AND 20
Will fail parsing. Can be observed on demo site as well
It's probably more of an academic problem than a real-life issue: trailing spaces in table or column names do not make it into the AST, e.g.
select "a " from "b "
leads to
{
"type": "statement",
"variant": "list",
"statement": [
{
"type": "statement",
"variant": "select",
"result": [
{
"type": "identifier",
"variant": "column",
"name": "a"
}
],
"from": {
"type": "identifier",
"variant": "table",
"name": "b"
}
}
]
}
Although it doesn't make much sense to define identifiers with trailing spaces, it works in SQLite, so the created AST should reflect it.
When parsing a query like:
SELECT *
FROM Bees b
WHERE wings LIKE '1' AND limbs = 'blah'
The AST generated is incorrect.
{
"type": "statement",
"variant": "list",
"statement": [
{
"type": "statement",
"variant": "select",
"result": [
{
"type": "identifier",
"variant": "star",
"name": "*"
}
],
"from": {
"type": "identifier",
"variant": "table",
"name": "bees",
"alias": "b"
},
"where": [
{
"type": "expression",
"format": "binary",
"variant": "operation",
"operation": "like",
"right": {
"type": "expression",
"format": "binary",
"variant": "operation",
"operation": "and",
"left": {
"type": "literal",
"variant": "text",
"value": "1"
},
"right": {
"type": "expression",
"format": "binary",
"variant": "operation",
"operation": "=",
"left": {
"type": "identifier",
"variant": "column",
"name": "limbs"
},
"right": {
"type": "literal",
"variant": "text",
"value": "blah"
}
}
},
"left": {
"type": "identifier",
"variant": "column",
"name": "wings"
}
}
]
}
]
}
In the WHERE clause, the LIKE node is the root node. However, the AND node should be the root node.
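This is an operator-precedence bug: AND binds more loosely than LIKE, so AND should end up nearer the root. A minimal precedence-climbing sketch over an already-tokenized expression shows the intended shape; the precedence numbers are illustrative, not sqlite-parser's actual table:

```javascript
// Precedence-climbing parser over a token array such as
// ['wings', 'LIKE', "'1'", 'AND', 'limbs', '=', "'blah'"].
// Higher number = binds tighter; AND is looser than LIKE and =.
const PREC = { OR: 1, AND: 2, '=': 3, LIKE: 3 };

function parseExpr(tokens, minPrec = 0, pos = { i: 0 }) {
  let left = tokens[pos.i++]; // operand (identifier or literal)
  while (pos.i < tokens.length) {
    const op = tokens[pos.i];
    const prec = PREC[op];
    if (prec === undefined || prec < minPrec) break;
    pos.i++;
    // Left-associative: the right side may only contain tighter operators.
    const right = parseExpr(tokens, prec + 1, pos);
    left = { operation: op.toLowerCase(), left, right };
  }
  return left;
}
```

Running this on the tokens of the WHERE clause above yields a tree whose root operation is 'and', with the LIKE and = comparisons as its children, which is the shape the issue expects.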
It'd be great if, after parsing, I could simply reference (for example) results.readFrom and results.writtenTo to know which tables will be touched and how, accurately factoring subqueries into the process.
Given statement:
select fred, barney, wilma, betty, city from flintstones_table where city = 'bedrock' order by fred, barney, wilma
Parsing fails; this can be observed on the demo site as well.
If wilma is removed from the ORDER BY, the query parses correctly.
If you alias a column in a select statement with a name that starts with IN (e.g. INV), the parser interprets the IN as an IN expression.
Example:
select INVACT.invact_id INID,
INVACT.trndte REPORT_DATE
from invact INVACT
In this example, the parser sees the IN of INID as the IN operator and makes result[0] an expression, with INVACT.invact_id becoming the left side and ID becoming the right.
SELECT
Sub-queries
SELECT *
FROM (
SELECT *
FROM b
) AS z
Functions SUM(), aggregation *, etc...
SELECT COUNT(*)
FROM apples
Compound queries
SELECT *
FROM a
UNION
SELECT *
FROM b
Alternate syntax
VALUES (1, 2, 'hat')
ORDER BY id DESC
JOIN types INNER, OUTER, LEFT
Joins on tables and/or sub-queries
USING
SELECT *
FROM bees
JOIN inventory AS i USING i.name, i.type
Query modifiers WHERE, GROUP BY, HAVING
WHERE
FROM
ORDER BY
GROUP BY
HAVING
LIMIT
INSERT
Basic
INSERT INTO bees (a, b, c)
VALUES (1, 2, 'hey'), (2, 3, 'yo')
Default values
INSERT INTO apples (a, b, c)
DEFAULT VALUES
Insert into select
INSERT INTO apples (a, b, c)
SELECT * FROM apples
UPDATE
DELETE
DROP
CREATE
PRIMARY KEY
CHECK
FOREIGN KEY
PRIMARY KEY
NOT NULL, UNIQUE
CHECK
DEFAULT
COLLATE
FOREIGN KEY
AS SELECT
ALTER TABLE
Transaction statement types
BEGIN IMMEDIATE TRANSACTION
CREATE TABLE foods (
id int PRIMARY KEY,
item varchar(50),
size varchar(15),
price int
);
INSERT INTO foods (item, size, id, price)
SELECT 'banana', size, null, price
FROM bananas
WHERE color != 'red'
COMMIT
BEGIN
COMMIT, END
ROLLBACK
Query plan EXPLAIN QUERY PLAN stmt
Multiple queries in batch
CREATE TABLE Actors (
name varchar(50),
country varchar(50),
salary integer
);
INSERT INTO Actors (name, country, salary) VALUES
('Vivien Leigh', 'IN', 150000),
('Clark Gable', 'USA', 120000),
('Olivia de Havilland', 'Japan', 30000),
('Hattie McDaniel', 'USA', 45000);
Indexed sources in queries
SELECT *
FROM bees AS b INDEXED BY bees_index
Comments
Line comments
SELECT *
FROM hats --happy table
WHERE color = 'black'
Block comments
/*
* This is a /* nested */
* C-style block comment as allowed
* in SQL spec
*/
SELECT *
FROM hats
WHERE color = 'black'
Aliases SELECT * FROM apples AS a
apples AS unquoted_name
apples no_as
apples containsWhereKeyword
and apples AS floatDatatype
INT, INTEGER, INT2 (but still working for BIGINT, MEDIUMINT ...)
apples AS VARCHAR, apples AS Join
apples AS [inBrackets]
apples AS `backticks`
apples AS "Double Quoted with Spaces"
Single-quoted aliases are invalid in most SQL dialects
SELECT hat AS 'The Hat'
FROM dinosaurs
Expressions
CAST(banana AS INT)
CASE
SELECT CASE WHEN apple > 1 THEN 'YES' ELSE 'NO' END
FROM apples
Binary IN
SELECT *
FROM hats
WHERE bees NOT IN (SELECT * FROM apples)
Unary
SELECT NOT bees AS [b]
FROM hats
RAISE
RAISE (ROLLBACK, 'hey there!')
COLLATE
bees COLLATE bees_collation
LIKE
SELECT *
FROM hats
WHERE bees LIKE '%somebees%'
ESCAPE
SELECT bees NOT LIKE '%hive' ESCAPE hat > 1
FROM hats
Binary IS, IS NOT
SELECT *
FROM hats
WHERE ham IS NOT NULL
BETWEEN
SELECT *
FROM hats
WHERE x BETWEEN 2 AND 3
Expression lists
SELECT expr1, expr2, expr3
FROM hats
Binary operation
SELECT *
FROM hats
WHERE 2 != 3
Functions
SELECT MYFUNC(col, 1.2, 'str')
Table expressions
WITH ham AS (
SELECT type
FROM hams
)
SELECT *
FROM inventory
INNER JOIN ham
ON inventory.variety = ham.type
Logical grouping 1 == 2 AND 2 == 3
Example:
SELECT *
FROM bees
WHERE 1 < 2 AND 3 < 4
      AND                        <
     /   \                      / \
    <     <      versus        1  AND
   / \   / \                      /  \
  1   2 3   4                    2    <
                                     / \
                                    3   4
FIXED: now grouping correctly when using binary AND / OR
Literals
'string'
12, 1.2, 1E-9, 0xe1e3
-2.001
Bind parameters
?, ?12
@bees
$hey "Hey There"
BLOB X'stuff'
AST
Initial AST Format
{
"statement": [
{
"type": "statement",
"variant": "select"
},
{
"type": "statement",
"variant": "create"
}
]
}
BUG: AST should output normalized (lowercased) values for case-insensitive data (e.g.: datatypes, keywords, etc...)
ISSUE: Need to normalize format across all statement types (e.g.: CREATE TABLE, SELECT)
CREATE, SELECT, INSERT, UPDATE, DROP, DELETE
Datatypes
SQLite
Expression | Resulting Affinity |
---|---|
INT | INTEGER |
INTEGER | INTEGER |
TINYINT | INTEGER |
SMALLINT | INTEGER |
MEDIUMINT | INTEGER |
BIGINT | INTEGER |
UNSIGNED BIG INT | INTEGER |
INT2 | INTEGER |
INT8 | INTEGER |
CHARACTER(20) | TEXT |
VARCHAR(255) | TEXT |
VARYING CHARACTER(255) | TEXT |
NCHAR(55) | TEXT |
NATIVE CHARACTER(70) | TEXT |
NVARCHAR(100) | TEXT |
TEXT | TEXT |
CLOB | TEXT |
BLOB | NONE |
no datatype specified | NONE |
REAL | REAL |
DOUBLE | REAL |
DOUBLE PRECISION | REAL |
FLOAT | REAL |
NUMERIC | NUMERIC |
DECIMAL(10,5) | NUMERIC |
BOOLEAN | NUMERIC |
DATE | NUMERIC |
DATETIME | NUMERIC |
From what I can tell, you can build an AST with the parser, but there's no way to go back to a string.
Is this a feature contained in another package somewhere? Or are there plans on adding this?
ORDER BY
LIMIT
name property across node types
type and variant combine into a single type property
target
SELECT * FROM temptable where value1 = ((value2 * 5) - value3)
The statement above throws the following error:
Syntax error found near Binary Expression
Brackets should be fine. Is there any other way to structure the equation?
In a select statement, reconstituting any expression that isn't aliased takes some recursion to drill down and glue it back together. Up at the statement[i].result[i] level, would it be too much to ask to see the full expression?
For example, in this query:
select
first_name || ' ' || last_name,
grade_level
from
students
I'd love to see this as the return:
{
"statement": [
{
"result": [
{
"type": "expression",
**"fullText": "first_name || ' ' || last_name",**
"left": "etc"
"right": "etc"
},
{
"type": "column",
"name": "grade_level"
}
]
}
]
}
{
"type": "definition",
"variant": "constraint",
"definition": [
{
"type": "constraint",
"variant": "foreign key",
"references": {
"type": "identifier",
"variant": "table",
"name": "hives"
}
}
],
"columns": [
{
"type": "identifier",
"variant": "column",
"name": "hive_id"
}
]
}
FOREIGN KEY (hive_id) REFERENCES Hives
With the current structure of this node and the resulting text, it's hard to build a generator that can handle this in a generic way. The generator recurses down the tree; by the time we reach the node and know that it's a foreign key, we have moved past the other data necessary to build the SQL. The two identifiers can be on a lower level, but the foreign key part needs to be on a higher level.
To get an idea of the issue, take a look at this line to see how it recurses down the tree, and then this line to see how other constraint types are handled.
The main issue is the columns property. If we could pull that down into the foreign key node, it would mean I can handle this the same as the other nodes rather than writing code for edge cases, if that's OK with you? I'm open to other ideas on the generator if you have any (it's still a bit messy, so be kind).
This is the first node that I've had problems with, there may be more where I run into this problem.
Adding in nodes to handle comments is needed to support other tooling that uses the parser to apply transformations to the entire codebase.
If we were to apply a refactoring using the following pipeline parse current codebase -> kebab-case all table and column names -> generate code from AST
then the user would lose all comments that they had on their code.
It would also allow documentation generators to leverage the parser.
(If the comment nodes were based on the ES6 AST, allowing the use of something like estraverse, that would be perfect.)
You can reproduce it in your demo: http://codeschool.github.io/sqlite-parser/demo/
The query CREATE VIRTUAL TABLE txt1 USING fts4(tokenize=unicode61);
is parsed well.
The same but with tokenizer arguments e.g. CREATE VIRTUAL TABLE txt2 USING fts4(tokenize=unicode61 "remove_diacritics=2");
gets the error Syntax error found near Identifier (CREATE VIRTUAL TABLE Statement)
Both examples should be valid. I took them from here: https://www.sqlite.org/fts3.html
To allow users to parse arbitrarily long SQL files or other readable stream sources, I want to create a stream transform that can accept a readable stream and then push (write) out JSON strings of the ASTs for individual statements, e.g.:
import { createReadStream } from 'fs';
import { SqliteParserTransform } from 'sqlite-parser';
const parser = new SqliteParserTransform();
// A file with hundreds of SQL queries
const fileName = './test/sql/official-suite/randexpr1-1.sql';
const read = createReadStream(fileName);
// Pipe the readable stream to the parser transform
read.pipe(parser);
// Stream the ASTs for individual statements to stdout
parser.pipe(process.stdout);
parser.on('error', function (err) {
console.log(err);
process.exit(1);
});
parser.on('finish', function () {
process.exit(0);
});
If people are actually using this, it is probably frustrating that they lose whatever they entered as soon as the page reloads, especially for anyone who forked it and is running it locally with grunt live. Every time you make a change to the source code, the interactive demo reloads and you lose whatever was there before.
The following query is failing to validate.
SELECT x, y, row_number() OVER (ORDER BY y) AS row_number FROM t0 ORDER BY x;
See the official SQLite window function documentation.
Hi! In the following example, I'd expect "not" to be a first-class member of the AST, but it appears to just be concatenated with the operator after it. This implies I'd have to do string parsing of the operation field to discover "not" usage:
> const { statement: [{ where: [where5] }] } = parser('select 1 where processName = \'/bin/sh\' and id not like \'foo\'');
undefined
> where5
{ type: 'expression',
format: 'binary',
variant: 'operation',
operation: 'and',
left:
{ type: 'expression',
format: 'binary',
variant: 'operation',
operation: '=',
left: { type: 'identifier', variant: 'column', name: 'processname' },
right: { type: 'literal', variant: 'text', value: '/bin/sh' } },
right:
{ type: 'expression',
format: 'binary',
variant: 'operation',
operation: 'not like',
right: { type: 'literal', variant: 'text', value: 'foo' },
left: { type: 'identifier', variant: 'column', name: 'id' } } }
Is this intentional? I'd expect the AST to look more like:
...
right:
{ type: 'expression',
format: 'unary',
variant: 'operation',
operation: 'not',
expression:
{ type: 'expression',
format: 'binary',
variant: 'operation',
operation: 'like',
right: { type: 'literal', variant: 'text', value: 'foo' },
left: { type: 'identifier', variant: 'column', name: 'id' } } } }
I see in #20 that operator precedence has been improved/fixed. Nevertheless, I still observe an issue with AND and OR.
The following two expressions result in the same AST, but shouldn't.
select x from foo where (a or b) and c
select x from foo where a or b and c
Tested with v1.0.0 and also the online demo
Hi all,
I found your library today and liked it a lot. Sadly, I also found a little bug when I tested it on a bigger database: when adding \' into a string in an insert values list, the following error is thrown:
SyntaxError: Syntax error found near VALUES Clause (SELECT Statement)
It can be tested with an updated version of the basic-insert.sql test:
INSERT INTO concessions (item, size, id, price)
VALUES ('Lay\'s Nachos', 'Regular', null, null),
('Pizza', null, 8, 2.00);
I tried to fix it in grammar.pegjs, but I gave up. I hope you can fix it soon.
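Note that standard SQLite escapes a single quote inside a string by doubling it ('') rather than with a backslash, so the parser rejecting \' is arguably conforming behavior. A small sketch of a quoting helper using the doubled-quote convention:

```javascript
// Quote a JavaScript string as a SQLite string literal,
// escaping embedded single quotes by doubling them.
function sqlString(value) {
  return "'" + String(value).replace(/'/g, "''") + "'";
}

// Build the VALUES row from the issue above using standard escaping.
const row = [sqlString("Lay's Nachos"), sqlString('Regular'), 'null', 'null'];
const values = '(' + row.join(', ') + ')';
```

This produces ('Lay''s Nachos', 'Regular', null, null), which both SQLite and the parser should accept.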
I'm trying to use the parser with a WHERE condition containing the following:
WHERE _PARTITIONTIME > timestamp(DATE_SUB(CURRENT_DATE(), INTERVAL 14 DAY))
But it's not working.
What would be the best approach to resolve this?
Set proper rules for identifier names, e.g.: [a-z0-9\_\-]+
Interactive demo editor showing SQL and corresponding AST
Missing specs
Expression grouping issues
Grouping with unary and binary expressions
`anger` != null AND NOT `happiness`
`happiness` NOT NULL AND `anger` > 0
`happiness` IS NOT NULL AND `anger` > 0
`happiness` ISNULL AND `anger` >
`anger` > 0 AND `happiness` IS NOT NULL
NOT `happiness` AND `anger` > 0
NOT `happiness` OR ~`ANGER` AND `anger` IS NOT 0
Grouping with parenthesis
SELECT *
FROM hats
WHERE
(1 != 2 OR 3 != 4) AND ( 3 == 3 )
SELECT *
FROM hats
WHERE
hat OR (shirt AND (shoes OR wig) AND pants)
Remove the modifier key from all parts of the AST and standardize as conditions
Create INDEX
Create TRIGGER
Create VIEW
Create VIRTUAL table
Need to display correct error location when there are multiple statements in the input SQL
comment rules should not use sym_* rules, since you should not be able to put a space between the two symbols at the start and/or end of a comment.
SELECT * - - not valid but is being accepted
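The point is that a line comment must begin with two adjacent hyphens. A sketch of a matcher that accepts -- but (correctly) rejects a spaced - -:

```javascript
// Match a SQL line comment: two adjacent hyphens to end of line.
// A space between the hyphens must not count as a comment opener.
const LINE_COMMENT = /--[^\n]*/;

function startsLineComment(text) {
  const m = LINE_COMMENT.exec(text);
  return m !== null && m.index === 0;
}
```

Under this rule, '--happy table' is a comment, while the '- -' in 'SELECT * - -' is two minus tokens and should be a syntax error.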
Is it possible to use this lib to split a multi-statement string into an array of statements?
Example:
sqliteParser("INSERT INTO storage WHERE id=1; INSERT INTO storage WHERE id=2;")
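sqlite-parser already returns a statement list node (type 'statement', variant 'list'), so one approach is to read the statement array off the AST. If all you need is the raw text of each statement, a small quote-aware splitter is another sketch (it handles single- and double-quoted tokens but not comments):

```javascript
// Split a SQL string on top-level semicolons, ignoring semicolons
// that appear inside single- or double-quoted tokens.
function splitStatements(sql) {
  const out = [];
  let current = '';
  let quote = null; // active quote character, or null
  for (const ch of sql) {
    if (quote) {
      current += ch;
      if (ch === quote) quote = null;
    } else if (ch === "'" || ch === '"') {
      quote = ch;
      current += ch;
    } else if (ch === ';') {
      if (current.trim()) out.push(current.trim());
      current = '';
    } else {
      current += ch;
    }
  }
  if (current.trim()) out.push(current.trim());
  return out;
}
```

Doubled quotes ('') inside string literals are handled naturally, since the closing quote immediately reopens on the next quote character.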